New Computational Algorithms Assisting Discovery of Drugs

BY JAN E. ODEGARD
Executive Director, CITI

In recent years, there has been considerable interest in the study of biomolecular interactions. One area of growing importance explores the interactions that occur when a ligand (an atom, ion, or molecule) attaches itself to the docking site of another molecule, usually a larger protein, known as a receptor, much as the space shuttle docks with the space station. By understanding the interplay between a ligand and its receptor, scientists may one day be able to design drugs that block the docking site of the receptor and thus disable the functional properties of the protein. Ultimately, this technology could lead to cures for some of today's most devastating diseases.

Researchers affiliated with the Computer and Information Technology Institute (CITI) at Rice have partnered with colleagues at the School of Health Information Sciences at the University of Texas Health Science Center at Houston to develop new algorithms that assist in the study of biomolecular interactions. “What we are looking for are molecules that can block docking sites employed by disease-related molecules,” said Mili Shah, a graduate student working on the problem in Computational and Applied Mathematics (CAAM) at Rice.

Proteins are composed of hundreds and sometimes thousands of atoms strung together much like the popular Rubik's Snake. A protein can take on different forms and functions but is nevertheless somewhat limited in the ways each bond (joint) can move. To deliver on the promise of designer drugs, researchers need to gain a much deeper understanding of how proteins behave as a group and, in particular, how specific ligands bind to their receptors' docking sites. The research team led by Danny Sorensen, the Noah Harding Professor and Department Chair of Computational and Applied Mathematics at Rice, focuses on new and promising computational algorithms that isolate characteristics of the proteins more effectively. The team has developed new computational algorithms, using what is known as the Singular Value Decomposition (SVD), to reduce the search space from one with thousands of unknowns to one with far fewer. They are able to do this while maintaining, and even improving on, the details necessary for understanding the molecular dynamics of the protein and hence the docking site of the receptor.
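To make the idea of shrinking the search space concrete, here is a minimal sketch in Python with NumPy. It is not the team's code: the function name `dominant_modes`, the array layout (frames, atoms, 3), and the synthetic trajectory are all assumptions made for illustration. It uses an ordinary dense SVD to describe each frame of a simulated trajectory by a handful of numbers instead of every atomic coordinate.

```python
import numpy as np

def dominant_modes(trajectory, k):
    """Return the mean structure and the k dominant motion modes.

    trajectory: array of shape (n_frames, n_atoms, 3); this layout is an
    assumption made for the sketch, not something taken from the article.
    """
    n_frames = trajectory.shape[0]
    # One column per frame, one row per coordinate (3 * n_atoms rows).
    X = trajectory.reshape(n_frames, -1).T
    # Subtract the mean structure so the modes describe motion about it.
    mean_structure = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean_structure, full_matrices=False)
    return mean_structure, U[:, :k], s[:k], Vt[:k, :]

# Synthetic stand-in for a simulation: 50 atoms, 200 frames, with motion
# built from two collective modes plus a little noise.
rng = np.random.default_rng(0)
n_atoms, n_frames = 50, 200
base = rng.normal(size=(n_atoms, 3))
mode_a = rng.normal(size=(n_atoms, 3))
mode_b = rng.normal(size=(n_atoms, 3))
t = np.linspace(0.0, 2.0 * np.pi, n_frames)
trajectory = (base
              + np.sin(t)[:, None, None] * mode_a
              + 0.5 * np.cos(3 * t)[:, None, None] * mode_b
              + 0.01 * rng.normal(size=(n_frames, n_atoms, 3)))

mean_structure, U, s, Vt = dominant_modes(trajectory, k=2)

# Each frame is now described by 2 numbers instead of 150 coordinates.
reduced = s[:, None] * Vt                      # shape (2, n_frames)
approx = mean_structure + U @ reduced          # rebuild all coordinates

X = trajectory.reshape(n_frames, -1).T
relative_error = (np.linalg.norm(X - approx)
                  / np.linalg.norm(X - mean_structure))
print(reduced.shape, relative_error)
```

In this toy case two modes reproduce nearly all of the motion because the data were built that way; for a real protein the number of modes worth keeping would have to be judged from how quickly the singular values decay.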

“The algorithms we have developed significantly compress the search we have to perform by focusing on the controlling behavior and preserving the symmetry of the protein while retaining the flexibility needed to gain deeper understanding of the biomolecular mechanisms involved,” said Danny Sorensen. “These algorithms will ultimately help reduce the time it takes to analyze proteins from days and months to minutes and hours using resources such as the Cray XD1 supercomputer at Rice.”

For many proteins, such as the EGF-EGFR complex (associated with organ morphogenesis, maintenance, and repair), there is an inherent form of symmetry that can be exploited by the algorithm. Using a traditional SVD approach would not be ideal because it does not preserve the symmetry of the molecule. Therefore, the team at Rice has created a Symmetry Preserving Singular Value Decomposition (SPSVD) that can compress the search space of the protein's motion substantially while preserving its essential symmetric movements. This reduction directly translates into a much faster algorithm that can be parallelized and run on a modern supercomputing cluster.
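The article does not spell out how the SPSVD is constructed, so the following Python sketch should be read only as an illustration of what "preserving symmetry" can mean, not as the team's algorithm. It assumes the symmetry is already known: a mirror plane plus a pairing of atoms, encoded in the hypothetical arguments `partner` and `reflect`. Under that assumption, averaging the data with its mirrored copy before taking the SVD yields modes that are exactly invariant under the symmetry, whereas the modes of a plain SVD drift away from it when the data are noisy.

```python
import numpy as np

def apply_symmetry(X, partner, reflect):
    """Apply an assumed symmetry operator W to every column of X.

    X: (3 * n_atoms, n_columns) matrix, rows ordered x, y, z per atom.
    partner: partner[i] is the atom that is mapped onto atom i.
    reflect: three +1/-1 signs, e.g. [-1, 1, 1] mirrors the x axis.
    """
    n_atoms = len(partner)
    coords = X.reshape(n_atoms, 3, -1)
    signs = np.asarray(reflect, dtype=float)[None, :, None]
    return (coords[partner] * signs).reshape(X.shape)

def symmetric_modes(X, partner, reflect, k):
    """SVD of the symmetrized, mean-centered data (illustrative only)."""
    X_sym = 0.5 * (X + apply_symmetry(X, partner, reflect))
    X_sym = X_sym - X_sym.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X_sym, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Synthetic mirror-symmetric "complex": 20 atoms and their mirror images,
# expanding and contracting over 100 frames, with asymmetric noise added.
rng = np.random.default_rng(1)
n_half, n_frames = 20, 100
reflect = np.array([-1.0, 1.0, 1.0])
half = rng.normal(size=(n_half, 3))
reference = np.vstack([half, half * reflect])
partner = np.concatenate([np.arange(n_half, 2 * n_half), np.arange(n_half)])
t = np.linspace(0.0, 2.0 * np.pi, n_frames)
scale = 1.0 + 0.1 * np.sin(t)
frames = (scale[:, None, None] * reference
          + 0.05 * rng.normal(size=(n_frames, 2 * n_half, 3)))
X = frames.reshape(n_frames, -1).T

# First mode from a plain SVD and from the symmetrized variant.
X_centered = X - X.mean(axis=1, keepdims=True)
U_plain = np.linalg.svd(X_centered, full_matrices=False)[0][:, :1]
U_sym, s_sym, Vt_sym = symmetric_modes(X, partner, reflect, k=1)

def asymmetry(U):
    """Distance between a set of modes and their mirrored copies."""
    return np.linalg.norm(apply_symmetry(U, partner, reflect) - U)

asym_plain = asymmetry(U_plain)
asym_sym = asymmetry(U_sym)
print(asym_plain, asym_sym)
```

The symmetrized mode is unchanged by the mirror operation up to rounding error, while the plain mode is not. Finding the symmetry itself, which this sketch takes as given, is a separate problem.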

Using Rice’s Cray XD1, the team has tested both the SVD and SPSVD on the backbone of the EGF-EGFR protein complex, and the resulting structure is shown in Figure 1. Understanding the movements of this complex may provide answers to problems dealing with cancer treatment, organ repair, and cell production [Wells, 99].

EGF-EGFR is a relatively large protein complex, whose backbone alone consists of over 1,000 atoms. From the perspective of simulation, we are dealing with sizable data sets, and running even the simplest computations on these molecular complexes with a powerful laptop or workstation would be nearly impossible. For instance, calculating the SVD using standard algorithms, such as QR, for a 2,000 frame (3,000 atom) trajectory on a 1.3 GHz PowerPC, a processor optimized for vector computations, takes five and a half hours. Extrapolated to a typical protein trajectory of 10,000 frames, the same computation would take close to a month. Even if the large scale iterative eigensolver, ARPACK (http://www.caam.rice.edu/software/ARPACK), is used to calculate just the dominant 20 singular values and vectors, a 2,000 frame trajectory would take four hours and, extrapolated to 10,000 frames, about a day. What is more, these are best case estimates, since we have not accounted for increased memory traffic and storage. However, using our large scale parallel iterative methods (P_ARPACK) to compute the 20 singular values and vectors on the Cray XD1 requires less than a minute.
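For readers who want to try the "dominant 20 singular values" approach on a small scale, the sketch below uses SciPy's `svds` routine, whose default solver is built on ARPACK. It is a serial, single-machine illustration on made-up data, not the P_ARPACK code run on the Cray XD1, and the matrix sizes are assumptions chosen so that it finishes quickly on a laptop.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Made-up stand-in for a trajectory matrix: 900 coordinates by 400 frames,
# dominated by 30 collective motions plus small noise.
rng = np.random.default_rng(0)
n_coords, n_frames, k = 900, 400, 20
X = (rng.normal(size=(n_coords, 30)) @ rng.normal(size=(30, n_frames))
     + 0.01 * rng.normal(size=(n_coords, n_frames)))

# Compute only the k largest singular triplets (SciPy's default solver
# for svds is ARPACK-based).
U, s, Vt = svds(X, k=k)

# svds does not promise descending order, so sort largest first.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

# Cross-check against the full dense SVD, which computes every value.
s_full = np.linalg.svd(X, compute_uv=False)
matches_dense = np.allclose(s, s_full[:k], rtol=1e-6)
print(s[:3], matches_dense)
```

The iterative solver only ever multiplies the matrix by vectors, which is why it is well suited to being distributed across many processors.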

 

The initial (cyan) and final (blue) structures from the Molecular Dynamics simulation are seen in the figures below. It is apparent from these that the major motion of the complex consists of contracting and expanding. By comparing the first major modes calculated from the SVD and SPSVD, also shown below, we see that the SPSVD (left column) captures these motions more accurately than the SVD (right column). We hypothesize that this is because the SPSVD forces symmetry onto the major modes and as a result helps eliminate noise that may be introduced with traditional SVD calculations.

“We are very excited to see how the Cray XD1 has contributed to the initial results on the EGF-EGFR computations,” said Moshe Vardi, Karen Ostrum George Professor in Computational Engineering and the Director of the Computer and Information Technology Institute (CITI) at Rice. “These results are very encouraging and we are excited to see how the partnership between CITI and the Research Computing Support Group (RCSG) has helped our researchers be more successful through the use of large scale shared computational resources.”

[Figures A-D]


••••••••••

This work was debugged and tested on a small Cray XD1 funded by a Computing Research Infrastructure (CRI) grant (CNS 0454333) from the National Science Foundation.

Large scale runs were supported by the Cray XD1 acquired with a grant from the Major Research Instrumentation Program of the National Science Foundation (CNS-0421109) in a partnership with AMD and Cray. The system has 336 2.2 GHz Dual-Core AMD Opteron processors, a total of 1.4 TB of memory, and in excess of 20 TB of disk storage. The system clocks in at about 3 teraflops. The Computer and Information Technology Institute (CITI) at Rice University led this acquisition from its inception, coordinating the proposal development, system procurement, and deployment. A team of more than 30 faculty members from Rice spanning Engineering, Natural Sciences, and Social Sciences participated in the effort to bring this system to Rice to support large-scale computing.

The Computer and Information Technology Institute (CITI) is a research-centric institute dedicated to the advancement of applied interdisciplinary research in the areas of computation and information technology.

The Research Computing Support Group (RCSG) is part of Rice’s IT Division providing support for shared computing infrastructure.