``It is God's privilege to conceal things, but the kings' pride is
to research them.''
(Proverbs 25:2; ascribed to King Solomon of Israel, ca. 1000 B.C.)
The protein folding problem entails the mathematical prediction of the (tertiary, 3-dimensional) structure of a protein from its (primary, linear) structure, which is defined by its sequence of amino acids. It is one of the most challenging problems in current biochemistry, and a very rich source of interesting problems in mathematical modeling and numerical analysis, requiring an interplay of techniques from eigenvalue calculations, stiff differential equations, stochastic differential equations, local and global optimization, nonlinear least squares, multidimensional approximation of functions, design of experiments, and statistical classification of data. Even topological concepts such as the Morse index and knot invariants (Jones polynomials) have been discussed in this context. An extensive recent report from the U.S. National Research Council on the mathematical challenges from theoretical and computational chemistry shows the protein folding problem embedded in a large variety of other mathematical challenges in chemistry.
Molecular biology is mankind's attempt to figure out how God engineered His greatest invention - life. As with all great inventions, details are top secret; however, even top secrets may become known. I find it a great privilege to live in a time when God allows us to gain some insight into His construction plans, only a short step away from giving us the power to control life processes genetically. I hope it will be to the benefit of mankind, and not to its destruction.
This document is updated only sporadically.
Molecular modeling of proteins and mathematical prediction of
protein structure
``The aims of the present paper are to introduce mathematicians to the
subject, to provide enough background that the problems in the
mathematical modeling of proteins become transparent, to
expose the merits and deficiencies of current models, to describe
the numerical difficulties in structure prediction when a model
is specified, and to point out possible ways of improving model
formulation and prediction techniques.''
New techniques for the construction of residue potentials for protein
folding
A smooth empirical potential is constructed for use in off-lattice
protein folding studies. Our potential is a function of the amino acid
labels and of the distances between the C(alpha) atoms of a protein.
The potential is a sum of smooth surface potential terms that model
solvent interactions and of pair potentials that are functions of a
distance, with a smooth cutoff at 12 Ångström.
Techniques include the use of a fully automatic and reliable estimator
for smooth densities, of cluster analysis to group together amino
acid pairs with similar distance distributions, and of quadratic
programming to find appropriate weights with which the various terms
enter the total potential.
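The structure of such a pair term can be illustrated with a minimal sketch. It is not the potential from the paper: the functional form of the pair term (a Gaussian well), the switching function, the helper names, and all parameter values are assumptions made up for illustration. Only the dependence on amino acid labels and C(alpha) distances and the cutoff at 12 Ångström are taken from the abstract; the surface terms and the fitting by quadratic programming are not shown.

```python
import math
from itertools import combinations

R_CUT = 12.0  # cutoff distance in Ångström, as stated in the abstract


def smooth_cutoff(r, r_cut=R_CUT):
    """Assumed switching factor: 1 at r = 0, reaching 0 at r_cut with zero slope."""
    if r >= r_cut:
        return 0.0
    x = r / r_cut
    return (1.0 - x * x) ** 2


def pair_term(r, depth, r_min, width):
    """Hypothetical smooth well: a Gaussian of the given depth centred at r_min."""
    return -depth * math.exp(-((r - r_min) / width) ** 2)


def pair_energy(coords, labels, params):
    """Sum the cut-off pair terms over all residue pairs.

    coords: list of (x, y, z) C(alpha) positions in Ångström
    labels: list of one-letter amino acid codes, same length as coords
    params: dict mapping a sorted label pair, e.g. ("A", "L"), to
            (depth, r_min, width); the values used below are invented
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        key = tuple(sorted((labels[i], labels[j])))
        depth, r_min, width = params[key]
        total += pair_term(r, depth, r_min, width) * smooth_cutoff(r)
    return total


# Toy example with invented parameters (not fitted to any data).
toy_params = {
    ("A", "A"): (1.0, 5.5, 1.5),
    ("A", "L"): (1.5, 6.0, 1.5),
    ("L", "L"): (2.0, 6.5, 1.5),
}
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (20.0, 0.0, 0.0)]
toy_labels = ["A", "L", "L", "A"]
toy_energy = pair_energy(toy_coords, toy_labels, toy_params)
```

Because the switching factor and its first derivative both vanish at the cutoff, the fourth residue in the toy chain, which lies more than 12 Ångström from all others, contributes nothing, and the energy stays differentiable as residues move across the cutoff.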
Hydrophobicity Analysis of Amino Acids
``Based on a principal component analysis of 47 published attempts to
quantify hydrophobicity in terms of a single scale,
we define a representation of the 20 amino acids as
points in a 3-dimensional hydrophobicity space and display it by means
of a minimal spanning tree. The dominant scale is found to be close to
two scales derived from contact potentials.''
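The two tools named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis from the paper: it extracts just the dominant principal axis (by power iteration) rather than a full 3-dimensional representation, the function names are hypothetical, and the data are five invented points on three invented scales, not real hydrophobicity values.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def dominant_component(rows, iterations=200):
    """Leading principal axis via power iteration on X^T X (X = centered rows)."""
    x = center_columns(rows)
    d = len(x[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(s * row[k] for s, row in zip(scores, x)) for k in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns edges (i, j)."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: math.dist(points[e[0]], points[e[1]]),
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Invented toy data: 5 items (rows) measured on 3 made-up scales (columns).
toy_scales = [
    [1.0, 2.0, 0.1],
    [2.0, 4.1, -0.1],
    [3.0, 6.0, 0.0],
    [4.0, 7.9, 0.1],
    [5.0, 10.0, -0.1],
]
axis = dominant_component(toy_scales)
tree = minimal_spanning_tree(toy_scales)
```

In the toy data the second scale is roughly twice the first, so the dominant axis points approximately along (1, 2, 0), and the spanning tree links the items in a chain.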
Mathematics and Molecules
(movies and images on molecular modeling)
``The objectives of MathMol are: 1) to provide students, teachers and
the general public with information about the rapidly growing fields
of molecular modeling and related areas; 2) to provide K-12 students
with basic concepts in mathematics and their connection to molecular
modeling...''
The Principles of Protein Structure
An internet course with many links in the course material
A Guide to Structure Prediction (by Robert B. Russell)
A guiding thread that leads the practitioner through the available
techniques
Sisyphus and protein structure prediction (by Burkhard Rost)
Molecular Surfaces: A Review (Network Science, April 1996)
Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (A report by the National Research Council USA)
Directory containing all PDB Entries (directory is huge!!)
PDB Structures: Summary Information (from University College London)
STRIDE, secondary structure assignment from atomic coordinates, based on H-bond patterns and mainchain dihedral angles (Frishman and Argos)
DEFINE_S secondary and first level supersecondary structure from C_alpha trace (Richards and Kundrot)
Amino Acids from Wikipedia
ProStar Decoy Library (local copy; the original site no longer exists)
DSSP database
with C_alpha coordinates, secondary structure assignment and surface
accessibility for all protein entries in the Protein Data Bank (PDB).
SWISS-PROT + TrEMBL non redundant database
WWW services for sequence analysis (Schneider and Rost)
Rotamer Libraries (Lovell et al.)
Dirichlet Mixtures and other Regularizers
CAFASP3, Critical Assessment of Fully Automated Prediction
EVA: EValuation of Automatic protein structure prediction
EVA measures for secondary structure prediction accuracy
EMBL MaxSprout server
``MaxSprout is a fast database algorithm for generating protein backbone
and side chain co-ordinates from a C(alpha) trace. The backbone is
assembled from fragments taken from known structures. Side chain
conformations are optimized in rotamer space using a rough potential
energy function to avoid clashes.''
LiveBench benchmarking program for protein prediction
ProStar, a pool of potentials, test decoys, and potential evaluation reviews
Decoys 'R' Us, a database of computer-generated conformations of protein sequences that possess some characteristics of native proteins but are not biologically real.
Protein Structure Prediction Links
Protein Secondary Structure Prediction Servers and FTP Sites
Annotated Web Resources for Protein Scientists (by the Protein Society)
Pedro's BioMolecular Research Tools (frozen in 1995; many links no longer work)
The Amber Molecular Dynamics Package
Structural Classification of Proteins (SCOP),
European mirror site
``The scop database ... aims to provide a detailed and comprehensive
description of the structural and evolutionary relationships between
all proteins whose structure is known. As such, it provides a broad
survey of all known protein folds, detailed information about the
close relatives of any particular protein, and a framework for future
research and classification.''
RasMol
Molecular Visualization for X11 and Windows
WHAT IF molecular modelling package
``WHAT IF allows the molecular engineer to sit in front of a computer
terminal or better, a graphics workstation, and ask
questions that start with "What if ...."''
PyMOL molecular visualization system (open source)
Martin Karplus and CHARMM
Tamar Schlick (New York University)
Research groups interested in protein folding
WWW Chemistry Sites (67K, a list from UCLA)
NIH Computational Structural Biology (links to home pages of NIH researchers)
Pittsburgh Supercomputing Center Projects in Scientific Computing
(with several protein projects)
ExPASy Molecular Biology Server (analysis of protein and nucleic acid sequences)
France - Institute of Biology and Chemistry of Proteins
Fred Cohen Laboratory - University of California, San Francisco
(Langevin dynamics, secondary and tertiary structure prediction)
Birkbeck College, Department of Crystallography
University College London, Structure and Modelling Group
NMRWeb: Links to NMR information
Vienna RNA Secondary Structure Prediction and Comparison
Computational and Molecular Biology Initiative at Caltech
Genetic Algorithms and Protein Folding (a paper by S. Schulze-Kremer)
Arnold Neumaier (Arnold.Neumaier@univie.ac.at)