Structure Calculation and Validation
Introduction
Protein structure determination by nuclear magnetic resonance (NMR) spectroscopy is a burgeoning field of study that encompasses a wide variety of techniques and methodologies. Validation of structures, both during and at the end of the structure determination process, is critical to the accuracy of the final structures. Under Structure Calculation and Validation we describe the standard protocols for protein structure determination adopted by the NMR laboratories in the NESG. The section is broadly divided into four categories:
- Structure Calculation
- Structure Refinement
- Special Topics
- Structure Validation and Deposition
Structure Calculation
The Structure Calculation chapter features several sub-categories assigned on the basis of the program or approach used for protein structure calculation. Here is a brief description of each sub-category:
CYANA and AutoStructure
CYANA and AutoStructure are the two primary programs used in the NESG for initial protein structure calculation. CYANA uses a torsion angle dynamics approach and can be run with manually assigned NOEs and distance constraints or in fully automated NOESY assignment mode. Structures are computed in several cycles, and the structures with the lowest target function are retained at the conclusion of each cycle. AutoStructure uses a bottom-up approach and an internal automated NOESYASSIGN module for iterative automated NOESY assignment. In each cycle of AutoStructure, distance and torsion angle constraints are fed into either CYANA or XPLOR for structure calculation. Again, the structures with the lowest target function or energy are collected for the subsequent cycle of calculations.
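The per-cycle selection step described above can be sketched in a few lines of Python. This is only an illustration of the idea of keeping the lowest-scoring structures; the `Conformer` class, the `keep_best` function, and the example values are hypothetical and are not part of CYANA or AutoStructure.

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    """One calculated structure and its score (hypothetical container)."""
    name: str
    target_function: float  # lower values indicate better agreement


def keep_best(conformers, n_keep):
    """Return the n_keep conformers with the lowest target function."""
    if n_keep < 0:
        raise ValueError("n_keep must be non-negative")
    ranked = sorted(conformers, key=lambda c: c.target_function)
    return ranked[:n_keep]


# Made-up scores for four structures from a single cycle.
cycle_output = [
    Conformer("model_1", 3.2),
    Conformer("model_2", 0.8),
    Conformer("model_3", 1.5),
    Conformer("model_4", 7.9),
]

best = keep_best(cycle_output, n_keep=2)
print([c.name for c in best])  # prints ['model_2', 'model_3']
```

In a real run both programs perform this ranking internally; the sketch only makes explicit what "number of best structures kept" means.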
For automated structure calculations, required input for each program includes:
- protein sequence
- chemical shift assignment list
- NOESY peak lists
- torsion angle constraints (e.g., from TALOS)
- user-defined options, including the total number of structures calculated and the number of "best" structures kept at the end of each cycle.
Optional input for the programs includes:
- manual distance constraints
- hydrogen bond constraints
- AutoStructure can also interpret J-coupling and slow N-H exchange data in its initial secondary structure and fold analysis
- The newest version of CYANA (3.0) can use orientational constraints (i.e., residual dipolar couplings); CYANA is also preferable for dimer structure calculations
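Before starting an automated run it can help to check that all required inputs listed above are in place. The following is a minimal sketch of such a check; the field names, the `missing_required` helper, and the file names are assumptions made for illustration and do not correspond to the input format of either program.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CalculationInput:
    """Hypothetical container for the inputs listed above."""
    # Required inputs
    sequence_file: Optional[str] = None
    chemical_shift_file: Optional[str] = None
    noesy_peak_lists: List[str] = field(default_factory=list)
    torsion_constraint_file: Optional[str] = None
    n_structures_calculated: Optional[int] = None
    n_structures_kept: Optional[int] = None
    # Optional inputs
    manual_distance_constraints: Optional[str] = None
    hydrogen_bond_constraints: Optional[str] = None


REQUIRED_FIELDS = (
    "sequence_file",
    "chemical_shift_file",
    "noesy_peak_lists",
    "torsion_constraint_file",
    "n_structures_calculated",
    "n_structures_kept",
)


def missing_required(inp):
    """Return the names of required fields that are empty or unset."""
    return [name for name in REQUIRED_FIELDS if not getattr(inp, name)]


# Hypothetical file names; the torsion angle constraints are left out on purpose.
inp = CalculationInput(
    sequence_file="protein.seq",
    chemical_shift_file="shifts.prot",
    noesy_peak_lists=["n15noesy.peaks"],
    n_structures_calculated=100,
    n_structures_kept=20,
)
print(missing_required(inp))  # prints ['torsion_constraint_file']
```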
In general, these programs depend on complete or near-complete resonance assignment of the protein of interest, as well as careful analysis and peak picking of the NOESY (2D, 3D, 4D) spectra.
The lab at UB has also explored a so-called consensus approach, where NOESY-based distance constraints from CYANA and AutoStructure are combined in a consensus fashion, and further refined in an iterative manner.
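The text does not specify how the consensus is formed. As one possible interpretation only, the sketch below keeps the atom pairs that are restrained in both constraint sets and assigns each the looser (larger) upper distance bound. The function name, the atom labels, and the distances are hypothetical, and the actual protocol may differ.

```python
def consensus_constraints(set_a, set_b):
    """Keep atom pairs present in both sets, using the looser upper bound.

    Each input maps an (atom_i, atom_j) pair to an upper distance bound in
    angstroms. The order of the atoms within a pair is ignored.
    """
    def normalise(constraints):
        return {tuple(sorted(pair)): upper for pair, upper in constraints.items()}

    a, b = normalise(set_a), normalise(set_b)
    return {pair: max(a[pair], b[pair]) for pair in a.keys() & b.keys()}


# Made-up constraints in a hypothetical "residue.atom" notation.
cyana = {
    ("12.HN", "45.HA"): 4.5,
    ("12.HN", "13.HN"): 3.5,
    ("20.HB2", "60.HG"): 5.0,
}
autostructure = {
    ("45.HA", "12.HN"): 5.0,  # same pair as above, atoms in reverse order
    ("12.HN", "13.HN"): 3.0,
}

consensus = consensus_constraints(cyana, autostructure)
for pair, upper in sorted(consensus.items()):
    print(pair, upper)
```

Choosing the larger bound is a conservative design choice: a constraint supported by both programs is kept, but never made tighter than either program proposed.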
Computational Methods: Rosetta
In collaboration with the Bax and Baker laboratories, one of the fruitful areas of development in the NESG in recent years has been the use of Rosetta for structure refinement and structure calculation. Central to the philosophy of this area of development is the use of minimal, rapidly obtained experimental data (e.g., backbone chemical shift assignments) for accurate structure determination. The Rosetta-based techniques fall into the following sub-categories:
CS-Rosetta: chemical shift-Rosetta. In this approach the user supplies backbone chemical shifts and the program calculates a user-defined number of "decoys", which are then classified on the basis of their agreement with the experimental data. The approach is useful for small proteins of up to approximately 100-120 residues.
CS-DP-Rosetta: This is an extension of CS-Rosetta in which decoys are further filtered against the raw NOESY data. Here more complete resonance assignments are preferred, but the approach has been demonstrated to generate accurate structures in cases where CS-Rosetta alone fails.
CS-RDC-Rosetta: The next incremental development is the direct use of residual dipolar couplings as well as backbone (e.g., HN-HN) NOE-based distance constraints to guide the CS-Rosetta calculations. This approach should be useful for proteins of up to 200 residues. Again, only knowledge of the backbone resonances is required, meaning that accurate structures can in principle be obtained in minimal time, without completing the full resonance assignment process.
Structure Refinement
After initial structure determination with CYANA or AutoStructure, it is often desirable to refine the structures further in order to optimize the quality factors of the final structures. There are three main refinement approaches in the NESG:
- CNS refinement using explicit water. This is easily accomplished using a special one-line script. Required input is the final coordinates and constraints from the initial structure calculations.
- Xplor-NIH refinement. This consists of a short molecular dynamics run and energy minimization.
- Rosetta refinement. This is an unrestrained refinement using the Rosetta force field.
Special Topics
Structure Validation and Quality Assessment
THIS PAGE IS UNDER CONSTRUCTION