Structure Calculation and Validation
Protein structure determination by nuclear magnetic resonance (NMR) spectroscopy is a burgeoning field of study that encompasses a wide variety of techniques and methodologies. Validating structures both during and at the end of the structure determination process is critical to the accuracy of the final structures. Under Structure Calculation and Validation we describe the standard protocols for protein structure determination adopted by the NMR laboratories in the NESG. The section is broadly divided into four categories:
- Structure Calculation
- Structure Refinement
- Special Topics
- Structure Validation and Deposition
The Structure Calculation chapter features several sub-categories assigned on the basis of the program or approach used for protein structure calculation. Here is a brief description of each sub-category:
CYANA and AutoStructure
CYANA and AutoStructure are the two primary programs used in the NESG for initial protein structure calculation. CYANA is a torsion angle dynamics-based approach and can be run with manually assigned NOEs and distance constraints or in fully automated NOESY assignment mode. Structures are computed in several cycles, and those with the lowest target function are retained at the conclusion of each cycle. AutoStructure uses a bottom-up approach and an internal automated NOESYASSIGN module for iterative automated NOESY assignment. In each cycle of AutoStructure, distance and torsion angle constraints are fed into either CYANA or XPLOR for structure calculation. Again, structures with the lowest target function or energy are collected for the subsequent cycle of calculations.
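The per-cycle selection step described above can be illustrated with a minimal Python sketch. It is not code from CYANA or AutoStructure; the `Conformer` container, the `keep_best` function, and the example scores are hypothetical and only show the idea of ranking calculated structures by target function and keeping the best few.

```python
from __future__ import annotations

from typing import NamedTuple


class Conformer(NamedTuple):
    """One calculated structure and its score (hypothetical container)."""
    name: str
    target_function: float  # lower values indicate better agreement


def keep_best(conformers: list[Conformer], n_keep: int) -> list[Conformer]:
    """Return the n_keep conformers with the lowest target function."""
    if n_keep < 0:
        raise ValueError("n_keep must be non-negative")
    # sorted() is stable and ascending, so the best conformers come first.
    return sorted(conformers, key=lambda c: c.target_function)[:n_keep]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    cycle = [
        Conformer("model_1", 3.2),
        Conformer("model_2", 0.8),
        Conformer("model_3", 1.5),
    ]
    for conformer in keep_best(cycle, 2):
        print(conformer.name, conformer.target_function)
```

In practice the total number of structures calculated and the number kept per cycle are user-defined options of the programs themselves, as listed below.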
For automated structure calculations, required input for each program includes:
- protein sequence
- chemical shift assignment list
- NOESY peak lists
- torsional angle constraints (e.g., from TALOS)
- user-defined options, including the total number of structures calculated and the number of "best" structures kept at the end of each cycle
Optional input for the programs includes:
- manual distance constraints
- hydrogen bond constraints
- AutoStructure can also interpret J-coupling and slow N-H exchange data in its initial secondary structure and fold analysis
- The newest version of CYANA (3.0) can use orientational constraints (e.g., residual dipolar couplings); CYANA is also preferable for dimer structure calculations
In general, both programs depend on complete or near-complete assignment of the resonances in the protein of interest, as well as careful analysis and peak picking of NOESY (2D, 3D, 4D) spectra.
The lab at UB has also explored a so-called consensus approach, where NOESY-based distance constraints from CYANA and AutoStructure are combined in a consensus fashion, and further refined in an iterative manner.
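As a rough illustration of what combining constraints "in a consensus fashion" could mean, the Python sketch below keeps only the atom pairs constrained in both sets and takes the looser upper bound for each. This is one possible reading, not the UB protocol: the `pair` and `consensus` helpers, the atom labels, and the choice of the looser bound are all assumptions made for the example.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: an unordered atom pair mapped to an
# upper distance bound in angstroms.
ConstraintSet = Dict[FrozenSet[str], float]


def pair(atom_a: str, atom_b: str) -> FrozenSet[str]:
    """Build an order-independent key for two atom labels."""
    return frozenset((atom_a, atom_b))


def consensus(set_a: ConstraintSet, set_b: ConstraintSet) -> ConstraintSet:
    """Keep pairs found in both sets, using the looser upper bound.

    Assumption: taking the larger bound is the conservative choice;
    the actual consensus rule is not specified in the text.
    """
    shared = set_a.keys() & set_b.keys()
    return {p: max(set_a[p], set_b[p]) for p in shared}


if __name__ == "__main__":
    # Made-up constraints for illustration only.
    from_cyana = {pair("12.HA", "45.HN"): 3.5, pair("3.HB2", "7.HN"): 5.0}
    from_autostructure = {pair("45.HN", "12.HA"): 4.0, pair("20.HA", "21.HN"): 2.8}
    for atoms, upper in consensus(from_cyana, from_autostructure).items():
        print(sorted(atoms), upper)
```

The resulting set would then be the starting point for the iterative refinement mentioned above.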
Computational Methods: Rosetta
In collaboration with the Bax and Baker laboratories, one of the fruitful areas of development in the NESG in recent years has been the use of Rosetta for structure refinement and structure calculation. Central to the philosophy of this area of development is the use of minimal, rapidly obtained experimental data (e.g., backbone chemical shift assignments) for accurate structure determination. The Rosetta-based techniques fall into the following sub-categories:
CS-Rosetta: chemical shift-Rosetta. In this approach the user supplies backbone chemical shifts and the program calculates a user-defined number of "decoys", which are then classified on the basis of their agreement with the experimental data. The approach is useful for small proteins of up to approximately 100-120 residues.
CS-DP-Rosetta: This is an extension of CS-Rosetta in which decoys are further filtered against the raw NOESY data. Here more complete resonance assignments are preferred, but the approach has been demonstrated to generate accurate structures in cases where CS-Rosetta alone fails.
CS-RDC-Rosetta: The next incremental development is the direct use of residual dipolar couplings as well as backbone (e.g., HN-HN) NOE-based distance constraints to guide the CS-Rosetta calculations. This approach should be useful for proteins of up to 200 residues. Again, only knowledge of the backbone resonances is required, meaning that accurate structures can in principle be obtained in minimal time, circumventing the full protein resonance assignment process.
After initial structure determination with CYANA or AutoStructure, it is often desirable to refine the structures further in order to optimize the structure quality factors of the final structures. Three main refinement methods are used in the NESG:
- CNS refinement using explicit water. This is easily accomplished using a special one-line script. Required input is the final coordinates and constraints from the initial structure calculations.
- Xplor-NIH refinement. This consists of a short molecular dynamics run and energy minimization.
- Rosetta refinement. This is an unrestrained refinement using the Rosetta force field.
Special topics in structure refinement include protocols for handling the following situations:
- protein:small ligand complexes
- proteins with metal ions
- residual dipolar couplings and the programs associated with RDC validation and structure refinement (REDCAT and REDCRAFT)
- paramagnetic constraints
Structure Validation and Quality Assessment
There are several methods available for assessing the quality of protein structures determined in the NESG.
- PDBStat: this is a useful program for overlaying and classifying PDB ensembles and preparing them for analysis and deposition.
- PSVS: the Protein Structure Validation Suite is a web server for analysis and validation of protein structures. Input required includes the coordinate file, constraints, and files for RPF analysis (optional) and chemical shift analysis (optional). PSVS delivers numerous structure quality metrics, including scores for Verify3D, ProsaII, Procheck and MolProbity, a Ramachandran analysis and a comprehensive constraint violations analysis. Generally, Z-scores for Procheck and MolProbity of -2 and above are desirable for a high quality protein NMR structure.
- RPF analysis: provides a measure of the agreement between a structure or ensemble of structures and the NOESY peak lists. DP scores of > 0.7 are acceptable for high quality protein structures.
- MolProbity server: Developed by the Richardson laboratory, this server offers in-depth analysis of Ramachandran and rotamer outliers as well as severe atomic clashes. A new feature of the server is visualization of RDCs.
- Deposition: Descriptions of ADIT-NMR and HarvestDB for PDB and BMRB deposition are also provided in this section. In addition, SPINS is used at Rutgers for archiving of NMR data for structure determination projects.
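The rules of thumb quoted in this section (Procheck and MolProbity Z-scores of -2 and above, DP scores above 0.7) can be turned into a simple screening check. The Python sketch below is illustrative only: `check_quality` is a hypothetical helper, not part of PSVS or the RPF software, and it assumes the scores have already been read from the validation reports by hand.

```python
from typing import List


def check_quality(
    procheck_z: float,
    molprobity_z: float,
    dp_score: float,
    z_min: float = -2.0,   # threshold quoted in the text for Z-scores
    dp_min: float = 0.7,   # threshold quoted in the text for DP scores
) -> List[str]:
    """Return a list of warnings; an empty list means all checks passed."""
    warnings = []
    if procheck_z < z_min:
        warnings.append(f"Procheck Z-score {procheck_z} is below {z_min}")
    if molprobity_z < z_min:
        warnings.append(f"MolProbity Z-score {molprobity_z} is below {z_min}")
    # The text asks for DP > 0.7, so a score exactly at the threshold fails.
    if dp_score <= dp_min:
        warnings.append(f"DP score {dp_score} is not above {dp_min}")
    return warnings


if __name__ == "__main__":
    # Made-up scores for illustration only.
    for message in check_quality(procheck_z=-3.1, molprobity_z=-0.5, dp_score=0.65):
        print(message)
```

A check like this only flags structures for a closer look; the thresholds are general guidelines rather than strict acceptance criteria.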