Structure Calculation Using CS-DP ROSETTA
The CS-DP-Rosetta approach combines model generation by CS-Rosetta with model filtering by agreement with NOESY data, measured by the DP-score from the RPF program, to generate high-accuracy protein structures. This hybrid approach uses both local backbone chemical shift data (CS-Rosetta) and unassigned NOESY data (DP-filtering) to direct Rosetta trajectories toward the native structure, producing more accurate models than CS-Rosetta alone. Given a raw (or refined) NOESY peak list and chemical shift information (backbone and extensive sidechain), the DP-score is used as a filter that guides CS-Rosetta decoy generation and significantly reduces the search space. Because the NOESY peak list data are not included directly in the structure calculation, CS-DP-Rosetta is much more robust to the quality of these peak lists than methods that attempt to assign each NOESY peak to one or more specific interproton interactions.
- Complete 1H, 13C, and 15N resonance assignments using either conventional triple resonance or GFT approaches.
- 3D 13C- and 15N-edited NOESY spectra
Peak picking procedures
Generating raw NOESY peak lists by automatic peak picking:
- The 13C- and 15N-edited 3D NOESY raw peak lists are prepared systematically with the program SPARKY by automatically peak picking the 3D spectrum, using the 2D 1H,13C- and 1H,15N-HSQC spectra as root spectra and 0.02 and 0.2 ppm as the peak-picking tolerances in the indirect 1H and heavy atom dimensions, respectively.
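The idea of restricting peak picking to root HSQC positions can be illustrated with a minimal Python sketch. This is not SPARKY's own implementation: the function name `restrict_to_roots`, the tuple layout of the peaks, and the choice of which 1H coordinate is compared against the root are assumptions made for illustration. Only the two tolerance values come from the text above.

```python
from typing import List, Tuple

# Tolerances quoted in the text (ppm). Everything else here is hypothetical.
TOL_H = 0.02  # 1H dimension
TOL_X = 0.2   # heavy-atom (13C or 15N) dimension


def restrict_to_roots(
    noesy_peaks: List[Tuple[float, float, float]],
    hsqc_roots: List[Tuple[float, float]],
    tol_h: float = TOL_H,
    tol_x: float = TOL_X,
) -> List[Tuple[float, float, float]]:
    """Keep 3D peaks (h_root, x_root, h_noe) whose (h_root, x_root) pair
    lies within tolerance of at least one 2D HSQC root peak (h, x)."""
    kept = []
    for h_root, x_root, h_noe in noesy_peaks:
        matches_root = any(
            abs(h_root - h) <= tol_h and abs(x_root - x) <= tol_x
            for h, x in hsqc_roots
        )
        if matches_root:
            kept.append((h_root, x_root, h_noe))
    return kept


if __name__ == "__main__":
    roots = [(8.20, 120.0)]
    peaks = [
        (8.21, 120.1, 4.5),  # within both tolerances
        (8.30, 120.0, 1.2),  # 1H off by 0.10 ppm
        (8.20, 121.0, 3.3),  # 15N off by 1.0 ppm
    ]
    print(restrict_to_roots(peaks, roots))
```

Only the first example peak survives, because the other two fall outside one of the two tolerances.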
Generating refined NOESY peak lists:
- Refined NOESY peak lists are obtained by expert manual editing, which involves artifact removal, picking of overlapped peaks, and/or picking starting from a script-based 3D-HcCH intra-/sequential-residue NOE list that is manually extended to the unpicked long-range resonances.
Input for CS-DP-Rosetta:
- Protein sequence file (fasta format)
- Chemical shift file (BMRB 2.1 format)
- Unassigned raw or refined NOESY peak list files (Xeasy or Sparky format)
Step 1: Generating ~50,000 CS-Rosetta decoys
I. Generate 3-mer and 9-mer fragment libraries from the sequence and chemical shift information, using the Rosetta Fragments server or the Rosetta software suite
II. Use the CS-Rosetta protocol to generate ~50,000 decoys and keep the 1,000 decoys with the lowest Rosetta energy
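The selection at the end of Step 1 amounts to keeping the n lowest-energy entries of a large list. A minimal Python sketch follows, assuming the decoy energies have already been parsed into (name, energy) pairs. The function name `keep_lowest_energy` and the data layout are hypothetical, and how the energies are read from Rosetta output is not shown.

```python
import heapq
from typing import Iterable, List, Tuple


def keep_lowest_energy(
    decoys: Iterable[Tuple[str, float]], n: int = 1000
) -> List[Tuple[str, float]]:
    """Return the n decoys with the lowest energy, sorted from lowest up.

    `decoys` is an iterable of (decoy_name, rosetta_energy) pairs.
    heapq.nsmallest avoids sorting all ~50,000 entries when n is small.
    """
    return heapq.nsmallest(n, decoys, key=lambda decoy: decoy[1])


if __name__ == "__main__":
    example = [("a", -10.0), ("b", -30.0), ("c", -20.0)]
    print(keep_lowest_energy(example, n=2))
```

With the three example decoys and n=2, decoys "b" and "c" are kept, in that order.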
Step 2: Filter decoys by a linear combination of Rosetta energy and DP-Score
I. Prepare the sequence file and chemical shift file in BMRB format; peak list files can be in Sparky, Xeasy, or any other table format. Use the AutoStructure 2.2.1 GUI to make the control file for the RPF calculation. By default, the tolerance is set to 0.5 ppm for the 13C and 15N dimensions and to 0.05 ppm for the 1H dimensions.
(1) Make sure the sequence file, the chemical shift file, and all the peak list files are in the project directory.
(2) Select File->New Control File; a “Control File Display” window appears.
(3) In the “General Section” tab, enter the “Protein Name”, select the “Sequence File” and “Chemical Shift File”, and set “Iterative Analysis Cycles” to 1. Then select the “PeakList Section” tab and add the peak list files one by one. Save the control file after all the peak lists have been added.
II. Calculate the DP-score for each of the 1,000 decoys with the lowest Rosetta energy using the RPF module of AutoStructure 2.2.1:
$Autostructure_install_path/bin/autostructure -c control_file -o rpf_output_path -q path_of_query_structure -s
Extract the DP-score from the rpf_output_path/*.ovw file.
III. Calculate the target function for each decoy:
ti = (CS-Rosetta all-atom energy)i + 1000 × (1 − DP-score)i, where the subscript i denotes the i-th decoy
IV. Rank the 1,000 decoys by ti and keep the 20 decoys with the lowest ti for further model rebuilding and refinement
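The target function and ranking can be sketched in a few lines of Python. The weight of 1000 and the cutoff of 20 decoys come from the text. The function names, the dictionary layout, and the example energies and DP-scores are made up for illustration.

```python
from typing import Dict, List, Tuple

DP_WEIGHT = 1000.0  # weight on (1 - DP-score), as given in the text


def target_function(energy: float, dp_score: float,
                    weight: float = DP_WEIGHT) -> float:
    """t = (CS-Rosetta all-atom energy) + weight * (1 - DP-score)."""
    return energy + weight * (1.0 - dp_score)


def rank_decoys(
    decoys: Dict[str, Tuple[float, float]], keep: int = 20
) -> List[Tuple[str, float]]:
    """Rank decoys by target function (lowest first) and keep the top `keep`.

    `decoys` maps a decoy name to (all_atom_energy, dp_score).
    """
    scored = [
        (name, target_function(energy, dp))
        for name, (energy, dp) in decoys.items()
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:keep]


if __name__ == "__main__":
    # Hypothetical values: d3 has the best energy but the worst DP-score.
    example = {
        "d1": (-250.0, 0.75),
        "d2": (-240.0, 0.875),
        "d3": (-260.0, 0.5),
    }
    for name, t in rank_decoys(example):
        print(name, t)
```

In this made-up example the decoy with the lowest Rosetta energy (d3) ranks last once the DP-score term is added, which is the effect the filter is meant to have on decoys that agree poorly with the NOESY data.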
Step 3: Model rebuilding and refinement for 20 lowest ti decoys
I. Identify the flexible regions, i.e. those with the largest C-alpha deviations among the 20 lowest ti decoys
II. Stochastically rebuild the flexible regions by fragment insertion and CCD loop closure
III. Perform all-atom refinement of the whole structure using the physically realistic Rosetta force field
IV. The best 10 models with the lowest ti are saved as the final CS-DP-Rosetta models
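The identification of flexible regions in item I of Step 3 can be illustrated with a minimal Python sketch. It assumes the decoys have already been superimposed and that their C-alpha coordinates are available as plain lists. The per-residue deviation is taken here as the RMS distance from the mean position, and the cutoff is a hypothetical parameter, since the text does not state how "largest" is defined. This is not the procedure implemented in Rosetta.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_deviation(models: Sequence[Sequence[Coord]]) -> List[float]:
    """RMS deviation of each residue's C-alpha from its mean position.

    `models` holds one list of C-alpha coordinates per (pre-superimposed)
    decoy; all decoys must have the same number of residues.
    """
    n_models = len(models)
    n_res = len(models[0])
    deviations = []
    for r in range(n_res):
        mean = [sum(m[r][k] for m in models) / n_models for k in range(3)]
        msd = sum(
            sum((m[r][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        deviations.append(math.sqrt(msd))
    return deviations


def flexible_segments(deviations: Sequence[float],
                      cutoff: float) -> List[Tuple[int, int]]:
    """Group residues with deviation > cutoff into contiguous
    (start, end) segments, using 0-based inclusive indices."""
    segments = []
    start = None
    for i, dev in enumerate(deviations):
        if dev > cutoff:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(deviations) - 1))
    return segments


if __name__ == "__main__":
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    devs = per_residue_deviation([model_a, model_b])
    print(flexible_segments(devs, cutoff=1.0))
```

In the two-model example only the third residue differs between the models, so it is the only one reported as flexible.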
Rosetta command line arguments
I. First stage CS-Rosetta:
-in::file::frag3 aat000_03_05.200_v1_3.gz -in::file::frag9 aat000_09_05.200_v1_3.gz -abinitio::rg_reweight 0.5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -abinitio::increase_cycles 10 -in::file::fasta t000_.fasta.gz -in::file::psipred_ss2 t000_.psipred_ss2.gz -abinitio::fastrelax -score::weights score13_env_hb -silent_gz
II. Second stage rebuild-and-refine:
aa t000 _ -relax -looprlx -nstruct 10 -fa_input -use_sspair -farlx -ex1 -ex2 -random_loop -termini -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -farlx_cycle_ratio 0.4 -idl_no_chain_break -loop_skip_rate 0.0 -loop_file t000.loop_file.gz -vary_omega -output_silent_gz -output_chi_silent -l s.list
1. Raman, S., Huang, Y. J., Mao, B., Rossi, P., Aramini, J. M., Liu, G., Montelione, G. T., and Baker, D. (2010) Accurate automated protein NMR structure determination using unassigned NOESY data. J. Am. Chem. Soc. 132, 202-207.