Structure Calculation Using CS-Rosetta

From NESG Wiki
-- AlexEletski - 17 Apr 2008
 
-- Updated by JimAramini - Nov 2009

Latest revision as of 12:43, 14 June 2012

Introduction

CS-ROSETTA is a framework for structure calculation of biological macromolecules on the basis of conformational information from NMR, built on top of the biomolecular modeling package ROSETTA. The name CS-ROSETTA for this branch of ROSETTA stems from its origin in combining backbone chemical shift (CS) data with ROSETTA structure prediction protocols, which allowed accurate predictions of 3D protein structures up to 15 kDa in size[1]. The software package was later extended to include additional NMR conformational parameters, such as residual dipolar couplings (RDC)[2], NOE distance restraints[3], and pseudocontact shifts (PCS)[4]. The original CS-ROSETTA protocol combined ROSETTA's de novo structure prediction protocol, featuring Monte Carlo assembly of molecular fragments, with a subsequent full-atom relax. Later, an iterative protocol for Resolution Adapted Structural Recombination (RASREC) was developed[5] and shown to significantly extend the convergence of CS-ROSETTA towards larger structures. A combination of RASREC and sparse NOE distance restraints from ILV-labelled proteins was shown to allow accurate 3D structure determination for proteins up to 40 kDa[6].

The software is freely available for academic use and can be licensed for commercial use (installation guide). A software manual and tutorials are provided on the supporting website www.csrosetta.org.

The ROSETTA software is written in C++. CS-ROSETTA is distributed together with a toolbox written in Python that facilitates preparation of input files, setup of large-scale calculations, and post-processing of simulation output. CS-ROSETTA calculations require substantial computational effort and are usually carried out with 200 to 2000 parallel processes on Linux clusters, using the Message Passing Interface (MPI) for communication.

CS-ROSETTA Servers

  • http://condor.bmrb.wisc.edu/bbee/rosetta/ (server at BMRB)

CS-ROSETTA Protocol at UB

NOTE: the CS-ROSETTA protocol described here is outdated. With creation of the CS-Rosetta toolbox the process has been greatly simplified. Follow instructions for installation and usage on www.csrosetta.org

Random Coil Index Prediction

Perform flexible region prediction on the RCI Web Page.

RCI takes a BMRB file in the old format, as produced by CYANA 1.0 (the new BMRB format has an extra "chain" column). Unlike in an AutoStructure input file, the sequence field should be left in place.

Use an init.cya file: 

	name:=XXXX             # Replace XXXX with NESG ID
	cyanalib                # Read the standard library
	pseudo=2              # Allows HB, HD, etc. pseudoatom names, use with CARA
	read seq $name        # Initialize

If you are using a proton list from CARA, first convert it to "dyana" format with CYANA 2.1:

	read prot XXXX.prot
	pseudo=0
	translate dyana
	write prot XXXX_dyana

Use CYANA 1.0.5 to prepare a BMRB file:

	read prot XXXX_dyana.prot
	bmrblist XXXX.bmrb

Make sure you change the _Chem_shift_ambiguity_type tag to _Chem_shift_ambiguity_code; RCI will report an error if you don't do it.
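
The tag rename can also be scripted. The following is a minimal Python sketch, not part of the original protocol: the function names are hypothetical, and it assumes the old tag appears verbatim in the file written by CYANA.

```python
from pathlib import Path

# Tag names as given in the text above.
OLD_TAG = "_Chem_shift_ambiguity_type"
NEW_TAG = "_Chem_shift_ambiguity_code"


def fix_ambiguity_tag(text: str) -> str:
    """Return the BMRB file text with the ambiguity tag renamed for RCI."""
    return text.replace(OLD_TAG, NEW_TAG)


def fix_bmrb_file(path: str) -> int:
    """Rewrite the file in place and return how many tags were renamed."""
    bmrb = Path(path)
    original = bmrb.read_text()
    count = original.count(OLD_TAG)
    if count:
        # Only touch the file when there is something to change.
        bmrb.write_text(fix_ambiguity_tag(original))
    return count
```

For example, `fix_bmrb_file("XXXX.bmrb")` would be called once on the file written by the bmrblist command, before uploading it to RCI.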

Flexible N- and C-terminal tails should be removed for CS-ROSETTA calculation to reduce CPU time. Flexible loop regions will later be excluded from calculation of all-atom energy.

Generating MFR fragments on U2 cluster at SUNY Buffalo

Copy the runCSRjob.com file into the working directory and change the number of fragments to be generated.

Type qsub runCSRjob.pbs to submit the job to the queue. This calculation takes ~2 hours for 1000 fragments for a small protein, so it should not be run on the master node.

Running CS-Rosetta on U2 cluster at SUNY Buffalo

Go to the rosetta subdirectory. Figure out how many parallel Rosetta jobs you will need to run. Things to consider are:

  • The total number of fragments to be calculated
  • The maximum wall-time for a single job is 72 h
  • It takes ~10 min to calculate a single structure of a small protein on a single CPU


Type ./runRosetta.csh N, where N is the number of parallel Rosetta jobs.
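
The considerations above amount to simple arithmetic. The sketch below is only an illustration in Python: the function name is hypothetical, and the defaults are the figures quoted above (72 h wall time, ~10 min per structure for a small protein), which will differ for larger proteins or other clusters.

```python
import math


def parallel_jobs_needed(n_structures: int,
                         minutes_per_structure: float = 10.0,
                         wall_time_hours: float = 72.0) -> int:
    """Minimum number of parallel jobs so every job fits in the wall time."""
    if n_structures <= 0:
        raise ValueError("n_structures must be positive")
    # Whole structures that one job can finish before it is killed.
    per_job = math.floor(wall_time_hours * 60 / minutes_per_structure)
    if per_job < 1:
        raise ValueError("a single structure does not fit in the wall time")
    return math.ceil(n_structures / per_job)


if __name__ == "__main__":
    # 72 h * 60 min / 10 min = 432 structures per job, so 1000 need 3 jobs.
    print(parallel_jobs_needed(1000))
```

This gives a lower bound on N; in practice more jobs may be started so that the run finishes well before the wall-time limit.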


CS-ROSETTA Protocol at CABM

It is assumed that cs-rosetta2.3.0, rosetta2.3.0, and NMRPipe-2008 are already installed and running on your cluster (see the Bax laboratory web site, http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html, for instructions). In addition, the following activation commands may need to be issued in your local shell:

  • For c-shell (csh, tcsh):

	source /farm/software/NMRPipe-2008/com/nmrInit.linux9.com
	source /farm/software/cs-rosetta2.3.0/com/csrosettaInit.com

  • For bash shell (sh, bash):

	source /farm/software/NMRPipe-2008/com/nmrInit.linux9.sh
	source /farm/software/cs-rosetta2.3.0/com/csrosettaInit.sh

Protocol for running CS-ROSETTA at CABM

NOTE: the CS-ROSETTA protocol described here is outdated. With creation of the CS-Rosetta toolbox the process has been greatly simplified. Follow instructions for installation and usage on www.csrosetta.org

Start from a chemical shift file in BMRB 2.1 format, including the complete header. An example BMRB file in the correct format (Input.bmrb) is listed under Files for Download; the scripts are rather unforgiving of format inconsistencies.

CS-ROSETTA uses the TALOS format for chemical shifts, so the BMRB file must first be converted. Run the following commands in order (the programs are provided by NMRPipe-2008 and CS-ROSETTA and are accessible once the initialization described in the preceding section has been done):

bmrb2talos.com BMRB_cs_file > prot_CS

runCSRjob.com prot_CS

This step is time-consuming (between 1 and 4 hours on master2). It produces a directory called 'rosetta' containing:

■ aat000_03_05.200_v1_3
■ aat000_09_05.200_v1_3
■ paths.txt
■ runRosetta.com
■ t000_.fasta
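
Before moving on, it can be useful to confirm that the fragment-generation step wrote all of these files. The following Python sketch is an assumption-laden convenience, not part of the CS-ROSETTA distribution: the function name is hypothetical, and the file names are simply those listed above (they may differ for other versions or target names).

```python
from pathlib import Path

# File names exactly as listed above for the 'rosetta' directory.
EXPECTED_FILES = (
    "aat000_03_05.200_v1_3",
    "aat000_09_05.200_v1_3",
    "paths.txt",
    "runRosetta.com",
    "t000_.fasta",
)


def missing_rosetta_inputs(rosetta_dir: str = "rosetta") -> list:
    """Return the expected files that are absent from rosetta_dir."""
    root = Path(rosetta_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_rosetta_inputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```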

The last step is to run the runRosetta script, which contains the Rosetta run instructions:

runRosetta.com 

This runs a single-CPU job. To make use of the cluster, a launching template called lzRosetta was created that distributes the calculations over the cluster. The command:

qsub lzRosetta

submits the job. Depending on cluster usage, several instances of the above command can be launched to occupy as many CPUs as possible. Rosetta handles the output bookkeeping and increments the decoy counter automatically, so that the chosen number of decoys is calculated by the available CPUs. The number of decoys can be adjusted in the runRosetta script (e.g. -nstruct 1000).
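
Launching several instances can be wrapped in a short loop. The Python sketch below is illustrative only: the function name and the dry_run switch are hypothetical, and it assumes that qsub is on the PATH and that lzRosetta is in the current directory. With the default dry_run=True nothing is submitted; the commands are only returned for inspection.

```python
import subprocess


def submit_instances(n_instances: int,
                     script: str = "lzRosetta",
                     dry_run: bool = True) -> list:
    """Submit the launch script n_instances times; return the commands used."""
    commands = []
    for _ in range(n_instances):
        command = ["qsub", script]
        if not dry_run:
            # Raises CalledProcessError if the scheduler rejects the job.
            subprocess.run(command, check=True)
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Dry run: show what would be submitted for four instances.
    for cmd in submit_instances(4):
        print(" ".join(cmd))
```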


Files for Download

Input.bmrb:  BMRB file in 2.1 format.  Input for the bmrb2talos.com command.  Note the formatting.

Output_CS:  Chemical shift file produced by bmrb2talos.com command.

lzRosetta:  Script for sending CS-Rosetta calculations to a cluster.

References

  1. Y. Shen, O.F. Lange, F. Delaglio, P. Rossi, J.M. Aramini, G. Liu, A. Eletsky, Y. Wu, K.K. Singarapu, A. Lemak, A. Ignatchenko, C.H. Arrowsmith, T. Szyperski, G.T. Montelione, D. Baker, A. Bax, Consistent blind protein structure generation from NMR chemical shift data. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105, 4685–4690.
  2. S. Raman, O.F. Lange, P. Rossi, M. Tyka, X. Wang, J.M. Aramini, G. Liu, T.A. Ramelot, A. Eletsky, T. Szyperski, M.A. Kennedy, J. Prestegard, G.T. Montelione, D. Baker, NMR structure determination for larger proteins using backbone-only data. Science, 2010, 327, 1014–1018.
  3. O.F. Lange, P. Rossi, N.G. Sgourakis, Y. Song, H. Lee, J.M. Aramini, A. Ertekin, R. Xiao, T.B. Acton, G.T. Montelione, D. Baker, P. Natl. Acad. Sci. USA EE (2012), 1-6.
  4. C. Schmitz, R. Vernon, G. Otting, D. Baker, T. Huber, Protein structure determination from pseudocontact shifts using ROSETTA. Journal of Molecular Biology, 2012, 416, 668–677.
  5. O.F. Lange, D. Baker, Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins: Structure, Function, and Bioinformatics, 2012, 80, 884–895.
  6. O.F. Lange, P. Rossi, N.G. Sgourakis, Y. Song, H. Lee, J.M. Aramini, A. Ertekin, R. Xiao, T.B. Acton, G.T. Montelione, D. Baker, P. Natl. Acad. Sci. USA EE (2012), 1-6.