Structure Calculation Using CS-Rosetta

2011-02-27T14:14:59Z

Olange: /* References */

== '''Introduction''' ==

The CS-ROSETTA approach [1,2] combines the Monte Carlo-based structure assembly program ROSETTA with empirical structural information obtained from backbone and <sup>13</sup>C<sup>β</sup> chemical shift data.  The CS-ROSETTA protocol has been shown to predict 3D structures of proteins up to 15 kDa in size [1].  A complete description of the program, along with downloads, is available from the Bax laboratory web site:

http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html

 

== '''CS-ROSETTA Protocol at UB''' ==

=== '''Random Coil Index Prediction''' ===

Perform flexible region prediction on the [http://wishart.biology.ualberta.ca/rci/cgi-bin/rci_cgi_1_e.py RCI Web Page].

RCI accepts a BMRB file in the old format, as produced by CYANA 1.0 (the new BMRB format has an extra "chain" column). Unlike the AutoStructure input file, the sequence field should be left in place.

Use an init.cya file: 
<pre> name:=XXXX # Replace XXXX with NESG ID
cyanalib # Read the standard library
pseudo=2 # Allows HB, HD, etc. pseudoatom names, use with CARA
read seq $name # Initialize</pre>
If you are using a proton list from CARA, first convert it to "dyana" format with CYANA 2.1:
<pre> read prot XXXX.prot
pseudo=0
translate dyana
write prot XXXX_dyana</pre>
Use CYANA 1.0.5 to prepare a BMRB file:
<pre> read prot XXXX_dyana.prot
bmrblist XXXX.bmrb
</pre>
Change the <tt>_Chem_shift_ambiguity_type</tt> tag to <tt>_Chem_shift_ambiguity_code</tt>; otherwise RCI reports an error.
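The tag rename can be done in any text editor. As a convenience, it can also be scripted; the following is a minimal Python sketch, assuming the tag appears as a plain string in the file written by CYANA. The function names and the file name <tt>XXXX.bmrb</tt> are illustrative and not part of CYANA or RCI.

```python
from pathlib import Path

# Tag written by CYANA 1.0.5 and the tag that RCI expects (both taken from the text above).
OLD_TAG = "_Chem_shift_ambiguity_type"
NEW_TAG = "_Chem_shift_ambiguity_code"


def fix_ambiguity_tag(text: str) -> str:
    """Return the BMRB file content with the old ambiguity tag renamed."""
    return text.replace(OLD_TAG, NEW_TAG)


def fix_bmrb_file(path: str) -> int:
    """Rewrite the file in place and return the number of tags renamed."""
    bmrb = Path(path)
    text = bmrb.read_text()
    renamed = text.count(OLD_TAG)
    bmrb.write_text(fix_ambiguity_tag(text))
    return renamed
```

For example, <tt>fix_bmrb_file("XXXX.bmrb")</tt> rewrites the file in place and returns the number of renamed tags; a return value of 0 means the file did not contain the old tag.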

Flexible N- and C-terminal tails should be removed before the CS-ROSETTA calculation to reduce CPU time. Flexible loop regions will later be excluded from the calculation of the all-atom energy.
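The tail-trimming step can be sketched in Python as follows. This is only an illustration: it assumes you have already decided, from the RCI output, which residues count as flexible (the criterion is not specified here), and have encoded that decision as one boolean per residue. The function name <tt>ordered_core</tt> is hypothetical.

```python
from typing import List, Optional, Tuple


def ordered_core(flexible: List[bool]) -> Optional[Tuple[int, int]]:
    """Return the 1-based (first, last) positions left after trimming flexible tails.

    flexible[i] is True if residue i+1 was judged flexible from the RCI output.
    Only the N- and C-terminal runs are trimmed; flexible loops inside the
    core are kept. Returns None if every residue is flexible.
    """
    n = len(flexible)
    start = 0
    while start < n and flexible[start]:
        start += 1
    if start == n:
        return None
    end = n - 1
    while flexible[end]:
        end -= 1
    return start + 1, end + 1
```

For a seven-residue toy input <tt>[True, True, False, False, True, False, True]</tt> the function returns <tt>(3, 6)</tt>: the two N-terminal residues and the one C-terminal residue are trimmed, while the flexible residue at position 5 stays because it lies inside the core. The returned positions are indices into the input list and must be mapped back to the residue numbering of the chemical shift file.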

=== '''Generating MFR fragments on U2 cluster at SUNY Buffalo''' ===

Copy the <tt>runCSRjob.com</tt> file into the working directory and change the number of fragments to be generated.

Type <tt>qsub runCSRjob.pbs</tt> to submit the job to the queue. The calculation takes ~2 hours for 1000 fragments for a small protein, so it must not be run on a master node.

=== '''Running CS-Rosetta on U2 cluster at SUNY Buffalo''' ===

Go to the <tt>rosetta</tt> subdirectory. Determine how many parallel Rosetta jobs you will need to run. Things to consider are:

*The total number of fragments to be calculated
*The maximum wall-time for a single job is 72 h
*It takes ~10 min to calculate a single structure of a small protein on a single CPU

Type <tt>./runRosetta.csh N</tt>, where <tt>N</tt> is the number of parallel Rosetta jobs.
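The considerations listed above give a lower bound on <tt>N</tt>. The following Python sketch works through the arithmetic, assuming that "total number" refers to the number of structures to be calculated, that each job runs on a single CPU, and that the ~10 min per structure and 72 h wall-time figures apply to your protein and queue. The function name <tt>min_parallel_jobs</tt> is illustrative.

```python
import math


def min_parallel_jobs(total_structures: int,
                      minutes_per_structure: float = 10.0,
                      wall_time_hours: float = 72.0) -> int:
    """Smallest number of jobs that can finish all structures within the wall-time."""
    # Number of structures a single job can complete before it hits the wall-time limit.
    per_job = math.floor(wall_time_hours * 60.0 / minutes_per_structure)
    if per_job < 1:
        raise ValueError("a single structure does not fit within the wall-time")
    return math.ceil(total_structures / per_job)
```

With the default figures a single job can complete 432 structures (72 h × 60 min / 10 min), so 1000 structures need at least 3 jobs and 10000 structures need at least 24. Using a larger <tt>N</tt> than this minimum shortens the elapsed time and leaves a margin for larger proteins, where a single structure takes longer.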

 

== '''CS-ROSETTA Protocol at CABM''' ==

It is assumed that cs-rosetta2.3.0, rosetta2.3.0, and NMRPipe-2008 are already installed and running on your cluster (see the [http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html Bax laboratory web site] for instructions).  In addition, the following activation commands may need to be issued in your local shell:

*For c-shell (csh, tcsh)
<pre>source /farm/software/NMRPipe-2008/com/nmrInit.linux9.com

source /farm/software/cs-rosetta2.3.0/com/csrosettaInit.com</pre>
*For bash shell (sh, bash)
<pre>source /farm/software/NMRPipe-2008/com/nmrInit.linux9.sh

source /farm/software/cs-rosetta2.3.0/com/csrosettaInit.sh
</pre>
=== '''Protocol for running CS-ROSETTA at CABM''' ===

Start from a chemical shift file in BMRB 2.1 format, including the complete header. An example BMRB file in the correct format is available as [[Media:013008_ref_caps.bmrb|Input.bmrb]]; the scripts are rather unforgiving of format inconsistencies.

CS-ROSETTA uses the TALOS format for chemical shifts, so the BMRB file must first be converted. Run the following commands in order (the programs are provided by NMRPipe-2008 and CS-ROSETTA and are accessible once the initialization described in the preceding paragraph has been done):
<pre>bmrb2talos.com BMRB_cs_file > prot_CS

runCSRjob.com prot_CS</pre>
The second command is time-consuming (between 1 and 4 hours on master2). It produces a directory called <tt>rosetta</tt> containing the following files:

*<tt>aat000_03_05.200_v1_3</tt>
*<tt>aat000_09_05.200_v1_3</tt>
*<tt>paths.txt</tt>
*<tt>runRosetta.com</tt>
*<tt>t000_.fasta</tt>

The last step is to run the <tt>runRosetta.com</tt> script, which contains the Rosetta run command:
<pre>runRosetta.com
</pre>
This runs a single-CPU job. To make use of the cluster, a launch template called <tt>lzRosetta</tt> was created that distributes the calculations over the cluster.  The command:
<pre>qsub lzRosetta
</pre>
submits the job.  Depending on cluster usage, several instances of the above command can be launched to occupy as many CPUs as possible.  Rosetta handles the output bookkeeping and increments the decoy counter automatically, so that the chosen number of decoys is calculated by the available CPUs.  The number of decoys can be adjusted in the <tt>runRosetta.com</tt> script (e.g. <tt>-nstruct 1000</tt>).

 

== '''Files for Download''' ==

[[Media:013008_ref_caps.bmrb|Input.bmrb]]:  BMRB file in 2.1 format.  Input for the <tt>bmrb2talos.com</tt> command.  Note the formatting.

[[Media:RrR43_CS.txt|Output_CS]]:  Chemical shift file produced by bmrb2talos.com command.

[[Media:LzRosetta.txt|lzRosetta]]:  Script for sending CS-Rosetta calculations to a cluster.

== '''References''' ==

[http://www.ncbi.nlm.nih.gov/pubmed/18326625?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=2 1.  Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y., Singarapu, K.K., Lemak, A., Ignatchenko, A., Arrowsmith, C.H., Szyperski, T., Montelione, G.T., Baker, D. and Bax, A. (2008) Consistent blind protein structure generation from NMR chemical shift data. ''Proc. Natl. Acad. Sci. USA 105'', 4685-4690.]

[http://www.ncbi.nlm.nih.gov/pubmed/19034676?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=2 2.  Shen, Y., Vernon, R., Baker, D. and Bax, A. (2009) De novo protein structure determination from incomplete chemical shift assignments. ''J. Biomol. NMR 43'', 63-78.]

[http://www.ncbi.nlm.nih.gov/pubmed/20133520?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=2 3.  Raman, S., Lange, O.F., Rossi, P., Tyka, M., Wang, X., Aramini, J., Liu, G., Ramelot, T.A., Eletsky, A., Szyperski, T., Kennedy, M.A., Prestegard, J., Montelione, G.T. and Baker, D. (2010) NMR structure determination for larger proteins using backbone-only data. ''Science 327'', 1014-1018.]
