Automated NOESY Assignment Using CYANA: Difference between revisions

From NESG Wiki
(Created page with '== '''CYANA Run''' == Unfortunately, there is no comprehensive CYANA manual. Many features can be found in the original DYANA manual. For a summary of features consult the CYANA…')
 
No edit summary
 
(17 intermediate revisions by the same user not shown)
Line 1: Line 1:
== '''CYANA Run''' ==
== '''Introduction''' ==


Unfortunately, there is no comprehensive CYANA manual. Many features are described in the original DYANA manual. For a summary of features, consult the CYANA topic on this TWiki or the CYANA mailing list archive (available from the [http://www.las.jp/products/cyana/eg/index.html CYANA website]).
Below is the description of how to run CYANA 2.1 for automated NOE assignment if you are working with CARA.  A tutorial for performing structure calculations with automated NOESY assignments using CYANA 3.0 is available [http://www.cyana.org/wiki/index.php/Structure_calculation_with_automated_NOESY_assignment on-line].


Below is the description of how to run CYANA 2.1 for automated NOE assignment if you are working with CARA.
== '''Input files'''  ==


=== '''Input files''' ===
Required files  


Required files
*Initialization file <tt>init.cya</tt>.  
* Initialization file <tt>init.cya</tt>.
*SequenceList in XEASY format - usually <tt>XXXX.seq</tt>, where XXXX is the NESG ID.  
* SequenceList in XEASY format - usually <tt>XXXX.seq</tt>, where XXXX is the NESG ID.
*AtomList in XEASY format <tt>XXXX.prot</tt> . Chemical shifts should be real, not folded. Make sure that you are using the most recent file. Atom labels should be swapped if using stereospecific assignments.  
* AtomList in XEASY format <tt>XXXX.prot</tt> . Chemical shifts should be real, not folded. Make sure that you are using the most recent file. Atom labels should be swapped if using stereospecific assignments.
*Separate unfolded PeakList for <sup>15</sup>N and <sup>13</sup>C NOESY: <tt>n.peaks</tt>, <tt>ali.peaks</tt>, <tt>aro.peaks</tt>.
* Separate unfolded PeakList for 15N and 13C NOESY: <tt>n.peaks</tt>, <tt>ali.peaks</tt>, <tt>aro.peaks</tt>.


Optional files
Optional files  
* Stereospecific assignment script (such as <tt>stereofound.cya</tt> from FOUND/HABAS). Note that this script should contain only <tt>atom stereo</tt> declarations, but no <tt>atom swap</tt> statements! Atom labels must be already swapped in the AtomList and external UPL files.
* External UPL files, such as <tt>short.upl</tt>. Atom labels should be swapped if using stereospecific assignments.
* External ACO files, such as <tt>gridsearch.aco</tt> output of FOUND/HABAS.


==== '''Format Conversion''' ====
*Stereospecific assignment script (such as <tt>stereofound.cya</tt> from FOUND/HABAS). Note that this script should contain only <tt>atom stereo</tt> declarations, but no <tt>atom swap</tt> statements! Atom labels must be already swapped in the AtomList and external UPL files.
*External UPL files, such as <tt>short.upl</tt>. Atom labels should be swapped if using stereospecific assignments.
*External ACO files, such as <tt>gridsearch.aco</tt> output of FOUND/HABAS.


The input files (sequence, atom list, ACOs and UPLs) must adhere to the IUPAC nomenclature used by CYANA 2.1 (i.e., <tt>H= instead of =HN</tt>, etc.). CARA is fully compatible with this nomenclature, while data from other programs may need to be converted.
== '''Format Conversion'''  ==


===== '''Conversion from XEASY/DYANA/CYANA 1.X''' =====
The input files (sequence, atom list, ACOs and UPLs) must adhere to the IUPAC nomenclature used by CYANA 2.1 (i.e., <tt>H instead of HN</tt>, etc.). CARA is fully compatible with this nomenclature, while data from other programs may need to be converted.  


For the automated (<tt>noeassign</tt>) runs of CYANA, please make sure that your chemical shift list conforms to the IUPAC nomenclature (i.e., <tt>H= instead of =HN</tt>). To update your atom names, do the following in CYANA<nowiki>
=== '''Conversion from XEASY/DYANA/CYANA 1.X'''  ===
translate dyana
 
For the automated (<tt>noeassign</tt>) runs of CYANA, please make sure that your chemical shift list conforms to the IUPAC nomenclature (i.e., <tt>H instead of HN</tt>). To update your atom names, do the following in CYANA:
<pre>translate dyana
read protein.prot
read protein.prot
translate off
translate off
write protein-cyana.prot
write protein-cyana.prot</pre>  
</nowiki>The <tt>protein-cyana.prot</tt> file now contains all of the correct atom names for CYANA.
The <tt>protein-cyana.prot</tt> file now contains all of the correct atom names for CYANA.  


You may need to do the same with UPLs created in DYANA or CYANA 1.X
You may need to do the same with UPLs created in DYANA or CYANA 1.X  


See the <tt>~/demo/details/MigrateFromDyanaCyana1.cya</tt> example script in the CYANA 2.1 installation directory for details.
See the <tt>~/demo/details/MigrateFromDyanaCyana1.cya</tt> example script in the CYANA 2.1 installation directory for details.  


===== '''Conversion from Sparky''' =====
=== '''Conversion from Sparky''' ===


CYANA can also read chemical shifts in BMRB format using the following commands: <br/> <nowiki>  
CYANA can also read chemical shifts in BMRB format using the following commands: <br>  
...
<pre>...
read bmrb protein.bmrb
read bmrb protein.bmrb
write prot protein.prot </nowiki>
write prot protein.prot </pre>  
 
For Sparky users, please use Sparky command <tt>xe</tt> to write out XEASY format peaklists.  
For Sparky users, please use Sparky command <tt>xe</tt> to write out XEASY format peaklists.
 
===== '''Splitting the simultaneous NOESY peaklist''' =====


When working with CARA, it is not necessary to provide external ACO and UPL files. In CARA, spin assignments are not derived from peak lists, so there is less impact when CYANA modifies existing peak assignments. When external constraints are employed, usually fewer peaks are assigned and fewer UPLs are derived. It is therefore recommended to use external UPL and ACO files only if the calculation does not converge without them.
=== '''Splitting the simultaneous NOESY peaklist'''  ===


When using a simultaneous 3D NOESY peaklist from XEASY, you need to generate separate peaklists with UBNMR. The following UBNMR macro is provided as an example. It calculates proper 15N chemical shifts and peak positions, and writes out separate <tt>nnoe.peaks</tt> and <tt>cnoe.peaks</tt> peaklists. Modify the numbers to reflect the proper 15N and 13C carrier offsets (in ppm) and the spectral width ratios (<tt>sw2/sw2N</tt>).
When working with CARA, it is not necessary to provide external ACO and UPL files. In CARA, spin assignments are not derived from peak lists, so there is less impact when CYANA modifies existing peak assignments. When external constraints are employed, usually fewer peaks are assigned and fewer UPLs are derived. It is therefore recommended to use external UPL and ACO files only if the calculation does not converge without them.


<nowiki>
When using a simultaneous 3D NOESY peaklist from XEASY, you need to generate separate peaklists with UBNMR. The following UBNMR macro is provided as an example. It calculates proper <sup>15</sup>N chemical shifts and peak positions, and writes out separate <tt>nnoe.peaks</tt> and <tt>cnoe.peaks</tt> peaklists. Modify the numbers to reflect the proper <sup>15</sup>N and <sup>13</sup>C carrier offsets (in ppm) and the spectral width ratios (<tt>sw2/sw2N</tt>).<br>  
init
<pre>init
read seq xxx.seq
read seq xxx.seq
write seq xxxseq.bmrb autoBMRB
write seq xxxseq.bmrb autoBMRB
Line 70: Line 67:
update proton shift NE2 117.273 1
update proton shift NE2 117.273 1
update proton shift NE1 117.273 1
update proton shift NE1 117.273 1
write prot noe.prot
write prot noe.prot</pre>  
</nowiki>
=== '''External UPL Files''' ===
 
===== '''External UPL Files''' =====
 
<tt>noeassign</tt> employs the so-called "sum of r^-6" averaging method (<tt>peaks calibrate</tt>) to calibrate peaklists and interpret UPLs during the calculation. Therefore, external UPLs should ideally be calibrated with the same method.
 
If you supply UPL constraints created with CALIBA (CALIBA uses "center" averaging), you should be aware that these constraints will be too loose.
 
===== '''Using Unassigned Peaklists''' =====


If you are using a completely unassigned peaklist (for example, picked from scratch in CARA), you will need to add the following line to the peaklist header:
<tt>noeassign</tt> employs the so-called "sum of r<sup>-6</sup>" averaging method (<tt>peaks calibrate</tt>) to calibrate peaklists and interpret UPLs during the calculation. Therefore, external UPLs should ideally be calibrated with the same method.


<tt>#CYANAFORMAT HNh</tt>
If you supply UPL constraints created with CALIBA (CALIBA uses "center" averaging), you should be aware that these constraints will be too loose.


or
=== '''Using Unassigned Peaklists'''  ===


<tt>#CYANAFORMAT HCh</tt>
If you are using a completely unassigned peaklist (for example, picked from scratch in CARA), you will need to add the following line to the peaklist header:
<pre>#CYANAFORMAT HNh</pre>
or
<pre>#CYANAFORMAT HCh</pre>
The lowercase h denotes the indirect (NOE) <sup>1</sup>H dimension.


The lowercase =h= denotes the indirect (NOE) 1H dimension.
If your peaklist contains assigned peaks, then CYANA will be able to determine the peaklist dimensions based on these assignments.  


If your peaklist contains assigned peaks, then CYANA will be able to determine the peaklist dimensions based on these assignments.
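
The header edit for unassigned peaklists can be scripted. The following shell sketch prints a copy of a peaklist with the <tt>#CYANAFORMAT</tt> line added at the end of the leading block of <tt>#</tt> header lines. That placement is an assumption (the text above only requires the line to be in the header), and <tt>add_cyanaformat</tt> is a hypothetical helper name, not a CYANA command.

```shell
#!/bin/sh
# add_cyanaformat FORMAT FILE
# Print FILE to stdout with a "#CYANAFORMAT FORMAT" line added to the header.
# Hypothetical helper; the insertion point (end of the leading '#' lines)
# is an assumption, not something prescribed by CYANA documentation.
add_cyanaformat() {
    fmt=$1
    file=$2
    # Leave the file untouched if a format line is already present.
    if grep -q '^#CYANAFORMAT' "$file"; then
        cat "$file"
        return 0
    fi
    awk -v fmt="$fmt" '
        # First non-header line: emit the format line just before it.
        !seen && $0 !~ /^#/ { print "#CYANAFORMAT " fmt; seen = 1 }
        { print }
        # File consisted of header lines only: append the format line.
        END { if (!seen) print "#CYANAFORMAT " fmt }
    ' "$file"
}
```

For example, <tt>add_cyanaformat HNh n.peaks > n_fmt.peaks</tt> writes a new file and leaves the original peaklist unchanged.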
== '''Running Automated Structure Calculation with CYANA 2.1'''  ==


=== '''Running Automated Structure Calculation with CYANA 2.1''' ===
#Create a working subdirectory (for example, <tt>structure/cyana21/calc1</tt>).
#Create an init.cya file as described in [[CYANA|Getting Started]] or copy a previously used file. Set an appropriate RMSD calculation range.
#Copy the latest sequence (<tt>XXXX.seq</tt>) and peaklist files (<tt>n.peaks</tt>, <tt>ali.peaks</tt> and <tt>aro.peaks</tt>) into the working directory. The sequence file and peaklist should in principle be the same as those used to [[FOUND|run FOUND]].
#Copy the updated atomlist (<tt>XXXX.prot</tt>). The spin labels in it should be swapped according to the [[FOUND|output of FOUND]].
#If you used FOUND, then copy the <tt>gridsearch.aco</tt> file from the previous FOUND run.
#If you used FOUND, then copy the <tt>stereofound.cya</tt> file from the previous FOUND run. Make sure that incorrect stereospecific assignments have been commented out or removed.
#(Optional) Generate the short-range UPL (<tt>short.upl</tt>) file based on the existing peak assignments. This is more convenient to do on a workstation. You can use the [[Media:Make_short.cya|make_short.cya]] script (see below). Alternatively, you can define a <tt>KEEP</tt> subroutine in the <tt>CALC.cya</tt> file.
#Download the [[Media:CALC_noeassign.cya|CALC.cya]] script (see below) and modify it according to the input data.


# Create a working subdirectory (for example, <tt>structure/cyana21/calc1</tt>).
You can choose whether to run the structure calculation on a local Linux workstation or on the U2 Linux cluster. Typical run times on a single workstation are 1.5 - 3 hours, depending on the protein size. Calculations on the cluster take only 15-30 minutes, but there may be additional queue waiting time. On weekdays during working hours (9 a.m. - 4 p.m.), 10 dual-processor nodes are reserved for us and there is no waiting time.  
# Create an init.cya file as described in "[[NESG:CYANAInitFile|Creating an init.cya file for CYANA 2.1]]" or copy a previously used file. Set an appropriate RMSD calculation range.
# Copy the latest sequence (<tt>XXXX.seq</tt>) and peaklist files (<tt>n.peaks</tt>, <tt>ali.peaks</tt> and <tt>aro.peaks</tt>) into the working directory. The sequence file and peaklist should in principle be the same as those used to [[NESG:UsingFOUNDakaHABAS|run FOUND]].
# Copy the updated atomlist (<tt>XXXX.prot</tt>). The spin labels in it should be swapped according to the [[NESG:UsingFOUNDakaHABAS|output of FOUND]].
# If you used FOUND, then copy the <tt>gridsearch.aco</tt> file from the previous FOUND run.
# If you used FOUND, then copy the <tt>stereofound.cya</tt> file from the previous FOUND run. Make sure that incorrect stereospecific assignments have been commented out or removed.
# (Optional) Generate the short-range UPL (<tt>short.upl</tt>) file based on the existing peak assignments. This is more convenient to do on a workstation. You can use the [[NESG:%ATTACHURL%/make_short.cya|make_short.cya]] script (see below). Alternatively, you can define a <tt>KEEP</tt> subroutine in the <tt>CALC.cya</tt> file.
# Download the [[NESG:%ATTACHURL%/CALC.cya|CALC.cya]] script (see below) and modify it according to the input data.  


You can choose whether to run the structure calculation on a local Linux workstation or on the U2 Linux cluster. Typical run times on a single workstation are 1.5 - 3 hours, depending on the protein size. Calculations on the cluster take only 15-30 minutes, but there may be additional queue waiting time. On weekdays during working hours (9 a.m. - 4 p.m.), 10 dual-processor nodes are reserved for us and there is no waiting time.
Check the [http://www.ccr.buffalo.edu/hotpages/content/u2/queue.htm queue status page] and the [http://www.ccr.buffalo.edu/hotpages/content/u2/nodes.htm nodemap page] to see the current system loads on U2.  


Check the [http://www.ccr.buffalo.edu/hotpages/content/u2/queue.htm queue status page] and the [http://www.ccr.buffalo.edu/hotpages/content/u2/nodes.htm nodemap page] to see the current system loads on U2.
To run calculations on the U2 Linux cluster:  


To run calculations on the U2 Linux cluster:
#Log in to <tt>u2.ccr.buffalo.edu</tt>  
# Log in to <tt>u2.ccr.buffalo.edu</tt>
#Change directory to <tt>/san/projects1/szypersk/</tt>.  
# Change directory to <tt>/san/projects1/szypersk/</tt>.
#Create a working subdirectory (like <tt>username/XXXX/cyana21</tt>)  
# Create a working subdirectory (like <tt>username/XXXX/cyana21</tt>)
#Copy the entire subdirectory <tt>calc1</tt>. You can use <tt>gftp</tt>, <tt>scp</tt> or <tt>sftp</tt>.  
# Copy the entire subdirectory <tt>calc1</tt>. You can use <tt>gftp</tt>, <tt>scp</tt> or <tt>sftp</tt>.
#Download the PBS submission script [[Media:Cyana.pbs|cyana.pbs]] (see below). Modify it if needed.  
# Download the PBS submission script [[NESG:%ATTACHURL%/cyana.pbs|cyana.pbs]] (see below). Modify it if needed.
#Type <tt>qsub cyana.pbs</tt> to submit your job.
# Type <tt>qsub cyana.pbs</tt> to submit your job.


To run calculations on a workstation:
To run calculations on a workstation:  
# Start CYANA 2.1 by typing <tt>cyana21</tt>
# Enter <tt>CALC</tt> at the cyana prompt.


#Start CYANA 2.1 by typing <tt>cyana21</tt>
#Enter <tt>CALC</tt> at the cyana prompt.


=== '''Output files''' ===
<br>


* <tt>final.pdb</tt> - resulting structure
== '''Output files'''  ==
* <tt>final.ovw</tt> - final overview file
* <tt>final.upl</tt> - final UPL file (unambiguous constraints; atom labels may be swapped)
* <tt>*-final.prot</tt> - final atom list (chemical shifts unchanged?; atom labels may be swapped)
* <tt>finalstereo.cya</tt> - stereospecific assignment file (to find swapped atom pairs see calculation log)
* <tt>*-cycle7.peaks</tt> - assigned peaklists (in CYANA 2.1 format with multiple assignments)
* <tt>cycleX.*</tt> - UPL, OVW, PDB and NOA files for cycle X (ambiguous constraints in UPL files)


Macro <tt>noeassign</tt> in CYANA 2.1 performs 7 routine calculation cycles and one final cycle. The output files are labeled <tt>cycle1.*</tt>, <tt>cycle2.*</tt> ... <tt>cycle7.*</tt> and <tt>final.*</tt> with appropriate extensions. An additional stereospecific assignment search is performed after cycle 7; therefore, the files <tt>final.upl</tt> and <tt>*-final.prot</tt> likely have some labels swapped.
*<tt>final.pdb</tt> - resulting structure
*<tt>final.ovw</tt> - final overview file
*<tt>final.upl</tt> - final UPL file (unambiguous constraints; atom labels may be swapped)
*<tt>*-final.prot</tt> - final atom list (chemical shifts unchanged?; atom labels may be swapped)
*<tt>finalstereo.cya</tt> - stereospecific assignment file (to find swapped atom pairs see calculation log)
*<tt>*-cycle7.peaks</tt> - assigned peaklists (in CYANA 2.1 format with multiple assignments)
*<tt>cycleX.*</tt> - UPL, OVW, PDB and NOA files for cycle X (ambiguous constraints in UPL files)


Assigned peak lists are saved after cycle 7. They may contain multiple assignments for some peaks and are therefore not fully compatible with XEASY.
Macro <tt>noeassign</tt> in CYANA 2.1 performs 7 routine calculation cycles and one final cycle. The output files are labeled <tt>cycle1.*</tt>, <tt>cycle2.*</tt> ... <tt>cycle7.*</tt> and <tt>final.*</tt> with appropriate extensions. An additional stereospecific assignment search is performed after cycle 7; therefore, the files <tt>final.upl</tt> and <tt>*-final.prot</tt> likely have some labels swapped.  


Always check the output of the CYANA calculation for the results of the <tt>peakcheck</tt> command. It is executed before the first calculation cycle and reports various inconsistencies in the atom list and peak lists. In the end, many UPL violations can be traced back to assignment mistakes or mis-picked peaks.
Assigned peak lists are saved after cycle 7. They may contain multiple assignments for some peaks and are therefore not fully compatible with XEASY.  


=== '''Example scripts''' ===
Always check the output of the CYANA calculation for the results of the <tt>peakcheck</tt> command. It is executed before the first calculation cycle and reports various inconsistencies in the atom list and peak lists. In the end, many UPL violations can be traced back to assignment mistakes or mis-picked peaks.


Below are the key scripts for running CYANA. See the demo subdirectory of CYANA installation for more details.
== '''Example scripts'''  ==


==== '''make_short.cya''' ====
Below are the key scripts for running CYANA. See the demo subdirectory of CYANA installation for more details.  


<nowiki>
=== '''make_short.cya'''  ===
peaks     := n,ali,aro              # names of peak lists
<pre>peaks     &nbsp;:= n,ali,aro              # names of peak lists
prot       := $name                  # names of proton lists
prot     &nbsp;:= $name                  # names of proton lists
tolerance := 0.05,0.02,0.3          # chemical shift tolerances
tolerance &nbsp;:= 0.05,0.02,0.3          # chemical shift tolerances
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
calibration:= 1.7E6,1.7E6,1.7E6      # calibration constants (will be determined
calibration:= 1.7E6,1.7E6,1.7E6      # calibration constants (will be determined
                                     # automatically, if commented out)
                                     # automatically, if commented out)
dref       := 4.2                    # average upper distance limit for
dref     &nbsp;:= 4.2                    # average upper distance limit for
                                     # automatic calibration
                                     # automatic calibration
peakcheck peaks=$peaks prot=$prot
peakcheck peaks=$peaks prot=$prot
Line 156: Line 148:
peaks calibrate "**" simple
peaks calibrate "**" simple
write upl short.upl
write upl short.upl
</nowiki>
</pre>  
 
<br> For the <tt>calibration</tt> parameter you can provide the list of calibration constants that you derived for the "backbone" class with <tt>caliba</tt> when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave the value blank if you want automatic calibration. Automatic calibration uses the <tt>dref</tt> parameter as the presumed average distance for all peaks in a peaklist (not just for the backbone, as in <tt>caliba</tt>).<br>
For the <tt>calibration</tt> parameter you can provide the list of calibration constants that you derived for the "backbone" class with <tt>caliba</tt> when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave the value blank if you want automatic calibration. Automatic calibration uses the <tt>dref</tt> parameter as the presumed average distance for all peaks in a peaklist (not just for the backbone, as in <tt>caliba</tt>).
 


==== '''CALC.cya''' ====
<br>


<nowiki>
=== '''CALC.cya'''  ===
peaks       := n,ali,aro                # names of NOESY peak lists
<pre>peaks     &nbsp;:= n,ali,aro                # names of NOESY peak lists
prot       := $name                    # names of chemical shift lists
prot       &nbsp;:= $name                    # names of chemical shift lists
constraints := gridsearch.aco,short.upl,stereofound.cya            # additional (non-NOE) constraints
constraints&nbsp;:= gridsearch.aco,short.upl,stereofound.cya            # additional (non-NOE) constraints
tolerance   := 0.05,0.02,0.4            # chemical shift tolerances
tolerance &nbsp;:= 0.05,0.02,0.4            # chemical shift tolerances
                                         # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
                                         # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
#upl_values := 2.4,6.0                  # calibration cutoffs
#upl_values &nbsp;:= 2.4,6.0                  # calibration cutoffs
calibration := 1.7E6,1.7E6,1.7E6        # NOE calibration parameters
calibration&nbsp;:= 1.7E6,1.7E6,1.7E6        # NOE calibration parameters
structures := 100,20                  # number of initial, final structures
structures &nbsp;:= 100,20                  # number of initial, final structures
steps       := 10000                    # number of torsion angle dynamics steps
steps     &nbsp;:= 10000                    # number of torsion angle dynamics steps
rmsdrange   := 10..100                  # residue range for RMSD calculation
rmsdrange &nbsp;:= 10..100                  # residue range for RMSD calculation
randomseed := 434726                  # random number generator seed
randomseed &nbsp;:= 434726                  # random number generator seed
dref       := 4.0                      # average distance for calibration, default 4.0
dref       &nbsp;:= 4.0                      # average distance for calibration, default 4.0
keep       :=                          # set to KEEP to retain existing assignments
keep       &nbsp;:=                          # set to KEEP to retain existing assignments
   
   
subroutine KEEP
subroutine KEEP
Line 182: Line 172:
end
end
   
   
#protocol := noeassign.out              # output logging on
#protocol&nbsp;:= noeassign.out              # output logging on
noeassign peaks=$peaks prot=$prot calibration=$calibration keep=$keep autoaco
noeassign peaks=$peaks prot=$prot calibration=$calibration keep=$keep autoaco
#protocol :=
#protocol&nbsp;:=
</nowiki>
</pre>  
 
<br> The <tt>constraints</tt> parameter can be a comma-separated list of any external constraints that can be read by the <tt>read data</tt> command in CYANA. You can supply UPLs, ACOs and even .cya scripts, for example, ones defining stereospecific assignments of methyl groups. Do not comment out this line; leave the value blank if you are not providing external constraints.  
The <tt>constraints</tt> parameter can be a comma-separated list of any external constraints that can be read by the <tt>read data</tt> command in CYANA. You can supply UPLs, ACOs and even .cya scripts, for example, ones defining stereospecific assignments of methyl groups. Do not comment out this line; leave the value blank if you are not providing external constraints.


If you are providing stereospecific assignments, do not use <tt>atom swap</tt> in the <tt>stereofound.cya</tt> script. Use an atom list and <tt>short.upl</tt> with all required labels already swapped; the <tt>stereofound.cya</tt> script should then contain only <tt>atom stereo</tt> declarations.
If you are providing stereospecific assignments, do not use <tt>atom swap</tt> in the <tt>stereofound.cya</tt> script. Use an atom list and <tt>short.upl</tt> with all required labels already swapped; the <tt>stereofound.cya</tt> script should then contain only <tt>atom stereo</tt> declarations.  


For the <tt>tolerance</tt> parameter pay attention to the unintuitive dimension order. The recommended tolerances are: 0.03 ppm or less for 1H (0.02 ppm or less for 2D homonuclear peaklists) and 0.6 ppm or less for 15N and 13C.
For the <tt>tolerance</tt> parameter pay attention to the unintuitive dimension order. The recommended tolerances are: 0.03 ppm or less for 1H (0.02 ppm or less for 2D homonuclear peaklists) and 0.6 ppm or less for 15N and 13C.  


The lower and upper limit cutoffs can be changed by setting <tt>upl_values</tt>. The default values are 2.4 and 5.5 Å, respectively.
The lower and upper limit cutoffs can be changed by setting <tt>upl_values</tt>. The default values are 2.4 and 5.5 Å, respectively.  


For the <tt>calibration</tt> parameter you can provide the list of calibration constants that you derived for the "backbone" class with <tt>caliba</tt> when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave the value blank if you want automatic calibration. Automatic calibration uses the <tt>dref</tt> parameter as the presumed average distance for all peaks in a peaklist (not just for the backbone, as in <tt>caliba</tt>). An initial calibration that is too tight is less of an issue with <tt>noeassign</tt>, because by default it "elastically" relaxes constraints that are consistently violated.
For the <tt>calibration</tt> parameter you can provide the list of calibration constants that you derived for the "backbone" class with <tt>caliba</tt> when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave the value blank if you want automatic calibration. Automatic calibration uses the <tt>dref</tt> parameter as the presumed average distance for all peaks in a peaklist (not just for the backbone, as in <tt>caliba</tt>). An initial calibration that is too tight is less of an issue with <tt>noeassign</tt>, because by default it "elastically" relaxes constraints that are consistently violated.  


Use the <tt>protocol</tt> keywords to enable output logging when running CYANA on a workstation. They may not be necessary on a cluster, because the queue system generates its own log.
Use the <tt>protocol</tt> keywords to enable output logging when running CYANA on a workstation. They may not be necessary on a cluster, because the queue system generates its own log.  


Use the subroutine <tt>KEEP</tt> to keep the assignments of peaks that you are confident in, which is helpful if your peak list contains simulated peaks for short-range NOEs.
Use the subroutine <tt>KEEP</tt> to keep the assignments of peaks that you are confident in, which is helpful if your peak list contains simulated peaks for short-range NOEs.  


==== '''cyana.pbs - PBS queue submission script''' ====
=== '''cyana.pbs - PBS queue submission script''' ===
 
<pre>#!/bin/csh
<nowiki>#!/bin/csh
#!/bin/csh
#!/bin/csh
#PBS -m abe
#PBS -m abe
Line 216: Line 204:
cd $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo "working directory = "$PBS_O_WORKDIR
echo "working directory = "$PBS_O_WORKDIR
{| border="1"
set NN = `cat $PBS_NODEFILE | wc -l`
|-
set NN = `cat $PBS_NODEFILE || wc -l`
|}
 
echo "NN = "$NN
echo "NN = "$NN
module load mpich/intel-9/ch_p4/current
module load mpich/intel-9/ch_p4/current
Line 227: Line 211:
limit coredumpsize 0
limit coredumpsize 0
source $MODULESHOME/init/tcsh
source $MODULESHOME/init/tcsh
{| border="1"
cat $PBS_NODEFILE | awk '{printf "%s.ccr.buffalo.edu\n",$1}' > tmp.$$
|-
cat $PBS_NODEFILE || awk '{printf "%s.ccr.buffalo.edu\n",$1}' > tmp.$$
|}
 
cyana -c '/util/mpich/1.2.7p1/intel-9/ch_p4/bin/mpiexec ' ./CALC
cyana -c '/util/mpich/1.2.7p1/intel-9/ch_p4/bin/mpiexec ' ./CALC
#
#
echo "ALL Done!"
echo "ALL Done!"
</nowiki>
</pre>  
<br> The <tt>#PBS</tt> lines pass options to the PBS queue system. See [http://www.ccr.buffalo.edu/hotpages/content/pbsEXia32.htm this page] for details.
 
The following options are important:


The <tt>#PBS</tt> lines pass options to the PBS queue system. See [http://www.ccr.buffalo.edu/hotpages/content/pbsEXia32.htm this page] for details.
*


The following options are important:
<tt>#PBS -m abe</tt> tells the PBS queue system to send e-mail alerts when the calculation starts (<tt>b</tt>), aborts (<tt>a</tt>) or terminates successfully (<tt>e</tt>).  
* <tt>#PBS -m abe</tt> tell PBS queue system to send e-mail alerts when calculation starts (<tt>b=), aborts (=a</tt>) or terminates successfully (=e=).
* Enter your e-mail address in <tt>#PBS -M myname@mydomain</tt>. Without this line, e-mail alerts will go to the local mailbox.
* The line <tt>#PBS -l nodes=5:ppn=2</tt> means that we are using five dual-processor nodes and get 10-fold parallelization during simulated annealing. It doesn't make much sense to request more than 5 nodes: first, the relative gain in speed drops since NOE assignment step cannot be parallelized; second, the queue wait time may be longer when more nodes are requested.
* <tt>#PBS -q short_c</tt> submits the job to the <tt>short_c</tt> queue. This queue is dedicated to short jobs and has higher priority. Members of Szyperski's lab have 10 nodes reserved for this queue every weekday 9 a.m. - 4 p.m.
* <tt>#PBS -l walltime=02:00:00</tt> defines the maximum allocated job execution time. The limit for the <tt>short_c</tt> queue is 2 hours, but even the most demanding CYANA jobs finish in less than one hour.


*Enter your e-mail address in <tt>#PBS -M myname@mydomain</tt>. Without this line, e-mail alerts will go to the local mailbox.
*The line <tt>#PBS -l nodes=5:ppn=2</tt> means that we are using five dual-processor nodes and get 10-fold parallelization during simulated annealing. It doesn't make much sense to request more than 5 nodes: first, the relative gain in speed drops since NOE assignment step cannot be parallelized; second, the queue wait time may be longer when more nodes are requested.
*<tt>#PBS -q short_c</tt> submits the job to the <tt>short_c</tt> queue. This queue is dedicated to short jobs and has higher priority. Members of Szyperski's lab have 10 nodes reserved for this queue every weekday 9 a.m. - 4 p.m.
*<tt>#PBS -l walltime=02:00:00</tt> defines the maximum allocated job execution time. The limit for the <tt>short_c</tt> queue is 2 hours, but even the most demanding CYANA jobs finish in less than one hour.
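
As a quick check of the resource request discussed in the list above, the following shell sketch reads the <tt>nodes</tt> and <tt>ppn</tt> values from a submission script and prints their product, that is, the number of parallel processes the job asks for. <tt>pbs_slots</tt> is a hypothetical helper name, and the sketch assumes the request is written on one line in exactly the <tt>#PBS -l nodes=N:ppn=P</tt> form quoted above.

```shell
#!/bin/sh
# pbs_slots FILE
# Print nodes * ppn taken from the first "#PBS -l nodes=N:ppn=P" line in FILE.
# Hypothetical helper; prints nothing if no such line is found.
pbs_slots() {
    sed -n 's/^#PBS -l nodes=\([0-9][0-9]*\):ppn=\([0-9][0-9]*\).*/\1 \2/p' "$1" |
        head -n 1 |
        awk '{ print $1 * $2 }'
}
```

For a script containing <tt>#PBS -l nodes=5:ppn=2</tt>, <tt>pbs_slots cyana.pbs</tt> prints 10, matching the 10-fold parallelization mentioned above.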


-- Main.htpnmr - 22 Jan 2007
<br><br>


* [[NESG:%ATTACHURL%/CALC.cya|CALC.cya]]: CYANA 2.1 automated structure calculation script
*[[Media:CALC_noeassign.cya|CALC.cya]]: CYANA 2.1 automated structure calculation script


* [[NESG:%ATTACHURL%/cyana.pbs|cyana.pbs]]: PBS queue submission script for CYANA 2.1 on U2 cluster
*[[Media:Cyana.pbs|cyana.pbs]]: PBS queue submission script for CYANA 2.1 on U2 cluster


* [[NESG:%ATTACHURL%/make_short.cya|make_short.cya]]: CYANA script to run manual calculation with local constraints
*[[Media:Make_short.cya|make_short.cya]]: CYANA script to run manual calculation with local constraints

Latest revision as of 21:54, 6 January 2010

Introduction

Below is the description of how to run CYANA 2.1 for automated NOE assignment if you are working with CARA.  A tutorial for performing structure calculations with automated NOESY assignments using CYANA 3.0 is available on-line.

Input files

Required files

  • Initialization file init.cya.
  • SequenceList in XEASY format - usually XXXX.seq, where XXXX is the NESG ID.
  • AtomList in XEASY format XXXX.prot . Chemical shifts should be real, not folded. Make sure that you are using the most recent file. Atom labels should be swapped if using stereospecific assignments.
  • Separate unfolded PeakList for 15N and 13C NOESY: n.peaks, ali.peaks, aro.peaks.

Optional files

  • Stereospecific assignment script (such as stereofound.cya from FOUND/HABAS). Note that this script should contain only atom stereo declarations, but no atom swap statements! Atom labels must be already swapped in the AtomList and external UPL files.
  • External UPL files, such as short.upl. Atom labels should be swapped if using stereospecific assignments.
  • External ACO files, such as gridsearch.aco output of FOUND/HABAS.

Format Conversion

The input files (sequence, atom list, ACOs and UPLs) must adhere to the IUPAC nomenclature used by CYANA 2.1 (i.e., H instead of HN, etc.). CARA is fully compatible with this nomenclature, while data from other programs may need to be converted.

Conversion from XEASY/DYANA/CYANA 1.X

For automated (noeassign) runs of CYANA, make sure that your chemical shift list conforms to the IUPAC nomenclature (i.e., H instead of HN). To update your atom names, do the following in CYANA:

translate dyana
read protein.prot
translate off
write protein-cyana.prot

The protein-cyana.prot file now contains all of the correct atom names for CYANA.

You may need to do the same with UPLs created in DYANA or CYANA 1.X

See the ~/demo/details/MigrateFromDyanaCyana1.cya example script in the CYANA 2.1 installation directory for details.

Conversion from Sparky

CYANA can also read chemical shifts in BMRB format by using the following commands:

...
read bmrb protein.bmrb
write prot protein.prot 

For Sparky users, please use Sparky command xe to write out XEASY format peaklists.

Splitting the simultaneous NOESY peaklist

When working with CARA it is not necessary to provide external ACO and UPL files. In CARA, spin assignments are not derived from peak lists, so there is less impact from CYANA modifying existing peak assignments. When external constraints are employed, there are usually fewer peaks assigned and fewer UPLs derived. It is therefore recommended to use external UPL and ACO files only if there are convergence problems without them.

When using a simultaneous 3D NOESY XEASY peaklist, you need to generate separate peaklists with UBNMR. The following UBNMR macro is provided as an example. It calculates proper 15N chemical shifts and peak positions, and writes out separate nnoe.peaks and cnoe.peaks peaklists. Modify the numbers to reflect the proper 15N and 13C carrier offsets (in ppm) and the spectral width ratios (sw2/sw2N).

init
read seq xxx.seq
write seq xxxseq.bmrb autoBMRB
read prot xxx-simnoesy.prot
read peaks xxx-simnoesy.peaks
update peak shift N -35.700 1.0822510
update peak shift N 117.273 1
write peaks ncnoe.peaks
split ncnoe.peaks nnoe.peaks cnoe.peaks
update proton shift N -35.700 1.0822510
update proton shift ND2 -35.700 1.0822510
update proton shift NE -35.700 1.0822510
update proton shift NE2 -35.700 1.0822510
update proton shift NE1 -35.700 1.0822510
update proton shift N 117.273 1
update proton shift ND2 117.273 1
update proton shift NE 117.273 1
update proton shift NE2 117.273 1
update proton shift NE1 117.273 1
write prot noe.prot

External UPL Files

noeassign employs the so-called "sum of r^-6" averaging method (peaks calibrate) to calibrate peaklists and interpret UPLs during the calculation. Therefore, external UPLs should ideally be calibrated with the same method.

If you supply UPL constraints created with CALIBA (CALIBA uses "center" averaging), you should be aware that these constraints will be too loose.
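The difference between the two averaging schemes can be illustrated with a small numeric sketch. It is not CYANA code: the function names and the methyl geometry below are hypothetical, and only the r^-6 summation formula itself, d_eff = (sum of d_i^-6)^(-1/6), is standard.

```python
import math

def sum_r6_distance(distances):
    """Effective distance of a proton group under r^-6 summation."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

def center_distance(partner, group_atoms):
    """Distance from a partner atom to the geometric center of a proton group."""
    n = len(group_atoms)
    center = tuple(sum(atom[i] for atom in group_atoms) / n for i in range(3))
    return math.dist(partner, center)

# Hypothetical geometry (in Angstrom): three methyl protons on a circle in
# the xy plane and a partner proton 4.0 A above the center of that circle.
methyl = [(0.0, 1.03, 0.0), (0.89, -0.51, 0.0), (-0.89, -0.51, 0.0)]
partner = (0.0, 0.0, 4.0)

individual = [math.dist(partner, h) for h in methyl]
d_sum = sum_r6_distance(individual)        # what r^-6 summation "sees"
d_center = center_distance(partner, methyl)  # what "center" averaging "sees"
print(f"sum r^-6: {d_sum:.2f} A, center: {d_center:.2f} A")
```

For this geometry the r^-6 effective distance is noticeably shorter than the distance to the group center, so an upper limit that was calibrated against center distances leaves extra slack when it is interpreted with r^-6 summation.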

Using Unassigned Peaklists

If you are using a completely unassigned peaklist (for example, picked from scratch in CARA), then you will need to add the following line to the peaklist header:

#CYANAFORMAT HNh

or

#CYANAFORMAT HCh

The lowercase h denotes the indirect (NOE) 1H dimension.

If your peaklist contains assigned peaks, then CYANA will be able to determine the peaklist dimensions based on these assignments.
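Adding the header line by hand is easy to forget when several peaklists are involved. The sketch below shows one way to do it with a script. The helper name add_cyanaformat is hypothetical, and it assumes that the peaklist header is the leading run of lines starting with "#" and that the new line may simply be appended to the end of that run.

```python
def add_cyanaformat(lines, fmt):
    """Return peak-list lines with a '#CYANAFORMAT <fmt>' line in the header.

    `lines` is a list of strings without trailing newlines. If a
    #CYANAFORMAT line is already present, the list is returned unchanged.
    """
    tag = "#CYANAFORMAT"
    if any(line.startswith(tag) for line in lines):
        return list(lines)
    # Assumption: the header is the leading block of '#' lines.
    n_header = 0
    for line in lines:
        if not line.startswith("#"):
            break
        n_header += 1
    return list(lines[:n_header]) + [f"{tag} {fmt}"] + list(lines[n_header:])

# Example with a made-up two-line header and one made-up peak entry.
peaklist = ["# Number of dimensions 3", "#INAME 1 H", "1 8.100 120.000 4.200"]
print("\n".join(add_cyanaformat(peaklist, "HNh")))
```

To process a file, read it with splitlines(), pass the result through the function, and write the lines back joined by newlines.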

Running Automated Structure Calculation with CYANA 2.1

  1. Create a working subdirectory (for example, structure/cyana21/calc1).
  2. Create an init.cya file as described in Getting Started or copy a previously used file. Set an appropriate RMSD calculation range.
  3. Copy the latest sequence (XXXX.seq) and peaklist files (n.peaks, ali.peaks and aro.peaks) into the working directory. The sequence file and peaklist should in principle be the same as those used to run FOUND.
  4. Copy the updated atomlist (XXXX.prot). The spin labels in it should be swapped according to the output of FOUND.
  5. If you used FOUND, then copy the gridsearch.aco file from the previous FOUND run.
  6. If you used FOUND, then copy the stereofound.cya file from the previous FOUND run. Make sure that incorrect stereospecific assignments have been commented out or removed.
  7. (Optional) Generate the short-range UPL (short.upl) file based on the existing peak assignments. This is more convenient to do on a workstation. You can use the make_short.cya script (see below). Alternatively, you can define a KEEP subroutine in the CALC.cya file.
  8. Download the CALC.cya script (see below) and modify it according to the input data.
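Before starting a run it can help to confirm that the working directory really contains everything collected in the steps above. The following is a minimal sketch of such a check, not part of CYANA. The function name missing_inputs is hypothetical, and the default file names simply follow the conventions used on this page (init.cya, CALC.cya, XXXX.seq, XXXX.prot, n.peaks, ali.peaks, aro.peaks).

```python
from pathlib import Path

def missing_inputs(workdir, name, peaks=("n", "ali", "aro"), constraints=()):
    """Return the expected input files that are absent from workdir.

    name        -- NESG ID used for the .seq and .prot files
    peaks       -- peak-list base names, as in 'peaks := n,ali,aro'
    constraints -- optional files such as gridsearch.aco or short.upl
    """
    workdir = Path(workdir)
    expected = ["init.cya", "CALC.cya", f"{name}.seq", f"{name}.prot"]
    expected += [f"{p}.peaks" for p in peaks]
    expected += list(constraints)
    return [f for f in expected if not (workdir / f).is_file()]

if __name__ == "__main__":
    # Check the current directory; 'XXXX' is a placeholder for the NESG ID.
    optional = ("gridsearch.aco", "short.upl", "stereofound.cya")
    for filename in missing_inputs(".", "XXXX", constraints=optional):
        print("missing:", filename)
```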

You can choose whether to run the structure calculation on a local Linux workstation or on the U2 Linux cluster. Typical run times on a single workstation are 1.5 - 3 hours, depending on the protein size. Calculations on the cluster take only 15-30 minutes, but there may be additional queue waiting time. On weekdays during working hours (9 a.m. - 4 p.m.) there are 10 dual-processor nodes reserved for us only, and there is no waiting time.

Check the queue status page and the nodemap page to see the current system loads on U2.

To run calculations on the U2 Linux cluster:

  1. Log in to u2.ccr.buffalo.edu
  2. Change directory to /san/projects1/szypersk/.
  3. Create a working subdirectory (like username/XXXX/cyana21)
  4. Copy the entire subdirectory calc1. You can use gftp, scp or sftp.
  5. Download the PBS submission script cyana.pbs (see below). Modify it if needed.
  6. Type qsub cyana.pbs to submit your job.

To run calculations on a workstation:

  1. Start CYANA 2.1 by typing cyana21
  2. Enter CALC at the cyana prompt.


Output files

  • final.pdb - resulting structure
  • final.ovw - final overview file
  • final.upl - final UPL file (unambiguous constraints; atom labels may be swapped)
  • *-final.prot - final atom list (chemical shifts unchanged?; atom labels may be swapped)
  • finalstereo.cya - stereospecific assignment file (to find swapped atom pairs see calculation log)
  • *-cycle7.peaks - assigned peaklists (in CYANA 2.1 format with multiple assignments)
  • cycleX.* - UPL, OVW, PDB and NOA files for cycle X (ambiguous constraints in UPL files)

The noeassign macro in CYANA 2.1 performs 7 routine calculation cycles and one final cycle. The output files are labeled cycle1.*, cycle2.* ... cycle7.* and final.* with appropriate extensions. An additional stereospecific assignment search is performed after cycle 7, so final.upl and *-final.prot are likely to have some labels swapped.

Assigned peak lists are saved after cycle 7. Some peaks may carry multiple assignments, so these lists are not fully compatible with XEASY.

Always check the CYANA output for the results of the peakcheck command. It is executed before the first calculation cycle and reports various inconsistencies in the atom list and peak lists. In the end, many UPL violations can be traced back to mistakes in assignment or mis-picked peaks.

Example scripts

Below are the key scripts for running CYANA. See the demo subdirectory of CYANA installation for more details.

make_short.cya

peaks      := n,ali,aro              # names of peak lists
prot       := $name                  # names of proton lists
tolerance  := 0.05,0.02,0.3          # chemical shift tolerances
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
calibration:= 1.7E6,1.7E6,1.7E6      # calibration constants (will be determined
                                     # automatically, if commented out)
dref       := 4.2                    # average upper distance limit for
                                     # automatic calibration
peakcheck peaks=$peaks prot=$prot
calibration prot=$prot peaks=$peaks constant=$calibration dref=$dref
peaks calibrate "**" simple
write upl short.upl


For the calibration parameter you can provide the list of calibration constants you derived for the "backbone" class with caliba when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave it blank if you want automatic calibration. Automatic calibration uses the dref parameter as the presumed average distance for all peaks in a peaklist (not just for backbone, like caliba).


CALC.cya

peaks       := n,ali,aro                # names of NOESY peak lists
prot        := $name                    # names of chemical shift lists
constraints := gridsearch.aco,short.upl,stereofound.cya            # additional (non-NOE) constraints
tolerance   := 0.05,0.02,0.4            # chemical shift tolerances
                                        # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
#upl_values  := 2.4,6.0                  # calibration cutoffs
calibration := 1.7E6,1.7E6,1.7E6        # NOE calibration parameters
structures  := 100,20                   # number of initial, final structures
steps       := 10000                    # number of torsion angle dynamics steps
rmsdrange   := 10..100                  # residue range for RMSD calculation
randomseed  := 434726                   # random number generator seed
dref        := 4.0                      # average distance for calibration, default 4.0
keep        :=                          # set to KEEP to retain existing assignments
 
subroutine KEEP
   peaks select "*,* number=20000..37999"
end
 
#protocol := noeassign.out              # output logging on
noeassign peaks=$peaks prot=$prot calibration=$calibration keep=$keep autoaco
#protocol :=


The constraints parameter can be a comma-separated list of any external constraints that can be read by the read data command in CYANA. You can supply UPLs, ACOs and even .cya scripts that, for example, define stereospecific assignments of methyl groups. Do not comment out this line; leave it blank if you are not providing external constraints.

If you are providing stereospecific assignments, do not use atom swap in the stereofound.cya script. Use an atom list and short.upl with all required labels already swapped; stereofound.cya should then contain only atom stereo declarations.

For the tolerance parameter pay attention to the unintuitive dimension order. The recommended tolerances are: 0.03 ppm or less for 1H (0.02 ppm or less for 2D homonuclear peaklists) and 0.6 ppm or less for 15N and 13C.
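The effect of the tolerance setting can be seen in a toy example: the wider the window, the more atoms match a given peak position and the more ambiguous the initial assignment becomes. The sketch below is only an illustration and does not reproduce CYANA's matching logic. The function name candidates and all chemical shifts are made up.

```python
def candidates(peak_ppm, shifts, tol):
    """Return atoms whose chemical shift lies within +/- tol of the peak position."""
    return sorted(atom for atom, ppm in shifts.items()
                  if abs(ppm - peak_ppm) <= tol)

# Hypothetical 1H chemical shifts (ppm), keyed by "atom residue".
proton_shifts = {"HA 12": 4.20, "HA 45": 4.24, "HB2 7": 4.17, "H 30": 8.31}

tight = candidates(4.21, proton_shifts, 0.02)  # only the closest atom matches
loose = candidates(4.21, proton_shifts, 0.05)  # three atoms now match
print(tight, loose)
```

With accurately referenced and well-matched peak lists, a tighter tolerance keeps the number of candidate assignments per peak small, which is why values at or below the recommended limits are preferable when the data allow it.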

Lower and upper limit cutoffs can be changed by setting upl_values. The default values are 2.4 and 5.5 Å, respectively.

For the calibration parameter you can provide the list of calibration constants you derived for the "backbone" class with caliba when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave it blank if you want automatic calibration. Automatic calibration uses the dref parameter as the presumed average distance for all peaks in a peaklist (not just for backbone, like caliba). An initial calibration that is too tight is less of an issue with noeassign, because by default it "elastically" relaxes constraints that are consistently violated.
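How a calibration constant, dref and the cutoffs interact can be sketched with the usual V = A / d^6 relation between NOE volume and distance. This is an illustration under stated assumptions, not CYANA's actual algorithm: the function names are hypothetical, the 1/d^6 form is assumed for all peaks, and the automatic constant is chosen here so that the median peak volume maps onto dref.

```python
import statistics

def upper_limit(volume, constant, lower=2.4, upper=5.5):
    """Convert a peak volume to an upper distance limit using V = A / d^6.

    The result is clipped to the [lower, upper] cutoffs (in Angstrom).
    """
    d = (constant / volume) ** (1.0 / 6.0)
    return min(max(d, lower), upper)

def auto_constant(volumes, dref=4.0):
    """Pick A so that the median volume corresponds to distance dref (assumption)."""
    return statistics.median(volumes) * dref ** 6

# Made-up peak volumes in arbitrary units.
volumes = [100, 415, 5000]
A = auto_constant(volumes, dref=4.0)
for v in volumes:
    print(v, round(upper_limit(v, A), 2))
```

In this toy model a larger dref gives a larger constant and therefore looser upper limits for every peak, while the cutoffs cap the effect for very strong and very weak peaks.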

Use the protocol keywords to enable output logging when running CYANA on a workstation. They may not be necessary on a cluster, because the queue system generates its own log.

Use the KEEP subroutine to retain assignments for peaks you are confident in. This is helpful if your peak list contains simulated peaks for short-range NOEs.

cyana.pbs - PBS queue submission script

#!/bin/csh
#PBS -m abe
#PBS -M yourname@domain
#PBS -q short_c
#PBS -l nodes=5:ppn=2
#PBS -l walltime=02:00:00
#PBS -o cyana.out
#PBS -j oe
#PBS -N cyana
#
cd $PBS_O_WORKDIR
echo "working directory = "$PBS_O_WORKDIR
set NN = `cat $PBS_NODEFILE | wc -l`
echo "NN = "$NN
module load mpich/intel-9/ch_p4/current
module load cyana/2.1-p4
limit stacksize unlimited
limit coredumpsize 0
source $MODULESHOME/init/tcsh
cat $PBS_NODEFILE | awk '{printf "%s.ccr.buffalo.edu\n",$1}' > tmp.$$
cyana -c '/util/mpich/1.2.7p1/intel-9/ch_p4/bin/mpiexec ' ./CALC
#
echo "ALL Done!"


The #PBS lines pass options to the PBS queue system. See this page for details.

The following options are important:

  • #PBS -m abe tells the PBS queue system to send e-mail alerts when the calculation begins (b), is aborted (a) or ends (e).
  • Enter your e-mail address in #PBS -M myname@mydomain. Without this line e-mail alerts will go to the local mailbox.
  • The line #PBS -l nodes=5:ppn=2 means that we are using five dual-processor nodes and get 10-fold parallelization during simulated annealing. It doesn't make much sense to request more than 5 nodes: first, the relative gain in speed drops since NOE assignment step cannot be parallelized; second, the queue wait time may be longer when more nodes are requested.
  • #PBS -q short_c submits the job to the short_c queue. This queue is dedicated to short jobs and has higher priority. Members of Szyperski's lab have 10 nodes reserved for this queue every weekday 9 a.m. - 4 p.m.
  • #PBS -l walltime=02:00:00 defines the maximum allocated job execution time. The limit for the short_c queue is 2 hours, but even the most demanding CYANA jobs finish in less than one hour.
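The diminishing return from requesting more nodes can be estimated with Amdahl's law, speedup = 1 / ((1 - p) + p / N), where p is the fraction of the run time that parallelizes and N is the number of processes. The sketch below is only a rough estimate. The value p = 0.9 is an assumption chosen for illustration and was not measured for CYANA.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal speedup when only parallel_fraction of the work is parallelized."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# nodes=5:ppn=2 gives 5 * 2 = 10 processes; doubling the nodes gives 20.
p = 0.9  # assumed parallel fraction (annealing); the rest is serial
for nodes in (5, 10):
    n_procs = nodes * 2
    print(f"{nodes} nodes -> {n_procs} processes: "
          f"speedup {amdahl_speedup(p, n_procs):.2f}x")
```

Under this assumption, going from 10 to 20 processes raises the ideal speedup only from about 5.3x to about 6.9x, while the larger request may wait longer in the queue.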



  • CALC.cya: CYANA 2.1 automated structure calculation script
  • cyana.pbs: PBS queue submission script for CYANA 2.1 on U2 cluster
  • make_short.cya: CYANA script to run manual calculation with local constraints