CYANA

From NESG Wiki

Introduction

CYANA is a macromolecular structure calculation algorithm based on simulated annealing molecular dynamics calculations in torsional angle space, in contrast to Cartesian space [1,2].  Here the only degrees of freedom are torsion angles with covalent structure parameters kept fixed, thereby significantly decreasing the number of degrees of freedom in the calculation.

The latest version of CYANA is 3.0, which is capable of handling orientational (i.e., RDC) constraints. You can find reference material, examples and tutorials at the CYANA 3.0 wiki.

Below we provide a more detailed description of input files and protocols as used in the NESG.


CYANA 2.1

Residue Library

CYANA versions 2.0 and later use a new residue library ~/lib/cyana.lib. The description of its file format can be found here: Residue_library_file. The library is loaded by the cyanalib command. For backwards compatibility the old DYANA residue library dyana.lib is also provided (loaded with dyanalib).

The main difference is larger van der Waals radii in the newer library. This will give you a larger target function than DYANA, but also better clash scores.

The new library no longer includes separate entries for neutral and charged arginine, lysine, histidine, aspartic and glutamic acid. The new ARG, LYS, ASP and GLU entries in cyana.lib correspond to ARG+, LYS+, ASP- and GLU- entries in dyana.lib. The HIS residue in cyana.lib is a delta-protonated neutral species. The charged (HIS+) and epsilon-protonated neutral (HIST) residues are included in the special.lib residue library.

Atom Nomenclature

Atom nomenclature was made compatible with BMRB standard. The deviations from XEASY/DYANA conventions are: HN <-> H, HA1 <-> HA2 and HA2 <-> HA3 for GLY.

There is a macro, translate.cya, which is used to convert input to different formats. For example, to read files with DYANA nomenclature, enter translate dyana. To switch back to the CYANA 2.1 convention, type translate off.
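The renaming rules listed above can be sketched outside CYANA as a small lookup, for example when pre-processing an atom list by hand. This is only an illustration of the three rules stated in this section; the function names are hypothetical and the sketch does not reproduce what translate.cya does internally.

```python
# Renaming rules taken from the text above: DYANA/XEASY name -> BMRB-style name.
GLOBAL_RENAMES = {"HN": "H"}                   # backbone amide proton, all residues
GLY_RENAMES = {"HA1": "HA2", "HA2": "HA3"}     # glycine alpha protons only


def dyana_to_bmrb(residue_type, atom_name):
    """Return the BMRB-style name for a DYANA/XEASY atom name."""
    if residue_type == "GLY" and atom_name in GLY_RENAMES:
        return GLY_RENAMES[atom_name]
    return GLOBAL_RENAMES.get(atom_name, atom_name)


def bmrb_to_dyana(residue_type, atom_name):
    """Inverse mapping: BMRB-style name -> DYANA/XEASY name."""
    inverse_gly = {new: old for old, new in GLY_RENAMES.items()}
    inverse_global = {new: old for old, new in GLOBAL_RENAMES.items()}
    if residue_type == "GLY" and atom_name in inverse_gly:
        return inverse_gly[atom_name]
    return inverse_global.get(atom_name, atom_name)
```

Each name is looked up exactly once, so the glycine HA1 to HA2 and HA2 to HA3 renames cannot chain into each other.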


Pseudoatom Treatment

Pseudoatom handling is switched by setting pseudo=x, where x is 0, 1, 2, or 3.

With pseudo=0, the default setting, coordinate files *.cor and *.pdb do not contain pseudoatoms. They are calculated implicitly on the fly.

Setting pseudo=1 restores the old DYANA behavior with explicit pseudoatoms.

Setting pseudo=2 switches to simplified pseudoatom names, such as HB instead of QB, HD1 instead of QD1, and HD instead of QQD of Leu. This is the setting to be used when reading chemical shifts from CARA. Coordinate files will contain explicit pseudoatoms, as with pseudo=1.

Setting pseudo=3 allows X-Plor/CNS pseudoatom names, like HX* instead of QX. For some reason using translate xplor is not enough to do the conversion for all the atoms.


The Initialization File:  init.cya

The init.cya is a local initialization file, which is read when cyana starts. It should be located in the directory where CYANA is run. In a given project the same file can be used for nearly all calculations.

Create your own init.cya file with the following lines in a text editor or download this template init.cya file:

name:=XXXX            # Replace XXXX with NESG ID
nproc=2               # Number of processors on a workstation
rmsdrange:=20..72     # RMSD reported for these residues after structure calculation
# Read the standard and special libraries
cyanalib
read lib $cyanadir/lib/special.lib append
pseudo=2              # Allows HB, HD, etc. pseudoatom names, use with CARA
read seq $name        # Initialize

Replace XXXX with your NESG target ID. It is convenient to have the sequence and atomlist files named as XXXX.seq and XXXX.prot.

  • nproc defines the number of processors on a workstation.
  • rmsdrange is only used in structure calculation. Set the range to a valid residue range. From NOE patterns you can exclude flexible N- and C-terminal parts. If you have a flexible loop in the middle you can specify the range as 10..30,40..70.
  • cyanalib reads the default cyana.lib residue library. The special.lib library is appended so that non-standard residues (e.g., His tautomers) can be used.
  • pseudo=2 is only necessary to read atom lists created with CARA, because they have H* for pseudoatom labels. Comment it out, or use pseudo=0 if you take atom lists from XEASY.
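To check that an rmsdrange value covers the residues you intend, the range syntax shown above (for example 10..30,40..70) can be expanded into an explicit residue list. The following is a minimal sketch; the function name is hypothetical, and accepting a bare single residue number is an assumption not confirmed by this page.

```python
def parse_residue_ranges(spec):
    """Expand a range string such as '10..30,40..70' into a list of residue numbers."""
    residues = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            first, last = part.split("..")
            # Ranges are inclusive on both ends.
            residues.extend(range(int(first), int(last) + 1))
        else:
            # Assumption: a bare number selects a single residue.
            residues.append(int(part))
    return residues
```

For instance, parse_residue_ranges("10..30,40..70") leaves out residues 31 to 39, which matches the flexible-loop example above.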


The Structure Calculation File:  CALC.cya

There are calculation demos for automatic assignment (~/demo/auto) and simple structure calculation (~/demo/manual) runs.

Here is a CALC.cya script for automatic NOE assignment:

peaks       := c13.peaks,n15.peaks,aro.peaks  # names of NOESY peak lists
prot        := demo                     # names of chemical shift lists
constraints := demo.aco                 # additional (non-NOE) constraints
tolerance   := 0.040,0.030,0.45         # chemical shift tolerances
calibration :=                          # NOE calibration parameters
structures  := 100,20                   # number of initial, final structures
steps       := 10000                    # number of torsion angle dynamics steps
rmsdrange   := 10..100                  # residue range for RMSD calculation
randomseed  := 434726                   # random number generator seed

noeassign peaks=$peaks prot=$prot autoaco

To prevent CYANA from changing existing peak assignments you need to define a subroutine to select the peaks to keep:

peaks       := c13.peaks,n15.peaks,aro.peaks  # names of NOESY peak lists
prot        := demo                     # names of chemical shift lists
constraints := demo.aco                 # additional (non-NOE) constraints
tolerance   := 0.040,0.030,0.45         # chemical shift tolerances
calibration :=                          # NOE calibration parameters
structures  := 100,20                   # number of initial, final structures
steps       := 10000                    # number of torsion angle dynamics steps
rmsdrange   := 10..100                  # residue range for RMSD calculation
randomseed  := 434726                   # random number generator seed

subroutine KEEP
   peaks select "*, * number=2..7999"
end

noeassign peaks=$peaks prot=$prot autoaco keep=KEEP

Here, subroutine KEEP is used to keep the assignments for peaks with peak numbers from 2 to 7999.


Here is a CALC.cya script for manual structure calculation:

peaks      := c13,n15,aro            # names of peak lists 
prot       := demo                   # names of proton lists
tolerance  := 0.040,0.030,0.45       # chemical shift tolerances
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
calibration:= 6.7E5,8.2E5,8.0E4      # calibration constants (will be determined
                                     #   automatically, if commented out)
dref       := 4.2                    # average upper distance limit for 
                                     #   automatic calibration

if (master) then

  # ---- check consistency of peak and chemical shift lists----

  peakcheck peaks=$peaks prot=$prot

  # ---- calibration ----

  calibration prot=$prot peaks=$peaks constant=$calibration dref=$dref
  peaks calibrate "**" simple
  write upl $name-in.upl
  distance modify
  write upl $name.upl

end if
synchronize

# ---- structure calculation ----

read seq $name.seq                             # re-read sequence to initialize
read upl $name.upl                             # read upper distance limits
read aco $name.aco                             # read angle constraints
seed=5671                                      # random number generator seed
calc_all structures=100 command=anneal steps=10000    # calculate 100 conformers
overview $name.ovw structures=20 pdb           # write overview file and coordinates

Note the order in which tolerances are given.

The calibration field can be left empty, in this case dref will be used to derive calibration constants. If dref is not specified noeassign.cya will use a default value of 4.0. During calculation noeassign.cya will also relax the calibration if needed (that is in "elastic" mode, which is the default).

The constraints variable is not limited to non-NOE constraints, despite what the comment says. You can add *.aco, *.upl, *.lol, and even *.cya macros for stereospecific assignments (this has not been tested here yet, but that is how CYANA adds stereospecific assignments in the final round).

master and synchronize keywords are needed for running on a cluster.

peakcheck checks the peaklist assignments against the atom list. Always check CYANA output for peakcheck results - those huge upl violations may be caused by mis-assigned peaks.


Chemical Shift Tolerances

The default chemical shift tolerances are 0.02,0.02,0.4,0.4. The first two values apply to the two protons, the third to the N/C bound to the second proton, and the fourth to the N/C bound to the first proton (e.g., 3D Nnoesy hHN, 4D CCnoesy hHCc).

An undocumented feature of tolerance handling is that CYANA uses the largest value if duplicates are given. So, if you set the tolerances low in the CALC.cya, you can set tolerances independently in each peak list.

In the CALC.cya:

 tolerance  := 0.01,0.01,0.1,0.1 # chemical shift tolerances 

Then in the n15.peaks header:

#INAME 1 N
#INAME 2 h
#INAME 3 H
#CYANAFORMAT NhH
#TOLERANCE   0.4000   0.0400   0.0400
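The "largest value wins" behavior described above can be illustrated with a short sketch that reads the #INAME and #TOLERANCE header lines of a peak list and combines them with the global values. The function names are hypothetical, and two points are assumptions: that the #TOLERANCE values follow the dimension order of the #INAME lines (as the example header suggests), and that the global values from CALC.cya can be keyed by the same dimension labels.

```python
def read_peaklist_tolerances(header_lines):
    """Map each dimension label (#INAME) to its #TOLERANCE value."""
    names = {}
    tolerances = []
    for line in header_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#INAME":
            names[int(fields[1])] = fields[2]
        elif fields[0] == "#TOLERANCE":
            tolerances = [float(value) for value in fields[1:]]
    # Assumption: the n-th tolerance belongs to dimension n.
    return {names[i + 1]: tol for i, tol in enumerate(tolerances)}


def effective_tolerances(global_tol, local_tol):
    """For every dimension of the peak list, keep the larger of the two tolerances."""
    return {dim: max(global_tol.get(dim, 0.0), tol)
            for dim, tol in local_tol.items()}
```

With the global values 0.01,0.01,0.1 and the header shown above, the per-list values 0.4, 0.04 and 0.04 would be the ones in effect.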


NOE Calibration

CYANA 2.1 by default does not use explicit pseudoatom corrections in distance constraints. Instead, these corrections are applied implicitly on-the-fly. This behavior is turned on by setting expand=1.

Calibration is thus performed with the undocumented statement peaks calibrate "**" simple. Trivial calculations show, however, that this command uses a simple r^-6 calibration without adding pseudoatom corrections.

Old calibration macros, such as calibrate.cya and caliba.cya, are still allowed, but they do add explicit pseudoatom corrections. So if you want to use them, don't forget to set expand=0. Omitting it will result in applying corrections twice, making the corresponding constraints very loose.

This is, of course, a matter of huge confusion since both methods produce otherwise identical *.upl files. Be sure you know HOW you calibrate your NOEs.

To modify upper and lower distance cutoffs for NOE calibration, use set upl_values:=2.4,6.0. The defaults are 2.4 and 5.5.
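As a rough illustration of a simple r^-6 calibration, the sketch below converts a peak volume V into an upper distance limit via V = k / d^6 and then restricts the result to the cutoffs. The function name is hypothetical, and treating the upl_values cutoffs as a plain clamp is an assumption about how CYANA applies them; no pseudoatom corrections are added here.

```python
def upper_limit(volume, constant, lower_cutoff=2.4, upper_cutoff=5.5):
    """Upper distance limit from V = constant / d**6, clamped to the cutoffs."""
    if volume <= 0:
        raise ValueError("peak volume must be positive")
    distance = (constant / volume) ** (1.0 / 6.0)
    # Assumption: distances outside the cutoffs are set to the nearest cutoff.
    return min(max(distance, lower_cutoff), upper_cutoff)
```

Passing lower_cutoff=2.4 and upper_cutoff=6.0 would correspond to the upl_values:=2.4,6.0 setting mentioned above.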

For a more detailed description of NOE calibration using CYANA, follow this link.


Dihedral Angle Constraints

In CYANA, dihedral angle constraints are specified in a .aco file.

Dihedral angle constraints for structure calculation in CYANA can come from a variety of sources.  For example, the FOUND module derives dihedral angle constraints based on local NOE data.

Programs such as TALOS provide backbone phi and psi torsion angle constraints based on chemical shifts.  In our structure determination pipeline we often make use of TALOS-derived backbone torsion angle constraints in our calculations. 


Stereospecific Assignments

Constraints for diastereotopic atoms (such as HB2, HB3) are treated as ambiguous by CYANA. This is switched on with swap=1.

For the manual run you may want to have swap=0 to be compatible with DYANA behavior. This option is apparently not necessary when distance modification is applied.

Distance modification does not affect Phe and Tyr ring atoms HD1/2 and HE1/2. Therefore, if you have degenerate ring chemical shifts (as is almost always the case), make sure you have them labeled QD and QE.

External stereospecific assignments determined with GLOMSA or with the help of a fractionally (i.e., 5%) 13C-labeled sample [3] can be defined in the file ssa.cya like this:

	# VAL
	atom stereo "QG1 25 36 38 87"
	atom stereo "QG1 43 90"
	atom swap   "QG1 43 90"
	# LEU
	atom stereo "QD1 60 63 97"
	atom stereo "QD1 35 56"
	atom swap   "QD1 35 56"

Here the syntax of CYANA 2.1 requires double quotes. For some strange reason, methyl groups should be written with the letter "Q" even if pseudo=2 is used.

A description of how to obtain the LEU and VAL stereospecific assignments is found here: SSA from 13C fractionally labeled sample.
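If the stereospecific assignments are kept in a spreadsheet or script, the ssa.cya statements can be generated rather than typed. The sketch below only reproduces the line format of the example above; the function names are hypothetical, and it assumes (as the example suggests, but this page does not state) that residues whose assignment must be exchanged get both an atom stereo and an atom swap line.

```python
def _statement(keyword, pseudoatom, residues):
    """Format one ssa.cya line, e.g. atom stereo "QG1 25 36"."""
    numbers = " ".join(str(r) for r in residues)
    # The keyword is padded so that 'stereo' and 'swap' lines align.
    return 'atom {:<6} "{} {}"'.format(keyword, pseudoatom, numbers)


def ssa_lines(pseudoatom, residues_as_listed, residues_to_swap):
    """Return ssa.cya lines for one methyl pseudoatom name (written with 'Q')."""
    lines = []
    if residues_as_listed:
        lines.append(_statement("stereo", pseudoatom, residues_as_listed))
    if residues_to_swap:
        # Assumption: swapped residues are declared stereo and then swapped.
        lines.append(_statement("stereo", pseudoatom, residues_to_swap))
        lines.append(_statement("swap", pseudoatom, residues_to_swap))
    return lines
```

Calling ssa_lines("QG1", [25, 36, 38, 87], [43, 90]) gives the three VAL lines of the example.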


NOESY Peak Lists

CYANA 2.1 can produce multiple assignments for a peak. Below is a part of an aliphatic NOESY peaklist with peak #6 having two assignments. #VC tags specify the weights given to individual assignments. Calibration of this peak yields two constraints splitting the peak integral according to these weights.

# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 H
#INAME 2 C
#INAME 3 h
#CYANAFORMAT HCh
     1   4.147  51.731   1.474 3 U   7.953E+03  0.000E+00 - 0  2234  2233  2238 #QU 1.000 #SUP  1.00
     2   4.147  51.731   4.251 4 U   4.181E+03  0.000E+00 - 0     0     0     0 
     3   4.147  51.731   7.791 3 U   6.017E+03  0.000E+00 - 0  2234  2233   232 #QU 1.000 #SUP  1.00
     4   1.474  22.186   0.515 3 U   1.481E+03  0.000E+00 - 0  2390  2389  1417 #QU 0.981 #SUP  0.98
     5   1.474  22.186   1.249 3 U   2.610E+04  0.000E+00 - 0  2390  2389  1706 #QU 0.987 #SUP  0.99
     6   1.474  22.186   2.635 3 U   1.396E+03  0.000E+00 - 0  2390  2389  1715 #VC 0.47897 #QU 0.774 #SUP  0.96
                                                               2390  2389  1815 #VC 0.52103 #QU 0.813 #SUP  0.96
     7   1.474  22.186   3.863 3 U   1.418E+04  0.000E+00 - 0  2390  2389  1657 #QU 0.885 #SUP  0.88
     8   1.474  22.186   4.147 3 U   1.448E+04  0.000E+00 - 0  2238  2237  2234 #QU 1.000 #SUP  1.00
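The way the integral of peak #6 is divided between its two assignments can be written out as a short calculation. This is a sketch with a hypothetical function name; normalizing by the sum of the weights is an added safeguard, not something this page says CYANA does.

```python
def split_volume(volume, weights):
    """Split a peak volume among assignments in proportion to their #VC weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [volume * weight / total for weight in weights]


# Peak #6 above: volume 1.396E+03 with #VC weights 0.47897 and 0.52103.
parts = split_volume(1.396e3, [0.47897, 0.52103])
```

Each partial volume would then be calibrated into its own distance constraint.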

The peaklists produced by CYANA 2.1 are not backwards-compatible with XEASY, but there are Lua scripts, which can read them into CARA including the information on ambiguous assignments. UBNMR should also be able to handle them in the future.

When supplying completely unassigned peaks for automatic NOE assignment it is necessary to include a line like #CYANAFORMAT HCh in the header.


CYANA 3.0

Again, please consult the CYANA 3.0 wiki for complete details on file formats, input files for CYANA, and other documentation.

Residue Library

A residue library defines all properties of a residue including atom types, the nomenclature, the dihedral angle definitions, the covalent connectivities and the standard geometry. The standard geometry of the ECEPP/2 force field [4,5] is used for all amino acid residue types.  Standard residues are collected in the cyana.lib library; special residue types are in the special.lib library.

Sequence File

The sequence file (.seq) defines the sequence of the molecule you are working with.  Special residue types (e.g., oxidized cysteine, histidine tautomers, and cis-peptide bonds) can also be defined in the sequence file as follows:

  • oxidized cysteine:  CYSS
  • charged histidine:  HIS+
  • Nε2H neutral tautomer:  HIST (the default HIS specifies the Nδ1H neutral tautomer).
  • cis-peptide bond:  place a "c" before the residue name;  e.g., cPRO
  • invisible intermolecular linkers:  PL, LL, LL2, LL5, LP


Automated Structure calculation

peaks       := n,ali,aro             # names of NOESY peak lists
prot        := $name                 # names of chemical shift lists
restraints  := talos.aco,stereo.cya  # additional (non-NOE) constraints
tolerance   := 0.04,0.02,0.4         # chemical shift tolerances
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
upl_values  := 2.4,5.5               # calibration cutoffs
cut_upl=0.05
calibration_constant:=               # NOE calibration parameters
structures  := 100,20                # number of initial, final structures
steps       := 10000                 # number of torsion angle dynamics steps
rmsdrange   := 20..102               # residue range for RMSD calculation
randomseed  := 562                   # random number generator seed
calibration_dref := 4.0              # average distance for calibration, default 4.0
keep        :=                       # set to KEEP to retain existing assignments

weight_rdc   = 0.002                 # weight for RDC restraints
cut_rdc      = 1.0                   # cut-off for RDC violations
opt_tensor   = 1                     # alignment tensor optimization

subroutine KEEP
   peaks select "*,*"
end

noeassign peaks=$peaks prot=$prot keep=$keep autoaco

A Simple Automated Structure Calculation Using CYANA 3.0

This section provides an example of a standard automatic NOESY assignment calculation using CYANA 3.0 on a monomeric protein.

Input Files

Collect the following files in your directory (see the attached files for examples and formatting):

  • init.cya:  initialization file. Defines the protein name (e.g., PROT), residue library(ies), number of processors used, and RMSD residue range.
  • CALC.cya:  structure calculation file. Defines the peak lists, tolerances, any NOE calibration parameters (default is automatic calibration), total number of structures calculated in each cycle, number of structures with lowest target function retained after each cycle, number of torsion angle dynamics steps, and random seed.
  • PROT.seq:  protein sequence file.
  • PROT.aco:  dihedral angle constraint file.
  • filename.prot:  chemical shift assignment list.  You should make all degenerate geminal proton assignments Q's, as well as degenerate side chain aromatics (HD1/HD2 and HE1/HE2).

If your assignments are in a bmrb file (2.1), start cyana, read in the bmrb file, and then write the shifts out to a prot file as follows:

	Open project in cyana 3.0:
		read bmrb [filename.bmrb]
		write prot [filename.prot]

You will then need to edit the filename.prot file to fix all degenerate geminal proton assignments:
        if degenerate HB2=HB3 then change HB2 to QB and delete HB3 line
        if degenerate HG2=HG3 then change HG2 to QG and delete HG3 line
        etc. for QD and QE
Also fix degenerate aromatic protons (typically all PHE and TYR HD1/2 and HE1/2):
        if degenerate PHE/TYR HD1=HD2 then change HD1 to QD and delete HD2 line
        if degenerate PHE/TYR HE1=HE2 then change HE1 to QE and delete HE2 line
Degenerate aromatic side chain carbons don't need to be corrected.
Also fix degenerate VAL and LEU isopropyl methyl groups:
        if degenerate VAL QG1=QG2 then change QG1 to QQG and delete QG2 line
        if degenerate LEU QD1=QD2 then change QD1 to QQD and delete QD2 line

  • filename1.peaks, filename2.peaks, filename3.peaks:  NOESY peak lists in XEASY format.  The peak lists are unassigned.
  • ssa.cya:  file with stereospecific assignments defined. Leu and Val methyl group stereospecific assignments come from the U-15N, 5% biosynthetically directed 13C sample data. A description of how to obtain the LEU and VAL stereospecific assignments is found here: SSA from 13C fractionally labeled sample.
    Stereospecific assignments for ASN HD21 / HD22 and GLN HE21 / HE22 come from NOEs to either the ASN HBs or GLN HGs.  The NH2 proton with the strongest NOEs to the HBs or HGs is assigned to the HD21 or HE22, respectively.
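The manual prot-file edits for degenerate protons described above can also be scripted. The following is a minimal sketch of the renaming rules listed on this page only; it works on tuples of (residue number, residue type, atom name, shift) rather than on the prot file itself, so reading and writing the file is left out. The function name, the tuple layout, and the shift-equality threshold eps are assumptions made for illustration.

```python
from collections import defaultdict

# (first atom, second atom, pseudoatom name, residue types or None for any)
RULES = [
    ("HB2", "HB3", "QB", None),
    ("HG2", "HG3", "QG", None),
    ("HD2", "HD3", "QD", None),
    ("HE2", "HE3", "QE", None),
    ("HD1", "HD2", "QD", {"PHE", "TYR"}),
    ("HE1", "HE2", "QE", {"PHE", "TYR"}),
    ("QG1", "QG2", "QQG", {"VAL"}),
    ("QD1", "QD2", "QQD", {"LEU"}),
]


def collapse_degenerate(entries, eps=1e-3):
    """Rename the first atom of each degenerate pair and drop the second one."""
    by_residue = defaultdict(dict)
    for index, (resnum, restype, atom, _shift) in enumerate(entries):
        by_residue[(resnum, restype)][atom] = index

    rename, drop = {}, set()
    for (_resnum, restype), atoms in by_residue.items():
        for first, second, pseudo, allowed in RULES:
            if allowed is not None and restype not in allowed:
                continue
            if first in atoms and second in atoms:
                i, j = atoms[first], atoms[second]
                # Degenerate means both atoms carry the same chemical shift.
                if abs(entries[i][3] - entries[j][3]) <= eps:
                    rename[i] = pseudo
                    drop.add(j)

    return [(resnum, restype, rename.get(index, atom), shift)
            for index, (resnum, restype, atom, shift) in enumerate(entries)
            if index not in drop]
```

Restricting the HD1/HD2 and HE1/HE2 rules to PHE and TYR keeps unrelated atoms that happen to share those names, such as histidine ring protons, untouched.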

Running the Program

To run CYANA 3.0 on our cluster at CABM, log in to master3 and type:

	/farm/software/cyana3.0/bin/cyana CALC > log.out 

Output Files

A CYANA structure calculation run will produce .pdb, .upl, .noa and .ovw files for each cycle and the final cycle, as well as log and Ramachandran files for the run.

The command cyanatable produces a summary table of an automated NOE assignment structure calculation run.


References

1.    Güntert, P., Mumenthaler, C. and Wüthrich, K. (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273, 283-298.

2.    Herrmann, T., Güntert, P. and Wüthrich, K. (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA.  J. Mol. Biol. 319, 209-227.

3.    Neri, D., Szyperski, T., Otting, G., Senn, H. and Wüthrich, K. (1989) Stereospecific nuclear magnetic resonance assignments of the methyl groups of valine and leucine in the DNA-binding domain of the 434 repressor by biosynthetically directed fractional 13C labeling. Biochemistry 28, 7510-7516.

4.    Momany, F.A., McGuire, R.F., Burgess, A.W. and Scheraga, H.A. (1975)  Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids.  J. Phys. Chem. 79, 2361-2381.

5.    Nemethy, G., Pottle, M.S. and Scheraga, H.A. (1983)  Energy parameters in polypeptides. 9. Updating of geometrical parameters, nonbonded interactions, and hydrogen bond interactions for the naturally occurring amino acids.  J. Phys. Chem. 87, 1883-1887.