Automated NOESY Assignment Using CYANA

From NESG Wiki

Introduction

This page describes how to run CYANA 2.1 for automated NOE assignment when working with CARA. A tutorial for performing structure calculations with automated NOESY assignments using CYANA 3.0 is available online.

Input files

Required files

  • Initialization file init.cya.
  • SequenceList in XEASY format - usually XXXX.seq, where XXXX is the NESG ID.
  • AtomList in XEASY format, XXXX.prot. Chemical shifts should be real, not folded. Make sure that you are using the most recent file. Atom labels should be swapped if using stereospecific assignments.
  • Separate unfolded PeakLists for 15N and 13C NOESY: n.peaks, ali.peaks, aro.peaks.

Optional files

  • Stereospecific assignment script (such as stereofound.cya from FOUND/HABAS). Note that this script should contain only atom stereo declarations, but no atom swap statements! Atom labels must be already swapped in the AtomList and external UPL files.
  • External UPL files, such as short.upl. Atom labels should be swapped if using stereospecific assignments.
  • External ACO files, such as gridsearch.aco output of FOUND/HABAS.

Format Conversion

The input files (sequence, atom list, ACOs and UPLs) must adhere to the IUPAC nomenclature used by CYANA 2.1 (i.e., H instead of HN, etc.). CARA is fully compatible with this nomenclature, while data from other programs may need to be converted.

Conversion from XEASY/DYANA/CYANA 1.X

For the automated / noesyassign runs of CYANA, please make sure that your chemical shift list conforms to the IUPAC nomenclature (i.e., H instead of HN). To update your atom names, do the following in CYANA:

translate dyana
read protein.prot
translate off
write protein-cyana.prot

The protein-cyana.prot file now contains all of the correct atom names for CYANA.

You may need to do the same with UPLs created in DYANA or CYANA 1.X.

See the ~/demo/details/MigrateFromDyanaCyana1.cya example script in the CYANA 2.1 installation directory for details.

Conversion from Sparky

CYANA can also read chemical shifts in BMRB format using the following commands:

...
read bmrb protein.bmrb
write prot protein.prot 

For Sparky users, please use Sparky command xe to write out XEASY format peaklists.

Splitting the simultaneous NOESY peaklist

When working with CARA it is not necessary to provide external ACO and UPL files. In CARA, spin assignments are not derived from peak lists, so there is less impact from CYANA modifying existing peak assignments. When external constraints are employed, there are usually fewer peaks assigned and fewer UPLs derived. It is therefore recommended to use external UPL and ACO files only if there are convergence problems without them.

When using a simultaneous 3D NOESY peak list in XEASY format, you need to generate separate peak lists with UBNMR. The following UBNMR macro is provided as an example. It calculates proper 15N chemical shifts and peak positions, and writes out separate nnoe.peaks and cnoe.peaks peak lists. Modify the numbers to reflect the proper 15N and 13C carrier offsets (in ppm) and the spectral width ratios (sw2/sw2N).

init
read seq xxx.seq
write seq xxxseq.bmrb autoBMRB
read prot xxx-simnoesy.prot
read peaks xxx-simnoesy.peaks
update peak shift N -35.700 1.0822510
update peak shift N 117.273 1
write peaks ncnoe.peaks
split ncnoe.peaks nnoe.peaks cnoe.peaks
update proton shift N -35.700 1.0822510
update proton shift ND2 -35.700 1.0822510
update proton shift NE -35.700 1.0822510
update proton shift NE2 -35.700 1.0822510
update proton shift NE1 -35.700 1.0822510
update proton shift N 117.273 1
update proton shift ND2 117.273 1
update proton shift NE 117.273 1
update proton shift NE2 117.273 1
update proton shift NE1 117.273 1
write prot noe.prot

External UPL Files

noeassign employs the so-called "sum of r^-6" averaging method (peaks calibrate) to calibrate peak lists and interpret UPLs during the calculation. Therefore, external UPLs should ideally be calibrated with the same method.

If you supply UPL constraints created with CALIBA (CALIBA uses "center" averaging), you should be aware that these constraints will be too loose.
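
As a rough illustration of what r^-6 summation means for a group of equivalent protons, the following minimal Python sketch computes the effective distance (sum of r_i^-6)^(-1/6). The function name and the example distances are illustrative assumptions and are not part of CYANA.

```python
def r6_sum_distance(distances):
    """Effective distance under r^-6 summation, in the units of the input.

    Each individual distance r_i contributes r_i**-6; the contributions are
    summed and converted back to a single distance.
    """
    if not distances:
        raise ValueError("need at least one distance")
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)


# Hypothetical example: one proton that is 3.0 and 4.0 Angstrom away
# from the two protons of a methylene group.
effective = r6_sum_distance([3.0, 4.0])
print(f"effective distance: {effective:.2f}")
```

The effective distance is always shorter than the shortest individual distance, which is why an upper limit derived with a different averaging scheme does not carry the same meaning.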

Using Unassigned Peaklists

If you are using a completely unassigned peak list (for example, picked from scratch in CARA), then you will need to add the following line to the peak list header:

#CYANAFORMAT HNh

or

#CYANAFORMAT HCh

The lowercase h denotes the indirect (NOE) 1H dimension.

If your peaklist contains assigned peaks, then CYANA will be able to determine the peaklist dimensions based on these assignments.
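
Adding the header line by hand is easy, but for several peak lists a small helper may be convenient. The Python sketch below is one possible way to do it; the function name is hypothetical, and placing the line directly after the leading block of '#' header lines is an assumption about where CYANA expects it.

```python
def add_cyanaformat(lines, fmt):
    """Return a copy of `lines` with a '#CYANAFORMAT <fmt>' header line.

    Assumption: the line belongs right after the leading '#' header lines.
    An existing #CYANAFORMAT line is replaced rather than duplicated.
    """
    header_line = f"#CYANAFORMAT {fmt}"
    kept = [ln for ln in lines if not ln.startswith("#CYANAFORMAT")]
    n_header = 0
    while n_header < len(kept) and kept[n_header].startswith("#"):
        n_header += 1
    return kept[:n_header] + [header_line] + kept[n_header:]


# Hypothetical two-line peak list: one header line, one placeholder peak line.
peaklist = ["# Number of dimensions 3", "1 8.12 120.5 4.32"]
for line in add_cyanaformat(peaklist, "HNh"):
    print(line)
```

Use HNh for the 15N-resolved list and HCh for the 13C-resolved lists, as described above.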

Running Automated Structure Calculation with CYANA 2.1

  1. Create a working subdirectory (for example, structure/cyana21/calc1).
  2. Create an init.cya file as described in Getting Started or copy a previously used file. Set an appropriate RMSD calculation range.
  3. Copy the latest sequence (XXXX.seq) and peaklist files (n.peaks, ali.peaks and aro.peaks) into the working directory. The sequence file and peaklist should in principle be the same as those used to run FOUND.
  4. Copy the updated atomlist (XXXX.prot). The spin labels in it should be swapped according to the output of FOUND.
  5. If you used FOUND, then copy the gridsearch.aco file from the previous FOUND run.
  6. If you used FOUND, then copy the stereofound.cya file from the previous FOUND run. Make sure that incorrect stereospecific assignments have been commented out or removed.
  7. (Optional) Generate the short-range UPL (short.upl) file based on the existing peak assignments. This is more convenient to do on a workstation. You can use the make_short.cya script (see below). Alternatively, you can define a KEEP subroutine in the CALC.cya file.
  8. Download the CALC.cya script (see below) and modify it according to the input data.
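
Before starting a run, it can help to confirm that the working directory contains everything listed in the steps above. The following Python sketch is one way to do that; the function name is hypothetical, and the optional files (gridsearch.aco, stereofound.cya, short.upl) are deliberately not checked.

```python
def missing_inputs(present, nesg_id, peak_names=("n", "ali", "aro")):
    """Return the required input files that are absent from `present`.

    `present` is any iterable of file names, e.g. os.listdir(workdir).
    `nesg_id` is the XXXX prefix of the sequence and atom list files.
    """
    required = ["init.cya", "CALC.cya", f"{nesg_id}.seq", f"{nesg_id}.prot"]
    required += [f"{name}.peaks" for name in peak_names]
    have = set(present)
    return [name for name in required if name not in have]


# Hypothetical directory listing with the aromatic peak list missing.
listing = ["init.cya", "CALC.cya", "XXXX.seq", "XXXX.prot", "n.peaks", "ali.peaks"]
print(missing_inputs(listing, "XXXX"))
```

In practice the listing would come from os.listdir() on the calc1 directory, and an empty result means all required files are in place.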

You can choose whether to run the structure calculation on a local Linux workstation or on the U2 Linux cluster. Typical run times on a single workstation are 1.5 - 3 hours, depending on the protein size. Calculations on the cluster take only 15-30 minutes, but there may be additional queue waiting time. On weekdays during working hours (9 a.m. - 4 p.m.) there are 10 dual-processor nodes reserved for us only, and there is no waiting time.

Check the queue status page and the nodemap page to see the current system loads on U2.

To run calculations on the U2 Linux cluster:

  1. Log in to u2.ccr.buffalo.edu
  2. Change directory to /san/projects1/szypersk/.
  3. Create a working subdirectory (like username/XXXX/cyana21)
  4. Copy the entire subdirectory calc1. You can use gftp, scp or sftp.
  5. Download the PBS submission script cyana.pbs (see below). Modify it if needed.
  6. Type qsub cyana.pbs to submit your job.

To run calculations on a workstation:

  1. Start CYANA 2.1 by typing cyana21
  2. Enter CALC at the cyana prompt.


Output files

  • final.pdb - resulting structure
  • final.ovw - final overview file
  • final.upl - final UPL file (unambiguous constraints; atom labels may be swapped)
  • *-final.prot - final atom list (chemical shifts unchanged?; atom labels may be swapped)
  • finalstereo.cya - stereospecific assignment file (to find swapped atom pairs see calculation log)
  • *-cycle7.peaks - assigned peaklists (in CYANA 2.1 format with multiple assignments)
  • cycleX.* - UPL, OVW, PDB and NOA files for cycle X (ambiguous constraints in UPL files)

The noeassign macro in CYANA 2.1 performs 7 routine calculation cycles and one final cycle. The output files are labeled cycle1.*, cycle2.* ... cycle7.* and final.* with appropriate extensions. An additional stereospecific assignment search is performed after cycle 7; therefore, final.upl and *-final.prot likely have some labels swapped.

Assigned peak lists are saved after cycle 7. They may have multiple assignments for some peaks and are therefore not fully compatible with XEASY.
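
To check that a run went through all cycles, one can compare the directory contents with the file names listed above. The Python sketch below builds that list; the function name is hypothetical, and it assumes that the "*" in *-cycle7.peaks stands for the peak list names given in CALC.cya. The *-final.prot file is left out because its prefix depends on the atom list name.

```python
def expected_outputs(peak_names=("n", "ali", "aro"), cycles=7):
    """List the output file names expected after a complete noeassign run."""
    names = ["final.pdb", "final.ovw", "final.upl", "finalstereo.cya"]
    for cycle in range(1, cycles + 1):
        # UPL, OVW, PDB and NOA files are written for every cycle.
        names += [f"cycle{cycle}.{ext}" for ext in ("upl", "ovw", "pdb", "noa")]
    # Assumption: assigned peak lists are named <peaklist>-cycle<last>.peaks.
    names += [f"{name}-cycle{cycles}.peaks" for name in peak_names]
    return names


for name in expected_outputs():
    print(name)
```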

Always check the CYANA output for the results of the peakcheck command. It is executed before the first calculation cycle and reports various inconsistencies in the atom list and peak lists. In the end, many UPL violations can be traced back to mistakes in assignment or mis-picked peaks.

Example scripts

Below are the key scripts for running CYANA. See the demo subdirectory of CYANA installation for more details.

make_short.cya

peaks      := n,ali,aro              # names of peak lists
prot       := $name                  # names of proton lists
tolerance  := 0.05,0.02,0.3          # chemical shift tolerances
                                     # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
calibration:= 1.7E6,1.7E6,1.7E6      # calibration constants (will be determined
                                     # automatically, if commented out)
dref       := 4.2                    # average upper distance limit for
                                     # automatic calibration
peakcheck peaks=$peaks prot=$prot
calibration prot=$prot peaks=$peaks constant=$calibration dref=$dref
peaks calibrate "**" simple
write upl short.upl


For the calibration parameter you can provide the list of calibration constants you derived for the "backbone" class with caliba when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave it blank if you want automatic calibration. Automatic calibration uses the dref parameter as the presumed average distance for all peaks in a peak list (not just for the backbone, as in caliba).


CALC.cya

peaks       := n,ali,aro                # names of NOESY peak lists
prot        := $name                    # names of chemical shift lists
constraints := gridsearch.aco,short.upl,stereofound.cya            # additional (non-NOE) constraints
tolerance   := 0.05,0.02,0.4            # chemical shift tolerances
                                        # order: 1H(a), 1H(b), 13C/15N(b), 13C/15N(a)
#upl_values  := 2.4,6.0                  # calibration cutoffs
calibration := 1.7E6,1.7E6,1.7E6        # NOE calibration parameters
structures  := 100,20                   # number of initial, final structures
steps       := 10000                    # number of torsion angle dynamics steps
rmsdrange   := 10..100                  # residue range for RMSD calculation
randomseed  := 434726                   # random number generator seed
dref        := 4.0                      # average distance for calibration, default 4.0
keep        :=                          # set to KEEP to retain existing assignments
 
subroutine KEEP
   peaks select "*,* number=20000..37999"
end
 
#protocol := noeassign.out              # output logging on
noeassign peaks=$peaks prot=$prot calibration=$calibration keep=$keep autoaco
#protocol :=


The constraints parameter can be a comma-separated list of any external constraints that can be read by the read data command in CYANA. You can supply UPLs, ACOs and even .cya scripts, for example ones defining stereospecific assignments of methyl groups. Do not comment out this line; leave it blank if you are not providing external constraints.

If you are providing stereospecific assignments, do not use atom swap in the stereofound.cya script. Use an atom list and short.upl with all required labels already swapped, so that the script contains only atom stereo declarations.

For the tolerance parameter pay attention to the unintuitive dimension order. The recommended tolerances are: 0.03 ppm or less for 1H (0.02 ppm or less for 2D homonuclear peaklists) and 0.6 ppm or less for 15N and 13C.
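
The recommendation above can be turned into a quick sanity check. The Python sketch below parses a tolerance string and flags values above the recommended limits; the function name is hypothetical, and treating the first two values as 1H and any further values as 13C/15N follows the order given in the script comments.

```python
H_LIMIT = 0.03      # ppm, recommended maximum for 1H (from the text above)
HEAVY_LIMIT = 0.6   # ppm, recommended maximum for 15N and 13C


def check_tolerances(value):
    """Return warnings for a 'tolerance' string such as '0.05,0.02,0.4'.

    Assumption: values 1 and 2 are the 1H tolerances, the rest are 13C/15N.
    """
    warnings = []
    for index, text in enumerate(value.split(",")):
        tol = float(text)
        is_proton = index < 2
        limit = H_LIMIT if is_proton else HEAVY_LIMIT
        label = "1H" if is_proton else "13C/15N"
        if tol > limit:
            warnings.append(
                f"value {index + 1} ({label}) = {tol} ppm exceeds {limit} ppm"
            )
    return warnings


for message in check_tolerances("0.05,0.02,0.4"):
    print(message)
```

With the values from the example CALC.cya above, only the first 1H tolerance (0.05 ppm) is flagged.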

Lower and upper limit cutoffs can be changed by applying upl_values. The default values are 2.4 and 5.5 Å, respectively.

For the calibration parameter you can provide the list of calibration constants you derived for the "backbone" class with caliba when you calibrated the initial peak lists for use with FOUND/HABAS. Do not comment out or delete this line; leave it blank if you want automatic calibration. Automatic calibration uses the dref parameter as the presumed average distance for all peaks in a peak list (not just for the backbone, as in caliba). An initial calibration that is too tight is less of an issue with noeassign, because by default it "elastically" relaxes constraints that are consistently violated.

Use the protocol keywords to enable output logging when running CYANA on a workstation. They may not be necessary on a cluster, because the queue system generates its own log.

Use the KEEP subroutine to retain assignments for peaks that you are confident in. This is helpful if your peak list contains simulated peaks for short-range NOEs.

cyana.pbs - PBS queue submission script

#!/bin/csh
#PBS -m abe
#PBS -M yourname@domain
#PBS -q short_c
#PBS -l nodes=5:ppn=2
#PBS -l walltime=02:00:00
#PBS -o cyana.out
#PBS -j oe
#PBS -N cyana
#
cd $PBS_O_WORKDIR
echo "working directory = "$PBS_O_WORKDIR
set NN = `cat $PBS_NODEFILE | wc -l`
echo "NN = "$NN
module load mpich/intel-9/ch_p4/current
module load cyana/2.1-p4
limit stacksize unlimited
limit coredumpsize 0
source $MODULESHOME/init/tcsh
cat $PBS_NODEFILE | awk '{printf "%s.ccr.buffalo.edu\n",$1}' > tmp.$$
cyana -c '/util/mpich/1.2.7p1/intel-9/ch_p4/bin/mpiexec ' ./CALC
#
echo "ALL Done!"


The #PBS lines pass options to the PBS queue system. See this page for details.

The following options are important:

  • #PBS -m abe tells the PBS queue system to send e-mail alerts when the calculation starts (b), aborts (a) or terminates successfully (e).
  • Enter your e-mail address in #PBS -M myname@mydomain. Without this line, e-mail alerts will go to the local mailbox.
  • The line #PBS -l nodes=5:ppn=2 means that we are using five dual-processor nodes and get 10-fold parallelization during simulated annealing. It doesn't make much sense to request more than 5 nodes: first, the relative gain in speed drops because the NOE assignment step cannot be parallelized; second, the queue wait time may be longer when more nodes are requested.
  • #PBS -q short_c submits the job to the short_c queue. This queue is dedicated to short jobs and has higher priority. Members of Szyperski's lab have 10 nodes reserved for this queue every weekday 9 a.m. - 4 p.m.
  • #PBS -l walltime=02:00:00 defines the maximum allocated job execution time. The limit for the short_c queue is 2 hours, but even the most demanding CYANA jobs finish in less than one hour.
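
The diminishing return from requesting more nodes can be estimated with Amdahl's law, which is a generic model and not something CYANA reports. In the Python sketch below, the serial fraction (the share of run time spent in the non-parallel NOE assignment step) is a purely hypothetical value; measure it for your own runs before drawing conclusions.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Speedup predicted by Amdahl's law for n_procs processors."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


# Hypothetical serial fraction of 10%; 10 processors = 5 dual-processor nodes.
for procs in (10, 20):
    print(procs, round(amdahl_speedup(0.1, procs), 2))
```

Under this assumed serial fraction, doubling the processor count from 10 to 20 raises the predicted speedup only from about 5.3 to about 6.9.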



  • CALC.cya: CYANA 2.1 automated structure calculation script
  • cyana.pbs: PBS queue submission script for CYANA 2.1 on U2 cluster
  • make_short.cya: CYANA script to run manual calculation with local constraints