Structure Refinement Using CNS Energy Minimization With Explicit Water

From NESG Wiki
Revision as of 20:37, 6 January 2010 by Jma

Introduction

Crystallography & NMR System (CNS) is a program designed to provide a flexible, multi-level hierarchical approach to the most commonly used algorithms in macromolecular structure determination [1]. Its functions include heavy-atom searching, experimental phasing (including MAD and MIR), density modification, crystallographic refinement with maximum-likelihood targets, and NMR structure calculation using NOE, J-coupling, chemical shift and dipolar coupling data.

At the time of writing, the latest version is 1.21. See the website http://cns-online.org for additional information.

CNS Refinement Protocol

The final structure from CYANA should be refined using CNS energy minimization with explicit water before PDB deposition.

The common input files are:

  • PDB coordinates
  • NOE constraints
  • Dihedral angle constraints
  • Hydrogen bond constraints

Converting input files from CYANA to XPLOR/CNS

If the final structure calculation was performed with CYANA 2.1, the constraint and coordinate files must first be converted to XPLOR/CNS format.

Using cyana2cns.cya script

This procedure can be used to prepare input files to use with the WaterRefCNS script (see below).

  1. Download cyana2cns.cya into the directory containing the final structure calculation.
  2. Modify the read pdb, read upl and read aco lines if necessary.
  3. Start CYANA 2.1 and run cyana2cns.cya.
  • All conformers will be stored in a single PDB file, KKK.pdb. If you plan to run the water bath refinement manually, use the p2X program instead.
  • In the resulting NOE constraint file, KKK_noe.tbl, all lower limits are set to 0. If you want them set according to VdW radii, use the r2X program instead. In our experience this has no significant effect on the geometry and quality scores reported by PSVS.

Using the programs p2X, r2X and d2X

These programs were written by Alex Lemak at the University of Toronto. They are typically used to prepare input files for manual CNS water bath refinement.

Go to the ~1_projects/targetID/structure/cns/ directory and its conversion_scripts subdirectory, and convert the required input files from CYANA format to CNS format using the macro mkfil:

         p2X cyana2.pdb pref 
         r2X found-c.upl noeupl.tbl 
         r2X gr-hbonds.upl hbonds.tbl
         cat noeupl.tbl hbonds.tbl > noe.tbl 
         d2X found-c.aco aco.tbl 
  • p2X splits conformers from a pdb file generated by CYANA 2.1 into individual pdb files (one file per conformer) and converts atom names to CNS format. The second argument "pref" specifies a common prefix name for the output pdb files. It should be no more than 4 characters long.
  • r2X converts distance constraints (noe/hbond) from CYANA 2.1 format to CNS format. The lower limits in the resulting constraint file are set according to VdW radii.
    • Distance constraints obtained from the CYANA macros caliba or calibration can be used directly as input files.
    • Distance constraints obtained from the CYANA macro peak calibrate need the pseudo-atom correction applied before they are used as input files. Run the following commands in CYANA:
	read upl final.upl
	distance correct
	write upl final_corrected.upl
      This, however, will not add corrections for multiplicity!

  • d2X converts dihedral angle constraints from CYANA format to CNS format; run it as d2X cyana2.aco cns.tbl. If you don't have any dihedral angle constraints, create an empty file instead.

Both p2X and r2X require a translation table file atomtransC.tbl.
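For illustration, the conformer-splitting half of what p2X does can be sketched in a few lines of shell and awk. This is a minimal sketch, assuming the CYANA PDB file separates conformers with MODEL/ENDMDL records and that outputs are named <prefix>_<n>.pdb (matching names such as ufc1_1.pdb in the generate_mtf.inp example). The function name split_models is hypothetical, and it does not translate atom names, so p2X with atomtransC.tbl is still needed for real CNS input files.

```sh
#!/bin/sh
# Hypothetical helper: split a multi-MODEL PDB file into <pref>_1.pdb, <pref>_2.pdb, ...
# Sketch only: unlike p2X it does NOT translate CYANA atom names into CNS names.
split_models() {
  local pdb="$1" pref="$2"
  # the page asks for an output prefix of at most 4 characters
  if [ "${#pref}" -gt 4 ]; then
    echo "prefix '$pref' is longer than 4 characters" >&2
    return 1
  fi
  awk -v pref="$pref" '
    # MODEL opens a new conformer file
    /^MODEL/ { n++; out = sprintf("%s_%d.pdb", pref, n); next }
    # ENDMDL terminates the current file with END and closes it
    /^ENDMDL/ && out != "" { print "END" > out; close(out); out = ""; next }
    # copy coordinate records that belong to the current conformer
    out != "" && /^(ATOM|HETATM|TER)/ { print > out }
  ' "$pdb"
}
```

Usage mirrors the p2X call: split_models cyana2.pdb pref.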

If you have used simplified pseudoatom names (H* instead of Q*) with the option pseudo=2 in CYANA 2.1, you may first want to convert them back to the standard Q* names before running p2X and r2X:

 pseudo=0
 read upl noe.upl
 read pdb cyana2.pdb
 write upl out.upl
 write pdb out.pdb all

Note that pseudo=0 should be set before loading PDB and constraint files.

Conversion from DYANA

Use the CYANA macro MigrateFromDyanaCyana1.cya to convert data files from the standard CYANA 1.x or DYANA nomenclature to the standard IUPAC nomenclature used by CYANA, then proceed as described above:

 translate dyana                         # use Cyana 1.x/Dyana nomenclature
 read seq demo-dyana.seq                 # read sequence from Cyana 1.x/Dyana
 read upl demo-dyana.upl unknown=warn    # read upper distance limits from Cyana 1.x/Dyana
 read aco demo-dyana.aco unknown=warn    # read angle restraints from Cyana 1.x/Dyana
 translate off                           # return to standard (IUPAC) nomenclature
 write demo-cyana.seq                    # save sequence in standard Cyana format
 write upl demo-cyana.upl                # save upper distance limits in standard Cyana format
 write aco demo-cyana.aco                # save angle restraints in standard Cyana format 


Running CNS using WaterRefCNS script at CABM

The WaterRefCNS script, developed at Rutgers University by Roberto Tejero, carries out the CNS water refinement protocol in an easy-to-use script with many user-controlled options.

To see a list of options, type the command:

	WaterRefCNS

This prints the usage message with the full list of options:

     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        WaterRefCN -- a tool to launch structure refinement with water
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Usage: /farm/software/WaterRefinement_cns/WaterRefCNS -na NamProt -que [PBS|No] -pro [xpl|cns]
 
 ** Full set of options [not all are needed for a perfect run, use defaults]
 
   -na  NamProt      Name of the protein NOTE: this is mandatory, no default value 
   -qu  [PBS|NO]     Que system to use, one of PBS or NOQUE                def NOQUE
   -pr  [cns|xpl]    Protocol to use, X-POR, CNS                           def cns 
   -av  [center|sum] average for the distance restraints (for CNS, XPLOR)  def sum 
   -ci  n1,n2        CIS pep info (PRO res num) in comma separated list, i.e 56,89 
   -ss  n1-n2,n3-n4  S-S bridge info in comma separated, dash-separated pairs, i.e. 2-24,30-40 
   -he               Display this help message
   -hisd n1,n2       HISD pep info (HIS res num) in comma separated list, i.e 12,43
   -hise n1,n2       HISE pep info (HIS res num) in comma separated list, i.e 25,67
   -heat N           number of cycles in heating stage,  default is 200   
   -hot  N           number of cycles in HOT stage, default is 1000       
   -cool N           number of cycles for cooling stage,   default is 100 
   -sc  N            Scale all terms (noe, dihe, hbond) by xN times        def 1   
   -par string       Choice of nonbonded params, one of OPLSX|PARAM19|PARMALLH6|PROLSQ|CONTACT
                     default is OPLSX
   -seed <int>       Seed for random number generator,  default 31415     
 
 Examples:
      /farm/software/WaterRefinement_cns/WaterRefCNS -na SR358 -que no -pr cns -ci 22  

Given a protein name KKK, here are the input files required to run WaterRefCNS:

  •   pdb coordinates (in a single file) prepared in Xplor format using PdbStat.  The name of the file should be:  KKK.pdb
  •   an noe.tbl file in CNS format.  You can convert a CYANA final.upl file using PdbStat.  We typically add 10-15% to the upper limits and set the lower limits according to VdW radii.  The name of this file should be:  KKK_noe.tbl
  •   a dihe.tbl file in CNS format.  Again, you can convert a CYANA .aco file using PdbStat.  The name of this file should be:  KKK_dihe.tbl
  •   an hbond.tbl file in CNS format (optional).  The name of this file should be:  KKK_hbond.tbl
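The 10-15% loosening of upper limits mentioned above can also be applied to a file that is already in CNS format. A minimal sketch, assuming every restraint is written on one line as assign (sel1) (sel2) d dminus dplus, so that lower = d - dminus and upper = d + dplus; multi-line or OR-ed assign statements pass through untouched. The function name loosen_upper_bounds is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: enlarge CNS upper distance bounds by a fraction (default 0.10 = +10%).
# Assumes one restraint per line:  assign (sel1) (sel2) d dminus dplus
# with lower = d - dminus and upper = d + dplus; other lines are printed unchanged.
loosen_upper_bounds() {
  local file="$1" frac="${2:-0.10}"
  awk -v frac="$frac" '
    function isnum(x) { return x ~ /^[0-9]*\.?[0-9]+$/ }
    tolower($1) == "assign" && isnum($(NF-2)) && isnum($(NF-1)) && isnum($NF) {
      d = $(NF-2) + 0
      upper = (d + $NF) * (1 + frac)        # loosened upper bound
      $(NF-2) = sprintf("%.2f", d)
      $(NF-1) = sprintf("%.2f", $(NF-1))    # dminus, i.e. the lower bound, is kept
      $NF = sprintf("%.2f", upper - d)      # new dplus so that d + dplus = upper
    }
    { print }
  ' "$file"
}
```

For example, loosen_upper_bounds noe_in.tbl 0.15 > KKK_noe.tbl (file names illustrative) turns 4.0 2.2 0.4 into 4.00 2.20 1.06, i.e. an upper bound of 5.06 instead of 4.4. Setting the lower bounds from VdW radii is left to the conversion step.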


In order to run WaterRefCNS on the CABM-NMR cluster:

  1. Log into hummer or master3
  2. Go to the directory containing the above 4 files
  3. type:
	/farm/software/WaterRefinement_cns/WaterRefCNS -na KKK -que PBS

That's it. If your protein contains cis-peptide bonds, add -ci NUMRES (the residue number of the cis-Pro) to the options above. To reduce VdW violations, you can add -par PARAM19.

Future work:

Currently, the WaterRefCNS script does not support refinement with RDCs.


WaterRefCNS at SUNY Buffalo

The CABM water refinement package was customized to run on computers at UB:

  • Fixed a bug in WaterRefCNS so that the assembled refined PDB structure file has a .pdb extension.
  • PBS queue submission code in WaterRefCNS modified for use on U2 cluster
  • Added -hisd and -hise options to use neutral histidine tautomers. This required modification of WaterRefCNS, cns_refine_h2o.inp and generate_h2o.inp.
  • Fixed a bug in the protein topology file topallhdg5.3.pro. The line atom CD1 type=CR1E charge=0.130 end in the HISE patch section was removed.
  • Fixed a bug that caused the order of conformers to be changed in the output PDB file by replacing the line $WATREFLIB/Agrupa *.pdb > All_${Name}_cns.pdb with $WATREFLIB/Agrupa `ls resa_?.pdb ; ls resa_??.pdb` > All_${Name}_cns.pdb in the WaterRefCNS script.

The modified package is installed in /nsm/chem/cen2/HTP2/3_src/WaterRefinement_cns on the local server and in /san/projects1/szypersk/src/WaterRefinement_cns on the U2 Linux cluster. Make sure that these directories are included in your PATH on the spins* workstations and on U2, respectively.

For reference, the package can be downloaded as WaterRefinement_UB.tar.Z.

The required files are:

  • PDB coordinates (with atom names in XPLOR/CNS format)
  • NOE constraints (in XPLOR/CNS format)
  • Dihedral angle constraints (in XPLOR/CNS format)
  • Hydrogen bond constraints (in XPLOR/CNS format)

Assuming that your coordinate file is called KKK.pdb, the constraint files should be named KKK_noe.tbl, KKK_dihe.tbl and KKK_hbond.tbl. If you don't have dihedral angle and/or hydrogen bond constraints, then create the corresponding empty files.
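A quick pre-flight check of these naming rules can save a failed run. A minimal sketch, assuming it is run in the WaterRefCNS working directory; the function name check_waterref_inputs is hypothetical, and it only checks that the files exist, not that their contents are valid CNS syntax.

```sh
#!/bin/sh
# Hypothetical helper: check WaterRefCNS inputs for protein name $1 in the current directory.
# KKK.pdb and KKK_noe.tbl must exist and be non-empty; KKK_dihe.tbl and KKK_hbond.tbl
# are created as empty files if they are missing.
check_waterref_inputs() {
  local name="$1" f
  for f in "$name.pdb" "${name}_noe.tbl"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  for f in "${name}_dihe.tbl" "${name}_hbond.tbl"; do
    if [ ! -e "$f" ]; then
      : > "$f"
      echo "created empty $f"
    fi
  done
}
```

Typical use: check_waterref_inputs KKK && WaterRefCNS -na KKK -que PBS.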

You may need to add a line like nassign=1000 at the top of the constraint files. The value should be equal to or greater than the number of constraints in the file.
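The nassign line can be derived from the file itself. A minimal sketch, assuming each restraint starts on a line beginning with assign (any letter case), so counting those lines gives the number of constraints; add_nassign is a hypothetical helper, and its optional second argument adds a safety margin.

```sh
#!/bin/sh
# Hypothetical helper: prepend "nassign=N" to a CNS constraint file, where N is the
# number of lines that start an assign statement plus an optional margin ($2).
add_nassign() {
  local file="$1" margin="${2:-0}" n
  # leave files that already have an nassign line alone
  if grep -qi '^[[:space:]]*nassign' "$file"; then
    return 0
  fi
  # grep -c prints 0 (and exits 1) when nothing matches, so n is always a number
  n=$(grep -ci '^[[:space:]]*assign' "$file")
  { printf 'nassign=%d\n' "$((n + margin))"; cat "$file"; } > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, add_nassign KKK_noe.tbl 100 writes a count 100 above the number of assign statements, which satisfies the "equal to or greater than" rule.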

WaterRefCNS assumes "sum of r^-6" averaging by default. When using "center"-averaged UPLs (e.g. from caliba), add -av center when starting WaterRefCNS. For information on averaging conventions for calibration in CYANA, see NOE Calibration in CYANA.
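For reference, the two conventions differ as follows for a restraint between atom groups A and B (for example the protons of a methyl group), where r_ij is the distance between atoms i and j, x_i are atomic coordinates and N_A, N_B are the group sizes. This reflects our understanding of the CNS "sum" and "center" options; the CNS documentation is authoritative for the exact implementation.

```latex
\bar{r}_{\mathrm{sum}} = \Bigl( \sum_{i \in A} \sum_{j \in B} r_{ij}^{-6} \Bigr)^{-1/6}
\qquad
\bar{r}_{\mathrm{center}} = \Bigl\lVert \frac{1}{N_A} \sum_{i \in A} \mathbf{x}_i - \frac{1}{N_B} \sum_{j \in B} \mathbf{x}_j \Bigr\rVert
```

Because the sum over r^-6 is at least as large as its largest term, the sum-averaged distance is never longer than the shortest individual distance, so limits calibrated under one convention are not directly transferable to the other.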

We recommend running it on the U2 Linux cluster. Examples:

  1. WaterRefCNS -na KKK -que PBS
  2. WaterRefCNS -na KKK -que PBS -ci 21,49
    residues 21 and 49 are cis-Pro
  3. WaterRefCNS -na KKK -que PBS -hise 40,82 -hisd 32,65
    residues 40 and 82 are ε-protonated neutral His
    residues 32 and 65 are δ-protonated neutral His
  4. WaterRefCNS -na KKK -que PBS -av center 
    for use with center-averaged calibration

To run on the spins* Linux workstations without a queue system, use the -que NO option instead of -que PBS. Since the default is no queue system, you can also omit the -que option altogether.

Running WaterRefCNS

  1. Convert the coordinate and constraint files as described above.
  2. Copy the converted files into a CNS working directory on the U2 cluster or a workstation (e.g. structure/cns/calc1).
  3. Run WaterRefCNS with the proper arguments.
  4. If the calculation was successful, the refined coordinate file will be stored in the refinedPDB subdirectory. Check the BeSureToREADME file for details.

Running CNS "manually"

Go to the ~1_projects/targetID/structure/cns/ directory and its cns_scripts subdirectory, and follow the procedure described below.

1.  Generating the MTF topology file from a PDB file.
    Modify the header of the input file generate_mtf.inp and then run cns < generate_mtf.inp > mtf.log.

  • Modify the input file paths and names for the individual target as follows:
	...
	{in}       pdb_file="/san/user/gliu2/u2/cns/cns/convertion_scripts/ufc1_1.pdb";
	{in}       param_file="/san/user/gliu2/u2/cns/cns/cns_scripts/parallhdg5.3C.pro";
	{in}       topol_file="/san/user/gliu2/u2/cns/cns/cns_scripts/topallhdg5.3.pro";
	{in}       plink_file="/san/user/gliu2/u2/cns/cns/cns_scripts/protein.link";
	{out}       struct_file="ufc1_1.mtf"; 
	...
  • Requires these files:
    • parallhdg5.3C.pro: CNS parameter file
    • topallhdg5.3.pro: CNS topology file
    • protein.link: CNS peptide linkage definition file
  • For a cis-proline, add the following lines to generate_mtf.inp before the WRITE command, giving the number of the residue preceding the cis-proline:
	patch cisp
             reference=nil=( resid 90 )
         end 
  • If you get an error message about the atom type, you can also try:
	patch cipp
             reference=nil=( resid 90 )
         end 
  • For neutral Nε- and Nδ-protonated histidines, add these lines to generate_mtf.inp before the WRITE command:
	patch hise
             reference=nil=( resid 40 )
         end
         patch hisd
             reference=nil=( resid 65 )
         end 
  • For dimer refinement, add a line containing only TER between the coordinates of the two monomer units in the PDB conformer file specified as input in generate_mtf_cis.inp.
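When several cis-prolines or neutral histidines are involved, the patch blocks can be generated rather than typed. A minimal sketch, assuming the patch syntax shown above; cis_patches and his_patches are hypothetical helper names. cis_patches takes the residue numbers of the cis-prolines themselves (as the WaterRefCNS -ci option does) and subtracts one to reference the preceding residue. If cisp gives an atom-type error, replace it with cipp by hand.

```sh
#!/bin/sh
# Hypothetical helpers: print CNS patch blocks to paste into generate_mtf.inp
# before the WRITE command.

# cis_patches PRO_RESID...: takes the cis-proline residue numbers and references
# the preceding residue, as described on this page.
cis_patches() {
  local pro
  for pro in "$@"; do
    printf 'patch cisp\n    reference=nil=( resid %d )\nend\n' "$((pro - 1))"
  done
}

# his_patches hise|hisd HIS_RESID...: neutral histidine patches.
his_patches() {
  local kind="$1" res
  shift
  case "$kind" in
    hise|hisd) ;;
    *) echo "unknown histidine patch: $kind" >&2; return 1 ;;
  esac
  for res in "$@"; do
    printf 'patch %s\n    reference=nil=( resid %d )\nend\n' "$kind" "$res"
  done
}
```

For the UB example with cis-Pro 21 and 49 and neutral His 40, 82 (ε) and 32, 65 (δ): cis_patches 21 49; his_patches hise 40 82; his_patches hisd 32 65.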


2.  Rebuilding hydrogen atom positions for each structure.
    Modify the input file rebuild.inp and then run cns < rebuild.inp > rebuild.log.

  • Modify the input file paths and names for the individual target as follows:
	...
	evaluate ($mtf_file="ufc1_1.mtf")
	evaluate ($pdbname_in="../convertion_scripts/ufc1")
	evaluate ($pdbname_out="ufc1_rb")
	evaluate ($number_of_struct= 20 )
	...
	evaluate ($topol_p_file="/san/user/gliu2/u2/cns/cns/cns_scripts/topallhdg5.3.pro")
	evaluate ($param_p_file="/san/user/gliu2/u2/cns/cns/cns_scripts/parallhdg5.3C.pro")
	...

3. Refining structures with explicit water.

  • To run CNS on a single-processor workstation, modify the input file re_h2oc.inp and then run cns < re_h2oc.inp > h2o.log.
  • Modify the input file paths and names for the individual target as follows:
	...
	evaluate ($mtf_file="ufc1_1.mtf")
	evaluate ($noe_file="../noe.tbl")
	evaluate ($dihe_file="../aco.tbl")
	evaluate ($hb_file="../hbond.tbl")
	evaluate ($pdbname_in="ufc1_rb")
	evaluate ($pdbname_out="ufc1_ref")
	evaluate ($number_of_struct= 20 )
	...
  • If necessary, adjust the weights of the restraint energy terms, e.g. use 2x to 10x the following scale values:
	scale ambi 50
	scale dist 50
	scale hbond 50
	end
	restraints dihedral
	scale=200
	end
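Scaling the weights by hand is error-prone when several terms change at once. A minimal sketch, assuming the scale statements look exactly like the excerpt above (scale ambi|dist|hbond N, and scale=N inside a restraints dihedral ... end block); scale_restraint_weights is a hypothetical name. Other lines are copied unchanged, and awk normalizes the whitespace of the lines it edits.

```sh
#!/bin/sh
# Hypothetical helper: multiply the restraint weights in a CNS input file by a factor
# and print the result. Only two statement forms are touched:
#   scale ambi|dist|hbond N          (NOE-type weights)
#   scale=N   inside a "restraints dihedral ... end" block
scale_restraint_weights() {
  local factor="$1" file="$2"
  awk -v f="$factor" '
    tolower($1) == "scale" && tolower($2) ~ /^(ambi|dist|hbond)$/ && $3 ~ /^[0-9.]+$/ {
      $3 = $3 * f
    }
    tolower($0) ~ /restraints[ \t]+dihedral/ { indihe = 1 }
    indihe && tolower($1) ~ /^scale=[0-9.]+$/ {
      split($1, kv, "=")
      $1 = "scale=" kv[2] * f
    }
    indihe && tolower($1) == "end" { indihe = 0 }
    { print }
  ' "$file"
}
```

For example, scale_restraint_weights 2 re_h2oc.inp > re_h2oc_x2.inp (output name illustrative); compare the two files with diff before using the new one.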

    To refine the conformers as separate PBS jobs on the cluster, follow these steps:

    1.  Run cns_convertion as described above.

    2.  Go to the cns_scripts directory, modify the input file generate_mtf_cis.inp and then run cns < generate_mtf_cis.inp as described above.

    3.  Run macro getfil

	cp ../convertion_scripts/noe.tbl .
	cp ../convertion_scripts/hbonds.tbl .
	cp ../convertion_scripts/aco.tbl .
	cp ufc1_1.mtf com/.

    4.  Modify input file rebuild.inp and then run cns < rebuild.inp as described above.

    5.  Modify the input file re_h2oc.inp as described above. Since each job in step 6 refines a single conformer (copied in as ufc1_rb_1.pdb), set $number_of_struct = 1 here.

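    6.  Run the macro subaba:

	#!/bin/bash
	# subaba: submit one PBS job per conformer, each in its own ref$i directory
	for i in $(seq 1 20)
	do
	  mkdir -p ref$i
	  cd ref$i
	  cp ../com/* .
	  # each job refines a single conformer, copied in as ufc1_rb_1.pdb
	  cp ../jan24/ufc1_rb_$i.pdb ufc1_rb_1.pdb
	  qsub cns.sc
	  cd ..
	done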

              where cns.sc is the PBS script used to submit each CNS job:

	#!/bin/csh
	#PBS -m e
	#PBS -q short_c
	#PBS -l nodes=1:GM:ppn=1
	#PBS -l walltime=00:56:00
	#PBS -o out1
	#PBS -j oe
	#PBS -N clean1
	 cd $PBS_O_WORKDIR
	 echo "working directory = "$PBS_O_WORKDIR
 	 set NN = `cat $PBS_NODEFILE | wc -l`
	 echo "NN = "$NN
	 # Run Job
	 cns < re_h2oc.inp
	 echo "ALL Done!"

    7.  Run getpdb to collect the refined structures:

	#!/bin/bash
	# getpdb: copy each refined conformer and its violation file into ./pdb
	mkdir -p pdb
	for i in $(seq 1 20)
	do
	  cd ref$i
	  cp ufc1_ref_1.pdb ../pdb/ufc1_ref_$i.pdb
	  cp ufc1_ref_1.vio ../pdb/ufc1_ref_$i.vio
	  cd ..
	done

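    8. Run the script Agrupa in the pdb directory to put all 20 conformers into a single PDB file. Listing the one- and two-digit file names separately keeps the conformers in numerical order (a plain ufc1_ref*.pdb expands alphabetically, e.g. ufc1_ref_10.pdb before ufc1_ref_2.pdb):

	 Agrupa ufc1_ref_?.pdb ufc1_ref_??.pdb > ufc1all.pdb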
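Before collecting the results with getpdb, it can help to confirm that every PBS job actually wrote its output. A minimal sketch, assuming the ref1 ... ref20 layout created by subaba and the output name ufc1_ref_1.pdb used in this example (pass a different base name for another target); missing_refined is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: list ref<i> directories (created by subaba) whose job has not
# written <base>_1.pdb. Usage: missing_refined [N] [BASE]; defaults 20 and ufc1_ref.
# Returns non-zero if any conformer is missing.
missing_refined() {
  local n="${1:-20}" base="${2:-ufc1_ref}" i=1 missing=0
  while [ "$i" -le "$n" ]; do
    if [ ! -s "ref$i/${base}_1.pdb" ]; then
      echo "ref$i"
      missing=1
    fi
    i=$((i + 1))
  done
  return "$missing"
}
```

For example, missing_refined 20 && echo "all conformers refined"; any directories it prints can be resubmitted with qsub cns.sc from inside that directory.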

Running CNS for Proteins with Metal Ions

CNS refinement of proteins with metal ions can also be performed with the WaterRefCNS script, using the appropriate metal ion nomenclature in both the PDB coordinates and the distance constraints.

For manual CNS refinement, proceed as follows:

1.  Set up the environment for CNS 1.1 by running source /farm/users/gliu/alias.cns, which defines:

      alias cns1 /farm/software/cns/cns_solve_1.1/intel-i686-linux_g77/bin/cns
      setenv CNS_TOPPAR /farm/data/gliu/cns1/

In addition to the topology and parameter files, the metal ion library file ion.top is required. An example with all required input files can be found in /farm/users/gliu/projects/cns_cuttha_cis.

2.  Prepare the required files as described above (final.tbl and the final CNS-format PDB files, placed in the xplorPDB directory and named sa_#.pdb), except that the PDB files must also contain the metal ions, formatted according to the CNS library file ion.top. Copy sa_1.pdb to template.pdb, which serves as the input file for creating the MTF file. Note that column alignment is important, e.g.:

ATOM   1249  OT2 ALA    83      69.296  13.232   5.744  1.00  0.00
ATOM   1250 ZN+2 ZN2   150      63.086  13.789 -10.407  1.00  0.00      zinc
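Because column alignment matters, generating such lines can be safer than editing them by hand. A minimal sketch following the standard PDB fixed-column layout, with the segment ID in columns 73-76 as in the zinc line above; pdb_atom_line is a hypothetical helper, occupancy and B-factor are fixed at 1.00 and 0.00 as in the example, the chain ID is left blank, and it assumes a locale that uses '.' as the decimal separator. Its rule for short atom names (start in column 14) is a simplification that suits one-letter elements such as O, not a full implementation of the PDB atom-name convention.

```sh
#!/bin/sh
# Hypothetical helper: print one fixed-column PDB ATOM record.
# Usage: pdb_atom_line SERIAL NAME RESNAME RESSEQ X Y Z [SEGID]
# Occupancy and B-factor are fixed at 1.00 and 0.00, as in the example above.
pdb_atom_line() {
  local serial="$1" name="$2" resname="$3" resseq="$4" x="$5" y="$6" z="$7" segid="${8:-}"
  # simplification: names shorter than 4 characters start in column 14 (e.g. " OT2")
  if [ "${#name}" -lt 4 ]; then
    name=" $name"
  fi
  # columns: record 1-6, serial 7-11, name 13-16, resname 18-20, resseq 23-26,
  # x/y/z 31-54, occupancy 55-60, B-factor 61-66, segid 73-76
  printf '%-6s%5d %-4s %-3s  %4d    %8.3f%8.3f%8.3f%6.2f%6.2f      %-4s\n' \
    ATOM "$serial" "$name" "$resname" "$resseq" "$x" "$y" "$z" 1.00 0.00 "$segid" |
    sed 's/[[:space:]]*$//'   # drop trailing blanks when no segid is given
}
```

With the values from the example, pdb_atom_line 1250 ZN+2 ZN2 150 63.086 13.789 -10.407 zinc reproduces the zinc line above column for column.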

3.  Run generate_h2o.inp once to create temp_h2o.pdb and temp_h2o.mtf. Remove the extra protons of the metal-ligand residues (e.g. HIS HD1/HE2 or CYS HG) by editing generate_h2o.inp, as in the example below; cis-prolines are also defined here in generate_h2o.inp (resid is the number of the residue preceding the proline).

{* any special prosthetic group patches can be applied here *}
{===>}
delete select (name hg and resname cys and resid 61) end
delete select (name hg and resname cys and resid 85) end
delete select (name hd1 and resname his and resid 46) end
delete select (name he2 and resname his and resid 83) end
 patch cisp
           reference=1=( resid 13 )
 end

4.  Edit generate_1.inp to remove the extra protons as done above.

5.  Run generate_20.com. This runs generate_#.inp 20 times, updating the PDB number each time, and creates cnsPDB/sa_cns_#.pdb.

6.  Edit and run re_h2o_cu.inp; the refined PDB files are stored in refinedPDB. Alternatively:

7.  Use subcns to submit the CNS refinement through PBS, e.g. type sh subcns. Before running subcns, create a folder com containing the following files:

  • cns.sc: PBS submission script
  • cutc_h2o.mtf: MTF file created as described above
  • topology and parameter files: parallhdg5.3C.pro, parallhdg5.3.pro, topallhdg5.3.pro
  • re_h2o_cu.inp: input file for the CNS refinement

When the jobs have finished, type getpdb to collect the refined PDB files in refinedPDB.


Running CNS with RDC Constraints



Files for Download

  • Agrupa: script for concatenating PDB coordinate files into one file
  • cyana2cns.cya: CYANA script to convert coordinates and constraints to XPLOR/CNS format
  • zn.tar: example of CNS refinement for a protein with Zn (from Alex Lemak)

References

1.  Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T. and Warren, G.L. (1998) Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D54, 905-921.