PDB and BMRB Deposition

From NESG Wiki
Revision as of 16:09, 22 September 2010 by Alex (talk | contribs) (→‎Preparing files for BMRB depostion)

Introduction

In this section we describe the NESG SOP for NMR PDB/BMRB depositions, including preparation of files for deposition and creating a SPiNE NMR record.

The latter step (creating a SPiNE NMR record) has been replaced by HarvestDB. Also, as of Dec. 2008, deposition of NMR data and PDB coordinates is conducted through the ADIT-NMR server.


Preparing files for PDB deposition

Note on truncated coordinates: In the past (PSI-1), researchers deposited NMR structures after removing disordered or poorly defined regions. RPF analysis results based on truncated coordinates are generally poorer than results based on full-length coordinates. Therefore, the policy adopted throughout the NESG NMR labs is to deposit coordinates for all residues that have NMR assignments.

Files deposited

  • PDB coordinate file - required.
  • Constraint files used in the calculation - required.
    • NOE distance constraints.
    • Dihedral angle constraints (e.g. from TALOS).
    • Hydrogen bond constraints.
    • RDC constraints.

Please deposit the constraint files that were used to generate the deposited coordinates in the latest refinement cycle. For example, if constrained refinement in an explicit water bath was performed with CNS, the corresponding constraints should be deposited in CNS format.

Prior to deposition, individual conformers should be superimposed to minimize the backbone atom RMSD of the folded region. Also, since structure calculation programs such as CYANA or CNS/XPLOR use custom atom nomenclature, the PDB coordinate file has to be converted to conform to the RCSB nomenclature. The procedure below describes how this can be done with PDBStat and MAXIT.

Converting PDB file with PDBStat and MAXIT

Start PDBStat and enter the following commands:

  read coor pdb All_ZZZ_cns.pdb             #read file with concatenated CNS pdb files
  all                                       #select all the models
  classify                                  #classify the models by energy
  order 0.9                                 #determine ordered residues; phi/psi cut-off 0.9
  rmsd best backbone                        #backbone rmsd
  [return]                                  #creates an rmsd output file
  write coor pdb overlay.pdb                #write overlaid coordinates

You can optionally choose the desired orientation for the resulting molecular bundle. Open the overlay.pdb in MOLMOL, find the desired orientation and save with File -> Write Transform..., for example, as rotation_matrix.mac. Start PDBStat again and enter the following commands:

  read coor pdb overlay.pdb                 #read the superimposed bundle
  all                                       #select all the models
  rotate file rotation_matrix.mac           #apply rotation matrix
  write coor pdb ordered                    #write rotated coordinates

In a Unix shell run MAXIT to convert atom nomenclature to the PDB standard:

   maxit-v8.01-O -i ordered -o 52

The resulting ordered.pdb file only requires renaming oxygen atoms of C-terminal -COO groups using sed or a text editor.

   sed "s/O''/OXT/g" ordered.pdb > deposit.pdb
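
The sed command above replaces O'' wherever it appears on a line. As a hedged alternative, the following Python sketch restricts the rename to the atom-name field (columns 13-16) of ATOM and HETATM records. The function names rename_cterm_oxygen and convert_file are illustrative and are not part of the NESG scripts.

```python
def rename_cterm_oxygen(line: str) -> str:
    """Rename atom O'' to OXT, but only in the atom-name field of a coordinate record."""
    # Atom names occupy columns 13-16 of a PDB ATOM/HETATM record (slice 12:16).
    if line.startswith(("ATOM", "HETATM")) and line[12:16].strip() == "O''":
        # " OXT" keeps the field four characters wide, so later columns do not shift.
        return line[:12] + " OXT" + line[16:]
    return line


def convert_file(src: str, dst: str) -> None:
    """Copy src to dst, renaming C-terminal O'' atoms on the way."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_cterm_oxygen(line))
```

For example, convert_file("ordered.pdb", "deposit.pdb") would play the same role as the sed call, assuming MAXIT wrote ordered.pdb as described above.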

Precheck and Validation

In your web browser, go to the RCSB validation server and proceed as follows:

  1. Select NMR experimental method and upload your PDB file.
  2. Run Precheck. Make sure that there are no errors reported.
  3. Continue to Validation. Examine the Validation summary letter. Pay attention to:
     a. Close contacts.
     b. Bond distances and angles.
     c. Torsion angles.
     d. Hydrogen nomenclature.
     e. Missing atoms. (It is OK to have missing labile hydrogens of Asp, Glu and neutral His side chains. If other
        atom types are listed as missing, either the PDB file is incomplete or there is an issue with atom
        nomenclature.)
     f. Extra atoms.
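
Before uploading, it can help to confirm locally that every conformer in the bundle contains the same atoms, since inconsistencies tend to surface as missing or extra atoms in the validation letter. The sketch below is an assumption-laden convenience check, not a substitute for the RCSB validation server: it assumes the bundle uses MODEL records, and the function names atoms_by_model and compare_models are hypothetical.

```python
from collections import defaultdict


def atoms_by_model(pdb_lines):
    """Return {model number: set of (chain, residue number, residue name, atom name)}."""
    models = defaultdict(set)
    current = 1  # a file without MODEL records is treated as a single model
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line.split()[1])
        elif record in ("ATOM", "HETATM"):
            # Fixed-column fields of the PDB coordinate record.
            key = (line[21], line[22:26].strip(), line[17:20].strip(), line[12:16].strip())
            models[current].add(key)
    return dict(models)


def compare_models(models):
    """List atoms missing from, or extra in, each model relative to the lowest-numbered model."""
    reference = models[min(models)]
    report = {}
    for number, atoms in models.items():
        missing = reference - atoms
        extra = atoms - reference
        if missing or extra:
            report[number] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report
```

A typical call would be compare_models(atoms_by_model(open("deposit.pdb"))); an empty result means all models list the same atoms.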

Preparing files for BMRB deposition

Files deposited:

  • Chemical shifts (BMRB file) - required.
  • NOESY peaklists - required.
  • Raw NMR data - NOESY FIDs required. Because of its size, the raw data is usually uploaded as a single archive to the BMRB FTP server after the deposition is submitted.

  1. Create a new directory (like structure/deposit/bmrb). Copy the following files from the last CYANA 2.1 manual structure calculation:
     • init.cya
     • XXXX.seq
     • XXXX.prot
     • Stereospecific assignment file (e.g. finalstereo.cya)

     Make sure you have added any missing CG2 atoms of Val and CD2 atoms of Leu, Phe and Tyr in residues where they are degenerate with CG1 and CD1, respectively. The proton list XXXX.prot should have stereoassigned atoms swapped. The stereospecific assignment file is needed to properly set the ambiguity codes in the resulting BMRB file.
  2. Download the bmrb_dep.cya script and set the tolerances appropriate for your project (see below).
  3. Start CYANA 2.1 and run the bmrb_dep.cya script. It should produce a file named XXXX.bmrb.
  4. You may have to rename non-standard residues in XXXX.bmrb, such as HIST or HIS+ to HIS, and cPRO to PRO. Use any text editor.


Creating a BMRB file from CYANA


  read prot XXXX-final.prot                 # read the latest atom list
  finalstereo                               # stereospecific assignments (to set proper ambiguity codes)
  translate bmrb                            # use BMRB nomenclature
  pseudo=2                                  # use H* labels for pseudoatoms
  write bmrb XXXX.bmrb                      # write out the BMRB file  

Using any text editor, rename all non-standard residues in the XXXX.bmrb file, such as HIST or HIS+ to HIS, and cPRO to PRO.
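
If you prefer to script the residue renaming rather than edit by hand, a minimal Python sketch follows. It assumes residue labels appear as whole whitespace-delimited tokens in the BMRB file, and it only covers the three labels named above; RESIDUE_MAP and standardize_residues are illustrative names. Because HIST and cPRO are longer than their replacements, column alignment may shift slightly, which should not matter for a whitespace-delimited file but is worth checking in the output.

```python
import re

# Non-standard residue labels named in the text and their standard equivalents.
RESIDUE_MAP = {"HIST": "HIS", "HIS+": "HIS", "cPRO": "PRO"}

# Match a label only when it is a complete whitespace-delimited token,
# so that longer words containing the same letters are left alone.
_TOKEN = re.compile(
    r"(?<!\S)(" + "|".join(re.escape(label) for label in RESIDUE_MAP) + r")(?!\S)"
)


def standardize_residues(text: str) -> str:
    """Replace non-standard residue labels with their standard three-letter codes."""
    return _TOKEN.sub(lambda match: RESIDUE_MAP[match.group(1)], text)
```

To apply it to a file, read XXXX.bmrb into a string, pass it through standardize_residues, and write the result to a new file so the original is kept for comparison.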

Creating an NMR structure record in SPINE

Using HarvestDB to create a record

NMR depositions run through BMRB. There is no need to use PDB-ADIT; after completing the BMRB-ADIT deposition you will be given both a BMRB id and a PDB id.

      1.  Run PSVS (full length coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures all Z-scores should be > -5

      2.  Run RPF (full length coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures DPF > 0.7

      3.  Coordinates, Constraint Lists, and Chemical Shifts to BMRB (NMR ONLY)

     a. Go to http://deposit.bmrb.wisc.edu/bmrb-adit/ to initiate or update a deposition
     b. See the SPINE target record for suggested authors
     c. PDB/BMRB Title should include "Northeast Structural Genomics Consortium Target XXXNN".
     d. After completion you will be given a BMRB id and a PDB id; BMRB will complete the PDB deposition for you.

      4.  FIDS and NOESY Peak Lists to BMRB (NMR ONLY)

     a. http://www.bmrb.wisc.edu/
     b. Use ftp://ftp.bmrb.wisc.edu/ to complete anonymous ftp of your compressed data
     c. Name file nesg_bmrbaccession.tar.gz

Alternatively, raw fids can be tar'ed and ftp'ed to BMRB using the SPINS database.

      5.  Create NMR Record in SPINE

     a. Go to http://www.spine.nesg.org
     b. Tools -> Basic Search -> Enter Target ID
     c. Summary Page will appear -> click on your target
     d. Scroll to bottom of target Record and click corresponding Purification Batch 
     e. Scroll to bottom of Purification Record and click NMR 
     f. Complete web form
     g. Archive coordinates, structure factors, chemical shifts in SPINE
     h. Archive NOESY peak lists in SPINE
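
Step 4 above asks for the FIDs and NOESY peak lists as a single compressed archive. The following Python sketch builds such an archive with the standard tarfile module. It assumes that "bmrbaccession" in the file name stands for the BMRB accession number assigned to your deposition; the function name make_archive and its parameters are hypothetical.

```python
import tarfile
from pathlib import Path


def make_archive(data_dir, accession, out_dir="."):
    """Pack data_dir into nesg_<accession>.tar.gz and return the archive path."""
    data_dir = Path(data_dir)
    # Naming assumption: nesg_ followed by the BMRB accession number.
    archive = Path(out_dir) / f"nesg_{accession}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the full local path.
        tar.add(data_dir, arcname=data_dir.name)
    return archive
```

Write the archive outside data_dir so that it is not packed into itself, then upload it by anonymous FTP as described in step 4.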

When archiving a project in HarvestDB, be aware of the following:

  • HarvestDB/PDBstat cannot handle simplified pseudoatom nomenclature, that is, atom names like HB of alanine instead of QB, or HD1 of leucine instead of QD1. Such nomenclature is standard in CARA, and can be used in CYANA v2.x and later with the pseudo=2 setting. Make sure you convert such names into DYANA/CYANA or Xplor/CNS format before uploading.
  • HarvestDB currently uses AutoStructure v2.1.1 to calculate RPF scores. Thus, you should upload a control file compatible with AutoStructure v2.1.1, and prepare a combined 13C peaklist if you used separate 13Cali and 13Caro peaklists.

Creating a Record Manually

  1. On the SPINE web site, find your protein target in the database.
  2. Go to the protein sample tube record (usually the NC or NC5 sample)
  3. At the bottom of the sample tube page there is a line "Create new structure record: (HSQC) (NMR) (Xray)". Click on "NMR" - you'll be asked for a user name and password.
  4. Fill in the fields and click on "Update Entry".


Using HarvestDB to Prepare PDB and BMRB Depositions (Under Construction):

NMR depositions will in the future run through HarvestDB. HarvestDB has the following major functions: A. Archive NMR files; B. Version tracking; C. PSVS analysis; D. Deposit to BMRB; E. Update SPiNE and Structure Gallery. (Main.DehuaHang)

      1.  Submit NMR Protein Structure Information and Files to HarvestDB to Create Protein Record

     a. Complete web form: NESG target id, Protein id, version id, Swissprot id, total number of structures, NMR
        comments.
     b. Complete web form: Coordinates, constraint lists, chemical shift, NOESY peak lists
     c. After the web form is completed, HarvestDB emails the user a link to the new structure record
     d. HarvestDB generates protein pictures: Small (80 by 80), Big static (300 by 300), Big  dynamic (300 by 300)
     e. HarvestDB pulls author list from SPiNE
     f. HarvestDB sets up NMR id, construct id, batch id from input PST id
     g. Users can update structure information and NMR files through HarvestDB

      2.  Run PSVS, RPF Analysis through HarvestDB

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. HarvestDB sends Target id, Protein id, Coordinates, Constraint Lists to PSVS
     c. HarvestDB receives the zipped PSVS report from PSVS, parses the zipped HTML file to get the Z-scores, and sends
        an email to notify the user
     d. For optimal structures all Z-scores should be > -5
     e. For optimal structures DPF > 0.7
     f. HarvestDB compares Z-scores with previous NESG structure quality by scatter plots

      3.  NMR Structure File Version Tracking

     a. HarvestDB duplicates current files and information to create newer version
     b. Update files after refinement; HarvestDB tracks the date and notes for each version

      4.  Prepare NMRStar File and Coordinate file (mmCIF) through HarvestDB

     a. HarvestDB pulls information from SPiNE and Swissprot site
     b. HarvestDB collects information about molecular entity sequence, contact authors, title, citation, molecule,
        synthetic, sample conditions, spectrometer, experiment
     c. HarvestDB generates NMRStar file and Coordinate file (by using pdb_extract)

      5.  HarvestDB Runs through BMRB to Initiate or Update Auto-deposition

     a. Send info: Submitter info, PI info
     b. Send files: Coordinates, Constraint Lists and NMRStar file
     c. HarvestDB receives BMRB and PDB id, deposition date, deposition status from BMRB
     d. For a successful deposition: HarvestDB updates SPiNE to create the NMR record and sends a notification email to user and PI
     e. For a failed deposition: HarvestDB asks the user to modify and re-deposit

      6.  HarvestDB Updates Structure Gallery

     a. http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp
     b. Fix Header: HarvestDB asks user to fix the Title, Protein Name (NO Hypothetical ) and Author list (Last Author / PI name) of coordinates 
     c. Fix Protein Pictures
     d. Send info: BMRB and PDB id
     e. Send files: Three Pictures, Coordinates, Constraints, NMRStar file, Zipped PSVS report
     f. Structure Gallery returns structure link to HarvestDB
     g. HarvestDB sends a notification email to user and PI


Scripts

patch2.sh

Unix shell script.

#!/bin/sh
maxit-v8.01-O -i ordered -o 52
sed "s/O''/OXT/g" ordered.pdb > deposit.pdb
  • Runs maxit to correct atom name nomenclature.
  • Sets proper names for the terminal -COO groups.


patch.sh

Unix shell script.

#!/bin/sh
sed 's/ARG+/ARG /g; s/LYS+/LYS /g; s/HIS+/HIS /g; s/1HT/ H1/; s/2HT/ H2/; s/3HT/ H3/; s/OT1/O  /g; s/OT2/OXT/g' tmp.pdb > fit
maxit-v8.01-O -i fit -o 52
chainsub.py fit.pdb
sed "/SEQRES/d; s/1H / H1/g; s/2H / H2/g; s/3H / H3/g; s/O''/OXT/g" fit.chainsub.pdb > deposit.pdb
  • Removes the plus sign from ARG+, LYS+ and HIS+ residues
  • Runs maxit to correct atom name nomenclature.
  • Runs chainsub.py to add chain identifiers.
  • Sets proper names for the terminal -NH3 and -COO groups.


pdbfit.mac

Macro for MOLMOL:

# Initialize
InitAll yes
# Replace with your input pdb file
ReadPdb All_STR_cns.pdb
# Select secondary structure elements
SelectAtom ':7-9,21-27,31-38,41-48@CA'
Fit to_first
# Remove pseudoatoms
SelectAtom '@Q*'
RemoveAtom
# Write PDB
WritePdb tmp.pdb
System "patch.sh"

When using this file:

  • Replace the input PDB file name for the ReadPdb command.
  • Set the residue range used to superimpose the bundle. Secondary structure elements from PSVS are a good choice.


bmrb_deposit.cya

CYANA 2.1 macro to generate a BMRB file for deposition.

tolerance:=0.02, 0.05, 0.4
read prot $name
stereofound
deposit bmrb=$name
  • The first value in the tolerance list is the 1H chemical shift tolerance. The last value is the 13C/15N tolerance. The second value is ignored.
  • Here stereofound declares stereospecific assignments so that the ambiguity codes are set appropriately in the resulting BMRB file.



  • pdbfit.mac: MOLMOL macro to superimpose and convert the PDB bundle
  • patch.sh: shell script making small modifications to PDB files
  • bmrb_dep.cya: CYANA script to prepare a BMRB file for deposition
  • patch2.sh: shell script to fix nomenclature of PDB files