PDB and BMRB Deposition

Revision as of 19:25, 3 November 2009

BMRB and PDB Structure Depositions

NESG SOP For NMR PDB/BMRB Structure Depositions (Dec, 2006)

Note: Truncated coordinates: Researchers used to deposit NMR structures after removing disordered or poorly defined regions. RPF analysis results based on truncated coordinates are usually poorer than results based on full-length coordinates (DehuaHang)

Preparing files for PDB deposition

Files deposited:

PDB coordinate file - required.
Constraint files used in the calculation - recommended.
- NOE constraints (UPL file).
- ACO constraints (e.g. from TALOS).
- H-bond constraints

Constraint files can be taken from the latest manually refined structure calculation.

Here we assume that you are depositing the coordinate file after refinement in an explicit water bath. You will need to convert the resulting PDB file to use proper PDB atom names. You can use either PDBstat or MOLMOL to superimpose your ensemble.

The procedure below requires that you install maxit from the RCSB web site.

Using PDBStat

Start pdbstat and use the following commands
read coord pdb All_KKK_cns.pdb
classify
order 0.9
rmsd best backbone
write coord pdb ordered
These commands will sort the conformers in the order of lowest to highest CNS energy, superimpose them automatically based on the ordered regions and report RMSD.
Run the patch2.sh script. It uses maxit to fix atom nomenclature and runs sed to rename the C-terminal oxygen to OXT.

Using MolMol

Copy the resulting file of the water bath refinement (e.g. All_KKK_cns.pdb) in a new directory, preferably structure/deposit/pdb.
Download the chainsub.py script from the RCSB web site.
Download the patch.sh script (see below).
Download the pdbfit.mac macro for MOLMOL (see below). Modify it to specify the input PDB file and the residue range of secondary structure elements used to superimpose the structure (consult the output of PSVS).
Start MOLMOL and execute the pdbfit.mac. Ignore the warning about incorrect atoms. It should produce a deposit.pdb file.

Precheck and Validation

In your web browser, go to the RCSB validation server.

Select NMR experimental method and upload your PDB file.
Run Precheck. Make sure that there are no errors reported.
Continue to Validation. Examine the Validation summary letter. Pay attention to

     a. Close contacts.
     b. Bond distances and angles.
     c. Torsion angles.
     d. Hydrogen nomenclature.
     e. Missing atoms. (It is OK to have missing labile hydrogens of Asp, Glu and neutral His side chains. If other
        atom types are listed as missing, either the PDB file is incomplete or there is an issue with atom
        nomenclature.)
     f. Extra atoms.

Preparing files for BMRB deposition

Files deposited:

Chemical shifts (BMRB file) - required.
NOESY peaklists - strongly recommended.
Raw NMR Data - strongly recommended. Because of its size, it is usually uploaded as a single archive to the BMRB FTP server after the deposition is submitted.

Create a new directory (like structure/deposit/bmrb). Copy the following files from the last CYANA 2.1 manual structure calculation:

- init.cya
- XXXX.seq
- XXXX.prot
- Stereospecific assignment file (e.g. finalstereo.cya)
  
  Make sure you have added the missing CG2 atoms of Val and CD2 atoms of Leu, Phe and Tyr in residues where they are degenerate with CG1 and CD1, respectively. The proton list XXXX.prot should have stereoassigned atoms swapped. The stereospecific assignment file is needed to properly set the ambiguity codes in the resulting BMRB file.

Download the bmrb_dep.cya script and set the tolerances appropriate for your project (see below).
Start CYANA 2.1 and run the bmrb_dep.cya script. It should produce a file named XXXX.bmrb.
You may have to rename non-standard residues in XXXX.bmrb, such as HIST or HIS+ to HIS, and cPRO to PRO. Use any text editor.
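
The renaming in the last step can also be scripted. The sketch below is one possible approach and is not part of the SOP: a hypothetical shell function, rename_nonstandard_residues, that uses sed to replace HIST, HIS+ and cPRO with HIS or PRO. It assumes the residue names appear as tokens with whitespace on both sides, and it pads each replacement with one space so that column widths stay the same. Inspect the output before depositing.

```shell
# Hypothetical helper (not part of the SOP): rename non-standard residue
# labels in a BMRB file. Reads the named file(s), or stdin if none is given.
rename_nonstandard_residues() {
    # Each pattern requires whitespace on both sides, so longer names that
    # merely contain these strings are left alone. The replacement is padded
    # with one space to preserve the original column width.
    sed -e 's/\([[:space:]]\)HIST\([[:space:]]\)/\1HIS \2/g' \
        -e 's/\([[:space:]]\)HIS+\([[:space:]]\)/\1HIS \2/g' \
        -e 's/\([[:space:]]\)cPRO\([[:space:]]\)/\1PRO \2/g' "$@"
}

# Example: rename_nonstandard_residues XXXX.bmrb > XXXX.fixed.bmrb
```

Writing to a new file rather than editing in place keeps the CYANA output available for comparison.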

Creating an NMR structure record in SPINE

Using HarvestDB to create a record

NMR depositions will run through BMRB. There is no need to use PDB-ADIT; after using BMRB-ADIT you will be given a BMRB id and a PDB id.

Run PSVS (truncated coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures all Z-scores should be > -5

Run RPF (full length coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures DPF > 0.7
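
The two thresholds above (all Z-scores above -5, DPF above 0.7) can be checked with a small script once the numbers have been read off the PSVS and RPF reports. This is only a sketch: check_quality_scores is a hypothetical helper, and it expects the values to be typed in by hand rather than parsed from the reports, whose layout is not described here.

```shell
# Hypothetical helper: compare quality scores with the SOP thresholds.
# Usage: check_quality_scores DPF ZSCORE [ZSCORE ...]
# Returns non-zero and prints a message for every value that fails.
check_quality_scores() {
    dpf=$1
    shift
    # awk does the floating-point comparisons; sh arithmetic is integer only.
    printf '%s\n' "$dpf" "$@" | awk '
        NR == 1 {
            if ($1 + 0 <= 0.7) { print "DPF " $1 " is not above 0.7"; bad = 1 }
            next
        }
        $1 + 0 <= -5 { print "Z-score " $1 " is not above -5"; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example: check_quality_scores 0.82 -1.2 0.4 -3.9 && echo "scores OK"
```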

Coordinates, Constraint Lists, and Chemical Shifts to BMRB (NMR ONLY)

     a. Go to http://deposit.bmrb.wisc.edu/bmrb-adit/ to initiate or update a deposition
     b. See SPINE target Record for Suggested authors
     c. PDB/BMRB Title should include "Northeast Structural Genomics Consortium Target XXXNN".
     d. After completion you will be given a BMRB and PDB id, BMRB will complete PDB deposition for you.

FIDS and NOESY Peak Lists to BMRB (NMR ONLY)

     a. http://www.bmrb.wisc.edu/
     b. Use ftp://ftp.bmrb.wisc.edu/ to complete anonymous ftp of your compressed data
     c. Name file nesg_bmrbaccession.tar.gz

Create NMR Record in SPINE

     a. Go to http://www.spine.nesg.org
     b. Tools -> Basic Search -> Enter Target ID
     c. Summary Page will appear -> click on your target
     d. Scroll to bottom of target Record and click corresponding Purification Batch 
     e. Scroll to bottom of Purification Record and click NMR 
     f. Complete web form
     g. Archive coordinates, structure factors, chemical shifts in SPINE
     h. Archive NOESY peak lists in SPINE

When Archiving a project in HarvestDB, be aware of the following:

HarvestDB/PDBstat cannot handle simplified pseudoatom nomenclature, that is, atom names such as HB of alanine instead of QB, or HD1 of leucine instead of QD1. Such nomenclature is standard in CARA, and can be used in CYANA v2.x and later with the pseudo=2 setting. Make sure you convert them into DYANA/CYANA or Xplor/CNS format before uploading.
HarvestDB currently uses AutoStructure v2.1.1 to calculate RPF scores. Thus, you should upload a control file compatible with AutoStructure v2.1.1, and prepare a combined 13C peaklist if you used separate 13Cali and 13Caro peaklists.

Creating a Record Manually

On the spine web site find your protein target in the database.
Go to the protein sample tube record (usually the NC or NC5 sample)
At the bottom of the sample tube page there is a line "Create new structure record: (HSQC) (NMR) (Xray)". Click on "NMR" - you'll be asked for a user name and password.
Fill in the fields and click on "Update Entry".

Using HarvestDB to Prepare PDB and BMRB Depositions (Under Construction):

NMR depositions will run through HarvestDB. HarvestDB has the following major functions: A. Archive NMR files; B. Version tracking; C. PSVS analysis; D. Deposit to BMRB; E. Update SPiNE and Structure Gallery. (Main.DehuaHang)

Submit NMR Protein Structure Information and Files to HarvestDB to Create Protein Record

     a. Complete web form: NESG target id, Protein id, version id, Swissprot id, total number of structures, NMR
        comments.
     b. Complete web form: Coordinates, constraint lists, chemical shift, NOESY peak lists
     c. After the web form is completed, HarvestDB emails the user a link to the new structure record
     d. HarvestDB generates protein pictures: Small (80 by 80), Big static (300 by 300), Big dynamic (300 by 300)
     e. HarvestDB pulls author list from SPiNE
     f. HarvestDB sets up NMR id, construct id, batch id from input PST id
     g. Users can update structure information and NMR files through HarvestDB

Run PSVS, RPF Analysis through HarvestDB

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. HarvestDB sends Target id, Protein id, Coordinates, Constraint Lists to PSVS
     c. HarvestDB receives the zipped PSVS report from PSVS, parses the zipped HTML file to get the Z-scores, and sends
        an email to notify the user
     d. For optimal structures all Z-scores should be > -5
     e. For optimal structures DPF > 0.7
     f. HarvestDB compares Z-scores with previous NESG structure quality by scatter plots

NMR Structure File Version Tracking

     a. HarvestDB duplicates current files and information to create newer version
     b. Update files after refinement; HarvestDB tracks date and notes for each version

Prepare NMRStar File and Coordinate file (mmCIF) through HarvestDB

     a. HarvestDB pulls information from SPiNE and Swissprot site
     b. HarvestDB collects information about molecular entity sequence, contact authors, title, citation, molecule,
        synthetic, sample conditions, spectrometer, experiment
     c. HarvestDB generates NMRStar file and Coordinate file (by using pdb_extract)

HarvestDB Runs through BMRB to Initiate or Update Auto-deposition

     a. Send info: Submitter info, PI info
     b. Send files: Coordinates, Constraint Lists and NMRStar file
     c. HarvestDB receives BMRB and PDB id, deposition date, deposition status from BMRB
     d. For a successful deposition: HarvestDB updates SPiNE to create the NMR record and sends a notification email to user and PI
     e. For a failed deposition: HarvestDB asks user to modify and re-deposit

HarvestDB Updates Structure Gallery

     a. http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp
     b. Fix Header: HarvestDB asks user to fix the Title, Protein Name (NO Hypothetical ) and Author list (Last Author / PI name) of coordinates 
     c. Fix Protein Pictures
     d. Send info: BMRB and PDB id
     e. Send files: Three Pictures, Coordinates, Constraints, NMRStar file, Zipped PSVS report
     f. Structure Gallery returns structure link to HarvestDB
     g. HarvestDB sends a notification email to user and PI

Scripts

patch2.sh

Unix shell script.

#!/bin/sh
maxit-v8.01-O -i ordered -o 52
sed "s/O''/OXT/g" ordered.pdb > deposit.pdb

Runs maxit to correct atom name nomenclature.
Sets proper names for the terminal -COO groups.
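
After patch2.sh has run, it can be worth confirming that the rename took effect. The following is a hedged sketch of such a check and is not part of the original script: check_cterm_names is a hypothetical helper that counts ATOM records still carrying the O'' name and ATOM records named OXT.

```shell
# Hypothetical check (not part of patch2.sh): confirm that the C-terminal
# oxygen was renamed. Prints both counts; fails if any O'' record remains
# or if no OXT record is present.
# Usage: check_cterm_names deposit.pdb
check_cterm_names() {
    pdb=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    leftover=$(grep '^ATOM' "$pdb" | grep -c "O''" || true)
    oxt=$(grep '^ATOM' "$pdb" | grep -c ' OXT ' || true)
    echo "O'' records: $leftover, OXT records: $oxt"
    [ "$leftover" -eq 0 ] && [ "$oxt" -gt 0 ]
}
```

For an ensemble, the OXT count would normally be expected to equal the number of conformers times the number of chains.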

patch.sh

Unix shell script.

#!/bin/sh
sed 's/ARG+/ARG /g; s/LYS+/LYS /g; s/HIS+/HIS /g; s/1HT/ H1/; s/2HT/ H2/; s/3HT/ H3/; s/OT1/O /g; s/OT2/OXT/g' tmp.pdb > fit
maxit-v8.01-O -i fit -o 52
chainsub.py fit.pdb
sed "/SEQRES/d; s/1H / H1/g; s/2H / H2/g; s/3H / H3/g; s/O''/OXT/g" fit.chainsub.pdb > deposit.pdb

Removes the plus sign from ARG+, LYS+ and HIS+ residues
Runs maxit to correct atom name nomenclature.
Runs chainsub.py to add chain identifiers.
Sets proper names for the terminal -NH3 and -COO groups.

pdbfit.mac

Macro for MOLMOL:

# Initialize
InitAll yes
# Replace with your input pdb file
ReadPdb All_STR_cns.pdb
# Select secondary structure elements
SelectAtom ':7-9,21-27,31-38,41-48@CA'
Fit to_first
# Remove pseudoatoms
SelectAtom '@Q*'
RemoveAtom
# Write PDB
WritePdb tmp.pdb
System "patch.sh"

When using this file:

Replace the input PDB file name for the ReadPdb command.
Set the residue range used to superimpose the bundle. Secondary structure elements from PSVS are a good choice.
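
Both edits can be made in any text editor. As an alternative, the sketch below applies them with sed; set_pdbfit_inputs is a hypothetical helper, and it assumes the macro keeps one command per line, that the file name contains no "|" or "&" characters, and that the superposition selection is the only SelectAtom line ending in @CA.

```shell
# Hypothetical helper: print pdbfit.mac with a new input file and residue
# selection. Usage: set_pdbfit_inputs INPUT_PDB RESIDUE_RANGES MACRO_FILE
set_pdbfit_inputs() {
    # First expression swaps the ReadPdb argument; the second rewrites only
    # the CA selection, leaving the pseudoatom selection ('@Q*') untouched.
    sed -e "s|^ReadPdb .*|ReadPdb $1|" \
        -e "s|^SelectAtom ':.*@CA'|SelectAtom ':$2@CA'|" "$3"
}

# Example:
#   set_pdbfit_inputs All_KKK_cns.pdb 7-9,21-27 pdbfit.mac > pdbfit_KKK.mac
```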

bmrb_dep.cya

CYANA 2.1 macro to generate a BMRB file for deposition.

tolerance:=0.02, 0.05, 0.4
read prot $name
stereofound
deposit bmrb=$name

The first value in the tolerance list is the 1H chemical shift tolerance. The last value is the 13C/15N tolerance. The second value is ignored.
Here stereofound declares the stereospecific assignments so that the ambiguity codes are set appropriately in the resulting BMRB file.

-- Main.GaohuaLiu - 17 Feb 2007

pdbfit.mac: MOLMOL macro to superimpose and convert the PDB bundle

patch.sh: shell script making small modifications to PDB files

bmrb_dep.cya: CYANA script to prepare a BMRB file for deposition

patch2.sh: shell script to fix nomenclature of PDB files