PDB and BMRB Deposition: Difference between revisions

From NESG Wiki
No edit summary
 
(32 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== '''BMRB and PDB Structure Depositions'''  ==
== '''Introduction'''  ==


'''NESG SOP For NMR PDB/BMRB Structure Depositions (Dec, 2006)'''
In this section we describe the NESG SOP for NMR PDB/BMRB depositions, including preparation of files for deposition and creating a SPiNE NMR record.


''Note: Truncated coordinates: Researchers used to deposit NMR structures after removing disordered or poorly defined regions. RPF analysis results based on truncated coordinates are usually poorer than results based on full-length coordinates'' (DehuaHang)
The latter has been replaced by [[HarvestDB|HarvestDB]].  Also, as of Dec. 2008, deposition of NMR data and PDB coordinates is conducted through the [http://deposit.bmrb.wisc.edu/bmrb-adit/ ADIT-NMR server].   


=== '''Preparing files for PDB deposition'''  ===
''<br>''  


Files deposited:
== '''Preparing files for PDB deposition'''  ==
 
''Note: Truncated coordinates:&nbsp; In the past (PSI-1), researchers deposited NMR structures after removing disordered or poorly defined regions. RPF analysis results based on truncated coordinates are generally poorer than results based on full-length coordinates.&nbsp; Therefore, the policy adopted throughout the NESG&nbsp;NMR labs is to deposit coordinates for all residues that have NMR assignments.''
 
=== Files deposited ===


*PDB coordinate file - '''required'''.  
*PDB coordinate file - '''required'''.  
*Constraint files used in the calculation - '''recommended'''.  
*Constraint files used in the calculation - '''required'''.  
**NOE constraints (UPL file).  
**NOE distance constraints.  
**ACO constraints (e.g. from TALOS).  
**Dihedral angle constraints (e.g. from TALOS).  
**H-bond constraints
**Hydrogen bond constraints
**RDC constraints


Constraint files can be taken from the latest manual refined structure calculation.  
Please deposit the constraint files that were used to generate the deposited coordinates in the latest refinement cycle. For example, if constrained refinement in an explicit water bath was performed with CNS, the corresponding constraints in CNS format should be deposited.


Here we assume that you are depositing the coordinate file after refinement in explicit water bath. You will need to convert the resulting PDB file to use proper PDB atom names. You can use either PDBstat or Molmol to superimpose your ensemble.  
Prior to deposition, individual conformers should be superimposed to minimize the backbone atom RMSD of the folded region. Also, since structure calculation programs such as CYANA or CNS/XPLOR use custom atom nomenclature, the PDB coordinate file has to be converted to conform to the RCSB nomenclature. The procedure below describes how this can be done with PDBStat and [http://sw-tools.pdb.org/apps/MAXIT/index.html MAXIT].


The procedure below requires that you install [http://sw-tools.pdb.org/apps/MAXIT/index.html maxit] from the RCSB web site.
=== '''Converting PDB file with PDBStat and MAXIT'''  ===


==== '''Using PDBStat'''  ====
Start PDBStat and enter the following commands:
<onlyinclude>
  read coor pdb All_ZZZ_cns.pdb            #read file with concatenated CNS pdb files
  all                                      #select all the models
  classify                                  #classify the models by energy
  order 0.9                                #determine ordered residues; phi/psi cut-off 0.9
  rmsd best backbone                        #backbone rmsd
  [return]                                  #creates an rmsd output file
  write coor pdb overlay.pdb                #write overlaid coordinates</onlyinclude>


#Start <tt>pdbstat</tt> and use the following commands<br> <tt>read coord pdb All_KKK_cns.pdb</tt> <br> <tt>classify</tt> <br> <tt>order 0.9</tt> <br> <tt>rmsd best backbone</tt> <br> <tt>write coord pdb ordered</tt> <br> These commands will sort the conformers in the order of lowest to highest [[NESG:CNS|CNS]] energy, superimpose them automatically based on the ordered regions and report RMSD.  
You can optionally choose the desired orientation for the resulting molecular bundle. Open the <tt>overlay.pdb</tt> in MOLMOL, find the desired orientation and save with '''File''' -> '''Write Transform...''', for example, as <tt>rotation_matrix.mac</tt>. Start PDBStat again and enter the following commands:
#Run the <tt>patch2.sh</tt> script. It uses MAXIT to fix atom nomenclature and runs <tt>sed</tt> to rename the C-terminal <tt><nowiki>O''</nowiki></tt> to <tt>OXT</tt>.
<onlyinclude>
  read coor pdb overlay.pdb                    #read the superimposed bundle written above
  all                                      #select all the models
  rotate file rotation_matrix.mac          #apply rotation matrix
  write coor pdb ordered                    #write rotated coordinates</onlyinclude>  


==== '''Using MolMol'''  ====
In a Unix shell run [http://sw-tools.pdb.org/apps/MAXIT/index.html MAXIT] to convert atom nomenclature to the PDB standard:
<pre>
  maxit-v8.01-O -i ordered -o 52</pre>


#Copy the resulting file of the water bath refinement (e.g. <tt>All_KKK_cns.pdb</tt>) in a new directory, preferably <tt>structure/deposit/pdb</tt>.  
The resulting <tt>ordered.pdb</tt> file only requires renaming oxygen atoms of C-terminal -COO groups using <tt>sed</tt> or a text editor.
#Download the [http://deposit.rcsb.org/pdbformat/chainsub.py chainsub.py] script from RCSB web site.
<pre>
#Download the [[NESG:%ATTACHURL%/patch.sh|patch.sh]] script (see below).
  sed "s/O''/OXT/g" ordered.pdb > deposit.pdb</pre>
#Download the [[NESG:%ATTACHURL%/pdbfit.mac|pdbfit.mac]] macro for MOLMOL (see below). Modify it to specify the input PDB file and the residue range of secondary structure elements used to superimpose the structure (consult the output of PSVS).
#Start MOLMOL and execute the <tt>pdbfit.mac</tt>. Ignore the warning about incorrect atoms. It should produce a <tt>deposit.pdb</tt> file.


==== '''Precheck and Validation'''  ====
=== '''Precheck and Validation'''  ===


In your web browser go to the [http://deposit.pdb.org/validate/ RCSB validation server],  
In your web browser go to the [http://deposit.pdb.org/validate/ RCSB validation server],  
Line 51: Line 68:
       f. Extra atoms.
       f. Extra atoms.


=== '''Preparing files for BMRB deposition'''  ===
== '''Preparing files for BMRB deposition'''  ==


Files deposited:  
Files deposited:  


*Chemical shifts (BMRB file) - '''required'''.  
*Chemical shifts (BMRB file) - '''required'''.  
*NOESY peaklists - '''strongly recommended'''.  
*NOESY peaklists - '''required'''.  
*Raw NMR Data - '''strongly recommended'''. Because of their size, these are usually uploaded as a single archive to the FTP server of BMRB after the deposition is submitted.
*Raw NMR Data - '''NOESY FIDs required'''. Because of their size, these are usually uploaded as a single archive to the FTP server of BMRB after the deposition is submitted.


#Create a new directory (like <tt>structure/deposit/bmrb</tt>). Copy the following files from the last CYANA 2.1 manual structure calculation:
#Create a new directory (like <tt>structure/deposit/bmrb</tt>). Copy the following files from the last CYANA 2.1 manual structure calculation:<br>


**<tt>init.cya</tt>  
*<span id="1257278398860S" style="display: none;">&nbsp;</span><tt>init.cya</tt>  
**<tt>XXXX.seq</tt>  
*<tt>XXXX.seq</tt>  
**<tt>XXXX.prot</tt>  
*<tt>XXXX.prot</tt>  
**Stereospecific assignment file (e.g. <tt>finalstereo.cya</tt>) <br> <br> Make sure you have added the missing CG2 atoms of Val and CD2 atoms of Leu, Phe and Tyr in residues where they are degenerate with CG1 and CD1, respectively. The proton list <tt>XXXX.prot</tt> should have stereoassigned atoms swapped. The stereospecific assignment file is needed to properly set the ambiguity codes in the resulting BMRB file.
*Stereospecific assignment file (e.g. <tt>finalstereo.cya</tt>) <br> <span id="1257278398974E" style="display: none;">&nbsp;</span><br> Make sure you have added the missing CG2 atoms of Val and CD2 atoms of Leu, Phe and Tyr in residues where they are degenerate with CG1 and CD1, respectively. The proton list <tt>XXXX.prot</tt> should have stereoassigned atoms swapped. The stereospecific assignment file is needed to properly set the ambiguity codes in the resulting BMRB file.


#Download the [[NESG:%ATTACHURL%/bmrb dep.cya|bmrb_dep.cya]] script and set the tolerances appropriate for your project (see below).  
#Download the [[Media:Bmrb_dep.cya|bmrb_dep.cya]] script and set the tolerances appropriate for your project (see below).<br>
#Start CYANA 2.1 and run the <tt>bmrb_dep.cya</tt> script. It should produce a file named <tt>XXXX.bmrb</tt>.  
#Start CYANA 2.1 and run the <tt>bmrb_dep.cya</tt> script. It should produce a file named <tt>XXXX.bmrb</tt>.<br>
#You may have to rename non-standard residues in <tt>XXXX.bmrb</tt>, such as <tt>HIST</tt> or <tt>HIS+</tt> to <tt>HIS</tt>, and <tt>cPRO</tt> to <tt>PRO</tt>. Use any text editor.
#You may have to rename non-standard residues in <tt>XXXX.bmrb</tt>, such as <tt>HIST</tt> or <tt>HIS+</tt> to <tt>HIS</tt>, and <tt>cPRO</tt> to <tt>PRO</tt>. Use any text editor.<br>


<br>  
<br>
 
Creating a BMRB file from CYANA
 
<onlyinclude>
  read prot XXXX-final.prot                # read the latest atom list
  finalstereo                              # stereospecific assignments (to set proper ambiguity codes)
  translate bmrb                            # use BMRB nomenclature
  pseudo=2                                  # use H* labels for pseudoatoms
  write bmrb XXXX.bmrb                      # write out the BMRB file  </onlyinclude>
 
Using any text editor rename all non-standard residues, such as <tt>HIST</tt> or <tt>HIS+</tt> to <tt>HIS</tt>, and <tt>cPRO</tt> to <tt>PRO</tt> in the <tt>XXXX.bmrb</tt> file.


=== '''Creating an NMR structure record in SPINE'''  ===
== '''Creating an NMR structure record in SPINE'''  ==


==== '''Using HarvestDB to create a record'''  ====
=== '''Using HarvestDB to create a record'''  ===


NMR depositions will run through BMRB. There is no need to use PDB-ADIT; after using BMRB-ADIT you will be given a BMRB id and a PDB id.  
NMR depositions will run through BMRB. There is no need to use PDB-ADIT; after using BMRB-ADIT you will be given a BMRB id and a PDB id.  


#'''Run PSVS (truncated coordinates)'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1.&nbsp; Run PSVS (full length coordinates)'''  
 
       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       b. For optimal structures all Z-scores should be &gt; -5
       b. For optimal structures all Z-scores should be &gt; -5


#'''Run RPF (full length coordinates)'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.&nbsp; Run RPF (full length coordinates)'''  
 
       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       b. For optimal structures DPF &gt;.7
       b. For optimal structures DPF &gt;.7


#'''Coordinates, Constraint Lists, and Chemical Shifts to BMRB (NMR ONLY)'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.&nbsp; Coordinates, Constraint Lists, and Chemical Shifts to BMRB (NMR ONLY)'''  
 
       a. Go to http://deposit.bmrb.wisc.edu/bmrb-adit/ to initiate or update a deposition
       a. Go to http://deposit.bmrb.wisc.edu/bmrb-adit/ to initiate or update a deposition
       b. See SPINE target Record for Suggested authors
       b. See SPINE target Record for Suggested authors
Line 92: Line 123:
       d. After completion you will be given a BMRB and PDB id; BMRB will complete the PDB deposition for you.
       d. After completion you will be given a BMRB and PDB id; BMRB will complete the PDB deposition for you.


#'''FIDS and NOESY Peak Lists to BMRB (NMR ONLY)'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.&nbsp; FIDS and NOESY Peak Lists to BMRB (NMR ONLY)'''  
 
       a. http://www.bmrb.wisc.edu/
       a. http://www.bmrb.wisc.edu/
       b. Use ftp://ftp.bmrb.wisc.edu/ to complete anonymous ftp of your compressed data
       b. Use ftp://ftp.bmrb.wisc.edu/ to complete anonymous ftp of your compressed data
       c. Name file nesg_bmrbaccession.tar.gz
       c. Name file nesg_bmrbaccession.tar.gz


#'''Create NMR Record in SPINE'''
Alternatively, raw fids can be tar'ed and ftp'ed to BMRB&nbsp;using the SPINS database.
 
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5.&nbsp; Create NMR Record in SPINE'''  
 
       a. Go to http://www.spine.nesg.org
       a. Go to http://www.spine.nesg.org
       b. Tools -&gt; Basic Search -&gt; Enter Target ID
       b. Tools -&gt; Basic Search -&gt; Enter Target ID
Line 110: Line 145:


*HarvestDB/PDBstat cannot handle simplified pseudoatom nomenclature, that is, atom names such as HB of alanine instead of QB, or HD1 of leucine instead of QD1. Such nomenclature is standard in CARA, and can be used in CYANA v2.x and later with the <tt>pseudo=2</tt> setting. Make sure you convert such names into DYANA/CYANA or Xplor/CNS format before uploading.  
*HarvestDB/PDBstat cannot handle simplified pseudoatom nomenclature, that is, atom names such as HB of alanine instead of QB, or HD1 of leucine instead of QD1. Such nomenclature is standard in CARA, and can be used in CYANA v2.x and later with the <tt>pseudo=2</tt> setting. Make sure you convert such names into DYANA/CYANA or Xplor/CNS format before uploading.  
*HarvestDB currently uses AutoStructure v2.1.1 to calculate RPF scores. Thus, you should upload a control file compatible with AutoStructure v2.1.1, and prepare a combined 13C peaklist if you used separate 13Cali and 13Caro peaklists.
*HarvestDB currently uses AutoStructure v2.1.1 to calculate RPF scores. Thus, you should upload a control file compatible with AutoStructure v2.1.1, and prepare a combined <sup>13</sup>C peaklist if you used separate <sup>13</sup>Cali and <sup>13</sup>Caro peaklists.


==== '''Creating a Record Manually'''  ====
=== '''Creating a Record Manually'''  ===


#On the [http://spine.nesg.org spine web site] find your protein target in the database.  
#On the [http://spine.nesg.org spine web site] find your protein target in the database.  
Line 119: Line 154:
#Fill in the fields and click on "Update Entry".
#Fill in the fields and click on "Update Entry".


=== '''Using HarvestDB to Prepare PDB and BMRB Depositions (Under Construction):'''  ===
<br>


NMR depositions will run through HarvestDB. HarvestDB has the following major functions, A. Archive NMR files; B. Version tracking; C. PSVS analysis; D. Deposit to BMRB; E. Update SPiNE and Structure Gallery. (Main.DehuaHang)  
== '''Using HarvestDB to Prepare PDB and BMRB Depositions (Under Construction):'''  ==


#'''Submit NMR Protein Structure Information and Files to HarvestDB to Create Protein Record'''
NMR depositions will in the future run through HarvestDB. HarvestDB has the following major functions, A. Archive NMR files; B. Version tracking; C. PSVS analysis; D. Deposit to BMRB; E. Update SPiNE and Structure Gallery. (Main.DehuaHang)


      a. Complete web form: NESG target id, Protein id, version id, Swissprot id, total number of structures, NMR comments.
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1.&nbsp; Submit NMR Protein Structure Information and Files to HarvestDB to Create Protein Record'''
      a. Complete web form: Coordinates, constraint lists, chemical shift, NOESY peak lists
      a. After the web form is completed, HarvestDB emails the user a link to the new structure record 
      a. HarvestDB generates protein pictures: Small (80 by 80), Big static (300 by 300), Big  dynamic (300 by 300)
      a. HarvestDB pulls author list from SPiNE
      a. HarvestDB sets up NMR id, construct id, batch id from input PST id
      a. Users can update structure information and NMR files through HarvestDB


#'''Run PSVS, RPF Analysis through HarvestDB'''
      a. Complete web form: NESG target id, Protein id, version id, Swissprot id, total number of structures, NMR
        comments.
      b. Complete web form: Coordinates, constraint lists, chemical shift, NOESY peak lists
      c. After the web form is completed, HarvestDB emails the user a link to the new structure record 
      d. HarvestDB generates protein pictures: Small (80 by 80), Big static (300 by 300), Big  dynamic (300 by 300)
      e. HarvestDB pulls author list from SPiNE
      f. HarvestDB sets up NMR id, construct id, batch id from input PST id
      g. Users can update structure information and NMR files through HarvestDB
 
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.&nbsp; Run PSVS, RPF Analysis through HarvestDB'''  


       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       a. http://www-nmr.cabm.rutgers.edu/PSVS/
       a. HarvestDB sends Target id, Protein id, Coordinates, Constraint Lists to PSVS
       b. HarvestDB sends Target id, Protein id, Coordinates, Constraint Lists to PSVS
       a. HarvestDB receives zipped PSVS report from PSVS, parses the zipped HTML file to get the z-scores, and sends email to notify user
       c. HarvestDB receives zipped PSVS report from PSVS, parses the zipped HTML file to get the z-scores, and sends
       a. For optimal structures all Z-scores should be &gt; -5
        email to notify user
       a. For optimal structures DPF &gt;.7
       d. For optimal structures all Z-scores should be &gt; -5
       a. HarvestDB compares z-scores with previous NESG structure quality by scatter plots
       e. For optimal structures DPF &gt;.7
       f. HarvestDB compares z-scores with previous NESG structure quality by scatter plots


#'''NMR Structure File Version Tracking'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.&nbsp; NMR Structure File Version Tracking'''  


       a. HarvestDB duplicates current files and information to create newer version
       a. HarvestDB duplicates current files and information to create newer version
       a. Update files after refinement, tracks date and notes for each version
       b. Update files after refinement, tracks date and notes for each version


#'''Prepare NMRStar File and Coordinate file (mmCIF) through HarvestDB'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.&nbsp; Prepare NMRStar File and Coordinate file (mmCIF) through HarvestDB'''  


       a. HarvestDB pulls information from SPiNE and Swissprot site
       a. HarvestDB pulls information from SPiNE and Swissprot site
       a. HarvestDB collects information about molecular entity sequence, contact authors, title, citation, molecule, synthetic, sample conditions, spectrometer, experiment
       b. HarvestDB collects information about molecular entity sequence, contact authors, title, citation, molecule,
       a. HarvestDB generates NMRStar file and Coordinate file (by using pdb_extract)
        synthetic, sample conditions, spectrometer, experiment
       c. HarvestDB generates NMRStar file and Coordinate file (by using pdb_extract)


#'''HarvestDB Runs through BMRB to Initiate or Update Auto-deposition'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5.&nbsp; HarvestDB Runs through BMRB to Initiate or Update Auto-deposition'''  


       a. Send info: Submitter info, PI info
       a. Send info: Submitter info, PI info
       a. Send files: Coordinates, Constraint Lists and NMRStar file
       b. Send files: Coordinates, Constraint Lists and NMRStar file
       a. HarvestDB receives BMRB and PDB id, deposition date, deposition status from BMRB
       c. HarvestDB receives BMRB and PDB id, deposition date, deposition status from BMRB
       a. For a successful deposition: HarvestDB updates SPiNE to create the NMR record and sends a notification email to the user and PI  
       d. For a successful deposition: HarvestDB updates SPiNE to create the NMR record and sends a notification email to the user and PI  
       a. For a failed deposition: HarvestDB asks the user to modify and re-deposit
       e. For a failed deposition: HarvestDB asks the user to modify and re-deposit


#'''HarvestDB Updates Structure Gallery'''
'''&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 6.&nbsp; HarvestDB Updates Structure Gallery'''  


       a. http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp
       a. http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp
       a. Fix Header: HarvestDB asks user to fix the Title, Protein Name (NO Hypothetical ) and Author list (Last Author / PI name) of coordinates  
       b. Fix Header: HarvestDB asks user to fix the Title, Protein Name (NO Hypothetical ) and Author list (Last Author / PI name) of coordinates  
       a. Fix Protein Pictures
       c. Fix Protein Pictures
       a. Send info: BMRB and PDB id
       d. Send info: BMRB and PDB id
       a. Send files: Three Pictures, Coordinates, Constraints, NMRStar file, Zipped PSVS report
       e. Send files: Three Pictures, Coordinates, Constraints, NMRStar file, Zipped PSVS report
       a. Structure Gallery returns structure link to HarvestDB
       f. Structure Gallery returns structure link to HarvestDB
       a. HarvestDB sends a notification email to the user and PI  
       g. HarvestDB sends a notification email to the user and PI  


<br>  
<br>  


=== '''Scripts'''  ===
== '''Scripts'''  ==


==== '''patch2.sh'''  ====
=== '''patch2.sh'''  ===


Unix shell script.  
Unix shell script.  
 
<pre>#!/bin/sh
<nowiki>
#!/bin/sh
maxit-v8.01-O -i ordered -o 52
maxit-v8.01-O -i ordered -o 52
sed "s/O''/OXT/g" ordered.pdb > deposit.pdb
sed "s/O''/OXT/g" ordered.pdb &gt; deposit.pdb
</nowiki>  
</pre>  
 
*Runs [http://sw-tools.pdb.org/apps/MAXIT/index.html maxit] to correct atom name nomenclature.  
*Runs [http://sw-tools.pdb.org/apps/MAXIT/index.html maxit] to correct atom name nomenclature.  
*Sets proper names for the terminal -COO groups.
*Sets proper names for the terminal -COO groups.
Line 190: Line 227:
<br>  
<br>  


==== '''patch.sh'''  ====
=== '''patch.sh'''  ===


Unix shell script.  
Unix shell script.  
 
<pre>#!/bin/sh
<nowiki>
sed 's/ARG+/ARG /g; s/LYS+/LYS /g; s/HIS+/HIS /g; s/1HT/ H1/; s/2HT/ H2/; s/3HT/ H3/; s/OT1/O  /g; s/OT2/OXT/g' tmp.pdb &gt; fit
#!/bin/sh
sed 's/ARG+/ARG /g; s/LYS+/LYS /g; s/HIS+/HIS /g; s/1HT/ H1/; s/2HT/ H2/; s/3HT/ H3/; s/OT1/O  /g; s/OT2/OXT/g' tmp.pdb > fit
maxit-v8.01-O -i fit -o 52
maxit-v8.01-O -i fit -o 52
chainsub.py fit.pdb
chainsub.py fit.pdb
sed "/SEQRES/d; s/1H / H1/g; s/2H / H2/g; s/3H / H3/g; s/O''/OXT/g" fit.chainsub.pdb > deposit.pdb
sed "/SEQRES/d; s/1H / H1/g; s/2H / H2/g; s/3H / H3/g; s/O''/OXT/g" fit.chainsub.pdb &gt; deposit.pdb
</nowiki>  
</pre>  
 
*Removes the plus sign from ARG+, LYS+ and HIS+ residues  
*Removes the plus sign from ARG+, LYS+ and HIS+ residues  
*Runs [http://sw-tools.pdb.org/apps/MAXIT/index.html maxit] to correct atom name nomenclature.  
*Runs [http://sw-tools.pdb.org/apps/MAXIT/index.html maxit] to correct atom name nomenclature.  
*Runs <tt>chainsub.py</tt> to add chain identifiers.  
*Runs <tt>chainsub.py</tt> to add chain identifiers.  
*Sets proper names for the terminal -NH3 and -COO groups.
*Sets proper names for the terminal -NH3 and -COO groups.<br>


==== '''pdbfit.mac'''  ====
<br>


Macro for MOLMOL:
=== '''pdbfit.mac'''  ===


<nowiki>
Macro for MOLMOL:<br>  
# Initialize
<pre># Initialize
InitAll yes
InitAll yes
# Replace with your input pdb file
# Replace with your input pdb file
Line 224: Line 258:
# Write PDB
# Write PDB
WritePdb tmp.pdb
WritePdb tmp.pdb
System "patch.sh"
System "patch.sh"</pre>  
</nowiki>  
 
When using this file:  
When using this file:  


Line 232: Line 264:
*Set the residue range used to superimpose the bundle. Secondary structure elements from PSVS are a good choice.
*Set the residue range used to superimpose the bundle. Secondary structure elements from PSVS are a good choice.


==== '''bmrb_deposit.cya'''  ====
<br>


CYANA 2.1 macro to generate a BMRB file for deposition.  
=== '''bmrb_deposit.cya'''  ===


<nowiki>
CYANA 2.1 macro to generate a BMRB file for deposition.<br>  
tolerance:=0.02, 0.05, 0.4
<pre>tolerance:=0.02, 0.05, 0.4
read prot $name
read prot $name
stereofound
stereofound
deposit bmrb=$name
deposit bmrb=$name
</nowiki>  
</pre>  
 
*The first value in the <tt>tolerance</tt> list is the <sup>1</sup>H chemical shift tolerance. The last value is the <sup>13</sup>C/<sup>15</sup>N tolerance. The second value is ignored.  
*The first value in the <tt>tolerance</tt> list is the 1H chemical shift tolerance. The last value is the 13C/15N tolerance. The second value is ignored.  
*Here <tt>stereofound</tt> declares stereospecific assignments so that the ambiguity codes are set appropriately in the resulting BMRB file.
*Here <tt>stereofound</tt> declares stereospecific assignments so that the ambiguity codes are set appropriately in the resulting BMRB file.
 
<br>
<br>  
<br>  


-- Main.GaohuaLiu - 17 Feb 2007
*[[Media:Pdbfit.mac|pdbfit.mac]]: MOLMOL macro to superimpose and convert the PDB bundle
 
*[[NESG:%ATTACHURL%/pdbfit.mac|pdbfit.mac]]: MOLMOL macro to superimpose and convert the PDB bundle


*[[NESG:%ATTACHURL%/patch.sh|patch.sh]]: shell script making small modifications to PDB files
*[[Media:PBD_patch.sh|patch.sh]]: shell script making small modifications to PDB files


*[[NESG:%ATTACHURL%/bmrb dep.cya|bmrb_dep.cya]]: CYANA script to prepare a BMRB file for deposition
*[[Media:Bmrb_dep.cya|bmrb_dep.cya]]: CYANA script to prepare a BMRB file for deposition


*[[NESG:%ATTACHURL%/patch2.sh|patch2.sh]]: shell script to fix nomenclature of PDB files
*[[Media:PBD_patch2.sh|patch2.sh]]: shell script to fix nomenclature of PDB files

Latest revision as of 16:09, 22 September 2010

Introduction

In this section we describe the NESG SOP for NMR PDB/BMRB depositions, including preparation of files for deposition and creating a SPiNE NMR record.

The latter has been replaced by HarvestDB.  Also, as of Dec. 2008, deposition of NMR data and PDB coordinates is conducted through the ADIT-NMR server.


Preparing files for PDB deposition

Note: Truncated coordinates:  In the past (PSI-1), researchers deposited NMR structures after removing disordered or poorly defined regions. RPF analysis results based on truncated coordinates are generally poorer than results based on full-length coordinates.  Therefore, the policy adopted throughout the NESG NMR labs is to deposit coordinates for all residues that have NMR assignments.

Files deposited

  • PDB coordinate file - required.
  • Constraint files used in the calculation - required.
    • NOE distance constraints.
    • Dihedral angle constraints (e.g. from TALOS).
    • Hydrogen bond constraints
    • RDC constraints

Please deposit the constraint files that were used to generate the deposited coordinates in the latest refinement cycle. For example, if constrained refinement in an explicit water bath was performed with CNS, the corresponding constraints in CNS format should be deposited.

Prior to deposition, individual conformers should be superimposed to minimize the backbone atom RMSD of the folded region. Also, since structure calculation programs such as CYANA or CNS/XPLOR use custom atom nomenclature, the PDB coordinate file has to be converted to conform to the RCSB nomenclature. The procedure below describes how this can be done with PDBStat and MAXIT.

Converting PDB file with PDBStat and MAXIT

Start PDBStat and enter the following commands:

  read coor pdb All_ZZZ_cns.pdb             #read file with concatenated CNS pdb files
  all                                       #select all the models
  classify                                  #classify the models by energy
  order 0.9                                 #determine ordered residues; phi/psi cut-off 0.9
  rmsd best backbone                        #backbone rmsd
  [return]                                  #creates an rmsd output file
  write coor pdb overlay.pdb                #write overlaid coordinates 

You can optionally choose the desired orientation for the resulting molecular bundle. Open the overlay.pdb in MOLMOL, find the desired orientation and save with File -> Write Transform..., for example, as rotation_matrix.mac. Start PDBStat again and enter the following commands:

  read coor pdb overlay.pdb                 #read the superimposed bundle written above
  all                                       #select all the models
  rotate file rotation_matrix.mac           #apply rotation matrix
  write coor pdb ordered                    #write rotated coordinates 

In a Unix shell run MAXIT to convert atom nomenclature to the PDB standard:

   maxit-v8.01-O -i ordered -o 52

The resulting ordered.pdb file only requires renaming oxygen atoms of C-terminal -COO groups using sed or a text editor.

   sed "s/O''/OXT/g" ordered.pdb > deposit.pdb
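Before uploading, it can be worth confirming that the renaming took effect. The following is only a sketch and is not one of the NESG scripts: it assumes standard fixed-column PDB ATOM/HETATM records (atom name in columns 13-16), and the helper name count_atom_name is hypothetical. It counts how many atoms carry a given name, so a non-zero count for O'' would suggest the sed step missed something.

```shell
#!/bin/sh
# Sketch (assumption: fixed-column PDB records, atom name in columns 13-16).
# count_atom_name NAME FILE
#   Prints the number of ATOM/HETATM records whose atom name equals NAME.
count_atom_name() {
    awk -v name="$1" '
        /^(ATOM  |HETATM)/ {
            atom = substr($0, 13, 4)   # atom name field
            gsub(/ /, "", atom)        # drop column padding
            if (atom == name) n++
        }
        END { print n + 0 }            # print 0 when nothing matched
    ' "$2"
}

# Example calls (deposit.pdb is the file written by the sed command above):
#   count_atom_name "O''" deposit.pdb
#   count_atom_name OXT deposit.pdb
```

For a single-chain protein one would typically expect the OXT count to equal the number of conformers in the bundle, and the O'' count to be zero.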

Precheck and Validation

In your web browser go to the RCSB validation server,

  1. Select NMR experimental method and upload your PDB file.
  2. Run Precheck. Make sure that there are no errors reported.
  3. Continue to Validation. Examine the Validation summary letter. Pay attention to
     a. Close contacts.
     b. Bond distances and angles.
     c. Torsion angles.
     d. Hydrogen nomenclature.
     e. Missing atoms. (It is OK to have missing labile hydrogens of Asp, Glu and neutral His side chains. If other 
        atom types are listed as missing, either the PDB file is incomplete or there is an issue with atom
        nomenclature.)
     f. Extra atoms.

Preparing files for BMRB deposition

Files deposited:

  • Chemical shifts (BMRB file) - required.
  • NOESY peaklists - required.
  • Raw NMR Data - NOESY FIDs required. Because of their size, these are usually uploaded as a single archive to the FTP server of BMRB after the deposition is submitted.
  1. Create a new directory (like structure/deposit/bmrb). Copy the following files from the last CYANA 2.1 manual structure calculation:
     • init.cya
     • XXXX.seq
     • XXXX.prot
     • Stereospecific assignment file (e.g. finalstereo.cya)

     Make sure you have added the missing CG2 atoms of Val and CD2 atoms of Leu, Phe and Tyr in residues where they are degenerate with CG1 and CD1, respectively. The proton list XXXX.prot should have stereoassigned atoms swapped. The stereospecific assignment file is needed to properly set the ambiguity codes in the resulting BMRB file.
  2. Download the bmrb_dep.cya script and set the tolerances appropriate for your project (see below).
  3. Start CYANA 2.1 and run the bmrb_dep.cya script. It should produce a file named XXXX.bmrb.
  4. You may have to rename non-standard residues in XXXX.bmrb, such as HIST or HIS+ to HIS, and cPRO to PRO. Use any text editor.
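The file-staging step above can be sketched as a small shell helper. This is an illustration rather than an NESG script: the function name stage_bmrb_files and its argument order are hypothetical, and the default stereo file name finalstereo.cya is simply the example given above. It copies the four input files into the deposition directory and reports any that are missing.

```shell
#!/bin/sh
# Sketch: stage CYANA files for preparing the BMRB deposition.
# stage_bmrb_files SRC_DIR DEST_DIR NAME [STEREO_FILE]
#   SRC_DIR      directory of the last manual CYANA calculation
#   DEST_DIR     e.g. structure/deposit/bmrb
#   NAME         base name of the .seq/.prot files (XXXX above)
#   STEREO_FILE  stereospecific assignment file (default: finalstereo.cya)
# Returns non-zero if any expected file is missing.
stage_bmrb_files() {
    src=$1; dest=$2; name=$3; stereo=${4:-finalstereo.cya}
    mkdir -p "$dest" || return 1
    missing=0
    for f in init.cya "$name.seq" "$name.prot" "$stereo"; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dest/"
        else
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example:
#   stage_bmrb_files . structure/deposit/bmrb XXXX
```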


Creating a BMRB file from CYANA


  read prot XXXX-final.prot                 # read the latest atom list
  finalstereo                               # stereospecific assignments (to set proper ambiguity codes)
  translate bmrb                            # use BMRB nomenclature
  pseudo=2                                  # use H* labels for pseudoatoms
  write bmrb XXXX.bmrb                      # write out the BMRB file  

Using any text editor rename all non-standard residues, such as HIST or HIS+ to HIS, and cPRO to PRO in the XXXX.bmrb file.
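As an alternative to editing by hand, the renaming can be scripted with sed. The sketch below is an assumption-laden illustration, not part of the NESG scripts: it treats the file as whitespace-delimited (shortening HIST to HIS shifts the rest of that line by one character, which would matter only for fixed-width parsers), it handles only the three names mentioned above, and the function name fix_bmrb_residues is hypothetical.

```shell
#!/bin/sh
# Sketch: normalize non-standard residue names in a BMRB file.
# Reads the file on stdin and writes the fixed version to stdout.
# Only whole whitespace-delimited tokens are rewritten, so fields that
# merely contain these letters are left untouched.
fix_bmrb_residues() {
    sed -E \
        -e 's/(^|[[:space:]])(HIST|HIS\+)([[:space:]]|$)/\1HIS\3/g' \
        -e 's/(^|[[:space:]])cPRO([[:space:]]|$)/\1PRO\2/g'
}

# Example (XXXX is the placeholder name used above):
#   fix_bmrb_residues < XXXX.bmrb > XXXX.fixed.bmrb
```

Writing to a new file rather than editing in place keeps the original available for comparison with diff.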

Creating an NMR structure record in SPINE

Using HarvestDB to create a record

NMR depositions will run through BMRB. There is no need to use PDB-ADIT; after using BMRB-ADIT you will be given a BMRB id and a PDB id.

      1.  Run PSVS (full length coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures all Z-scores should be > -5

      2.  Run RPF (full length coordinates)

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. For optimal structures DPF >.7

      3.  Coordinates, Constraint Lists, and Chemical Shifts to BMRB (NMR ONLY)

     a. Go to http://deposit.bmrb.wisc.edu/bmrb-adit/ to initiate or update a deposition
     b. See SPINE target Record for Suggested authors
     c. PDB/BMRB Title should include "Northeast Structural Genomics Consortium Target XXXNN".
     d. After completion you will be given a BMRB and PDB id; BMRB will complete the PDB deposition for you.

      4.  FIDS and NOESY Peak Lists to BMRB (NMR ONLY)

     a. http://www.bmrb.wisc.edu/
     b. Use ftp://ftp.bmrb.wisc.edu/ to complete anonymous ftp of your compressed data
     c. Name file nesg_bmrbaccession.tar.gz

Alternatively, raw fids can be tar'ed and ftp'ed to BMRB using the SPINS database.
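
The archive for step 4 can be built with standard tools; a minimal Python sketch follows. It assumes that "bmrbaccession" in the file name stands for the BMRB accession number assigned in step 3. The function name and its arguments are hypothetical, and the upload by anonymous ftp is not shown.

```python
import tarfile
from pathlib import Path


def make_bmrb_archive(data_dir: str, accession: str, out_dir: str = ".") -> Path:
    """Pack data_dir (FIDs, NOESY peak lists) into nesg_<accession>.tar.gz."""
    archive = Path(out_dir) / f"nesg_{accession}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store the directory under its own name rather than its full path.
        tar.add(data_dir, arcname=Path(data_dir).name)
    return archive
```

Write the archive to a location outside the data directory so that it is not packed into itself.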

      5.  Create NMR Record in SPINE

     a. Go to http://www.spine.nesg.org
     b. Tools -> Basic Search -> Enter Target ID
     c. Summary Page will appear -> click on your target
     d. Scroll to bottom of target Record and click corresponding Purification Batch 
     e. Scroll to bottom of Purification Record and click NMR 
     f. Complete web form
     g. Archive coordinates, structure factors, chemical shifts in SPINE
     h. Archive NOESY peak lists in SPINE
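
The acceptance criteria in steps 1 and 2 are easy to check once the scores have been read off the PSVS and RPF reports. The sketch below is a hypothetical helper, not a PSVS or HarvestDB function; the score names are placeholders and the values have to be entered by hand.

```python
Z_SCORE_MIN = -5.0  # all PSVS Z-scores should be above this value
DPF_MIN = 0.7       # RPF threshold quoted in step 2


def quality_problems(z_scores: dict, dpf: float) -> list:
    """Return a list of readable problems; the list is empty if all thresholds are met."""
    problems = [
        f"Z-score {name} = {value} is not > {Z_SCORE_MIN}"
        for name, value in z_scores.items()
        if not value > Z_SCORE_MIN
    ]
    if not dpf > DPF_MIN:
        problems.append(f"DPF = {dpf} is not > {DPF_MIN}")
    return problems
```

An empty result means the structure meets both criteria; any returned message names the score that needs attention before deposition.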

When archiving a project in HarvestDB, be aware of the following:

  • HarvestDB/PDBstat cannot handle simplified pseudoatom nomenclature, that is, atom names such as HB of alanine instead of QB, or HD1 of leucine instead of QD1. Such nomenclature is standard in CARA, and can be used in CYANA v2.x and later with the pseudo=2 setting. Make sure you convert such names into DYANA/CYANA or Xplor/CNS format before uploading.
  • HarvestDB currently uses AutoStructure v2.1.1 to calculate RPF scores. Thus, you should upload a control file compatible with AutoStructure v2.1.1, and prepare a combined 13C peaklist if you used separate 13Cali and 13Caro peaklists.
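
As an illustration of the pseudoatom point, the sketch below maps simplified pseudoatom names to DYANA/CYANA pseudoatom names. Only the two cases named above (HB of alanine, HD1 of leucine) are included. The table layout and the function name are assumptions made for this example, and a complete table for all residue types would have to be added before real use.

```python
# (residue, simplified name) -> DYANA/CYANA pseudoatom name.
# Only the two examples from the text; extend for other residues and groups.
PSEUDOATOM_NAMES = {
    ("ALA", "HB"): "QB",
    ("LEU", "HD1"): "QD1",
}


def to_cyana_pseudoatom(residue: str, atom: str) -> str:
    """Return the CYANA pseudoatom name, or the atom name unchanged if not listed."""
    return PSEUDOATOM_NAMES.get((residue.upper(), atom.upper()), atom)
```

The lookup is keyed by residue type as well as atom name because the same atom name can denote a single proton in one residue and a methyl group in another.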

Creating a Record Manually

  1. On the spine web site find your protein target in the database.
  2. Go to the protein sample tube record (usually the NC or NC5 sample)
  3. At the bottom of the sample tube page there is a line "Create new structure record: (HSQC) (NMR) (Xray)". Click on "NMR" - you'll be asked for a user name and password.
  4. Fill in the fields and click on "Update Entry".


Using HarvestDB to Prepare PDB and BMRB Depositions (Under Construction):

In the future, NMR depositions will run through HarvestDB. HarvestDB has the following major functions: (A) archive NMR files; (B) version tracking; (C) PSVS analysis; (D) deposit to BMRB; (E) update SPiNE and the Structure Gallery. (Main.DehuaHang)

      1.  Submit NMR Protein Structure Information and Files to HarvestDB to Create Protein Record

     a. Complete web form: NESG target id, Protein id, version id, Swissprot id, total number of structures, NMR
        comments.
     b. Complete web form: Coordinates, constraint lists, chemical shift, NOESY peak lists
     c. After the web form is completed, HarvestDB emails the user a link to the new structure record
     d. HarvestDB generates protein pictures: Small (80 by 80), Big static (300 by 300), Big  dynamic (300 by 300)
     e. HarvestDB pulls author list from SPiNE
     f. HarvestDB sets up the NMR id, construct id and batch id from the input PST id
     g. Users can update structure information and NMR files through HarvestDB

      2.  Run PSVS, RPF Analysis through HarvestDB

     a. http://www-nmr.cabm.rutgers.edu/PSVS/
     b. HarvestDB sends Target id, Protein id, Coordinates, Constraint Lists to PSVS
     c. HarvestDB receives the zipped PSVS report from PSVS, parses the zipped HTML file to get the Z-scores, and sends
        an email to notify the user
     d. For optimal structures all Z-scores should be > -5
     e. For optimal structures DPF > 0.7
     f. HarvestDB compares Z-scores with previous NESG structure quality by scatter plots

      3.  NMR Structure File Version Tracking

     a. HarvestDB duplicates current files and information to create newer version
     b. Update files after refinement; HarvestDB tracks the date and notes for each version

      4.  Prepare NMRStar File and Coordinate file (mmCIF) through HarvestDB

     a. HarvestDB pulls information from SPiNE and Swissprot site
     b. HarvestDB collects information about molecular entity sequence, contact authors, title, citation, molecule,
        synthetic, sample conditions, spectrometer, experiment
     c. HarvestDB generates NMRStar file and Coordinate file (by using pdb_extract)

      5.  HarvestDB Runs through BMRB to Initiate or Update Auto-deposition

     a. Send info: Submitter info, PI info
     b. Send files: Coordinates, Constraint Lists and NMRStar file
     c. HarvestDB receives BMRB and PDB id, deposition date, deposition status from BMRB
     d. For a successful deposition: HarvestDB updates SPiNE to create the NMR record and sends a notification email to the user and PI
     e. For a failed deposition: HarvestDB asks the user to modify and re-deposit

      6.  HarvestDB Updates Structure Gallery

     a. http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp
     b. Fix Header: HarvestDB asks the user to fix the Title, Protein Name (no "Hypothetical") and Author list (Last Author / PI name) of the coordinates
     c. Fix Protein Pictures
     d. Send info: BMRB and PDB id
     e. Send files: Three Pictures, Coordinates, Constraints, NMRStar file, Zipped PSVS report
     f. Structure Gallery returns structure link to HarvestDB
     g. HarvestDB sends a notification email to the user and PI


Scripts

patch2.sh

Unix shell script.

#!/bin/sh
maxit-v8.01-O -i ordered -o 52
sed "s/O''/OXT/g" ordered.pdb > deposit.pdb
  • Runs maxit to correct atom name nomenclature.
  • Sets proper names for the terminal -COO groups.


patch.sh

Unix shell script.

#!/bin/sh
sed 's/ARG+/ARG /g; s/LYS+/LYS /g; s/HIS+/HIS /g; s/1HT/ H1/; s/2HT/ H2/; s/3HT/ H3/; s/OT1/O  /g; s/OT2/OXT/g' tmp.pdb > fit
maxit-v8.01-O -i fit -o 52
chainsub.py fit.pdb
sed "/SEQRES/d; s/1H / H1/g; s/2H / H2/g; s/3H / H3/g; s/O''/OXT/g" fit.chainsub.pdb > deposit.pdb
  • Removes the plus sign from ARG+, LYS+ and HIS+ residues
  • Runs maxit to correct atom name nomenclature.
  • Runs chainsub.py to add chain identifiers.
  • Sets proper names for the terminal -NH3 and -COO groups.
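
For readers unfamiliar with sed, the first substitution command of patch.sh can be mirrored in Python as below. This is a sketch for illustration only, not a replacement for the script: it omits the maxit and chainsub.py steps and the final sed command. Each name is replaced by one of the same width, so the fixed-column PDB layout is preserved.

```python
# (pattern, replacement, replace_all) in the order used by patch.sh.
# replace_all mirrors the presence of the sed 'g' flag on each rule.
RULES = [
    ("ARG+", "ARG ", True),
    ("LYS+", "LYS ", True),
    ("HIS+", "HIS ", True),
    ("1HT", " H1", False),
    ("2HT", " H2", False),
    ("3HT", " H3", False),
    ("OT1", "O  ", True),
    ("OT2", "OXT", True),
]


def patch_line(line: str) -> str:
    """Apply the first sed command of patch.sh to a single PDB line."""
    for old, new, replace_all in RULES:
        if replace_all:
            line = line.replace(old, new)
        else:
            line = line.replace(old, new, 1)  # first occurrence only
    return line
```

Applying patch_line to every line of tmp.pdb gives the same text that the first sed command writes to the file named fit.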


pdbfit.mac

Macro for MOLMOL:

# Initialize
InitAll yes
# Replace with your input pdb file
ReadPdb All_STR_cns.pdb
# Select secondary structure elements
SelectAtom ':7-9,21-27,31-38,41-48@CA'
Fit to_first
# Remove pseudoatoms
SelectAtom '@Q*'
RemoveAtom
# Write PDB
WritePdb tmp.pdb
System "patch.sh"

When using this file:

  • Replace the input PDB file name for the ReadPdb command.
  • Set the residue range used to superimpose the bundle. Secondary structure elements from PSVS are a good choice.
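
The residue selection passed to SelectAtom can be generated from a list of secondary structure ranges, for example those reported by PSVS. The helper below is a hypothetical convenience function, not part of MOLMOL; it only reproduces the string format used in the macro above.

```python
def molmol_ca_selection(ranges):
    """Build a selection such as ':7-9,21-27@CA' from (first, last) residue pairs."""
    for first, last in ranges:
        if first > last:
            raise ValueError(f"invalid residue range {first}-{last}")
    spans = ",".join(f"{first}-{last}" for first, last in ranges)
    return f":{spans}@CA"
```

Called with the ranges (7, 9), (21, 27), (31, 38) and (41, 48), it returns the selection string used in pdbfit.mac; paste the result between the single quotes of the SelectAtom line.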


bmrb_deposit.cya

CYANA 2.1 macro to generate a BMRB file for deposition.

tolerance:=0.02, 0.05, 0.4
read prot $name
stereofound
deposit bmrb=$name
  • The first value in the tolerance list is the 1H chemical shift tolerance. The last value is the 13C/15N tolerance. The second value is ignored.
  • Here stereofound declares the stereospecific assignments so that the ambiguity codes are set appropriately in the resulting BMRB file.



  • pdbfit.mac: MOLMOL macro to superimpose and convert the PDB bundle
  • patch.sh: shell script making small modifications to PDB files
  • bmrb_dep.cya: CYANA script to prepare a BMRB file for deposition
  • patch2.sh: shell script to fix nomenclature of PDB files