Revision as of 16:44, 30 October 2009
PdbStat
The PdbStat program, written and actively developed by our colleague Roberto Tejero (Universitat de València), is routinely used in the Montelione lab to analyze, convert, and manipulate coordinate and constraint files for protein structure determination. The program is also an integral component of the Protein Structure Validation Software (PSVS) package (Ref. 1) used across the NESG consortium.
The current version of the software is PdbStat 5.1 (July 2008). There is no separate publication for PdbStat; we cite it as Bhattacharya et al., 2007 (Ref. 1). There is also no official manual. Below are a number of basic stand-alone PdbStat commands and common uses, and we will add to this document as more applications are developed.
Anyone interested in obtaining the latest version of PdbStat can contact Roberto at roberto.tejero@uv.es.
COMMANDS AND APPLICATIONS
I. Starting the program
pdbstat
II. Menu of Commands
menu lists the commands and keywords recognized by PdbStat (the same list is also shown by the command: help command)
Here are the recognized PdbStat commands for version 5.1:
PdbStat> menu
Commands/Keywords currently recognized by PDBSTAT
The commands/keywords currently recognized by PDBSTAT are given below.
Type "help <command>" for more information on each PDBSTAT function.
align analyze author bye class clear close chain check chiral delete debug dump energy evaluate find fit fix get help history hydrogen hyper kabsch ident/chain info initialize log/change load menu missing order phi quit rama read relax reset restore rotate save see set show rmsd to trans version what write
homology commands: get change homology homa
Other kind/commands/keywords of help
codes(amino codes) amino(amino geometry)
III. Help documents for specific commands
Each command has a short help document describing its usage and options.
Syntax:
help {topic} {subtopic}
Example:
PdbStat> help read
Read
Syntax: rea[d] {<arg1>} {<arg2>} {<arg3>}
arg1 can be: coo[rdinates] or con[straints] or seq[uence]
arg2 can be: pdb or con[gen] or cha[rmm] or dis[man]
arg3 can be: the name of the file.
Examples:
read coord pdb filename.pdb
read seq filename
read cons congen filename
read cons diana filename
Read coordinates/constraints/sequence in specified format from file.
This command is able to open the file read it locating the number of structures stored and then read them. The user needs to type only <read> then a interactive cycle with questions begins. The program offers the different formats available, so after read is typed the program offers <coords/cons/seq> and then after that the program offers the different formats <pdb,disman,....>
IV. Preparation of final CNS coordinates for PDB deposition
Before depositing the final coordinates, we order the models by increasing conformational energy (lowest energy first). We also superimpose the models on the backbone atoms of ordered residues only [S(phi) + S(psi) > 1.8]. Finally, we rotate the ensemble to the desired orientation, which is the orientation users will see when they download our coordinates from the Protein Data Bank (www.rcsb.org). The final coordinates are saved both in the original (CNS) format and in IUPAC format for RPF analysis (Ref. 2).
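The ordered-residue criterion above can be illustrated with the circular order parameter for a dihedral angle. The sketch below assumes the usual definition S = |Σ_k (cos θ_k, sin θ_k)| / N over the N models, so S is 1 when all models agree and close to 0 when the angle is scattered. The function names and the per-residue input layout are hypothetical and are not PdbStat's internals.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of PdbStat.


def order_parameter(angles_deg: Sequence[float]) -> float:
    """Circular order parameter S of one dihedral across N models.

    S = |sum_k (cos t_k, sin t_k)| / N: 1 when all models agree,
    near 0 when the angle is spread uniformly.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s) / n


def ordered_residues(phi_psi: Dict[int, Tuple[List[float], List[float]]],
                     threshold: float = 1.8) -> List[int]:
    """Residue numbers with S(phi) + S(psi) > threshold.

    phi_psi maps residue number -> (phi angles, psi angles), one value per
    model (hypothetical input layout).
    """
    return [res for res, (phis, psis) in sorted(phi_psi.items())
            if order_parameter(phis) + order_parameter(psis) > threshold]


if __name__ == "__main__":
    ensemble = {
        10: ([-60.0, -62.0, -58.0], [-45.0, -40.0, -47.0]),  # well defined
        11: ([-60.0, 60.0, 180.0], [150.0, -30.0, 60.0]),    # disordered
    }
    print(ordered_residues(ensemble))
```

Averaging unit vectors rather than raw angles keeps values such as 179° and -179° from being treated as far apart.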
To prepare the final coordinates file for PDB deposition use the following protocol:
PdbStat:
rea coo pdb [filename] #read file with concatenated CNS pdb files
all #select all the models
class #classify the models by energy
order 0.9 #determine ordered residues; phi/psi cut-off 0.9
rmsd best backbone #backbone rmsd
[return] #creates an rmsd output file
write coo pdb [overlayed file] #write overlayed coordinates
Next, open the overlaid coordinates in Molmol and rotate the ensemble to the desired orientation. Use the writetransform command to write the rotation matrix to a file.
Back in PdbStat:
rea coo pdb [overlayed.pdb] #read the overlayed coordinates (all models)
rotate file [filename] #apply rotation matrix
write coo pdb [final.pdb] #write new coordinates
to iupac #converts atom nomenclature to IUPAC
write coo pdb [final_iupac.pdb] #write IUPAC coordinates for RPF analysis
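The rotate file step applies a rigid-body rotation to every model. A minimal sketch, assuming the rotation matrix R (and an optional translation t) have already been read from the Molmol transform file: the file format itself is not parsed here, whether PdbStat rotates about the origin or about the ensemble centroid is not stated above, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of PdbStat or Molmol.
Vec = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def rot_z(degrees: float) -> List[List[float]]:
    """Rotation matrix for a rotation of `degrees` about the z axis."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def is_rotation(rot: Matrix, tol: float = 1e-6) -> bool:
    """True if R is orthonormal with determinant +1 (a proper rotation)."""
    for i in range(3):
        for j in range(3):
            dot = sum(rot[i][k] * rot[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (rot[0][0] * (rot[1][1] * rot[2][2] - rot[1][2] * rot[2][1])
           - rot[0][1] * (rot[1][0] * rot[2][2] - rot[1][2] * rot[2][0])
           + rot[0][2] * (rot[1][0] * rot[2][1] - rot[1][1] * rot[2][0]))
    return abs(det - 1.0) < tol


def rotate_coords(coords: Sequence[Vec], rot: Matrix,
                  trans: Vec = (0.0, 0.0, 0.0)) -> List[Vec]:
    """Apply x' = R x + t to every atom; R is a row-major 3x3 matrix."""
    if not is_rotation(rot):
        raise ValueError("matrix is not a proper rotation")
    return [tuple(rot[i][0] * x + rot[i][1] * y + rot[i][2] * z + trans[i]
                  for i in range(3))
            for x, y, z in coords]
```

Because the transformation is rigid, inter-atomic distances and the RMSD among models are unchanged; only the orientation seen after download changes.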
V. Selecting specific models / residues / atoms
A powerful option in PdbStat is the ability to select specific residues and/or atoms for further analysis. This is extremely useful for superposition and RMSD evaluation of selected residues/atoms.
Syntax and Examples:
#syntax
sel[ect] {model(s)} {residue(s)} {atom(s)}
#select backbone atoms of residues 5-25,30-50 in models 1,3-5,7-10
sele 1,3-5,7-10 5-25,30-50 backbone
#select N,C,CA,O atoms of residues 5-50 in all models
sele * 5-50 n,c,ca,o
#combine selection with superposition
sele * 5-50,60-85 *
rmsd sele backbone
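The select syntax combines three filters (models, residues, atoms). A minimal sketch of how such a selection might be parsed and applied, assuming positive residue numbers, * as a wildcard, and a hypothetical keyword table in which backbone means N, CA, C (PdbStat's own definition may differ). All function names are illustrative.

```python
from typing import Optional, Set, Tuple

# Hypothetical keyword table: PdbStat's own definition of "backbone" may differ.
ATOM_KEYWORDS = {"backbone": {"N", "CA", "C"}}

Selection = Tuple[Optional[Set[int]], Optional[Set[int]], Optional[Set[str]]]


def parse_ranges(spec: str) -> Optional[Set[int]]:
    """Parse '1,3-5,7-10' into {1, 3, 4, 5, 7, 8, 9, 10}; '*' means all (None)."""
    if spec == "*":
        return None
    result: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(v) for v in part.split("-", 1))
            result.update(range(lo, hi + 1))
        else:
            result.add(int(part))
    return result


def parse_atoms(spec: str) -> Optional[Set[str]]:
    """Parse 'n,c,ca,o' (case-insensitive), a keyword, or '*' (None = all)."""
    if spec == "*":
        return None
    if spec.lower() in ATOM_KEYWORDS:
        return set(ATOM_KEYWORDS[spec.lower()])
    return {name.strip().upper() for name in spec.split(",")}


def parse_select(command: str) -> Selection:
    """Split 'sele {models} {residues} {atoms}' into three filters."""
    _, models, residues, atoms = command.split()
    return parse_ranges(models), parse_ranges(residues), parse_atoms(atoms)


def is_selected(sel: Selection, model: int, resnum: int, atom: str) -> bool:
    """True if the atom passes all three filters."""
    models, residues, atoms = sel
    return ((models is None or model in models)
            and (residues is None or resnum in residues)
            and (atoms is None or atom.upper() in atoms))
```

Returning None for a wildcard keeps "all" distinct from an empty selection.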
VI. Conversion of coordinate and constraint formats
In the course of a structure refinement we regularly have to convert between the coordinate and constraint formats used by different structure programs (e.g., CYANA, XPLOR/CNS, ECEPP). We routinely use PdbStat for this.
Examples:
# converting CYANA to XPLOR/CNS coordinates:
rea coor pdb [CYANA.pdb] #read CYANA models (all)
to xplor #converts to xplor; fixes stereospecific labels
write #answer the interactive questions
> WRITER: Output file name ?:_ 4cns.pdb
> WRITER: Coords, constraints, aco or sequence file? :_ coor
> WRITER: ... backbone, heavy, full set? (back/heavy/all/select): all
> WRITER: What model ?_ : all
> WRITER: -- All models to be written
> COORD_writer: Format (pdb/congen/RasMol) ? : pdb
# converting CYANA to XPLOR/CNS distance constraints:
rea coor pdb [CYANA.pdb] #read CYANA models; you have to read in a pdb or sequence file before the constraints
rea cons cyana [CYANA.upl] #read CYANA upls file
write #write to CNS format; add 10% to upper bound; make lower bound van der Waals (1.8 A)
> WRITER: Output file name ?:_ 4cns_noe.tbl
> WRITER: Coords, constraints, aco or sequence file? :_ cons
> WRITER_constr: Output format [congen||discover||ecepp||disman||dyana||cyana||diana||xplor||cns]? : cns
> XPLOR_writer: percentage range for upper bound (upp+range)?: 10
> XPLOR_writer: range for lower bound (low-range)?: vdw
> XPLOR_writer: ... writing ... wait
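The distance-constraint conversion above (upper bound increased by 10%, lower bound fixed at the van der Waals value of 1.8 A) can be sketched as follows. This assumes the usual CYANA .upl column layout and writes each bound with the XPLOR/CNS d, d-minus, d-plus triple. It is one common encoding, not necessarily the exact text PdbStat writes, and it skips the atom-name conversion that PdbStat performs. The function names are hypothetical.

```python
from typing import List

# Hypothetical helpers for illustration; not part of PdbStat.


def upl_to_cns(line: str, percent: float = 10.0, lower: float = 1.8) -> str:
    """Convert one CYANA upper-limit line into an XPLOR/CNS NOE assign statement.

    Assumed .upl layout: res1 name1 atom1 res2 name2 atom2 upper [extra columns].
    The upper bound is enlarged by `percent`; the lower bound is fixed at `lower`.
    Bounds are written as d, d-minus, d-plus with d = upper, so that
    d - dminus = lower and d + dplus = upper.
    Atom names are passed through unchanged (no nomenclature conversion).
    """
    fields = line.split()
    res1, atom1, res2, atom2 = fields[0], fields[2], fields[3], fields[5]
    upper = float(fields[6]) * (1.0 + percent / 100.0)
    dminus = max(upper - lower, 0.0)  # guard against limits below the vdW floor
    return (f"assign (resid {res1} and name {atom1}) "
            f"(resid {res2} and name {atom2}) "
            f"{upper:.2f} {dminus:.2f} 0.00")


def convert_upl(text: str, **kwargs) -> List[str]:
    """Convert every non-blank, non-comment line of a .upl file."""
    return [upl_to_cns(line, **kwargs) for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")]
```

With the defaults, a 4.50 A CYANA limit becomes an upper bound of 4.95 A and a lower bound of 1.8 A.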
# converting CYANA to XPLOR/CNS dihedral constraints:
rea aco cyana [CYANA.aco]
write
> WRITER: Output file name ?:_ 4cns_dihe.tbl
> WRITER: Coords, constraints, aco or sequence file? :_ aco
> WRITER_constr: Output format (congen || discover || impact || disman || diana || xplor || cns)? : cns
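For the dihedral constraints, a CYANA lower/upper interval has to be turned into a CNS center and half-width. A minimal sketch, assuming the usual .aco layout (residue number, residue name, angle name, lower, upper) and the CNS field order of energy constant, angle, range, exponent. The default energy constant (1.0) and exponent (2) are illustrative values, only phi and psi are handled, and the names are hypothetical.

```python
# Hypothetical helper for illustration; not part of PdbStat.
# Backbone atoms defining each torsion, as (residue offset, atom name).
TORSION_ATOMS = {
    "PHI": [(-1, "C"), (0, "N"), (0, "CA"), (0, "C")],
    "PSI": [(0, "N"), (0, "CA"), (0, "C"), (1, "N")],
}


def aco_to_cns(line: str, force: float = 1.0, exponent: int = 2) -> str:
    """Convert one CYANA .aco line into an XPLOR/CNS dihedral assign statement.

    Assumed .aco layout: resnum resname angle lower upper (degrees).
    Fields after the four atom selections: energy constant, center angle,
    half-width range, exponent. Only PHI and PSI are handled (KeyError otherwise).
    """
    fields = line.split()
    resnum, angle = int(fields[0]), fields[2].upper()
    lower, upper = float(fields[3]), float(fields[4])
    if upper < lower:  # the allowed interval wraps through +/-180 degrees
        upper += 360.0
    center = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0
    if center > 180.0:  # bring the center back into (-180, 180]
        center -= 360.0
    selections = " ".join(f"(resid {resnum + offset} and name {name})"
                          for offset, name in TORSION_ATOMS[angle])
    return f"assign {selections} {force:.1f} {center:.1f} {half_width:.1f} {exponent}"
```

The wrap-around check matters for intervals such as 170 to -150 degrees, whose center is -170 rather than 10.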
VII. Constraint Violations
One can analyze constraint violations using PdbStat. It is best to convert the coordinates and constraints to IUPAC format; this is the approach used internally within PSVS.
Commands:
rea coo pdb [filename]
to iupac #convert coordinates to IUPAC format
rea cons [format] [filename] #read distance constraints; specify format
noe to iupac #convert distance constraints to IUPAC
rea aco [format] [filename] #read dihedral constraints; specify format
set cutu 0.2 #set cut-off for distance violations to 0.2 A
see viol noe [sum/ave/center] #see NOE violations using sum/average/center averaging
see cutaco 1 #set cut-off for dihedral violations to 1 deg.
see viol aco #see dihedral violations above threshold
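Violation checks of this kind reduce to comparing an effective distance in each model with the upper bound plus the cutoff. In the sketch below, the meanings given to sum, ave, and center are assumptions (r^-6 sum over all atom pairs, r^-6 mean, and distance between group centroids, respectively). The atom keys such as "5:HA", the data layout, and the function names are hypothetical, and lower-bound violations are ignored.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of PdbStat.
Vec = Tuple[float, float, float]


def centroid(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def effective_distance(group1: Sequence[Vec], group2: Sequence[Vec],
                       mode: str = "sum") -> float:
    """Distance between two atom groups (assumed averaging conventions)."""
    if mode == "center":
        return math.dist(centroid(group1), centroid(group2))
    inv6 = [math.dist(a, b) ** -6 for a, b in product(group1, group2)]
    total = sum(inv6)
    if mode == "ave":
        total /= len(inv6)
    return total ** (-1.0 / 6.0)


def noe_violations(models: List[Dict[str, Vec]],
                   restraints: List[Tuple[List[str], List[str], float]],
                   cutoff: float = 0.2, mode: str = "sum"):
    """(restraint index, model index, violation) for upper-bound violations > cutoff."""
    found = []
    for m, coords in enumerate(models):
        for r, (atoms1, atoms2, upper) in enumerate(restraints):
            d = effective_distance([coords[a] for a in atoms1],
                                   [coords[a] for a in atoms2], mode)
            if d - upper > cutoff:
                found.append((r, m, round(d - upper, 3)))
    return found
```

The r^-6 sum is always shorter than the shortest individual distance in the group, which is why ambiguous and pseudo-atom restraints are more easily satisfied under sum averaging.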
VIII. Sorting of Distance Constraints
There are a number of options in PdbStat for sorting or culling distance constraints.
cons clean #keep conformationally-restricting constraints; also removes duplicates
noe analysis #NOE statistics (as in PSVS)
noe delete intra #delete all intra NOE constraints
noe keep long #keep only long-range (|i-j| >= 5) NOE constraints
noe keep ilv #keep NOE constraints consistent with ILV labeling
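Several of these commands depend on the sequence separation |i-j| between the two restrained residues. A minimal sketch of that bookkeeping, assuming the usual classes (intraresidue 0, sequential 1, medium range 2 to 4, long range 5 or more). Keeping the tightest upper bound when removing duplicates is an assumption, not documented PdbStat behavior, and the names are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Hypothetical helpers for illustration; not part of PdbStat.


class Noe(NamedTuple):
    res1: int
    atom1: str
    res2: int
    atom2: str
    upper: float


def noe_class(noe: Noe) -> str:
    """Classify a restraint by sequence separation |i-j|."""
    sep = abs(noe.res1 - noe.res2)
    if sep == 0:
        return "intra"
    if sep == 1:
        return "sequential"
    if sep < 5:
        return "medium"
    return "long"


def delete_intra(noes: List[Noe]) -> List[Noe]:
    return [n for n in noes if noe_class(n) != "intra"]


def keep_long(noes: List[Noe]) -> List[Noe]:
    return [n for n in noes if noe_class(n) == "long"]


def remove_duplicates(noes: List[Noe]) -> List[Noe]:
    """One restraint per unordered atom pair, keeping the tightest upper bound."""
    best: Dict[frozenset, Noe] = {}
    for n in noes:
        key = frozenset({(n.res1, n.atom1), (n.res2, n.atom2)})
        if key not in best or n.upper < best[key].upper:
            best[key] = n
    return list(best.values())


def class_counts(noes: List[Noe]) -> Dict[str, int]:
    counts = {"intra": 0, "sequential": 0, "medium": 0, "long": 0}
    for n in noes:
        counts[noe_class(n)] += 1
    return counts
```

Using an unordered key means that 5 HA to 10 H and 10 H to 5 HA are recognized as the same restraint.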
IX. Commands for Obtaining Various Metrics
The “eval”, “show”, and “see” commands allow one to evaluate several types of metrics in a structure or ensemble.
eval [procheck/rama] #get Procheck/Ramachandran statistics for model(s)
eval dist * 68 sg 119 nd1 #get 68-SG to 119-ND1 distance across all models
eval [phi || psi || omega] 1 #get phi/psi/omega torsion angles in model 1
see lib [residue] #see library definitions for residue type
select 1 119 * #use select and see in tandem
see coo #coordinates for residue 119 in model 1
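The eval dist and eval phi/psi/omega commands come down to two geometric quantities. The sketch below computes an interatomic distance across models and a torsion angle from four atom positions using the standard atan2 formula with the IUPAC sign convention. The model layout (a dict keyed by residue number and atom name) and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of PdbStat.
Vec = Tuple[float, float, float]
Model = Dict[Tuple[int, str], Vec]  # (residue number, atom name) -> (x, y, z)


def eval_distance(models: Sequence[Model], atom1: Tuple[int, str],
                  atom2: Tuple[int, str]) -> List[float]:
    """Distance between two atoms in every model (cf. eval dist * 68 sg 119 nd1)."""
    return [math.dist(m[atom1], m[atom2]) for m in models]


def _sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For phi of residue i the four atoms are C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1). Using atan2 avoids the sign ambiguity of an arccos-based formula.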
X. Other Functions
# renumbering / resetting residue numbering
reset * 10 #sets first residue in file to 10
reset * #resets coordinates to original
# ordering ensemble using FindCore algorithm (Ref. 3):
find [ -bb || -heavy || -all || -noe ]
# contact map generation based on coordinates or constraints
contact
> DRAW_cntct: from coordinates or constraints (coor/cons) ?:_ coor
> MAIN_cntct: What model do you want (1-20) or average ?_ :ave
> do_average_coords(): Making average for backbone atoms
> do_average_coords(): Calculating center of masses
> do_average_coords(): Calling optimal rotation for backbone
> do_average_coords(): Calc. average coordinates backbone
> MAIN_cntct: distance cutoff ?:_ 4.5
> MAIN_cntct: atom type (hydr, heavy, all) ?:_ hydr #writes a postscript file
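A contact map from coordinates lists the residue pairs that come within the distance cutoff. A minimal sketch on a single coordinate set (PdbStat can also use the average structure, as in the session above), assuming a contact means any atom pair from different residues closer than the cutoff. The name-based hydrogen filter is a rough heuristic, and none of these names come from PdbStat.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration; not part of PdbStat.
Atom = Tuple[int, str, Tuple[float, float, float]]  # (resnum, atom name, xyz)


def is_hydrogen(name: str) -> bool:
    """Rough heuristic: names starting with H (after any leading digits)."""
    return name.lstrip("0123456789").upper().startswith("H")


def contact_map(atoms: Sequence[Atom], cutoff: float = 4.5,
                keep: Optional[Callable[[str], bool]] = None) -> List[Tuple[int, int]]:
    """Residue pairs (i < j) with at least one atom pair within `cutoff` angstroms."""
    if keep is not None:
        atoms = [a for a in atoms if keep(a[1])]
    contacts = set()
    for (r1, _, x1), (r2, _, x2) in combinations(atoms, 2):
        if r1 != r2 and math.dist(x1, x2) <= cutoff:
            contacts.add((min(r1, r2), max(r1, r2)))
    return sorted(contacts)
```

The all-pairs loop is O(N^2) in the number of atoms, which is acceptable for a single small protein; a grid or cell list would be the usual choice for larger systems.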
REFERENCES
1. Bhattacharya, A., Tejero, R., and Montelione, G. T. (2007) Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778-795.
2. Huang, Y. J., Powers, R., and Montelione, G. T. (2005) Protein NMR Recall, Precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127, 1665-1674.
3. Snyder, D. A. and Montelione, G. T. (2005) Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles. Proteins 59, 673-686.
-- Main.JimAramini - 14 Aug 2008