AVS: Difference between revisions

From NESG Wiki
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= '''Introduction'''  =
== '''Introduction'''  ==


Assignment validation suite ([http://www.ncbi.nlm.nih.gov/pubmed/14872126 AVS]) checks the chemical shifts list in BioMagResBank (BMRB) format for a number of possible problems such as consistency to IUPAC labeling, chemical shifts that are grossly outside the typical range for the particular atom/residue, and reports useful statistics information about the examined chemical shift set (e.g. percent assignments, number of stereospecifically assigned methlys, percents aromatic sidechain assignments, etc).   AVS is run on every chemical shift set that is submitted to the BMRB, and can be included as part of the Protein Structure Validation Suite (PSVS) run.  It is advisable to run any chemical shift validation prior to structure determination steps in order to uncover problems with the assignments that could impact the performance of noesy assignments and structure calculation downstream.   
Assignment validation suite ([http://www.ncbi.nlm.nih.gov/pubmed/14872126 AVS]) checks the chemical shifts list in BioMagResBank (BMRB) format for a number of possible problems such as consistency to IUPAC residue/atom naming, chemical shifts that are widely outside the typical range for the particular atom/residue, and reports useful statistics information about the examined chemical shift set (e.g. percents assignments, number of stereospecifically assigned methyls, percents aromatic sidechain assignments, etc).   AVS is run on every chemical shift set that is submitted to the BMRB, and is included as part of the Protein Structure Validation Suite (PSVS).   


<br>  
<br>  


=== Practical Aspects  ===
== '''Practical Aspects'''  ==
 
=== Running AVS ===


A number of version for the standalone AVS routine exist that are adapted for different bmrb versions&nbsp; (2.1 or 3.1). Two perl scripts can be run from any directory on any computer running perl by either pointing to the local AutoAssign script repository directory or by downloading the scripts linked below.&nbsp; Here an example script is provideded that generates the bmrb in 2.1 format directly from the sparky resonance list 'rl' and the protein sequences.&nbsp; The script validates and computes the completeness statistics for the generated chemical shift list.&nbsp; As modifications are made in the sparky project the operation is repeated until a final bmrb file is achieved.<br>  
A number of version for the standalone AVS routine exist that are adapted for different bmrb versions&nbsp; (2.1 or 3.1). Two perl scripts can be run from any directory on any computer running perl by either pointing to the local AutoAssign script repository directory or by downloading the scripts linked below.&nbsp; Here an example script is provideded that generates the bmrb in 2.1 format directly from the sparky resonance list 'rl' and the protein sequences.&nbsp; The script validates and computes the completeness statistics for the generated chemical shift list.&nbsp; As modifications are made in the sparky project the operation is repeated until a final bmrb file is achieved.<br>  
Line 15: Line 17:
cp test_bmrb.bmrb HsR50_bb.bmrb
cp test_bmrb.bmrb HsR50_bb.bmrb
rm test_bmrb.bmrb</pre>  
rm test_bmrb.bmrb</pre>  
<br> Three scripts are run: 1) [[Media:SparkyRL2bmrb.txt|sparkyRL2bmrb.pl]], 2) [[Media:Missing_shifts.txt|missing_shifts.pl]], and 3) [[Media:Validate_assignments.txt|validate_assignments.pl]]. In addition, a bmrb parsing module [[Media:BMRBParsing.pm|BMRBParsing.pm]] is called that interprets the sequence in single letter code and numbers the bmrb residues, ind this case starting from 1.  
<br> Three scripts are run: 1) [[Media:SparkyRL2bmrb.txt|sparkyRL2bmrb.pl]], 2) [[Media:Missing_shifts.txt|missing_shifts.pl]], and 3) [[Media:Validate_assignments.txt|validate_assignments.pl]]. In addition, a bmrb parsing module [[Media:BMRBParsing.pm|BMRBParsing.pm]] is called that interprets the sequence in single letter code and returns numbering in the bmrb file, in this case starting from residue 1.  


Newer file versions are available in later versions of the AutoAssign program that should handle bmrb 3.1 format.  
Newer file versions are available in later versions of the AutoAssign program that should handle bmrb 3.1 format.  
Line 21: Line 23:
<br>  
<br>  


Editing in progress
=== '''Output interpretation'''  ===
 
The output interpretation is straightforward. A view of the output for res. 189-191 from the validation script is shown below, the summary of errors at the bottom of the file provides quick list of overall problems to the scientist:<br>
<pre>D189    Overall: Consistent    Typing: Consistent    SRO: Consistent    C Shifts: Consistent    H Shifts: Consistent
    PRTL&gt;&gt;    D 0.28  L 0.17  N 0.12  C 0.11  K 0.09  F 0.09  Y 0.08  R 0.01 
    HN Overlap&gt;&gt;    D13 R126
    C Shift Assignments&gt;&gt;    C&nbsp;:: 176.670    CA&nbsp;:: 54.404    CB&nbsp;:: 40.809
    H Shift Assignments&gt;&gt;    H&nbsp;:: 8.458    HA&nbsp;:: 4.591
 
L190    Overall: Consistent    Typing: Consistent    SRO: Consistent    C Shifts: Consistent    H Shifts: Consistent
    PRTL&gt;&gt;    L 0.22  K 0.2  D 0.14  R 0.12  C 0.1  F 0.09  Y 0.07  N 0.02 
    HN Overlap&gt;&gt;    A20
    C Shift Assignments&gt;&gt;    C&nbsp;:: 177.898    CA&nbsp;:: 55.723    CB&nbsp;:: 41.968
    H Shift Assignments&gt;&gt;    H&nbsp;:: 8.115    HA&nbsp;:: 4.191
 
E191    Overall: Consistent    Typing: Consistent    SRO: Consistent    C Shifts: Consistent    H Shifts: Consistent
    PRTL&gt;&gt;    E 0.14  H 0.13  W 0.13  R 0.13  Q 0.13  C 0.11  K 0.1  M 0.05  I 0.02  V 0.01 
    C Shift Assignments&gt;&gt;    C&nbsp;:: 176.524    CA&nbsp;:: 57.003    CB&nbsp;:: 29.910
    H Shift Assignments&gt;&gt;    H&nbsp;:: 8.226    HA&nbsp;:: 4.107
 
Error Summary:
G92    HA2 = 5.318(S),    Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116    HA = 5.681(S),    Expected =  4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),    Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),    Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181    HA = 2.217(S),    Expected =  4.48, Std = 0.5000, ChiSquare = 6.0111e-06
 
</pre>
Several issues are flagged in the error summary for this entry, proton frequency out of range and CB&nbsp;for R132 out of range to indicate possible mis-assignment.


<br>  
<br>  


PaoLo roSSi
Below is a view of the missing_shift.pl script.&nbsp; For the protein in the example, backbone assignment only was conducted (unlisted atoms are present in the bmrb):
<pre>D189:    HB2  HB3
L190:    CD1  CD2  CG  HB2  HB3  HD1  HD2  HG
E191:    CG  HB2  HB3  HG2  HG3
 
AtomType Completeness Statistics:
                aromatic completeness&nbsp;::    0 /  174 =  0.00%
                backbone completeness&nbsp;::  845 /  965 =  87.56%
                sidechain completeness&nbsp;::  227 / 1244 =  18.25%
                unambiguous CH2 completeness&nbsp;::    0 /  20 =  0.00%
                unambiguous CH3 completeness&nbsp;::    0 /  32 =  0.00%
 
 
      C&nbsp;::  168 /  197 =  85.28%
      CA&nbsp;::  181 /  197 =  91.88%
      CB&nbsp;::  167 /  183 =  91.26%
      H&nbsp;::  160 /  180 =  88.89%
      HA&nbsp;::  156 /  183 =  85.25%
    HA2&nbsp;::  11 /  14 =  78.57%
    HA3&nbsp;::    9 /  14 =  64.29%
      HB&nbsp;::    8 /  56 =  14.29%
 
</pre>
<br>

Latest revision as of 16:56, 5 January 2010

Introduction

The Assignment Validation Suite (AVS) checks a chemical shift list in BioMagResBank (BMRB) format for a number of possible problems, such as inconsistency with IUPAC residue/atom naming and chemical shifts that lie far outside the typical range for the particular atom/residue. It also reports useful statistics about the examined chemical shift set (e.g. percentage of assignments, number of stereospecifically assigned methyls, percentage of aromatic sidechain assignments, etc.). AVS is run on every chemical shift set that is submitted to the BMRB, and is included as part of the Protein Structure Validation Suite (PSVS).


Practical Aspects

Running AVS

Several versions of the standalone AVS routine exist, adapted for different BMRB format versions (2.1 or 3.1). Two Perl scripts can be run from any directory on any computer running Perl, either by pointing to the local AutoAssign script repository directory or by downloading the scripts linked below. Here an example script is provided that generates the BMRB file in 2.1 format directly from the Sparky resonance list 'rl' and the protein sequence. The script then validates the generated chemical shift list and computes its completeness statistics. As modifications are made in the Sparky project, the operation is repeated until a final BMRB file is achieved.

# Convert the Sparky resonance list to a BMRB 2.1 file (the sequence must be a single unbroken argument)
/Local/AutoAssign1.14/bin/sparkyRL2bmrb.pl HsR50_bb.rl test_bmrb.bmrb 1 MSPIPLPVTDTDDAWRARIAAHRADKDEFLATHDQSPIPPADRGAFDGLRYFDIDASFRVAARYQPARDPEAVELETTRGPPAEYTRAAVLGFDLGDSHHTLTAFRVEGESSLFVPFTDETTDDGRTYEHGRYLDVDPAGADGGDEVALDFNLAYNPFCAYGGSFSCALPPADNHVPAAITAGERVDADLEHHHHHH -diasterio
# Report missing shifts and completeness statistics
/Local/AutoAssign1.14/bin/missing_shifts.pl -printstats test_bmrb.bmrb > missing_HsR50_101109
# Validate the assignments
/Local/AutoAssign1.14/bin/validate_assignments.pl test_bmrb.bmrb > vali_HsR50_101109
# Keep the generated file under the project name
cp test_bmrb.bmrb HsR50_bb.bmrb
rm test_bmrb.bmrb


Three scripts are run: 1) sparkyRL2bmrb.pl, 2) missing_shifts.pl, and 3) validate_assignments.pl. In addition, a BMRB parsing module, BMRBParsing.pm, is called; it interprets the sequence in single-letter code and returns the residue numbering in the BMRB file, in this case starting from residue 1.

Newer versions of these scripts, which should handle the BMRB 3.1 format, are available in later versions of the AutoAssign program.
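
Because the same commands are repeated after every change to the Sparky project, they can be wrapped in a small shell function. The following is only a sketch: the function name `run_avs`, its argument order, and the output file naming pattern are assumptions modeled on the example above, while the script names, the `1` argument, and the `-diasterio` and `-printstats` flags are copied unchanged from it.

```shell
#!/bin/sh
# Hypothetical wrapper around the three commands shown above.
# run_avs is an illustrative name, not part of AutoAssign.
run_avs() {
    avs_bin=$1   # directory holding the AutoAssign Perl scripts
    name=$2      # project name; expects ${name}_bb.rl in the current directory
    seq=$3       # protein sequence in single-letter code, as one string
    tag=$4       # label appended to the report file names, e.g. a date
    tmp="test_bmrb.bmrb"

    # Generate the BMRB file from the Sparky resonance list.
    "$avs_bin/sparkyRL2bmrb.pl" "${name}_bb.rl" "$tmp" 1 "$seq" -diasterio || return 1
    # Write the missing-shift and completeness report.
    "$avs_bin/missing_shifts.pl" -printstats "$tmp" > "missing_${name}_${tag}" || return 1
    # Write the validation report.
    "$avs_bin/validate_assignments.pl" "$tmp" > "vali_${name}_${tag}" || return 1
    # Same effect as the cp followed by rm in the example.
    mv "$tmp" "${name}_bb.bmrb"
}
```

For the example project this might be called as `run_avs /Local/AutoAssign1.14/bin HsR50 "$SEQ" 101109`, with `SEQ` holding the sequence string. The function stops at the first failing step so that a stale report is not mistaken for a fresh one.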


Output interpretation

The output interpretation is straightforward. A view of the output for residues 189-191 from the validation script is shown below; the summary of errors at the bottom of the file gives the scientist a quick list of overall problems:

D189    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    D 0.28   L 0.17   N 0.12   C 0.11   K 0.09   F 0.09   Y 0.08   R 0.01  
    HN Overlap>>     D13 R126
    C Shift Assignments>>     C :: 176.670     CA :: 54.404     CB :: 40.809
    H Shift Assignments>>     H :: 8.458     HA :: 4.591

L190    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    L 0.22   K 0.2   D 0.14   R 0.12   C 0.1   F 0.09   Y 0.07   N 0.02  
    HN Overlap>>     A20
    C Shift Assignments>>     C :: 177.898     CA :: 55.723     CB :: 41.968
    H Shift Assignments>>     H :: 8.115     HA :: 4.191

E191    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    E 0.14   H 0.13   W 0.13   R 0.13   Q 0.13   C 0.11   K 0.1   M 0.05   I 0.02   V 0.01  
    C Shift Assignments>>     C :: 176.524     CA :: 57.003     CB :: 29.910
    H Shift Assignments>>     H :: 8.226     HA :: 4.107

Error Summary:
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116    HA = 5.681(S),     Expected =  4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181    HA = 2.217(S),     Expected =  4.48, Std = 0.5000, ChiSquare = 6.0111e-06

Several issues are flagged in the error summary for this entry: proton chemical shifts that are out of range and a CB shift for R132 that is out of range, both of which indicate possible misassignment.
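
To judge how far each flagged shift lies from its expected value, the deviation can be expressed in units of the reported standard deviation, (observed - Expected) / Std. The sketch below assumes the error summary lines have exactly the layout shown above. The function name `shift_deviations` is hypothetical, and the deviation it prints is a derived quantity, not something AVS itself reports.

```shell
#!/bin/sh
# Hypothetical helper: read an AVS error summary on standard input and print
# residue, atom, and the deviation from the expected shift in units of Std.
shift_deviations() {
    awk '
        # Only lines of the form "G92 HA2 = 5.318(S), Expected = 3.95, Std = 0.4000, ..."
        $5 == "Expected" && $8 == "Std" {
            observed = $4 + 0     # "5.318(S)," converts to 5.318
            expected = $7 + 0     # "3.95,"     converts to 3.95
            std      = $10 + 0    # "0.4000,"   converts to 0.4
            if (std > 0)
                printf "%s %s %+.2f\n", $1, $2, (observed - expected) / std
        }'
}

# Usage with lines copied from the error summary above.
shift_deviations <<'EOF'
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
EOF
# prints:
# G92 HA2 +3.42
# R132 CB +4.27
# A160 HB -5.16
```

Lines without an Expected value, such as the typing message for R132, are skipped.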


Below is a view of the output of the missing_shifts.pl script. For the protein in the example, only backbone assignment was conducted (atoms not listed for a residue are present in the BMRB file):

D189:    HB2  HB3 
L190:    CD1  CD2  CG  HB2  HB3  HD1  HD2  HG 
E191:    CG  HB2  HB3  HG2  HG3 

AtomType Completeness Statistics:
                aromatic completeness ::    0 /  174 =   0.00%
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
                unambiguous CH2 completeness ::    0 /   20 =   0.00%
                unambiguous CH3 completeness ::    0 /   32 =   0.00%


       C ::  168 /  197 =  85.28%
      CA ::  181 /  197 =  91.88%
      CB ::  167 /  183 =  91.26%
       H ::  160 /  180 =  88.89%
      HA ::  156 /  183 =  85.25%
     HA2 ::   11 /   14 =  78.57%
     HA3 ::    9 /   14 =  64.29%
      HB ::    8 /   56 =  14.29%
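
When the report is long, it can help to list only the categories and atom types whose completeness falls below a chosen threshold. The following sketch assumes the `label :: assigned / expected = percent%` layout shown above. The function name `low_completeness` and the threshold argument are hypothetical, and the percentage is recomputed from the two counts rather than read from the report.

```shell
#!/bin/sh
# Hypothetical helper: read a missing_shifts.pl -printstats report on standard
# input and print the entries whose completeness is below $1 percent.
low_completeness() {
    awk -F'::' -v min="$1" '
        NF == 2 {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim surrounding whitespace
            split($2, part, "[/=]")              # " 9 /  14 =  64.29%"
            assigned = part[1] + 0
            expected = part[2] + 0
            if (expected > 0 && 100 * assigned / expected < min)
                printf "%s: %d/%d = %.2f%%\n", label, assigned, expected, 100 * assigned / expected
        }'
}

# Usage with lines copied from the report above and an 80% threshold.
low_completeness 80 <<'EOF'
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
      CA ::  181 /  197 =  91.88%
     HA3 ::    9 /   14 =  64.29%
EOF
# prints:
# sidechain completeness: 227/1244 = 18.25%
# HA3: 9/14 = 64.29%
```

Per-residue lines such as `D189:    HB2  HB3` contain no `::` separator and are ignored.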