AVS: Difference between revisions

From NESG Wiki
Revision as of 21:26, 23 November 2009

Introduction

The Assignment Validation Suite (AVS; http://www.ncbi.nlm.nih.gov/pubmed/14872126) checks a chemical shift list in BioMagResBank (BMRB) format for a number of possible problems, such as inconsistency with IUPAC residue/atom naming and chemical shifts that lie far outside the typical range for the particular atom/residue. It also reports useful statistics about the examined chemical shift set (e.g. percent assignments, number of stereospecifically assigned methyls, percent aromatic sidechain assignments, etc.). AVS is run on every chemical shift set that is submitted to the BMRB, and is included as part of the Protein Structure Validation Suite (PSVS). It is wise to run chemical shift validation prior to structure determination, in order to uncover problems with the assignments that could impact NOESY assignment and structure calculation downstream.


Practical Aspects

Several versions of the standalone AVS routine exist, adapted for different BMRB format versions (2.1 or 3.1). Two Perl scripts can be run from any directory on any computer running Perl, either by pointing to the local AutoAssign script repository directory or by downloading the scripts linked below. Here an example script is provided that generates the BMRB file in 2.1 format directly from the Sparky resonance list ('rl') and the protein sequence. The script then validates the generated chemical shift list and computes its completeness statistics. As modifications are made in the Sparky project, the operation is repeated until a final BMRB file is achieved.

/Local/AutoAssign1.14/bin/sparkyRL2bmrb.pl HsR50_bb.rl test_bmrb.bmrb 1 MSPIPLPVTDTDDAWRARIAA\
HRADKDEFLATHDQSPIPPADRGAFDGLRYFDIDASFRVAARYQPARDPEAVELETTRGPPAEYTRAAVLGFDLGDSHHTLTAFRVEGESSLF\
VPFTDETTDDGRTYEHGRYLDVDPAGADGGDEVALDFNLAYNPFCAYGGSFSCALPPADNHVPAAITAGERVDADLEHHHHHH -diasterio
/Local/AutoAssign1.14/bin/missing_shifts.pl -printstats test_bmrb.bmrb > missing_HsR50_101109
/Local/AutoAssign1.14/bin/validate_assignments.pl test_bmrb.bmrb > vali_HsR50_101109
cp test_bmrb.bmrb HsR50_bb.bmrb
rm test_bmrb.bmrb


Three scripts are run: 1) sparkyRL2bmrb.pl, 2) missing_shifts.pl, and 3) validate_assignments.pl. In addition, a BMRB parsing module, BMRBParsing.pm, is called; it interprets the sequence in single-letter code and returns the residue numbering for the BMRB file, in this case starting from residue 1.

Newer versions of these scripts are available in later versions of the AutoAssign program and should handle the BMRB 3.1 format.
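Because the same three steps are repeated every time the Sparky project changes, they can be wrapped in a small driver. The Python sketch below is not part of AutoAssign; it only assembles the command lines shown in the example above. The function names `build_avs_commands` and `run_avs`, the `tag` used to name the report files, and the choice to write the BMRB file directly under its final name (instead of via test_bmrb.bmrb) are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Install location copied from the example above; adjust for the local setup.
AUTOASSIGN_BIN = Path("/Local/AutoAssign1.14/bin")


def build_avs_commands(rl_file, sequence, tag, bin_dir=AUTOASSIGN_BIN):
    """Return (argv, stdout_file) pairs mirroring the three example commands.

    rl_file  -- Sparky resonance list
    sequence -- one-letter protein sequence, without whitespace
    tag      -- hypothetical label used to name the two report files
    """
    bin_dir = Path(bin_dir)
    bmrb = f"{Path(rl_file).stem}.bmrb"
    return [
        # 1) resonance list -> BMRB file; arguments in the same order as the
        #    example (the "1" is presumed to be the first residue number)
        ([str(bin_dir / "sparkyRL2bmrb.pl"), str(rl_file), bmrb, "1",
          sequence, "-diasterio"], None),
        # 2) missing shifts plus completeness statistics
        ([str(bin_dir / "missing_shifts.pl"), "-printstats", bmrb],
         f"missing_{tag}"),
        # 3) validation report
        ([str(bin_dir / "validate_assignments.pl"), bmrb], f"vali_{tag}"),
    ]


def run_avs(rl_file, sequence, tag, bin_dir=AUTOASSIGN_BIN):
    """Run the three steps in order, stopping at the first failure."""
    for argv, out_name in build_avs_commands(rl_file, sequence, tag, bin_dir):
        if out_name is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_name, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling `run_avs("HsR50_bb.rl", sequence, "HsR50_101109")` would then produce HsR50_bb.bmrb, missing_HsR50_101109 and vali_HsR50_101109, provided the Perl scripts exist at the assumed path.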


Output interpretation

The output interpretation is straightforward. A view of the output for residues 189-191 from the validation script is shown below. The summary of errors at the bottom of the file provides a quick list of the overall problems:

D189    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    D 0.28   L 0.17   N 0.12   C 0.11   K 0.09   F 0.09   Y 0.08   R 0.01  
    HN Overlap>>     D13 R126
    C Shift Assignments>>     C :: 176.670     CA :: 54.404     CB :: 40.809
    H Shift Assignments>>     H :: 8.458     HA :: 4.591

L190    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    L 0.22   K 0.2   D 0.14   R 0.12   C 0.1   F 0.09   Y 0.07   N 0.02  
    HN Overlap>>     A20
    C Shift Assignments>>     C :: 177.898     CA :: 55.723     CB :: 41.968
    H Shift Assignments>>     H :: 8.115     HA :: 4.191

E191    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    E 0.14   H 0.13   W 0.13   R 0.13   Q 0.13   C 0.11   K 0.1   M 0.05   I 0.02   V 0.01  
    C Shift Assignments>>     C :: 176.524     CA :: 57.003     CB :: 29.910
    H Shift Assignments>>     H :: 8.226     HA :: 4.107

Error Summary:
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116    HA = 5.681(S),     Expected =  4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181    HA = 2.217(S),     Expected =  4.48, Std = 0.5000, ChiSquare = 6.0111e-06
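To rank the entries of the Error Summary, it can help to express each flagged shift as a deviation from the expected value in units of the listed Std. The Python sketch below does this for lines in the format shown above; the regular expression and the function names are assumptions for illustration, not part of AVS. The two-sided normal probability computed by `two_sided_p` appears to agree with the reported ChiSquare column for these examples, but that is an observation from the numbers, not a documented definition.

```python
import math
import re

# Matches shift-outlier lines of the Error Summary; lines such as
# "R132    Typing: Mistyped" do not match and are skipped.
OUTLIER = re.compile(
    r"^(?P<res>[A-Z]\d+)\s+(?P<atom>\w+)\s*=\s*(?P<obs>-?\d+\.?\d*)\(\w\),"
    r"\s*Expected\s*=\s*(?P<exp>-?\d+\.?\d*),\s*Std\s*=\s*(?P<std>\d+\.?\d*)"
)


def deviations(report_text):
    """Yield (residue, atom, deviation in Std units) for each outlier line."""
    for line in report_text.splitlines():
        m = OUTLIER.match(line.strip())
        if m:
            obs, exp, std = (float(m[k]) for k in ("obs", "exp", "std"))
            yield m["res"], m["atom"], (obs - exp) / std


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


example = """\
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
"""

for res, atom, z in deviations(example):
    print(f"{res:5s} {atom:4s} {z:+.2f} Std   p = {two_sided_p(z):.3e}")
```

For the three shift lines in the example, the deviations are +3.42, +4.27 and -5.16 Std, so the A160 HB shift is the most extreme of the three.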

Here is a view of the output of the missing_shifts.pl script. For the protein in the example, only the backbone assignment was carried out:

D189:    HB2  HB3 
L190:    CD1  CD2  CG  HB2  HB3  HD1  HD2  HG 
E191:    CG  HB2  HB3  HG2  HG3 

AtomType Completeness Statistics:
                aromatic completeness ::    0 /  174 =   0.00%
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
                unambiguous CH2 completeness ::    0 /   20 =   0.00%
                unambiguous CH3 completeness ::    0 /   32 =   0.00%


       C ::  168 /  197 =  85.28%
      CA ::  181 /  197 =  91.88%
      CB ::  167 /  183 =  91.26%
       H ::  160 /  180 =  88.89%
      HA ::  156 /  183 =  85.25%
     HA2 ::   11 /   14 =  78.57%
     HA3 ::    9 /   14 =  64.29%
      HB ::    8 /   56 =  14.29%
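Each completeness line has the form "name :: assigned / expected = percent". When the report is regenerated after every change to the Sparky project, reading these lines programmatically makes it easy to track progress. The Python sketch below is one way to do that; the function names and the 80% cutoff are hypothetical and not part of the AutoAssign scripts.

```python
import re

# "name :: assigned / expected = pct%"; shift lines such as "C :: 176.670"
# from the validation report do not match because they lack the "/" part.
STAT = re.compile(
    r"^\s*(?P<name>.+?)\s*::\s*(?P<found>\d+)\s*/\s*(?P<total>\d+)\s*="
)


def completeness(report_text):
    """Map each statistics line to (assigned, expected, percent)."""
    stats = {}
    for line in report_text.splitlines():
        m = STAT.match(line)
        if m:
            found, total = int(m["found"]), int(m["total"])
            pct = 100.0 * found / total if total else 0.0
            stats[m["name"]] = (found, total, round(pct, 2))
    return stats


def below(stats, cutoff=80.0):
    """Names whose completeness is under the (hypothetical) cutoff."""
    return [name for name, (_, _, pct) in stats.items() if pct < cutoff]


example = """\
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
       C ::  168 /  197 =  85.28%
      HB ::    8 /   56 =  14.29%
"""

stats = completeness(example)
print(stats["backbone completeness"])
print(below(stats))
```

Recomputing the percentage from the two counts, as done here, also serves as a check on the reported value: 845 / 965 gives 87.56%, matching the backbone line above.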



-- PaoloRossi - 20 Nov 2009