AVS

From NESG Wiki

Introduction

The Assignment Validation Suite (AVS) checks a chemical shift list in BioMagResBank (BMRB) format for a number of possible problems, such as inconsistency with IUPAC residue/atom naming and chemical shifts that lie far outside the typical range for the particular atom/residue. It also reports useful statistics about the examined chemical shift set (e.g. percentage of assignments, number of stereospecifically assigned methyls, percentage of aromatic side-chain assignments). AVS is run on every chemical shift set that is submitted to the BMRB, and it is included as part of the Protein Structure Validation Suite (PSVS).


Practical Aspects

Running AVS

Several versions of the standalone AVS routine exist, adapted for different BMRB format versions (2.1 or 3.1). The Perl scripts can be run from any directory on any computer with Perl installed, either by pointing to the local AutoAssign script directory or by downloading the scripts linked below. The example script provided here generates a BMRB file in 2.1 format directly from the Sparky resonance list ('rl') and the protein sequence, then validates the generated chemical shift list and computes its completeness statistics. As modifications are made in the Sparky project, the operation is repeated until a final BMRB file is obtained.

/Local/AutoAssign1.14/bin/sparkyRL2bmrb.pl HsR50_bb.rl test_bmrb.bmrb 1 MSPIPLPVTDTDDAWRARIAA\
HRADKDEFLATHDQSPIPPADRGAFDGLRYFDIDASFRVAARYQPARDPEAVELETTRGPPAEYTRAAVLGFDLGDSHHTLTAFRVEGESSLF\
VPFTDETTDDGRTYEHGRYLDVDPAGADGGDEVALDFNLAYNPFCAYGGSFSCALPPADNHVPAAITAGERVDADLEHHHHHH -diasterio
/Local/AutoAssign1.14/bin/missing_shifts.pl -printstats test_bmrb.bmrb > missing_HsR50_101109
/Local/AutoAssign1.14/bin/validate_assignments.pl test_bmrb.bmrb > vali_HsR50_101109
cp test_bmrb.bmrb HsR50_bb.bmrb
rm test_bmrb.bmrb


Three scripts are run: 1) sparkyRL2bmrb.pl, 2) missing_shifts.pl, and 3) validate_assignments.pl. In addition, the BMRB parsing module BMRBParsing.pm is called; it interprets the sequence given in single-letter code and returns the residue numbering used in the BMRB file, in this case starting from residue 1.

Newer versions of these scripts, which should handle the BMRB 3.1 format, are available in later versions of the AutoAssign program.
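The three-step run described above can also be driven from a small wrapper, which is convenient when the operation is repeated after each change to the Sparky project. The following is a minimal sketch, not part of AutoAssign: the function names `build_steps` and `run_steps` are hypothetical, the install path is taken from the example above, and the `1` argument is assumed to be the starting residue number.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: the scripts are installed where the example above finds them.
AUTOASSIGN_BIN = PurePosixPath("/Local/AutoAssign1.14/bin")


def build_steps(rl_file, bmrb_file, start_number, sequence, tag):
    """Return (argv, stdout_file) pairs for the three scripts, in run order.

    start_number is assumed to be the number of the first residue; the
    -diasterio flag is copied verbatim from the example command.
    """
    return [
        ([str(AUTOASSIGN_BIN / "sparkyRL2bmrb.pl"), rl_file, bmrb_file,
          str(start_number), sequence, "-diasterio"], None),
        ([str(AUTOASSIGN_BIN / "missing_shifts.pl"), "-printstats", bmrb_file],
         f"missing_{tag}"),
        ([str(AUTOASSIGN_BIN / "validate_assignments.pl"), bmrb_file],
         f"vali_{tag}"),
    ]


def run_steps(steps):
    """Run each step in order; redirect stdout to a file where one is named."""
    for argv, out_name in steps:
        if out_name is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_name, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling `run_steps(build_steps("HsR50_bb.rl", "test_bmrb.bmrb", 1, sequence, "HsR50_101109"))` would reproduce the first three commands of the example; `check=True` stops the run at the first script that exits with an error, so a failed conversion is not silently validated.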


Output interpretation

Interpreting the output is straightforward. A view of the validation script output for residues 189-191 is shown below. The error summary at the bottom of the file gives the scientist a quick list of the overall problems:

D189    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    D 0.28   L 0.17   N 0.12   C 0.11   K 0.09   F 0.09   Y 0.08   R 0.01  
    HN Overlap>>     D13 R126
    C Shift Assignments>>     C :: 176.670     CA :: 54.404     CB :: 40.809
    H Shift Assignments>>     H :: 8.458     HA :: 4.591

L190    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    L 0.22   K 0.2   D 0.14   R 0.12   C 0.1   F 0.09   Y 0.07   N 0.02  
    HN Overlap>>     A20
    C Shift Assignments>>     C :: 177.898     CA :: 55.723     CB :: 41.968
    H Shift Assignments>>     H :: 8.115     HA :: 4.191

E191    Overall: Consistent     Typing: Consistent     SRO: Consistent     C Shifts: Consistent     H Shifts: Consistent
    PRTL>>    E 0.14   H 0.13   W 0.13   R 0.13   Q 0.13   C 0.11   K 0.1   M 0.05   I 0.02   V 0.01  
    C Shift Assignments>>     C :: 176.524     CA :: 57.003     CB :: 29.910
    H Shift Assignments>>     H :: 8.226     HA :: 4.107

Error Summary:
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116    HA = 5.681(S),     Expected =  4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181    HA = 2.217(S),     Expected =  4.48, Std = 0.5000, ChiSquare = 6.0111e-06

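Several issues are flagged in the error summary for this entry: proton chemical shifts that are out of range (G92 HA2, P116 HA, A160 HB, T181 HA) and an out-of-range CB shift for R132, which is also flagged as mistyped. These flags indicate possible misassignments.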
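To rank the flagged shifts, the Error Summary lines shown above can be parsed and each deviation expressed in units of the reported standard deviation. The sketch below assumes the line layout seen in this example; `parse_outliers` is a hypothetical helper, and the number of standard deviations is computed here for illustration. It is not a value that AVS reports and is not derived from the ChiSquare column.

```python
import re

# Matches outlier lines such as:
# "G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = ..."
OUTLIER = re.compile(
    r"^(?P<res>[A-Z]\d+)\s+(?P<atom>\w+)\s*=\s*(?P<obs>-?[\d.]+)\(\w\),"
    r"\s*Expected\s*=\s*(?P<exp>-?[\d.]+),\s*Std\s*=\s*(?P<std>[\d.]+)"
)


def parse_outliers(lines):
    """Return (residue, atom, observed, expected, n_std) for each outlier line."""
    rows = []
    for line in lines:
        m = OUTLIER.match(line.strip())
        if m is None:
            continue  # e.g. "R132    Typing: Mistyped" carries no shift values
        obs, exp, std = (float(m[k]) for k in ("obs", "exp", "std"))
        rows.append((m["res"], m["atom"], obs, exp, (obs - exp) / std))
    return rows


summary = """\
G92    HA2 = 5.318(S),     Expected =  3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116    HA = 5.681(S),     Expected =  4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132    Typing: Mistyped
R132    CB = 38.217(S),     Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160    HB = 0.090(S),     Expected =  1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181    HA = 2.217(S),     Expected =  4.48, Std = 0.5000, ChiSquare = 6.0111e-06
""".splitlines()

# Print the largest deviations first.
for res, atom, obs, exp, n_std in sorted(
        parse_outliers(summary), key=lambda r: -abs(r[4])):
    print(f"{res:5s} {atom:4s} {obs:8.3f} {exp:7.2f} {n_std:+6.2f}")
```

For this entry, A160 HB lies furthest from its expected value, roughly 5.2 standard deviations below it, so it would be a reasonable first candidate to re-inspect in the spectra.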


Below is a view of the output of the missing_shifts.pl script. For the protein in the example, only backbone assignment was conducted (unlisted atoms are present in the BMRB file):

D189:    HB2  HB3 
L190:    CD1  CD2  CG  HB2  HB3  HD1  HD2  HG 
E191:    CG  HB2  HB3  HG2  HG3 

AtomType Completeness Statistics:
                aromatic completeness ::    0 /  174 =   0.00%
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
                unambiguous CH2 completeness ::    0 /   20 =   0.00%
                unambiguous CH3 completeness ::    0 /   32 =   0.00%
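The "assigned / expected = percent" lines above are easy to read programmatically, for example to track completeness between successive runs. The following is a minimal sketch that assumes the `label :: n / total = pct%` layout shown in this example; `parse_stats` is a hypothetical helper, and the percentage is recomputed from the two counts rather than read from the file.

```python
import re

# Matches lines such as "backbone completeness ::  845 /  965 =  87.56%"
STAT = re.compile(
    r"^\s*(?P<label>.+?)\s*::\s*(?P<n>\d+)\s*/\s*(?P<total>\d+)\s*="
)


def parse_stats(lines):
    """Map each label to (assigned, expected, percent recomputed from counts)."""
    stats = {}
    for line in lines:
        m = STAT.match(line)
        if m and int(m["total"]) > 0:  # skip headers and empty categories
            n, total = int(m["n"]), int(m["total"])
            stats[m["label"]] = (n, total, round(100.0 * n / total, 2))
    return stats


report = """\
AtomType Completeness Statistics:
                aromatic completeness ::    0 /  174 =   0.00%
                backbone completeness ::  845 /  965 =  87.56%
                sidechain completeness ::  227 / 1244 =  18.25%
                unambiguous CH2 completeness ::    0 /   20 =   0.00%
                unambiguous CH3 completeness ::    0 /   32 =   0.00%
""".splitlines()

stats = parse_stats(report)
print(stats["backbone completeness"])
```

The per-atom counts that follow in the output use the same layout, so the same helper applies to them unchanged.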


       C ::  168 /  197 =  85.28%
      CA ::  181 /  197 =  91.88%
      CB ::  167 /  183 =  91.26%
       H ::  160 /  180 =  88.89%
      HA ::  156 /  183 =  85.25%
     HA2 ::   11 /   14 =  78.57%
     HA3 ::    9 /   14 =  64.29%
      HB ::    8 /   56 =  14.29%