AutoStructure

From NESG Wiki
Revision as of 21:25, 6 January 2010 by Jma (talk | contribs)

Introduction

AutoStructure is a protein structure determination tool that uses uninterpreted NOESY cross peaks together with structure calculation programs like XPLOR or DYANA to generate a 3D structure of the protein that is as close to the true structure as possible [1]. AutoStructure uses an iterative, bottom-up, topology-constrained approach to analyze NOE peak lists. It first builds an initial fold based on intraresidue and sequential NOESY data, together with characteristic NOE patterns of secondary structures (helical medium-range NOE interactions and interstrand beta-sheet NOE interactions) and unique long-range packing NOE interactions identified by chemical shift matching and symmetry considerations. Unassigned NOESY cross peaks are not used in structure calculations. Additional NOESY cross peaks are then assigned iteratively using intermediate structures and knowledge of the high-order topology constraints of alpha-helix and beta-sheet packing geometries. This protocol, in principle, resembles the methodology that an expert would use to solve a protein structure manually by NMR.

The RPF program within AutoStructure uses a novel, rapid, and simple approach for calculating global structure quality scores [2]. Specifically, this program calculates RECALL, PRECISION, and F-MEASURE (RPF) scores for the query structures, which are statistical quality scores commonly used in the field of information retrieval. These scores quickly provide the goodness-of-fit of the query structures when compared to the NOESY peak lists and resonance assignment data. The program also presents false positive and false negative data that can be used in refining the NMR structure determination protocols.
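As a rough illustration of the three scores, the sketch below applies the standard information retrieval definitions of recall, precision, and F-measure to counts of true positives, false positives, and false negatives. How RPF derives these counts from the NOESY peak lists, resonance assignments, and query structures is described in [2] and is not reproduced here; the function name and the counts in the example are made up for illustration.

```python
def rpf_scores(true_pos, false_pos, false_neg):
    """Return (recall, precision, f_measure) from raw counts.

    Standard information retrieval definitions; this is not the
    AutoStructure/RPF implementation, only the arithmetic behind the names.
    """
    observed = true_pos + false_neg    # everything that should have been found
    predicted = true_pos + false_pos   # everything that was actually reported
    recall = true_pos / observed if observed else 0.0
    precision = true_pos / predicted if predicted else 0.0
    if recall + precision == 0.0:
        return recall, precision, 0.0
    # F-measure is the harmonic mean of recall and precision.
    f_measure = 2.0 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Hypothetical counts, chosen only to show the calculation.
recall, precision, f_measure = rpf_scores(true_pos=90, false_pos=10, false_neg=30)
print(f"recall={recall:.3f} precision={precision:.3f} F={f_measure:.3f}")
```

Because the F-measure is a harmonic mean, it stays low unless both recall and precision are high, which is why a single F value can serve as a summary of the fit.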

The input for AutoStructure/RPF includes the amino acid sequence, a list of resonance assignments, and lists of 2D, 3D and/or 4D NOESY cross peaks. Query structure(s) are also needed for RPF.

As of Nov. 2009, the latest manual for AutoStructure covers version 2.1.1; the current version of the program is 2.2.1. This version does not support structure calculations on dimers or the inclusion of RDCs; a new version of the program is in development and will support these options as well as AS-DP. Please contact Janet Huang for further information on future releases.

 

Getting Started

Input Files

The following files are required to perform an AutoStructure run.

Protein sequence file

The sequence file must be in the following format:

1   @   MET    2   @   GLU    3   @   PHE    4   @   PRO    5   @   ASP
6   @   LEU    7   @   THR    8   @   VAL    9   @   GLU    10   @   ILE
....etc.

Here is an example sequence file.
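A sequence file can be checked before a run with a few lines of Python. The sketch below assumes only what the example above shows: whitespace-separated triples of residue number, the literal "@", and a three-letter residue name. The exact spacing and the number of residues per line that AutoStructure requires are not stated here, so the parser ignores both. The function name is hypothetical and not part of AutoStructure.

```python
def parse_sequence_file(text):
    """Return a list of (residue_number, residue_name) tuples.

    Assumes whitespace-separated (number, '@', name) triples, as in the
    example above; spacing and line breaks are not checked.
    """
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected (number, '@', residue) triples")
    residues = []
    for i in range(0, len(tokens), 3):
        number, marker, name = tokens[i:i + 3]
        if marker != "@":
            raise ValueError(f"expected '@' after residue {number}, got {marker!r}")
        residues.append((int(number), name))
    return residues


example = """\
1   @   MET    2   @   GLU    3   @   PHE    4   @   PRO    5   @   ASP
6   @   LEU    7   @   THR    8   @   VAL    9   @   GLU    10   @   ILE
"""
residues = parse_sequence_file(example)
print(len(residues), residues[0], residues[-1])
```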

Chemical shift assignment file

The chemical shift assignments for the protein must be in BMRB 2.1 format.  The header information is ignored.

AutoStructure does interpret the ambiguity code column. This is important for denoting stereospecific assignments.

1       1       Met     HA      H       3.999   .       1
2       1       Met     HB2     H       2.011   .       2
3       1       Met     HB3     H       1.946   .       2
4       1       Met     HG2     H       2.368   .       2
5       1       Met     HG3     H       2.274   .       2
6       1       Met     CA      C       55.110  .       1
7       1       Met     CB      C       33.147  .       1
8       1       Met     CG      C       30.951  .       1
9       2       Glu     HA      H       4.548   .       1
....etc.

Here is an example bmrb file.
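To see how the ambiguity code column can be inspected, the following sketch reads rows like those above and lists the atoms whose code is not 1. It assumes the eight columns are, in order, atom ID, residue number, residue name, atom name, atom type, shift (ppm), shift error, and ambiguity code, and that "." marks a missing error value. In the BMRB convention, code 1 marks a unique assignment and code 2 marks an ambiguity between geminal atoms or geminal methyl groups. The names used in the code are illustrative and not part of AutoStructure.

```python
from collections import namedtuple

Shift = namedtuple(
    "Shift",
    "atom_id res_num res_name atom_name atom_type shift error ambiguity",
)


def parse_shift_rows(text):
    """Parse eight-column shift rows; lines that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 8 or not f[0].isdigit():
            continue  # header, blank, or unrelated line
        error = None if f[6] == "." else float(f[6])
        rows.append(Shift(int(f[0]), int(f[1]), f[2], f[3], f[4],
                          float(f[5]), error, int(f[7])))
    return rows


example = """\
1       1       Met     HA      H       3.999   .       1
2       1       Met     HB2     H       2.011   .       2
3       1       Met     HB3     H       1.946   .       2
4       1       Met     HG2     H       2.368   .       2
5       1       Met     HG3     H       2.274   .       2
6       1       Met     CA      C       55.110  .       1
7       1       Met     CB      C       33.147  .       1
8       1       Met     CG      C       30.951  .       1
9       2       Glu     HA      H       4.548   .       1
"""
rows = parse_shift_rows(example)
# Atoms with an ambiguity code other than 1 are not uniquely assigned.
ambiguous = [r.atom_name for r in rows if r.ambiguity != 1]
print(ambiguous)
```

In this example the methylene protons of Met 1 carry code 2, so they would be treated as not stereospecifically assigned.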

NOESY peak lists

AutoStructure can accept 3D and 4D NOESY peak lists. In principle, any column-formatted peak list file can be read by the program; we typically use either Sparky or Xeasy formats.

Here are example 15N-edited NOESY and 13C-edited NOESY peak lists.

The column definitions and tolerances are defined in the control file (see below). 

Other input files

Other files that can be included in an AutoStructure calculation include:

  • HN-Hα J-couplings:  used for initial secondary structure analysis
  • slow exchanging NH list:  used for initial secondary structure analysis
  • dihedral angle constraints:  used directly in CYANA/XPLOR structure calculations
  • manual distance constraints:  used directly in CYANA/XPLOR structure calculations

Graphical User Interface

AutoStructure features a GUI, which is particularly useful for preparing the control file and for the subsequent analysis of structure calculations.

The main page (Figure 1) features the following pull-down menus:

  • File:  manipulation of control file
  • AutoStructure:  starting an AutoStructure run and analysis of an output directory
  • AutoQF(RPF):  starting an RPF analysis and analysis of an RPF directory
  • PDB Tools:  tools for converting PDB coordinates to IUPAC format
  • Misc Tools

Figure 1:  AutoStructure main page

AS mainpage.png

Control File

The control file is the central file which defines the input, parameter, and execution files and options for an AutoStructure run.

It is easiest to manipulate the control file using the GUI.  The control file in the GUI is subdivided into three main sections:

  • General:  general input files for the AutoStructure run (Figure 2).

Figure 2: AutoStructure control file: General Section

AScontrolfile general.png

  • Command:  specific command input and selections for the AutoStructure run.  This will be discussed in another section.
  • Peak Lists:  peak lists for the AutoStructure run (Figure 3). Note that AutoStructure can take into account aliasing in any dimension; simply enter the sweep width of the aliased dimension(s).  Also, we generally combine aliphatic and aromatic 13C-edited NOESY peak lists into a single list for an AutoStructure run.

Figure 3: AutoStructure control file: PeakList Section

AScontrolfile peaks.png
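The reason the sweep width is enough to handle aliasing can be seen from a small calculation: a peak aliased by simple wrap-around appears at its true shift plus or minus a whole number of sweep widths. The sketch below is only an illustration of that arithmetic, not AutoStructure's own code. It assumes the sweep width is given in ppm and that peaks wrap around rather than mirror about the spectrum edge; the function names, the example shifts, and the tolerance are all made up.

```python
def unaliased_candidates(observed_ppm, sweep_width_ppm, max_folds=1):
    """Possible true shifts for a peak seen at observed_ppm.

    Assumes simple wrap-around aliasing: true = observed + n * sweep width.
    """
    return [observed_ppm + n * sweep_width_ppm
            for n in range(-max_folds, max_folds + 1)]


def matches_assignment(observed_ppm, assigned_ppm, sweep_width_ppm,
                       tolerance_ppm, max_folds=1):
    """True if any unaliased candidate lies within tolerance of the assignment."""
    return any(abs(c - assigned_ppm) <= tolerance_ppm
               for c in unaliased_candidates(observed_ppm, sweep_width_ppm,
                                             max_folds))


# Hypothetical 13C dimension with a 30 ppm sweep width.
print(unaliased_candidates(20.0, 30.0))
print(matches_assignment(20.0, 50.1, 30.0, tolerance_ppm=0.4))
```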


Starting a Calculation

The user can start a calculation from either the graphical user interface or the command line.

1. Launching a calculation from the GUI.

  • Under the AutoStructure pull-down on the main page, choosing Start -> Calc opens the dialogue box below.

AS startCalc.png

  • Calculations launched from the GUI are generally slow, since only the processors on your own machine are used.  To speed up the calculations, use the command line on a cluster.


2.  Launching a calculation from the command line.

  • Log in to a cluster.  At CABM we use AutoStructure version 2.2.1 on hummer.
  • A simple command line run can be started as follows:
/farm/software/AutoStructure/AutoStructure-2.2.1/bin/autostructure -c controlfile_CYANArun -o testCYANArun.out -v
  • Running the autostructure command gives the following options (like those available in the GUI):
 AutoStructure/RPF Version 2.2.1 Copyright(C) 2007
     Center for Advanced Biotechnology and Medicine (CABM)
     Rutgers University

     Options:
         -c control_file      Required
         -o output_dir        Required
         -d                   For debug
         -h                   Help
         -m                   Exclude PCT-filter of Cycle1 for symmetry analysis
         -n                   Exclude CSI-based secondary structure analysis
         -i structure_file    inital fold for bootstrapping
         -j                   Include J-coupling constant data for angle constraint analysis (HYPER)
         -k float_number      Calibration coefficient
         -N                   Include NOE assignments in HYPER caluclation (under development)
         -q structure_file    AutoQF-Calculate the F and DP scores of the input structure_file (IUPAC naming)
         -r path              Restore from a prior outout_dir
         -R                   Include rotamer constraints in HYPER calculation (under development)
         -v                   Calculate the M score and average shifts
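For scripted or repeated runs, the command line above can be assembled in Python. The sketch below is a hypothetical wrapper, not part of AutoStructure: it uses only the -c, -o, -d, and -v options listed above and the CABM installation path from the example, and it prints the command instead of executing it. The actual launch is left commented out because it needs the program and input files to be present.

```python
import shlex
import subprocess  # only needed if the run line below is uncommented

# Installation path taken from the CABM example above; adjust for other sites.
AUTOSTRUCTURE = "/farm/software/AutoStructure/AutoStructure-2.2.1/bin/autostructure"


def build_command(control_file, output_dir, m_score=False, debug=False,
                  executable=AUTOSTRUCTURE):
    """Return the argument list for an AutoStructure run."""
    cmd = [executable, "-c", control_file, "-o", output_dir]
    if debug:
        cmd.append("-d")  # debug output
    if m_score:
        cmd.append("-v")  # calculate the M score and average shifts
    return cmd


cmd = build_command("controlfile_CYANArun", "testCYANArun.out", m_score=True)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment on a machine with AutoStructure
```

Passing the arguments as a list avoids shell quoting problems if a control file or output directory name contains spaces.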

  

References

1.    Huang, Y.J., Tejero, R., Powers, R. and Montelione, G.T. (2006) A topology-constrained distance network algorithm for protein structure determination from NOESY data, Proteins 62, 587-603.

2.    Huang, Y.J., Powers, R. and Montelione, G.T. (2005) Protein NMR Recall, Precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics, J. Am. Chem. Soc. 127, 1665-1674.