This tutorial will illustrate how to use the MMTSB Tool Set to access a variety of
    tools for template-based protein structure prediction.
    
    As an example we will predict the structure of human peptidyl-prolyl cis-trans isomerase 
    G with the following sequence:
     RPRCFFDIAINNQPAGRVVFELFSDVCPKTCENFRCLCTGEKGTGKSTQKPLH
     YKSCLFHRVVKDFMVQGGDFSEGNGRGGESIYGGFFEDESFAVKHNAAFLLSM
     ANRGKDTNGSQFFITTKPTPHLDGHHVVFGQVISGQEVVREIENQKTDAASKP
     FAEVRILSCGELIP

1. Secondary Structure Prediction
    Copy the sequence into a file called 'sequence' and then run PSIPRED with the
    following command:
    
    psipred.pl sequence > 2ndary.prediction
    
    This command takes a few minutes to complete because PSIPRED first runs
    PSI-BLAST to obtain a sequence profile.
    Take a look at the predicted secondary structure in 2ndary.prediction. You
    should find that this protein is predicted to consist mostly of extended
    segments (E) rather than helices (H).
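
    To put a number on "mostly extended", the small shell function below counts
    the H and E positions in a one-letter secondary-structure string. It is a
    sketch that assumes you have already copied the predicted states (H, E, C)
    into a single string; the exact layout of 2ndary.prediction is not described
    in this tutorial, so that extraction step is left to you. The function name
    ss_composition is hypothetical.

```shell
# ss_composition (hypothetical helper): count helix (H), strand (E) and all
# other positions in a one-letter secondary-structure string.
# Assumes the string has already been extracted from 2ndary.prediction.
ss_composition() (
  ss=$1
  total=${#ss}
  # Keep only the H (or E) characters, then take the length of what is left.
  hs=$(printf '%s' "$ss" | tr -cd 'H')
  es=$(printf '%s' "$ss" | tr -cd 'E')
  h=${#hs}
  e=${#es}
  printf 'H=%d E=%d other=%d (length %d)\n' "$h" "$e" "$((total - h - e))" "$total"
)

# Example call with a made-up string: ss_composition CCEEEEHHHCC
```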
    
2. Identification of Templates
    To find templates among PDB structures with similar sequences, we run
    PSI-BLAST with the following command:
    
    psiblast.pl -pdb -log psiblast.log sequence > psiblast.alignments
    
    The -pdb option restricts the search to sequences of structures from the PDB.
    This command also takes a few minutes because the full protein sequence
    database is searched first to build a sequence profile.
    The output from this command contains the top-scoring alignments to known PDB
    structures that could serve as templates. What is the function of these
    templates? Does it match the function of the protein we want to predict?
    The psiblast.pl tool can also extract individual alignments in FASTA format
    from the log file. For example, the following command extracts the fourth
    alignment:

    psiblast.pl -readlog psiblast.log -no 4 sequence > alignment.4
    
    Use this command to extract the first 10 alignments into the separate files
    alignment.1 to alignment.10.
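
    Rather than typing the command ten times, a small loop can do the extraction.
    This is a sketch in POSIX shell that reuses only the psiblast.pl options shown
    above; the function name extract_alignments is hypothetical, and it assumes
    psiblast.pl is on your PATH.

```shell
# extract_alignments (hypothetical helper): write alignments 1..N (default 10)
# from a psiblast.pl log file into alignment.1 ... alignment.N, using only the
# -readlog and -no options shown above. Assumes psiblast.pl is on the PATH.
extract_alignments() (
  log=$1        # PSI-BLAST log file, e.g. psiblast.log
  seqfile=$2    # query sequence file, e.g. sequence
  n=${3:-10}    # number of alignments to extract
  i=1
  while [ "$i" -le "$n" ]; do
    psiblast.pl -readlog "$log" -no "$i" "$seqfile" > "alignment.$i" || exit 1
    i=$((i + 1))
  done
)

# Usage: extract_alignments psiblast.log sequence 10
```

    The function body runs in a subshell, so its loop variables do not leak into
    your interactive session, and it stops at the first psiblast.pl failure.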
    
3. Template-based Modeling
    
    From the alignment files we can build template-based models with buildModel.pl:
    
    buildModel.pl alignment.1 > model.1.pdb
    
    This script performs a number of tasks, including side-chain modeling and
    modeling of loops with fewer than 12 residues using Modeller.
    When modeling is complete, the output reports which parts of the structure
    were built and which parts are missing from the generated model.
    Repeat this step for all ten alignments.
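
    Likewise, the ten buildModel.pl runs can be scripted. This hypothetical
    helper, build_models, assumes the alignment files are named alignment.1 to
    alignment.N and writes model.1.pdb to model.N.pdb, the naming that the
    checkin.pl step (model.*.pdb) relies on. Because the model itself goes to
    standard output, anything printed to the terminal comes from standard error;
    the helper writes a marker line there before each run so that each coverage
    report can be matched to its model.

```shell
# build_models (hypothetical helper): run buildModel.pl on alignment.1..N
# (default 10) and write model.1.pdb ... model.N.pdb.
build_models() (
  n=${1:-10}
  i=1
  while [ "$i" -le "$n" ]; do
    # Marker on stderr so the coverage report of each run can be identified.
    echo "=== alignment.$i -> model.$i.pdb ===" >&2
    buildModel.pl "alignment.$i" > "model.$i.pdb" || exit 1
    i=$((i + 1))
  done
)

# Usage: build_models 10
```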

    What is the consensus range of residues covered by all models?
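
    One way to answer this is to read the residue numbers of each model directly
    from its ATOM records (columns 23-26 in the PDB format) and take the largest
    first residue and the smallest last residue over all models. The sketch below,
    a hypothetical consensus_range function built on awk, does exactly that. It
    assumes all models are numbered consistently with the target sequence, and it
    ignores chain IDs, insertion codes and internal gaps, so the buildModel.pl
    reports should still be checked for missing segments.

```shell
# consensus_range (hypothetical helper): print "first:last" for the residue
# range present in every given PDB file. Residue numbers come from columns
# 23-26 of ATOM records; gaps inside a model and insertion codes are ignored,
# and files without ATOM records are skipped. Exits 1 if there is no overlap.
consensus_range() (
  awk '
    /^ATOM/ {
      r = substr($0, 23, 4) + 0
      if (!(FILENAME in lo) || r < lo[FILENAME]) lo[FILENAME] = r
      if (!(FILENAME in hi) || r > hi[FILENAME]) hi[FILENAME] = r
    }
    END {
      seen = 0
      for (f in lo) {
        # Consensus start is the latest first residue, end the earliest last one.
        if (!seen || lo[f] > from) from = lo[f]
        if (!seen || hi[f] < to) to = hi[f]
        seen = 1
      }
      if (!seen || from > to) exit 1
      print from ":" to
    }
  ' "$@"
)
```

    For example, consensus_range model.*.pdb prints the range as start:end,
    matching the colon in the -sel argument of the truncation step.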

    Next we truncate all of the models to this common range and score them to
    decide which one is the best model. Because there are 10 structures, we use
    the ensemble computing facility to make life easier.
    We begin by creating an ensemble from the 10 models:
    
    checkin.pl -dir ens model model.*.pdb
    
    Now truncate all of the models to the consensus range; the first and last
    residue numbers of that range go on either side of the colon after -sel:
    
    ensrun.pl -new truncated -dir ens model convpdb.pl -sel : 
    
    We can minimize and score the truncated models with:
    
    ensmin.pl -par minsteps=100,dielec=rdie,epsilon=4 -dir ens truncated min
    ensrun.pl -set score:1 -dir ens min enerCHARMM.pl -par gb,nocut 
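
    The -set score:1 option stores the energy reported by enerCHARMM.pl as the
    property score of each model, which getprop.pl reads below.
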
    The best model is the one with the most negative energy, which can be found
    with getprop.pl:
    
    getprop.pl -prop score -dir ens min | sort -k2n
    
    Take a look at this model with VMD. You can also examine its secondary
    structure with:
    
    genseq.pl -out onesec -dssp 
    
    How well does the secondary structure from this model match the predicted
    secondary structure?
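
    To quantify the agreement, the two strings can be compared position by
    position. This hypothetical ss_agreement helper assumes you have the PSIPRED
    prediction and the DSSP string of the model as one-letter strings covering
    the same residues (the model is truncated, so cut the prediction down to the
    consensus range first). DSSP uses eight states, so both strings are reduced
    to three with one common convention (H, G, I to helix; E, B to strand;
    everything else to coil); other reductions exist and give slightly
    different numbers.

```shell
# ss_agreement (hypothetical helper): percentage of positions where a predicted
# and an observed secondary-structure string agree after a 3-state reduction.
# Usage: ss_agreement PREDICTED_STRING DSSP_STRING  (equal lengths assumed)
ss_agreement() (
  awk -v p="$1" -v d="$2" '
    # One common 8-to-3 state reduction: H,G,I -> H; E,B -> E; rest -> C.
    function three(c) {
      if (c ~ /[HGI]/) return "H"
      if (c ~ /[EB]/) return "E"
      return "C"
    }
    BEGIN {
      if (length(p) != length(d)) {
        print "length mismatch: " length(p) " vs " length(d) | "cat 1>&2"
        exit 1
      }
      n = length(p)
      if (n == 0) exit 1
      for (i = 1; i <= n; i++)
        if (three(substr(p, i, 1)) == three(substr(d, i, 1))) same++
      printf "%d/%d positions agree (%.1f%%)\n", same, n, 100 * same / n
    }'
)
```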
    
4. Comparison with Experimental Structure
    The experimental structure is given in the file native.pdb. It is also
    available from the Protein Data Bank with the ID 2GW2.
    We can compare our models with the native structure by calculating root
    mean square deviations (RMSDs):
    
    ensrun.pl -set rmsnative:1 -dir ens min rms.pl -fit -out CA \
              -nowarn `pwd`/native.pdb
    
    The -fit option is needed to superimpose each model onto the native structure
    by a least-squares fit before the RMSD values are calculated. The -nowarn
    option suppresses warning messages about missing atoms or residues in the
    experimental structure. In this example we look at C-alpha RMSD values.
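
    For reference, the fitted C-alpha RMSD follows the standard definition below,
    where x_i and y_i are the coordinates of the i-th of N matched C-alpha atoms
    in the model and in the native structure, and R and t are the rotation and
    translation found by the least-squares fit (without -fit they would be the
    identity and zero). This is the textbook definition; how rms.pl pairs
    residues between model and native is not described in this tutorial.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```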
  
    Use getprop.pl again to check whether the best-scoring structure also
    corresponds to the model with the smallest RMSD:
    
    getprop.pl -prop rmsnative,score -dir ens min
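
    If the list is long, a short awk filter can pick out both models at once.
    This sketch assumes that getprop.pl prints one line per model with an
    identifier followed by the requested properties in order (here rmsnative,
    then score), as the earlier sort on the second column suggests; check the
    actual output before relying on it. The name compare_best is hypothetical.

```shell
# compare_best (hypothetical helper): read "id rmsnative score" lines on stdin
# (assumed getprop.pl layout) and report the lowest-energy model, the
# lowest-RMSD model, and whether they are the same model.
compare_best() (
  awk '
    NF >= 3 {
      r = $2 + 0; s = $3 + 0
      if (n == 0 || s < best_s) { best_s = s; id_s = $1; rms_at_s = r }
      if (n == 0 || r < best_r) { best_r = r; id_r = $1 }
      n++
    }
    END {
      if (n == 0) exit 1
      print "lowest energy: " id_s " (score " best_s ", RMSD " rms_at_s ")"
      print "lowest RMSD: " id_r " (RMSD " best_r ")"
      print "same model: " (id_s == id_r ? "yes" : "no")
    }'
)

# Usage: getprop.pl -prop rmsnative,score -dir ens min | compare_best
```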