PDB file manipulation
PDB format files from the Protein Data Bank or other sources usually require minor modifications to residue and atom names before they can be read into CHARMM or Amber, the main modeling packages used by the MMTSB Tool Set. CHARMM also requires unique segment IDs at the end of each PDB line.

Format conversion

convpdb.pl can convert atom and residue names for a number of specific formats. Most of the changes involve histidine residue names. The -out option followed by the format name is used for this purpose:

convpdb.pl -out charmm22 1vii.orig.pdb

This command matches atom and residue names to the naming convention used in the CHARMM22 force field. It also strips all PDB lines that do not begin with ATOM, i.e. all remarks and all crystallographic, sequence, and other information. convpdb.pl always ignores such lines in PDB input files, since none of the MMTSB tools use this extra information. Other supported formats are charmm19, for the CHARMM19 force field, and amber, for input suitable for Amber's leap/tleap program, which is used to set up topology and coordinate files. The generic format is also supported; it converts atom and residue names in CHARMM or Amber PDB output back to generic PDB names. Again, this mostly involves histidine residue names, since names other than HIS are often not recognized by other programs.

The MMTSB Tool Set automatically converts all PDB files to CHARMM22 format before writing them out, so that the histidine protonation state is unambiguously preserved through the corresponding CHARMM22 residue names. All tools, including convpdb.pl, therefore expect input with either canonical or CHARMM22 residue names. If PDB files that were generated for or by CHARMM19 need to be read by convpdb.pl, problems with histidine residues may arise, and the special option -charmm19 must be used to indicate that the input file contains CHARMM19 residue names.
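The histidine renaming at the core of these conversions can be sketched in a few lines of Python. This is a hypothetical illustration, not convpdb.pl's implementation: the function name `convert_histidines` and the format keys are made up here, it renames residues only (convpdb.pl also adjusts atom names), and it assumes the widely used names HSD/HSE/HSP (CHARMM22) and HID/HIE/HIP (Amber) for the delta-protonated, epsilon-protonated, and doubly protonated forms. CHARMM19 names are deliberately left out. Like convpdb.pl, it drops every line that does not begin with ATOM.

```python
# Hypothetical sketch, not the convpdb.pl implementation: it only renames
# histidine residues and does not touch atom names.

# CHARMM22 histidine names: HSD (H on ND1), HSE (H on NE2), HSP (both).
HIS_NAMES = {
    "amber": {"HSD": "HID", "HSE": "HIE", "HSP": "HIP"},
    "generic": {"HSD": "HIS", "HSE": "HIS", "HSP": "HIS"},
}


def convert_histidines(pdb_text: str, fmt: str) -> str:
    """Keep only ATOM records and rename CHARMM22 histidines for `fmt`."""
    mapping = HIS_NAMES[fmt]
    out = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # drop REMARK, CRYST1, SEQRES, HETATM, ... records
        resname = line[17:20]  # residue name, PDB columns 18-20
        if resname in mapping:
            line = line[:17] + mapping[resname] + line[20:]
        out.append(line)
    return "".join(line + "\n" for line in out)
```

Note that the generic mapping is lossy: once HSD, HSE, and HSP have all become HIS, the protonation state can no longer be read from the residue name, which is why the tool set keeps CHARMM22 names internally.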
As an example, first generate a PDB file with CHARMM19 residue names:

convpdb.pl -out charmm19 1vii.exp.pdb > c19.pdb

Then try the following command to convert the file back to CHARMM22 format:

convpdb.pl -charmm19 -out charmm22 c19.pdb

Note that conversion between the CHARMM19 and CHARMM22 formats does not add or remove any hydrogen atoms. When a CHARMM19 output file is converted to CHARMM22, only the naming convention becomes compatible with the CHARMM22 force field. The structure will still lack the non-polar hydrogen atoms expected by CHARMM22, since CHARMM19 includes only polar hydrogens. Another utility, complete.pl, is available for completing structures for a given force field; it is explained in more detail in another part of this tutorial section.

CHARMM segment names
When PDB files are read into CHARMM, they are expected to have a four-letter segment ID starting at column 73 at the end of each PDB ATOM line. Segment IDs are used like the more common chain IDs to distinguish molecular segments that are not covalently bonded to each other. The -segnames option adds segment IDs:

convpdb.pl -segnames 1vii.orig.pdb

This writes out a CHARMM22 PDB file with segment IDs. CHARMM22 is the default output mode of convpdb.pl, so -out charmm22 can be omitted, as in this example. Normally, convpdb.pl ignores segment IDs when reading PDB files, because PDB files from other sources may contain other information in the segment ID columns that could confuse CHARMM. It may be useful, however, to preserve existing valid segment IDs in PDB files written out by CHARMM or by convpdb.pl. In this case, the option -readseg can be given so that existing segment IDs are not discarded. The following example converts a PDB file written out by CHARMM with segment IDs to CHARMM19 format while preserving the original segment IDs from the input file:

convpdb.pl -readseg -out charmm19 1vii.sample.1.pdb

Residue numbering
Residue numbering in PDB files is important for determining whether fragments are continuous and for identifying structural fragments, e.g. loop regions in loop modeling problems. Especially when only parts of a given structure are modeled, it is crucial to maintain the correct residue numbering so that the modeled parts can later be merged with the rest of the protein system to form a complete structure. convpdb.pl offers three options for changing residue numbers. The first, -renumber, sets the number at which residue numbering starts:

convpdb.pl -renumber 1 1vii.orig.pdb > 1vii1.pdb

The result is a new PDB file, 1vii1.pdb, in which residue numbering starts at 1 instead of 41. The second option, -add <shiftvalue>, shifts all residue numbers by a constant and thereby maintains the relative numbering in fragmented structures. The third option, -match <reference PDB>, is more sophisticated. It first aligns the amino acid sequence with the sequence from the reference PDB file. If a complete alignment is not possible, the largest fragment with exactly matching residue names is aligned instead. The shift in residue numbers found by the alignment is then applied to the whole molecule, so that the residue numbering agrees with the reference for the matching residues.

This option is useful in the following situation: a villin conformation, 1vii.sample.1.pdb, with residue numbering starting at 1 should be compared with the experimental structure deposited in the Protein Data Bank, where residue numbering starts at 41, by calculating the root mean square deviation (RMSD) between coordinate positions. Using rms.pl directly will not work because the residue numbering does not match. Instead, the command

convpdb.pl -match 1vii.orig.pdb 1vii.sample.1.pdb | rms.pl 1vii.orig.pdb

first changes the residue numbering in the sample file to match the original PDB entry and then passes the structure on to rms.pl to calculate the RMSD.

Chain ID
Single-letter chain IDs are commonly used to distinguish units
in multidomain proteins or other types of complexes. As explained above,
CHARMM does not recognize chain IDs and uses segment IDs instead.
However, chain IDs are recognized by many tools in the MMTSB Tool Set
and can be used in residue selection criteria for loop modeling or
other applications, e.g. for restraining part of a structure during
minimization.
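To illustrate why chain IDs and residue numbers matter for such selections, the following Python sketch picks out the ATOM records of one chain within an inclusive residue range, i.e. the kind of subset one might restrain during minimization or treat as a loop. The function `select_atoms` and its arguments are hypothetical and do not reproduce the selection syntax of any MMTSB tool; the sketch simply reads the chain ID from column 22 and the residue number from columns 23-26 of the PDB format.

```python
# Hypothetical helper, not an MMTSB Tool Set function.
from typing import List


def select_atoms(pdb_text: str, chain: str, first: int, last: int) -> List[str]:
    """Return ATOM lines of `chain` whose residue numbers lie in first..last."""
    picked = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]        # chain ID, PDB column 22
        resnum = int(line[22:26])  # residue number, PDB columns 23-26
        if chain_id == chain and first <= resnum <= last:
            picked.append(line)
    return picked
```

In a complex, the same residue number can occur in more than one chain, so the chain ID is what makes such a selection unambiguous.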
The following commands assign chain ID A to 1POA and chain ID B to 1VII and then merge the two structures into one file:

convpdb.pl -setchain A 1poa.exp.pdb > A.pdb
convpdb.pl -setchain B 1vii.exp.pdb > B.pdb
convpdb.pl -merge B.pdb A.pdb > AB.pdb

The resulting file AB.pdb contains both molecules, 1POA and 1VII, distinguished by the chain IDs A and B. Alternatively, chain IDs can be set automatically from the last letter of CHARMM segment IDs with the option -chainfromseg. This may be useful for multidomain PDB files that were written out by CHARMM without chain IDs.
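The column edits behind -setchain, -chainfromseg, and the segment IDs discussed earlier can be sketched in Python as follows. All function names are hypothetical and the code is not taken from convpdb.pl. It writes the chain ID into column 22 and reads or writes the CHARMM segment ID in columns 73-76; the caller chooses the segment name, since the naming scheme used by -segnames is not described here. `merge` is reduced to plain concatenation, which is an assumption; the real -merge option may behave differently.

```python
# Hypothetical sketch of the PDB column edits; not convpdb.pl code.
from typing import Callable


def _edit_atoms(pdb_text: str, edit: Callable[[str], str]) -> str:
    """Apply `edit` to every ATOM line, leaving other lines unchanged."""
    out = [edit(line) if line.startswith("ATOM") else line
           for line in pdb_text.splitlines()]
    return "".join(line + "\n" for line in out)


def set_chain(pdb_text: str, chain: str) -> str:
    """Write a one-character chain ID into column 22 (cf. -setchain)."""
    if len(chain) != 1:
        raise ValueError("chain IDs are single characters")
    return _edit_atoms(pdb_text, lambda line: line[:21] + chain + line[22:])


def set_segid(pdb_text: str, segid: str) -> str:
    """Write a segment ID of up to four characters into columns 73-76."""
    if not 1 <= len(segid) <= 4:
        raise ValueError("segment IDs have one to four characters")
    # Pad short lines out to column 72, then keep anything after column 76.
    return _edit_atoms(
        pdb_text, lambda line: line[:72].ljust(72) + segid.ljust(4) + line[76:])


def chain_from_seg(pdb_text: str) -> str:
    """Use the last character of the segment ID as chain ID (cf. -chainfromseg)."""
    def edit(line: str) -> str:
        seg = line[72:76].strip()
        # Lines without a segment ID are left unchanged.
        return line[:21] + seg[-1] + line[22:] if seg else line
    return _edit_atoms(pdb_text, edit)


def merge(*pdb_texts: str) -> str:
    """Concatenate several PDB texts (a simplifying assumption about -merge)."""
    return "".join(t if t.endswith("\n") else t + "\n" for t in pdb_texts)
```

For example, `merge(set_chain(vii, "B"), set_chain(poa, "A"))`, with `vii` and `poa` holding the text of 1vii.exp.pdb and 1poa.exp.pdb, roughly mirrors the three commands above under these assumptions.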