Examples

Here are a few examples of using the Python psfgen interface, that should cover most typical uses.

These examples form the basis of the psfgen-Python test suite, so all input topology files and protein structures for these examples can be found here.

Combining separate PDB files

This example uses multiple PDB files, containing water, ions, and two protein fragments, and composites them into a single PSF file. Several disulfide bonds are added with a patch.

These proteins were prepared with “capping groups” of ACE and NMA on the N- and C-termini that are represented as their own residues rather than the Charmm style of capping patches. Topology information for these residues is found in top_all36_caps.rtf, but this is just a convention used in my lab.

Although not required to parse the test case files, the common task of aliasing different histidine protonation states to the same topology residue definition is done.

from psfgen import PsfGen
gen = PsfGen(output="/dev/null")  # Suppress output since there's too much
gen.read_topology("top_all36_caps.rtf")
gen.read_topology("top_all36_prot.rtf")
gen.read_topology("top_water_ions.rtf")

# Set up an alias for histidine protonation states
gen.alias_residue(top_resname="HIS", pdb_resname="HIE")
gen.alias_residue(top_resname="HIS", pdb_resname="HID")

# Read protein
gen.add_segment(segid="P0", pdbfile="psf_protein_P0.pdb")
gen.read_coords(segid="P0", filename="psf_protein_P0.pdb")

gen.add_segment(segid="P1", pdbfile="psf_protein_P1.pdb")
gen.read_coords(segid="P1", filename="psf_protein_P1.pdb")

# Read waters, with 10k atoms per file to avoid PDB limitations
# (this limitation may be fixed later)
gen.add_segment(segid="W0", pdbfile="psf_wat_0.pdb")
gen.read_coords(segid="W0", filename="psf_wat_0.pdb")

gen.add_segment(segid="W1", pdbfile="psf_wat_1.pdb")
gen.read_coords(segid="W1", filename="psf_wat_1.pdb")

# Read ions
gen.add_segment(segid="I", pdbfile="psf_ions.pdb")
gen.read_coords(segid="I", filename="psf_ions.pdb")

# Add disulfides
gen.patch(patchname="DISU", targets=[("P0","10"), ("P0","15")])
gen.patch(patchname="DISU", targets=[("P0","24"), ("P1","23")])
gen.patch(patchname="DISU", targets=[("P0","11"), ("P1","11")])

# Regenerate
gen.regenerate_angles()
gen.regenerate_dihedrals()

# Write
gen.write_psf(filename="relaxin.psf")
gen.write_pdb(filename="relaxin.pdb")

Mutating a residue

Using the same proteins as the previous example, we’ll load only the first chain to keep the example simple. Then, we’ll mutate Leu:2 to an Alanine and have psfgen guess coordinates for new atoms.

from psfgen import PsfGen
gen = PsfGen()
gen.read_topology("top_all36_caps.rtf")
gen.read_topology("top_all36_prot.rtf")

# Read protein and do the mutation
gen.add_segment(segid="P", pdbfile="psf_protein_P0.pdb",
                mutate=[("2","ALA")])

# Add disulfide bond
gen.patch(patchname="DISU", targets=[("P","10"), ("P","15")])

# Guess coordinates for ALA mutation
gen.guess_coords()

# Regenerate angles and dihedrals since we added a patch
gen.regenerate_angles()
gen.regenerate_dihedrals()

# Save output
gen.write_psf(filename="L2A.psf")
gen.write_pdb(filename="L2A.pdb")

We keep the default output from the PsfGen object, so will see it in stdout (the terminal window) and will see a few warnings about failing to set coordinates for LEU:2 when calling read_coords. This is because we’re trying to read in coordinates for a residue that doesn’t exist, as we have an alanine at that location, and the warning can therefore be safely ignored.

Adding capping groups

Let’s say you have a protein without explicitly represented capping groups, called protein_nocaps.pdb (again, this file can be found in the test directory).

This example shows how the add_segment function can be used to set the patches applied to the N- and C-termini. We’ll apply the neutral NTER patch to the N-terminus, and have a positively charged GLUP patch on the C-terminus. We’ll use the residues argument to add an ALA and a GLU to the end of the protein chain (after resid 24) as well, as there isn’t a glutamate at the end of the input protein.

from psfgen import PsfGen
gen = PsfGen()
gen.read_topology("top_all36_prot.rtf")

# Read protein, set the patches for the ends, and add ALA-GLU to the end
gen.add_segment(segid="P", pdbfile="protein_nocaps.pdb",
                first="NTER", last="GLUP",
                residues=[("25","ALA"), ("26","GLU")])

# Read in the coordinates we have and guess the remainder
gen.read_coords(segid="P", filename="protein_nocaps.pdb")
gen.guess_coords()

# Save the result
gen.write_psf(filename="patch_ends.psf")
gen.write_pdb(filename="patch_ends.pdb")

You’ll see in the output a lot of warnings about poorly guessed coordinates, but this is a contrived example so this is okay.

Working with velocities

Psfgen can actually read and write NAMD binary files, including velocities. Let’s pretend we have a pre-equilibrated lipid membrane, and a simulation of protein in solution. We’ll take the protein and lipid and combine them while preserving velocities from simulation, using vmd-python to figure out which water molecules to delete to insert the membrane.

This is an extremely contrived example designed to show off how to use psfgen.PsfGen.read_psf() and psfgen.PsfGen.write_namdbin(), and isn’t an actually recommended way to set up simulations. We’ll pretend that the coordinates of everything are aligned so the files can be combined, too, and that the lipid will be exactly from z=5 to z=15.

from psfgen import PsfGen
from vmd-python import atomsel, molecule

gen = PsfGen()

# Psfgen will attempt to load topologies listed in the PSF file, but to be
# safe and to avoid problems with absolute vs. relative paths we explicitly
# load them here.
gen.read_topology("top_all36_prot.rtf")
gen.read_topology("top_all3_lipid.rtf")

# Read the protein segment, including coordinates and velocities.
gen.read_psf(filename="protein_equil.psf",
             namdbinfile="protein_equil.bin",
             velnamdbinfile="protein_equil_vel.bin")

# Get the segment name that was read in, will use for deleting later
pseg = gen.get_segids()[0]

# Now the lipid segment
gen.read_psf(filename="popc_equil.psf",
             namdbinfile="popc_equil.bin",
             velnamdbinfile="popc_equil_vel.bin")

# Load files in vmd-python to figure out which waters to delete
pid = molecule.load("psf", "protein_equil.psf",
                    "namdbin", "protein_equil.bin")
to_delete = set(atomsel("water and z > 5 and z < 15").get("resid"))
delete(pid)

# Delete overlapping waters in the water segment
for resid in to_delete:
    gen.delete_atoms(segid=pseg, resid=resid)

# Write the output files, including velocities
gen.write_psf(filename="combined.psf")
gen.write_namdbin(filename="combined.bin",
                  velocity_filename="combined_vel.bin")