.. _examples:

Examples
========

Here are a few examples of using the Python psfgen interface, that should
cover most typical uses.

These examples form the basis of the psfgen-Python test suite, so all
input topology files and protein structures for these examples can be
found `here <https://github.com/Eigenstate/psfgen/tree/master/test>`_.


Combining separate PDB files
----------------------------
This example uses multiple PDB files, containing water, ions, and two protein
fragments, and composites them into a single PSF file. Several disulfide bonds
are added with a patch.

These proteins were prepared with "capping groups" of ACE and NMA on the
N- and C-termini that are represented as their own residues rather than the
Charmm style of capping patches. Topology information for these residues is
found in `top_all36_caps.rtf`, but this is just a convention used in my lab.

Although not required to parse the test case files, the common task of aliasing
different histidine protonation states to the same topology residue definition
is done.

.. code-block:: python

    from psfgen import PsfGen
    gen = PsfGen(output="/dev/null")  # Suppress output since there's too much
    gen.read_topology("top_all36_caps.rtf")
    gen.read_topology("top_all36_prot.rtf")
    gen.read_topology("top_water_ions.rtf")

    # Set up an alias for histidine protonation states
    gen.alias_residue(top_resname="HIS", pdb_resname="HIE")
    gen.alias_residue(top_resname="HIS", pdb_resname="HID")

    # Read protein
    gen.add_segment(segid="P0", pdbfile="psf_protein_P0.pdb")
    gen.read_coords(segid="P0", filename="psf_protein_P0.pdb")

    gen.add_segment(segid="P1", pdbfile="psf_protein_P1.pdb")
    gen.read_coords(segid="P1", filename="psf_protein_P1.pdb")

    # Read waters, with 10k atoms per file to avoid PDB limitations
    # (this limitation may be fixed later)
    gen.add_segment(segid="W0", pdbfile="psf_wat_0.pdb")
    gen.read_coords(segid="W0", filename="psf_wat_0.pdb")

    gen.add_segment(segid="W1", pdbfile="psf_wat_1.pdb")
    gen.read_coords(segid="W1", filename="psf_wat_1.pdb")

    # Read ions
    gen.add_segment(segid="I", pdbfile="psf_ions.pdb")
    gen.read_coords(segid="I", filename="psf_ions.pdb")

    # Add disulfides
    gen.patch(patchname="DISU", targets=[("P0","10"), ("P0","15")])
    gen.patch(patchname="DISU", targets=[("P0","24"), ("P1","23")])
    gen.patch(patchname="DISU", targets=[("P0","11"), ("P1","11")])

    # Regenerate
    gen.regenerate_angles()
    gen.regenerate_dihedrals()

    # Write
    gen.write_psf(filename="relaxin.psf")
    gen.write_pdb(filename="relaxin.pdb")


Mutating a residue
------------------
Using the same proteins as the previous example, we'll load only the first chain
to keep the example simple. Then, we'll mutate Leu:2 to an Alanine and have
psfgen guess coordinates for new atoms.


.. code-block:: python

    from psfgen import PsfGen
    gen = PsfGen()
    gen.read_topology("top_all36_caps.rtf")
    gen.read_topology("top_all36_prot.rtf")

    # Read protein and do the mutation
    gen.add_segment(segid="P", pdbfile="psf_protein_P0.pdb",
                    mutate=[("2","ALA")])

    # Add disulfide bond
    gen.patch(patchname="DISU", targets=[("P","10"), ("P","15")])

    # Guess coordinates for ALA mutation
    gen.guess_coords()

    # Regenerate angles and dihedrals since we added a patch
    gen.regenerate_angles()
    gen.regenerate_dihedrals()

    # Save output
    gen.write_psf(filename="L2A.psf")
    gen.write_pdb(filename="L2A.pdb")


We keep the default output from the PsfGen object, so will see it in stdout (the
terminal window) and will see a few warnings about failing to set coordinates
for LEU:2 when calling `read_coords`. This is because we're trying to read in
coordinates for a residue that doesn't exist, as we have an alanine at that
location, and the warning can therefore be safely ignored.


Adding capping groups
---------------------

Let's say you have a protein without explicitly represented capping groups,
called `protein_nocaps.pdb` (again, this file can be found in the test
directory).

This example shows how the `add_segment` function can be used to
set the patches applied to the N- and C-termini. We'll apply the neutral NTER
patch to the N-terminus, and have a positively charged GLUP patch on the
C-terminus. We'll use the `residues` argument to add an ALA and a GLU to the
end of the protein chain (after resid 24) as well, as there isn't a glutamate
at the end of the input protein.

.. code-block:: python

    from psfgen import PsfGen
    gen = PsfGen()
    gen.read_topology("top_all36_prot.rtf")

    # Read protein, set the patches for the ends, and add ALA-GLU to the end
    gen.add_segment(segid="P", pdbfile="protein_nocaps.pdb",
                    first="NTER", last="GLUP",
                    residues=[("25","ALA"), ("26","GLU")])

    # Read in the coordinates we have and guess the remainder
    gen.read_coords(segid="P", filename="protein_nocaps.pdb")
    gen.guess_coords()

    # Save the result
    gen.write_psf(filename="patch_ends.psf")
    gen.write_pdb(filename="patch_ends.pdb")

You'll see in the output a lot of warnings about poorly guessed coordinates, but
this is a contrived example so this is okay.


Working with velocities
-----------------------

Psfgen can actually read and write NAMD binary files, including velocities.
Let's pretend we have a pre-equilibrated lipid membrane, and a simulation of
protein in solution. We'll take the protein and lipid and combine them while
preserving velocities from simulation, using vmd-python to figure out which
water molecules to delete to insert the membrane.

This is an extremely contrived example designed to show off how to use
:meth:`psfgen.PsfGen.read_psf` and :meth:`psfgen.PsfGen.write_namdbin`, and
isn't an actually recommended way to set up simulations. We'll pretend that
the coordinates of everything are aligned so the files can be combined, too,
and that the lipid will be exactly from z=5 to z=15.

.. code-block:: python

    from psfgen import PsfGen
    from vmd-python import atomsel, molecule

    gen = PsfGen()

    # Psfgen will attempt to load topologies listed in the PSF file, but to be
    # safe and to avoid problems with absolute vs. relative paths we explicitly
    # load them here.
    gen.read_topology("top_all36_prot.rtf")
    gen.read_topology("top_all3_lipid.rtf")

    # Read the protein segment, including coordinates and velocities.
    gen.read_psf(filename="protein_equil.psf",
                 namdbinfile="protein_equil.bin",
                 velnamdbinfile="protein_equil_vel.bin")

    # Get the segment name that was read in, will use for deleting later
    pseg = gen.get_segids()[0]

    # Now the lipid segment
    gen.read_psf(filename="popc_equil.psf",
                 namdbinfile="popc_equil.bin",
                 velnamdbinfile="popc_equil_vel.bin")

    # Load files in vmd-python to figure out which waters to delete
    pid = molecule.load("psf", "protein_equil.psf",
                        "namdbin", "protein_equil.bin")
    to_delete = set(atomsel("water and z > 5 and z < 15").get("resid"))
    delete(pid)

    # Delete overlapping waters in the water segment
    for resid in to_delete:
        gen.delete_atoms(segid=pseg, resid=resid)

    # Write the output files, including velocities
    gen.write_psf(filename="combined.psf")
    gen.write_namdbin(filename="combined.bin",
                      velocity_filename="combined_vel.bin")