Exercise 4: Side

Workshop in Computational Structural Biology (81855), Spring 2014
Exercise 4: Side-Chain Modeling
Contents
Side-chain modeling
1
Rosetta fixbb
1
Redesigning core or surface residues
3
SCWRL
3
PyRosetta
4
Exploring protein geometry
5
Manipulating protein geometry
6
Visualization and the PyMOL Mover
6
Scripting
6
Programming Exercises
7
Side-chain modeling
In this section you will predict the rotamers in the structure of Ubiquitin from a backbone-only model using
Rosetta and SCWRL, two of the leading tools for side- chain modeling.
Rosetta fixbb
This application allows you to remodel a protein's side-chains (rotamers, as well as identities), while
keeping the backbone fixed. For this purpose it requires a resfile. Resfiles are used in Rosetta to inform
the protocol you are using about specific modeling tasks you require - You can manipulate the amino
acids of a structure to be changed completely (i.e. mutated), remodeled to a better rotamer, or left
untouched.
1
Workshop in Computational Structural Biology (81855), Spring 2014
Rosetta Resfiles
Resfiles are text files that contain a header for default general behavior of any residue that does
not have specific instructions, and later, the file body, with residue specific commands. Common
resfile commands are:
• NATRO - freeze this side-chain; keep both aa type and rotamer fixed.
• NATAA - keep the amino acid type, allow change of rotamer.
• ALLAA - allow change to all possible amino acids.
• PIKAA - mutate to one of a specific list of aa types.
• NOTAA - excludes mutations to specific aa types.
We can add additional rotamers to the standard rotamer library used for each residue. The EX 1
and EX 2 statement will add extra χ1 and χ2 rotamers respectively for buried residues only, and
one can also control the amount of rotamers added or the number of neighbors a residue must
have in order to be considered "buried" (and by that the number of residues that can sample extra
rotamers). More on the resfile format in the Rosetta manual.
Inspect
the
example
resfile
found
in
the
exercise
folder
($CSBW_HOME/resources/ex4/example_resfile.txt). The header allows global commands, while
after the start, specified residues can be addressed. The NATAA statement will keep all the amino acid
types (unless stated otherwise later). However, for residue 5 in chain A, no search for rotamers will be
performed - the NATRO command will use the native rotamer, rather than search for a (perhaps) better
one.
Create the following resfiles in a text editor (with indicative names):
• Using NATAA on all amino acids
• Using NATAA and EX 1 on all amino acids
• Using NATAA, EX 1 and EX 2 on all amino acids
Next, acquire a PDB of ubiquitin stripped of its side-chains (i.e. backbone only):
$>
prody select "backbone and chain A" 1ubi --output "1ubiA_bb"
Run Rosetta to model back its side chains:
$>
fixbb.linuxgccrelease \
-database $ROSETTA_DB \
-s 1ubiA_bb.pdb \
-resfile [RESFILE] \ # replace with your first resfile
-nstruct 25 \
-scorefile scoreFixedBB.sc \
> fixbb.log
Output: 25 structures, called 1ubiA_bb_0001.pdb to 1ubiA_bb_0025.pdb.
Question 1: Align all 25 structures onto the native structure using prody align, to make sure indeed
the backbone hadn't moved. Inspect the aligned structures in PyMOL. Generally, can you tell where there
are differences between the structures?
2
Workshop in Computational Structural Biology (81855), Spring 2014
Question 2: Look at scoreFixedBB.sc. What is the range of energy scores? What is the energy of the
lowest- scoring structure?
Question 3: Apply the same run as above for the other two resfiles you were asked to create, each in a
different directory. For each resfile, what is the range of energy scores and the energy of the lowest
scoring structure?
Now let's compare the lowest-energy structures to the native PDB
Question 4: Use prody align to measure the side-chain RMSd of the lowest energy decoy (one for
each of the runs you made).
$> prody align -s "sc and noh" 1ubiA.pdb <name_of_structure_to_compare_to.pdb>
What are the RMSd values you get for each of the resfiles?
Question 5: Which resfile best recapitulated the rotamers of the native structure? Does the side-chain
RMSd correlate with the score for these structures? Why/Why not?
Redesigning core or surface residues
Core residues have more structural constraints, compared to surface residues. We assume that the
prediction will be more accurate for them than for surface residues, which are relatively free. As you
already know, one way to inspect the movement of a residue is its B-factor.
Inspect two groups of residues: residues 3, 5, 26, 30, 41, 45 and 66 (low B-factor residues, found at the
core of the structure), and the surface residues 24, 32, 60, 62, 72, 73, 74. Create two resfiles, one for
each group of residues. In each resfile, only the specified residues will be allowed to move, using NATAA,
EX 1, and EX 2. The rest of the residues are NATRO.
Question 6: Run the fixbb application to create 25 structures as before, using as input the resfiles, and
1ubiA.pdb as starting structure. Find the lowest-energy structure. for each group. What is its RMSd to
the native structure? Why?
SCWRL
SCWRL models side-chains of structures by breaking the problem into segments. Side chains are
represented as vertices in an undirected graph, where each two interacting residues are connected by an
edge. The graph can be partitioned into connected subgraphs with no edges between them. These can be
broken into biconnected components, which are graphs that cannot be disconnected by removal of a
single vertex. The combinatorial problem is reduced to finding the minimum energy of these small
biconnected components and combining the results to identify the global minimum energy conformation.
Run SCWRL:
$> scwrl3 \
-i 1ubiA_bb.pdb
\ # input file
-o 1ubi_scwrl.pdb \ # output file
> 1ubi.scwrl.log
Question 7: Find the RMS between the SCWRL-generated sidechains and the native structure. What is
its value?
Question 8: Fit the native structure, the lowest-energy structures of each resfile run, and the
SCWRL-generated structure on top of one another. What are the differences between the structures? Are
these differences located at the core or at the surface? Are they pinpointed to specific residues? Are they
important?
3
Workshop in Computational Structural Biology (81855), Spring 2014
PyRosetta
Experienced scientists with a solid foundation in programming have done great research with the power
and flexibility Rosetta has to offer. However, it is difficult to access these capabilities quickly and get
valuable results. To tackle this problem, the Rosetta community developed some high-level, more
human-friendly tools to lower the barrier of entry to Rosetta, called RosettaScripts (see Fleishman et al.)
and PyRosetta.
RosettaScripts allows writing Rosetta applications using a simple XML hierarchical syntax. We will learn to
use it in a future exercise, but for now we'll focus on PyRosetta.
PyRosetta is a Python-based interface to Rosetta functionality. Python 3 is an easy-to-learn, general
purpose programming language, so PyRosetta allows performance of customized simulations and
experiments, without having to learn the Rosetta code. It also allows working with the framework
interactively (similar to MATLAB®) in an exploratory style of research.
This part was adapted from the PyRosetta tutorials, with a few adjustments to our setting.
Copy the clean 1ubiA.pdb file you worked with in previous sections into your current directory. Open a
terminal and start IPython ($> ipython). To load the PyRosetta library, type
>>> from rosetta import *
>>> init()
The first line loads the Rosetta commands for use in the Python shell, and the second command loads
Rosetta database files. The first line may require a few seconds to load. Note that we must always first
import the Rosetta modules and initialize Rosetta with the init() command before loading a structure.
Load the file as follows:
>>> pose = pose_from_pdb("1ubiA.pdb")
This creates a Pose object that you can now work with using a variety of methods.
Examine the protein using the following query functions:
>>>
>>>
>>>
>>>
print
print
print
print
pose
pose.sequence()
"Protein has", pose.total_residue(), "residues."
pose.residue(28).name()
Question 9: What type of amino acid is residue 28?
Note
Typing pose.residue(28) returns the actual 28th residue in the PDB file, but not necessarily
"residue number 28" according to the deposited information in the PDB file. Many PDB files have
multiple peptide chains (so that in every new chain the PDB numbering goes back to 1).
Sometimes the residue numbering follows a convention from a family of homologous proteins, and
often several residues of the N-terminus do not show up in a crystal structure. In this case of
ubiquitin, the numbers are in sync. However, the antibody structure we studied in exercise 1
(1YY8) does have a numbering offset.
To find out the chain and PDB residue number of residue 500 in the antibody, copy 1yy8 to your
directory and type:
4
Workshop in Computational Structural Biology (81855), Spring 2014
>>> pose2 = pose_from_pdb("1yy8.pdb")
>>> print pose2.pdb_info().chain(500)
>>> print pose2.pdb_info().number(500)
Lookup the Rosetta internal number for residue 7 in chain B:
>>> print pose2.pdb_info().pdb2pose("B", 7)
The converse command is:
>>> print pose2.pdb_info().pose2pdb(220)
Type in print pose.seq and hit the TAB key. IPython should complete the keyword sequence for
you. Type pose. and hit the TAB key after typing the dot. You should see a list of functions available for
Pose objects.
The built-in help features can be used by typing any one of the following:
>>> Pose?
>>> ?Pose
>>> help(Pose)
Each of these will give a brief description of the Pose class and its purpose. The last form will also give a
list of function signatures for all the available functions within the class. These methods of accessing help
should work on many of the PyRosetta objects.
Exploring protein geometry
Find the φ, ψ, and χ1 dihedral angles of residue 32:
>>> print pose.phi(32)
>>> print pose.psi(32)
>>> print pose.chi(1, 32)
Next, we'll find out the N–Cα and Cα–C bond lengths of residue 32.
Access the Cartesian coordinates and use the Vector class to find the norm of the displacement vector
between the two atoms:
>>>
>>>
>>>
>>>
N_xyz = pose.residue(32).xyz("N")
CA_xyz = pose.residue(32).xyz("CA")
N_CA_vector = CA_xyz – N_xyz
print N_CA_vector.norm
These bond lengths are actual, experimental bond lengths from the crystal structure. When Rosetta
creates proteins de novo, it uses ideal values, similar to those from Engh & Huber (1991) 1 . Let’s check
how the actual bond lengths compare to Rosetta’s ideal values.
5
Workshop in Computational Structural Biology (81855), Spring 2014
Within the Rosetta database directory ( $ROSETTA_HOME/rosetta_database), enter the sub-directory
chemical/residue_type_sets/fa_standard/residue_types/l-caa and, with your text editor,
open the parameter file appropriate for residue 32. The ICOOR_INTERNAL lines give the internal
coordinates for an ideal conformation, including the torsion angle, bond angle, and bond length needed to
build each subsequent atom in the group.
Question 10: What are the N–Cα and Cα–C bond lengths?
Manipulating protein geometry
Question 11: What are the XYZ coordinates of N atom of residue 6?
We can also alter the geometry of the protein. Perform the following commands:
>>>
>>>
>>>
>>>
import math
pose.set_phi(5, -60)
pose.set_psi(5, -43)
pose.set_chi(1, 5, 180)
Question 12: What are the new XYZ coordinates of N atom of residue 6?
Note that even though dihedral angles are set in degrees, the bond angle is set in radians!
Visualization and the PyMOL Mover
What if we wish to view the changes to geometry that we have made? We can "dump" the information in a
pose object into a new PDB file with the method pose.dump_pdb("desired_filename.pdb") and
then open this PDB file in our favorite visualization software. However, constantly dumping output and
loading new files into a visualizer can be cumbersome; thus, the visualization process was streamlined
with the PyMOL_Mover, which provides a means for observing structural changes almost instantaneously
in PyMOL as they are made in PyRosetta.
First, we must open PyMOL and run a script that will cause PyMOL to listen to instructions from
PyRosetta:
PyMOL> cd /vol/ek/share/csbw/tools/pyrosetta
PyMOL> run PyMOLPyRosettaServer.py
Then, in the PyRosetta terminal, we must instantiate a PyMOL_Mover and then apply it to a pose any
time we make a change:
>>> pymol = PyMOL_Mover()
>>> pymol.apply(pose)
Make some changes to the dihedral angles of a pose and apply the PyMOL mover to watch the effect of
the new angles on the structure. For more advanced PyMOL mover options, visit this tutorial.
Scripting
You can write programs in Python to accomplish more complicated tasks. You can write an entire program
(e.g.: rama.py) and then run it either from the command line:
$> python rama.py
or from inside an IPython shell:
6
Workshop in Computational Structural Biology (81855), Spring 2014
>>> import rama
Read the following sample program, and make sure you understand what it does.
from rosetta import *
init()
p = pose_from_pdb("1ubiA.pdb")
for i in range(1, p.total_residue() + 1):
print i, "phi =", p.phi(i), "psi =", p.psi(i)
Copy this program into a new file that will be called rama.py, and make sure you can successfully run it
from the command line.
Programming Exercises
For each of these tasks, submit your script file and your output.
Question 13: Ideal helix: write a program to create a 20-residue ideal helix by setting the φ and ψ angles
to the typical values for an α-helix. To start, use pose = pose_from_sequence('AAA') to create a
new pose, except use 20 As in the argument to create a 20-residue poly-alanine. Output your structure
using pose.dump_pdb('helix.pdb'). Explain how you obtained torsion angle values for the helix.
Question 14: View your new file in PyMOL to check your work. How can you be sure your structure is a
proper α-helix? List distinct structural characteristics that you can check or different methods to verify your
structure is helical.
Question 15: (Bonus) write a PyRosetta version of MyMinimize.cc.
1
2
3
R. A. Engh & R. Huber, Accurate bond and angle parameters for X-ray protein
structure refinement, Acta. Cryst. A 47, 392–400 (1991).
J. Parsons et al., Practical conversion from torsion space to Cartesian space for in
silico protein synthesis, J. Comp. Chem. 26, 1063–1068 (2005).
Python help available at http://docs.python.org.
7