Workshop in Computational Structural Biology (81855), Spring 2014 Exercise 4: Side-Chain Modeling Contents Side-chain modeling 1 Rosetta fixbb 1 Redesigning core or surface residues 3 SCWRL 3 PyRosetta 4 Exploring protein geometry 5 Manipulating protein geometry 6 Visualization and the PyMOL Mover 6 Scripting 6 Programming Exercises 7 Side-chain modeling In this section you will predict the rotamers in the structure of Ubiquitin from a backbone-only model using Rosetta and SCWRL, two of the leading tools for side- chain modeling. Rosetta fixbb This application allows you to remodel a protein's side-chains (rotamers, as well as identities), while keeping the backbone fixed. For this purpose it requires a resfile. Resfiles are used in Rosetta to inform the protocol you are using about specific modeling tasks you require - You can manipulate the amino acids of a structure to be changed completely (i.e. mutated), remodeled to a better rotamer, or left untouched. 1 Workshop in Computational Structural Biology (81855), Spring 2014 Rosetta Resfiles Resfiles are text files that contain a header for default general behavior of any residue that does not have specific instructions, and later, the file body, with residue specific commands. Common resfile commands are: • NATRO - freeze this side-chain; keep both aa type and rotamer fixed. • NATAA - keep the amino acid type, allow change of rotamer. • ALLAA - allow change to all possible amino acids. • PIKAA - mutate to one of a specific list of aa types. • NOTAA - excludes mutations to specific aa types. We can add additional rotamers to the standard rotamer library used for each residue. The EX 1 and EX 2 statement will add extra χ1 and χ2 rotamers respectively for buried residues only, and one can also control the amount of rotamers added or the number of neighbors a residue must have in order to be considered "buried" (and by that the number of residues that can sample extra rotamers). More on the resfile format in the Rosetta manual. Inspect the example resfile found in the exercise folder ($CSBW_HOME/resources/ex4/example_resfile.txt). The header allows global commands, while after the start, specified residues can be addressed. The NATAA statement will keep all the amino acid types (unless stated otherwise later). However, for residue 5 in chain A, no search for rotamers will be performed - the NATRO command will use the native rotamer, rather than search for a (perhaps) better one. Create the following resfiles in a text editor (with indicative names): • Using NATAA on all amino acids • Using NATAA and EX 1 on all amino acids • Using NATAA, EX 1 and EX 2 on all amino acids Next, acquire a PDB of ubiquitin stripped of its side-chains (i.e. backbone only): $> prody select "backbone and chain A" 1ubi --output "1ubiA_bb" Run Rosetta to model back its side chains: $> fixbb.linuxgccrelease \ -database $ROSETTA_DB \ -s 1ubiA_bb.pdb \ -resfile [RESFILE] \ # replace with your first resfile -nstruct 25 \ -scorefile scoreFixedBB.sc \ > fixbb.log Output: 25 structures, called 1ubiA_bb_0001.pdb to 1ubiA_bb_0025.pdb. Question 1: Align all 25 structures onto the native structure using prody align, to make sure indeed the backbone hadn't moved. Inspect the aligned structures in PyMOL. Generally, can you tell where there are differences between the structures? 2 Workshop in Computational Structural Biology (81855), Spring 2014 Question 2: Look at scoreFixedBB.sc. What is the range of energy scores? What is the energy of the lowest- scoring structure? Question 3: Apply the same run as above for the other two resfiles you were asked to create, each in a different directory. For each resfile, what is the range of energy scores and the energy of the lowest scoring structure? Now let's compare the lowest-energy structures to the native PDB Question 4: Use prody align to measure the side-chain RMSd of the lowest energy decoy (one for each of the runs you made). $> prody align -s "sc and noh" 1ubiA.pdb <name_of_structure_to_compare_to.pdb> What are the RMSd values you get for each of the resfiles? Question 5: Which resfile best recapitulated the rotamers of the native structure? Does the side-chain RMSd correlate with the score for these structures? Why/Why not? Redesigning core or surface residues Core residues have more structural constraints, compared to surface residues. We assume that the prediction will be more accurate for them than for surface residues, which are relatively free. As you already know, one way to inspect the movement of a residue is its B-factor. Inspect two groups of residues: residues 3, 5, 26, 30, 41, 45 and 66 (low B-factor residues, found at the core of the structure), and the surface residues 24, 32, 60, 62, 72, 73, 74. Create two resfiles, one for each group of residues. In each resfile, only the specified residues will be allowed to move, using NATAA, EX 1, and EX 2. The rest of the residues are NATRO. Question 6: Run the fixbb application to create 25 structures as before, using as input the resfiles, and 1ubiA.pdb as starting structure. Find the lowest-energy structure. for each group. What is its RMSd to the native structure? Why? SCWRL SCWRL models side-chains of structures by breaking the problem into segments. Side chains are represented as vertices in an undirected graph, where each two interacting residues are connected by an edge. The graph can be partitioned into connected subgraphs with no edges between them. These can be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. Run SCWRL: $> scwrl3 \ -i 1ubiA_bb.pdb \ # input file -o 1ubi_scwrl.pdb \ # output file > 1ubi.scwrl.log Question 7: Find the RMS between the SCWRL-generated sidechains and the native structure. What is its value? Question 8: Fit the native structure, the lowest-energy structures of each resfile run, and the SCWRL-generated structure on top of one another. What are the differences between the structures? Are these differences located at the core or at the surface? Are they pinpointed to specific residues? Are they important? 3 Workshop in Computational Structural Biology (81855), Spring 2014 PyRosetta Experienced scientists with a solid foundation in programming have done great research with the power and flexibility Rosetta has to offer. However, it is difficult to access these capabilities quickly and get valuable results. To tackle this problem, the Rosetta community developed some high-level, more human-friendly tools to lower the barrier of entry to Rosetta, called RosettaScripts (see Fleishman et al.) and PyRosetta. RosettaScripts allows writing Rosetta applications using a simple XML hierarchical syntax. We will learn to use it in a future exercise, but for now we'll focus on PyRosetta. PyRosetta is a Python-based interface to Rosetta functionality. Python 3 is an easy-to-learn, general purpose programming language, so PyRosetta allows performance of customized simulations and experiments, without having to learn the Rosetta code. It also allows working with the framework interactively (similar to MATLAB®) in an exploratory style of research. This part was adapted from the PyRosetta tutorials, with a few adjustments to our setting. Copy the clean 1ubiA.pdb file you worked with in previous sections into your current directory. Open a terminal and start IPython ($> ipython). To load the PyRosetta library, type >>> from rosetta import * >>> init() The first line loads the Rosetta commands for use in the Python shell, and the second command loads Rosetta database files. The first line may require a few seconds to load. Note that we must always first import the Rosetta modules and initialize Rosetta with the init() command before loading a structure. Load the file as follows: >>> pose = pose_from_pdb("1ubiA.pdb") This creates a Pose object that you can now work with using a variety of methods. Examine the protein using the following query functions: >>> >>> >>> >>> print print print print pose pose.sequence() "Protein has", pose.total_residue(), "residues." pose.residue(28).name() Question 9: What type of amino acid is residue 28? Note Typing pose.residue(28) returns the actual 28th residue in the PDB file, but not necessarily "residue number 28" according to the deposited information in the PDB file. Many PDB files have multiple peptide chains (so that in every new chain the PDB numbering goes back to 1). Sometimes the residue numbering follows a convention from a family of homologous proteins, and often several residues of the N-terminus do not show up in a crystal structure. In this case of ubiquitin, the numbers are in sync. However, the antibody structure we studied in exercise 1 (1YY8) does have a numbering offset. To find out the chain and PDB residue number of residue 500 in the antibody, copy 1yy8 to your directory and type: 4 Workshop in Computational Structural Biology (81855), Spring 2014 >>> pose2 = pose_from_pdb("1yy8.pdb") >>> print pose2.pdb_info().chain(500) >>> print pose2.pdb_info().number(500) Lookup the Rosetta internal number for residue 7 in chain B: >>> print pose2.pdb_info().pdb2pose("B", 7) The converse command is: >>> print pose2.pdb_info().pose2pdb(220) Type in print pose.seq and hit the TAB key. IPython should complete the keyword sequence for you. Type pose. and hit the TAB key after typing the dot. You should see a list of functions available for Pose objects. The built-in help features can be used by typing any one of the following: >>> Pose? >>> ?Pose >>> help(Pose) Each of these will give a brief description of the Pose class and its purpose. The last form will also give a list of function signatures for all the available functions within the class. These methods of accessing help should work on many of the PyRosetta objects. Exploring protein geometry Find the φ, ψ, and χ1 dihedral angles of residue 32: >>> print pose.phi(32) >>> print pose.psi(32) >>> print pose.chi(1, 32) Next, we'll find out the N–Cα and Cα–C bond lengths of residue 32. Access the Cartesian coordinates and use the Vector class to find the norm of the displacement vector between the two atoms: >>> >>> >>> >>> N_xyz = pose.residue(32).xyz("N") CA_xyz = pose.residue(32).xyz("CA") N_CA_vector = CA_xyz – N_xyz print N_CA_vector.norm These bond lengths are actual, experimental bond lengths from the crystal structure. When Rosetta creates proteins de novo, it uses ideal values, similar to those from Engh & Huber (1991) 1 . Let’s check how the actual bond lengths compare to Rosetta’s ideal values. 5 Workshop in Computational Structural Biology (81855), Spring 2014 Within the Rosetta database directory ( $ROSETTA_HOME/rosetta_database), enter the sub-directory chemical/residue_type_sets/fa_standard/residue_types/l-caa and, with your text editor, open the parameter file appropriate for residue 32. The ICOOR_INTERNAL lines give the internal coordinates for an ideal conformation, including the torsion angle, bond angle, and bond length needed to build each subsequent atom in the group. Question 10: What are the N–Cα and Cα–C bond lengths? Manipulating protein geometry Question 11: What are the XYZ coordinates of N atom of residue 6? We can also alter the geometry of the protein. Perform the following commands: >>> >>> >>> >>> import math pose.set_phi(5, -60) pose.set_psi(5, -43) pose.set_chi(1, 5, 180) Question 12: What are the new XYZ coordinates of N atom of residue 6? Note that even though dihedral angles are set in degrees, the bond angle is set in radians! Visualization and the PyMOL Mover What if we wish to view the changes to geometry that we have made? We can "dump" the information in a pose object into a new PDB file with the method pose.dump_pdb("desired_filename.pdb") and then open this PDB file in our favorite visualization software. However, constantly dumping output and loading new files into a visualizer can be cumbersome; thus, the visualization process was streamlined with the PyMOL_Mover, which provides a means for observing structural changes almost instantaneously in PyMOL as they are made in PyRosetta. First, we must open PyMOL and run a script that will cause PyMOL to listen to instructions from PyRosetta: PyMOL> cd /vol/ek/share/csbw/tools/pyrosetta PyMOL> run PyMOLPyRosettaServer.py Then, in the PyRosetta terminal, we must instantiate a PyMOL_Mover and then apply it to a pose any time we make a change: >>> pymol = PyMOL_Mover() >>> pymol.apply(pose) Make some changes to the dihedral angles of a pose and apply the PyMOL mover to watch the effect of the new angles on the structure. For more advanced PyMOL mover options, visit this tutorial. Scripting You can write programs in Python to accomplish more complicated tasks. You can write an entire program (e.g.: rama.py) and then run it either from the command line: $> python rama.py or from inside an IPython shell: 6 Workshop in Computational Structural Biology (81855), Spring 2014 >>> import rama Read the following sample program, and make sure you understand what it does. from rosetta import * init() p = pose_from_pdb("1ubiA.pdb") for i in range(1, p.total_residue() + 1): print i, "phi =", p.phi(i), "psi =", p.psi(i) Copy this program into a new file that will be called rama.py, and make sure you can successfully run it from the command line. Programming Exercises For each of these tasks, submit your script file and your output. Question 13: Ideal helix: write a program to create a 20-residue ideal helix by setting the φ and ψ angles to the typical values for an α-helix. To start, use pose = pose_from_sequence('AAA') to create a new pose, except use 20 As in the argument to create a 20-residue poly-alanine. Output your structure using pose.dump_pdb('helix.pdb'). Explain how you obtained torsion angle values for the helix. Question 14: View your new file in PyMOL to check your work. How can you be sure your structure is a proper α-helix? List distinct structural characteristics that you can check or different methods to verify your structure is helical. Question 15: (Bonus) write a PyRosetta version of MyMinimize.cc. 1 2 3 R. A. Engh & R. Huber, Accurate bond and angle parameters for X-ray protein structure refinement, Acta. Cryst. A 47, 392–400 (1991). J. Parsons et al., Practical conversion from torsion space to Cartesian space for in silico protein synthesis, J. Comp. Chem. 26, 1063–1068 (2005). Python help available at http://docs.python.org. 7
© Copyright 2024 ExpyDoc