Proteins - Part I: Basics and Martinize 2
In case of issues, please contact thallmair@fias.uni-frankfurt.de or paulo.telles_de_souza@ens-lyon.fr.
Updated by c.s.brasnett@rug.nl, luis.borges@ens-lyon.fr (October 2024). Find the original version elsewhere.Summary
Introduction
Accordingly with Martini 3 rules [1], the Martini Protein model groups 2-4 non-hydrogen atoms together in one coarse-grained bead. Each residue has one backbone bead and 0-5 side-chain beads depending on the side chain of the residue. In contrast to the previous version [2-4], the secondary structure of the protein influences only the bond/angle/dihedral parameters of each residue. The backbone bead size and type depends solely of the mapping and geometry (linear or branched) of the fragment, with P2 used as default bead chemical type. Residues such as glycine, alanine, valine and proline used different bead types (with only slight differences in polarity and size) given the differences in the mapping, which were chosen to avoid overmapping problems [5]. It is noteworthy that the local secondary structure is predefined by the bonded parameters and thus imposed throughout a simulation. While simple peptides can be easily modeled with this approach, proteins usually need an extra bias to keep a stable tertiary structure, which can be provided by elastic network [6] or Gō-like models [7]. Additional improvements of the protein stability and reliability are achieved by applying the side chain corrections option to add dihedral angles which improve the side chain orientation [8].
Setting up a coarse-grained protein simulation consists basically of three steps:
- converting an atomistic protein structure into a coarse-grained structure;
- generating a suitable Martini topology;
- solvating the protein in the desired environment.
The first two steps are done using the publicly available program martinize2
. The last step can be done with the tools available in the GROMACS
package and/or with ad hoc scripts. In this part of the tutorial, basic knowledge of GROMACS
commands is assumed and not all commands will be given explicitly. Please refer to the previous tutorials and/or the GROMACS
manual.
The aim of the first module of this tutorial is to define the regular workflow and protocols to set up a coarse-grained simulation of the soluble globular protein T4-lysozyme. Three strategies are presented and compared: i- no bias; ii- elastic networks; and iii- GōMartini. In a second module of this Proteins tutorial we will cover how to insert a transmembrane peptide (KALP) in a lipid bilayer using a standard Martini 3 description.
The files required for this tutorial (including worked files) can be downloaded from here.
Soluble proteins
The downloaded file is called M3_proteins_tutorial_part1.zip
and contains a worked version (using GROMACS
2020) of this module. You do not need all folders of this file to get going on this tutorial. Only the folder template
is important. In addition, you can use the worked versions to check your own work. For this part of the tutorial, the worked files are in the martini3_worked
folder. Create your own directory to work the tutorial yourself. Instructions are given for files you need to download or get from the template folder, etc. You do not need any files to start with. Now go to your own directory.
After getting the atomistic structure of L99A T4 lysozyme (pdb code 181L) [9], you’ll need to convert it into a coarse-grained structure and to prepare a Martini 3 topology for it. Once this is done the coarse-grained structure can be minimized, solvated and simulated. The steps you need to take are the following:
- Download 181L.pdb from the Protein Data Bank. It is also available in the template folder:
wget http://www.rcsb.org/pdb/files/181L.pdb
The first step before start working with
martinize2
is to clean/fix your atomistic pdb file. You should remove water and other molecules usually used for crystallization. In case there are missing residues or loops you should complete the protein structure using your favorite modeling code (alphafold2
,modeller
, etc). Note that cofactors that can be present and might be important for the protein structure and function are not covered by this tutorial. In the case of T4 lysozyme, you only need to remove the waters and a ligand. This can simply be done using the command:grep "^ATOM" 181L.pdb > 181L_clean.pdb
The clean pdb structure can be used as input for
martinize2
(installation instructions available here), to generate both a coarse-grained structure and a Martini 3 topology file. Note thatmartinize2
might not work with older versions of python! We know it does work with versions >3.9. Have a look at the help function (i.e. runmartinize2 -h
) for the available options. Hint, valid for any system simulated with Martini: during equilibration it might be useful to have (backbone) position restraints to relax the side chains and their interaction with the solvent; we are anticipating doing this by askingmartinize2
to generate the list of atoms involved. A simple input command to martinize2 might look a bit like this:martinize2 -f 181L_clean.pdb -o t4l_only.top -x t4l_cg.pdb -p backbone -ff martini3001
We take the input file
-f
, and specify two output ones, a topology file-o
, and a coordinate file of the input structure mapped to the martini resolution-x
. The mapped structure will be to version 3 of the Martini force field via the flag-ff
.To introduce some more structure-based knowledge, we should also specify a secondary structure. Martinize2 has a couple of options for this:
-dssp
will try to use an installation of mdtraj in the python environment that martinize2 is installed in.-dssp /path/to/dssp
will look for a dssp executable in the given path, and use the output from there if it finds it. When using the-dssp
option you’ll need thedssp
binary, which determines the secondary structure classification of the protein backbone from the structure. It can be downloaded from the https://github.com/cmbi/dssp/releases. Note thatmartinize2
only works with thedssp
versions 3.1.4 or earlier, because the output changed indssp
version 4. If installed centrally, the path is usually /usr/bin/dssp.-ss etcetc
will use theetcetc
dssp formatted string (i.e. a string containing letters used by dssp to indicate different secondary structure elements) that the user provides to specify how the secondary structure is treated, which must contain the same number of characters as the residues you have in the protein. The meaning of the abbreviations is: H: α-helix, B: residue in isolated β-bridge, E: extended strand, participates in β ladder, G: 3-helix (3-10 helix), I: 5 helix (π-helix), T: hydrogen bonded turn, S: bend, C: loop or irregular. In the case of the T4 lysozyme, the string required for the-ss
argument is the following:CCHHHHHHHHHCCEEEEEECTTSCEEEETTEEEESSSCHHHHHHHHHHHHTSCCTTBCCHHHHHHHHHHHHHHHHHHHHHCTTTHHHHHHSCHHHHHHHHHHHHHHHHHHHHTCHHHHHHHHTTCHHHHHHHHHSSHHHHHSHHHHHHHHHHHHHSSSGGGC
If everything went well, the script generated three files: a coarse-grained structure (
.pdb
; Fig. 1), a master topology file (.top
), and a protein topology file (.itp
). In order to run a simulation you need two more files: the Martini topology file (martini_v3.0.0.itp
) and a run parameter file (.mdp
). You can get examples from the Martini website or from thetemplate
folder. Don’t forget to adapt the settings where needed!
Do a short minimization in vacuum (ca. 10-100 steps is enough!). Before you can generate the input files with
grompp
, you will need to check that the topology file (.top
) includes the correct Martini parameter file (.itp
). If this is not the case, change theinclude
statement. Also, you may have to generate a box, specifying the dimensions of the system, for example usinggmx editconf
. You want to make sure, the box dimensions are large enough to avoid close contacts between periodic images of the protein, but also to be considerably larger than twice the cut-off distance used in simulations. Try allowing for a minimum distance of 1 nm from the protein to any box edge. Then, copy the example parameter file, and change the relevant settings to do a minimization run. Now you are ready to do the preprocessing and minimization run:gmx editconf -f t4l_cg.pdb -d 1.0 -bt dodecahedron -o t4l_cg.gro
gmx grompp -p t4l_only.top -f minimization.mdp -c t4l_cg.gro -o minimization-vac.tpr -r t41_cg.gro
gmx mdrun -deffnm minimization-vac -v
Solvate the system with
gmx solvate
(an equilibrated water box can be downloaded here; it is calledwater.gro
. You can also find the file in the template folder. Make sure the box size is large enough (i.e. there is enough water around the molecule to avoid periodic boundary artifacts) and remember to use a larger van der Waals distance when solvating to avoid clashes, e.g.:gmx solvate -cp minimization-vac.gro -cs water.gro -radius 0.21 -o solvated.gro
As the protein is charged +8, it is recommended to add at least neutralizing counter ions; on top of that, you can add ions to reflect some ionic strength of the solution. Using a concentration of 0.15 M of NaCl is a popular choice given that is a good representation of the physiologic conditions. NOTE that you’ll need to get hold of a Martini topology file that specifies the ion topologies. Be careful in case you use strategies based on
gmx genion
. As each Martini water bead represents 4 water molecules, the concentration specified needs to be adapted to properly represent the target salt concentration.You need to update the topology to reflect the added water. Here, this is done by copying the topology for the single T4 lysozyme to
system.top
(cp t4l_only.top system.top
) and editing that to add a line with the number of water beads. The total number of W beads added bygmx solvate
can be seen in the terminal output of the command; alternatively trygrep -c W system-solvated.gro
(where W is the name of a water bead). You will then do a short energy minimization and position-restrained (NPT) equilibration of the solvated system. Since themartinize.py
script already generated position restraints (thanks to the-p
flag), all you have to do is specify“define = -DPOSRES”
in your parameter file (.mdp
).gmx grompp -p system.top -c solvated.gro -f minimization.mdp -o minimization.tpr -r solvated.gro
gmx mdrun -deffnm minimization -v
gmx grompp -p system.top -c minimization.gro -f equilibration.mdp -o equilibration.tpr -r solvated.gro
gmx mdrun -deffnm equilibration -v
Start production run (without position restraints!); if your simulation crashes, some more equilibration steps might be needed. NOTE that you will get a warning about simultaneous use of Parrinello-Rahman barostat and newly generated velocities. This can be ignored by setting the
-maxwarn 1
option.gmx grompp -p system.top -c equilibration.gro -f dynamic.mdp -o dynamic.tpr -maxwarn 1
gmx mdrun -deffnm dynamic -v
PROFIT! What sort of analysis can be done on this molecule? Start by having a look at the protein with
VMD
(use the scriptcg_bonds-v5.tcl
to show the bonds defined in the T4 lysozyme topology. More extensive instructions about viewing Martini simuations can be found in the package MartiniGlass) orpymol
. It is often convenient to convert the trajectory so that translation and rotation of the protein is removed. (NOTE, however, that the water is also rotated and this may create some unwanted effects when viewing. It is up to you, really. One thing that does make sense is to make sure that beads that belong to the same molecule, e.g. the protein, are not split across periodic boundary conditions, see the section on elastic network below for explicit instructions how to do this withgromacs
tools.) Be aware that the.gro
file given toVMD
must contain the same (number of) atoms as the.xtc
file. Therefore, if you choose to write only the protein to the.xtc
file, also prepare a.gro
file with only the protein (minimization-vac.gro
will do).gmx trjconv -f dynamic.xtc -s dynamic.tpr -fit rot+trans -o viz.xtc
Select Protein first and then System. Then, you can open the new
viz.xtc
withVMD
.vmd equilibration.gro viz.xtc
An option to display the bonds without the script
cg_bonds-v5.tcl
is to transform a.gro
file to a.pdb
file, and removed the line where is writtenENDMDL
in this.pdb
file. This option should allow you to visualize the bonds defined by youritps
in thetpr
file.gmx trjconv -f equilibration.gro -s dynamic.tpr -pbc whole -o equilibration.pdb -conect
Select System in this step. Then:
sed "/ENDMDL/d" -i equilibration.pdb
vmd equilibration.pdb viz.xtc
To take a look at your simulation with
pymol
, the trajectory must also be converted to the.pdb
format. This gives the opportunity to add bond information, which helps in viewing the molecules. Be aware that this file may be very large: you may want to reduce the number of frames. You can also reduce the size by only writing the protein coordinates (but you may want to visualize the solvent or ions as well, so it is up to you).gmx trjconv -f dynamic.xtc -s dynamic.tpr -fit rot+trans -o viz.pdb -conect
pymol viz.pdb
Standard analyses for proteins include RMSD, RMSF, and radius of gyration. Refer to tutorials for atomistic models, for example by Tsjerk Wassenaar or Justin Lemkul. For the impatient ones, or if you just need a quick reminder: RMSD and RMSF can be calculated using the
gromacs
toolsgmx rms
andgmx rmsf
, respectively.
Martini + elastic network
The aim of this second module is to see how the application of elastic networks can be combined with the Martini model to conserve the tertiary and quaternary structures more faithfully without sacrificing the structure of a protein. Please be advised that this is an active field of research and that there is as of yet no “gold standard”.
The idea is to generate a simple elastic network on the basis of a standard Martini 3 topology. The approach can be set up using martinize2
and will be shortly described below.
We recommend simulating a pure Martini coarse-grained protein (without elastic network) from the previous step, and then seeing which changes are observed when using an elastic network for the same protein. Note that you’ll need to simulate the protein for tens to hundreds of nanoseconds to see major changes in the structure; sample simulations are provided in the archive.
The first option to help preserving higher-order structure of proteins is to add to the standard Martini topology extra harmonic bonds between non-bonded beads based on a distance cut-off. Note that in standard Martini, long harmonic bonds can be already used to impose the secondary structure of extended elements (sheets) of the protein. Dihedrals could be used as well, but they tend to bring instabilities in this case – see the paper by Bulacu et al 2013 [10] for more information. Martinize2
will generate harmonic bonds between backbone beads if the option -elastic
is set. It is possible to tune the elastic bonds (e.g.: make the force constant distance dependent, change upper and lower distance cut-off, etc.) in order to make the protein behave properly. The only way to find the proper parameters is to try different options and compare the behavior of your protein to an atomistic simulation or experimental data (NMR, etc.). Be aware that elastic networks are based on your reference atomistic structure.
Here we will use basic parameters in order to show the principle.
Use martinize2 to generate the coarse-grained structure and topology as above. For the elastic network, use the following extra flags:
martinize2 -f 181L_clean.pdb -o t4l_only.top -x t4l_cg.pdb -dssp /path/to/dssp -p backbone -ff martini3001 -elastic -ef 700.0 -el 0.5 -eu 0.9 -ea 0 -ep 0
This turns on the elastic network (
-elastic
), sets the elastic bond force constant to 700 \(kJ*mol^{-1}*nm^{-2}\) (-ef 700
), the lower and upper elastic bond cut-off to 0.5 and 0.9 nm, respectively (-el 0.5
and-eu 0.9
), and makes the bond strengths independent of the bond length (elastic bond decay factor and decay power,-ea 0
and-ep 0
, respectively; these are default). Differently from the Martini 2 version, in Martini 3 the elastic network does not need to be defined as a#ifdef
statement in the.top
file. Note thatmartinize2
does not generate elastic bonds between \(i \rightarrow i+1\) and \(i \rightarrow i+2\) backbone beads, as those are already connected by bonds and angles (Fig. 2B). Also, elastic bonds are only added between beads that belong to the same chain.Proceed as before (steps 4 to 7) and start a production run. Keep in mind, we are adding a supplementary level of constraints on the protein; a supplementary relaxation step might be required (equilibration with position restraints and smaller time step for instance). If you have a set-up for the protein without elastic bonds (including solvent and possibly ions), you may use a snapshot from that simulation as the starting point here.
Martini + Gō-like model
In the third submodule of the protein tutorial, you will learn to apply Gō-like models in combination with the Martini model to conserve the tertiary and quaternary structures more faithfully. A combination of Gō-like models with Martini was presented by Poma et al. 2017 [7]. Backbone beads are connected based on a contact map and a Lennard-Jones potential describes the interaction between the connected backbone beads. Here, we give an introduction how to set up a protein with Martini 3 and a modified version of the Gō-like models. You can find more details about the new implementation in the preprint by Souza et al..
In brief, the modified version encompasses the following changes compared to the original GōMartini model: First, the Gō-like bonds are applied to virtual sites which are located at the same position as the backbone bead in the Martini protein model. They are implemented as regular non-bonded interactions which allows to treat them with the non-bonded cutoff scheme. Second, the regular non-bonded interaction between two backbone beads which are connected by a Gō-like bond is excluded. Third, a lower and upper limit for the distance between two backbone beads connected with a Gō-like bond is set.
The general work flow is the following: the Gō-like bonds are included as non-bonded interactions in special beads – called Gō beads from hereon – which will be defined in the martini_v3.0.0.itp
. For this, the martini_v3.0.0.itp
has to contain two #include
statements which are only considered if you define the variable GO_VIRT
in your topology file. This procedure is similar to the POSRES
variable in a typical Martini protein .itp
file. Two generic file names are used in the #include
statements (go_atomtypes.itp
and go_nbparams.itp
) which themselves include the specific itps for the actual protein. The coarse-grained (CG) protein structure generated by martinize2
contains the additional Gō beads; the same holds for the t4l_only.itp
(which will be the protein .itp
file generated by martinize2
in this case). The t4l_only.itp
contains also the virtual site definition of the Gō beads and an #include
statement for the exclusions between the (real) backbone beads which are connected by Gō-like bonds.
To generate the GōMartini model of T4 lysozyme, you will have to first obtain a contact map and then generate a Martini 3 protein model using martinize2
. The procedure is briefly described in the following:
To set up a GōMartini model, the contact map of the atomistic structure must first be obtained from the web server using the default settings. Alternatively, a locally executable version of the contact map generator can be found elsewhere. Please note that the requirements for the
.pdb
file uploaded to the web-server are quite strict. Thus please check carefully if your contact map is meaningful before using the contact map in the next step. In particular, the table listing the “Residue residue contacts” will be used.Once the contact map has been obtained, the topology files can be generated using martinize2 using the
-go
flag for the contact map.martinize2 -f 181L_clean.pdb -o t4l_only.top -x t4l_cg.pdb -dssp /path/to/dssp -p backbone -ff martini3001 -go contact_map.out
The above command will generate the necessary topology files for performing GōMartini simulations with this protein. As described in the in the preprint by Souza et al., there are also additional options related to the Gō model for adapting non-bonded interactions dependent on secondary structure, which may be used to optimise the dynamics of your protein. These are described further both in the preprint and in the documentation for martinize2 itself.
As mentioned above, the topology for your protein,
t4l_only.itp
now includes extra virtual sites built directly on top of backbone beads, which are used in two further files. Firstly,go_atomtypes.itp
contains an[ atomtypes ]
directive defining the virtual sites. Secondly,go_nbparams.itp
contains a[ nonbond_params ]
directive defining the Gō bonds between the virtual sites. In order to perform a simulation making use of these extra non-bonded potentials, we have to add the#include
statements to themartini_v3.0.0.itp
file. NOTE that you should do this ONLY ONCE.sed -i "s/\[ nonbond_params \]/\#ifdef GO_VIRT\n\#include \"go_atomtypes.itp\"\n\#endif\n\n\[ nonbond_params \]/" martini_v3.0.0.itp
echo -e "\n#ifdef GO_VIRT \n#include \"go_nbparams.itp\"\n#endif" >> martini_v3.0.0.itp
If you want to use the Martini 3 force field without any GōMartini model in the future, you just do not define the variable
GO_VIRT
in your topology file and the#include
statements will be skipped. But for this part of the tutorial it is important to have the line:#define GO_VIRT
in the first line of your topology file (I.E. t4l_only.top)! Please check if
martinize2
did take care of this in your master topology file (-top
). Otherwise, the Gō beads and the Gō-like bonds will not be defined and you will receive an error message fromgmx grompp
.Proceed as in the first part of the tutorial (steps 4 to 7) and start a production run. Keep in mind, we are adding a supplementary level of constraints on the protein; a supplementary relaxation step might be required (equilibration with position restraints and smaller time step for instance).
Martini + OLIVES
One final model available for Martini proteins is OLIVES, which similar to the Gō model uses non-bonded potentials between beads to retain secondary structure. OLIVES uses a different contact map approach to the standard Gō model described above, based on hydrogen bonding native contacts. Due to a number of features of the OLIVES program (such as multistate functionality), it is not integrated into Martinize2, and must be called via a dedicated routine. Further instructions for OLIVES can be found in the paper by Pedersen et al. 2017 [11], and in the examples folder of the GitHub repository.
Comparison between models
Now you’ve got three simulations of the same protein with different Martini protein models. If you do not want to wait, some pre-run trajectories can be found in the archive. One of them might fit your needs in terms of structural and dynamic behavior. If not, there are an almost infinite number of ways to further tweak the elastic network and Gō-like models.
An easy way to compare the slightly different behaviors of the proteins in the previous three cases is to follow deviation/fluctuation of the backbone during simulation (and compare it to an all-atom simulation if possible). RMSD (Fig. 3) and RMSF can be calculated using gromacs
tools (gmx rms
and gmx rmsf
). VMD
also provides a set of friendly tools to compute these quantities, but needs some tricks to be adapted to coarse-grained systems (standard keywords are not understood by VMD
on coarse-grained structures).
Tools and scripts used in this tutorial
GROMACS
(http://www.gromacs.org/)martinize2
(https://github.com/marrink-lab/vermouth-martinize)MartiniGlass
(https://github.com/Martini-Force-Field-Initiative/MartiniGlass)
Other resources
For more extensive background on the various options available in martinize2, see the documentation.
References
[1] P. C. T. Souza, et al., Martini 3: a general purpose force field for coarse-grained molecular dynamics. Nat. Methods 18, 382–388 (2021).
[2] L. Monticelli, et al., The MARTINI Coarse-Grained Force Field: Extension to Proteins. J. Chem. Theory Comput. 4, 819–834 (2008).
[3] D. H. de Jong, et al., Improved Parameters for the Martini Coarse-Grained Protein Force Field. J. Chem. Theory Comput. 9, 687–97 (2013).
[4] S. J. Marrink, H. J. Risselada, S. Yefimov, D. P. Tieleman, A. H. de Vries, The MARTINI force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824 (2007).
[5] R. Alessandri, et al., Pitfalls of the Martini Model. J. Chem. Theory Comput. 15, 5448–5460 (2019).
[6] X. Periole, M. Cavalli, S.-J. Marrink, M. A. Ceruso, Combining an Elastic Network With a Coarse-Grained Molecular Force Field: Structure, Dynamics, and Intermolecular Recognition. J. Chem. Theory Comput. 5, 2531–2543 (2009).
[7] A. B. Poma, M. Cieplak, P. E. Theodorakis, Combining the MARTINI and Structure-Based Coarse-Grained Approaches for the Molecular Dynamics Studies of Conformational Transitions in Proteins. J. Chem. Theory Comput. 13, 1366–1374 (2017).
[8] F. A. Herzog, L. Braun, I. Schoen, V. Vogel, Improved Side Chain Dynamics in MARTINI Simulations of Protein–Lipid Interfaces. J. Chem. Theory Comput. 12, 2446–2458 (2016).
[9] A. Morton, B. W. Matthews, Specificity of ligand binding in a buried nonpolar cavity of T4 lysozyme: Linkage of dynamics and structural plasticity. Biochemistry 34, 8576–8588 (1995).
[10] M. Bulacu, et al., Improved Angle Potentials for Coarse-Grained Molecular Dynamics Simulations. J. Chem. Theory Comput. 9, 3282–92 (2013).
[11] S. J. Marrink, et al., Computational Modeling of Realistic Cell Membranes. Chem. Rev. 119, 6184–6226 (2019).
[12] K. B. Pedersen, et al., OLIVES: A Go-like Model for Stabilizing Protein Structure via Hydrogen Bonding Native Contacts in the Martini 3 Coarse-Grained Force Field. J. Chem. Theory Comput. 20, 8049–8070 (2024)