Python API Documentation

class PyMEMENTO.MEMENTO(working_dir: str, start_file: str, target_file: str, residue_numbers: list, forcefield='AMBER14', forcefield_paths=None, lipid=None, ligand=None, ligand_type='rigid', PLUMED_PATH='plumed', multiple_chains=None, last_step_performed='')[source]

The MEMENTO class is the core handle for interacting with the modelling system. See the examples section on how to sequentially use its functions.

__init__(working_dir: str, start_file: str, target_file: str, residue_numbers: list, forcefield='AMBER14', forcefield_paths=None, lipid=None, ligand=None, ligand_type='rigid', PLUMED_PATH='plumed', multiple_chains=None, last_step_performed='')[source]

Constructor for the MEMENTO class, which takes some initial information that every modelling run must provide.

Parameters:
  • working_dir – Path to the directory in which all intermediate files should be written.

  • start_file (str) – Path to the file containing the morphing starting structure.

  • end_file (str) – Path to the file containing the morphing target structure.

  • residue_numbers (list) – A list of residue numbers which the protein residues should have after processing (use this to take care of starting points other than 1, for example, and multichain overlapping residue numbers).

  • forcefield (str, optional) – Which forcefield to use. ‘AMBER14’ will default to integrated amber14.sb, ‘CHARMM36’ to integrated charmm36-jul2021, ‘AMBER_SLIPIDS’ to integrated amber14.sb + slipids. ‘Other’ will read the forcefield_paths keyword argument. Defaults to ‘AMBER14’.

  • forcefield_paths (list, optional) – Provice paths to a number of forcefield folders needed for running simulations, if not ‘AMBER14’ or ‘CHARMM36’, defaults to None.

  • lipid (str, optional) – Specify the lipid selection group to be employed, None means no lipid, defaults to None

  • ligand (str, optional) – Ligand selectio string for ligand in the binding pocket, which is to be moved during morphing, defaults to None

  • ligand_type (str, optional) – Determines whether the ligand is treated as ‘rigid’ and only the COM is moved, or whether it should be morphed as ‘single’ atoms, defaults to ‘rigid’

  • PLUMED_PATH (str, optional) – Path to the plumed executable, defaults to ‘plumed’

  • multiple_chains (list<str>, optional) – If the protein contains multiple chains, pass a list of lists of chain identifiers, eg [“A”, “A”, “A”, “B”, “B”, “B”, “C”, “C”, “C”] for a trimer which has residue numbers 1-3 in chain A, 4-6 in chain B and 7-9 in chain C, defaults to None

  • last_step_performed (str, optional) – If a previous run has already done some of the steps in the appropriate folder strucutre, pass the last step that was done with this keyword argument to override order consistency checking. Can be ‘modelling’, ‘pathfinding’, ‘processing’, ‘boxpreparation’ or ‘solvation’. Defaults to ‘’

find_best_path(mc_starting_temp=50, mc_steps=10000, mc_replicates=12, poolsize=1)[source]

Call up an MCPathSampler instance to construct optimum RMSD paths through the space of the generated models.

Parameters:
  • mc_starting_temp (int, optional) – ‘Temperature’ at which to start simulated annealing, defaults to 50

  • mc_steps (int, optional) – Number of MC steps to perform, defaults to 10000

  • mc_replicates (int, optional) – Number of MC replicates to perform, defaults to 12

  • poolsize (int, optional) – Number of processes to spawn to speed up calculations, defaults to 1. Note: With large universes this can cause memory issues because of an apparent bug in how MDAnalysis handles many open files in multiple processes, hence the default is 1.

make_models(number_of_models: int, include_residues=None, poolsize=12, mutagenesis=None)[source]

Use the modeller package to generate fixed models based on the morphs already present in the folder structure. Caps are removed at this stage. Mutagenesis can be performed here if desired.

Parameters:
  • number_of_models (int) – How many models to generate.

  • include_residues (list, optional) – List of all the residue_numbers to include, None means take all residues counting from 1, defaults to None

  • poolsize (int, optional) – How many multiprocessing tasks to run, defaults to 12

  • mutagenesis – List of tuples of the form (residue_number, original_residue, new_residue), eg. (10, ‘SER’, ‘ALA’) for S10A mutation. This is not supported for multichain proteins yet. Defaults to None

minimize_boxes(mdrun_flags={}, grompp_flags={})[source]

Perform an energy minimizations on the prepared boxes in the folder structure.

Parameters:
  • mdrun_flags (dict, optional) – Extra flags to pass to gmx mdrun, related to gpu and cores etc (eg. {‘ntomp’: 6}), defaults to {}

  • mdrun_flags – Extra flags to pass to gmx grompp, (eg. {‘maxwarn’: 1}), defaults to {}

morph(number_of_intermediates: int, fitting_selection='protein and name CA')[source]

Calculate linear morphs between start and target structures which the class already holds (from constructor), with a given number of intermediates.

Parameters:
  • number_of_intermediates (int) – How many frames to create.

  • fitting_selection (str, optional) – MDAnalysis selection string for fitting, defaults to “protein and name CA”

prepare_boxes(template_folder: Optional[str] = None, mdrun_flags={}, embedding_starting_scale=1.15, embedding_end_scale=0.95, embedding_steps=5)[source]

Aggregate all required files for starting to run and preprocess them in gromacs. Combine with ligand and embed in lipids (by a procedure akin to inflate_gro) if appopriate.

Parameters:
  • template_folder (str, optional) – Specify a template folder containing mdp inputs for minim, nvt, npt, prod_sc, and prod; a template topology as well as a submission script. Defaults to None (which sources a folder form the data suitable for a basic protein in water)

  • mdrun_flags (dict, optional) – Extra flags to pass to gmx mdrun, related to gpu and cores etc (eg. {‘ntomp’: 6}), defaults to {}

  • starting_scale (float, optional) – Parameter for lipid embedding, factor by which the system should initially be stretched, defaults to 1.15

  • starting_scale – Parameter for lipid embedding, factor by which the system should end up smaller if there’s no clashes, defaults to 0.95

  • steps (int, optional) – Parameter for lipid embedding, number of steps of energy minimisation, defaults to 5

prepare_umbrella_sampling(plumed_monitor_file: str, plumed_umbrella_file: str, run_scripts: list = [], extra_plumed_files: list = [], smoothen_ladder: float = 0, folder_name: str = 'umbrella/')[source]

Prepare umbrella sampling simulations from equilibrated boxes. Appropriate input files for plumed must be provided. Also provides ‘metadata.dat’ which is an input file for the grossfield wham implementation.

Parameters:
  • plumed_monitor_file (str) – Path to a plumed input file that prints the CV of interest at stride 1 to file COLVAR_MONITOR.

  • plumed_umbrella_file (str) – Path to a plumed input file that will be used for umbrella sampling. Must contain the string $REPLICAS$, which is replaced with the determined CV values. Should have KAPPA set, which is read by this function.

  • run_scripts (list, optional) – List of file paths that should be copied into the umbrella sampling folder, eg. run scripts etc, defaults to []

  • extra_plumed_files (list, optional) – List of file paths that are required for running plumed driver for monitoring CVs. Also copied into umbrella path, defaults to []

  • smoothen_ladder (float, optional) – Smoothen the CV ladder for umbrella sampling by linear interpolation between the observed CV values from the box equilibrations and an equidistant ladder. Defaults to 0, which means no smoothening.

  • folder_name (str, optional) – Name of the umbrella sampling folder to be created, defaults to “umbrella/”

process_models(caps=True, cap_type='AMBER', proline_n_term=False, his=False, his_protonation_states=[], glu=False, glu_protonation_states=[], asp=False, asp_protonation_states=[])[source]

Perform several processing steps on the identified minimal RMSD path: fixing residue numbers, aligning with starting structure, capping termini (optional) and adding H atoms via the use of ‘gmx pdb2gmx’.

Parameters:
  • caps (bool, optional) – Whether or not to add ACE and NME caps, defaults to True. Currently not supported with multichain proteins.

  • cap_type – Caps can be of the ‘AMBER’ or ‘CHARMM’ forcefield types, defaults to ‘AMBER’

  • proline_n_term (bool, optional) – Whether or not the N-term should be capped with the dihedral angle preferences of a proline, defaults to False

  • his (bool, optional) – Whether or not his protonation states should be chosen manually, defaults to False

  • his_protonation_states (list, optional) – Those his protonation states which should be chosen (as integers prompted by pdb2gmx) , defaults to []

  • glu (bool, optional) – Whether or not glu protonation states should be chosen manually, defaults to False

  • glu_protonation_states (list, optional) – Those glu protonation states which should be chosen (as integers prompted by pdb2gmx), defaults to []

  • asp (bool, optional) – Whether or not asp protonation states should be chosen manually, defaults to False

  • asp_protonation_states (list, optional) – Those asp protonation states which should be chosen (as integers prompted by pdb2gmx), defaults to []

solvate_boxes(ion_concentration=0.15)[source]

This solvates the boxes with a consistent number of solvent molecules across the boxes.

Parameters:

ion_concentration (float, optional) – Concentration of salt (NaCl) to use, defaults to 0.15

Auxiliary functions

MC Sampling of ensemble path

This contains the MCPathSampler class which handles finding the minimum RMSD path through the models.

class PyMEMENTO.mc_path_sampling.MCPathSampler(folder: str, number_of_windows: int, models_per_window: int, starting_temperature: int, mc_steps: int, number_of_replicates: int)[source]

The MCPathSampler class handles finding a minumum RMSD path through the models which were generated by modeller.

path_energy_function(path)[source]

Computes the sum of squares of the RMSD between neighbours in a given path. Serves as the energy function in the MC sampling.

Parameters:

path (list) – Path of MDAnalysis.Universe objects.

Returns:

Sum of squares of RMSD between neighbours, acting as energy function.

Return type:

float

run_monte_carlo(name, number_of_steps, annealing_progression=None, path=None, verbose=False, seed=None)[source]

This is a simple MC run procedure for our system. Returns the minimum energy configuration as a tuple (min_configuration, min_energy).

Parameters:
  • name (str) – Name of the folder in which to store the results.

  • number_of_steps (int) – How many MC steps to perform.

  • annealing_progression (list, optional) – Stepwise list of temperatures to use for annealing. If None, no annealing is performed. Defaults to None.

  • path (list, optional) – Initial path to start from. If None, a random path is generated. Defaults to None.

  • verbose (bool, optional) – Whether to print out the progress of the MC run. Defaults to False.

  • seed (int, optional) – Seed for the random number generator. If None, gets randomly assigned from python random generator. Defaults to None.

Returns:

Minimum energy configuration (path as list of Universes) and its energy.

Return type:

tuple

sample_path(poolsize=1)[source]

Perform the MC sampling using a multiprocessing pool of the desired size.

Parameters:

poolsize (int, optional) – How many multiprocessing instances to use. Setting 1 bypasses multiprocessing, defaults to 1

PyMEMENTO.mc_path_sampling.run_one_replicate_of_mc(seed, sampler)[source]

Wrapper function in order to call run_monte_carlo from a multiprocessing pool. It needs to lie outside the class so that it is picklable and can be handled with the standard multiprocessing library.

Parameters:
  • seed (int) – Seed for the random number generator.

  • sampler (MCPathSampler) – The sampler object to run.

Returns:

The pathsampling output as from run_monte_carlo.

Return type:

tuple

Gromacs utility

Some useful functions for running gromacs commands to be used by PyMEMENTO. Uses gromacswrapper.

PyMEMENTO.gmx_util.change_boxsize(file_to_process: str, boxsize_line: str)[source]

Change the dimensions of a file to values specified by a gro-formatted string.

Parameters:
  • file_to_process (str) – Path to the coordinate file to process.

  • boxsize_line (str) – box size in format of last line of gro file

PyMEMENTO.gmx_util.create_box(file_to_process: str, padding: float = 1.0)[source]

Overwrite a coordinate file with one where a cubic box of defined padding around is created.

Parameters:
  • file_to_process (float) – Path to the coordinate file to process

  • (optional) (file_to_process) – Padding for box generation, defaults to 1.0 nm

PyMEMENTO.gmx_util.crop_top_to_itp(file: str)[source]

Crop a .top file to an .itp file, removing the original file.

Parameters:

file (str) – .top tile to be processed.

PyMEMENTO.gmx_util.edit_last_line(file_to_process: str, new_last_line: str)[source]

Replace the last line of a file with a different last line.

PyMEMENTO.gmx_util.embed_in_lipids(protein_path: str, lipid_path: str, folder_path: str, lipid_selection: str, boxsize_line: str, starting_scale=1.15, end_scale=0.95, steps=5, mdrun_flags={})[source]

Embed a protein in lipids (eg from a related conformational state), which already have the rough protein shape cut out, but there may be overlaps.

Parameters:
  • protein_path (str) – Path to the protein coordinate file.

  • lipid_path (str) – Path to the lipid coordinate file.

  • folder_path (str) – Path to the folder in which to work.

  • lipid_selection (str) – MDAnalysis selecttion string for the lipids

  • boxsize_line (str) – Last line of the original lipid gro file.

  • starting_scale (float, optional) – Factor by which the system should initially be stretched, defaults to 1.15

  • starting_scale – Factor by which the system should end up smaller if there’s no clashes, defaults to 0.95

  • steps (int, optional) – Number of steps of energy minimisation, defaults to 5

  • mdrun_flags (dict, optional) – Extra flags to pass to gmx mdrun, related to gpu and cores etc (eg. {‘ntomp’: 6}), defaults to {}

PyMEMENTO.gmx_util.generate_membrane_index(file_to_process: str, folder_path: str)[source]

Generate an index file which separates the system in to Water_and_ions and Membrane.

Parameters:
  • file_to_process (str) – Path to file to generate index from.

  • folder_path (str) – Path to folder in which to place output.

PyMEMENTO.gmx_util.generate_posre(file_to_process: str, folder_path: str, group: str, exclude_res: list = [], multiple_chains: Optional[list] = None, residue_numbers: Optional[list] = None)[source]

Generate a position restraint posre.itp file.

Parameters:
  • file_to_process (str) – Path to file to generate restraints from.

  • folder_path (str) – Path to folder in which to place output.

  • group (str) – Name of group from index file to use.

  • exclude_res (list<int>, optional) – List of residues to exclude from making restraints

  • multiple_chains (list<str>, optional) – In case of a multichain protein, list of chain ID letters for each residue of the complete protein. Defaults to None.

  • residue_numbers (list<int>, optional) – In case of a multichain protein, list of residue numbers for the complete protein. Defaults to None.

PyMEMENTO.gmx_util.get_cubic_boxsize(file_to_process: str, folder_path: str, spacing: float)[source]

Estimate the cubic boxsize including a given spacing from the box edge.

Parameters:
  • file_to_process (str) – Path to the file to be analysed.

  • folder_path (str) – Path to the folder in which the processing takes place.

  • spacing (float) – Spacing from the box edge to be achieved in nm.

Returns:

Returns a tuble (size, last_line) of the box-size as number and the last line of a respective gro file.

Return type:

tuple

PyMEMENTO.gmx_util.minimize(file_to_process: str, folder_path: str, name='em', grompp_flags={}, mdrun_flags={})[source]

Energy-minimize a box. A minim.mdp file is required in the folder.

Parameters:
  • file_to_process (str) – Path to the file to minimize.

  • folder_path (str) – Path to the folder in which to minimize.

  • name (str, optional) – Name for the minimization, defaults to “em”

  • grompp_flags (dict, optional) – Extra flags to pass to gmx grompp, eg {‘maxwarn’: 1}, defaults to {}

  • mdrun_flags (dict, optional) – Extra flags to pass to gmx mdrun, eg related to gpu and cores (eg. {‘ntomp’: 6}), defaults to {}

Returns:

ID of the overlapping atom, if failed. None if this worked.

Return type:

int

Raises:

ValueError – something went wrong with the minimizaion, but not atom overlap!

PyMEMENTO.gmx_util.parse_indexkey_to_list(file_to_process: str)[source]

This extracts an ordered list of the gmx index auto-generated labels for a given coordinate file.

Parameters:

file_to_process (str) – Path to the file to process.

Returns:

List of index handles which gmx would use for the structure.

Return type:

list

PyMEMENTO.gmx_util.pdb2gmx(file_to_process: str, folder_path: str, ignh=True, his=False, his_protonation_states=[], glu=False, glu_protonation_states=[], asp=False, asp_protonation_states=[], cap_type='AMBER', multi_chain=False)[source]

Run gmx pdb2gmx on a given pdb file.

Parameters:
  • file_to_process (str) – Path to the pdb file.

  • folder_path (str) – Path to the parent folder of the pdb file.

  • ignh (bool, optional) – Whether or not to pass the “-ignh” flag, ie rebuild hydrogen atoms, defaults to True

  • his (bool, optional) – Whether or not to manually set his protonation states for consistency, defaults to False

  • his_protonation_states (list, optional) – If his states are set manually, this is a list of the desired ones, defaults to []

  • glu (bool, optional) – Whether or not to manually set glu protonation states, defaults to False

  • glu_protonation_states (list, optional) – If glu states are set manually, this is a list of the desired ones, defaults to []

  • asp (bool, optional) – Whether or not to manually set asp protonation states, defaults to False

  • asp_protonation_states (list, optional) – If asp states are set manually, this is a list of the desired ones, defaults to []

  • cap_type (str, optional) – Can be ‘AMBER’ or ‘CHARMM’, defaults to ‘AMBER’

  • multi_chain (bool, optional) – Whether or not the protein has multiple chains, determines whether PyMEMENTO tried to edit the generated top file into an itp, or keep the individual chain itp files. Defaults to False.

PyMEMENTO.gmx_util.solvate_ions(file_to_process: str, folder_path: str, ion_concentration: float, maxsol=1000000)[source]

Solvate a box which has box size but no solvent present.

Parameters:
  • file_to_process (str) – Path to the file to solvate.

  • folder_path (str) – Path to the folder in which to solvate.

  • ion_concentration (float) – Desired ion concentration (NaCl).

  • maxsol (int, optional) – Number of solvent molecules to maximally insert, defaults to 1000000

PyMEMENTO.gmx_util.stretch_boxsize(boxsize_line: str, scale_factor: float)[source]

Scale box-size string in gro formatting by a scale_factor.

Parameters:
  • boxsize_line (str) – Box-size line in gro file format.

  • scale_factor (float) – Scale factor to apply to box-size.

PyMEMENTO.gmx_util.stretch_relative(to_strech: Universe, anchor: Universe, scale_factor: float)[source]

Act on a Universe or AtomGroup to stretch it relative to the center of geometry of another Universe, in the x-y plane.

Parameters:
  • to_strech (MDAnalysis.Universe or MDAnalysis.AtomGroup) – Universe or AtomGroup to stretch.

  • anchor (MDAnalysis.Universe or MDAnalysis.AtomGroup) – Universe or AtomGroup to use as the center of geometry.

Plumed utility

This file contains functions which interface with plumed.

PyMEMENTO.plumed_util.get_monitor_value_from_xtc(file_to_process: str, plumed_monitor_file: str, PLUMED_PATH: str = 'plumed')[source]

Use plumed to extract the average value of a CV from an xtc file.

Parameters:
  • file_to_process (str) – Path to the xtc file to be analysed.

  • plumed_monitor_file (str) – Path to the plumed input file, should print to COLVAR_MONITOR

  • PLUMED_PATH (str, optional) – Path to the plumed executable.

Returns:

Average value of the CV in the xtc file.

Return type:

float

Modeller utility

This contains methods to interact with the modeller python package.

PyMEMENTO.modeller_util.create_ali_file(pdb_file: str, out_path: str, include_residues: list, ali_file_name: str = 'morph->protein.ali', use_all_chains=False, mutagenesis=None)[source]

Create the *.ali file which is necessary for modelling. In this case, the only differnce between knowns and sequence is removing the caps. There is also the option of doing mutagenesis at this step.

Parameters:
  • pdb_file (str) – Path to file with coordinates.

  • out_path (str) – Path into which to write the ali file.

  • include_residues (list<int>) – List of all residues which should be included in the model. Warning: This currently works only for continuous ranges of residues!

  • ali_file_name (bool, optional) – Name of the ali file to be written, defaults to “morph->protein.ali”

  • use_all_chains – Whether to use all chains of a mutiple-domain protein. If False, only chain ‘X’ (as written by MDAnalysis for a protein which didn’t have any chain identifiers) is used. If True, use all residues that were present in the original coordinate file. Defaults to False.

  • mutagenesis (list<tuple<int, str, str>>, optional) – List of tuples of the form (residue_number, original_residue, new_residue), eg. (10, ‘SER’, ‘ALA’) for S10A mutation. This is not supported for multichain proteins yet. Defaults to None

PyMEMENTO.modeller_util.run_modeller(path: str, number_of_models: int, ali_file_name: str = 'morph->protein.ali')[source]

Run modeller on the ali file contained in a specified directory.

Parameters:
  • path (str) – Directory in which to run modeller.

  • number_of_models (int) – How many models to generate.

  • ali_file_name (str, optional) – Name of the ali file to be used, defaults to “morph->protein.ali”

Other molecular coordinate utility

This contains useful functions which are to do with handling pdb files.

PyMEMENTO.pdb_util.cap_termini(inpath: str, outpath: str, reference: str, first_res: int, last_res: int)[source]

Patch a protein with termini, as specified in a reference structure.

Parameters:
  • inpath (str) – Path to the pdb file to be processed.

  • outpath (str) – Path to the pdb file to be written to output.

  • reference (str) – Path to the reference pdb file. This contains a 4-amino acid sequence: cap-nterm-cterm-cap.

  • first_res (int) – Number of the first residue in the inpath file (onto which to attach the cap).

  • last_res (int) – Number of the last residue in the inpath file (onto which to attach the cap).

PyMEMENTO.pdb_util.fix_residue_numbers(inpath: str, outpath: str, corrected_residue_numbers: list)[source]

Modify the residue numbers in a pdb file to reach a target sequence.

Parameters:
  • inpath (str) – Path to the origin pdb file

  • outpath (str) – Where to write the modifed pdb file to.

  • corrected_residue_numbers (list<int>) – List with the desired residue numbers.

PyMEMENTO.pdb_util.sed(path, replace, by)[source]

Find and replace within a file.

Parameters:
  • path (str) – Path to the file to be modified.

  • replace (str) – String to be replaced.

  • by (str) – String to replace with.