Installation & usage¶
Note: At the moment, only VeloxChem and Orca can be used as the QM driver.
Installation¶
To install this package, you need:
Python 3.11 or higher (with pip and venv, but those are generally included by default), and
xTB (mandatory), and, optionally,
Use:
# for the stable version
pip3 install git+https://github.com/pierre-24/pymerk.git@v0.2.0
# for the latest version
pip3 install git+https://github.com/pierre-24/pymerk.git@dev
Note: as this command installs programs, you might need to add their location (such as $HOME/.local/bin, if you use --user) to your $PATH.
Verify the installation:
pymerk_run --help
Usage¶
PyMERK is controlled entirely through TOML configuration files.
The main entry point is pymerk_run (see below).
Configuration File¶
Create a TOML configuration file to define the workflow parameters and override default values where needed.
A template file containing all default settings can be generated with:
pymerk_config > input.toml
This file can then be edited to suit your system and computational requirements.
The overall structure closely follows that of the CENSO .censo2rc configuration file. Some keywords have been adapted, added, or removed to reflect PyMERK-specific features.
General Settings¶
[general] temperature (float, default: 298.15): Temperature in Kelvin used for all thermochemical and Boltzmann population calculations.
[general] evaluate_rrho (bool, default: true): Enable or disable the evaluation of RRHO thermochemical contributions.
[general] sm_rrho (str, default: "gbsa"): Solvation model used for RRHO corrections, typically gbsa or alpb.
[general] imagthr (float, default: -100.0): Threshold for imaginary frequencies in cm⁻¹ below which modes are treated as invalid or ignored.
[general] sthr (float, default: 50.0): Low-frequency threshold in cm⁻¹ used in the treatment of vibrational modes.
[general] solvent (str, default: "h2o"): Identifier of the solvent used in the calculations, for example h2o or dmso.
[general] gas_phase (bool, default: false): If set to true, all solvation contributions are ignored and calculations are performed in the gas phase.
Note
If the solvent name given in [general] solvent is not the same for xTB and for the QM driver, you can add alternate_solvent ([screening] alternate_solvent, [optimization] alternate_solvent, and [refinement] alternate_solvent) to the later stages to define the equivalent solvent name.
For example, for "thf", you need to set alternate_solvent = 'tetrahydrofuran' with VeloxChem.
Program Paths¶
[paths] xtb (str, default: ""): Path to the xTB executable.
[paths] vlx (str, default: ""): Path to the VeloxChem executable.
This setting is used when prog = "vlx" in one of the stages.
[paths] orca (str, default: ""): Path to the Orca executable.
Note that in order to use multiple processes ([paths] orca_nprocs), you need to provide the full path (see Parallel instructions for ORCA).
This setting is used when prog = "orca" in one of the stages.
[paths] orca_nprocs (int, default: 1): Number of processes used for Orca calculations, with a maximum of 64.
This setting is used when prog = "orca" in one of the stages.
Note
You can also set runners and default options, by using, e.g.,
[paths]
xtb = "xtb -v" # more verbose output with xTB
vlx = "srun vlx" # run VeloxChem via srun
Prescreening Stage (fast single-points)¶
[prescreening] prog (str, default: "orca"): Program used for electronic structure calculations, typically vlx (VeloxChem) or orca.
[prescreening] func (str, default: "pbe d3"): Exchange-correlation functional used for DFT calculations.
[prescreening] basis (str, default: "def2-sv(p)"): Basis set used during the prescreening stage.
[prescreening] gfnv (str, default: "gfn2"): xTB variant used for auxiliary contributions, such as gfn1, gfn2, or gfnff.
[prescreening] threshold (float, default: 4.0): Energy threshold in kcal/mol used to retain conformers relative to the lowest-energy structure.
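To make the energy threshold concrete, here is a minimal sketch of an energy-window filter. It assumes that energies are available in Hartree and that a conformer is kept when its energy relative to the lowest one is at most the threshold; the function name, the unit of the input, and the use of "less than or equal" are assumptions for this example, not PyMERK's documented behavior.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # approximate conversion factor


def filter_by_energy_window(energies_hartree, threshold_kcal_mol=4.0):
    """Hypothetical helper: indices of conformers within the energy window.

    The window is measured from the lowest energy in the list.
    """
    lowest = min(energies_hartree)
    kept = []
    for index, energy in enumerate(energies_hartree):
        relative = (energy - lowest) * HARTREE_TO_KCAL_MOL
        if relative <= threshold_kcal_mol:
            kept.append(index)
    return kept


# Three made-up conformer energies (Hartree): the third lies about
# 6.3 kcal/mol above the first and falls outside a 4.0 kcal/mol window.
energies = [-100.0000, -99.9980, -99.9900]
print(filter_by_energy_window(energies, threshold_kcal_mol=4.0))
```

The same kind of window applies to the later stages, where the default threshold becomes tighter (3.5 and then 3.0 kcal/mol).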
Screening Stage (refined calculations)¶
[screening] prog (str, default: "orca"): Program used for electronic structure calculations.
[screening] func (str, default: "r2scan-3c"): Exchange-correlation functional used for the screening stage.
[screening] basis (str, default: "def2-mTZVPP"): Basis set used during the screening stage.
[screening] sm (str, default: "smd"): Solvation model used in this stage, such as smd, cpcm, or gbsa.
[screening] alternate_solvent (Optional[str | int], default: null): Alternate solvent name to be used by the QM driver, to be provided if it does not match [general] solvent.
[screening] gfnv (str, default: "gfn2"): xTB variant used for auxiliary energy corrections.
[screening] threshold (float, default: 3.5): Relative energy cutoff in kcal/mol used to retain conformers.
[screening] gsolv_included (bool, default: false): If set to true, solvation effects are included directly in the QM driver energies, otherwise they are computed separately using xTB.
Optimization Stage (full geometry optimization)¶
Note
At this stage, the solvent must be included directly by the QM driver, as xTB-gradient correction is not yet possible.
[optimization] prog (str, default: "orca"): Program used to perform geometry optimizations.
[optimization] func (str, default: "r2scan-3c"): Exchange-correlation functional used during optimization.
[optimization] basis (str, default: "def2-mTZVPP"): Basis set used for geometry optimizations.
[optimization] sm (str, default: "cpcm"): Solvation model used during optimization.
[optimization] alternate_solvent (Optional[str | int], default: null): Alternate solvent name to be used by the QM driver, to be provided if it does not match [general] solvent.
[optimization] optlevel (str, default: "normal"): Optimization convergence level. At the moment, only loose, normal, or tight are supported.
[optimization] gfnv (str, default: "gfn2"): xTB variant used for RRHO thermochemical corrections.
[optimization] threshold (float, default: 3.0): Energy threshold in kcal/mol used to discard conformers after optimization.
[optimization] macrocycles (bool, default: true): Enable or disable the macrocycle optimization protocol.
[optimization] gradthr (float, default: 0.01): Gradient norm threshold in atomic units below which conformers are compared and filtered.
[optimization] maxcyc (int, default: 200): Maximum number of optimization iterations per conformer.
[optimization] optcycles (int, default: 8): Number of microcycles per macrocycle when the macrocycle protocol is enabled.
Refinement Stage (Boltzmann population filtering)¶
[refinement] prog (str, default: "orca"): Program used for final single-point energy calculations.
[refinement] func (str, default: "wb97m-v"): Exchange-correlation functional used for high-accuracy refinement.
[refinement] basis (str, default: "def2-TZVP"): Basis set used during the refinement stage.
[refinement] sm (str, default: "smd"): Solvation model used in the refinement calculations.
[refinement] alternate_solvent (Optional[str | int], default: null): Alternate solvent name to be used by the QM driver, to be provided if it does not match [general] solvent.
[refinement] gfnv (str, default: "gfn2"): xTB variant used for auxiliary corrections.
[refinement] threshold (float, default: 0.95): Cumulative Boltzmann population threshold (between 0 and 1) used to select the final set of conformers.
[refinement] gsolv_included (bool, default: false): If set to true, solvation effects are included directly in the QM driver energies, otherwise they are computed separately using xTB.
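Unlike the earlier stages, the refinement threshold is a cumulative population rather than an energy window. The sketch below illustrates the idea under stated assumptions: relative free energies are given in kcal/mol, populations follow a plain Boltzmann distribution at [general] temperature, and conformers are added from most to least populated until the threshold is reached. The function names are hypothetical, and PyMERK's exact selection rule may differ.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol K), approximate


def boltzmann_populations(rel_energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations from relative energies in kcal/mol."""
    weights = [math.exp(-e / (GAS_CONSTANT * temperature)) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def select_by_cumulative_population(rel_energies_kcal, threshold=0.95, temperature=298.15):
    """Hypothetical helper: most populated conformers covering `threshold`."""
    populations = boltzmann_populations(rel_energies_kcal, temperature)
    order = sorted(range(len(populations)), key=lambda i: populations[i], reverse=True)
    selected, cumulative = [], 0.0
    for index in order:
        selected.append(index)
        cumulative += populations[index]
        if cumulative >= threshold:
            break
    return selected


# Made-up relative energies (kcal/mol): the first two conformers already
# account for more than 95 % of the population at 298.15 K.
relative_energies = [0.0, 0.5, 3.0]
print(boltzmann_populations(relative_energies))
print(select_by_cumulative_population(relative_energies, threshold=0.95))
```

Raising the threshold towards 1 keeps more of the sparsely populated conformers, at the cost of a larger final ensemble.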
Running PyMERK¶
Execute the workflow with:
pymerk_run input_ensemble.xyz -i config.toml -o output_ensemble.xyz \
-c <charge> -m <multiplicity> -w <workdir>
Where:
input_ensemble.xyz is a multi-structure XYZ file containing the initial conformers.
config.toml is the TOML configuration file defining the workflow parameters.
output_ensemble.xyz is the output XYZ file containing the final selected conformers.
<charge> is the total molecular charge (default: 0).
<multiplicity> is the spin multiplicity (default: 1).
<workdir> is the working directory for the files generated by the different programs as well as intermediate results (default: .).
Only the input structure file is required, and all other arguments are optional.
Tip
If no output file is specified, a default name will be generated automatically.
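When the workflow is launched from a script, the command line can be assembled programmatically. The sketch below only uses the options listed above (-i, -o, -c, -m, -w); build_pymerk_command is a hypothetical helper written for this example, and the file and directory names are placeholders.

```python
import shlex


def build_pymerk_command(input_xyz, config=None, output=None,
                         charge=None, multiplicity=None, workdir=None):
    """Hypothetical helper: argument list for pymerk_run.

    Options left as None are omitted, so pymerk_run applies its own defaults.
    """
    command = ["pymerk_run", input_xyz]
    options = (("-i", config), ("-o", output), ("-c", charge),
               ("-m", multiplicity), ("-w", workdir))
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_pymerk_command("input_ensemble.xyz", config="config.toml",
                               charge=-1, workdir="run1")
print(shlex.join(command))

# To actually start the workflow (requires PyMERK to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.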
Output¶
The workflow processes the conformer ensemble through all enabled stages and produces, in the standard output:
Progress;
Energy summaries reporting relative energies for all evaluated structures; and
RMSD matrices (in Ångströms) for the conformers that pass each stage.
In the working directory, you will also find:
Log files containing the output of each program, and
Filtered ensembles at each stage, as .xyz files containing only retained conformers.
These outputs allow you to track how the conformer set is progressively reduced and refined.
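One simple way to follow this reduction is to count the structures in each filtered ensemble. The sketch below counts the structures in the text of a multi-structure XYZ file, assuming the usual layout (atom count, comment line, then one line per atom). The function name and the sample geometry are made up for this example, and the names of the per-stage files are not specified here.

```python
def count_structures(xyz_text: str) -> int:
    """Hypothetical helper: number of structures in multi-structure XYZ text."""
    lines = xyz_text.splitlines()
    count, position = 0, 0
    while position < len(lines):
        if not lines[position].strip():
            position += 1  # skip blank lines between or after structures
            continue
        natoms = int(lines[position])
        position += natoms + 2  # atom-count line + comment line + atom lines
        count += 1
    return count


# Two made-up water geometries standing in for a filtered ensemble.
SAMPLE = """3
conformer 1
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
3
conformer 2
O 0.000 0.000 0.100
H 0.757 0.586 0.100
H -0.757 0.586 0.100
"""

print(count_structures(SAMPLE))
```

For a file on disk, the text can be read with pathlib.Path(...).read_text() and passed to the same function.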