Tutorial 1: water cluster with VeloxChem

The goal of this first tutorial is to run the pyMERK program on a set of 12 conformers of \(\ce{(H_2O)_6}\) (excerpt of a CREST calculation) to select and optimize the most relevant ones. This will be achieved using VeloxChem.

You can download the initial structures here: 6H2O_12.xyz.

Note

Running the full workflow may take several hours on a standard workstation. Precomputed output files are also provided if you prefer to follow the tutorial more quickly.

Setting up & running

To run pyMERK, you also need an input file in TOML format to override default parameters if needed. For this tutorial, we will use: input.toml.

The content of this file will be discussed progressively in the analysis section below.

This tutorial assumes that you have xTB and VeloxChem properly installed and accessible from your environment. Before running the program, verify and update the paths to xTB ([paths] xtb) and VeloxChem ([paths] vlx):

[paths]
xtb = 'xtb'
vlx = 'vlx'

Tip

On HPC systems using SLURM or MPI, you may want to adapt the VeloxChem command, e.g.:

  • vlx = 'srun vlx'

  • vlx = 'mpirun vlx'

Refer to VeloxChem cluster documentation for details on parallel execution.

Run the workflow with:

# set up 'OMP_STACKSIZE' for xtb
export OMP_STACKSIZE=4G

# define OpenMP parallelization
export OMP_PLACES=cores
export OMP_NUM_THREADS=16

# run the workflow
pymerk_run -i input.toml 6H2O_12.xyz

Hint

Adjust OMP_NUM_THREADS according to the number of cores available on your machine.

Once the workflow starts, you can monitor progress in the generated log files. If you do not want to run it yourself, you can grab run.log.

Analyzing the output

A pyMERK workflow (loosely inspired by CENSO) is organized into four successive stages. These stages are pre-screening, screening, optimization, and refinement.

Each stage is carried out at a defined level of theory set in the input file. Electronic energies are computed with the main QM engine (here, VeloxChem) and complemented by xTB contributions when required.

General settings are defined in the [general] section of the TOML input:

[general]
evaluate_rrho = true
sm_rrho = 'alpb'
solvent = 'water'
gas_phase = false

In this example, the solvent is set to water. The option [general] gas_phase is set to false (default) means that solvent effects are included in all reported energies. They will be included at the alpb level, thanks to [general] sm_rrho. Depending on the stage, these contributions may come from either VeloxChem or xTB. The option [general] evaluate_rrho is set to true (default) enables thermochemical corrections via the SPH correction using xTB.

Additional options are described in Installation & usage.

Pre-screening step

The pre-screening stage provides a fast and inexpensive way to discard clearly unfavorable conformers.

Input section:

[prescreening]
prog = 'vlx'
func = 'pbe'
basis = 'def2-sv(p)'
gfnv = 'gfn2'
threshold = 4.0

[prescreening] basis is set to a reduced basis set (def2-sv(p), default) is used to lower computational cost. Conformers within 4 kcal/mol of the lowest-energy structure ([prescreening] threshold) are retained.

The output starts with a summary of the computational setup:

* This is pymerk v0.1.0
* Starting workflow with 12 conformers

*************************
* Stage: 1_prescreening *
*************************

* Filtering on g* (threshold: 0.006374 a.u.)
* Setup Summary:
  - Components: elec=MAIN, gsolv=AUX, gtrv=NONE
  - Main driver: VlxDriver[pbe/def2-sv(p)]
  - Aux driver:  XtbDriver[gfn2,alpb(water)]
> Conformer #1: -457.71303820 a.u.
> Conformer #2: -457.71312318 a.u.
> Conformer #3: -457.71252233 a.u.
...

The notation elec=MAIN, gsolv=AUX, gtrv=NONE specifies how energy components are evaluated. The elec term corresponds to the electronic energy computed by the main QM driver (VeloxChem) at the PBE0/def2-SV(P) level. The gsolv term corresponds to the solvation correction computed by the auxiliary driver (xTB) using GFN2-xTB/GBSA(water). The gtrv term indicates that no thermochemical contribution is included at this stage.

Relative energies are then reported:

* Final values:
                      g*           Δg*
Conformer #1     -457.71303820   0.00008499 *
Conformer #2     -457.71312318   0.00000000 *
Conformer #3     -457.71252233   0.00060085 *
Conformer #4     -457.71276560   0.00035758 *
Conformer #5     -457.71195711   0.00116607 *
Conformer #6     -457.71156480   0.00155839 *
Conformer #7     -457.70707398   0.00604920 *
Conformer #8     -457.70766134   0.00546185 *
Conformer #9     -457.69820020   0.01492298
Conformer #10    -457.69290681   0.02021638
Conformer #11    -457.69315862   0.01996457
Conformer #12    -457.70610623   0.00701695

Conformers above the threshold are removed, and in this example 3 conformers are discarded. The cutoff can be tuned via the [prescreening] threshold parameter in the input file.

An RMSD matrix between the remaining conformers is also printed:

* RMSD matrix (Å) for 8 structures:
                ..r #1 ..r #2 ..r #3 ..r #4 ..r #5 ..r #6 ..r #7 ..r #8
Conformer #1     0.000
Conformer #2     2.006  0.000
Conformer #3     1.891  0.718  0.000
Conformer #4     1.913  2.032  1.899  0.000
Conformer #5     2.155  2.155  1.918  0.930  0.000
Conformer #6     2.157  2.243  2.112  1.566  1.399  0.000
Conformer #7     2.235  1.826  1.710  1.577  1.714  2.174  0.000
Conformer #8     2.007  1.811  1.840  2.095  2.382  2.418  2.373  0.000

These values provide a measure of structural diversity. In this case, the conformers are all significantly different from one another.

Screening step

The screening stage refines the selection using a higher level of theory and includes thermochemical corrections.

Input section:

[screening]
prog = 'vlx'
func = 'rcam-b3lyp'
basis = 'def2-svpd'
sm = 'smd'
gfnv = 'gfn2'
threshold = 3.5

A reduced basis set is still used to balance cost and accuracy. The energy threshold (via [screening] threshold) is slightly stricter and set to 3.5 kcal/mol by default.

The output is similar to the previous stage:

* Stage: 2_screening *
**********************

* Filtering on G* (threshold: 0.005578 a.u.)
* Setup Summary:
  - Components: elec=MAIN, gsolv=AUX, gtrv=AUX
  - Main driver: VlxDriver[rcam-b3lyp/def2-svpd]
  - Aux driver:  XtbDriver[gfn2,alpb(water)]
> Conformer #1: -458.11929003 a.u.
> Conformer #2: -458.11914097 a.u.
> Conformer #3: -458.11949591 a.u.
> Conformer #4: -458.12096825 a.u.
> Conformer #5: -458.11965384 a.u.
> Conformer #6: -458.12256161 a.u.
> Conformer #7: -458.11502923 a.u.
> Conformer #8: -458.12006107 a.u.

* Final values:
                      G*           ΔG*
Conformer #1     -458.11929003   0.00327158 *
Conformer #2     -458.11914097   0.00342063 *
Conformer #3     -458.11949591   0.00306570 *
Conformer #4     -458.12096825   0.00159336 *
Conformer #5     -458.11965384   0.00290777 *
Conformer #6     -458.12256161   0.00000000 *
Conformer #7     -458.11502923   0.00753238
Conformer #8     -458.12006107   0.00250053 *

* Done, retained 7 conformer(s)

The notation elec=MAIN, gsolv=AUX, gtrv=AUX indicates how energy contributions are evaluated. The elec term corresponds to the electronic energy computed by VeloxChem at the rcam-b3lyp/def2-svpd level. The gsolv term corresponds to the solvation correction computed by xTB at the GFN2-xTB/GBSA level. The gtrv term corresponds to thermochemical corrections computed by xTB via the SPH scheme.

At this stage, one additional conformer is discarded.

Optimization step

The optimization stage performs geometry optimizations of the retained conformers.

Input section:

[optimization]
prog = 'vlx'
func = 'rcam-b3lyp'
basis = 'def2-svpd'
sm = 'cpcm'
alternate_solvent = 80.0

macrocycles = true

The level of theory is rcam-b3lyp/def2-svpd. Solvent effects must now be included directly in the QM calculation, for example via CPCM in VeloxChem. The dielectric constant is set using the [optimization] alternate_solvent parameter.

The macrocycle procedure is used by default. Up to 8 optimization cycles are performed per conformer, controlled by [optimization] optcycles. After each macrocycle, converged structures based on [optimization] gradthr are compared. Conformers above the energy threshold of 3 kcal/mol (value of [optimization] threshold) are discarded early.

The output for this stage is:

> Macrocycle 1
  - Otimizing Conformer #1... E=-458.09069001, grad=0.000924 [NOT CONVERGED]
  - Otimizing Conformer #2... E=-458.09109825, grad=0.001303 [NOT CONVERGED]
  - Otimizing Conformer #3... E=-458.09066865, grad=0.001203 [NOT CONVERGED]
  - Otimizing Conformer #4... E=-458.09116329, grad=0.001065 [NOT CONVERGED]
  - Otimizing Conformer #5... E=-458.09055429, grad=0.000923 [NOT CONVERGED]
  - Otimizing Conformer #6... E=-458.09189787, grad=0.001786 [NOT CONVERGED]
  - Otimizing Conformer #8... E=-458.09186747, grad=0.001577 [NOT CONVERGED]
→ retained 7 conformer(s)

> Macrocycle 2
  - Otimizing Conformer #1... E=-458.09077397, grad=0.000510 [CONVERGED]
  - Otimizing Conformer #2... E=-458.09108284, grad=0.000507 [CONVERGED]
  - Otimizing Conformer #3... E=-458.09078190, grad=0.000565 [CONVERGED]
  - Otimizing Conformer #4... E=-458.09145499, grad=0.000362 [NOT CONVERGED]
  - Otimizing Conformer #5... E=-458.09077463, grad=0.000399 [NOT CONVERGED]
  - Otimizing Conformer #6... E=-458.09253505, grad=0.001662 [NOT CONVERGED]
  - Otimizing Conformer #8... E=-458.09249696, grad=0.001638 [NOT CONVERGED]
→ retained 7 conformer(s)

> Macrocycle 3
  - Otimizing Conformer #4... E=-458.09145698, grad=0.000237 [CONVERGED]
  - Otimizing Conformer #5... E=-458.09076922, grad=0.000219 [CONVERGED]
  - Otimizing Conformer #6... E=-458.09262554, grad=0.000345 [NOT CONVERGED]
  - Otimizing Conformer #8... E=-458.09279152, grad=0.000816 [NOT CONVERGED]
→ retained 7 conformer(s)

> Macrocycle 4
  - Otimizing Conformer #6... E=-458.09262669, grad=0.000268 [CONVERGED]
  - Otimizing Conformer #8... E=-458.09269508, grad=0.000862 [NOT CONVERGED]
→ retained 7 conformer(s)

> Macrocycle 5
  - Otimizing Conformer #8... E=-458.09270051, grad=0.000564 [CONVERGED]
→ retained 7 conformer(s)

* Final values:
                      G*           ΔG*
Conformer #1     -458.09077397   0.00192654 *
Conformer #2     -458.09108284   0.00161767 *
Conformer #3     -458.09078190   0.00191861 *
Conformer #4     -458.09145698   0.00124353 *
Conformer #5     -458.09076922   0.00193129 *
Conformer #6     -458.09262669   0.00007382 *
Conformer #8     -458.09270051   0.00000000 *

* Done, retained 7 conformer(s)

In this example, all 7 conformers converge after 5 macrocycles and are retained. The optimized geometries are available in 3_optimize.selected.xyz.

Refinement step

The refinement stage evaluates the conformer ensemble at a higher level of theory and selects structures based on their Boltzmann population.

\[p_i = \frac{e^{-\Delta G_i / RT}}{\sum_j e^{-\Delta G_j / RT}}\]

Input section:

[refinement]
prog = 'vlx'
func = 'wb97m-d4'
basis = 'def2-svpd'
sm = 'smd'
threshold = 0.95
gsolv_included = true

Solvation is handled directly by VeloxChem using the SMD model with [refinement] gsolv_included set to true.

In this stage, the [refinement] threshold parameter defines a cumulative Boltzmann population cutoff of 95 percent rather than an energy difference.

The output for this stage is:

* Stage: 4_refinement *
***********************

* Boltzmann population filtering on ΔG* (threshold: 0.950)
* Setup Summary:
  - Components:  elec=MAIN, gsolv=MAIN, gtrv=AUX
  - Main driver: VlxDriver[wb97m-d4/def2-svpd,smd(water)]
  - Aux driver:  XtbDriver[gfn2,alpb(water)]
> Conformer #1: -458.35599305 a.u.
> Conformer #2: -458.35614458 a.u.
> Conformer #3: -458.35607541 a.u.
> Conformer #4: -458.35658399 a.u.
> Conformer #5: -458.35634548 a.u.
> Conformer #6: -458.35667792 a.u.
> Conformer #8: -458.35756641 a.u.

* Final Boltzmann population:
                      G*           ΔG*       Pop %
Conformer #1     -458.35599305   0.00157337    7.2 *
Conformer #2     -458.35614458   0.00142183    8.4 *
Conformer #3     -458.35607541   0.00149100    7.8 *
Conformer #4     -458.35658399   0.00098243   13.4 *
Conformer #5     -458.35634548   0.00122093   10.4 *
Conformer #6     -458.35667792   0.00088849   14.8 *
Conformer #8     -458.35756641   0.00000000   38.0 *
* Done, retained 7 conformer(s)

Only the conformers required to reach the target population are retained. In this example, All 7 conformers are sufficient, since the lowest one still have a significant population of 7.2%.

The final structures are available in 4_refinement.selected.xyz.