On basis sets and GTH pseudopotentials in CP2K¤
Basis sets¤
Solving the Schrödinger equation generally resort to the use of the LCAO (linear combination of atomic orbitals) approximation. For computational reasons, while STO (slatertype orbitals, \(\propto e^{r}\)) should be used, GTO (gaussian type orbitals, \(\propto e^{r^2}\)) are preferred.
In CP2K (as in other quantum chemistry programs), basis functions (i.e., atomic orbitals) are defined as:
where \(R(r)\) is the radial part, and \(Y_{\ell m}\) is a spherical harmonic (\(\ell\) is the angular momentum and \(m\in[\ell,\ell]\)) for the angular part. A weighted sum of primitives (Gaussian functions) is used for this part:
where \(\alpha_{j}\) is the exponent, while \(c_{ij}\) is the corresponding contraction coefficient. The two numbers defines the primitive \(j\) in the basis function \(\psi_i\).
Thus, a basis sets is a library of contractions for each shell (stype [\(\ell=0\)], ptype [\(\ell=1\)], dtype [\(\ell=2\)], etc) of a given atom, containing the definition of the primitives (so a list of exponents and their corresponding contraction coefficients). Since the radial part is the same for each possible orbital in a shell (i.e., each possible value of \(m\)), this is sufficient to define a basis set.
But basis function can be grouped!
Some basis functions (even with different \(\ell\)) may be grouped to share the same exponent.
This is the case, e.g., in the (in)famous STO3G (3 gaussians for each basis function) in which socalled "SP" basis functions are found, defined by using a common exponent but different contraction coefficient for each of the 3 gaussians defining the actual stype and ptype basis function.
MOLOPT basis sets (see below) are built on this principle.
Multiple\(\zeta\) basis sets use \(\zeta\) basis functions for each atomic orbitals in the atom. For example, double\(\zeta\) basis sets use two basis function for each atomic orbital (e.g., a total of 10 basis function for the carbon). They may be grouped to share the same exponent, or not.
On top of that, two other kind of basis function might be added:
 polarization functions, which have a larger angular momentum than those of the ground state of the atom (e.g., dtype basis functions for the carbon), and
 diffuse functions, which have very small exponents (corresponding to very "large" Gaussian functions).
Thus, the number of basis function for a given atom is very different from one basis set to the other.
Note
One way to ease the communication is to report, for each shell, the number of primitives (uncontracted set) in the form (uncontracted set)
, and the number of basis functions (contracted set) in the form [contracted set]
.
For example, in a doublezeta basis set defined by 3 gaussian per basis function, the result for carbn would be (12s,6p)
and [3s,2p]
.
The form [12s6p3s2p]
, combining the two is also found.
However, for reasons that will become clear in the next section, CP2K does not only use allelectron basis sets.
GTH pseudopotentials¤
Info
For the latest CP2K review, see 10.1063/5.0007045 (May 2020). The GPW method is described, e.g., here.
Indeed, CP2K is able use an auxiliary plane wave (PW) basis set to perform its calculation with the GPW (and GAPW) method. To perform a such calculation, one needs to include all possible PW below a given threshold. In fact, increasing this threshold will monotonously improve the quality of the result (and the length of the calculation!). However, area where the electron density is rapidly changing requires PW with small wavelengths/high energy to be well described (i.e., large Fourier components), so high threshold, which would make the calculation impossible in practice. In practice, area where such changes are important are mostly located near the nuclei, "thanks" its ionic potential \(V(r) =  \frac{Z}{r}\).
Hopefully, core and valence shell are generally well (spatially and energetically) separated, and core electrons are relatively unperturbed by the surrounding (chemically inert).
So the idea behind pseudopotentials is to replace the effect of the nuclei and the core electrons (which are considered frozen) by an effective potential (below a given threshold \(r_c\)), and the valence electrons basis functions by ones with fewer nodes (since they do not need to be orthogonal to the, now removed, valence orbital), but with the same behavior outside the core region (for \(r > r_c\)).
Derivation of a pseudopotential
Say one has a set of (orthogonal) core states \(\{\chi_n\rangle\}\) (with their corresponding eigenvalue \(\{E_n\}\)). The goal is to construct a pseudostate \(\phi\rangle\) for a valence state \(\psi\rangle\) (with its corresponding eigenvalue \(E\)), in the form:
Since the core and valence state must be orthogonal, \(\langle \chi_m  \psi \rangle = 0 = \langle \chi_m  \phi \rangle + a_m\), so that
Substituting in Schrödinger equation for \(\psi\rangle\) gives
The pseudostate thus obeys \([\hat H + \hat V_{n\ell}]\, \phi\rangle = E\,\phi\rangle\) with:
where the energy of \(\phi\rangle\) is the same as the one of \(\psi\rangle\), thanks to the pseudopotential \(\hat V_{n\ell}\). This extra potential depends on the quantum numbers \(n\) and \(\ell\) due to its spherical symmetry. Furthermore, since \(E > E_{n\ell}\), it is a repulsive potenial.
In practice, pseudopotential expressions are separated into a fully nonlocal form, thanks to the KleinmanBylander Transformation (see 10.1103/PhysRevLett.48.1425). Latter on, Goedecker, Teter and Hutter (GTH, see 10.1103/PhysRevB.54.1703) derived expressions those two parts which are suited for real and Fourier space integration and only requires a few adjustable parameters (in blue):
with
and
In the former, \(Z'\) is the ionic charge (i.e., the charge of the nucleus minus the one of the core electrons), while in the latter, \(N_\ell(r)\) is a combination of spherical harmonic multiplied by a \(\ell\)dependent radial function.
All the parameters in blue, together with the number of core electrons in each shell, define a GTH pseudopotential in CP2K (see, e.g., 10.1007/s002140050655y). In particular, they are given as a local part plus a set of nonlinear projectors.
Note
It is totally possible, for a given atom, to have pseudopotentials with a different number of core electrons embeded in the potential: while a small number of core electrons ensure a good transferability (but lengthen the calculation), a large number results in a smoother potential.
Pairing GTH pseudopotentials with basis sets¤
Warning
In practice, the number electron is dictated by the pseudopotential (which lists the number of valence electrons in each shell). CP2K will blindly use whatever basis set you give, since it has no way to check the information on the basis set side (not even the names, which are merely conventions).
From the previous paragraph, it appears that one has to pair a given pseudopotential with a correctly defined basis set, that has been designed with the correct amount of core electron removed and that contains smoother (pseudo) basis functions. Names of pseudopotentials and basis sets help to achieve this association:

GTH pseudopotential are generally named
GTH<XFC>
, where<XCF>
is the name of a XCfunctional. It is thus strongly suggested to use them together with the XCF in question.For a given atom, such potentials are nicknamed
GTH<XFC>q<N>
, where<N>
is the number of valence electrons considered while building the potential. For example, a pseudopotential for carbon is nicknamedGTHBLYPq4
, indicate that this pseudopotential was designed with BLYP, and that was built using 4 valence electrons (so it embedded a total of 2 core electrons). 
Basis sets are generally named
<XZ><NAME><XCF>GTH
or<XZ>GTH<XCF>
(though this is definitely not an absolute rule!), where<XZ>
describe the content of the basis set (e.g.,DZV
for double zeta,TZVP
for triple zeta with extra polarization, etc),<NAME>
is the name of the family (e.g.,MOLOPT
) and<XCF>
is the name of the XCF used to optimize the basis set. The<XCF>
may not be present in the name: for example,TZVPMOLOPTGTH
should work with all XCF, whileTZVPMOLOPTPBEGTH
was specifically designed with PBE. The latter probably gives better results and should (probably) be preferred.Again, for each atom, a suffix
q<N>
is added, indicated how much valence electrons were not considered while building this basis set. For example, for carbon, the nicknameDZVPMOLOPTGTHq4
indicates that this is a doublezeta basis set (plus polarization functions) of theMOLOPT
family, designed to work with GTH pseudopotentials embedding 2 core electrons.
Where are the basis sets and GTH pseudopotentials?
When running a CP2K calculation, you have to provide two files, containing the basis set(s) and pseudopotential(s) used in your calculation:
&DFT
BASIS_SET_FILE_NAME BASIS_SET
POTENTIAL_FILE_NAME GTH_POTENTIALS
&END DFT
Basis sets and pseudopotentials are scattered across different file in the CP2K data
folder.
However, the web interface of this project proposes an easier way to build your own taylormade BASIS_SET
and GTH_POTENTIALS
files.
This is equivalent, since the data are obtained from the same source and just presented with a shiny interface ;)
If you are interested, the format of those two files is described here
Working with allelectron basis sets¤
Info
The GAPW method is introduced, e.g., here or in 10.1007/s002140050523.
There also exists a special pseudopotential, ALL
, which should be used for allelectron calculations.
This is the pseudopotential of choice for GAPW calculations.
It can be used with your usual allelectron basis sets (such as STO3G, e.g., found in the BSE).
A curated list is available in the EMSL_BASIS_SETS
file.
Specially designed basis sets were also derived (found, e.g., in the ALL_BASIS_SETS
file), which contains ALL
in their name or ends by ae
.
The q<N>
number, if it exists (it is generally omitted), should be equal to the atomic number.