Publication Cover
Molecular Physics
An International Journal at the Interface Between Chemistry and Physics
Volume 122, 2024 - Issue 7-8: Tim Lee Memorial Issue
492
Views
2
CrossRef citations to date
0
Altmetric
Tim Lee Memorial Issue

Can G4-like composite Ab Initio methods accurately predict vibrational harmonic frequencies?

&
Article: e2263593 | Received 14 Aug 2023, Accepted 21 Sep 2023, Published online: 29 Sep 2023

Abstract

Minimally empirical G4-like composite wavefunction theories [E. Semidalas and J. M. L. Martin, J. Chem. Theory Comput. 16, 4238–4255 and 7507–7524 (2020)] trained against the large and chemically diverse GMTKN55 benchmark suite have demonstrated both accuracy and cost-effectiveness in predicting thermochemistry, barrier heights, and noncovalent interaction energies. Here, we assess the spectroscopic accuracy of top-performing methods: G4-n, cc-G4-n, and G4-n-F12, and validate them against explicitly correlated coupled-cluster CCSD(T*)(F12*) harmonic vibrational frequencies and experimental data from the HFREQ2014 dataset, of small first- and second-row polyatomics. G4-T is three times more accurate than plain CCSD(T)/def2-TZVP, while G4-Tano is two times superior to CCSD(T)/ano-pVTZ. Combining CCSD(T)/ano-pVTZ with MP2-F12 in a parameter-free composite scheme results to a root-mean-square deviation of 5 cm1 relative to experiment, comparable to CCSD(T) at the complete basis set limit. Application to the harmonic frequencies of benzene reveals a significant advantage of composites with ANO basis sets – MP2/ano-pVmZ and [CCSD(T)-MP2]/ano-pVTZ (m = Q or 5) – over similar protocols based on CCSD(T)/def2-TZVP. Overall, G4-type composite energy schemes, particularly when combined with ANO basis sets in CCSD(T), are accurate and comparatively inexpensive tools for computational vibrational spectroscopy.

GRAPHICAL ABSTRACT

1. Introduction

In the realm of computational spectroscopy [Citation1], accurate prediction of molecular vibrations has long been a valuable tool in chemistry, biochemistry, and materials science [Citation2–7]. To achieve spectroscopic accuracy, defined as an error of less than 1 cm1 from gas-phase vibrations [Citation8, Citation9], it may be necessary to venture beyond the ‘gold-standard’ CCSD(T) method [Citation10], i.e. coupled-cluster with single, double, and perturbative triple excitations, and employ higher-order correlation approaches that include quadruple excitations, such as in the CCSDT(Q) method [Citation11]. Additionally, the slower basis set convergence [Citation12–14] of the correlation energy can quickly lead to computational constraints, especially for the electronic structure of larger molecules.

In response to the challenges that lie with computationally expensive ab initio methods, the composite wave function theories (cWFTs) have emerged as promising alternatives (for a very recent review, see Karton [Citation15]). These methods combine high-level treatments of electron correlation through additivity approximations, economical basis set extrapolations, and often empirical corrections. The end result is a robust approach aimed at the accuracy of CCSD(T) at the complete basis set limit (CBS). Examples of such cWFTs include the Pople's group Gaussian-n theories (Gn) [Citation16–21], the CBS models by Petersson and co-workers [Citation22–24], the Weizmann-n theory (n = 1 and 2) by the Martin group [Citation25–27], and the ccCA (correlation consistent composite approach) of Wilson and coworkers [Citation28–30].

Among the more economical composite energy schemes, G4 [Citation19] and G4(MP2) [Citation20, Citation31, Citation32] approaches fit in. The former combines MPn (n = 2 and 4) and CCSD(T) methods, while the latter reduces cost by omitting the MP4 step. In 2011, Radom and co-workers introduced the G4(MP2)-6X protocol [Citation33], an improved G4(MP2) variant, featuring six empirical parameters for correlation energies and another six for the high-level correction. Building on this, Chan et al. [Citation34] shifted from Pople- to Karlsruhe-type basis sets in their G4(MP2)-XK approach. Inspired by their work, we presented an hierarchy of G4-type cWFTs [Citation35, Citation36], validated against the energetics of the chemically large GMTKN55 benchmark suite of Goerigk, Grimme, and coworkers [Citation37] (general main-group thermochemistry, kinetics, and noncovalent interactions, 55 subsets)

Another avenue for improving cost-effectiveness and accuracy in cWFTs is using explicitly correlated theory [Citation38–41], where r12 terms that depend on the interelectronic distance, e.g. of the form [1-exp(γ/r12)]/γ, are added in the wave function. This inclusion accelerates basis set convergence [Citation42, Citation43], with R12/F12 methods typically requiring 2–3 additional basis set cardinal numbers or ‘zetas’ compared to conventional calculations [Citation38, Citation39, Citation41]. Presently, numerous explicitly correlated composite thermochemical protocols have been reported, including the W4-F12 [Citation44], ccCA-F12 [Citation45], G4-m-F12 [Citation35], and SVECV-f12 theories [Citation46].

For harmonic vibrational frequencies it is generally accepted that valence-only CCSD(T) suffices, particularly in systems with low static correlation that are dominated by a single reference determinant [Citation9, Citation47]. This arises from a fortuitous error compensation between the approximate treatment of the triples term, the missing core-valence correlation, and the neglect of higher order excitations.

To approach the full CI limit with composite energy schemes, post-CCSD(T) terms must be included. This has been successfully demonstrated in the well-known HEAT-n approaches [Citation48–51], the Feller-Peterson-Dixon (FPD) model, [Citation52–54], the Wilson's group ccCA, [Citation28–30], and the Weizmann-n (n = 3 and 4) theories [Citation55–57] by the Martin group. We recently examined [Citation58] the importance of post-CCSD(T) corrections in cWFTs, particularly CCSDT(Q)Λ [Citation11], for spectroscopic constants in heavy-atom diatomics at different static correlation regimes, and reported accurate predictions, including ozone vibrational frequencies.

There has been a fair amount of work on post-CCSD(T) cWFT methods in the context of vibrational spectroscopy. We note, inter alia, the 2005 work of Heckert et al. [Citation59, Citation60] and Puzzarini et al. [Citation61] on accurate geometries viz. rotational constants. Ruden et al. [Citation62] considered quadruples and quintuples terms in CCSD(T)-based composite schemes for harmonic frequencies of HF, N2, F2, and CO, while Karton and Martin [Citation8] applied pointwise W4 theory (and truncations thereof, as well as the enhanced W4.3 theory) to spectroscopic constants and electric properties of 28 first- and second-row diatomics, as well as several polyatomics [Citation8]. The spectroscopic constants of formaldehyde were obtained by Schaefer and co-workers [Citation63] through CCSDT(Q)-based focal-point analysis [Citation64–67], while Zhu and Xu [Citation68] reported static polarisabilities at CCSD(T)/CBS. Huang and Lee [Citation69], and later Lee and coworkers [Citation70, Citation71] explored the CcCR methodologies (‘C’ stands for CBS, complete basis set; ‘cC’ for core correlation; and ‘R’ for relativistic effects) for determining fundamental vibrational frequencies [Citation70] and anharmonic rotational constants [Citation71].

In contrast, as well as in comparison to combining high-quality harmonic frequencies with DFT-level anharmonic force fields (see, e.g.[Citation3] and references therein; Ref. [Citation72]), effort toward an economical cWFT approximation to CCSD(T) harmonic frequencies has been fairly limited. An unfairly overlooked paper by Bettens and coworkers [Citation73] considered the combination of MP2 in larger basis sets with CCSD(T)−MP2 in smaller ones. Barone and coworkers [Citation74] introduced what they termed their ‘cheap’ approximation, which augments an MP2/cc-pVT,QZ CBS extrapolation (the notation means ‘from cc-pVTZ and cc-pVQZ’) with diffuse function, CCSD(T)−MP2, and core-valence corrections.

The purpose of this study is to assess whether G4-type composite energy schemes can be a viable alternative to large basis set CCSD(T) harmonic vibrational frequency calculations. We shall validate these G4-type methods against basis set limit extrapolated CCSD(T) frequencies, as well as CCSD(T*)(F12*)/VQZ-F12 calculated harmonic vibrational frequencies and experimental ones of 31 molecules from the HFREQ2014 dataset [Citation75]. As a proof of principle, cWFTs are then applied to the difficult [Citation76] harmonic force field of benzene.

2. Computational details

All calculations were performed on the Chemfarm HPC cluster of the Faculty of Chemistry at the Weizmann Institute, mostly using the MOLPRO 2022.3 [Citation77] electronic structure programme system. Built on top of the ALASKA integral derivative package [Citation78], canonical MP2 [Citation79] analytical derivatives (Ref. [Citation80] and references therein; see Ref. [Citation81] for the specific MOLPRO implementation) and canonical CCSD(T) [Citation10, Citation82] analytical first derivatives (Ref. [Citation49] and references therein) were evaluated with nondegenerate symmetry enabled, while force constant matrices (Hessians) were evaluated semi-numerically using central differences of gradients. For verification purposes, MP2 Hessians in the same basis set were also calculated analytically [Citation83] using Gaussian 16; [Citation84] we found harmonic frequencies from the analytical and semi-numerical Hessians to differ by only on the order of 0.03 cm1, which is negligible in the context of this work.

The explicitly correlated density-fitted DF-MP2-F12 method [Citation85] was employed with analytic gradients [Citation86, Citation87] and the 3*C(FIX, HY1) Ansatz, in which the extended Brillouin condition is assumed and the ”HY1” hybrid approximation is used for matrix elements [Citation88] over the F12 geminal [Citation89], together with fixed geminal amplitudes [Citation89, Citation90]. The CCSD(T)(F12*) [Citation91] geometry optimizations and frequency calculations were carried out fully numerically for want of an analytical gradient.

In our study, we employed various basis set families, including the Weigend-Ahlrichs def2 family [Citation92]: def2-SVP, def2-nZVP, and def2-nZVPP, (n= T and Q), along with their augmented alternatives with diffuse functions def2-SVPD and def2-nZVPPD (n= T and Q) [Citation93]. The combination of def2-nZVPP on hydrogen atoms and def2-nZVPPD on main group elements is denoted as def2-nZVPPD'. Among the atomic natural orbital (ANOs) basis sets pioneered by Almlöf and Taylor [Citation94], we chose the ano-pVnZ (n= D, T, Q, 5) of Neese and Valeev [Citation95], as well as their aug-ano-pVnZ (diffuse function augmented) and saug-ano-pVnZ (minimally augmented) variants from the same reference. Table  provides a list of abbreviations for methods and basis sets used in this study.

Table 1. Abbreviations for methods, basis sets, and other terms in the present work.

Among the correlation consistent basis set family, we consider the cc-pVnZ and aug-cc-pVnZ basis sets (n= D, T, Q, 5) [Citation96–98] on hydrogen and the first row, and the (aug-)-cc-pV(n+d)Z basis sets [Citation99] on second-row elements, which include an additional tight d function as was previously found [Citation100, Citation101] to be important when these elements are in high oxidation states. (In this work, the largest impact is seen for SO2.) Additionally, for calculations including inner-shell correlation, we employed the core-valence weighted aug-cc-pwCVnZ (n= T and Q) basis sets [Citation102]. The shorthand haVnZ+d refers in this paper to the combination of cc-pVnZ on hydrogen with aug-cc-pVnZ on first-row atoms and aug-cc-pV(n+d)Z on second-row atoms.

Aside from the orbital basis set (OBS) employed in a standard explicitly correlated calculation with density-fitting, there are three additional auxiliary basis sets (ABS): the ‘JKFit’ basis set for the Coulomb and exchange integrals, the ‘MP2Fit’ basis set for density fitting in MP2, and the ‘CABS’ also known as complementary auxiliary basis set [Citation103, Citation104]. We utilised the cc-pVnZ-F12 [Citation105] (n= T and Q) basis sets as OBS, along with the default cc-pVnZ-F12/JKFit and cc-pVnZ-F12/MP2Fit in MOLPRO as JKFit and MP2Fit ABSs, respectively. For CABS, we used Yousaf and Peterson's cc-pVnZ-F12/OptRI [Citation106]. Slater-type geminal terms of the F12 form [1-exp (γ r12)]/γ were used with a β geminal exponent of 1.0 for both triple- and quadruple-ζ OBS, as recommended in Table V of Ref. [Citation90]. In the text below, ‘VnZ-F12’ signifies the cc-pVnZ-F12 basis sets.

To validate the accuracy of composite schemes, we used the HFREQ2014 dataset [Citation75] of harmonic frequencies for small molecules (Table ). Error statistics were estimated relative to CCSD(T*)(F12*)/VQZ-F12 calculations (reference) and experimental values from ref. [Citation75] and references therein. On a related note, Mehta et al. [Citation110] considered the same CCSD(T*)(F12*)/VQZ-F12 reference for HFREQ2014 in their study on the performance of double-hybrid density functional theory for molecular vibrations; they also carried out CCSD(T) calculations there for comparison, but excluded some of the HFREQ2014 species such as F2, HNO, and CF2. Also, Head-Gordon and coworkers [Citation111] recently introduced analytical second derivatives of VV10 dispersion corrected [Citation112] containing density functionals and evaluated their predictive accuracy for harmonic frequencies across various molecular systems including those in the HFREQ2014 dataset. They concluded that while the VV10-enhanced DFT functionals offered no advantage for small-molecule vibrational spectra, but a significant improvement was seen in vibrational spectra of noncovalent complexes.

Table 2. Molecules considered in the HFREQ2014 dataset.

Geometries were optimised using the total electronic energy as the target function for each cWFT method, employing the numerical gradient. That was accomplished through the optg procedure [Citation113] in MOLPRO. The optimizations were completed once the maximum gradient component was less than 10−5 hartree/bohr, the optimisation step was less than 105 bohr, and the change in total energy from the previous iteration was less than 1011 hartree. In numerical gradients and Hessians, the default stepsize of 0.01 a.u. was used, unless otherwise noted (see Table  below for cases where a 0.005 a.u. value was used). The cutoffs of two-electron integrals were set to 1020 for screening and 1018 for the prefactor test. The total energy in subsequent calculations of force constant calculations was converged within 1012 Eh.

Table 3. Root-mean-square deviations and mean-absolute deviations (cm1) of calculated harmonic frequencies at the CCSD(T) level from CCSD(T*)(F12*)/VQZ-F12 calculations (refered to as CCSD(T)/CBS) and experiment for the HFREQ2014 dataset.

We provide an implementation of hfreq for automated geometry optimisation and calculation of harmonic vibrational frequencies with composite energy schemes at the following Github link ‘https://github.com/msemidalas/hfreq’. hfreq is written in Python and the Git repository contains several sample files to reproduce the results of this work. In this version, geometry optimisation employs analytic gradients, while Hessians are computed numerically in MOLPRO. The mass-weighted Hessian is diagonalised in Psi4 [Citation114]. Figure  graphically shows the automated procedure for the evaluation of harmonic frequencies.

Figure 1. hfreq implementation workflow.

Figure 1. hfreq implementation workflow.

A reviewer highlighted a very recent paper by Jensen [Citation115], in which three methods for extrapolating vibrational frequencies are discussed. The first, ‘Opt-xpol’, involves geometry optimisation by minimising the basis set-extrapolated energy, followed by frequency calculation from the extrapolated Hessian at the optimised geometry: this parallels our approach in the present work and in the hfreq code. The second, ‘v-xpol’, directly extrapolates vibrational frequencies from optimised geometries using two different basis sets, an approach explored earlier by Varandas [Citation116] and Broda and colleagues [Citation117–119]. The ‘H-xpol’ approach directly extrapolates optimised Hessians, regardless of reference geometries. Jensen's findings show that all three approaches yield similar results for small molecules using double-triple ζ extrapolation in cc-pVnZ basis sets at wB97X-D [Citation120] and MP2 levels. However, for H-bonded complexes, ‘H-xpol’ yields unsatisfactory results with extrapolation from pcseg-0 and pcseg-1 basis sets [Citation121], as well as from pcseg-1 and pcseg-2, likely due to poor reference geometries.

3. Results and discussion

3.1. Performance of CCSD(T) for harmonic frequencies

Assessing CCSD(T) for harmonic frequencies is the first step to estimate the accuracy of composite energy schemes. Kesharwani and Martin reported that valence-only CCSD(T) at the complete basis set limit is about 5 cm1 as accurate as the experimental harmonic frequencies for the HFREQ2014 dataset [Citation75]. (For the avoidance of doubt, we should stress that these experimental data are truly harmonic, obtained from fitting series expansion in the vibrational quantum numbers to many vibrational band origins.) Explicitly-correlated CCSD(T*)(F12*) with VQZ-F12 achieves a root-mean-square deviation (RMSD) of 4.7 cm1 compared to experiment; T* denotes pointwise Marchetti-Werner scaling [Citation108] of the triples. In a recent DFT study of harmonic frequencies [Citation110], very similar results were obtained with fairly large basis sets, such as VQZ-F12, between either the point-wise scaling T* or scaling the triples term by a constant factor (Ts) [Citation122].

Table  presents the error statistics of CCSD(T) with different classes of basis sets compared to reference calculated harmonic frequencies and experimental values for HFREQ2014; the error distribution is also depicted as a ‘box and whiskers plot’ in Figure .

Figure 2. Box-and-whisker plot for the deviations of harmonic vibrational frequencies at CCSD(T) with various basis sets from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset. The interquartile range (IQR) is the difference between the third and first quartiles (IQR = Q3–Q1). The upper whisker extends up to Q3 + 1.5*IQR, while the lower whisker extends down to Q1–1.5*IQR, and outliers are shown outside the whiskers. The median is indicated by a red line, while a green dotted line represents the mean. The blue band indicates ±10 cm1.

Figure 2. Box-and-whisker plot for the deviations of harmonic vibrational frequencies at CCSD(T) with various basis sets from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset. The interquartile range (IQR) is the difference between the third and first quartiles (IQR = Q3–Q1). The upper whisker extends up to Q3 + 1.5*IQR, while the lower whisker extends down to Q1–1.5*IQR, and outliers are shown outside the whiskers. The median is indicated by a red line, while a green dotted line represents the mean. The blue band indicates ±10 cm−1.

First of all, concerning the reference, we could have made two basically equivalent choices, as CCSD(T*)(F12*)/VQZ-F12 and pointwise CCSD(T)/haVQ,5Z+d extrapolation differ from each other by just 1.0 cm1 RMS. We have selected the former throughout. (For the avoidance of doubt, doubly and triply degenerate frequencies are assigned weights of 2 and 3, respectively, in the statistics.)

Second, the addition of tight d functions to the second-row aug-cc-pVnZ basis sets has a very significant effect in SO2 for smaller n. At first sight, no similar phenomenon is seen for the ANO basis sets; however, the primitive d functions that make up the d symmetry ANOs have exponents 5.0755, 2.1833, 0.9392, 0.404, and 0.1738; hence, the high-exponent space is already adequately covered in the primitives.

Third, in both ANO and correlation consistent families, augmented basis sets have better statistics than unaugmented ones (with the exception of cc-pVDZ+d).

Fourth, among the ANO family, the fully augmented aug-ano-pVnZ basis sets have slightly better error statistics than the more economical ‘minimally augmented’ saug-ano-pVnZ basis sets.

Quadruple-ζ quality basis sets from all families outperform n = D and T members. The correlation-consistent haVQZ+d and the ANO set saug-ano-pVQZ yield similar RMSDs of 4.9 and 5.0 cm1 compared to experiment. For triple-ζ and especially double-ζ, the ANO basis sets are markedly superior. Somewhat surprisingly, def2-TZVPP and def2-QZVP appear to be superior to their cc-pVnZ counterparts, and similar error reductions occur as more zetas are added, with n= Q having a small RMSDexp (5.8 cm1). (We note that def2-QZVP and def2-QZVPP are equivalent for the molecules considered here, and that the step-size choice for numerical differentiation, 0.01 or 0.005 a.u., has no significant effect on the calculated frequencies.)

CCSD(T) with def2-TZVP is in nearly thrice worse agreement with experiment than the complete basis set limit. Given that def2-TZVP is used in the high-level correction [CCSD(T)-MP2] because of its lower cost in the G4-type cWFTs, any improvements of the error statistics over ‘pure’ CCSD(T)/def2-TZVP would make these cWFTs useful for spectroscopy.

ANO basis sets outperform similarly-sized correlation consistent basis sets with notable improvements in RMSDCBS, which are 7.2, 1.7, and 1 cm1 over VnZ with n= D, T, and Q, respectively. The lowest errors are obtained for ano-pV5Z in CCSD(T) with an RMSD of 2.8 cm1 relative to reference. Another possibility is that the ANO basis sets – which are better equipped than small-medium sized correlation consistent basis sets for predicting accurately harmonic frequencies [Citation76, Citation123, Citation124] – may offer advantages in composite wave function schemes as discussed below.

The hypersensitivity to the basis set of the acetylene bending frequencies was first noted by Lee and coworkers [Citation125] and analysed in detail in Refs. [Citation126, Citation127] as an intramolecular BSSE (basis set superposition error) problem. Another frequency that exhibits basis set hypersensitivity is the umbrella mode of ammonia – which is a conspicuous outlier even for ano-pV5Z, less so for saug-ano-pV5Z.

3.2. Composite wave function theory approaches

Composite wave function theory has paved the way for cost-effective computations without significantly compromising accuracy. Various classes of additivity schemes have been studied previously, successfully predicting geometrical parameters [Citation59, Citation60], rotational constants [Citation61], and vibrational frequencies [Citation58, Citation62, Citation63, Citation70, Citation71, Citation128].

The cost-effectiveness in these methods is achieved by combining various levels of electron correlation treatments using additive approximations, basis set extrapolations, and, when applicable, empirical corrections. By way of illustration, consider the following expression:

E = MP2/LARGE + [CCSD(T)/SMALL – MP2/SMALL]

Now, if we express MP2 as HF + E2 and CCSD(T) as HF + E2 + HLC, we can simplify it further, as

E = HF/LARGE + E2/LARGE + HF/SMALL + E2/SMALL + HLC/SMALL – HF/SMALL – MP2/SMALL where SMALL and LARGE correspond to two different basis set sizes. Simplifying this equation, we get

E = HF/LARGE + E2/LARGE + HLC/SMALL.

Therefore, the basis set for HF is effectively the same as LARGE for the MP2 correlation contribution, ensuring there is no mismatch as the other HF contributions cancel out.

Table  details error statistics for harmonic frequencies in the HFREQ2014 species, comparing them to calculated and experimental data. Detailed equations of our G4-type composite schemes have been previously provided in Refs. [Citation35, Citation36] and the top performing G4-n, cc-G4-n, and G4-n-F12 methods in prediction of reaction energetics are now validated for harmonic frequencies. Some of these approaches are parameter-free while others achieve optimal results with a maximum of two fitted parameters. In addition, we consider the performance of W1val and W2val theories, i.e.Weizmann-n theories [Citation25–27] including only valence correlation.

Table 4. Root-mean-square deviations and mean absolute deviations (cm1) of calculated harmonic frequencies with various composite wave function schemes from computed CCSD(T*)(F12*)/VQZ-F12 calculations (referred to as CCSD(T)/CBS) and experiment for the HFREQ2014 dataset.

It can be rightly argued that, beyond small molecules where spectral inversion is comparatively easy, fundamental frequencies are more relevant for practical applications than harmonic frequencies. However, as shown in Table 4 of Ref. [Citation129], even for CCSD(T)/CBS simple scaling of harmonic frequencies carries an intrinsic error of about 25 cm1, comparable to the uncertainty in hybrid DFT harmonic frequencies. The use of dual or multiple scaling factors for different frequency ranges at semi-arbitrary cutoff points (e.g.[Citation130–133]) is a half-measure at best; second-order rotation-vibration perturbation theory (VPT2) [Citation107] will require a semidiagonal quartic force field. Schneider and Thiel [Citation134] as far back as 1989 (in the context of semiempirical MO theory) pointed out that all the required force constants can be obtained by finite differences (in normal coordinates) of analytical second derivatives: this may be a viable approach for the present cWFT methods, or cWFT harmonic frequencies may be combined with DFT anharmonic force fields as demonstrated by Boese and Martin for the azabenzenes.

3.2.1. G4-type cWFTs based on CCSD(T)/def2-TZVP

The most accurate result, using a def2 basis set for the CCSD(T) part, materialises for G4-T with RMSDCCSD(T)/CBS of 4.7 cm1 and RMSDexp of 5.4 cm1. In other words, not materially different from CCSD(T) at the valence CBS limit. Fundamentally, the same result is obtained if the force constants are obtained fully numerically at CCSD(T) and analytically at MP2. The lower-cost cost approach, G4-D, based on def2-SVP in CCSD(T), is three times worse in accuracy, while in G4-D-v2 with def2-SVPD, the error statistics are cut in half. The largest errors occur in the πg and πu degenerate bending modes of acetylene, which G4-D and G4-D-v2 underestimate by 26.9 and 39.9 cm1, while the accurate G4-T falls short by only 0.7 cm1.

An important technical note for wave function calculations using basis sets augmented with diffuse functions, particularly for linear molecules, is that employing such basis sets, e.g. def2-nZVPPD (n= T, Q) for acetylene, may result in an overlap matrix S with very small eigenvalues. Gaussian implements a form of SVD (singular value decomposition) in which the eigenvectors of S with eigenvalues below a cutoff (default value: 106) are discarded. This leads to truncation of the virtual orbital space, which may not be consistent across the surface – or even along a single normal mode displacement – and hence may cause erratic harmonic frequencies. In the present work, this occurred for the bending frequencies of acetylenes. To address this, we disabled the ‘SVD screening’ in Gaussian using IOp(3/32)=2; MOLPRO has no such screening in the first place, but for acetylene, def2-nZVPPD led to S eigenvalues below the default THROVL=108, and lowering THROVL brought on numerical issues that required severely tightening the integral evaluation cutoffs. No such problems were seen if the ‘D’ functions were retained on the heavy atoms but not on H, which we denote def2-nZVPPD and applied for acetylene.

Substituting E2/def2{T,Q}ZVPD in G4 type approaches, where diffuse functions are omitted on hydrogen atoms, has no significant effect on error statistics when compared to reference or experiment. G4-T' stands out as the most accurate with an RMSDCBS of 4.74 cm1, a mere 0.11 cm1 above regular G4-T. Moreover, opting for def2-T,QZVPD' in G4-type approaches effectively addresses concerns related to small eigenvalues in the overlap matrix, particularly for linear molecules, without compromising accuracy.

In our initial studies on G4-like cWFTs [Citation35, Citation36] we did not explore a D,T extrapolation in MP2, like E2/def2-SVSP,TZVPPD +[CCSD(T)-MP2]/def2-SVSP, where def2-SVSP means no p functions on hydrogen atoms, assuming it would be less accurate than other cWFTs. We trained such a G4-D-v0 composite on the GMTKN55 dataset and found a high WTMAD2 (weighted mean absolute deviation) value of 4.77 kcal/mol compared to CCSD(T)/CBS data, extracted from the ACCDB database [Citation135], or higher level from our earlier work. The RMSD values for harmonic frequencies are unacceptably high, reaching 26.07 and 27.29 cm1 compared to CCSD(T)/CBS and experimental values: this is on par with (much cheaper) hybrid DFT functionals such as B3LYP and TPSS0 [Citation129]. However, standard CCSD(T)/def2-SVSP is far less accurate than G4-D-v0, with corresponding RMSDs of 61.37 and 60.46 cm1. Clearly, there is no advantage in a D,T extrapolation and if 10 cm1 errors are acceptable then empirical spin-scaled double-hybrid functionals, such as DSD-PBEP86-D3(BJ) [Citation136] and revDSD-PBEP86-D4 [Citation137], represent a much more cost-effective alternate.

Additionally, we examined the impact of using VDZ-F12 in MP2-F12 instead of VTZ-F12. We saw RMSD values relative to CCSD(T)/CBS rise high as 13.54 and 10.92 cm1 with MP2-F12/VDZ-F12 and either [CCSD(T)-MP2]/ano-pVDZ or ano-pVTZ, respectively, which is on par with the performance of double hybrid functionals.

What about the effects of core-core and core-valence correlation? To answer that, we consider the correlation consistent augmented aug-cc-pwCVnZ basis sets that provide the necessary radial and angular flexibility in the core-valence region. In 2018, Sylvetsky and Martin [Citation138] showed that a awCVT,QZ basis set extrapolation at CCSD(T) proved sufficient and captured the most significant part of electron correlation. Here, we follow a two-step approach to assess those effects. Firstly, we check the effects of increased basis set radial flexibility for valence correlation by replacing def2-T,QZVPPD with awCVT,QZ at the MP2 level, while only correlating the valence electrons. In doing so, no material gains in accuracy are seen, but rather the reverse. The RMSDCCSD(T)/CBS increases by 2 cm1 for cc-G4(FC)-D, 0.9 cm1 for cc-G4(FC)-D-v2, and 0.6 cm1 for cc-G4-T, each compared to the corresponding G4-n. Secondly, we correlate all electrons in MP2 and the results showed the CV correlation improves the RMSD by 2.0 cm1 in cc-G4-D relative to cc-G4(FC)-D, with the former achieving identical accuracy to G4-D. Further improvement of 2.2 cm1 occurs with def2-SVPD in CCSD(T). The lowest RMSD of 4.25 cm1 is found for cc-G4-T, this result represents a 1.0 cm1 amelioration over valence-only cc-G4(FC)-T, and even surpassing G4-T by 0.3 cm1.

For higher accuracy regimes, we refer the reader to our recent study [Citation58] on ground-state spectroscopic constants of diatomic molecules from post-CCSD(T) up to CCSDTQ56. We showed there that 2 cm1 accuracy is achievable on a semi-routine basis (see Table 5 in ref. [Citation58]), but that this requires both post-CCSD(T) valence correlation correction at least at the CCSDT(Q)Λ level and core-valence correlation corrections at the CCSD(T) level. (Including each on its own will actually make agreement worse, as valence CCSD(T) benefits from a felicitous error compensation.) We have repeated the Dunham analyses [Citation139] from Ref. [Citation58] without the scalar relativistic correction. The largest individual difference in ωe is seen for HCl (−4.3 cm1) followed by −2.8 cm1 for HF, but most effects are on the order of 1 cm1 or less. Thus, the RMSD on ωe increased only mildly, from 2.10 to 2.55 cm1, while the effect on other spectroscopic constants was negligible: obviously, compared to 5–10 cm1 RMSDs for more approximate methods, such an increase is entirely negligible. Needless to say, this will no longer be the case for heavy p-block compounds.

We attempted to add diagonal Born-Oppenheimer corrections at the CCSD/aug-cc-pVTZ level using the implementation [Citation140] in CFOUR [Citation141]. While corrections may exceed 1 cm1 for H2, BH, and the like, for heavier diatomics they are negligible compared to other remaining error sources.

3.2.2. ANO basis sets for G4-type cWFTs

ano-pVnZ basis sets are now considered in the CCSD(T) part of cWFTs for the harmonic frequencies in HFREQ2014. The energy expressions of these cWFTs are simply derived from the sum of the total MP2 energy with a larger basis set and the higher level [CCSD(T)-MP2]/ano-pVTZ correction, without introducing empirical parameters. Using E2/anopVQZ in G4-Tano-v1, we find an RMSD of 6.6 cm1 w.r.t both reference and experiment (see Table  and Figure ). Closer agreement to CCSD(T) at the complete basis set limit is achieved with ano-pV5Z in MP2, leading to the G4-Tano-v2 variant, which shows better RMSDexp than G4-T' and G4-T by by approximately 0.3 cm1.

Figure 3. Box-and-whisker plot for harmonic frequency deviations of composite wave function schemes from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset. Plot description details are the same as for Figure .

Figure 3. Box-and-whisker plot for harmonic frequency deviations of composite wave function schemes from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset. Plot description details are the same as for Figure 2.

In our earlier work [Citation35], we found that MP2-F12-based methods, such as the parameter-free cc-G4-F12-T, yielded the lowest WTMAD2 values (∼1.0 kcal/mol) for the energetics of the GMTKN55 benchmark suite [Citation37]. That prompts the question whether the predicted harmonic frequencies can become more accurate by substituting explicitly correlated MP2-F12 for conventional MP2 in composite energy schemes. Among the tested MP2-F12-based variants, G4-Tano-F12-v2 is the most accurate with an RMSD of 3.7 cm1 relative to reference calculated frequencies, compared to 7 cm1 for G4-Dano-F12-v2. Reducing the basis set size to n = T from n = Q in MP2-F12/VnZ-F12 worsens RMSDCCSD(T)/CBS by 1.2 cm1 for G4-Tano-F12 and by 0.95 cm1 for G4-Dano-F12. Clearly, combining MP2-F12 correlation with CCSD(T)/ano-pVnZ presents an attractive option for accurate vibrational frequencies in parameter-free cWFTs. The calculated frequencies can be inspected in the Supporting Information.

Finally, we note that the Weizmann-n methods, W1 and W2, lead to the lowest RMSDs relative to the calculated reference harmonic frequencies, at 1.96 and 1.15 cm1, respectively. There is no systematic improvement in RMSDexp values over the most accurate ANO-based method, indicating that the convergence towards CCSD(T)/CBS has been achieved.

3.3. Harmonic frequencies of benzene: successes and limitations of cWFTs

In 1997, Martin, Taylor, and Lee [Citation76] computed the CCSD(T) geometry and harmonic force field of C6H6, and noted that the two out-of-plane ring modes ω4 and ω5 exhibit a more pronounced form of the same hypersensitivity as seen for the acetylene bending frequencies [Citation125] and traced to intramolecular BSSE in Ref. [Citation126]. (Moran et al. [Citation127] later extended this discussion to benzene.) Limitations of available computers at the time precluded going to larger basis sets such as haVQZ, but since ANOs minimise the BSSE for a given contracted size, ANO4321 was attempted and found to be resilient than cc-pVTZ.

As extracting a full set of experimental harmonic frequencies and anharmonicity constants for such a large molecule would require a staggering number of vibrational band origins, the available ‘experimental’ harmonic frequencies of Miani et al. [Citation142] are in truth semi-experimental (a term introduced in the spectroscopic realm by Jean Demaison [Citation143]), namely, from combining experimental fundamentals with a DFT calculated quartic force field. (We note in passing that results of a ‘blind challenge’ on the ground-state correlation energy of benzene were recently reported [Citation144].)

Hence benzene would appear to be a good ‘proof of concept’ for the application of composite WFTs to harmonic frequencies of not-so-small molecules. Owing to the high symmetry, we can actually carry out CCSD(T) and CCSD(T*)(F12*) calculations close to the basis set limit, giving us a realistic reference. Any cWFT that would reproduce the harmonic frequencies well might be a good candidate for an anharmonic force field.

Table  showcases the calculated harmonic frequencies using ‘pure’ coupled-cluster methods and their corresponding cWFTs. For the reference, we consider CCSD(T)/ano-pV5Z harmonic frequencies. Geometry optimizations were carried out in D2h point group symmetry and the Hessian was obtained through the method of finite differences.

Table 5. Calculated and experimentally derived harmonic frequencies (in cm1) for the benzene molecule.

Overall, the RMSDs consistently improve with larger basis sets, ranging from 3.6 cm1 for CCSD(T)/ano-pVTZ to 1.5 cm1 for ano-pVQZ. CCSD(T*)(F12*)/cc-pVTZ-F12 has an RMSD of 4.5 cm1, which drops to 1.6 cm1 for cc-pVQZ-F12. Much of that is due to the two problematic ω4 and ω5 modes, however: conspicuous discrepancies of −17 and −14 cm1, respectively, are observed at the CCSD(T*)(F12*)/VTZ-F12 level; these however are drastically reduced to {7,3} cm1, for VQZ-F12, and still further to {3,+1} cm1 for VT,QZ-F12 extrapolation. Statistics without them are RMSD = 1.9 for VTZ-F12 and 0.9 cm1 for VQZ-F12, which is more in line with HFREQ2014. A T,Q extrapolation (with an exponent of 4.5960, Table X in Hill et al. [Citation90]) yields RMSD=1.0 cm1 including all modes, and 0.8 cm1 excluding ω4 and ω5.

Next, we estimate the accuracy of composite schemes for benzene. The most accurate results are obtained in parameter-free methods based on a high level correction [CCSD(T)-MP2/ano-pVTZ] that is combined with either E2/anopVQZ (RMSD = 3.0 cm1) or E2/anopV5Z (RMSD = 2.8 cm1). Similar gains are observed for the MP2-F12-based G4-Tano-F12-v2, which exhibits an RMSD of 2.6 cm1, and this result is 1.0 cm1 lower than plain CCSD(T)/ano-pVTZ. Scaling the triples term E(T) does more harm than good, resulting in an increase of ∼0.2 cm1 for combinations of MP2-F12/VnZ-F12 with CCSD(T)/ano-pVTZ.

For the low-cost G4-T, with an RMSD of 6.4 cm1, the ω4 and ω5 modes are underestimated by −27 and −8 cm1, respectively. Introducing MP2-F12 correlation, instead of conventional MP2, leads to a slight deterioration of 0.2 cm1 with VTZ-F12, but a significant improvement of 2.0 cm1 occurs with VQZ-F12.

Consequently, for accurate harmonic frequencies it is recommended to combine MP2-F12/VnZ-F12 (n= T or Q) with CCSD(T)/ano-pVTZ, as such parameter-free cWFTs offer substantial gains.

3.4. Timing comparison of cWFT and DFT methods

The reviewers requested a timing comparison of the present and alternative approaches. Figure  depicts ‘wall clock’ times of selected cWFT and DFT methods for semi-numerical evaluation of harmonic frequencies obtained using MOLPRO from non-stationary geometries. All calculations ran on identical architecture nodes using 16 physical cores of an Intel Xeon Gold 5320 CPU with a maximum memory of 46.4 GB per core.

Figure 4. Visual representation of wall clock times for selected cWFT and DFT methods. Note that the y axis is logarithmic.

Figure 4. Visual representation of wall clock times for selected cWFT and DFT methods. Note that the y axis is logarithmic.

A notable speedup occurs with increasing molecular size when using cWFTs compared to ‘brute force’ CCSD(T) in the largest basis set of the composite scheme, the latter achieves speedups for cyclobutadiene by a factor of 53 for G4-D' and 27 for G4-T'. For ethylene, those ratios decrease to 29 and 21, respectively.

The MP2-F12-based method G4-Dano-F12-v1 is on par with G4-D'. However, the former exploits the accelerated basis set convergence at the MP2-F12 level, resulting to a 3.55 cm1 improvement in RMSD for HFREQ2014. Moreover, using the triple-ζ basis set VTZ-F12 in MP2-F12 makes G4-Tano-F12-v1 nearly three times more expensive than with VDZ-F12. Despite the increased cost, G4-Tano-F12-v1 is almost an order of magnitude less expensive than CCSD(T)/def2-QZVPPD'.

To conclude, we assess the computational costs of DFT for harmonic frequencies alongside cWFTs; in order to keep the timing comparison fair, we carried out these calculations using MOLPRO. Specifically, as representative examples for timing, we picked the meta-GGA functional TPSS [Citation145] and the hybrid functionals B3LYP [Citation146, Citation147] and ωB97X-D3(BJ) [Citation148], as well as the double-hybrid B2GP-PLYP [Citation149], which will have a similar cost as DSD-PBEP86-D3(BJ) [Citation136] or revDSD-PBEP86-D4 [Citation137] (which presently do not have analytical gradients in MOLPRO). Notably, G4-D' and B2GP-PLYP exhibit similar costs. Hybrid GGA functionals, such as ωB97X-D3(BJ) and ωB97X-V [Citation150] present the lowest-cost option, if ca. 30 cm1 accuracy is acceptable.

4. Conclusions

In this study, we have thoroughly examined the performance of various coupled-cluster composite wave function approaches (cWFT) for harmonic frequencies. Our investigation involved the development of extrapolation formulas for force constants, while also enabling geometry optimisation and harmonic frequency calculations through an implementation we provide.

We have validated the top-performing composite energy schemes, based on previous evaluations for reaction energetics in Refs. [Citation35, Citation36] using the large GMTKN55 test suite, against the harmonic frequencies in the HFREQ2014 dataset from CCSD(T*)(F12*)/VQZ-F12 calculations and experimental data. G4-T is three times more accurate than plain CCSD(T)/def2-TZVP, while G4-Tano is twice as accurate as CCSD(T)/ano-pVTZ. Notably, ANO basis sets combined with explicitly correlated MP2-F12, such as G4-Tano-F12, show promising performance, achieving accuracy of 5 cm1 compared to the experiment, and they are on par with the accuracy of CCSD(T) at the complete basis set limit. Following closely was our standard G4-T approach, built upon def2-TZVP for the high level correction. Additionally, the Weizmann-n theories, W1 and W2, delivered the most accurate results when compared to the calculated reference harmonic vibrational frequencies.

The addition of diffuse functions on hydrogen does not materially help performance for neutral molecules, and in fact causes significant near-linear-dependence issues. In codes that eliminate ‘near-singular’ eigenvectors of the overlap matrix (i.e. those for which the eigenvalue drops below a threshold), adding superfluous basis functions in general – and diffuse functions where they are unneeded in particular -- can cause discontinuities on a correlated potential energy surface as orbitals drop in and out of the virtual space. When carrying out (semi)numerical frequency calculations, this can cause erratic results, as we observed here for acetylene.

In summary, we recommend the following:

  • If an accuracy of 20–30 cm1 is sufficient, or if anharmonicity's deviation from a simple scaling factor exceeds that level (and an anharmonic force field is not a practical option), then consider a DFT option such as ωB97M-V [Citation151] or the even more economical B97M-V [Citation152].

  • If 10 cm1 is satisfactory, an empirical double hybrid like DSD-PBEP86 or revDSD-PBEP86 may be the right choice.

  • For 4–5 cm1 accuracy, consider present G4-type approaches including G4-Tano-v2 and G4-Tano-F12-v1. Both methods share the same high-level correction [CCSD(T)-MP2]/ano-pVTZ, but G4-Tano-v2 is combined with MP2/ano-pV5Z and G4-Tano-F12-v1 uses MP2-F12/VTZ-F12. These two composite schemes offer similar accuracy and computational cost, and are suitable for larger molecules, particularly if analytic second derivatives at MP2 and CCSD(T) are available. It is worth noting that no empirical scaling parameters were employed in these top-performing approaches.

  • For higher accuracy within the range of 1–2 cm1, it is important to extend beyond CCSD(T) as well as consider relativistic effects and the impact of diagonal Born-Oppenheimer corrections, especially in the case of hydrides.

Supplemental material

Supplemental Material

Download MS Word (13 KB)

Acknowledgments

The authors would like to thank Dr. Mark Vilensky for technical assistance with ChemFarm, and Dr. Margarita Shepelenko for critical reading of the manuscript prior to submission. ES thanks the Feinberg Graduate School (Weizmann Institute of Science) for a doctoral fellowship and the Onassis Foundation (Scholarship ID: FZP 052-2/2021-2022).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supporting information and data availability statement

A Microsoft Excel Workbook with all calculated harmonic frequencies for the HFREQ2014 dataset and the benzene molecule can be found at the DOI http://doi.org/10.34933/2b615f91-8187-47c1-a29b-86705932634a

, or the shortened URL http://tinyurl.com/G4compfreq. The hfreq implementation for geometry optimisation and calculation of harmonic vibrational frequencies with composite energy schemes is available on GitHub at http://github.com/msemidalas/hfreq. Additional data can be obtained from the authors upon reasonable request.

Additional information

Funding

Research at Weizmann was supported by the Israel Science Foundation (grant 1969/20), by the Minerva Foundation (grant 2020/05), and by a research grant from the Artificial Intelligence for Smart Materials Research Fund, in Memory of Dr. Uriel Arnon.

References