## Monday, December 29, 2014

### Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 2

2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in a previous post. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The general approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

pH and molecular charge. Virtually all binding measurements in aqueous solution are performed in buffer with a constant pH and many ligands and or receptors contain one or more ionizable groups. The charge of an ionizable (acid/base) group in aqueous solution depends on its pK$_a$ and the pH:

$q= \frac{1}{1+10^{\mathrm{pH}-\mathrm{p}K_a}}-\delta$

where $\delta$ is 1 for an acid and 0 for a base.  In most cases, selecting the wrong charge for the ligand and/or host will result in a significant error in the computed binding free energy.  The pK$_a$ can be computed using electronic structure theory or empirically using software such Marvin. However, if the pK$_a$ value is perturbed by the binding the situation may be complicated further. Here I illustrate this point for a simple example where the ligand (L) has a basic group that is neutral when deprotonated and the receptor (R) is non-ionizable.

$\mathrm{R + L(H^+) \rightleftharpoons RL(H^+)}$

The (apparent) experimental binding constant is then

$K^\prime=\mathrm{\frac{[RL]+[RLH^+]}{[R]([L]+[LH^+])}}$

and the corresponding binding free energy is

$\Delta G^{\prime\circ}=\Delta G^{\circ}(+)-RT\ln \left( \frac{1+10^{\mathrm{pH}-\mathrm{p}K_a^c}}{1+10^{\mathrm{pH}-\mathrm{p}K_a^f}} \right)$   (1)

where $\Delta G^{\circ}(+)$ is the binding free energy computed using the charged (protonated) form of the ligand and pK$_a^c$ and pK$_a^f$ are the pK$_a$ values the ligand bound to the receptor and the free ligand, respectively.

If the pK$_a$ is unaffected by the binding, the binding free energy is unaffected by pH and the chosen protonation state of the ligand. For the remaining scenarios it is instructive to plug in some numbers. For example, for pH = 7, pK$_a^f$ = 9 and pK$_a^c$ = 11, the ligand is protonated before and after binding and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation. However, if pH = 7, pK$_a^f$ = 5 and pK$_a^c$ = 3, the ligand is neutral before and after binding and assuming it is charged leads to a 2.7 kcal/mol error in the binding free energy (unless corrected by this equation).  Finally, if pH = 7, pK$_a^f$ = 8 and pK$_a^c$ = 6, the ligand is (91%) charged before and (91%) neutral after binding and assuming it remains charged leads to a 1.4 kcal/mol error in the binding free energy.

For many ligands of interest the pK$_a^f$ can be estimated fairly accurately in a matter of second using programs such as Marvin. The effect of binding on pK$_a^f$ can often be estimated by chemical intuition since hydrogen bonds to charged acid and basic groups tend to, respectively, lower or raise the pK$_a$ even further.  For example, if an amine with pK$_a^f$ = 9 binds to the receptor via hydrogen bonding, then pK$_a^c$ is likely higher than 9 and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation.  However, if pK$_a^f$ is close to 7 then pK$_a^c$ should be computed.  Also, it is possible for charged ligands to change to their neutral state if they bind to hydrophobic or similarly charged receptors.

If pK$_a^f$ is known with some degree of confidence then pK$_a^c$ can be estimated by

$\mathrm{p}K_a^c=\mathrm{p}K_a^f-\Delta G^\circ /RT\ln(10)$

where $\Delta A^\circ$ is the free energy change for

$\mathrm{RLH^+ + L \rightleftharpoons RL + LH^+}$

If there are several ionizable groups then Eq (1) generalizes to

$\Delta G^{\prime\circ}=\Delta G^{\circ}(-/+)-RT\ln \left(\sum_i \frac{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^c)}}{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^f)}} \right)$

where $\Delta G^{\circ}(-/+)$ is the binding free energy when all acids and bases are deprotonated and protonated, respectively, the sum runs over all ionizable groups and $n_i$ is 1 and -1 if $i$ is a base or acid, respectively.

However, this assumes that the ionizable groups titrate independently of one another, i.e. that the pK$_a$ value of one group is independent of the protonation states of all other ionizable groups.  If that is not the case then it is difficult to give a general expression for the pH-dependent free energy correction in terms of pK$_a$ values (though it can easily be derived for a specific case).

Instead a general expression can be written in terms of Legendre transformed free energies as suggested by Alberty (modified here to electronic structure calculations):

$G^{\prime\circ}(\overline{X})=-RT\ln \left( \sum_i \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

Here the sum runs over all possible protonation states and

$G^{\prime\circ}(X_i)=G^{\circ}(X_i)-n(\mathrm{H^+})[\delta G^\circ (\mathrm{H^+}) -RT\ln(10)\mathrm{pH}]$

where $G^{\circ}(X_i)$ is the usual standard free energy of protonation state $i$, $n(\mathrm{H^+})$ is the number of ionizable proton in pronation state $i$, and $\delta G^\circ (\mathrm{H^+})$ is the solvation free energy of the proton.  So in the case of ligand L considered above, $n(\mathrm{H^+})$ is 0 and 1 for L and LH$^+$, respectively. $\delta G^\circ (\mathrm{H^+})$ is usually taken from the literature where estimates vary between -265.8 and -268.6 kcal/mol.

Thus, Eq (1) can be rewritten as

$\Delta G^{\prime\circ}=G^{\prime\circ}(\overline{RL})-G^{\prime\circ}(\overline{L})-G^{\circ}(R)$

Since the electronic energy contribution to the standard free energy can be very large in magnitude this form is more easily evaluated

$G^{\prime\circ}(\overline{X})=G^{\prime\circ}({X_0})-RT\ln \left( 1+\sum_{i\ne0} \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

where $X_0$ is some arbitrarily chosen reference protonation state, for example that for which $n(\mathrm{H^+})$ = 0.  The sum can be combined with that over different conformations, discussed in a previous post.

Other ions
The buffers that are commonly used to regulate the pH also contain other ions, such as Na$^+$, Mg$^{2+}$, Cl$^-$.  At high ion concentrations, it is possible that these ions bind at certain sites in the ligand, receptor, or ligand-receptor complex with sufficient probability that they must be included in the thermodynamics. If so the exact same equations and considerations outlined above for H$^+$ also apply to, e.g. Cl$^-$ and pCl$^-$ (computed from the specified buffer concentration) is used instead of pH.

## Saturday, December 27, 2014

### Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 1

2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in subsequent posts. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

Electronic energy
Grimme has shown that dispersion typically makes a very big (>10 kcal/mol) contribution to binding free energies of host-guest complexes. Dispersion corrections are therefore a must if DFT is used to compute the electronic binding energy. Furthermore, Grimme has shown that three-body dispersion makes a non-negligible (2-3 kcal/mol) contribution to the electronic binding energy.  For convergent methods this effect is only included in rather expensive methods that involve triple-excitations such as MP4 and CCSD(T).

Molecular Thermodynamics
The translational, rotational and vibrational thermodynamic contribution to the binding free energy is very large (>10 kcal/mol) and must be included for accurate results.  Some years ago there was a bit of confusion in the literature about whether the RRHO approach was appropriate for condenses phase systems, but Zhou and Gilson have clarified this beautifully.

Standard state.  Most electronic structure codes compute the RRHO energy corrections for an ideal gas, where the standard state is a pressure of 1 bar.  The standard state for solution is 1 M, so the free energy of a molecule X must be corrected accordingly

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}_{\mathrm{RRHO}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the volume of an ideal gas at 1 bar and temperature $T$. At 298K $-RT\ln (V^{-1})$ = 1.90 kcal/mol.  This correction is already included in the solvation energy $\delta G_{\mathrm{solv}}$

Symmetry. Many host molecules and some guest molecules are symmetric, which affects the rotational entropy through the symmetry number ($\sigma$) which is a function of the point group.

$S_{rot}=R\ln\left(\frac{8\pi^2}{\sigma}\left(\frac{2\pi ekT}{h^2}\right)^{3/2}\sqrt{I_1I_2I_3}\right)$

It can be very difficult to build large molecules with the correct point group and most studies use $C_1$ symmetry.  In this case the effect of symmetry must be added manually to the free energy

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) \rightarrow G^\circ_{\mathrm{RRHO}}(\mathrm{X})+RT \ln(\sigma_X)$

As an example, the popular host molecule corcubit[7] has $D_{7h}$ symmetry and a corresponding $\sigma$ value of 14, in which case the correction contributes 1.56 kcal/mol to the free energy at 298K.

Anharmonicity and low frequency modes. Host-guest complexes can exhibit very low frequency vibrations on the order of 50 cm$^{-1}$ or less, which tend to dominate the vibrational entropy contribution.  Grimme (and many others) have questioned whether the harmonic approximation is valid for such low frequency modes and this is an open research question.  The main problem is that it is very difficult to compute the vibrational entropy exactly.  Most methods for computing anharmonic vibrational are developed to obtain the 2 or 3 lowest energy states, but for very low frequency modes 10-20 states likely significantly populated at room temperature and therefore contribute to the entropy.

In the absence of theoretical benchmarks, comparison to experiment can prove constructive. Kjærgaard and co-workers have recently measured standard binding free energies for small gas phase compounds and compared them to CCSD(T)/aug-cc-pV(T+d) calculations.  For example, in the case of acetronitrile-HCl the measured binding free energy at 295K is between 1.2 and 1.9 kcal/mol, while the predicted value is 2.3 kcal/mol using the harmonic approximation.  Since the errors in $\Delta E$ and the rigid-rotor approximation presumably are quite low, this suggest and error in the vibrational free energy of at most 1.1 kcal/mol, despite the fact that the lowest vibrational frequency is only about 30 cm$^{-1}$.  Furthermore, the error can be reduced by 0.4 kcal/mol by scaling the harmonic frequencies by anharmonic scaling factors suggested by Shields and co-workers.  Similar results were found for dimethylsulfide-HCl.  So there are some indications that the harmonic approximation yields free energy corrections that are reasonable and that can be improved upon by relatively minor corrections.

Grimme has taken a different approach by arguing that low-frequency modes resemble free rotations and using the corresponding entropy term for low frequency modes.  This changes the RRHO free energy correction by 0.5 - 4 kcal/mol, depending on the system.

Imaginary frequencies. Low frequencies are especially susceptible to numerical error and it is not unusual to see 1 or 2 imaginary frequencies of low magnitude in a vibrational analysis of a host-guest complex.  Since imaginary frequencies are excluded from the vibrational free energy this effectively removes 1 or 2 low frequency contributions to the vibrational free energy. For example, a 30 cm$^{-1}$ frequency contributes about 1.7 kcal/mol to the free energy at 298K.

Imaginary frequencies resulting from a flat PES and numerical errors can often be removed by making the convergence criteria for the geometry optimization and electronic energy minimization more stringent and making the grid size finer in the case of DFT calculations. If the Hessian is computed using finite difference it is important to use double-differencing.  If all else fails, it is probably better to pretend that the imaginary frequency is real and add the corresponding vibrational free energy contribution.

Conformations. One of the main problems in computing accurate binding free energies is to identify the structures of the host, guest and (especially) the host-guest complex with the lowest free energy. Because both the RRHO and solvation energy contributions contribute greatly to the binding free energy change, simply finding the structure with the lowest electronic energy and computing the free energy only for that conformation is probably not enough.

The free energy for a molecule (X) with N conformations the standard free energy is

$G^\circ(\mathrm{X})= G^\circ _0(\mathrm{X})-RT\ln\left(\sum^N_{i=1} \left( 1+\exp{\left( -(G^\circ _i- G^\circ _0)/RT\right)} \right)\right)$

where $G^\circ _0(\mathrm{X})$ is the conformation with lowest free energy.  Conformations with free energies higher than 1.36 kcal/mol contribute less than 0.1 to the sum at 298K.  So a significant number of very low free energy structures is needed to make even a 0.5 kcal/mol contribution to the free energy.  Conformations related by symmetry should not be included here as their effects are accounted for in the rotational entropy (see above).  Also, conformationally flexible regions of the ligand or host that are unaffected by binding need not be explored since the effect on the binding free energy will cancel.

## Tuesday, December 9, 2014

### QM computed standard free energy changes and pH

2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

The problem
Thermodynamics involving ionizable functional group at constant pH require special considerations. Robert Alberty has written extensively on this topic (example) but I confess I had some trouble relating the work to quantum chemical calculations.  This post aims to do just that.

Let's say you want to compute the standard free energy change at pH 7 for this equilibrium

$\mathrm{B \rightleftharpoons HA}$

where molecule $\mathrm{B}$ can be converted to an acid $\mathrm{HA}$, which is in equilibrium with it conjugate base and thus pH-dependent.

$\mathrm{HA \rightleftharpoons A^- + H^+}$

The apparent equilibrium constant ($K^\prime$) is

$K^\prime=\mathrm{\frac{[HA]+[A^-]}{[B]}}$

How to compute the corresponding standard free energy change?:

$\Delta G^{\prime\circ}=-RT\ln(K^\prime)$

The Solution
$K^\prime$ can be rewritten as

$K^\prime=K+\frac{KK_a}{[\mathrm{H^+}]}$

where

$K=\mathrm{\frac{[HA]}{[B]}}=\exp{(-\Delta G^\circ/RT)}$

and

$K_a=\mathrm{\frac{[A^-][H^+]}{[HA]}}=\exp{(-\Delta G^\circ_a/RT)}$

Thus,

$K^\prime=\exp{(-\Delta G^\circ/RT)} + \exp{(-\Delta G^\circ/RT)}\exp{(-\Delta G^\circ_a/RT)}10^{\mathrm{pH}}$

$=\exp{(-\Delta G^\circ/RT)} + \exp{(-(\Delta G^\circ+\Delta G^\circ_a-RT\ln(10)\mathrm{pH})/RT)}$

$=\frac{\exp{(-G^\circ(\mathrm{HA})/RT)}+\exp{(-G^{\prime\circ}(\mathrm{A^-})/RT)} }{\exp{(-G^\circ(\mathrm{B})/RT)}}$

where

$G^{\prime\circ}(\mathrm{A^-})=G^{\circ}(\mathrm{A^-})+[G^{\circ}(\mathrm{H^+})-RT\ln(10)\mathrm{pH}]$  (1)

Thus

$\Delta G^{\prime\circ}= G^{\prime\circ}(\mathrm{\overline{HA}})-G^\circ(\mathrm{B})$ (2)

where

$G^{\prime\circ}(\mathrm{\overline{HA}})=-RT\ln \left( \exp{(-G^\circ(\mathrm{HA})/RT)}+\exp{(-G^{\prime\circ}(\mathrm{A^-})/RT)} \right)$ (3)

Here $\mathrm{\overline{HA}}$ refers to "$\text{HA and A}$"

An example
How do you compute the standard free energy change in aqueous solution at pH 7 using QM for this reaction?:

$\mathrm{\text{HC(=O)-}NH_2+ H_2O \rightleftharpoons HCOOH + NH_3}$

First you optimize each molecule and perform the vibrational analysis to get the free energies.  You can account for solvation effects using a method like PCM or COSMO.  $\mathrm{HCOOH}$ and $\mathrm{NH_3}$ are acids and basis, respectively so you will also need the energies for $\mathrm{HCOO^-}$ and $\mathrm{NH_4^+}$

Most QM programs assume the molecules are in the gas phase when computing the Gibbs free energies so the standard state is 1 bar.  The free energies must be corrected for the solution standard state of 1 M:

$G^\circ(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the molar volume of an ideal at gas at your chosen temperature.  The exception is water since that is also the solvent.  Here the standard state is 55.34 M

$G^\circ(\mathrm{H_2O}) = G^{\circ,\mathrm{1 bar}}(\mathrm{H_2O})-RT\ln (55.34V^{-1})$

$G^{\prime\circ}(\mathrm{HCOO^-})$ and is then computed using Eq (1). $G^{\circ}(\mathrm{H^+})$ is usually taken from the literature though estimates vary between -265.8 and -268.6 kcal/mol.

$G^{\prime\circ}(\mathrm{\overline{HCOOH}})$ can then be computed using Eq (3).  The electronic energy component of $G^{\circ}(\mathrm{X})$ can be quite large in magnitude and give some numerical problems when computing the exponential function so Eq (3) and be rewritten as

$G^{\prime\circ}(\mathrm{\overline{HA}})= G^\circ(\mathrm{HA})-RT\ln \left(1+\exp{(-(G^{\prime\circ}(\mathrm{A^-})-G^\circ(\mathrm{HA}))/RT)} \right)$

Following a derivation similar to the one at the beginning of this post,

$G^{\prime\circ}(\mathrm{NH_4^+})=G^{\circ}(\mathrm{NH_4^+})-[G^{\circ}(\mathrm{H^+})-RT\ln(10)\mathrm{pH}]$

and $G^{\prime\circ}(\mathrm{\overline{NH_3}})$ is computed by Eq (3)

Finally, we put everything together

$\Delta G^{\prime\circ}=G^{\prime\circ}(\mathrm{\overline{HCOOH}})+G^{\prime\circ}(\mathrm{\overline{NH_3}})- G^\circ(\mathrm{\text{HC(=O)-}NH_2})-G^\circ(\mathrm{H_2O})$

Another way
If you happen to know the pKa values (e.g. from experiment) of the ionizable species you can use this expression

$\Delta G^{\prime\circ}=\Delta A^{\circ}-RT\ln \left( 1+10^{\mathrm{pH}- pK_{a,\mathrm{HCOOH}}} \right)-RT\ln \left( 1+10^{ pK_{a,\mathrm{NH_4^+}}-\mathrm{pH}} \right)$

where

$\Delta G^{\circ}=G^{\circ}(\mathrm{HCOOH})+G^{\circ}(\mathrm{NH_3})- G^\circ(\mathrm{\text{HC(=O)-}NH_2})-G^\circ(\mathrm{H_2O})$

## Wednesday, December 3, 2014

### Errors in the book

It seems the comment section for pages is no longer working on Blogger. So I am creating this post as a complement to this page.  Please leave comments below. Please remember to tag me (+jan jensen) so I am alerted to your comment.

## Tuesday, December 2, 2014

### Computational Chemistry Highlights: November issue

The November issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler, and Jan Jensen:

Quantum Chemical Approach to Estimating the Thermodynamics of Metabolic Reactions