Relative and systematic methods in computational alchemy

Simon León Krug

December 17, 2024

The prediction of properties is among the central pursuits of computational chemistry, condensed matter physics, and materials design. However, the mathematical and computational difficulty of solving the underlying Schrödinger equation, together with the impossibility of encyclopedic approaches given the sheer size of chemical compound space, often forces a resort to relative and approximate methods. Approximate methods should preferably be systematic, i.e. generalizable and equipped with an error estimate. In this thesis, we contribute two such relative and systematic approaches. Firstly, we establish the Alchemical Integral Transform (AIT), based on thermodynamic integration of electronic energies along an alchemical path lambda between two iso-electronic systems A and B. In essence, the AIT allows its user to transfer the lambda-dependency inherent to a general electron density to a parametrization of the coordinates in the initial electron density. We examine rigorous and approximate parametrizations for different systems. The parametrization allows a natural inclusion of external constraints, e.g. a known electric dipole moment. Such constraints also restrict the set of compounds under consideration, a restriction that might be worthwhile in the context of machine learning (ML) models, either for efficient data selection or for informing ML architectures. Secondly, we employ a novel alchemical and harmonic Ansatz in lambda (AHA), combined with a new electronic interatomic potential, to describe the energy of all neutral iso-electronic diatomics with a single calibration calculation. We use this as a baseline model in Delta-ML with kernel ridge regression to evaluate its systematic qualities. Through learning curves, we obtain sought-after information about generalizability and scalability.
In doing so, we find a clear example of why a systematic baseline model is crucial for the accuracy of Delta-ML: numerical proximity to reference data alone is shown to be a poor indicator of the quality of a model. So far, the AHA is limited to iso-electronic diatomics, but possible extensions include larger molecules, systems with a matching number of valence electrons, and multi-level learning.
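
For readers unfamiliar with Delta-ML, the idea of learning the residual between a cheap baseline and reference data with kernel ridge regression, and of reading off a learning curve, can be sketched on toy data. This is a minimal illustration only, not the AHA model or the data of the thesis: the functions reference and baseline, the Gaussian kernel, and the values of sigma and reg are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kernel(X1, X2, sigma):
    """Gaussian kernel matrix between the rows of X1 and the rows of X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma, reg):
    """Solve (K + reg * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Evaluate the fitted kernel model at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


def reference(x):
    # Hypothetical stand-in for expensive reference energies.
    return np.sin(3.0 * x) + 0.5 * x ** 2


def baseline(x):
    # Hypothetical cheap baseline that captures only the smooth trend.
    return 0.5 * x ** 2


SIGMA, REG = 0.5, 1e-6  # assumed hyperparameters, not tuned

rng = np.random.default_rng(0)
X_pool = rng.uniform(-2.0, 2.0, size=(40, 1))
X_test = np.linspace(-1.9, 1.9, 50)[:, None]
y_test = reference(X_test[:, 0])

# Error of the baseline alone, without any learned correction.
mae_baseline = np.mean(np.abs(baseline(X_test[:, 0]) - y_test))

# Learning curve: test error of baseline + learned residual vs. training size.
train_sizes = [5, 10, 20, 40]
mae_delta = []
for n in train_sizes:
    X_train = X_pool[:n]
    residual = reference(X_train[:, 0]) - baseline(X_train[:, 0])
    alpha = krr_fit(X_train, residual, SIGMA, REG)
    pred = baseline(X_test[:, 0]) + krr_predict(X_train, alpha, X_test, SIGMA)
    mae_delta.append(np.mean(np.abs(pred - y_test)))

for n, err in zip(train_sizes, mae_delta):
    print(f"N = {n:3d}  MAE = {err:.4f}")
print(f"baseline only  MAE = {mae_baseline:.4f}")
```

Plotting mae_delta against train_sizes on logarithmic axes gives the learning curve. Swapping in a different baseline function is a simple way to explore how the choice of baseline affects the curve in this toy setting.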