Multifidelity methods predict energies of organic molecules with coupled cluster accuracy

V. Vinod, D. Lyu, M. Ruth, U. Kleinekathöfer, P. R. Schreiner, and P. Zaspel. Predicting molecular energies of small organic molecules with multifidelity methods. J. Comput. Chem., 46: e70056, 2025. DOI: https://doi.org/10.1002/jcc.70056; also available as chemrxiv-2024-9zz16.

Multifidelity methods for quantum chemistry (QC) are effective machine learning (ML) tools that reduce computational cost without compromising model accuracy. In this work, V. Vinod et al. assess the efficiency of several multifidelity methods in predicting energies of small organic molecules with coupled cluster singles, doubles, and perturbative triples (CCSD(T)) accuracy. In addition to an analysis of time-cost and model accuracy, the trained multifidelity models are used to predict CCSD(T) energies with high accuracy for a collection of atmospherically relevant molecules and highly conjugated molecules. This work is associated with the SPP 2363 on “Utilization and Development of Machine Learning for Molecular Applications – Molecular Machine Learning”, funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under its special priority program scheme.

Multifidelity data hierarchy study for excitation energies shows promising results for application of machine learning methods

V. Vinod and P. Zaspel. Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies. J. Chem. Theory Comput., 21(6), 3077–3091, 2025. DOI: 10.1021/acs.jctc.4c01491; also available as arXiv:2410.11392.

Multifidelity machine learning (MFML) has been shown to reduce the time-cost of generating training data for machine learning (ML) models that predict quantum chemistry (QC) properties. MFML achieves this by combining training data computed at different levels of accuracy, or fidelities. In this work, Vivin Vinod and Peter Zaspel investigate the effect of the multifidelity data hierarchy on model cost and accuracy. Using a new error metric, the error contours of MFML, the work systematically studies how the different fidelities contribute to the overall model error. Based on this outcome, a new multifidelity approach, the Γ-curve, is implemented and shown to be highly efficient, resulting in low model error with as few as two training samples at the costliest fidelity.

New development in multi-fidelity machine learning methods opens up possibilities for the use of heterogeneous data for the prediction of quantum chemical properties

V. Vinod and P. Zaspel. Assessing non-nested configurations of multifidelity machine learning for quantum-chemical properties. Machine Learning: Science and Technology, 5, 045005, 2024. DOI: 10.1088/2632-2153/ad7f25; also available as arXiv:2407.17087.

Multi-fidelity methods in machine learning (ML) of quantum chemistry (QC) properties have made high-accuracy, low-cost models more accessible to the community. They have been applied to a range of properties, including excitation energies. Most multi-fidelity methods require a nested configuration of the training data, that is, every geometry computed at a higher fidelity must also be computed at the lower fidelities.
In a recent work, also available as a preprint, Vivin Vinod and Peter Zaspel assess a non-nested configuration of the multi-fidelity machine learning (MFML) and optimized MFML (o-MFML) methods. Preliminary results suggest that while MFML would still require a nested data structure, o-MFML can generalize reasonably well over a non-nested training data structure. That is, o-MFML opens up avenues for the use of heterogeneous datasets, reducing the need for costly high-fidelity calculations.
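To make the distinction between nested and non-nested training data concrete, the following is a minimal Python sketch. It is not taken from the paper: the function name `is_nested`, the integer fidelity labels (0 as the cheapest level), and the geometry identifiers are hypothetical and chosen purely for illustration. It only checks whether every geometry used at a higher fidelity is also present at the fidelity below it.

```python
from typing import Dict, Set


def is_nested(samples_by_fidelity: Dict[int, Set[str]]) -> bool:
    """Return True if the training set is nested.

    Assumption (illustrative only): keys are fidelity levels, with a larger
    key meaning a costlier, more accurate method; values are the sets of
    geometry identifiers computed at that level.
    """
    levels = sorted(samples_by_fidelity)
    # Checking adjacent levels is sufficient because the subset relation
    # is transitive.
    for lower, higher in zip(levels, levels[1:]):
        if not samples_by_fidelity[higher] <= samples_by_fidelity[lower]:
            return False
    return True


# Nested: each higher-fidelity geometry was also computed at lower fidelities.
nested = {0: {"g1", "g2", "g3", "g4"}, 1: {"g1", "g2"}, 2: {"g1"}}

# Non-nested: each fidelity uses its own, disjoint set of geometries.
non_nested = {0: {"g1", "g2", "g3", "g4"}, 1: {"g5", "g6"}, 2: {"g7"}}

print(is_nested(nested))
print(is_nested(non_nested))
```

In the nested case, a geometry chosen for the costliest calculation must also be computed with every cheaper method. In the non-nested case, data computed independently at each fidelity could in principle be reused as is, which is the kind of heterogeneous dataset the work investigates.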