Multifidelity data hierarchy study for excitation energies shows promising results for application of machine learning methods

V. Vinod and P. Zaspel. Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies. J. Chem. Theory Comput., 21, 6, 3077–3091, 2025. DOI: 10.1021/acs.jctc.4c01491; also available as arXiv:2410.11392.

Multifidelity machine learning (MFML) has been shown to reduce the time cost of generating training data for machine learning (ML) models that predict quantum chemistry (QC) properties. MFML achieves this by combining training data of different accuracies, or fidelities. In this work, Vivin Vinod and Peter Zaspel investigate the effect of multifidelity data hierarchies on model cost and accuracy. Using a new error metric, the error contours of MFML, the work systematically studies the impact of the different fidelities on the overall model error. Based on this outcome, a new multifidelity approach, the Γ-curve, is implemented and shown to be highly efficient, resulting in low model error with as few as two training samples at the costliest fidelity.
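To illustrate the general idea of combining fidelities, here is a minimal two-fidelity toy sketch. It is not the MFML or Γ-curve method from the paper: it uses a simple correction model (in the style of Δ-learning) with kernel ridge regression, and the functions `f_low` and `f_high`, the sample counts, and the kernel settings are all hypothetical choices made for illustration. It compares a model trained on only five costly samples with one that also uses many cheap samples.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale=0.2):
    """Gaussian (RBF) kernel matrix between two 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def krr_fit(x_train, y_train, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the kernel ridge regression weights."""
    K = gaussian_kernel(x_train, x_train)
    return np.linalg.solve(K + reg * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_new):
    """Evaluate the fitted kernel model at new points."""
    return gaussian_kernel(x_new, x_train) @ alpha


# Hypothetical toy fidelities (assumption): the cheap one captures the
# oscillation, the costly one adds a smooth correction on top of it.
def f_low(x):
    return np.sin(12.0 * x)


def f_high(x):
    return np.sin(12.0 * x) + 0.3 * x


x_low = np.linspace(0.0, 1.0, 33)   # many cheap training samples
x_high = x_low[::8]                 # 5 costly samples, nested in x_low
x_test = np.linspace(0.0, 1.0, 200)

# Single-fidelity baseline: only the few costly samples are used.
alpha_sf = krr_fit(x_high, f_high(x_high))
pred_sf = krr_predict(x_high, alpha_sf, x_test)

# Two-fidelity model: fit the cheap fidelity on many samples, then fit the
# difference (high - low) on the few costly samples and add the two models.
alpha_low = krr_fit(x_low, f_low(x_low))
alpha_delta = krr_fit(x_high, f_high(x_high) - f_low(x_high))
pred_mf = (krr_predict(x_low, alpha_low, x_test)
           + krr_predict(x_high, alpha_delta, x_test))

sf_mae = float(np.mean(np.abs(pred_sf - f_high(x_test))))
mf_mae = float(np.mean(np.abs(pred_mf - f_high(x_test))))
print(f"single-fidelity MAE: {sf_mae:.4f}")
print(f"two-fidelity MAE:    {mf_mae:.4f}")
```

In this toy setting the difference between the fidelities is much smoother than the costly target itself, so a handful of costly samples is enough to learn the correction. Whether that holds for real QC data depends on the fidelities chosen, which is the kind of question the data hierarchy study addresses.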

Dataset of diverse quantum chemical properties to enable research and benchmarking of multifidelity machine learning models released!

V. Vinod and P. Zaspel. QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. Sci Data 12, 202, 2025. DOI: https://doi.org/10.1038/s41597-024-04247-3; also available as arXiv:2406.14149.

V. Vinod and P. Zaspel. QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.1.0) [Data set]. Zenodo. 2024. https://zenodo.org/records/13925688.

With research booming in the field of multifidelity methods for quantum chemistry (QC), it becomes important to benchmark the various methods in the interest of meaningful comparison between models. This expedites research by setting standards against which subsequent work can assess its own methodological developments. In the interest of such a uniform comparison, the Quantum chemistry MultiFidelity (QeMFi) dataset was released to the community under an open CC-BY-4.0 license. Containing 135k geometries of diverse and chemically complex molecules taken from the WS22 database, the QeMFi dataset provides QC properties ranging from excitation energies to molecular dipole moments. For each property, five fidelities are provided at DFT accuracy. The fidelities differ in the choice of basis set. This dataset is a major step toward the research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel.