Pioneering Research on Excitation Energy Transfer in Light-Harvesting Systems Published in Advanced Theory and Simulations

D. Lyu, V. Vinod, M. Holzenkamp, Y. M. Holtkamp, S. Maity, C. R. Salazar, U. Kleinekathöfer, P. Zaspel. Excitation Energy Transfer between Porphyrin Dyes on a Clay Surface: A Study Employing Multifidelity Machine Learning. Adv. Theory Simul., e00271, 2025. DOI: 10.1002/adts.202500271; also available as arXiv.2410.20551.

Our research group is excited to announce the publication of our latest paper, “Excitation Energy Transfer between Porphyrin Dyes on a Clay Surface: A Study Employing Multifidelity Machine Learning,” in the journal Advanced Theory and Simulations. This work, authored by Dongyu Lyu, Vivin Vinod, Matthias Holzenkamp, Yannick Marcel Holtkamp, Sayan Maity, Carlos R. Salazar, Ulrich Kleinekathöfer, and Peter Zaspel, marks a significant advancement in computational chemistry and in the effective use of multifidelity methods in machine learning.

The study applies multifidelity machine learning to the problem of understanding excitation energy transfer within complex synthetic light-harvesting systems. Inspired by nature’s efficient mechanisms, the research focuses on modeling the intricate interactions of 90-atom porphyrin molecules arranged on an anionic clay surface.

Key Highlights of the Research:

  • High-Accuracy Modeling of Large Systems: The team successfully developed a computational framework to accurately model a large system, processing an extensive dataset of 640,000 molecular geometries. This was achieved while maintaining high-level quantum chemical precision, specifically reaching Density Functional Theory (DFT) accuracy with the def2-SVP basis set.
  • Revolutionizing Efficiency with Multifidelity Machine Learning (MFML): A central innovation of this work is the strategic integration of a novel multifidelity machine learning approach. This method dramatically enhanced computational efficiency, yielding over 800x time savings compared to conventional high-fidelity calculations. By optimally leveraging data from multiple levels of theoretical fidelity, the MFML approach made the exploration of such a vast chemical space computationally feasible.
  • Insights into Energy Transfer: The findings provide crucial insights into the fundamental processes of excitation energy transfer among porphyrin dyes, demonstrating the immense potential of porphyrin-clay systems for energy applications.

This publication underscores our group’s commitment to pushing the boundaries of computational modeling to solve complex challenges in materials science and energy research. We invite you to explore the full details of this pioneering work.

Benchmarking Data Efficiency in Advanced Machine Learning Models for Quantum Chemistry

V. Vinod, P. Zaspel; Benchmarking data efficiency in Δ-ML and multifidelity models for quantum chemistry. J. Chem. Phys. 163 (2): 024134, 2025. DOI: 10.1063/5.0272457; also available as arXiv.2410.11391.

We are pleased to announce the publication of new research titled “Benchmarking data efficiency in Δ-ML and multifidelity models for quantum chemistry” in The Journal of Chemical Physics. This work, co-authored by Vivin Vinod and Peter Zaspel, benchmarks a critical component in machine learning for computational quantum chemistry: the high overhead cost associated with generating training data for machine learning (ML) models.

The development of ML methods has significantly enhanced the accessibility of quantum chemistry (QC) calculations by lowering their computational expense. However, this has shifted the focus to the efficiency of training data generation. Our latest study provides a comprehensive benchmark of the time-cost versus model accuracy for various cutting-edge multifidelity machine learning approaches in addition to a newly contributed methodological development called Multifidelity Δ Machine Learning.

Key Contributions of the Research:

  • Comprehensive Benchmarking: The study rigorously compares the data costs of several advanced ML methods: Δ-ML, multifidelity machine learning (MFML), optimized MFML (o-MFML), and a newly introduced method, Multifidelity Δ-Machine Learning (MFΔML). This assessment is based on the cost of generating training data for each model, directly contrasted with the single-fidelity kernel ridge regression approach.
  • Leveraging the QeMFi Dataset: For a uniform and robust assessment, the research utilized the QeMFi dataset, which comprises 135,000 geometries of nine chemically diverse molecules, each with five different fidelities of QC properties calculated using the time-dependent density functional theory (TD-DFT) formalism (STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP basis sets).
  • Predictive Capabilities: The models were evaluated for their ability to predict essential QC properties, including ground state energies, first and second vertical excitation energies, and the magnitude of electronic contribution to molecular dipole moments.
  • Optimizing for Different Prediction Scenarios: The results indicate that multifidelity methods generally outperform standard Δ-ML approaches when a large number of predictions are required. Furthermore, the newly developed MFΔML method offers a distinct advantage over conventional Δ-ML in applications where only a limited number of predictions or evaluations are needed.
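To illustrate the Δ-ML idea compared above, here is a minimal sketch using only the Python standard library. It is not the code used in the paper: the one-dimensional toy "fidelities", the kernel width, the regularization value, and all function names are assumptions made for illustration. The model learns only the difference between a cheap and a costly fidelity with kernel ridge regression, and a prediction adds that learned correction to a fresh cheap calculation.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(xs, ys, sigma=1.0, lam=1e-8):
    """Fit kernel ridge regression and return a prediction function."""
    K = [[gaussian_kernel(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alphas = solve(K, ys)
    def predict(x):
        return sum(al * gaussian_kernel(x, xt, sigma)
                   for al, xt in zip(alphas, xs))
    return predict

# Hypothetical stand-ins for a cheap and a costly QC calculation.
def low_fidelity(x):
    return math.sin(x)

def high_fidelity(x):
    return math.sin(x) + 0.1 * x

# Train only on the difference between the two fidelities.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
delta_model = krr_fit(x_train,
                      [high_fidelity(x) - low_fidelity(x) for x in x_train])

def delta_ml_predict(x):
    # Every prediction still needs one cheap baseline calculation.
    return low_fidelity(x) + delta_model(x)
```

Because each Δ-ML prediction in this sketch requires a new baseline calculation, the cost of many predictions grows with the cost of the cheap method, which is one way to read the trade-off between a few and many predictions described in the last highlight.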

This research is instrumental in guiding the selection of optimal ML methodologies for quantum chemistry, significantly contributing to the development of more efficient and cost-effective computational pipelines. It provides valuable insights for researchers aiming to accelerate discoveries in materials science and chemistry by minimizing the computational burden of high-accuracy calculations.

Multifidelity methods predict energies of organic molecules with coupled cluster accuracy

V. Vinod, D. Lyu, M. Ruth, U. Kleinekathöfer, P. R. Schreiner, and P. Zaspel. Predicting molecular energies of small organic molecules with multifidelity methods. J. Comput. Chem., 46: e70056, 2025. DOI: https://doi.org/10.1002/jcc.70056; also available as chemrxiv-2024-9zz16.

Multifidelity methods for quantum chemistry (QC) are an effective machine learning (ML) tool for reducing computational cost without compromising model accuracy. In this work, V. Vinod et al. assess the efficiency of several multifidelity methods in predicting energies of small organic molecules with the accuracy of coupled cluster with singles, doubles, and perturbative triples (CCSD(T)). In addition to an analysis of time-cost and model accuracy, the trained multifidelity models are used to predict CCSD(T) energies for a collection of atmospherically relevant molecules and highly conjugated molecules with high accuracy. This work is associated with the SPP 2363 on “Utilization and Development of Machine Learning for Molecular Applications – Molecular Machine Learning”, funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under its special priority program scheme.

Multifidelity data hierarchy study for excitation energies shows promising results for application of machine learning methods

V. Vinod and P. Zaspel. Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies. J. Chem. Theory Comput., 21, 6, 3077–3091, 2025. DOI: 10.1021/acs.jctc.4c01491; also available as arXiv:2410.11392.

Multifidelity machine learning (MFML) has been shown to reduce the time-cost of generating training data for machine learning (ML) models used in predicting quantum chemistry (QC) properties. MFML achieves this by using training data of different accuracies, or fidelities. In this work, Vivin Vinod and Peter Zaspel investigate the effect of the multifidelity data hierarchies on the model cost and accuracy. With a new error metric, the error contours of MFML, the work systematically studies the impact of the different fidelities on the overall model error. Based on this outcome, a new multifidelity approach, the Γ-curve, is implemented and shown to be highly efficient, resulting in low model error with as few as two training samples at the costliest fidelity.
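To make the notion of a data hierarchy concrete, the following sketch computes training-set sizes per fidelity and the resulting data-generation cost. It is only an illustration under stated assumptions: the geometric hierarchy with a scaling factor `gamma`, the function names, and the relative per-sample times are hypothetical and are not taken from the paper.

```python
def hierarchy_sizes(n_top, n_fidelities, gamma=2):
    """Training-set sizes, ordered from cheapest to costliest fidelity.

    Assumption: each cheaper fidelity uses gamma times more samples
    than the next costlier one.
    """
    return [n_top * gamma ** (n_fidelities - 1 - f)
            for f in range(n_fidelities)]

def data_cost(sizes, time_per_sample):
    """Total time to generate the training data across all fidelities."""
    return sum(n * t for n, t in zip(sizes, time_per_sample))

# Hypothetical relative cost of one calculation at each of five fidelities.
times = [1.0, 2.0, 5.0, 20.0, 100.0]

# Two samples at the costliest fidelity, doubling at each cheaper level.
sizes = hierarchy_sizes(n_top=2, n_fidelities=5, gamma=2)
cost = data_cost(sizes, times)
```

With these made-up numbers the hierarchy is `[32, 16, 8, 4, 2]` and the total cost is 384 time units, of which 200 come from the two samples at the costliest fidelity. Changing `gamma` shifts how much of the budget is spent on the cheaper fidelities.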

New development in multi-fidelity machine learning methods opens up possibilities for the use of heterogeneous data for the prediction of quantum chemical properties

V. Vinod and P. Zaspel. Assessing non-nested configurations of multifidelity machine learning for quantum-chemical properties. Machine Learning: Science and Technology, 5, 045005, 2024. DOI: 10.1088/2632-2153/ad7f25; also available as arXiv:2407.17087.

Multi-fidelity methods in machine learning (ML) of quantum chemistry (QC) properties have made high-accuracy, low-cost models more accessible to the community. These have been applied to a range of properties, including excitation energies. Most multi-fidelity methods require a nested configuration of the training data, that is, calculations for a geometry are to be made at the lower fidelities as well as the higher fidelities.
In a recent work, available as a preprint, the authors Vivin Vinod and Peter Zaspel assess a non-nested configuration of multi-fidelity machine learning (MFML) and optimized MFML (o-MFML) methods. Preliminary results suggest that while MFML would still require a nested data structure, o-MFML can generalize reasonably well over a non-nested training data structure. That is, o-MFML opens up avenues for the use of heterogeneous datasets, reducing the requirement to make costly calculations for high-fidelity data.
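The difference between nested and non-nested training data can be sketched in a few lines. The representation below (one set of geometry identifiers per fidelity, ordered from cheapest to costliest) and the function name are assumptions chosen for illustration, not the data format used in the paper.

```python
def is_nested(samples_by_fidelity):
    """Return True if every geometry used at a costlier fidelity
    was also computed at the next cheaper fidelity.

    samples_by_fidelity: list of sets of geometry ids, cheapest first.
    """
    return all(higher <= lower
               for lower, higher in zip(samples_by_fidelity,
                                        samples_by_fidelity[1:]))

# Nested: costlier fidelities reuse geometries from cheaper ones.
nested = [{0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3}, {0, 1}]

# Non-nested: each fidelity was computed on different geometries,
# as might happen when combining heterogeneous datasets.
non_nested = [{0, 1, 2, 3, 4, 5, 6, 7}, {8, 9, 10, 11}, {12, 13}]
```

Here `is_nested(nested)` holds and `is_nested(non_nested)` does not. The second case is the situation in which, according to the preliminary results above, o-MFML still generalizes reasonably well.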

Dataset of diverse quantum chemical properties to enable research and benchmarking of multifidelity machine learning models released!

V. Vinod and P. Zaspel. QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. Sci. Data, 12, 202, 2025. DOI: https://doi.org/10.1038/s41597-024-04247-3; also available as arXiv:2406.14149.

V. Vinod and P. Zaspel. QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.1.0) [Data set]. Zenodo. 2024. https://zenodo.org/records/13925688.

With research booming in the field of multifidelity methods for quantum chemistry (QC), it becomes important to benchmark the various methods so that models can be compared meaningfully. Such benchmarks expedite research by setting standards against which subsequent work can assess its own methodological developments. In the interest of such a uniform comparison, the Quantum chemistry MultiFidelity (QeMFi) dataset was released to the community under the open CC-BY-4.0 license. Containing 135k geometries of diverse and chemically complex molecules taken from the WS22 database, the QeMFi dataset provides QC properties ranging from excitation energies to molecular dipole moments. Each property is provided at five fidelities computed with DFT, where the fidelities differ in the choice of basis set. This dataset is a major step in the direction of research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel.

Optimal Combination with Multifidelity Machine Learning Achieves Coupled Cluster Accuracy

Vinod, V., Kleinekathöfer, U., & Zaspel, P. (2024). Optimized multifidelity machine learning for quantum chemistry. Machine Learning: Science and Technology, 5(1), 015054. http://doi.org/10.1088/2632-2153/ad2cef.

Recent research in Multifidelity Machine Learning (MFML) has resulted in ML methods that reduce the cost of generating a training set without compromising on the accuracy of the predictions. This is achieved by combining cheaper, less accurate data with high-accuracy (or high-fidelity), high-cost data. In this work, a novel methodological improvement of MFML is benchmarked for various quantum chemical (QC) properties. Optimized MFML (o-MFML) combines the different fidelities of data using an Optimal Combination method. With this improvement, it is shown that high-accuracy methods such as coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) are now more accessible than ever to the ML-QC community. The work is available in the journal Machine Learning: Science and Technology from IOPscience and is authored by Vivin Vinod, Ulrich Kleinekathöfer, and Peter Zaspel.
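One plausible reading of an optimal combination is sketched below: the predictions of several sub-models are combined with coefficients fitted by ordinary least squares on a held-out validation set at the target fidelity. This is an illustrative assumption rather than the published algorithm, which may differ in detail; the function names and the toy numbers are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def optimal_coefficients(submodel_preds, reference):
    """Least-squares coefficients for combining sub-model predictions.

    submodel_preds: one list of validation-set predictions per sub-model.
    reference: target-fidelity values on the same validation set.
    Solves the normal equations (P P^T) beta = P y.
    """
    m = len(submodel_preds)
    gram = [[sum(p * q for p, q in zip(submodel_preds[i], submodel_preds[j]))
             for j in range(m)] for i in range(m)]
    rhs = [sum(p * r for p, r in zip(submodel_preds[i], reference))
           for i in range(m)]
    return solve(gram, rhs)

def combine(coefficients, submodel_values):
    """Combined prediction for one query from its sub-model values."""
    return sum(c * v for c, v in zip(coefficients, submodel_values))

# Hypothetical validation predictions of three sub-models.
preds = [[1.0, 2.0, 3.0, 4.0],
         [1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 3.0]]
# Toy reference constructed as preds[0] + preds[1] - preds[2].
reference = [2.0, 1.0, 3.0, 1.0]

beta = optimal_coefficients(preds, reference)
```

In this constructed example the fit recovers coefficients close to `[1, 1, -1]`, because the reference was built from exactly that combination. With real data the fitted coefficients would depend on the validation set, and the sub-model predictions must be linearly independent for the normal equations to be solvable.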

Reducing Compute Costs of Generating Training Data for Excitation Energy Prediction Using Multifidelity Methods

Vinod, V., Maity, S., Zaspel, P., & Kleinekathöfer, U. (2023). Multifidelity machine learning for molecular excitation energies. Journal of Chemical Theory and Computation, 19(21), 7658-7670. https://doi.org/10.1021/acs.jctc.3c00882.

A major challenge to accurate predictions of quantum chemical (QC) properties with machine learning (ML) methods is the lack of high-accuracy data, since generating high-accuracy training data is computationally expensive. With the multifidelity machine learning (MFML) method, cheaper and less accurate data is used alongside very little high-accuracy data, resulting in a model with better accuracy in predicting high-fidelity data. In this work, the MFML method is benchmarked for vertical excitation energies, a QC property vital to understanding elementary life processes such as photosynthesis. Numerical results indicate a time benefit of more than a factor of 30. This is a strong step towards the development of ML methods for QC that reduce the compute cost of generating a training set. This work is authored by Vivin Vinod, Sayan Maity, Peter Zaspel, and Ulrich Kleinekathöfer and has been published in the Journal of Chemical Theory and Computation.