New development in multi-fidelity machine learning methods opens up possibilities for the use of heterogeneous data for the prediction of quantum chemical properties

Vinod, V., & Zaspel, P. (2024). Assessing Non-Nested Configurations of Multifidelity Machine Learning for Quantum-Chemical Properties. arXiv preprint 2407.17087, http://arxiv.org/abs/2407.17087 

Multi-fidelity methods in machine learning (ML) of quantum chemistry (QC) properties have made high-accuracy, low-cost models more accessible to the community. They have been applied to a range of properties, including excitation energies. Most multi-fidelity methods require a nested configuration of the training data, that is, calculations for a given geometry must be made at the lower fidelities as well as the higher fidelities.
In a recent work, available as a preprint, the authors Vivin Vinod and Peter Zaspel assess a non-nested configuration of multi-fidelity machine learning (MFML) and optimized MFML (o-MFML) methods. Preliminary results suggest that while MFML still requires a nested data structure, o-MFML can generalize reasonably well over a non-nested training data structure. That is, o-MFML opens up avenues for the use of heterogeneous datasets, reducing the need to make costly calculations for high-fidelity data.
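
To make the difference between the two configurations concrete, the following minimal sketch builds training index sets for a hypothetical pool of geometries. It is not the authors' code: the pool size, the number of samples per fidelity, the function names, and the choice of fully disjoint sets for the non-nested case are assumptions made purely for illustration.

```python
import numpy as np

def nested_indices(n_pool, sizes, rng):
    """Nested configuration: sizes[0] is the largest sample count and
    sizes[-1] the smallest. Every geometry in a smaller set is also
    contained in all larger sets."""
    perm = rng.permutation(n_pool)
    return [perm[:n] for n in sizes]

def non_nested_indices(n_pool, sizes, rng):
    """Non-nested configuration (assumed here to be fully disjoint):
    each fidelity gets its own geometries, none shared with another."""
    if sum(sizes) > n_pool:
        raise ValueError("pool too small for disjoint sets")
    perm = rng.permutation(n_pool)
    out, start = [], 0
    for n in sizes:
        out.append(perm[start:start + n])
        start += n
    return out

rng = np.random.default_rng(0)
# Hypothetical setup: 64 low-, 16 mid- and 4 high-fidelity samples.
nested = nested_indices(1000, [64, 16, 4], rng)
non_nested = non_nested_indices(1000, [64, 16, 4], rng)

# Count how many high-fidelity geometries also appear at the lowest fidelity.
print(len(set(nested[2]) & set(nested[0])))
print(len(set(non_nested[2]) & set(non_nested[0])))
```

In the nested case, every high-fidelity geometry also needs a calculation at each lower fidelity; in the non-nested case, the fidelities can come from independent sources.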

Dataset of diverse quantum chemical properties to enable research and benchmarking of multifidelity machine learning models released!

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. arXiv preprint arXiv:2406.14149 https://doi.org/10.48550/arXiv.2406.14149.

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.0) [Data set]. Zenodo. https://zenodo.org/records/11636903.

With research booming in the field of multifidelity methods for quantum chemistry (QC), it becomes important to benchmark the various methods so that models can be compared meaningfully. Benchmarks expedite research by setting standards against which subsequent work can assess its own methodological developments. In the interest of such a uniform comparison, the quantum Chemistry MultiFidelity (CheMFi) dataset has been released to the community under an open CC-BY-4.0 license. Containing 135k geometries of diverse and chemically complex molecules taken from the WS22 database, the CheMFi dataset provides QC properties ranging from excitation energies to molecular dipole moments. For each property, five fidelities are provided at DFT accuracy. The fidelities differ in the choice of basis set. This dataset is a major step towards the research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel. The dataset is available on Zenodo and the accompanying manuscript is available as a preprint.

Optimal Combination with Multifidelity Machine Learning Achieves Coupled Cluster Accuracy

Vinod, V., Kleinekathöfer, U., & Zaspel, P. (2024). Optimized multifidelity machine learning for quantum chemistry. Machine Learning: Science and Technology, 5(1), 015054 http://doi.org/10.1088/2632-2153/ad2cef.

Recent research in Multifidelity Machine Learning (MFML) has resulted in ML methods that reduce the cost of generating a training set without compromising on the accuracy of the predictions. This is achieved by combining cheaper, less accurate data with high-accuracy (or high-fidelity), high-cost data. In this work, a novel methodological improvement of MFML is benchmarked for various quantum chemical (QC) properties. Optimized MFML (o-MFML) combines the different fidelities of data using an Optimal Combination method. With this improvement, it is shown that high-accuracy methods such as Coupled Cluster Singles Doubles (Triples), CCSD(T), are now more accessible than ever to the ML-QC community. The work is available in the journal Machine Learning: Science and Technology from IOPScience and is authored by Vivin Vinod, Ulrich Kleinekathöfer, and Peter Zaspel.
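
As a rough illustration of what an optimal combination can look like, the sketch below fits combination coefficients for a set of sub-model predictions by ordinary least squares on a validation set. It is not the authors' implementation: the synthetic data, the array and function names, and the use of plain least squares are assumptions made for illustration only.

```python
import numpy as np

def fit_combination(sub_preds_val, y_val):
    """Return coefficients beta minimizing ||sub_preds_val @ beta - y_val||_2.

    sub_preds_val: array of shape (n_val, n_sub), one column per sub-model.
    y_val: array of shape (n_val,), reference values on the validation set.
    """
    beta, *_ = np.linalg.lstsq(sub_preds_val, y_val, rcond=None)
    return beta

def combine(sub_preds, beta):
    """Weighted sum of sub-model predictions."""
    return sub_preds @ beta

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
# Synthetic stand-in for validation targets (assumption, not real QC data).
y_val = rng.normal(size=200)
# Three hypothetical sub-models: the target plus noise of decreasing size.
noise_levels = np.array([0.5, 0.2, 0.05])
P_val = y_val[:, None] + rng.normal(size=(200, 3)) * noise_levels

beta = fit_combination(P_val, y_val)
print("coefficients:", beta)
print("best single sub-model RMSE:", rmse(P_val[:, 2], y_val))
print("combined RMSE:", rmse(combine(P_val, beta), y_val))
```

On the validation set itself, the least-squares combination can never have a larger squared error than any single sub-model, since picking one sub-model is just a special choice of coefficients. How well the fitted coefficients transfer to unseen data has to be checked separately.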

Reducing Compute Costs of Generating Training Data for Excitation Energy Prediction Using Multifidelity Methods

Vinod, V., Maity, S., Zaspel, P., & Kleinekathöfer, U. (2023). Multifidelity machine learning for molecular excitation energies. Journal of Chemical Theory and Computation, 19(21), 7658-7670 https://doi.org/10.1021/acs.jctc.3c00882.

A major challenge to accurate predictions of quantum chemical (QC) properties with machine learning methods is the lack of high-accuracy data. Generating high-accuracy training data for machine learning (ML) is computationally expensive. With the multifidelity machine learning (MFML) method, cheaper and less accurate data is used alongside very little high-accuracy data to obtain a model with better accuracy in predicting high-fidelity data. In this work, the MFML method is benchmarked for vertical excitation energies, a QC property vital to understanding elementary life processes such as photosynthesis. Numerical results indicate a time benefit of more than a factor of 30. This is a strong step towards the development of ML methods for QC that reduce the compute cost of generating a training set. This work is authored by Vivin Vinod, Sayan Maity, Peter Zaspel, and Ulrich Kleinekathöfer and has been published in the Journal of Chemical Theory and Computation.

Up to two open PhD positions in numerical methods for large-scale training of Gaussian processes

Are you interested in developing new numerical methods for training machine learning models on large to huge data sets? We are currently looking for PhD candidates who will support us in breaking the barriers of computational complexity of Gaussian processes and kernel-based machine learning models.

Up to two PhD positions are available in the team of Prof. Peter Zaspel at University of Wuppertal, Germany. The positions are focused on the development of novel (numerical) methods with fast implementations that help to break the computational complexity in the training of Gaussian processes and kernel-based machine learning models. Depending on the candidate’s interest, a stronger focus can be put on the methods development in fast matrix approximations (e.g. hierarchical matrices, low-rank methods, sparse GPs, etc.) or on the fast implementation on GPUs, i.e. hardware-aware numerics and parallelization.

The team of Prof. Peter Zaspel is located at Bergische Universität Wuppertal. The international team focuses on methods development in machine learning, uncertainty quantification and high performance computing in the context of applications from the natural sciences, engineering and beyond. It is embedded in the research group on Scientific Computing and High Performance Computing. For more details, see https://www.peter-zaspel.de/ and https://hpc.uni-wuppertal.de.

A successful applicant is expected to have a Master’s degree (or equivalent) in computer science, mathematics, physics, data science or a similar discipline, strong analytical skills in the context of machine learning and/or (numerical) mathematics, very good to excellent proficiency in a programming language (preferably Python or C/C++) and interest in developing novel training methods for kernel-based / Gaussian Process machine learning. Experience in matrix approximation techniques or hardware-aware programming on GPUs is an advantage. A good command of English is essential, both as the local working language and because of international collaborations. We look for a competent personality with initiative and commitment, who has the ability to work independently and who enjoys teaching (support).

We offer a 3 year PhD position. The salary will be paid in accordance with the Collective Agreement for the Public Service of the Federation (Tarifvertrag des öffentlichen Dienstes, TVöD Bund), with salary level 13 (75%). The position has teaching (support) duties. The place of employment will be Wuppertal, Germany.

The positions are available immediately and applications will be considered on a rolling basis, but no later than July 15, 2024. In order to apply, please submit a letter of motivation, a CV, copies of transcripts and optionally a copy of your MSc thesis (all as one PDF). If you would like to apply or have questions on the position, please contact Prof. Peter Zaspel via zaspel(at)uni-wuppertal.de.

Expired: PhD position in large-scale GPU-based training in molecular machine learning

Are you interested in developing new machine learning training methods for large to huge data sets, and do you have strong programming skills, ideally on GPUs? Then apply for our newly opened PhD position!

A PhD position is available in the team of Prof. Peter Zaspel at University of Wuppertal, Germany. The position is focused on the development of novel machine learning training algorithms on GPUs in context of molecular simulations for materials design under the project “Multi-fidelity methods for fast large-scale mixed-precision molecular machine learning on GPUs”. The respective research will involve the further development of large-scale machine learning models and hardware-aware algorithms in an interdisciplinary application.

The team of Prof. Peter Zaspel is located at Bergische Universität Wuppertal. The international team focuses on machine learning, uncertainty quantification and high performance computing in context of applications from the natural sciences, engineering and beyond. It is embedded in the research group on Scientific Computing and High Performance Computing. For more details, see https://www.peter-zaspel.de/ and https://hpc.uni-wuppertal.de.

A successful applicant is expected to have a Master’s degree (or equivalent) in computer science, mathematics, physics, data science or a similar discipline, strong analytical skills in the context of machine learning and/or (numerical) mathematics, excellent proficiency in a programming language (preferably Python or C/C++) and interest in developing novel, hardware-aware training methods on GPUs in the context of kernel-based / Gaussian Process machine learning for simulation applications. Experience in hardware-aware programming on GPUs or parallelization of algorithms is an advantage. A good command of English is essential, both as the local working language and because of our international collaborations. We look for a competent personality with initiative and commitment, who has the ability to work independently and who enjoys teaching (support).

We offer a 3 year PhD position. The salary will be paid in accordance with the Collective Agreement for the Public Service of the Federation (Tarifvertrag des öffentlichen Dienstes, TVöD Bund), with salary level 13 (75%). The position has teaching (support) duties. The place of employment will be Wuppertal, Germany.

The position is available immediately and applications will be considered until January 22, 2024. Applicants have to use the online portal of the University of Wuppertal https://stellenausschreibungen.uni-wuppertal.de, where the job offer is available under the ID 23417. There, you need to submit a letter of motivation, a CV, copies of transcripts and optionally a copy of your MSc thesis (all as one PDF). International applicants should switch to the English version of the web page via the flag symbol on the top of the page. Then, applicants will find the row with the above given ID and click on “Jetzt bewerben” on the right-hand side of that row. The remaining process is available in English. If you have questions on the position or with respect to the application process please contact Prof. Peter Zaspel via zaspel(at)uni-wuppertal.de.

New expert aims to develop innovative research software

Peter Zaspel is the new Professor of Software for Data-Intensive Applications at Bergische Universität Wuppertal.

Researchers in science and engineering work with large amounts of data. These include measurements, simulations and combinations thereof, which require an extensive IT infrastructure. The goal of Peter Zaspel’s research is the development of efficient and extensible software as well as methods for data-intensive applications in the natural sciences, engineering and beyond. Fields of application for such programs include, for example, quantum chemistry, climate reconstruction and drug research.

Please find the full press release here.


PhD position “Digital Ice-Cores” – Paleo-Climate reconstruction using Bayesian modeling (m/f/d)

Project Goals

To understand future climate change, it is critical to fully understand the past and present climate system. Information about this is encoded in paleoclimate archives such as ice-cores or bore-hole temperatures reflecting the temperature of glaciers such as Antarctica. However, it is difficult to de-convolve and combine this knowledge, as proxy records are often time-uncertain, noisy and sparse, and record the climate in very different ways.

Recently, there have been advances in understanding the ice-core recording process. Based on this, proxy system models enabling the production of “digital” cores from climate model simulations, i.e. numerical forward models, were developed. Further, bore-hole temperatures and isotope data deliver complementary information. Still, the challenge to optimally invert the process from the climate to the ice-core record and to reconstruct the climate state from such sparse, noisy and diverse data is largely unresolved.

This PhD project aims to improve on the borehole inversion methods as well as on the climate field reconstruction technique by means of modern techniques in numerical modeling and Bayesian inference to optimally combine the various information sources. In collaboration with data science and statistics experts and experts from paleo-climate research, this project aims to develop and test a new reconstruction technique that could provide better quantitative access to paleo-climate data and insight into the past climate evolution.

Tasks

You will

  • Set up, i.e. discretize and efficiently implement, forward models, given by advection-diffusion equations, for glacier bore-hole temperature profiles.
  • Develop and test inversion methods for the relationship of water isotopes and temperature using Bayesian inference.
  • Use advanced Bayesian hierarchical modeling techniques to combine the information from water-isotopes and borehole temperatures to reconstruct the local temperature evolution. Ideally, this will be extended to spatio-temporal field reconstructions making use of the spatial physical covariance structure from reanalysis data.
  • Test this model using surrogate data from simulated (‘digital’) ice-cores.
  • Based on simulated cores and the developed framework, optimize the sampling strategy and show the feasibility and limitations of combined isotope and borehole thermometry to reconstruct the temperature evolution of Antarctica.

Requirements

  • Strong analytical, mathematical and statistical skills
  • Proficiency in a programming language (preferably Python or C/C++)
  • A degree (Master, Diploma) in mathematics, computer science, physics, climate sciences, or a related field
  • Excellent English language skills, both written and spoken
  • Experience in numerical analysis and Bayesian methods is a benefit
  • Previous experience with ice-cores or (paleo)climate research is an advantage

Further information

For further information please contact Thomas Laepple (tlaepple at awi dot de) or Peter Zaspel (p dot zaspel at jacobs-university.de). The place of employment will be Jacobs University Bremen.

This position is limited to 3 years. The salary will be paid in accordance with the German “Collective Agreement for the Public Service of the Federation” (Tarifvertrag des öffentlichen Dienstes, TVöD Bund), up to salary level 13 (100%).

All doctoral candidates will be members of AWI’s postgraduate program POLMAR and thus benefit from a comprehensive training program and extensive support measures.

PhD position in “Bayesian Chronology-Modeling for Paleoclimate archives”

Background

This cutting-edge project at the interface of data-science, statistics and geochronology aims to develop and apply a methodological framework needed to fuse chronologic information from different Earth components into one consistent picture. Different Bayesian statistical models will be combined to synthesize absolute and relative age-information across published timeseries in a flexible and extendable way. This will not only result in a new approach to investigate complex systems in a data-driven way, but also in a consistently dated network of past environmental changes.

This project is part of the Helmholtz School for Marine Data Science (MarDATA) which aims to define and educate a new type of “marine data scientist” by introducing and embedding researchers from computer sciences and mathematics into ocean sciences, covering a broad range from supercomputing and modelling, (bio)informatics, robotics, to statistics and big data methodologies.

Tasks

  • Develop a Bayesian approach to synthesize stratigraphic information from different paleoclimate records
  • Implement and couple forward models of sediment and tracer deposition from the literature
  • Apply the coupled model to existing environmental records
  • Present at international conferences
  • Publish in peer-reviewed scientific journals

Requirements

  • a degree (Master’s, Diploma) in mathematics, computer science, physics, climate sciences, or a related field
  • strong analytical, mathematical and statistical skills
  • proficiency in a programming language (preferably Python, C/C++ or R)
  • excellent English language skills, both written and spoken
  • experience in Bayesian statistics and/or machine learning techniques is a benefit
  • previous experience with (paleo)climate research is an advantage

Further information

For further information please contact Dr. Florian Adolphi, Tel: +49(471)4831-1008, Florian dot Adolphi at awi dot de or Prof. Dr. Peter Zaspel, Tel: +49(421)200-3051, p dot zaspel at jacobs-university dot de.

This position is limited to 3 years. The salary will be paid in accordance with the Collective Agreement for the Public Service of the Federation (Tarifvertrag des öffentlichen Dienstes, TVöD Bund), up to salary level 13 (100%).

The place of employment will be Jacobs University, Bremen, with the option of short-term/interim stays at Alfred Wegener Institute, Bremerhaven.

The candidate will participate in the Helmholtz School for Marine Data Science MarDATA. All doctoral candidates will be members of AWI’s postgraduate program POLMAR or another graduate school and thus benefit from a comprehensive training program and extensive support measures.

Research Project on Machine Learning for Molecular Systems

Priority Programs of the German Research Foundation (DFG) are intended to provide tangible impetus for the further development of science in important topics. Two researchers from Jacobs University Bremen are involved in a newly launched program on machine learning for molecular systems in physics and chemistry: Peter Zaspel, Professor of Computer Science, and Ulrich Kleinekathöfer, Professor of Theoretical Physics. This also gives them access to a nationwide network of experts for topical exchange.

Please find the full press release here.