Multi-fidelity machine learning

Imagine that we are not just given training samples (i.e. inputs and outputs), but that each sample also comes with a “cost” and a given “accuracy”.

This project investigates how to find machine learning models that achieve not only a minimal loss / maximal accuracy but also a minimal overall cost. That is, we face a much more complex optimization task.

The project touches on topics in:

  • various regression ML methods
  • optimization
  • approximation theory

WARNING: This is again a rather mathematical, but very beautiful topic. It could range from a very applied view to a fairly theoretical one.

This topic is highly relevant to current research and has important applications in machine learning for simulation and in other fields.
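The trade-off between cost and accuracy described above can be illustrated with a small two-fidelity sketch. Everything in it is an assumption made for illustration: the toy functions `f_lo` and `f_hi`, the per-sample costs `cost_lo` and `cost_hi`, and the choice of a simple additive correction model on top of kernel ridge regression. It is not the method the project would necessarily develop, only one common way to combine many cheap samples with a few expensive ones.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def krr_fit(x, y, lam=1e-6):
    """Solve (K + lam * I) alpha = y for the kernel ridge regression weights."""
    K = gaussian_kernel(x, x)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new):
    """Evaluate the fitted kernel model at the points x_new."""
    return gaussian_kernel(x_new, x_train) @ alpha

# Hypothetical toy problem: a cheap, less accurate model and an expensive one.
def f_lo(x):
    return np.sin(8.0 * x)

def f_hi(x):
    return np.sin(8.0 * x) + 0.5 * x

x_lo = np.linspace(0.0, 1.0, 40)   # many cheap samples
x_hi = np.linspace(0.0, 1.0, 4)    # few expensive samples
cost_lo, cost_hi = 1.0, 100.0      # assumed cost per sample
total_cost = cost_lo * len(x_lo) + cost_hi * len(x_hi)

# Step 1: learn the cheap model from the low-fidelity samples.
alpha_lo = krr_fit(x_lo, f_lo(x_lo))

# Step 2: learn only the difference between both fidelities
# from the few expensive samples.
delta = f_hi(x_hi) - krr_predict(x_lo, alpha_lo, x_hi)
alpha_delta = krr_fit(x_hi, delta)

x_test = np.linspace(0.0, 1.0, 200)
pred_mf = (krr_predict(x_lo, alpha_lo, x_test)
           + krr_predict(x_hi, alpha_delta, x_test))

# Baseline: use the expensive samples alone.
alpha_sf = krr_fit(x_hi, f_hi(x_hi))
pred_sf = krr_predict(x_hi, alpha_sf, x_test)

rmse_mf = np.sqrt(np.mean((pred_mf - f_hi(x_test)) ** 2))
rmse_sf = np.sqrt(np.mean((pred_sf - f_hi(x_test)) ** 2))
print(f"total cost: {total_cost}, multi-fidelity RMSE: {rmse_mf:.4f}, "
      f"single-fidelity RMSE: {rmse_sf:.4f}")
```

In this toy setting the correction is much smoother than the target itself, which is why a handful of expensive samples suffices. How to choose the number of samples per fidelity for a given cost budget is exactly the kind of optimization question the project addresses.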

Some first links:

Machine Learning in Quantum Chemistry

In this research project, we are interested in the prediction of properties for molecules.

The project touches the following fields:

  • prediction of properties of molecules by quantum chemistry simulation software
  • optimal feature representation for molecules in machine learning
  • various types of machine learning techniques

This has tremendously important applications in fields such as virtual material design, drug discovery, …

The actual work can range from utilizing and comparing existing machine learning methods in that field to developing completely new approaches.

Here are some links:

Machine Learning in Fluid Mechanics

Computational Fluid Mechanics is a field of engineering in which computers are used to solve the mathematical equations that describe the behavior of fluids such as air or water. This solution process is typically called “simulation”. The objective of this research topic is to investigate the application of Machine Learning models in fluid simulations. That is, the typically expensive simulation process is replaced by a Machine Learning problem.

This research project touches the following topics:

  • modeling of fluids by the Navier-Stokes equations
  • use of an existing Navier-Stokes fluid solver to generate training snapshots
  • further development of machine learning techniques for prediction of outcomes of fluid simulations
  • time-series prediction / quantity of interest prediction / spatial prediction

This is another hot topic, at least in the “simulation business”. Research-relevant questions are:

  • Can we find ML models that nicely predict bifurcation-like behavior?
  • Can we use ML models as sub-models (homogenization-like) in bigger models?

Here are some links:

Wavelets as Features for Time Series ML

In this project, the idea would be to become more familiar with the following concepts:

  • time series data
  • wavelet analysis to generate features
  • several types of machine learning models
    • kernel ridge regression
    • multilayer perceptron
    • radial basis function networks
    • transfer learning using some well-known image classifier

Application data can range from quantum chemistry to finance to health, so the scope is very broad.
The main objective would be to start with a “black box” approach, i.e. using an existing implementation of a continuous wavelet filter bank, and then to develop a deeper understanding of how the choice of parameters in the wavelet filter bank influences the prediction quality.
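To make the idea of wavelet features concrete, here is a minimal sketch of a continuous wavelet transform written directly in NumPy. It is not the existing filter bank implementation mentioned above. The function names `morlet` and `cwt_features`, the Morlet wavelet with `w0 = 6`, the list of scales, and the test signal are all assumptions chosen for illustration.

```python
import numpy as np

def morlet(t, scale, w0=6.0):
    """Complex Morlet wavelet sampled at times t and dilated by `scale`."""
    u = t / scale
    return np.exp(1j * w0 * u) * np.exp(-0.5 * u**2) / np.sqrt(scale)

def cwt_features(signal, scales, dt=1.0):
    """Magnitude of the wavelet coefficients, shape (len(scales), len(signal))."""
    n = len(signal)
    t = (np.arange(n) - n // 2) * dt
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Correlation with the wavelet, written as a convolution with
        # the time-reversed complex conjugate.
        kernel = np.conj(morlet(t, s))[::-1]
        out[i] = np.abs(np.convolve(signal, kernel, mode="same")) * dt
    return out

# Hypothetical time series: a single oscillation at 0.3 rad per sample.
signal = np.sin(0.3 * np.arange(256))
scales = np.array([5.0, 10.0, 20.0, 40.0])

coeffs = cwt_features(signal, scales)
# One simple feature vector: the mean coefficient magnitude per scale.
features = coeffs.mean(axis=1)
print(features)
```

For this wavelet, a scale `s` responds most strongly to an angular frequency of roughly `w0 / s`, so the scale 20 dominates for the signal above. The choice of scales and of `w0` is exactly the kind of parameter whose influence on the prediction quality the project would study.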

A first reference:

Radial Basis Function networks

This topic combines prior knowledge on kernel ridge regression with neural networks. The following content will be considered:

  • (deep) neural networks
  • kernel ridge regression
  • radial basis function networks

The objective would be to study the relationship between the predictive power of kernel ridge regression and that of radial basis function networks, using given data from quantum chemistry or another relevant scientific application.
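The close relationship between the two models can be seen in a small sketch. Kernel ridge regression places one Gaussian basis function on every training point, while a radial basis function network uses a smaller set of centers. The toy data, the kernel width, and the simplification of keeping the centers fixed and fitting only the output layer by least squares are assumptions for illustration. In a full RBF network the centers and widths would typically be trained as well.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

# Hypothetical toy data standing in for a real scientific data set.
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2.0 * np.pi * x_train)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2.0 * np.pi * x_test)

# Kernel ridge regression: one basis function per training point,
# weights from the regularized linear system (K + lam * I) alpha = y.
lam = 1e-6
K = gaussian_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
pred_krr = gaussian_kernel(x_test, x_train) @ alpha

# RBF network with 10 fixed centers: only the linear output layer
# is fitted, here by ordinary least squares.
centers = np.linspace(0.0, 1.0, 10)
Phi = gaussian_kernel(x_train, centers)
weights, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
pred_rbf = gaussian_kernel(x_test, centers) @ weights

rmse_krr = np.sqrt(np.mean((pred_krr - y_test) ** 2))
rmse_rbf = np.sqrt(np.mean((pred_rbf - y_test) ** 2))
print(f"KRR RMSE: {rmse_krr:.2e} (50 basis functions)")
print(f"RBF RMSE: {rmse_rbf:.2e} (10 basis functions)")
```

The comparison of interest is how many centers the network needs to match kernel ridge regression, and whether trained centers help on real data.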

A few first links:

Neural Network Compression by Low Rank Approximation

This is a very technical topic, which I would be interested to explore. It involves:

  • neural networks
  • low rank matrix approximation

Here the idea is to speed up neural network inference, and maybe even training, by replacing the weight matrices of fully connected layers with low-rank approximations.
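A minimal sketch of this idea uses a truncated singular value decomposition to split one weight matrix into two thin factors. The layer sizes, the rank `r`, and the synthetic weight matrix (constructed to be close to low rank) are assumptions for illustration. Weight matrices of trained networks are not guaranteed to be this compressible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully connected layer: 512 inputs, 256 outputs.
# The weights are built as a rank-16 matrix plus small noise.
m, n, r = 256, 512, 16
W = (rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
     + 0.01 * rng.standard_normal((m, n)))

# Truncated SVD: keep only the r largest singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # shape (m, r)
B = Vt[:r, :]          # shape (r, n)

# One large matrix-vector product becomes two small ones.
x = rng.standard_normal(n)
y_full = W @ x
y_low = A @ (B @ x)

rel_err = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)
params_full = m * n
params_low = r * (m + n)
print(f"relative error: {rel_err:.2e}")
print(f"parameters: {params_full} -> {params_low}")
```

Storing and applying the two factors costs `r * (m + n)` numbers instead of `m * n`, so the saving grows as the rank shrinks relative to the layer size.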

WARNING: This is again a very mathematical topic.
References to be collected:

Fast Kernel Ridge Regression by matrix approximation techniques

The topic of this project is the efficient training of Machine Learning models by Kernel Ridge Regression.

Relevant content will be:

  • Kernel Ridge Regression
  • iterative solvers for linear systems
  • matrix approximation techniques:
    • low rank approximation (SVD, ACA, …)
    • ASKIT
    • Hierarchical Matrices

Application data should be large-scale and science-related. A possible starting point would be quantum chemistry data that I have access to.
The appeal of this project would be to further develop non-exact solvers for linear systems and to analyze their impact on the prediction quality of Kernel Ridge Regression. This is highly relevant to current research.
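The effect of a non-exact solver can be sketched on a small example: the linear system of Kernel Ridge Regression is solved once directly and twice with a hand-written conjugate gradient iteration, stopped at a loose and at a tight tolerance. The toy data, the kernel width, the regularization parameter, and both tolerances are assumptions for illustration. None of the matrix approximation techniques listed above is used here, since the kernel matrix is still formed exactly.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.1):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def conjugate_gradient(A, b, tol, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Stops once the relative residual drops below `tol`.
    Returns the approximate solution and the iteration count.
    """
    x = np.zeros_like(b)
    r = b.copy()          # residual for the zero initial guess
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        Ap = A @ p
        step = rs / (p @ Ap)
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Hypothetical noisy toy data.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 200)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(200)
x_test = np.linspace(0.0, 1.0, 100)

lam = 1e-2
A = gaussian_kernel(x_train, x_train) + lam * np.eye(len(x_train))
K_test = gaussian_kernel(x_test, x_train)

alpha_exact = np.linalg.solve(A, y_train)
alpha_loose, it_loose = conjugate_gradient(A, y_train, tol=1e-2)
alpha_tight, it_tight = conjugate_gradient(A, y_train, tol=1e-10)

pred_exact = K_test @ alpha_exact
diff_loose = np.max(np.abs(K_test @ alpha_loose - pred_exact))
diff_tight = np.max(np.abs(K_test @ alpha_tight - pred_exact))
print(f"loose: {it_loose} iterations, max prediction difference {diff_loose:.2e}")
print(f"tight: {it_tight} iterations, max prediction difference {diff_tight:.2e}")
```

The question the project would pursue is how loose the solve may be before the prediction quality degrades noticeably, and how this interacts with an approximated kernel matrix.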

WARNING: Some flavors of this topic (e.g. hierarchical matrices) require a profound mathematical background.

Some first links: