### Overview

#### Official course description

Machine learning (ML) is about algorithms which are fed with (large quantities of) real-world data, and which return a compressed “model” of the data. An example is the “world model” of a robot: the input data are sensor data streams, from which the robot learns a model of its environment — needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English — useful, for instance, in automated speech recognition systems. There exist many formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges which are common to all of these formalisms and algorithms. The lecture introduces such fundamental concepts and illustrates them with a choice of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, neural networks, hidden Markov models). Furthermore, the lecture also provides a refresher of required mathematical material from probability theory and linear algebra.

#### Literature

**Primary text**:

Hastie, Tibshirani, Friedman: “The Elements of Statistical Learning” (Second Edition), Springer

**Recommended reading**:

Shalev-Shwartz, Ben-David: “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press

**Further useful references for the math background:**

Linear algebra and probability reviews available at http://cs229.stanford.edu/syllabus.html

#### Grading

The grades for this lecture will be determined as follows:

- final exam (100 %)

There will be no other formal requirements.

#### Final exam

All rules, times, etc. are consolidated in the Final exam announcement.

### Lecture style, tutorials, homeworks and further information

#### Online teaching

Online classes are carried out as follows:

- **Video recordings of the class content** that would normally have been presented in the lecture slot. The videos can either be watched via the embedded player or downloaded by clicking on the name of the video.
- Online quizzes that can be carried out and repeated at any time.
- The slides that were uploaded for each lecture anyway.
- **Question & Answer video-conferencing sessions**

The video-conferencing sessions take place in Microsoft Teams in the respective course team on:

- Wednesdays, starting at 9:00
- Thursdays, starting at 17:30

The instructor will keep the meeting running for at least ten minutes. If no student shows up within these ten minutes, the meeting is closed.

#### Online tutorials

- Offered via video conferencing in MS Teams in the “Team” of this lecture
- Weekly tutorial classes offered by TAs:
  - Mondays, 15:45–17:00
  - Wednesdays, 17:15–18:30
- Content:
  - Repetition and discussion of lecture content
  - Discussion of upcoming and graded homework
- No mandatory attendance

→ attendance highly recommended in order to be successful

#### Homeworks

**Flavour**

- one assignment sheet per week, published on Moodle
- contents of each assignment sheet:
  - ∼3 tasks: theory (manually computing predictors, proofs, …)
  - ∼1 task: programming (implementing ML algorithms)

→ programming language: C/C++

**Submission**

- weekly deadline: Friday, 12:00 (noon)
- submission format:
  - theory: via Moodle
  - programming: via Moodle
- submissions in groups of 1–3

→ depending on class size and homework participation

→ might be subject to adjustments

**Grading (of non-mandatory homeworks)**

- exercises graded with points by TAs
- The points received for homework have no influence on the final grade, i.e. doing the exercises is not mandatory.
- However: students who do not achieve at least 50% of the exercise points should expect that they have not had enough training in the material and will most likely struggle in the final exam.

### Code demos

- kNN regression
- kNN classification
- Linear regression (on 1d input)
- Linear regression (on 2d input)
- Bivariate Gaussian density
- Z score
- Training error
- Generalization error approximation by validation set approach
- Generalization error approximation by different methods for synthetic data
- Generalization error approximation by different methods for cancer data
- Bias–Variance tradeoff
- Linear vs. non-linear classification
- Linear vs. quadratic discriminant analysis
- kMeans clustering
- PCA compression
- PCA synthesis
- Regression using quadratic polynomial basis expansion (1D)
- Regression using quadratic polynomial basis expansion (2D)
- Regression using kernel model
- Gradient Descent methods applied in linear regression
- Neural Network regression in Keras
- Neural Network weight initialization
- Neural Network batch normalization
- Neural Network image classification with MLP and CNN
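
To give a flavour of what these demos cover, here is a minimal sketch of the first one, kNN regression, on synthetic 1-D data. This is an illustrative reconstruction, not the actual demo code; all names are made up.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k=5):
    """kNN regression: average the targets of the k nearest training points."""
    dists = np.abs(x_train - x_query)   # distances in the 1-D input space
    nearest = np.argsort(dists)[:k]     # indices of the k closest points
    return y_train[nearest].mean()      # prediction = local average

# Synthetic data: noisy sine curve
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 50)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(50)

print(knn_regress(x_train, y_train, x_query=1.0))  # roughly sin(1.0)
```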

### Lecture content

#### Content until March 12 (i.e. in-person teaching)

- Lecture slides of February 5
- Lecture slides of February 6
- Lecture slides of February 12
- Lecture slides of February 13
- Lecture slides of February 19
- Lecture slides of February 20
- Lecture slides of February 26
- Lecture slides of February 27
- Lecture slides of March 4 (repetition class)
- Lecture slides of March 5 (second part of repetition)
- Lecture slides of March 5
- Lecture slides of March 11
- Lecture slides of March 12

#### Material for March 18

- Lecture slides (part 1/3 and 2/3)
- Lecture slides (part 3/3)
- Video: Introduction to Bias vs. Variance
- Video: Proof of Bias-Variance decomposition
- Video: Bias-Variance decomposition for kNN regression
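
For reference, the decomposition proved in these videos is the standard one (cf. the primary text): for a fixed input $x_0$ with $Y = f(x_0) + \varepsilon$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the expected squared prediction error splits into noise, squared bias, and variance,

```latex
\mathbb{E}\big[(Y - \hat f(x_0))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat f(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x_0) - \mathbb{E}[\hat f(x_0)])^2\big]}_{\text{variance}}
```

where the expectation is taken over both the noise and the training set used to fit $\hat f$.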

#### Material for March 19

- Lecture slides (part 1/6 and 2/6)
- Lecture slides (part 3,4,5,6 of 6)
- Video: Proof of Bias-Variance decomposition for kNN regression
- Video: Demo for Bias-Variance Tradeoff in kNN regression
- Video: Introduction to estimation of prediction error
- Video: What is training error?
- Video: Training error by example
- Video: Why training error is still important

#### Material for March 25

- Lecture slides
- Video: What is generalization error?
- Video: Expected generalization and the strange T
- Video: Introduction to empirical error estimation
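
The “strange T” in the second video is the training set: in the notation of the primary text, the generalization error of a model $\hat f$ fitted on a fixed training set $T$, and its expectation over training sets, are

```latex
\mathrm{Err}_T = \mathbb{E}_{X,Y}\big[L(Y, \hat f(X)) \,\big|\, T\big],
\qquad
\mathrm{Err} = \mathbb{E}_T\big[\mathrm{Err}_T\big].
```

Practical estimators such as the validation-set approach and cross-validation tend to estimate $\mathrm{Err}$ rather than $\mathrm{Err}_T$.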

#### Material for March 26

- Lecture slides (part 1/4)
- Lecture slides (part 2,3,4 of 4)
- Video: More advanced generalization error estimators
- Video: Introduction to classification
- Video: Linear methods in classification
- Video: Classification by linear regression

#### Material for April 1

- Lecture slides
- Video: Introduction into Linear Discriminant Analysis
- Video: The theorem behind LDA
- Video: Proving the theorem behind LDA
- Video: The LDA algorithm
- Video: How to measure prediction error & further classification methods
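
As a pointer to where these videos lead: under the LDA model (Gaussian classes sharing one covariance matrix $\Sigma$), the theorem yields the linear discriminant functions

```latex
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
            - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
            + \log \pi_k ,
```

and the LDA algorithm assigns $x$ to the class $k$ with the largest $\delta_k(x)$, with the means $\mu_k$, covariance $\Sigma$, and class priors $\pi_k$ estimated from the training data.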

#### Material for April 2

- Lecture slides
- Video: Introduction to unsupervised learning
- Video: Introduction to clustering
- Video: Towards efficient combinatorial clustering
- Video: Proof of the loss reformulation
- Video: The K-means clustering algorithm
- Video: Examples
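
A minimal sketch of the K-means algorithm from these videos, assuming the data is a NumPy array of shape (n, d); empty clusters are not handled, and the actual demo code may differ:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: alternate assignment and update steps until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: the loss can no longer decrease
        centroids = new_centroids
    return labels, centroids
```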

#### Material for April 15

- Lecture slides
- Video: Introduction to PCA & Compression motivation
- Video: Compression motivation cont.
- Video: Compression example
- Video: Data predictor motivation
- Video: Data predictor motivation cont.
- Video: Data predictor – Simple example

#### Material for April 16

- Lecture slides (part 1,2,3/7)
- Lecture slides (part 4,5,6,7/7)
- Video: Data predictor – Digits example
- Video: How to compute PCA
- Video: PCA algorithms
- Video: Building more complex models by basis expansions
- Video: Least squares regression for basis expansion models
- Video: Quadratic polynomial model
- Video: General polynomial models
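
A minimal sketch of how PCA compression and reconstruction can be computed via the SVD of the centered data matrix, along the lines of the “How to compute PCA” video (illustrative code, not the demo itself):

```python
import numpy as np

def pca_compress(X, m):
    """Project centered data onto the first m principal directions."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:m]                           # first m principal directions
    scores = Xc @ components.T                    # m-dimensional codes
    return scores, components, mu

def pca_reconstruct(scores, components, mu):
    """Map low-dimensional codes back to the original space (lossy synthesis)."""
    return scores @ components + mu
```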

#### Material for April 22

- Lecture slides (part 1,2/5)
- Lecture slides (part 3,4,5/5)
- Video: Kernel-based models
- Video: Least squares regression for kernel-based models
- Video: Motivation for Ridge Regression
- Video: Theory of Ridge Regression
- Video: Kernel Ridge Regression
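
For orientation, the closed forms discussed in these videos are the standard ones: ridge regression shrinks the least-squares solution via a penalty $\lambda > 0$,

```latex
\hat\beta^{\text{ridge}}
  = \arg\min_{\beta}\,\|y - X\beta\|^2 + \lambda\|\beta\|^2
  = (X^{\top}X + \lambda I)^{-1}X^{\top}y ,
```

and kernel ridge regression solves the analogous problem in terms of the kernel matrix $K$, giving coefficients $\alpha = (K + \lambda I)^{-1} y$ and predictions $\hat f(x) = \sum_i \alpha_i\, k(x_i, x)$.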

#### Material for April 23

- Lecture slides for April 23
- Video: Introduction to Neural Networks and the multilayer perceptron
- Video: Definition and graphical representation of the MLP
- Video: Activations and how to do regression with the MLP
- Video: How to do classification with the MLP
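
The “Neural Network regression in Keras” demo listed above suggests a setup along these lines; this is a hypothetical minimal sketch, not the course’s actual code:

```python
import numpy as np
from tensorflow import keras

# Synthetic 1-D regression data: noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

# A small MLP: two hidden layers, one linear output unit for regression
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                      # no activation: linear output
])
model.compile(optimizer="adam", loss="mse")     # squared-error loss
model.fit(X, y, epochs=50, batch_size=32, verbose=0)
```

For classification, one would instead use a softmax output layer and a cross-entropy loss, as covered in the last video.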

#### Material for April 29

- Lecture Slides
- Video: Some important remarks on MLPs
- Video: Our first “deep” network
- Video: Introduction to Gradient Descent methods
- Video: Understanding Gradient Descent

#### Material for April 30

- Lecture slides
- Video: Stochastic Gradient Descent
- Video: Mini-batch Gradient Descent and some remarks
- Video: Example of using Gradient Descent for Linear Regression
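
A minimal sketch of the linear-regression example from the last video, using plain batch gradient descent on the mean squared error (illustrative, not the course’s code); the stochastic and mini-batch variants replace the full gradient with one computed on a random subset of the data:

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.01, n_steps=1000):
    """Fit a linear model by batch gradient descent on the MSE."""
    Xb = np.hstack([np.ones((len(X), 1)), X])       # prepend a bias column
    w = np.zeros(Xb.shape[1])                       # start at the origin
    for _ in range(n_steps):
        grad = 2.0 / len(y) * Xb.T @ (Xb @ w - y)   # gradient of the MSE
        w -= lr * grad                              # step against the gradient
    return w                                        # [intercept, slope(s)]

# Quick check on data with known intercept 1 and slope 2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.05 * rng.standard_normal(200)
print(gd_linear_regression(X, y))  # approximately [1.0, 2.0]
```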

#### Material for May 6

- Lecture slides
- Video: Introduction towards backpropagation
- Video: Forward propagation
- Video: A central statement on how to compute gradient entries
- Video: The backpropagation formula
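
For reference, in one common notation (the course’s own may differ): with pre-activations $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ and activations $a^{(l)} = \sigma(z^{(l)})$, the “central statement” is that all gradient entries follow from the errors $\delta^{(l)} = \partial L / \partial z^{(l)}$, which the backpropagation formula computes layer by layer from the output back to the input:

```latex
\delta^{(L)} = \nabla_{a^{(L)}} L \odot \sigma'(z^{(L)}),
\qquad
\delta^{(l)} = \big(W^{(l+1)}\big)^{\top}\delta^{(l+1)} \odot \sigma'(z^{(l)}),
\qquad
\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)}\,\big(a^{(l-1)}\big)^{\top}.
```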

#### Material for May 7

- Lecture slides (parts 1,2 / 5)
- Lecture slides (parts 3,4,5 / 5)
- Video: Proof of the backpropagation formula
- Video: The backpropagation algorithm
- Video: Training an FFNN
- Video: The vanishing / exploding gradients problem
- Video: The batch normalization layer (1)

#### Material for May 13

- Lecture slides
- Video: The batch normalization layer (2)
- Video: Overfitting in FFNN training
- Video: Motivation for CNNs
- Video: The convolutional layer
- Video: The pooling layer and CNN architectures
- Video: Image classification by MLP and CNN
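
For reference, the batch normalization layer from these videos standardizes each activation over the current mini-batch $B$ and then applies a learned scale $\gamma$ and shift $\beta$ ($\epsilon$ is a small constant for numerical stability):

```latex
\hat x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
y_i = \gamma\,\hat x_i + \beta .
```

Keeping activation scales stable across layers in this way counteracts the vanishing/exploding gradients problem from the May 7 videos.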

#### Final exam discussion lecture (May 14)

The final lecture will again be a live lecture, without recording, and will take place in the original lecture slot.