8–12 July 2024
BÂTIMENT D’ENSEIGNEMENT MUTUALISÉ (BEM)
Time zone: Europe/Paris

Bootcamp Syllabus

Session 1 - O. Colliot - Introduction to ML with a focus on validation
------------------------------------------------------------------

Goal: Introduce the basics of ML and describe in detail how to perform validation.

  • History and terminology
  • Basic ML problem setup (model, loss, learning procedure, features)
  • Generalization in ML (overfitting, underfitting and model selection)
  • Validation (performance metrics, validation strategies, statistical analysis)

 

Session 2 - G. Lemaitre - The scikit-learn API
----------------------------------------------

Goal: Introduce the `scikit-learn` API, with a focus on practical insights into model validation and selection.

  • Overview of a simple cross-validation scheme: k-fold
  • Overview of metrics (regression, classification)
  • Model selection through the `*SearchCV` estimators (e.g. `GridSearchCV`, `RandomizedSearchCV`)
  • Cross-validation in complex settings (stratification, groups, non-i.i.d. data)
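
As a rough illustration of the topics listed above, the following minimal sketch chains plain k-fold cross-validation, a grid search, and group-aware splitting. The synthetic dataset, the logistic-regression model, the `C` grid, and the `groups` array (five consecutive samples per hypothetical subject) are all assumptions made for this example; they are not taken from the session material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    GroupKFold,
    KFold,
    StratifiedKFold,
    cross_val_score,
)

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 1. Plain k-fold cross-validation with an explicit metric.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
print(f"k-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# 2. Model selection: grid search over the regularization strength C,
#    evaluated on stratified folds so each fold keeps the class balance.
search = GridSearchCV(
    model,
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X, y)
print("best C:", search.best_params_["C"])

# 3. Grouped data: samples sharing a group (e.g. the same subject) must
#    never be split between the training and the test fold.
groups = np.repeat(np.arange(40), 5)  # hypothetical: 40 subjects x 5 samples
group_cv = GroupKFold(n_splits=5)
group_scores = cross_val_score(model, X, y, cv=group_cv, groups=groups)
print(f"grouped accuracy: {group_scores.mean():.3f}")
```

Note that the score reported by the grid search is itself optimistically biased, since the same folds were used to pick `C`; an unbiased estimate would require an outer validation loop or a held-out test set.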


Session 3 - T. Moreau - Learning with non-tabular data
------------------------------------------------------

Goal: Introduce the different types of data, with a focus on time series, and the methodologies suited to each type.

  • Overview of the different types of data: tabular data, time series, images, graphs, signals.
  • Overview of the specific problems and jargon with time series and signals.
  • How to get back to a "classical" ML framework?
  • Practical illustrations with time series.
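
One common way of getting back to a "classical" ML framework, sketched here under assumptions, is to slice a time series into fixed-length windows so that each window becomes one row of a tabular dataset. The helper name `make_windows` and its `window` and `horizon` parameters are hypothetical and chosen for this example; the session may well rely on other techniques.

```python
import numpy as np


def make_windows(series, window, horizon=1):
    """Turn a 1-D series into a tabular (X, y) pair.

    Each row of X holds `window` consecutive past values; the matching
    entry of y is the value `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([series[i : i + window] for i in range(n_rows)])
    first_target = window + horizon - 1
    y = series[first_target : first_target + n_rows]
    return X, y


# Toy series 0, 1, ..., 9 (assumption, for illustration only).
series = np.arange(10.0)
X, y = make_windows(series, window=3)
print(X.shape, y.shape)  # (7, 3) (7,)

# Chronological split: the test rows come strictly after the training rows,
# which avoids leaking future values into training.
n_train = 5
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
```

Once the series is in this form, any tabular estimator can be fitted on `(X_train, y_train)`; the split is kept chronological rather than shuffled because consecutive windows overlap and are not independent.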


Session 4 - R. Menegaux - Intro to deep learning
------------------------------------------------

Goal: Describe the main types of deep learning architectures and apply them to a concrete example from the life sciences.

  • Introduction: what is deep learning and why is everyone doing it?
  • Overview of the main types of deep learning architectures: MLPs, convolutional networks, and transformers. When to use which?
  • Overview of the different training and regularization techniques.
  • Practical session on a simplified open-research problem.