Reinforcement Learning

Code

USEET8

Field

Computer science, telecommunications, digital media, cybersecurity

Description

This course introduces machine learning techniques based on stochastic approximations and Markov decision process (MDP) models, such as SARSA, Q-learning, and policy gradient. Two homework assignments focus on implementing these techniques, so that students master them through hands-on practice. A project in teams of two or three students addresses more advanced techniques and problems in RL and, more generally, the application of Markov theory to modeling and optimization.

Lectures:

Course Overview. Introduction to Markov decision theory, stochastic approximations, and reinforcement learning;
Stochastic approximations: the Robbins-Monro algorithm;
Criteria for convergence;
Application to admission control problems;
Markov decision processes: definitions, average cost and discounted cost;
Bellman equations. Solutions based on Dynamic Programming;
Monte Carlo methods for Reinforcement Learning;
Temporal-difference methods: SARSA and Q-Learning;
Proof of convergence of Q-Learning;
Policy gradient: REINFORCE;
Actor-critic methods;
Multi-armed bandits;
Deep reinforcement learning.
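
As an illustration of the stochastic approximation lectures listed above, here is a minimal sketch of a Robbins-Monro iteration in Python. It is not taken from the course material: the target function g(theta) = theta - 2, the Gaussian noise, the step sizes 1/n, and the names `robbins_monro` and `noisy_g` are all assumptions chosen for the example.

```python
import random


def robbins_monro(noisy_g, theta0, n_steps=20000, seed=0):
    """Seek a root of g(theta) = E[noisy_g(theta)] from noisy evaluations only."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        # Step sizes 1/n: their sum diverges while the sum of squares converges,
        # which is the classical Robbins-Monro step-size condition.
        step = 1.0 / n
        theta -= step * noisy_g(theta, rng)
    return theta


def noisy_g(theta, rng):
    # Hypothetical example: g(theta) = theta - 2, observed with unit Gaussian noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)


theta_hat = robbins_monro(noisy_g, theta0=0.0)
print(f"estimated root: {theta_hat:.3f}")
```

With these choices the iterate settles close to the true root 2, even though each single observation of g is noisy.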

Lab assignments:

Practice of stochastic approximation on a traffic admission problem;
Practice of Monte Carlo, Q-learning and SARSA on gridworld (discounted cost);
Practice of buffer management with admission control (average cost).
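
To give an idea of what the gridworld lab involves, the following is a minimal sketch of tabular Q-learning with a discounted cost criterion. It is not the lab's actual environment: the 4 x 4 grid, the unit cost per move, the goal in the bottom-right corner, and the values of `GAMMA`, `ALPHA` and `EPSILON` are assumptions made for the example.

```python
import random

N = 4                                          # grid is N x N (assumed size)
GOAL = (N - 1, N - 1)                          # absorbing goal state (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1          # assumed hyperparameters


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    row = min(max(state[0] + action[0], 0), N - 1)
    col = min(max(state[1] + action[1], 0), N - 1)
    return (row, col), 1.0                     # unit cost per move


def q_learning(episodes=20000, seed=0):
    rng = random.Random(seed)
    Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in range(4)}
    for _ in range(episodes):
        s = (rng.randrange(N), rng.randrange(N))
        steps = 0
        while s != GOAL and steps < 100:
            # Epsilon-greedy: costs are minimized, so the greedy action is the argmin.
            if rng.random() < EPSILON:
                a = rng.randrange(4)
            else:
                a = min(range(4), key=lambda b: Q[(s, b)])
            s_next, cost = step(s, ACTIONS[a])
            # Q-learning target: one-step cost plus discounted best next value.
            target = cost + GAMMA * min(Q[(s_next, b)] for b in range(4))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s_next
            steps += 1
    return Q


Q = q_learning()
greedy_cost_from_start = min(Q[((0, 0), a)] for a in range(4))
print(f"estimated discounted cost from (0, 0): {greedy_cost_from_start:.3f}")
```

Under these assumptions the shortest path from (0, 0) takes 6 moves, so the optimal discounted cost is (1 - 0.9^6) / (1 - 0.9), about 4.69, and the learned value should approach it. SARSA differs only in the target, which uses the action actually taken in the next state instead of the minimum.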

Objectives

This course provides an overview of reinforcement learning (RL) methods. It covers both theory and programming in depth, so that students gain solid expertise in each. By the end of the course, students should:

Understand the notion of stochastic approximations and their relation with RL;
Understand the basis of Markov decision theory;
Apply Dynamic Programming methods to solve the Bellman equations;
Master the basic techniques of Reinforcement Learning: Monte Carlo, temporal-difference and policy gradient;
Study a proof of convergence for RL algorithms;
Master more advanced techniques such as actor-critic methods and deep RL.
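
For orientation, the two objects at the center of these objectives can be written in standard notation. The symbols are assumptions and not the course's own notation: state s, action a, one-step cost c(s, a), transition probabilities P(s' | s, a), discount factor gamma in (0, 1), and step size alpha_n. The first line is the Bellman optimality equation for the discounted cost; the second is the Q-learning update that approximates its solution from an observed transition (s, a, c, s').

```latex
V^*(s) = \min_{a} \Big[ c(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]

Q_{n+1}(s,a) = Q_n(s,a) + \alpha_n \Big[ c + \gamma \min_{a'} Q_n(s',a') - Q_n(s,a) \Big]
```

The second line has the form of a stochastic approximation iteration, which is the link between the first and third objectives.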

Assessment description

Final exam, lab and research project reports.

All students in the class will also conduct a research project in the field of reinforcement learning and write a short 5-page paper. Project topics, related to constrained RL and delayed RL, will be provided during the first class session.

Audience

Students are required to have taken an introductory machine learning course.
Good knowledge of probability and statistics is expected.
Basic knowledge of Markov chains is recommended but is not a prerequisite.

ECTS credits: 3

Assessment method(s): continuous assessment; final exam; dissertation; project(s)

Valid from: 01/09/2024

Valid until: 31/08/9999

Deployability: offering deployable across the network, subject to approval

Bibliography

S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010.
R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

Degrees that include this course unit

Master's in Science, Technology, Health; Computer Science major; Networks and Connected Objects track: AI for connected industries (networks, connected objects and artificial intelligence), in Mulhouse