Reinforcement Learning

Code
USEET8

Description

This course introduces machine learning techniques based on stochastic approximations and Markov decision process (MDP) models, such as SARSA, Q-learning, and policy gradient. Two homework assignments focus on implementing these techniques, so that students master them through direct practice. A project in teams of two or three students addresses more advanced techniques and problems in the field of RL and, more generally, the application of Markov theory to modeling and optimization.

 Lectures: 

  • Course Overview. Introduction to Markov decision theory,  stochastic approximations, and reinforcement learning; 
  • Stochastic approximations: the Robbins-Monro algorithm;  
  • Criteria for convergence;  
  • Application to admission control problems;  
  • Markov decision processes: definitions, average cost and discounted cost;  
  • Bellman equations. Solutions based on Dynamic Programming;  
  • Monte Carlo methods for Reinforcement Learning;  
  • Temporal-difference methods: SARSA and Q-learning;
  • Proof of convergence of Q-Learning; 
  • Policy gradient: REINFORCE; 
  • Actor-critic methods; 
  • Multi-armed bandits;  
  • Deep reinforcement learning.
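
As an illustration of the stochastic approximation lectures, here is a minimal sketch of the Robbins-Monro iteration. It is not taken from the course material: the function `noisy_linear`, the target level, and the step sizes a_n = 1/(n+1) are assumptions chosen for the example. The iteration looks for the point theta where an increasing function g, observed only through noisy measurements, reaches a given target.

```python
import random


def robbins_monro(noisy_g, target, theta0, n_steps, seed=0):
    """Iterate theta_{n+1} = theta_n - a_n * (Y_n - target).

    Y_n is a noisy observation of g(theta_n). The step sizes a_n = 1/(n+1)
    satisfy sum(a_n) = infinity and sum(a_n^2) < infinity.
    """
    rng = random.Random(seed)
    theta = theta0
    for n in range(n_steps):
        step_size = 1.0 / (n + 1)
        observation = noisy_g(theta, rng)
        theta -= step_size * (observation - target)
    return theta


def noisy_linear(theta, rng):
    # Hypothetical example: g(theta) = 2*theta + 1 plus Gaussian noise.
    return 2.0 * theta + 1.0 + rng.gauss(0.0, 1.0)


# g(theta) = 5 is reached at theta = 2, so the estimate should end up near 2.
estimate = robbins_monro(noisy_linear, target=5.0, theta0=0.0, n_steps=20000)
print(estimate)
```

The decreasing step sizes average out the noise while still allowing the iterate to travel arbitrarily far from its starting point, which is the intuition behind the convergence criteria covered in the lectures.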

 Lab assignments:   

  • Practice of stochastic approximation on a traffic admission problem;
  • Practice of Monte Carlo, Q-learning and SARSA on gridworld (discounted cost);
  • Practice of buffer management with admission control (average cost). 
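
To give an idea of the gridworld lab, here is a minimal sketch of tabular Q-learning with a discounted cost criterion. It is not the actual assignment: the 4x4 grid, the unit cost per move, the goal cell, and all parameter values are assumptions made for the example. Because costs are minimized, the update uses a minimum over next actions instead of the usual maximum over rewards.

```python
import random

N = 4                                          # hypothetical 4x4 grid
GOAL = (N - 1, N - 1)                          # assumed terminal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA = 0.9                                    # discount factor


def step(state, action):
    """Deterministic move with cost 1; hitting a wall leaves the state unchanged."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    next_state = (min(max(row + d_row, 0), N - 1),
                  min(max(col + d_col, 0), N - 1))
    return next_state, 1.0, next_state == GOAL


def q_learning(episodes=5000, alpha=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(N) for c in range(N) for a in range(n_actions)}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Epsilon-greedy: explore at random, otherwise pick the cheapest action.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = min(range(n_actions), key=lambda a: q[(state, a)])
            next_state, cost, done = step(state, action)
            # Off-policy target: cost plus discounted best (minimum) next value.
            best_next = 0.0 if done else min(q[(next_state, a)]
                                             for a in range(n_actions))
            q[(state, action)] += alpha * (cost + GAMMA * best_next
                                           - q[(state, action)])
            state = next_state
    return q


q = q_learning()
start_value = min(q[((0, 0), a)] for a in range(len(ACTIONS)))
print(start_value)
```

The shortest path from the start to the goal takes 6 moves, so the learned value at the start cell can be compared with the discounted sum 1 + 0.9 + ... + 0.9^5. Replacing the minimum over next actions with the value of the action actually taken in the next state would give the SARSA variant.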

Objectives

This course provides an overview of reinforcement learning (RL) methods. It covers both theory and programming in depth, so that students acquire solid expertise in each. By the end of the course, students should:

  • Understand the notion of stochastic approximations and their relation with RL;  
  • Understand the basis of Markov decision theory;  
  • Apply Dynamic Programming methods to solve the Bellman equations;  
  • Master the basic techniques of Reinforcement Learning: Monte Carlo, temporal-difference and policy gradient;
  • Study a proof of convergence for RL algorithms; 
  • Master more advanced techniques such as actor-critic methods and deep RL. 

Description of assessment methods

Final exam, lab and research project reports. 

All students in the class will also conduct a research project in the field of reinforcement learning and write a short 5-page paper. Topics, related to constrained RL and delayed RL, will be provided during the first class session.

Target audience

  • Students are required to have taken an introductory machine learning course. 
  • Good knowledge of probability and statistics is expected.
  • Basic knowledge of Markov chains is recommended, but it is not a prerequisite.
Number of ECTS credits
3
Assessment method(s)
Continuous assessment
Final exam
Written report
Project(s)
Validity end date
Deployability
Course can be deployed across the network subject to approval
