Reinforcement Learning

Course lead: Mr. Stefano SECCI

  • Course
Cnam code: USEET8

  • Duration: 30 hours
  • Package
  • 3 credits

Overview

Audience, admission requirements, and prerequisites

Prerequisites

  • Students must have completed an introductory machine learning course.
  • A good knowledge of probability and statistics is expected.
  • Familiarity with the basics of Markov chains is recommended but not required.

Objectives

This course provides an overview of reinforcement learning (RL) methods. It covers both theory and programming in depth, so that students gain solid expertise in each. By the end of the course, students should:

  • Understand stochastic approximations and how they relate to RL;
  • Understand the basics of Markov decision theory;
  • Apply dynamic programming methods to solve the Bellman equations;
  • Master the basic techniques of reinforcement learning: Monte Carlo, temporal-difference, and policy gradient methods;
  • Study a proof of convergence for RL algorithms;
  • Master more advanced techniques such as actor-critic methods and deep RL.

Skills and career prospects

Syllabus

Content

This course introduces machine learning techniques based on stochastic approximations and Markov decision process (MDP) models, such as SARSA, Q-learning, and policy gradient. Two homework assignments focus on implementing these techniques, so that students master them through direct practice. A project in teams of two or three students addresses more advanced techniques and problems in RL and, more generally, the application of Markov theory to modeling and optimization.

Lectures:

  • Course overview. Introduction to Markov decision theory, stochastic approximations, and reinforcement learning;
  • Stochastic approximations: the Robbins-Monro algorithm;
  • Criteria for convergence;
  • Application to admission control problems;
  • Markov decision processes: definitions, average cost and discounted cost;
  • Bellman equations. Solutions based on dynamic programming;
  • Monte Carlo methods for reinforcement learning;
  • Temporal-difference methods: SARSA and Q-learning;
  • Proof of convergence of Q-learning;
  • Policy gradient: REINFORCE;
  • Actor-critic methods;
  • Multi-armed bandits;
  • Deep reinforcement learning.
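
To give a flavor of the stochastic approximation lectures, here is a minimal Python sketch of a Robbins-Monro iteration. It is an illustration only, not course material: it assumes the simplest possible setting, estimating the mean of a noisy signal with step sizes a_n = 1/n, and the function name `robbins_monro` and its parameters are hypothetical.

```python
import random


def robbins_monro(sample, theta0=0.0, n_steps=10_000):
    """Iterate theta <- theta + a_n * (x_n - theta) with a_n = 1/n.

    This seeks the root of h(theta) = E[X] - theta, i.e. the mean of X,
    using only noisy observations x_n returned by `sample()`.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        x = sample()
        a_n = 1.0 / n  # sum of a_n diverges, sum of a_n**2 converges
        theta += a_n * (x - theta)
    return theta


# Assumed toy setting: Gaussian observations with true mean 2.0.
rng = random.Random(0)
estimate = robbins_monro(lambda: rng.gauss(2.0, 1.0))
print(f"estimate of the mean: {estimate:.3f}")
```

With this particular step size the iterate coincides with the running sample mean; other step-size sequences trade convergence speed against noise sensitivity.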

Lab assignments:

  • Stochastic approximation applied to a traffic admission problem;
  • Monte Carlo, Q-learning, and SARSA on a gridworld (discounted cost);
  • Buffer management with admission control (average cost).
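
As an indication of what the gridworld lab involves, the following Python sketch runs tabular Q-learning with a discounted cost criterion. Everything specific here is an assumption made for illustration and not the actual lab setup: the 4x4 grid, the unit cost per move, the goal in the bottom-right corner, and the hyperparameters.

```python
import random

SIZE, GOAL = 4, (3, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        nr, nc = r, c
    nxt = (nr, nc)
    return nxt, 1.0, nxt == GOAL  # unit cost per move, episode ends at GOAL


def q_learning(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning that minimizes the expected discounted cost."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore
            else:
                a = min(range(4), key=lambda i: Q[s][i])  # cheapest action
            s2, cost, done = step(s, a)
            # Costs are minimized, so the target uses min instead of max.
            target = cost + (0.0 if done else gamma * min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


def greedy_path(Q, start=(0, 0), max_steps=20):
    """Follow the cost-minimizing action from `start` until GOAL or the cap."""
    path, s = [start], start
    for _ in range(max_steps):
        a = min(range(4), key=lambda i: Q[s][i])
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path


Q = q_learning()
path = greedy_path(Q)
print(path)
```

SARSA differs only in the target: it uses the Q-value of the action actually taken in the next state, rather than the minimum over actions.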

Assessment

Final exam, lab reports, and a research project report.

All students will also carry out a research project in reinforcement learning and write a short five-page paper. Topics, related to constrained RL and delayed RL, will be provided during the first class session.

Bibliography

  • S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach, 3rd edition. Prentice Hall, 2010.
  • R. S. Sutton, A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.