Doina Precup

Mohammad Sami Nur Islam Islam

Jesse Farebrother

Doctorat - McGill

Superviseur⋅e principal⋅e :

Marc Gendron-Bellemare

Doctorat - McGill

Superviseur⋅e principal⋅e :

Eilif Benjamin Muller

Doctorat - McGill

Doctorat - McGill

Maîtrise recherche - McGill

Arushi Jain

Doctorat - McGill

Doctorat - McGill

Postdoctorat - McGill

Elaine Lau

Maîtrise recherche - McGill

Jonathan Lebensold

Collaborateur·rice alumni - McGill

Site web

Ray Luo

Doctorat - McGill

Superviseur⋅e principal⋅e :

G McCracken

Doctorat - McGill

Nazanin Mohammadi Sepahvand

Doctorat - McGill

Shahrad Mohammadzadeh

Maîtrise recherche - McGill

Superviseur⋅e principal⋅e :

Gabriela Moisescu-Pareja

Maîtrise recherche - McGill

Padideh Nouri

Doctorat - UdeM

Co-superviseur⋅e :

Charles Onu

Doctorat - McGill

Doctorat - McGill

Co-superviseur⋅e :

Nate Rahn

Doctorat - McGill

Superviseur⋅e principal⋅e :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

Doctorat - McGill

Co-superviseur⋅e :

Doctorat - McGill

Co-superviseur⋅e :

Doctorat - McGill

Co-superviseur⋅e :

Doctorat - McGill

Nishanth Anand Vemgal

Doctorat - McGill

Doctorat - McGill

Doctorat - McGill

Co-superviseur⋅e :

Samira Ebrahimi Kahou

Stagiaire de recherche - McGill

Steve Wen

Maîtrise recherche - McGill

Co-superviseur⋅e :

Gregory Dudek

Skipper : combiner l’abstraction spatiale et temporelle afin d’améliorer la généralisation

Zijing Wu

Doctorat - McGill

Co-superviseur⋅e :

Doctorat - McGill

Harry Zhao

Doctorat - McGill

Co-superviseur⋅e :

Billets de blogue

Generic thumbnail for Mila Blog articles.

22 février 2024

par

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Lire l'article

Publications

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-15

ArXiv (prépublication)

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-15

ArXiv (prépublication)

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen

Dongyan Lin

Mandana Samiei

Blake Richards

Rob Fergus

Kenneth Marino

2025-05-14

ArXiv (prépublication)

Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning

Ziyan Luo

Tianwei Ni

Pierre-Luc Bacon

Xujie Si

A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space, and embed … (voir plus)these learned distances in the representation space. While promising for robustness to task-irrelevant noise shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep RL, we evaluate five recent approaches. We unify them under isometric embedding, identify key design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 250+ configurations with diverse noise settings. Beyond final returns, we introduce the denoising factor to quantify the encoder’s ability to filter distractions. To further isolate the effect of metric learning, we propose an isolated metric estimation setting, where the encoder is influenced solely by the metric loss. Our results show that metric learning improves return and denoising only marginally, as its benefits fade when key design choices, such as layer normalization and self-prediction loss, are incorporated into the baseline. We also find that commonly used benchmarks (e.g., grayscale videos, varying state-based Gaussian noise dimensions) add little difficulty, while Gaussian noise with random projection and pixel-based Gaussian noise remain challenging even for the best methods. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepté)

Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning

Ziyan Luo

Tianwei Ni

Pierre-Luc Bacon

Xujie Si

2025-05-09

rl-conference.cc/RLC/2025/Conference (publié)

Generative AI: Hype, Hope, and Responsible Use in Science and Everyday Life

2025-05-01

Biological Psychiatry (publié)

doi.org

Capturing Individual Human Preferences with Reward Features

Andre Barreto

Vincent Dumoulin

Yiran Mao

Nicolas Perez-Nieves

Bobak Shahriari

Yann Dauphin

Hugo Larochelle

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argu… (voir plus)e that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.

2025-03-21

ArXiv (prépublication)

doi.org

Capturing Individual Human Preferences with Reward Features

Andr'e Barreto

Vincent Dumoulin

Yiran Mao

Nicolas Perez-Nieves

Bobak Shahriari

Yann Dauphin

Hugo Larochelle

2025-03-21

ArXiv (prépublication)

Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning

Lynn Cherif

Flemming Kondrup

David Venuto

Ankit Anand

Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (voir plus)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through *intent-based affordances* -- i.e., considering in any situation only the subset of actions that achieve a desired outcome. We propose **Code as Generative Affordances (

2025-03-05

ICLR.cc/2025/Workshop/DL4C (publié)

doi.org

Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning

Lynn Cherif

Flemming Kondrup

David Venuto

Ankit Anand

Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (voir plus)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through intent-based affordances -- i.e., considering in any situation only the subset of actions that achieve a desired outcome. We propose **Code as Generative Affordances**

2025-03-05

ICLR.cc/2025/Workshop/DL4C (publié)

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob

Zhan Su

Minseon Kim

Oleksiy Ostapenko

Lucas Caccia

Alessandro Sordoni

Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly a… (voir plus)dapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA (Low-Rank Adaptation) serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks, thus scaling beyond what is usually studied in the literature. Our findings demonstrate that sparse adapters yield superior in-distribution performance post-merging compared to LoRA or full model merging. Achieving strong held-out performance remains a challenge for all methods considered.

2025-03-05

ICLR.cc/2025/Workshop/MCDC (accepté)

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob

Zhan Su

Minseon Kim

Oleksiy Ostapenko

Lucas Caccia

Alessandro Sordoni

2025-03-05

ICLR.cc/2025/Workshop/MCDC (accepté)