Yoshua Bengio

Biography

*For media requests, please write to [email protected].

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at [email protected].

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

PhD - Université de Montréal

Research Intern - Université de Montréal

Co-supervisor :

Loubna Benabbou

Eric Elmoznino

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Co-supervisor :

Leo Feng

PhD - Université de Montréal

[email protected]

Ivan Grega

Research Intern - Université de Montréal

PhD

PhD - Université de Montréal

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez

Yaroslav KIVVA

Collaborating researcher - Université de Montréal

Salem Lahlou

Collaborating Alumni - Université de Montréal

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Seanie Lee

Collaborating Alumni - Université de Montréal

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Kanika Madan

PhD - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sören Mindermann

Collaborating researcher - Université de Montréal

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor :

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Research Intern - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Victor Schmidt

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Master's Research - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Vedant Shah

Master's Research - Université de Montréal

Postdoctorate

Marco Stock

Independent visiting researcher - Technical University of Munich

[email protected]

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Alex Tong

Postdoctorate - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Omar G. Younis

Collaborating researcher

Collaborating researcher - KAIST

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

PhD - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and re… (see more)presentations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Deep Complex Networks

Chiheb Trabelsi

Olexa Bilaniuk

Dmitriy Serdyuk

Sandeep Subramanian

Joao Felipe Santos

Soroush Mehri

2017-05-27

ArXiv (preprint)

Char2Wav: End-to-End Speech Synthesis

Jose Sotelo

Soroush Mehri

Kundan Kumar

Joao Felipe Santos

Kyle Kastner

We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder . The reader is an… (see more) encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. Neural vocoder refers to a conditional extension of SampleRNN which generates raw waveform samples from intermediate representations. Unlike traditional models for speech synthesis, Char2Wav learns to produce audio directly from text

2017-02-17

International Conference on Learning Representations (published)

openreview.net

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian V. Serban

Ryan Lowe

Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterance… (see more)s in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.

2017-02-12

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

Iulian V. Serban

Tim Klinger

Gerald Tesauro

Kartik Talamadupula

Bowen Zhou

We introduce a new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at m… (see more)ultiple levels of abstraction. The models extend the sequence-to-sequence framework to generate two parallel stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language words (e.g. sentences). The coarse sequences follow a latent stochastic process with a factorial representation, which helps the models generalize to new examples. The coarse sequences can also incorporate task-specific knowledge, when available. In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. These procedures enable training the multiresolution recurrent neural networks by maximizing the exact joint log-likelihood over both sequences. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics. Furthermore, experiments show the proposed models generate more fluent, relevant and goal-oriented responses.

2017-02-12

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

An Actor-Critic Algorithm for Sequence Prediction

Dzmitry Bahdanau

Philemon Brakel

Kelvin Xu

Anirudh Goyal

Ryan Lowe

Joelle Pineau