
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

AfroBench: How Good are Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
Pontus Stenetorp
How good are Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
Pontus Stenetorp
Better Quality Pre-training Data and T5 Models for African Languages
Akintunde Oladipo
Mofetoluwa Adeyemi
Orevaoghene Ahia
Abraham Toluwase Owodunni
Odunayo Ogundepo
Jimmy Lin
In this study, we highlight the importance of enhancing the quality of pretraining data in multilingual language models. Existing web crawls have demonstrated quality issues, particularly in the context of low-resource languages. Consequently, we introduce a new multilingual pretraining corpus for …
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation, but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
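The resetting step described in the abstract is simple to express in code. Below is a minimal sketch, assuming a toy PyTorch language model and a placeholder reset interval K; the model, data, and hyperparameters are illustrative stand-ins rather than the authors' actual RoBERTa setup.

```python
# Minimal sketch of active forgetting: the token-embedding layer is
# re-initialized every K optimizer updates while the transformer body keeps
# training as usual. All sizes and the value of K are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, K = 1000, 64, 100  # K = assumed reset interval

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, ids):
        return self.lm_head(self.body(self.embed(ids)))

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 1001):
    ids = torch.randint(0, VOCAB_SIZE, (8, 32))  # stand-in batch of token ids
    logits = model(ids[:, :-1])                  # predict the next token
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Active forgetting: wipe the embeddings every K updates so the body
    # learns to work with freshly initialized embeddings (meta-learning effect).
    if step % K == 0:
        nn.init.normal_(model.embed.weight, mean=0.0, std=0.02)
```

Depending on the exact recipe, the optimizer state for the embedding parameters may also need to be reset; the sketch above omits that detail for brevity.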
YORC: Yoruba Reading Comprehension dataset
Aremu Anuoluwapo
Jesujoba Oluwadara Alabi
In this paper, we create YORC: a new multi-choice Yoruba Reading Comprehension dataset based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer using the existing English RACE dataset with a pre-trained encoder-only model. Additionally, we provide results from prompting large language models (LLMs) such as GPT-4.
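As a rough illustration of the cross-lingual transfer baseline described above, the sketch below scores one multiple-choice example with a multilingual encoder-only model via Hugging Face Transformers. The checkpoint name, passage, question, and options are placeholders; in the actual setup the model would first be fine-tuned on the English RACE dataset before being evaluated on Yorùbá questions.

```python
# Hedged sketch: score a multiple-choice reading-comprehension example with a
# multilingual encoder. Placeholders stand in for a real Yorùbá passage/question.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # assumption: any multilingual encoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

passage = "..."                      # Yorùbá reading passage (placeholder)
question = "..."                     # Yorùbá question (placeholder)
options = ["A", "B", "C", "D"]       # answer options (placeholders)

# Encode (passage + question, option) pairs; shape becomes (1, num_choices, seq_len).
enc = tokenizer(
    [f"{passage} {question}"] * len(options),
    options,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # one score per answer option
print("Predicted option:", options[logits.argmax(dim=-1).item()])
```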
Consultative engagement of stakeholders toward a roadmap for African language technologies
Kathleen Siminyu
Jade Abbott
Kọ́lá Túbọ̀sún
Aremu Anuoluwapo
Blessing Kudzaishe Sibanda
Kofi Yeboah
Masabata Mokgesi-Selinga
Frederick R. Apina
Angela Thandizwe Mthembu
Arshath Ramkilowan
Babatunde Oladimeji
NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification
Iyanuoluwa Shode
Jing Peng
Anna Feldman
Africa has over 2000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labelled corpora for African languages. However, they are often available in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, Nollywood movie reviews, for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian Pidgin, and Yoruba). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. By leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain and cross-lingual adaptation from English. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation from English to other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While machine translation to low-resource languages is often of low quality, our analysis shows that sentiment-related words are often preserved.
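For intuition, here is a minimal sketch of the kind of classical baseline mentioned above: a TF-IDF plus logistic-regression sentiment classifier trained on one set of reviews and evaluated on another (for example, reviews machine-translated from English into a target language). The tiny inline dataset and the scikit-learn pipeline are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a classical sentiment baseline under a transfer setup:
# train on source-side reviews, evaluate on target-side reviews.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Source-domain training data (placeholder movie-review sentences).
train_texts = ["a wonderful and moving film", "terrible acting, boring plot",
               "I loved every scene", "a waste of two hours"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Target-side evaluation data, e.g. reviews obtained via machine translation
# before scoring (placeholders).
test_texts = ["the story was beautiful and touching", "the movie was very bad"]
test_labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)
print("accuracy:", accuracy_score(test_labels, clf.predict(test_texts)))
```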
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdoulahat Diop
Claytone Sikasote
Gilles HACHEME
Happy Buzaaba
Ignatius Ezeani
Rooweither Mabuya
Salomey Osei
Chris Emezue
Albert Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Tunde Oluwaseyi Ajayi
Clemencia Siro
Stephen Arthur
Mofetoluwa Adeyemi
Orevaoghene Ahia
Aremu Anuoluwapo
Oyinkansola Awosan
Chiamaka Ijeoma Chukwuneke
Bernard Opoku
A. Ayodele
Verrah Akinyi Otiende
Christine Mwase
Boyd Sinkala
Andre Niyongabo Rubungo
Daniel Ajisafe
Emeka Felix Onwuegbuzia
Habib Mbow
Emile Niyomutabazi
Eunice Mukonde
Falalu Lawan
Ibrahim Ahmad
Jesujoba Oluwadara Alabi
Martin Namukombo
Mbonu Chinedu
Mofya Phiri
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
Ruqayya Nasir Iro
Sonia Adhiambo
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Seid Muhie Yimam
Ibrahim Ahmad
Nedjma OUSIDHOUM
Abinew Ayele
Saif Mohammad
Meriem Beloucif
ε kú <mask>: Integrating Yorùbá cultural greetings into machine translation
Idris Akinade
Jesujoba Oluwadara Alabi
Clement Odoje
Dietrich Klakow
This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings (ε kú <mask>), which are a big part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing some Yorùbá greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google Translate and NLLB, and show that these models struggle to accurately translate Yorùbá greetings into English. In addition, we trained a Yorùbá-English model by fine-tuning an existing NMT model on the training split of IkiniYorùbá, and this achieved better performance than the pre-trained multilingual NMT models, even though the latter were trained on a large volume of data.
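The fine-tuning step described above can be sketched with Hugging Face Transformers as shown below. The base checkpoint (NLLB-200 distilled 600M), the two placeholder greeting pairs, and the hyperparameters are assumptions standing in for the IkiniYorùbá training split and the authors' actual setup.

```python
# Hedged sketch: fine-tune an existing multilingual NMT model on a tiny
# Yorùbá-English parallel set, then translate a held-out greeting.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

ckpt = "facebook/nllb-200-distilled-600M"  # assumed base NMT checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Placeholder Yorùbá -> English greeting pairs (not the real dataset).
yor = ["Ẹ kú àárọ̀", "Ẹ kú iṣẹ́"]
eng = ["Good morning", "Well done at work"]

# text_target adds a "labels" field, so the forward pass returns a loss.
batch = tokenizer(yor, text_target=eng, padding=True, truncation=True, return_tensors="pt")

model.train()
for _ in range(3):               # a few toy update steps
    loss = model(**batch).loss   # padding in labels is not masked here, for brevity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Translate a held-out greeting with the fine-tuned model.
model.eval()
inputs = tokenizer("Ẹ kú alẹ́", return_tensors="pt")
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```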