
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

AfroBench: How Good are Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
Pontus Stenetorp
How good are Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
Pontus Stenetorp
Better Quality Pre-training Data and T5 Models for African Languages
Akintunde Oladipo
Mofetoluwa Adeyemi
Orevaoghene Ahia
Abraham Toluwase Owodunni
Odunayo Ogundepo
Jimmy Lin
In this study, we highlight the importance of enhancing the quality of pretraining data in multilingual language models. Existing web crawls have demonstrated quality issues, particularly in the context of low-resource languages. Consequently, we introduce a new multilingual pretraining corpus for …
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation, but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
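The resetting step described in the abstract is simple to express in code. Below is a minimal sketch, assuming a toy PyTorch language model and a placeholder reset interval K; the model, data, and hyperparameters are illustrative stand-ins rather than the authors' actual RoBERTa setup.

```python
# Minimal sketch of active forgetting: the token-embedding layer is
# re-initialized every K optimizer updates while the transformer body keeps
# training as usual. All sizes and the value of K are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, K = 1000, 64, 100  # K = assumed reset interval

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, ids):
        return self.lm_head(self.body(self.embed(ids)))

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 1001):
    ids = torch.randint(0, VOCAB_SIZE, (8, 32))  # stand-in batch of token ids
    logits = model(ids[:, :-1])                  # predict the next token
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Active forgetting: wipe the embeddings every K updates so the body
    # learns to work with freshly initialized embeddings (meta-learning effect).
    if step % K == 0:
        nn.init.normal_(model.embed.weight, mean=0.0, std=0.02)
```

Depending on the exact recipe, the optimizer state for the embedding parameters may also need to be reset; the sketch above omits that detail for brevity.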
YORC: Yoruba Reading Comprehension dataset
Aremu Anuoluwapo
Jesujoba Oluwadara Alabi
In this paper, we create YORC: a new multi-choice Yoruba Reading Comprehension dataset based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer using the existing English RACE dataset with a pre-trained encoder-only model. Additionally, we provide results from prompting large language models (LLMs) such as GPT-4.
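As a rough illustration of the cross-lingual transfer baseline described above, the sketch below scores one multiple-choice example with a multilingual encoder-only model via Hugging Face Transformers. The checkpoint name, passage, question, and options are placeholders; in the actual setup the model would first be fine-tuned on the English RACE dataset before being evaluated on Yorùbá questions.

```python
# Hedged sketch: score a multiple-choice reading-comprehension example with a
# multilingual encoder. Placeholders stand in for a real Yorùbá passage/question.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # assumption: any multilingual encoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

passage = "..."                      # Yorùbá reading passage (placeholder)
question = "..."                     # Yorùbá question (placeholder)
options = ["A", "B", "C", "D"]       # answer options (placeholders)

# Encode (passage + question, option) pairs; shape becomes (1, num_choices, seq_len).
enc = tokenizer(
    [f"{passage} {question}"] * len(options),
    options,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # one score per answer option
print("Predicted option:", options[logits.argmax(dim=-1).item()])
```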
Consultative engagement of stakeholders toward a roadmap for African language technologies
Kathleen Siminyu
Jade Abbott
Kọ́lá Túbọ̀sún
Aremu Anuoluwapo
Blessing Kudzaishe Sibanda
Kofi Yeboah
Masabata Mokgesi-Selinga
Frederick R. Apina
Angela Thandizwe Mthembu
Arshath Ramkilowan
Babatunde Oladimeji
NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification
Iyanuoluwa Shode
Jing Peng
Anna Feldman
Africa has over 2000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labelled corpora for African languages. However, they are often available in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, Nollywood movie reviews, for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian Pidgin, and Yoruba). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. By leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain and cross-lingual adaptation from English. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation from English to other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While machine translation to low-resource languages is often of low quality, our analysis shows that sentiment-related words are often preserved.
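For intuition, here is a minimal sketch of the kind of classical baseline mentioned above: a TF-IDF plus logistic-regression sentiment classifier trained on one set of reviews and evaluated on another (for example, reviews machine-translated from English into a target language). The tiny inline dataset and the scikit-learn pipeline are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a classical sentiment baseline under a transfer setup:
# train on source-side reviews, evaluate on target-side reviews.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Source-domain training data (placeholder movie-review sentences).
train_texts = ["a wonderful and moving film", "terrible acting, boring plot",
               "I loved every scene", "a waste of two hours"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Target-side evaluation data, e.g. reviews obtained via machine translation
# before scoring (placeholders).
test_texts = ["the story was beautiful and touching", "the movie was very bad"]
test_labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)
print("accuracy:", accuracy_score(test_labels, clf.predict(test_texts)))
```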
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdoulahat Diop
Claytone Sikasote
Gilles HACHEME
Happy Buzaaba
Ignatius Ezeani
Rooweither Mabuya
Salomey Osei
Chris Emezue
Albert Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Tunde Oluwaseyi Ajayi
Clemencia Siro
Stephen Arthur
Mofetoluwa Adeyemi
Orevaoghene Ahia
Aremu Anuoluwapo
Oyinkansola Awosan
Chiamaka Ijeoma Chukwuneke
Bernard Opoku
A. Ayodele
Verrah Akinyi Otiende
Christine Mwase
Boyd Sinkala
Andre Niyongabo Rubungo
Daniel Ajisafe
Emeka Felix Onwuegbuzia
Habib Mbow
Emile Niyomutabazi
Eunice Mukonde
Falalu Lawan
Ibrahim Ahmad
Jesujoba Oluwadara Alabi
Martin Namukombo
Mbonu Chinedu
Mofya Phiri
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
Ruqayya Nasir Iro
Sonia Adhiambo
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Seid Muhie Yimam
Ibrahim Ahmad
Nedjma OUSIDHOUM
Abinew Ayele
Saif Mohammad
Meriem Beloucif
ε kú <mask>: Integrating Yorùbá cultural greetings into machine translation
Idris Akinade
Jesujoba Oluwadara Alabi
Clement Odoje
Dietrich Klakow
This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings (ε kú <mask>), which are a big part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing some Yorùbá greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google Translate and NLLB, and show that these models struggle to accurately translate Yorùbá greetings into English. In addition, we trained a Yorùbá-English model by fine-tuning an existing NMT model on the training split of IkiniYorùbá, and this achieved better performance than the pre-trained multilingual NMT models, even though the latter were trained on a large volume of data.
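The fine-tuning step described above can be sketched with Hugging Face Transformers as shown below. The base checkpoint (NLLB-200 distilled 600M), the two placeholder greeting pairs, and the hyperparameters are assumptions standing in for the IkiniYorùbá training split and the authors' actual setup.

```python
# Hedged sketch: fine-tune an existing multilingual NMT model on a tiny
# Yorùbá-English parallel set, then translate a held-out greeting.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

ckpt = "facebook/nllb-200-distilled-600M"  # assumed base NMT checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Placeholder Yorùbá -> English greeting pairs (not the real dataset).
yor = ["Ẹ kú àárọ̀", "Ẹ kú iṣẹ́"]
eng = ["Good morning", "Well done at work"]

# text_target adds a "labels" field, so the forward pass returns a loss.
batch = tokenizer(yor, text_target=eng, padding=True, truncation=True, return_tensors="pt")

model.train()
for _ in range(3):               # a few toy update steps
    loss = model(**batch).loss   # padding in labels is not masked here, for brevity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Translate a held-out greeting with the fine-tuned model.
model.eval()
inputs = tokenizer("Ẹ kú alẹ́", return_tensors="pt")
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```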