David Ifeoluwa Adelani

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Jonah Dauvet

Research Intern - McGill University

Senyu Li

PhD - McGill University

Sifan Liu

Research Intern - McGill University

Jessica Ojo

Master's Research - McGill University

Fabian Schmidt

Collaborating Alumni - McGill University

Vivek Verma

Professional Master's - Université de Montréal

Tianyi Xu

Research Intern - McGill University

Peter Yu

Master's Research - McGill University

Publications

ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model

Osvaldo Luamba Quinjica

In recent years, the development of pre-trained language models (PLMs) has gained momentum, showcasing their capacity to transcend linguistic barriers and facilitate knowledge transfer across diverse languages. However, this progress has largely bypassed very-low-resource languages, creating a notable void in the multilingual landscape. This paper addresses this gap by introducing four tailored PLMs specifically fine-tuned for Angolan languages, employing a Multilingual Adaptive Fine-tuning (MAFT) approach. We also survey the role of informed embedding initialization and synthetic data in enhancing the performance of MAFT models in downstream tasks. We improve over the SOTA baselines AfroXLMR-base (developed through MAFT) and OFA (an effective embedding initialization) by 12.3 and 3.8 points, respectively.

2024-03-03

ICLR.cc/2024/Workshop/AfricaNLP (published)

EkoHate: Offensive and Hate Speech Detection for Code-switched Political discussions on Nigerian Twitter

Comfort Eseohen Ilevbare

Jesujoba Oluwadara Alabi

Bakare Firdous Damilola

Abiola Oluwatoyin Bunmi

ADEYEMO Oluwaseyi Adesina

Nigerians have a notable online presence and actively discuss political and topical matters. This was particularly evident throughout the 2023 general election, where Twitter was utilized for campaigning, fact-checking and verification, and even positive and negative discourse. However, little or no work has been done on the detection of abusive language and hate speech in Nigeria. In this paper, we curate code-switched Twitter data directed at the three candidates in the governorship election of Lagos state, the most populous and economically vibrant state in Nigeria, with the aim of detecting offensive and hate speech in political discussions. We develop EkoHate, an abusive language and hate speech dataset for political discussions between the three candidates and their followers, using a binary (normal vs offensive) and a fine-grained four-label annotation scheme. We analyse our dataset and provide an empirical evaluation of state-of-the-art methods across both supervised and cross-lingual transfer learning settings. In the supervised setting, our evaluation results for the binary and four-label annotation schemes show that we can achieve 95.1 and 70.3 F1 points, respectively. Furthermore, we show that our dataset transfers well to two publicly available offensive datasets (OLID and HateUS2020), with at least 62.7 F1 points.

2024-03-03

ICLR.cc/2024/Workshop/AfricaNLP (published)

Enhancing Transformer Models for Igbo Language Processing: A Critical Comparative Study

Anthony Soronnadi

Olubayo Adekanmbi

Chinazo Anebelundu

2024-03-03

ICLR.cc/2024/Workshop/AfricaNLP (published)

NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages

Aremu Anuoluwapo

Jesujoba Oluwadara Alabi

Daud Abolade

Nkechinyere Faith Aguobi

Shamsuddeen Hassan Muhammad

In this paper, we create NaijaRC, a new multi-choice Nigerian Reading Comprehension dataset that is based on high-school RC examinations for three Nigerian national languages: Hausa (hau), Igbo (ibo), and Yorùbá (yor). We provide baseline results by performing cross-lingual transfer with several pre-trained encoder-only models, using the Belebele training data, which is drawn largely from the RACE dataset (RACE is based on English exams for middle and high school Chinese students, very similar to our dataset). Additionally, we provide results by prompting large language models (LLMs) like GPT-4.

2024-03-03

ICLR.cc/2024/Workshop/AfricaNLP (published)

YAD: Leveraging T5 for improved automatic diacritization of Yorùbá text

Akindele Michael Olawole

Jesujoba Oluwadara Alabi

Aderonke Busayo Sakpere

In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer, a T5 model, for Yorùbá and show that this model outperforms several multilingually trained T5 models. Lastly, we show that more data and bigger models are better at diacritization for Yorùbá.

2024-03-03

ICLR.cc/2024/Workshop/AfricaNLP (published)

Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task

Markus Freitag

Nitika Mathur

Daniel Deutsch

Chi-kiu Lo

Eleftherios Avramidis

Ricardo Rei

Brian Thompson

Frédéric Blain

Tom Kocmi

Jiayi Wang

Marianna Buchicchio

Chrysoula Zerva

2024-01-01

Conference on Machine Translation (published)

Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set)

Jiayi Wang

Pontus Stenetorp

2024-01-01

Conference on Machine Translation (published)

Findings of the 2nd Shared Task on Multi-lingual Multi-task Information Retrieval at MRL 2024

Francesco Tinner

Raghav Mantri

Mammad Hajili

Chiamaka Ijeoma Chukwuneke

Dylan Massey

Benjamin A. Ajibade

Bilge Kocak

Abolade Dawud

Jonathan Atala

Hale Sirin

Kayode Olaleye

Anar Rzayev

Duygu Ataman

Large language models (LLMs) demonstrate exceptional proficiency in both the comprehension and generation of textual data, particularly in English, a language for which extensive public benchmarks have been established across a wide range of natural language processing (NLP) tasks. Nonetheless, their performance in multilingual contexts and specialized domains remains less rigorously validated, raising questions about their reliability and generalizability across linguistically diverse and domain-specific settings. The second edition of the Shared Task on Multilingual Multitask Information Retrieval aims to provide a comprehensive and inclusive multilingual evaluation benchmark that helps assess the ability of multilingual LLMs to capture logical, factual, or causal relationships within lengthy text contexts and to generate language under sparse settings, particularly in scenarios with under-resourced languages. The shared task consists of two subtasks crucial to information retrieval, named entity recognition (NER) and reading comprehension (RC), in 7 data-scarce languages, including Azerbaijani, Swiss German, and Turkish, which previously lacked annotated resources in information retrieval tasks. This year's edition specifically focuses on the multiple-choice question answering evaluation setting, which provides a more objective setting for comparing different methods across languages.

2024-01-01

MRL (published)

Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, June 16-21, 2024

Mohamed Abdalla

Gavin Abercrombie