Siva Reddy

Gaurav Kamath

PhD - McGill

Aditi Khandelwal

PhD - McGill

Principal supervisor:

PhD - McGill

Co-supervisor:

Timothy O'Donnell

Github

Aravind Krishnan

Alumni collaborator - UNIVERSITÄT DES SAARLANDES

PhD - McGill

Zichao Li

PhD - McGill

Co-supervisor:

Jackie Cheung

Xing Han Lu

PhD - McGill

Research intern - McGill

PhD - McGill

Postdoctorate - McGill

Website

Arkil Patel

PhD - McGill

Principal supervisor:

Research collaborator

Github

Karolina Ewa Stańczak

Alumni collaborator - McGill

Website

Github

How can we explain AI and make sure the explanation is true? Faithfulness measurable models show you how

Ada Tur

Research intern - McGill

Alumni collaborator - McGill

Blog posts

October 1, 2024

by

Andreas Madsen

Siva Reddy

Sarath Chandar

Read the article

Publications

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Benno Krojer

Dheeraj Vattikonda

Luis Lara

Varun Jampani

Eva Portelance

Chris Pal

An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution against their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts: (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.

2024-07-03

ArXiv (preprint)

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models

Eva Portelance

Timothy John O'Donnell

Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypothesis spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.

2024-06-17

ArXiv (preprint)

Evaluating In-Context Learning of Libraries for Code Generation

Arkil Patel

Dzmitry Bahdanau

Pradeep Dasigi

2024-06-01

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (published)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

Vaibhav Adlakha

Parishad BehnamGhader

Xing Han Lu

Nicholas Meade

Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents to their input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data are available at https://github.com/McGill-NLP/instruct-qa

2024-05-16

Transactions of the Association for Computational Linguistics (published)
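
The abstract of the entry above notes that verbose answers make exact match (EM) and F1 unreliable. The following is a minimal sketch of why that can happen, not the authors' code: the normalization rules, the function names, and the choice of token recall as an example of a token-overlap metric are all assumptions made for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    This normalization is an assumed, commonly used convention,
    not necessarily the one used in the paper.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def overlap_count(pred: list[str], ref: list[str]) -> int:
    """Number of tokens shared by prediction and reference (with multiplicity)."""
    return sum((Counter(pred) & Counter(ref)).values())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the normalized strings are identical."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = overlap_count(pred, ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def token_recall(prediction: str, reference: str) -> float:
    """Fraction of reference tokens that appear in the prediction."""
    pred, ref = normalize(prediction), normalize(reference)
    if not ref:
        return 0.0
    return overlap_count(pred, ref) / len(ref)


# Hypothetical example: a correct but verbose answer.
reference = "Paris"
verbose = "The capital of France is Paris, which is also its largest city."

em = exact_match(verbose, reference)    # penalized: strings differ
f1 = token_f1(verbose, reference)       # penalized: low precision
rec = token_recall(verbose, reference)  # not penalized by extra words
```

In this toy case the verbose answer contains the reference, yet EM is zero and F1 is low because precision shrinks with every extra token, while recall is unaffected by length. Recall alone would also reward an answer that lists many candidates, so it is only one ingredient of a sound evaluation.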

Interpretability Needs a New Paradigm

Andreas Madsen

Himabindu Lakkaraju

Sarath Chandar

2024-05-08

ArXiv (preprint)

Faithfulness Measurable Masked Language Models

Andreas Madsen

Sarath Chandar

2024-05-01

ICML.cc/2024/Conference (spotlight)

openreview.net

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

2024-05-01

ICML.cc/2024/Conference (spotlight)

openreview.net

Investigating Adversarial Trigger Transfer in Large Language Models

Nicholas Meade

Arkil Patel

2024-04-24

ArXiv (preprint)

Universal Adversarial Triggers Are Not Universal

Nicholas Meade

Arkil Patel

2024-04-24

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader

Vaibhav Adlakha

Marius Mosbach

Dzmitry Bahdanau

Nicolas Chapados

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data (as of May 24, 2024). Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.

2024-04-09

ArXiv (preprint)
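
Two of the three steps named in the LLM2Vec abstract above, enabling bidirectional attention and contrastive learning, can be illustrated with a small dependency-free sketch. This is not the LLM2Vec implementation: the mask helpers, the InfoNCE-style loss with in-batch negatives, and the temperature value are assumptions chosen for illustration, and step 2 (masked next token prediction) is not shown.

```python
import math


def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives (an assumed formulation).

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as a negative. The temperature is an assumed value.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Hypothetical toy embeddings: aligned pairs should give a lower loss
# than pairs whose positives have been swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(anchors, aligned)
loss_swapped = contrastive_loss(anchors, swapped)
```

The causal mask is lower triangular, so early tokens never see later ones; replacing it with the all-true mask is what lets every token representation use the full sequence. The loss is small when each anchor is most similar to its own positive and large when it is more similar to another item in the batch.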

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader

Vaibhav Adlakha

Marius Mosbach

Dzmitry Bahdanau

Nicolas Chapados