
Marco Pedersoli

Affiliate Member
Associate Professor, École de technologie supérieure
Research Topics
Building Energy Management Systems
Computer Vision
Deep Learning
Generalization
Generative Models
Multimodal Learning
Representation Learning
Robustness
Satellite Imagery
Vision and Language
Weak Supervision

Biography

I am an Associate Professor at ÉTS Montreal, a member of LIVIA (le Laboratoire d'Imagerie, Vision et Intelligence Artificielle), and part of the International Laboratory of Learning Systems (ILLS). I am also a member of ELLIS, the European network of excellence in AI. Since 2021, I have co-held the Distech Industrial Research Chair on Embedded Neural Networks for Connected Building Control.

My research centers on Deep Learning methods and algorithms, with a focus on visual recognition and the automatic interpretation and understanding of images and videos. A key objective of my work is to advance machine intelligence by minimizing two critical factors: computational load and the need for human supervision. These reductions are essential for scalable AI, enabling more efficient, adaptive, and embedded systems. In my recent work, I have contributed to developing neural networks for smart buildings, integrating AI-driven solutions to enhance energy efficiency and comfort in intelligent environments.

Publications

Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Zhaozheng Yin
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez
Abhay Puri
Shubham Agarwal
Issam Hadj Laradji
Pau Rodriguez
Sai Rajeswar
David Vazquez
Progressive Multi-Source Domain Adaptation for Personalized Facial Expression Recognition
Muhammad Osama Zeeshan
Alessandro Lameiras Koerich
Eric Granger
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Muhammad Haseeb Aslam
Clara Martinez
Alessandro Lameiras Koerich
Ali Etemad
Eric Granger
Advances in self-distillation have shown that when knowledge is distilled from a teacher to a student using the same deep learning (DL) architecture, the student's performance can surpass the teacher's, particularly when the network is overparameterized and the teacher is trained with early stopping. Alternatively, ensemble learning also improves performance, although training, storing, and deploying multiple models becomes impractical as the number of models grows. Even distilling an ensemble into a single student model, or applying weight-averaging methods, first requires training multiple teacher models and does not fully leverage the inherent stochasticity of DL models for generating and distilling diversity. These constraints are particularly prohibitive in resource-constrained or latency-sensitive applications such as wearable devices. This paper proposes to train only one model and generate multiple diverse teacher representations using distillation-time dropout. However, generating these representations stochastically leads to noisy representations that are misaligned with the learned task. To overcome this problem, a novel stochastic self-distillation (SSD) training strategy is introduced to filter and weight teacher representations, distilling only from task-relevant representations via student-guided knowledge distillation (SGKD). The student representation at each distillation step is used as an authority to guide the distillation process. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods without increasing the model size at either training or testing time, and incurs negligible computational complexity compared to state-of-the-art ensemble learning and weight-averaging methods.
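The core mechanism (sampling diverse teacher views from a single trained model via dropout, then letting the student weight them) can be illustrated compactly. Below is a minimal PyTorch sketch of one plausible realization; the names `teacher`, `student`, `num_samples`, and `tau`, as well as the cosine-similarity weighting, are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def ssd_distillation_loss(teacher, student, x, num_samples=8, tau=0.1):
    """Distill from dropout-sampled teacher representations, weighted by
    their cosine similarity to the student's current representation."""
    teacher.train()  # keep dropout active so each teacher forward pass is stochastic
    with torch.no_grad():
        # Draw several stochastic teacher representations of the same batch.
        reps = torch.stack([teacher(x) for _ in range(num_samples)])  # (S, B, D)
    z = student(x)  # (B, D) student representation; gradients flow through here
    # Student-guided weights: teacher samples closer to the student's current
    # representation are treated as more task-relevant and weighted higher.
    sims = F.cosine_similarity(reps, z.detach().unsqueeze(0), dim=-1)  # (S, B)
    w = F.softmax(sims / tau, dim=0).unsqueeze(-1)  # (S, B, 1), sums to 1 over S
    target = (w * reps).sum(dim=0)  # (B, D) filtered and weighted teacher target
    return F.mse_loss(z, target)
```

Here the temperature `tau` would control how sharply the student filters out misaligned teacher samples; the paper's exact filtering and weighting scheme may differ.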
Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data
Masoumeh Sharafi
Emma Ollivier
Muhammad Osama Zeeshan
Soufiane Belharbi
Alessandro Lameiras Koerich
Simon Bacon
Eric Granger
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors
Atif Belal
Akhil Meethal
Francisco Perdigon Romero
Eric Granger
Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform feature alignment in a class-agnostic manner. This is challenging since objects have unique modality information due to variations in object appearance across domains. A recent prototype-based approach proposed class-wise alignment, yet it suffers from error accumulation caused by noisy pseudo-labels, which can negatively affect adaptation with imbalanced data. To overcome these limitations, we propose an attention-based class-conditioned alignment method for MSDA, designed to align instances of each object category across domains. In particular, an attention module combined with an adversarial domain classifier allows learning domain-invariant and class-specific instance representations. Experimental results on multiple benchmark MSDA datasets indicate that our method outperforms state-of-the-art methods and exhibits robustness to class imbalance, achieved through a conceptually simple class-conditioning strategy. Our code is available at: https://github.com/imatif17/ACIA.
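As a rough illustration of the class-conditioning idea (not the authors' implementation, which lives in the linked repository), the sketch below conditions instance features on class embeddings through attention and trains a domain classifier adversarially via gradient reversal. All module names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class ClassConditionedAligner(nn.Module):
    def __init__(self, dim=256, num_classes=20, num_domains=3):
        super().__init__()
        self.cls_emb = nn.Embedding(num_classes, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.domain_clf = nn.Linear(dim, num_domains)

    def forward(self, inst_feats, labels, lam=1.0):
        # Condition each instance feature on its (pseudo-)class embedding
        # via cross-attention, then classify its source domain adversarially.
        q = self.cls_emb(labels).unsqueeze(1)          # (N, 1, D) class queries
        k = v = inst_feats.unsqueeze(1)                # (N, 1, D) instance features
        cond, _ = self.attn(q, k, v)                   # (N, 1, D) class-conditioned
        rev = GradReverse.apply(cond.squeeze(1), lam)  # reverse gradients
        return self.domain_clf(rev)                    # train with CE on domain ids
```

Training the domain classifier through the reversal layer pushes the conditioned instance features toward domain invariance while the class conditioning keeps alignment per-category rather than class-agnostic.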
Mixed Patch Visible-Infrared Modality Agnostic Object Detection
Heitor Rapela Medeiros
David Latortue
Eric Granger
In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality-agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation that reaches competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.
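The patch-mixing idea itself is compact: for each patch position, randomly choose whether the token comes from the RGB or the IR view of the same scene, so a single shared encoder sees both modalities within one sample. A minimal PyTorch sketch follows; tensor names and the `rgb_ratio` parameter are illustrative, and the official code is in the linked repository.

```python
import torch

def mix_patches(rgb_tokens, ir_tokens, rgb_ratio=0.5):
    """Randomly pick each patch token from the RGB or IR view.
    rgb_tokens, ir_tokens: (B, N, D) patch embeddings of the same scene."""
    B, N, _ = rgb_tokens.shape
    # Bernoulli mask per patch: True -> take the RGB token, False -> IR.
    take_rgb = torch.rand(B, N, 1, device=rgb_tokens.device) < rgb_ratio
    return torch.where(take_rgb, rgb_tokens, ir_tokens)
```

Sampling the mask per patch (rather than per image) is what prevents either modality from dominating a training batch, which is the imbalance the abstract describes.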
A Realistic Protocol for Evaluation of Weakly Supervised Object Localization
Shakeeb Murtaza
Soufiane Belharbi
Eric Granger
Weakly Supervised Object Localization (WSOL) allows training deep learning models for classification and localization (LOC) using only global class-level labels. The absence of bounding box (bbox) supervision during training raises challenges in the literature for hyperparameter tuning, model selection, and evaluation. WSOL methods rely on a validation set with bbox annotations for model selection, and on a test set with bbox annotations for estimating the threshold used to produce bboxes from localization maps. This approach, however, is not aligned with the WSOL setting, as these annotations are typically unavailable in real-world scenarios. Our initial empirical analysis shows a significant decline in LOC performance when model selection and threshold estimation rely solely on class labels and the image itself, respectively, compared to using manual bbox annotations. This highlights the importance of incorporating bbox labels for optimal model performance. In this paper, a new WSOL evaluation protocol is proposed that provides LOC information without the need for manual bbox annotations. In particular, we generate noisy pseudo-boxes from a pretrained off-the-shelf region proposal method such as Selective Search, CLIP, or RPN for model selection. These bboxes are also employed to estimate the threshold from LOC maps, circumventing the need for test-set bbox annotations. Our experiments with several WSOL methods on the ILSVRC and CUB datasets show that using the proposed pseudo-bboxes for validation facilitates model selection and threshold estimation, with LOC performance comparable to models selected using GT bboxes on the validation set and threshold estimation on the test set. It also outperforms models selected using class-level labels and then dynamically thresholded based solely on LOC maps.
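To make the threshold-estimation step concrete, the NumPy sketch below shows one way a global threshold could be chosen by scanning candidate values and keeping the one whose derived boxes best overlap the noisy pseudo-boxes. The helper names (`map_to_bbox`, `estimate_threshold`) and the IoU-based selection criterion are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def map_to_bbox(loc_map, thr):
    """Tightest box (x1, y1, x2, y2) around activations above `thr`."""
    ys, xs = np.where(loc_map >= thr)
    if len(xs) == 0:
        return (0, 0, 0, 0)
    return (xs.min(), ys.min(), xs.max(), ys.max())

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: max(0, r[2] - r[0]) * max(0, r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def estimate_threshold(loc_maps, pseudo_boxes, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the threshold whose predicted boxes best match the pseudo-boxes
    on average, avoiding any manual bbox annotations."""
    scores = [np.mean([iou(map_to_bbox(m, t), b)
                       for m, b in zip(loc_maps, pseudo_boxes)])
              for t in grid]
    return grid[int(np.argmax(scores))]
```

The same pseudo-boxes can serve double duty: ranking model checkpoints during validation and fixing the map-to-box threshold, which is exactly what removes the dependence on held-out ground-truth annotations.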