
Marco Pedersoli

Affiliate Member
Associate Professor, École de technologie supérieure
Research Topics
Building Energy Management Systems
Computer Vision
Deep Learning
Generalization
Generative Models
Multimodal Learning
Representation Learning
Robustness
Satellite Imagery
Vision and Language
Weak Supervision

Biography

I am an Associate Professor at ÉTS Montreal, a member of LIVIA (the Laboratoire d'Imagerie, Vision et Intelligence Artificielle), and part of the International Laboratory on Learning Systems (ILLS). I am also a member of ELLIS, the European network of excellence in AI. Since 2021, I have co-held the Distech Industrial Research Chair on Embedded Neural Networks for Connected Building Control.

My research centers on Deep Learning methods and algorithms, with a focus on visual recognition, and the automatic interpretation and understanding of images and videos. A key objective of my work is to advance machine intelligence by minimizing two critical factors: computational load and the need for human supervision. These reductions are essential for scalable AI, enabling more efficient, adaptive, and embedded systems. In my recent work, I have contributed to developing neural networks for smart buildings, integrating AI-driven solutions to enhance energy efficiency and comfort in intelligent environments.

Publications

Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan A. Rodriguez
Haotian Zhang
Abhay Puri
Aarash Feizi
Rishav Pramanik
Pascal Wichmann
Arnab Mondal
Mohammad Reza Samsami
Rabiul Awal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
David Vazquez
Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in vision-language models (VLMs) have enabled high-quality SVG generation by framing the problem as a code generation task and leveraging large-scale pretraining. VLMs are particularly suitable for this task as they capture both global semantics and fine-grained visual patterns, while transferring knowledge across vision, natural language, and code domains. However, existing VLM approaches often struggle to produce faithful and efficient SVGs because they never observe the rendered images during training. Although differentiable rendering for autoregressive SVG code generation remains unavailable, rendered outputs can still be compared to original inputs, enabling evaluative feedback suitable for reinforcement learning (RL). We introduce RLRF (Reinforcement Learning from Rendering Feedback), an RL method that enhances SVG generation in autoregressive VLMs by leveraging feedback from rendered SVG outputs. Given an input image, the model generates SVG roll-outs that are rendered and compared to the original image to compute a reward. This visual fidelity feedback guides the model toward producing more accurate, efficient, and semantically coherent SVGs. RLRF significantly outperforms supervised fine-tuning, addressing common failure modes and enabling precise, high-quality SVG generation with strong structural understanding and generalization.
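To make the render-and-compare loop concrete, here is a minimal sketch of the reward computation: `render_svg` is a hypothetical rasterizer stub standing in for a real SVG renderer, and the pixel-space L2 reward is one plausible choice, not necessarily the exact reward used in the paper.

```python
import numpy as np

def render_svg(svg_code: str, size: int = 64) -> np.ndarray:
    """Hypothetical rasterizer stub: a real pipeline would invoke an
    actual SVG renderer and return the resulting RGB image array."""
    rng = np.random.default_rng(abs(hash(svg_code)) % (2**32))
    return rng.random((size, size, 3))

def fidelity_reward(svg_code: str, target: np.ndarray) -> float:
    """Visual-fidelity reward: negative pixel-wise L2 distance between
    the rendered roll-out and the original input image."""
    rendered = render_svg(svg_code, size=target.shape[0])
    return -float(np.mean((rendered - target) ** 2))

# REINFORCE-style use: sample several SVG roll-outs from the VLM policy,
# score each against the input image, and weight the log-likelihood
# gradient of each roll-out by its baseline-subtracted reward.
target_image = np.zeros((64, 64, 3))
rollouts = ["<svg>...</svg>", "<svg><rect/></svg>"]  # sampled from the model
rewards = np.array([fidelity_reward(s, target_image) for s in rollouts])
advantages = rewards - rewards.mean()  # simple mean baseline
```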
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
Manuela González-González
Soufiane Belharbi
Muhammad Osama Zeeshan
Masoumeh Sharafi
Muhammad Haseeb Aslam
Alessandro Lameiras Koerich
Simon Bacon
Eric Granger
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time, resource-limited environments. However, no datasets are currently available for designing ML models to recognize A/H. This paper introduces the first Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants captured across 9 provinces in Canada, spanning a range of ages and ethnicities. Through our web platform, we recruited participants to answer 7 questions, some designed to elicit A/H, while recording themselves via webcam and microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame and a variety of participant metadata. We include baseline results for BAH for frame- and video-level recognition in multimodal setups, as well as for zero-shot prediction and for personalization using unsupervised domain adaptation. The limited performance of the baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.
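As a rough illustration of the annotation structure described above (timestamped A/H segments plus frame- and video-level labels), the sketch below uses invented field names; it is not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AHSegment:
    """One annotated ambivalence/hesitancy span inside a video."""
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    cues: list[str] = field(default_factory=list)  # behavioural A/H cues

@dataclass
class BAHVideo:
    """One participant answer with its annotations (illustrative only)."""
    participant_id: str
    question_id: int            # one of the 7 elicitation questions
    transcript: str
    segments: list[AHSegment]   # timestamped A/H spans
    video_label: bool           # video-level A/H present / absent

    def frame_labels(self, fps: float, n_frames: int) -> list[bool]:
        """Derive frame-level labels from the timestamped segments."""
        return [any(s.start_s <= i / fps <= s.end_s for s in self.segments)
                for i in range(n_frames)]
```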
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Zhaozheng Yin
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez
Abhay Puri
Shubham Agarwal
Issam Hadj Laradji
Pau Rodriguez
Sai Rajeswar
David Vazquez
Progressive Multi-Source Domain Adaptation for Personalized Facial Expression Recognition
Muhammad Osama Zeeshan
Alessandro Lameiras Koerich
Eric Granger
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Muhammad Haseeb Aslam
Clara Martinez
Alessandro Lameiras Koerich
Ali Etemad
Eric Granger
Advances in self-distillation have shown that when knowledge is distilled from a teacher to a student using the same deep learning (DL) architecture, the student's performance can surpass the teacher's, particularly when the network is overparameterized and the teacher is trained with early stopping. Alternatively, ensemble learning also improves performance, although training, storing, and deploying multiple models becomes impractical as the number of models grows. Even distilling an ensemble into a single student model, or applying weight-averaging methods, first requires training multiple teacher models, and does not fully leverage the inherent stochasticity for generating and distilling diversity in DL models. These constraints are particularly prohibitive in resource-constrained or latency-sensitive applications such as wearable devices. This paper proposes to train only one model and generate multiple diverse teacher representations using distillation-time dropout. However, generating these representations stochastically leads to noisy representations that are misaligned with the learned task. To overcome this problem, a novel stochastic self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations so that distillation uses only task-relevant representations, via student-guided knowledge distillation (SGKD). The student representation at each distillation step is used as an authority to guide the distillation process. Experimental results on real-world affective computing datasets, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods without increasing the model size at either training or testing time, and incurs negligible computational complexity compared to state-of-the-art ensemble learning and weight-averaging methods.
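A minimal PyTorch sketch of the idea, assuming the teacher and student both emit flat feature vectors; the dropout-based sampling and the cosine-similarity weighting below are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def stochastic_teacher_reps(teacher: torch.nn.Module,
                            x: torch.Tensor,
                            n_samples: int = 8) -> torch.Tensor:
    """Generate diverse representations from a single teacher by keeping
    dropout active at distillation time: one model, many 'teachers'."""
    teacher.train()  # keeps dropout stochastic; a real setup would also
                     # freeze batch-norm statistics
    with torch.no_grad():
        return torch.stack([teacher(x) for _ in range(n_samples)])  # (S, B, D)

def student_guided_distill_loss(teacher_reps: torch.Tensor,
                                student_rep: torch.Tensor) -> torch.Tensor:
    """Weight each stochastic teacher representation by its similarity to
    the current student representation, so noisy, task-misaligned samples
    contribute less to the distillation target."""
    sims = F.cosine_similarity(teacher_reps, student_rep.unsqueeze(0), dim=-1)  # (S, B)
    weights = F.softmax(sims, dim=0).unsqueeze(-1)                              # (S, B, 1)
    target = (weights * teacher_reps).sum(dim=0)                                # (B, D)
    return F.mse_loss(student_rep, target.detach())
```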
Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data
Masoumeh Sharafi
Emma Ollivier
Muhammad Osama Zeeshan
Soufiane Belharbi
Alessandro Lameiras Koerich
Simon Bacon
Eric Granger
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors
Atif Belal
Akhil Meethal
Francisco Perdigon Romero
Eric Granger
Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform feature alignment in a class-agnostic manner. This is challenging since objects have unique modality information due to variations in object appearance across domains. A recent prototype-based approach proposed class-wise alignment, yet it suffers from error accumulation caused by noisy pseudo-labels, which can negatively affect adaptation with imbalanced data. To overcome these limitations, we propose an attention-based class-conditioned alignment method for MSDA, designed to align instances of each object category across domains. In particular, an attention module combined with an adversarial domain classifier allows learning domain-invariant and class-specific instance representations. Experimental results on multiple benchmark MSDA datasets indicate that our method outperforms state-of-the-art methods and exhibits robustness to class imbalance, achieved through a conceptually simple class-conditioning strategy. Our code is available at: https://github.com/imatif17/ACIA.
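The sketch below illustrates the two ingredients named above, a class-conditioned attention module feeding an adversarial domain classifier through gradient reversal, in PyTorch; dimensions, module structure, and hyperparameters are assumptions for illustration, not the released ACIA implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient
    in the backward pass; the standard adversarial-alignment trick."""
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class ClassConditionedAligner(nn.Module):
    """Illustrative head: instance features attend to per-class queries
    before an adversarial domain classifier, so alignment is performed per
    object category rather than class-agnostically."""
    def __init__(self, dim: int = 256, n_classes: int = 20, n_domains: int = 3):
        super().__init__()
        self.class_queries = nn.Embedding(n_classes, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.domain_clf = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                        nn.Linear(dim, n_domains))

    def forward(self, inst_feats, labels, lambd: float = 1.0):
        # inst_feats: (B, N, dim) instance features; labels: (B, N) class ids
        q = self.class_queries(labels)                      # class-specific queries
        attended, _ = self.attn(q, inst_feats, inst_feats)  # class-conditioned features
        reversed_feats = GradReverse.apply(attended, lambd)
        return self.domain_clf(reversed_feats)              # per-instance domain logits
```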