Publications

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Published in European Conference on Computer Vision (ECCV 2024), 2024

Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs), e.g. CLIP, for few-shot image classification. Despite their success, most prompt learning methods trade off classification accuracy against robustness, e.g. in domain generalization or out-of-distribution (OOD) detection. In this work, we introduce Global-Local Prompts (GalLoP), a new prompt learning method that learns multiple diverse prompts leveraging both global and local visual features. The training of the local prompts relies on local features with an enhanced vision-text alignment. To focus only on pertinent features, this local alignment is coupled with a sparsity strategy in the selection of the local features. We enforce diversity on the set of prompts using a new “prompt dropout” technique and a multiscale strategy on the local prompts. GalLoP outperforms previous prompt learning methods in accuracy on eleven datasets in different few-shot settings and with various backbones. Furthermore, GalLoP shows strong robustness in both domain generalization and OOD detection, even outperforming dedicated OOD detection methods. Code and instructions to reproduce our results will be open-sourced.

Recommended citation: Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome: GalLoP: Learning Global and Local Prompts for Vision-Language Models. European Conference on Computer Vision (ECCV 2024). https://arxiv.org/pdf/2407.01400v1

Optimization of Rank Losses for Image Retrieval

Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (under-review TPAMI), 2024

In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work, we introduce a general framework for robust and decomposable rank loss optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. First, we propose a general surrogate for the ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upper bound for rank losses and ensures robust training. Second, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally, we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision (H-AP), and optimize it as well as the NDCG. Finally, we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large-scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at github.com/cvdfoundation/google-landmark. Code will be released at github.com/elias-ramzi/SupRank.
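As a minimal illustration of why these metrics "rely on score ranking", the sketch below computes AP and R@k for a single query with NumPy. It follows the standard textbook definitions and is not taken from the SupRank code. The function names and the toy `scores` and `labels` arrays are assumptions made for the example.

```python
import numpy as np


def average_precision(scores, labels):
    """AP of one query: mean of precision@rank taken at each relevant item."""
    order = np.argsort(-scores)             # indices sorted by decreasing score
    ranked = labels[order].astype(float)    # 1.0 where the ranked item is relevant
    hits = np.cumsum(ranked)                # relevant items seen up to each rank
    ranks = np.arange(1, len(scores) + 1)
    return float(np.sum(ranked * hits / ranks) / ranked.sum())


def recall_at_k(scores, labels, k):
    """R@k as commonly used in retrieval: 1 if any top-k item is relevant."""
    top_k = np.argsort(-scores)[:k]
    return float(labels[top_k].any())


# Toy query (hypothetical values): five gallery items, two of them relevant.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 0, 1, 0, 0])
ap = average_precision(scores, labels)      # (1/1 + 2/3) / 2
r1 = recall_at_k(scores, labels, k=1)
```

Both functions go through `argsort`, a piecewise-constant function of the scores, which is the source of the non-differentiability mentioned in the abstract.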

Recommended citation: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome: Optimization of Rank Losses for Image Retrieval. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (under-review TPAMI). https://arxiv.org/abs/2309.08250

Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection

Published in International Conference on Machine Learning (ICML 2023), 2023

Out-of-distribution (OOD) detection is a critical requirement for the deployment of deep neural networks. This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone. HEAT complements prior estimators of the ID density, e.g. parametric models like the Gaussian Mixture Model (GMM), to provide an accurate yet robust density estimation. A second contribution is to leverage the EBM framework to provide a unified density estimation and to compose several energy terms. Extensive experiments demonstrate the significance of the two contributions. HEAT sets new state-of-the-art OOD detection results on the CIFAR-10 / CIFAR-100 benchmark as well as on the large-scale ImageNet benchmark. The code is available at: github.com/MarcLafon/heatood.
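To make the idea of a parametric density estimator in feature space concrete, here is a small sketch of a GMM-style energy score, with one Gaussian per class and a shared covariance. This is not the HEAT model: the learned energy-based correction that HEAT adds on top of such a prior is omitted, and the synthetic 2-D features merely stand in for backbone features. All function names and values are assumptions for illustration.

```python
import numpy as np


def fit_class_gaussians(features, labels):
    """Fit one Gaussian per class with a shared covariance (a simple GMM)."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    cov += 1e-6 * np.eye(features.shape[1])     # keep the covariance invertible
    priors = np.array([(labels == c).mean() for c in classes])
    return means, cov, priors


def gmm_energy(z, means, cov, priors):
    """E(z) = -log sum_k pi_k N(z; mu_k, cov); higher means more OOD-like."""
    d = means.shape[1]
    prec = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    diff = z[:, None, :] - means[None, :, :]                 # shape (n, K, d)
    maha = np.einsum("nkd,de,nke->nk", diff, prec, diff)     # squared Mahalanobis
    log_norm = -0.5 * (d * np.log(2 * np.pi) + logdet)
    log_joint = np.log(priors)[None, :] + log_norm - 0.5 * maha
    m = log_joint.max(axis=1, keepdims=True)                 # stable log-sum-exp
    return -(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1)))


# Synthetic stand-in for ID features: two classes centered at -2 and +2.
rng = np.random.default_rng(0)
id_feats = np.concatenate([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
id_labels = np.repeat([0, 1], 200)
means, cov, priors = fit_class_gaussians(id_feats, id_labels)

near = gmm_energy(np.array([[2.0, 2.0]]), means, cov, priors)    # close to class 1
far = gmm_energy(np.array([[15.0, -15.0]]), means, cov, priors)  # far from both
```

Thresholding the energy then gives a post-hoc detector: samples whose energy exceeds a value chosen on held-out ID data are flagged as OOD.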

Recommended citation: Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Thome: Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection. International Conference on Machine Learning (ICML 2023). https://arxiv.org/pdf/2305.16966.pdf

Hierarchical Average Precision Training for Pertinent Image Retrieval

Published in European Conference on Computer Vision (ECCV 2022), 2022

Image retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet those metrics are limited to binary labels and do not take into account errors’ severity. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new H-AP metric, which leverages a concept hierarchy to refine AP by integrating errors’ importance and better evaluate rankings. To train deep models with H-AP, we carefully study the problem’s structure and design a smooth lower bound surrogate combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performance. Finally, we show that HAPPIER leads to better organization of the embedding space, and prevents the most severe failure cases of non-hierarchical methods. Our code is publicly available at github.com/elias-ramzi/HAPPIER.

Recommended citation: Elias Ramzi, Nicolas Audebert, Nicolas Thome, Clément Rambour, Xavier Bitot: Hierarchical Average Precision Training for Pertinent Image Retrieval. In: European Conference on Computer Vision. Springer (ECCV, 2022). https://arxiv.org/abs/2207.04873

Robust and Decomposable Average Precision for Image Retrieval

Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021

In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP). In this paper, we introduce a method for robust and decomposable average precision (ROADMAP) addressing two major challenges for end-to-end training of deep neural networks with AP: non-differentiability and non-decomposability. Firstly, we propose a new differentiable approximation of the rank function, which provides an upper bound of the AP loss and ensures robust training. Secondly, we design a simple yet effective loss function to reduce the decomposability gap between the AP in the whole training set and its averaged batch approximation, for which we provide theoretical guarantees. Extensive experiments conducted on three image retrieval datasets show that ROADMAP outperforms several recent AP approximation methods and highlight the importance of our two contributions. Finally, using ROADMAP for training deep models yields very good performance, outperforming state-of-the-art results on the three datasets. Code and instructions to reproduce our results will be made publicly available at github.com/elias-ramzi/ROADMAP.
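The non-differentiability comes from the rank itself, which can be written as a count of step functions over score differences. The sketch below contrasts that exact rank with a generic sigmoid relaxation. The sigmoid is a common stand-in chosen for illustration and is not ROADMAP's surrogate, which is designed to give an upper bound on the AP loss. The function names and the temperature `tau` are assumptions.

```python
import numpy as np


def hard_rank(scores):
    """rank(k) = 1 + number of items scored strictly higher than item k."""
    diff = scores[None, :] - scores[:, None]    # diff[k, j] = s_j - s_k
    return 1 + (diff > 0).sum(axis=1)           # step function: zero gradient


def sigmoid_rank(scores, tau=0.01):
    """Generic smooth relaxation: the step function is replaced by a sigmoid."""
    diff = scores[None, :] - scores[:, None]
    soft = 1.0 / (1.0 + np.exp(-diff / tau))
    np.fill_diagonal(soft, 0.0)                 # an item does not outrank itself
    return 1.0 + soft.sum(axis=1)


scores = np.array([0.2, 0.9, 0.5])              # hypothetical similarity scores
exact = hard_rank(scores)                       # ranks 3, 1, 2
smooth = sigmoid_rank(scores)                   # close to exact for small tau
```

A smaller `tau` tracks the exact rank more closely but concentrates the gradient on pairs with nearly equal scores, which is the usual trade-off with this kind of relaxation.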

Recommended citation: Elias Ramzi, Nicolas Thome, Clément Rambour, Nicolas Audebert, Xavier Bitot: Robust and decomposable average precision for image retrieval. Advances in Neural Information Processing Systems 34 (NeurIPS, 2021). https://arxiv.org/abs/2110.01445