A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions

Authors: Guillaume Staerman, Pavlo Mozharovskyi, Pierre Colombo, Stephan Clémençon, Florence d'Alché-Buc

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The quality of this approximation and the performance of the proposed approach are illustrated in numerical experiments. Applications to robust clustering of images and automatic evaluation of natural language generation (NLG) show the benefits of this approach when benchmarked with state-of-the-art probability metrics. |
| Researcher Affiliation | Collaboration | Guillaume Staerman (LTCI, Télécom Paris, Institut Polytechnique de Paris); Pavlo Mozharovskyi (LTCI, Télécom Paris, Institut Polytechnique de Paris); Pierre Colombo (Equall.ai and MICS, CentraleSupélec, Université Paris-Saclay); Stephan Clémençon (LTCI, Télécom Paris, Institut Polytechnique de Paris); Florence d'Alché-Buc (LTCI, Télécom Paris, Institut Polytechnique de Paris) |
| Pseudocode | Yes | Algorithm 1: Approximation of DRp,ε; Algorithm 2: Approximation of the halfspace depth; Algorithm 3: Approximation of the projection depth; Algorithm 4: Approximation of the AI-IRW depth |
| Open Source Code | No | The text does not contain an explicit statement that the authors' code is released, nor does it provide a link to a code repository for the methodology described in the paper. |
| Open Datasets | Yes | The first dataset (FM) is constructed by taking the first 100 images in each class of the Fashion-MNIST dataset. Following previous BERT-based metrics, the authors evaluate DRp,ε (with p = 2, ε = 0.01, using the AI-IRW depth (Staerman et al., 2021b)) on two NLG tasks: data2text generation, using the WebNLG 2020 dataset (Ferreira et al., 2020), and summarization, using the dataset from Bhandari et al. (2020). |
| Dataset Splits | No | The paper describes how two datasets were constructed from Fashion-MNIST (FM and Cont. FM) and how contamination was introduced (5% contamination for Cont. FM), but it does not specify explicit training, validation, or test splits for these or any other datasets. |
| Hardware Specification | Yes | The authors thank the Jean Zay supercomputer operated by GENCI IDRIS with the compute grant 2023AD011014668R1 and Adastra with the grant AD010614770, where the NLP experiments have been done. |
| Software Dependencies | No | The paper mentions the scikit-learn spectral clustering implementation and a RoBERTa-based model from the Hugging Face hub (Wolf et al., 2019), but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | DRp,ε (using the projection depth) is benchmarked, setting p = 2 and ε = 0.1, against the Wasserstein (W), Sliced-Wasserstein (Sliced-W), and Maximum Mean Discrepancy (MMD; Gretton et al., 2007) distances. DRp,ε and the Sliced-Wasserstein are approximated by Monte Carlo using 100 directions, while the MMD distance is computed using a Gaussian kernel with a bandwidth equal to 1. As a baseline, spectral clustering is also applied to images treated as vectors under the Euclidean distance, using the standard parameters of the scikit-learn spectral clustering implementation with the number of clusters fixed to 10. For the NLG tasks, the authors follow previous BERT-based metrics and evaluate DRp,ε with p = 2, ε = 0.01, using the AI-IRW depth (Staerman et al., 2021b). |
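The experiment-setup row fixes a few concrete estimator choices: 100 Monte-Carlo directions for the sliced distance and the projection depth, and a Gaussian-kernel MMD with bandwidth 1. A minimal NumPy sketch of those building blocks is given below. It assumes equal sample sizes, an order-1 one-dimensional Wasserstein distance inside the sliced estimator, and the biased V-statistic MMD estimator; these are simplifying assumptions for illustration, not the authors' implementation, and all function names are ours.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_dirs=100, rng=None):
    """Monte-Carlo Sliced-Wasserstein: average 1D W1 distance over
    random unit directions (assumes X and Y have equal sample sizes)."""
    rng = np.random.default_rng(rng)
    dirs = rng.normal(size=(n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    total = 0.0
    for u in dirs:
        # 1D W1 between empirical measures = mean gap of sorted projections
        total += np.mean(np.abs(np.sort(X @ u) - np.sort(Y @ u)))
    return total / n_dirs

def mmd_gaussian(X, Y, bandwidth=1.0):
    """Squared MMD (biased V-statistic) with a Gaussian kernel."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def projection_depth(x, X, n_dirs=100, rng=None):
    """Monte-Carlo projection depth of point x w.r.t. sample X:
    1 / (1 + max outlyingness over random directions), where
    outlyingness = |u.x - median(u.X)| / MAD(u.X)."""
    rng = np.random.default_rng(rng)
    dirs = rng.normal(size=(n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = X @ dirs.T                                  # (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    outlyingness = np.max(np.abs(dirs @ x - med) / mad)
    return 1.0 / (1.0 + outlyingness)
```

With 100 directions the two Monte-Carlo quantities match the direction budget stated in the setup; increasing `n_dirs` trades runtime for a tighter approximation of the supremum/integral over directions.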