Learning to Match Unpaired Data with Minimum Entropy Coupling

Authors: Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
"We empirically demonstrate that our method, DDMEC, is general and can be easily used to address challenging tasks, including unsupervised single-cell multi-omics data alignment and unpaired image translation, outperforming specialized methods." (Section 4, Experiments)
Researcher Affiliation: Collaboration
"¹Ampere Software Technology, France; ²Department of Data Science, EURECOM, France. Correspondence to: <EMAIL>."
Pseudocode: Yes

Algorithm 1 (DDMEC Training Loop)
  Input: θ, φ
  Initialize θ′ ← θ, φ′ ← φ
  repeat
    Call Algorithm 2 with y ∼ p_Y, θ, θ′, φ
    Call Algorithm 2 with x ∼ p_X, φ, φ′, θ
  until converged

Algorithm 2 (DDMEC Training Step)
  Input: y, θ, θ′, φ
  Sample x ∼ p_θ′(X | Y = y), t ∼ U[0, T], ε ∼ N(0, I)
  Update θ using Equations (10) and (11)
  Update φ using ∇_φ E_{y_t, t} ‖ε − ε_φ(y_t, x, t)‖²
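The alternating structure of the two algorithms can be sketched in plain Python. This is a toy illustration of the control flow only: the scalar "parameters", the quadratic surrogate loss, and the helper names (`training_step`, `surrogate_grad`) are assumptions, standing in for the actual diffusion denoisers and the losses of Equations (10) and (11).

```python
# Toy sketch of the DDMEC alternating training loop (Algorithms 1-2).
# Scalars stand in for network parameters; a quadratic surrogate loss
# stands in for the denoising objective. Illustrative only.

def surrogate_grad(param, target):
    # Gradient of 0.5 * (param - target)^2, a stand-in for the real loss gradient.
    return param - target

def training_step(param, frozen, lr=0.1, target=1.0):
    """One Algorithm-2-style step: update the online parameter using a
    'sample' produced by the frozen copy of the other model (toy version)."""
    pseudo_target = 0.5 * target + 0.5 * frozen  # stand-in for x ~ p_theta'(X|Y=y)
    return param - lr * surrogate_grad(param, pseudo_target)

def train(theta=0.0, phi=0.0, n_iters=200):
    theta_f, phi_f = theta, phi                 # initialize frozen copies theta', phi'
    for _ in range(n_iters):
        theta = training_step(theta, phi_f)     # Algorithm 2 with y ~ p_Y
        phi = training_step(phi, theta_f)       # Algorithm 2 with x ~ p_X
        theta_f, phi_f = theta, phi             # refresh frozen copies
    return theta, phi

theta, phi = train()
```

Under this surrogate the two parameters contract toward a shared fixed point, mirroring how each model is trained against samples from the frozen copy of the other.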
Open Source Code: Yes
"The source¹ is publicly available." ¹https://github.com/MustaphaBounoua/ddmec
Open Datasets: Yes
"We evaluate our method on single-cell multi-omics datasets: the peripheral blood mononuclear cells (PBMC) dataset and the bone marrow (BM) dataset."
"We use the AFHQ (Choi et al., 2020) dataset."
"Furthermore, we employ the CELEBA-HQ (Karras, 2017) dataset."
"In this experiment we use the SNARESEQ (Chen et al., 2019) dataset."
Dataset Splits: Yes
"For both datasets, we adopt the data preprocessing and evaluation pipeline described in (Singh et al., 2023), resulting in 50-dimensional embeddings per modality."
"We adopt the same experimental validation protocol as described by Zhao et al. (2022), where all images are resized to a resolution of 256×256."
"In this experiment we use the SNARESEQ (Chen et al., 2019) dataset, which links chromatin accessibility with gene expression data on a mixture of four cell types. We use the same preprocessing procedures detailed in (Demetci et al., 2022), which deal with filtering spurious data affected by technical errors, and normalization."
Hardware Specification: No
The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies: No
The paper mentions the use of an "Adam optimizer (Kingma, 2014)" and references various models (DDPM, DDIM sampler), but it does not specify version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup: Yes
"For each training step, we use a batch size of 256 and perform four gradient updates corresponding to line 2, followed by four updates for line 3. We use a simple MLP network with skip connections and use the Adam optimizer (Kingma, 2014) with a learning rate of 1×10⁻⁴. The KL divergence regularization weight is set to λ = 0.01 for PBMC and λ = 0.02 for BM." (Table 3: Hyperparameters used for training.)
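The paper only states that a "simple MLP network with skip connections" is used; the sketch below shows one plausible reading of that architecture in pure Python. The layer count, the same-width residual pattern, and all helper names (`linear`, `make_layer`, `mlp_skip_forward`) are assumptions, not details from the paper; only the 50-dimensional embedding size comes from the quoted setup.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: w is a list of n_out rows, each of length n_in."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def make_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def mlp_skip_forward(x, layers):
    """Forward pass where each hidden layer adds a residual (skip) connection;
    hidden layers keep the input width so the addition is well-defined."""
    h = x
    for w, b in layers[:-1]:
        out = [max(0.0, v) for v in linear(h, w, b)]   # ReLU activation
        h = [o + hi for o, hi in zip(out, h)]          # skip connection
    w, b = layers[-1]                                  # final layer, no skip
    return linear(h, w, b)

rng = random.Random(0)
dim = 50  # 50-dimensional embeddings per modality, as in the quoted setup
layers = [make_layer(dim, dim, rng) for _ in range(4)]
y = mlp_skip_forward([0.1] * dim, layers)
```

In practice this network would additionally be conditioned on the diffusion timestep t and the sample from the other modality, which this minimal sketch omits.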