Learning to Match Unpaired Data with Minimum Entropy Coupling

Authors: Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
"We empirically demonstrate that our method, DDMEC, is general and can be easily used to address challenging tasks, including unsupervised single-cell multi-omics data alignment and unpaired image translation, outperforming specialized methods." (Section 4, Experiments)
Researcher Affiliation: Collaboration
"¹Ampere Software Technology, France; ²Department of Data Science, EURECOM, France. Correspondence to: <EMAIL>."
Pseudocode: Yes

Algorithm 1 (DDMEC Training Loop)
  Input: θ, φ
  Initialize θ′ ← θ, φ′ ← φ
  repeat
    Call Algorithm 2 with y ∼ p_Y, θ, θ′, φ
    Call Algorithm 2 with x ∼ p_X, φ, φ′, θ
  until converged

Algorithm 2 (DDMEC Training Step)
  Input: y, θ, θ′, φ
  Sample x ∼ p_θ′(X | Y = y), t ∼ U[0, T], ε ∼ N(0, I)
  Update θ using Equations (10) and (11)
  Update φ using ∇_φ E_{y_t, t} ‖ε − ε_φ(y_t, x, t)‖²
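The alternating structure of the two algorithms can be sketched in plain Python. This is a toy illustration of the control flow only: the scalar "parameters", the quadratic surrogate loss, and the helper names (`training_step`, `surrogate_grad`) are assumptions, standing in for the actual diffusion denoisers and the losses of Equations (10) and (11).

```python
# Toy sketch of the DDMEC alternating training loop (Algorithms 1-2).
# Scalars stand in for network parameters; a quadratic surrogate loss
# stands in for the denoising objective. Illustrative only.

def surrogate_grad(param, target):
    # Gradient of 0.5 * (param - target)^2, a stand-in for the real loss gradient.
    return param - target

def training_step(param, frozen, lr=0.1, target=1.0):
    """One Algorithm-2-style step: update the online parameter using a
    'sample' produced by the frozen copy of the other model (toy version)."""
    pseudo_target = 0.5 * target + 0.5 * frozen  # stand-in for x ~ p_theta'(X|Y=y)
    return param - lr * surrogate_grad(param, pseudo_target)

def train(theta=0.0, phi=0.0, n_iters=200):
    theta_f, phi_f = theta, phi                 # initialize frozen copies theta', phi'
    for _ in range(n_iters):
        theta = training_step(theta, phi_f)     # Algorithm 2 with y ~ p_Y
        phi = training_step(phi, theta_f)       # Algorithm 2 with x ~ p_X
        theta_f, phi_f = theta, phi             # refresh frozen copies
    return theta, phi

theta, phi = train()
```

Under this surrogate the two parameters contract toward a shared fixed point, mirroring how each model is trained against samples from the frozen copy of the other.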
Open Source Code: Yes
"The source¹ is publicly available." ¹https://github.com/MustaphaBounoua/ddmec
Open Datasets: Yes
"We evaluate our method on single-cell multi-omics datasets: the peripheral blood mononuclear cells (PBMC) dataset and the bone marrow (BM) dataset."
"We use the AFHQ (Choi et al., 2020) dataset."
"Furthermore, we employ the CELEBA-HQ (Karras, 2017) dataset."
"In this experiment we use the SNARESEQ (Chen et al., 2019) dataset."
Dataset Splits: Yes
"For both datasets, we adopt the data preprocessing and evaluation pipeline described in (Singh et al., 2023), resulting in 50-dimensional embeddings per modality."
"We adopt the same experimental validation protocol as described by Zhao et al. (2022), where all images are resized to a resolution of 256×256."
"In this experiment we use the SNARESEQ (Chen et al., 2019) dataset, which links chromatin accessibility with gene expression data on a mixture of four cell types. We use the same preprocessing procedures detailed in (Demetci et al., 2022), which deal with filtering spurious data affected by technical errors, and normalization."
Hardware Specification: No
The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies: No
The paper mentions the use of an "Adam optimizer (Kingma, 2014)" and references various models (DDPM, DDIM sampler), but it does not specify version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup: Yes
"For each training step, we use a batch size of 256 and perform four gradient updates corresponding to line 2, followed by four updates for line 3. We use a simple MLP network with skip connections and use the Adam optimizer (Kingma, 2014) with a learning rate of 1×10⁻⁴. The KL divergence regularization weight is set to λ = 0.01 for PBMC and λ = 0.02 for BM." (Table 3: Hyperparameters used for training.)
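The paper only states that a "simple MLP network with skip connections" is used; the sketch below shows one plausible reading of that architecture in pure Python. The layer count, the same-width residual pattern, and all helper names (`linear`, `make_layer`, `mlp_skip_forward`) are assumptions, not details from the paper; only the 50-dimensional embedding size comes from the quoted setup.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: w is a list of n_out rows, each of length n_in."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def make_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def mlp_skip_forward(x, layers):
    """Forward pass where each hidden layer adds a residual (skip) connection;
    hidden layers keep the input width so the addition is well-defined."""
    h = x
    for w, b in layers[:-1]:
        out = [max(0.0, v) for v in linear(h, w, b)]   # ReLU activation
        h = [o + hi for o, hi in zip(out, h)]          # skip connection
    w, b = layers[-1]                                  # final layer, no skip
    return linear(h, w, b)

rng = random.Random(0)
dim = 50  # 50-dimensional embeddings per modality, as in the quoted setup
layers = [make_layer(dim, dim, rng) for _ in range(4)]
y = mlp_skip_forward([0.1] * dim, layers)
```

In practice this network would additionally be conditioned on the diffusion timestep t and the sample from the other modality, which this minimal sketch omits.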