Transport with Support: Data-Conditional Diffusion Bridges

Authors: Ella Tamir, Martin Trapp, Arno Solin

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We assess the effectiveness of our method on synthetic and real-world data generation tasks and we show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times."
Researcher Affiliation | Academia | "Ella Tamir EMAIL Department of Computer Science, Aalto University"
Pseudocode | Yes | "We present a high-level description of the ISB steps in Alg. 1." (Algorithm 1: The Iterative Smoothing Bridge)
Open Source Code | Yes | "A reference implementation of the ISB model can be found at https://github.com/AaltoML/iterative-smoothing-bridge."
Open Datasets | Yes | "We assess the effectiveness of our method on synthetic and real-world data generation tasks... 2D toy experiments from scikit-learn... by adapting data from Ambrosini et al. (2014) and Pellegrino et al. (2015), we propose a simplified data set for geese migration in Europe (OIBMD: ornithologically implausible bird migration data; available in the supplement)... We modify the diffusion generative process of the MNIST (LeCun et al., 1998) digit 8... Lastly, we evaluated our approach on an Embryoid body scRNA-seq time course (Tong et al., 2020)."
Dataset Splits | No | The paper uses its datasets as generative targets and as observations/constraints rather than in traditional train/validation/test splits. For example, for the single-cell embryoid RNA-seq data, the authors "used the first and last time ranges as the initial and terminal constraints. All other time ranges are considered observational data." This describes how the data is used, not conventional splits for model evaluation.
Hardware Specification | Yes | "All low-dimensional (at most d = 5) experiments were run on a MacBook Pro laptop CPU, whereas the image experiments used a single NVIDIA A100 GPU and ran for 5 h 10 min."
Software Dependencies | No | The paper thanks Adrien Corenflos for sharing an implementation of differentiable resampling in PyTorch, indicating that PyTorch was used. However, specific version numbers for PyTorch or any other software dependencies are not provided in the text.
Experiment Setup | Yes | "In all experiments, the forward and backward drift functions fθ and bφ are parametrized as neural networks... The latent state SDE was simulated by Euler-Maruyama with a fixed time-step of 0.01 over 100 steps and 1000 particles if not otherwise stated... All three experiments had the same discretization (t ∈ [0, 0.99], k = 0.01), learning rate 0.001, and differentiable resampling regularization parameter ε = 0.01. The process noise g(t)² follows a linear schedule from 0.001 to 1... and each iteration of the ISB method trains the forward and backward drift networks each for 5000 iterations, with batch size 256. Other hyperparameters specific to each experiment are provided in Appendix B."
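The simulation settings quoted above (Euler-Maruyama with a fixed step of 0.01 over 100 steps, 1000 particles, and a linear schedule for the process noise g(t)²) can be sketched as follows. This is a minimal stdlib-only illustration of that setup, not the paper's implementation: the mean-reverting drift `lambda x, t: -x` is a hypothetical stand-in for the learned drift networks fθ/bφ, and the scalar state is a simplification of the paper's higher-dimensional experiments.

```python
import math
import random

def g_squared(t, t_max=0.99, start=1e-3, end=1.0):
    """Linear schedule for the process noise g(t)^2, rising from 0.001 to 1
    over the time interval stated in the paper."""
    return start + (end - start) * (t / t_max)

def euler_maruyama(drift, x0, n_steps=100, dt=0.01, n_particles=1000, seed=0):
    """Simulate dX_t = drift(X_t, t) dt + g(t) dW_t for a scalar state.

    Matches the quoted setup: fixed time-step 0.01, 100 steps, 1000 particles.
    `drift` is a placeholder for a learned drift network.
    """
    rng = random.Random(seed)
    xs = [x0] * n_particles
    marginal_means = []
    for k in range(n_steps):
        t = k * dt
        g = math.sqrt(g_squared(t))
        # One Euler-Maruyama step per particle: drift term plus scaled Gaussian noise.
        xs = [x + drift(x, t) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
              for x in xs]
        marginal_means.append(sum(xs) / n_particles)
    return xs, marginal_means

# Example: a simple mean-reverting drift standing in for f_theta.
particles, means = euler_maruyama(lambda x, t: -x, x0=1.0)
```

The particle population here is what a smoothing or resampling step would operate on; the paper's differentiable resampling (regularization parameter ε = 0.01) is omitted from this sketch.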