Fast, Accurate Manifold Denoising by Tunneling Riemannian Optimization

Authors: Shiyu Wang, Mariam Avagyan, Yihan Shen, Arnaud Lamy, Tingran Wang, Szabolcs Marka, Zsuzsanna Marka, John Wright

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on scientific manifolds demonstrate significantly improved complexity-performance tradeoffs compared to nearest neighbor search, which underpins existing provable denoising approaches based on exhaustive search.
Researcher Affiliation Academia 1Department of Electrical Engineering, Columbia University; 2Data Science Institute, Columbia University; 3Department of Computer Science, Columbia University; 4Department of Physics; 5Institute of Advanced Studies (iASK), Chernel utca 14, Kőszeg, 9730, Hungary; 6Columbia Astrophysics Laboratory; 7Department of Applied Physics and Applied Mathematics, Columbia University. Correspondence to: Shiyu Wang <EMAIL>.
Pseudocode Yes Algorithm 1 Manifold Traversal; Algorithm 2 Online Learning for Manifold Traversal; Algorithm 3 101Traversal; Algorithm 4 IncrPCAonMatrix(X, d); Algorithm 5 IncrPCA(x_{i+1} ∈ M, U_i, Λ_i, i+1, d)
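The incremental PCA subroutine listed above (Algorithm 5) maintains a rank-d basis U_i and eigenvalues Λ_i as samples stream in. As a rough, framework-free illustration of how such an update typically works — not the paper's implementation; the function name and the zero-mean simplification are assumptions — a minimal sketch:

```python
import numpy as np

def incr_pca_update(U, lam, i, x, d):
    """One rank-d incremental PCA step on a new sample x.

    Assumes zero-mean data (a simplification, not from the paper).
    U   : (D, d) current orthonormal basis
    lam : (d,) current top-d eigenvalues of the sample covariance
    i   : number of samples seen so far
    x   : (D,) new sample
    """
    proj = U.T @ x            # coefficients of x in the current basis
    resid = x - U @ proj      # component of x orthogonal to the basis
    r = np.linalg.norm(resid)
    # Small (d+1) x (d+1) matrix representing the updated covariance
    # (i * C_i + x x^T) / (i + 1) in the extended basis [U, resid/r].
    Q = np.zeros((d + 1, d + 1))
    Q[:d, :d] = np.diag(i * lam) + np.outer(proj, proj)
    Q[:d, d] = proj * r
    Q[d, :d] = proj * r
    Q[d, d] = r ** 2
    Q /= (i + 1)
    w, V = np.linalg.eigh(Q)
    order = np.argsort(w)[::-1][:d]   # keep the top-d directions
    if r > 1e-12:
        e = (resid / r)[:, None]
    else:                             # x lies in the current subspace
        e = np.zeros((U.shape[0], 1))
    U_ext = np.hstack([U, e])
    return U_ext @ V[:, order], w[order]
```

The key point is that the eigendecomposition is done on a (d+1)×(d+1) matrix rather than the full D×D covariance, which is what makes the streaming variant cheap when d ≪ D.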
Open Source Code Yes The code for our framework and experiments is available at https://github.com/shiyu-w/Manifold_Traversal.
Open Datasets Yes We learn a denoiser on a dataset of 100,000 noisy gravitational waves (Abramovici et al., 1992; Aasi et al., 2015) using the online method as described in Algorithm 2. ...We evaluate our method on large-scale real-world image data by performing patch-level denoising. Specifically, we randomly select 300 RGB images from ImageNet... We conduct an additional experiment to denoise a single natural image from the DIV2K dataset (Agustsson & Timofte, 2017).
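The patch-level denoising setup above turns each image into a collection of flattened patches, so that every patch is one point in R^D. The patch size below is an assumption for illustration only (the excerpt does not state it), and the helper name is hypothetical:

```python
import numpy as np

def extract_patches(img, patch=8, stride=8):
    """Slice an (H, W) or (H, W, C) image into non-overlapping patches,
    flattening each patch into a row vector (one point in R^D)."""
    H, W = img.shape[:2]
    out = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out.append(img[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(out)

# e.g. a 64x64 RGB image yields 8*8 = 64 patches of dimension 8*8*3 = 192
```

With stride equal to the patch size the patches tile the image without overlap; a smaller stride would produce overlapping patches and a larger training set.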
Dataset Splits Yes The training set consists of 100,000 noisy waveforms, the test set contains 20,000 noisy waveforms. ...we use the first 890,000 patches to train our traversal network. ...After shuffling, we use the first 170,000 patches for training.
Hardware Specification No The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. It mentions the dimensionality of data but not computing resources.
Software Dependencies Yes We generate synthetic gravitational waveforms with the PyCBC package (Nitz et al., 2023) with masses drawn from a Gaussian distribution with mean 35 and variance 15.
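The mass-sampling step quoted above is straightforward to reproduce; the PyCBC waveform generation itself is omitted here to keep the sketch dependency-free. Note that a variance of 15 means a standard deviation of sqrt(15):

```python
import numpy as np

rng = np.random.default_rng(0)
# Component masses ~ N(mean=35, variance=15), as quoted from the paper.
# numpy's `scale` is a standard deviation, so pass sqrt(15), not 15.
masses = rng.normal(loc=35.0, scale=np.sqrt(15.0), size=100_000)
```

Each sampled mass would then be fed to a PyCBC waveform routine to produce one synthetic signal.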
Experiment Setup Yes All autoencoders are trained using the Adam optimizer with a learning rate of 1 × 10^-3. As we can see in Figure 13, high-complexity autoencoders can reach high accuracy. ...We simulate noise as i.i.d. Gaussian with standard deviation σ = 0.01... The parameter called the denoising radius R(i) in Algorithm 2 controls complexity by determining the number of landmarks created. ...Table 1: The choice of hyperparameters yielding each denoiser. N_i corresponds to the number of points assigned to a landmark q_i. For all experiments, σ = 0.01, d = 2, and D = 2048.
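The noise model in the quoted setup (i.i.d. Gaussian, σ = 0.01, ambient dimension D = 2048 per Table 1) can be sketched in a few lines; the function name is hypothetical and the autoencoder training loop is omitted to stay framework-agnostic:

```python
import numpy as np

def add_noise(x_clean, sigma=0.01, rng=None):
    """Add i.i.d. Gaussian observation noise; sigma = 0.01 as in the quoted setup."""
    rng = rng or np.random.default_rng()
    return x_clean + rng.normal(scale=sigma, size=x_clean.shape)

# D = 2048 is the ambient dimension reported in Table 1 of the paper
x_clean = np.zeros(2048)
y = add_noise(x_clean, sigma=0.01, rng=np.random.default_rng(1))
```

A denoiser is then trained to map noisy observations y back toward the clean signals x_clean, with the denoising radius R(i) trading off landmark count (complexity) against accuracy.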