Fast Min-$ε$ Segmented Regression using Constant-Time Segment Merging

Authors: Ansgar Lößer, Max Schlecht, Florian Schintke, Joel Witzke, Matthias Weidlich, Björn Scheuermann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficiency and effectiveness of our solution through a series of experiments (Sect. 6). For datasets exceeding 10^4 samples, our technique runs about two orders of magnitude faster than best current methods. At the same time, accuracy is greatly improved, reducing the MSE by up to three orders of magnitude in comparison, resulting in an error averaging just 3% above the optimal solution.
Researcher Affiliation | Academia | TU Darmstadt, Germany; Humboldt-Universität zu Berlin, Germany; Zuse Institute Berlin, Germany. Correspondence to: Ansgar Lößer <EMAIL>.
Pseudocode | No | The paper describes the greedy algorithm's phases (initial placement, segment reduction, placement optimization) in textual paragraphs within Section 4, but it does not present a formal pseudocode block or an algorithm figure.
Open Source Code | Yes | Details, measurement data, source code, experiment setup, our analysis pipeline, and an additional evaluation regarding parameter d are included in the supplemental material of this paper. The supplemental material is available at https://github.com/Loesgar/mvsr/tree/paper-icml-25.
Open Datasets | Yes | To evaluate our algorithm, we used time series data showing CPU usage during the 43-hour execution of the bwa program (Li, 2013) from the sarek workflow (Garcia et al., 2020).
Dataset Splits | No | The paper uses synthetic data and real-world time series data but does not specify explicit training/validation/test splits, percentages, or a methodology for partitioning the data for reproducibility.
Hardware Specification | Yes | All experiments were done on an AMD Ryzen Threadripper PRO 5955WX system with Ubuntu 24.04 LTS.
Software Dependencies | No | The dataset generation and measurement code use Python. The optimal dynamic program and heuristic were taken from Acharya et al. (2016). For exact control of data structures, we implemented our approach in C++.
Experiment Setup | Yes | Akin to Acharya et al. (2016), we generate piecewise continuous functions with k = 6 segments for evaluation at n points. Five random positions are chosen as segment start points between 0 and n. The first segment starts at 0, and the last ends at n. At each joint, we randomly select a y value between 0 and 1. Segments interpolate linearly from their start to end points. We take evenly spaced points from each function, add Gaussian noise using two fixed standard deviation (σ) values, and measure the time needed for a k-segmented regression. We calculate the MSE using the true values from the original function, not the noisy data.
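The "constant-time segment merging" named in the paper's title can be illustrated with a standard least-squares trick: if each segment keeps the sufficient statistics of its points, two adjacent segments can be merged, and the merged segment's best-fit line and error recomputed, in O(1) regardless of segment length. This is a minimal sketch of that idea only; the function names and the exact statistics kept per segment are assumptions for illustration, not the paper's implementation.

```python
def seg_stats(xs, ys):
    # Sufficient statistics (n, Sx, Sy, Sxx, Sxy, Syy) of one segment.
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)),
            sum(y * y for y in ys))

def merge(a, b):
    # Merging two adjacent segments is O(1): componentwise addition of statistics.
    return tuple(u + v for u, v in zip(a, b))

def fit(s):
    # Best-fit line y = m*x + c and its SSE, computed from the statistics alone.
    n, sx, sy, sxx, sxy, syy = s
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c, syy - m * sxy - c * sy
```

Because a merge never revisits individual samples, a greedy pass that repeatedly merges the pair of neighboring segments with the smallest error increase stays cheap even for long segments.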
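The synthetic-data setup described in the Experiment Setup row can be sketched in a few lines of Python. The function name, parameters, and seeding are assumptions made for this illustration; only the construction itself (k segments over [0, n], five random interior joints, joint y values in [0, 1), linear interpolation, Gaussian noise) follows the quoted description.

```python
import random

def generate_piecewise(n, k=6, sigma=0.1, seed=0):
    # Piecewise linear ground truth with k segments plus a noisy copy,
    # following the setup quoted above (names/seeding are illustrative).
    rng = random.Random(seed)
    # k - 1 distinct interior joints; the first segment starts at 0, the last ends at n.
    xs = [0] + sorted(rng.sample(range(1, n), k - 1)) + [n]
    ys = [rng.random() for _ in range(k + 1)]  # y value at every joint, in [0, 1)
    true_ys, noisy_ys = [], []
    for i in range(n):  # evenly spaced sample points 0 .. n-1
        j = max(idx for idx in range(k) if xs[idx] <= i)  # segment containing point i
        t = (i - xs[j]) / (xs[j + 1] - xs[j])             # interpolation weight
        y = (1 - t) * ys[j] + t * ys[j + 1]
        true_ys.append(y)
        noisy_ys.append(y + rng.gauss(0.0, sigma))
    return true_ys, noisy_ys
```

Keeping the noise-free values alongside the noisy samples matches the evaluation protocol: the MSE of a fitted k-segment regression is computed against `true_ys`, not against the noisy observations.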