Fast Min-$ε$ Segmented Regression using Constant-Time Segment Merging

Authors: Ansgar Lößer, Max Schlecht, Florian Schintke, Joel Witzke, Matthias Weidlich, Björn Scheuermann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficiency and effectiveness of our solution through a series of experiments (Sect. 6). For datasets exceeding 10^4 samples, our technique runs about two orders of magnitude faster than best current methods. At the same time, accuracy is greatly improved, reducing the MSE by up to three orders of magnitude in comparison, resulting in an error averaging just 3% above the optimal solution.
Researcher Affiliation | Academia | TU Darmstadt, Germany; Humboldt-Universität zu Berlin, Germany; Zuse Institute Berlin, Germany. Correspondence to: Ansgar Lößer <EMAIL>.
Pseudocode | No | The paper describes the greedy algorithm's phases (initial placement, segment reduction, placement optimization) in textual paragraphs within Section 4, but it does not present a formal pseudocode block or an algorithm figure.
Open Source Code | Yes | Details, measurement data, source code, experiment setup, our analysis pipeline, and an additional evaluation regarding parameter d are included in the supplemental material of this paper. The supplemental material is available at https://github.com/Loesgar/mvsr/tree/paper-icml-25.
Open Datasets | Yes | To evaluate our algorithm, we used time series data showing CPU usage during the 43-hour execution of the bwa program (Li, 2013) from the sarek workflow (Garcia et al., 2020).
Dataset Splits | No | The paper uses synthetic data and real-world time series data but does not specify explicit training/validation/test splits, percentages, or a methodology for partitioning the data for reproducibility.
Hardware Specification | Yes | All experiments were done on an AMD Ryzen Threadripper PRO 5955WX system with Ubuntu 24.04 LTS.
Software Dependencies | No | The dataset generation and measurement code use Python. The optimal dynamic program and heuristic were taken from Acharya et al. (2016). For exact control of data structures, we implemented our approach in C++.
Experiment Setup | Yes | Akin to Acharya et al. (2016), we generate piecewise continuous functions with k = 6 segments for evaluation at n points. Five random positions are chosen as segment start points between 0 and n. The first segment starts at 0, and the last ends at n. At each joint, we randomly select a y value between 0 and 1. Segments interpolate linearly from their start to end points. We take evenly spaced points from each function, add Gaussian noise using two fixed standard deviation (σ) values, and measure the time needed for a k-segmented regression. We calculate the MSE using the true values from the original function, not the noisy data.
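The "constant-time segment merging" named in the paper's title can be illustrated with a standard least-squares trick: if each segment keeps the sufficient statistics of its points, two adjacent segments can be merged, and the merged segment's best-fit line and error recomputed, in O(1) regardless of segment length. This is a minimal sketch of that idea only; the function names and the exact statistics kept per segment are assumptions for illustration, not the paper's implementation.

```python
def seg_stats(xs, ys):
    # Sufficient statistics (n, Sx, Sy, Sxx, Sxy, Syy) of one segment.
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)),
            sum(y * y for y in ys))

def merge(a, b):
    # Merging two adjacent segments is O(1): componentwise addition of statistics.
    return tuple(u + v for u, v in zip(a, b))

def fit(s):
    # Best-fit line y = m*x + c and its SSE, computed from the statistics alone.
    n, sx, sy, sxx, sxy, syy = s
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c, syy - m * sxy - c * sy
```

Because a merge never revisits individual samples, a greedy pass that repeatedly merges the pair of neighboring segments with the smallest error increase stays cheap even for long segments.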
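The synthetic-data setup described in the Experiment Setup row can be sketched in a few lines of Python. The function name, parameters, and seeding are assumptions made for this illustration; only the construction itself (k segments over [0, n], five random interior joints, joint y values in [0, 1), linear interpolation, Gaussian noise) follows the quoted description.

```python
import random

def generate_piecewise(n, k=6, sigma=0.1, seed=0):
    # Piecewise linear ground truth with k segments plus a noisy copy,
    # following the setup quoted above (names/seeding are illustrative).
    rng = random.Random(seed)
    # k - 1 distinct interior joints; the first segment starts at 0, the last ends at n.
    xs = [0] + sorted(rng.sample(range(1, n), k - 1)) + [n]
    ys = [rng.random() for _ in range(k + 1)]  # y value at every joint, in [0, 1)
    true_ys, noisy_ys = [], []
    for i in range(n):  # evenly spaced sample points 0 .. n-1
        j = max(idx for idx in range(k) if xs[idx] <= i)  # segment containing point i
        t = (i - xs[j]) / (xs[j + 1] - xs[j])             # interpolation weight
        y = (1 - t) * ys[j] + t * ys[j + 1]
        true_ys.append(y)
        noisy_ys.append(y + rng.gauss(0.0, sigma))
    return true_ys, noisy_ys
```

Keeping the noise-free values alongside the noisy samples matches the evaluation protocol: the MSE of a fitted k-segment regression is computed against `true_ys`, not against the noisy observations.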