Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
Authors: Maximilian Graf, Victor Thuot, Nicolas Verzelen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our theoretical results with numerical experiments on synthetic data. In the first experiment, we investigate the sample complexity of our algorithm in sparse regimes. We compare its performance to a uniform sampling baseline, where each of the n × d entries of the matrix M is sampled an equal number of times, and the resulting data is clustered using the K-means algorithm. In the second experiment, we evaluate the Bandit Clustering algorithm under fixed sparsity while simultaneously increasing the parameters n and d, and we compare different choices of the confidence parameter δ > 0. In a third experiment, we compare Bandit Clustering to Adaptive Clustering (Ariu et al., 2024) and see how a growing number of features impacts both algorithms' performances. We defer a fourth experiment to Appendix B, which illustrates the influence of θ on the budget of Bandit Clustering. |
| Researcher Affiliation | Academia | ¹Institut für Mathematik, Universität Potsdam, Potsdam, Germany. ²INRAE, Mistea, Institut Agro, Univ Montpellier, Montpellier, France. Correspondence to: Maximilian Graf <EMAIL>, Victor Thuot <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Compare Sequential Halving (CSH) Require: an item r0, a subset of items I ⊆ [n], a number of halving steps L, a budget T ≥ 2^(L+2) Ensure: a couple (i, j) ∈ I × [d]... Algorithm 2 Candidate Row (CR) Require: confidence parameter δ > 0, item r0 Ensure: row index r1 ∈ [n]... Algorithm 3 Cluster By Candidates (CBC) Require: confidence δ > 0, representative items r0, r1 ∈ [n] Ensure: labels ĝ ∈ {0, 1}^n... Algorithm 4 Bandit Clustering Require: confidence parameter δ > 0 Ensure: labels ĝ ∈ {0, 1}^n |
| Open Source Code | Yes | The code we used for the simulations is available in a GitHub repository: https://github.com/grafmaxi/bandit_two_clusters |
| Open Datasets | No | We support our theoretical results with numerical experiments on synthetic data. ...We construct the matrix M^s with half of its rows equal to 0 and the other half equal to µ_s, and sample observations as ν_{i,j} ∼ N(M^s_{i,j}, 1). |
| Dataset Splits | No | We construct the matrix M^s with half of its rows equal to 0 and the other half equal to µ_s... We construct a matrix M(n) with n/2 rows equal to 0 and n/2 rows equal to µ(n)... |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | As a benchmark, we compare our algorithm to a strategy that samples uniformly all entries of M s, and then applies the KMeans algorithm from the Scikit-learn library (Pedregosa et al., 2011). ...We then ran the Adaptive Clustering algorithm using the MATLAB code provided by (Ariu et al., 2024)... |
| Experiment Setup | Yes | In this experiment, we consider a small number of items (n = 20) and a large number of features (d = 1000). For s ∈ {1, . . . , d}, define the vector µ_s = (h_s, . . . , h_s, 0, . . . , 0), where h_s = 15/√s is repeated in the first s coordinates. ...For δ = 0.8, we run each of the algorithms CR(δ/2), CBC(δ/2, i∗) and Bandit Clustering(δ) a total of κ = 5000 times... We fix the number of items to n = 30 and vary the number of features as d_γ = 2^γ with γ = 1, . . . , 5. We set θ = 0.5, and for each d_γ, we define s_γ = d_γ/2 features j such that M_{i,j} = 0.25 for items in the first cluster and M_{i,j} = 0.75 for items in the second cluster, yielding h = 0.5. For the remaining features, we set M_{i,j} = 0.5, independently of the item's cluster. Observations are sampled from Bernoulli distributions: ν_{i,j} ∼ Bern(M_{i,j}). We apply Bandit Clustering with δ = 0.8 over κ = 500 runs. We then ran the Adaptive Clustering algorithm using the MATLAB code provided by Ariu et al. (2024), under the same setting and a fixed budget of T = 400,000, also with κ = 500 runs. |
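The Compare Sequential Halving routine (Algorithm 1) builds on the standard sequential-halving pattern for best-arm identification: sample every surviving candidate equally, discard the worse half, repeat. Below is a generic stdlib-only sketch of that pattern — not the paper's CSH, which compares feature-wise gaps between two rows; the function name and the even budget split across rounds are our assumptions.

```python
import math

def sequential_halving(candidates, pull, budget):
    """Generic sequential halving: repeatedly sample every surviving
    candidate equally, keep the better half, until one remains.
    `pull(c)` returns one noisy reward for candidate c."""
    alive = list(candidates)
    rounds = max(1, math.ceil(math.log2(len(alive))))
    per_round = budget // rounds
    while len(alive) > 1:
        t = max(1, per_round // len(alive))  # equal samples per survivor
        means = {c: sum(pull(c) for _ in range(t)) / t for c in alive}
        alive.sort(key=means.__getitem__, reverse=True)
        alive = alive[: max(1, len(alive) // 2)]  # drop the worse half
    return alive[0]

# Toy usage: candidate 3 has the highest mean reward.
best = sequential_halving(range(8), lambda c: 1.0 if c == 3 else 0.0, 800)
```

Halving the candidate set each round lets the per-candidate sample count roughly double as the field narrows, which is what yields the log-factor budget savings over uniform sampling.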
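The first experiment's matrix construction can be sketched with the standard library only (function names are ours; the authors' GitHub repository is the reference implementation). Note that h_s = 15/√s keeps the squared separation ‖µ_s‖² = s·h_s² = 225 constant across sparsity levels s, so only the sparsity of the signal varies.

```python
import math
import random

def build_signal_matrix(n, d, s, scale=15.0):
    """Build M^s: the first n/2 rows are 0, the rest equal mu_s,
    whose first s coordinates are h_s = scale / sqrt(s) and whose
    remaining d - s coordinates are 0."""
    h_s = scale / math.sqrt(s)
    mu_s = [h_s] * s + [0.0] * (d - s)
    zero = [0.0] * d
    return [zero[:] if i < n // 2 else mu_s[:] for i in range(n)]

def sample_entry(M, i, j, rng=random):
    """One noisy observation nu_{i,j} ~ N(M[i][j], 1)."""
    return rng.gauss(M[i][j], 1.0)

# Setting from the first experiment: n = 20 items, d = 1000 features.
M = build_signal_matrix(n=20, d=1000, s=10)
```

The uniform-sampling baseline in the paper samples every entry of M^s equally and clusters the empirical means with K-means (scikit-learn's `KMeans` in the authors' code).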
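The Bernoulli setup of the third experiment can be sketched the same way: s_γ = θ·d_γ informative features at means 0.25 / 0.75 depending on the cluster (gap h = 0.5), the remaining features at 0.5 for both clusters. Again a hypothetical stdlib-only sketch, not the authors' code.

```python
import random

def build_bernoulli_matrix(n, d, theta=0.5, h=0.5):
    """Mean matrix for the Bernoulli experiment: the first theta*d
    features separate the clusters (0.25 vs 0.75), the rest sit at 0.5."""
    s = int(theta * d)
    row_a = [0.5 - h / 2] * s + [0.5] * (d - s)  # first cluster
    row_b = [0.5 + h / 2] * s + [0.5] * (d - s)  # second cluster
    return [row_a[:] if i < n // 2 else row_b[:] for i in range(n)]

def sample_entry(M, i, j, rng=random):
    """One observation nu_{i,j} ~ Bern(M[i][j])."""
    return 1 if rng.random() < M[i][j] else 0

# Setting from the third experiment: n = 30 items, d_gamma = 2**gamma features.
M = build_bernoulli_matrix(n=30, d=32)
```

Because the uninformative features carry no signal at all, growing d_γ while holding θ fixed directly tests how each algorithm's budget scales with the number of irrelevant features.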