Optimal Clustering with Bandit Feedback
Authors: Junwen Yang, Zixin Zhong, Vincent Y. F. Tan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we study the empirical performance of our algorithm BOC and compare it with two baselines, namely Uniform and Oracle. Since our stopping rule is the only computationally tractable one available and the recommendation rule is embedded in either the sampling rule or the stopping rule, we focus on evaluating the efficacy of our sampling rule in terms of the time it takes for the algorithm(s) to stop. As we discussed in Remark 16, any sampling rule and recommendation rule can be combined with our stopping rule. Therefore, for the sake of fairness in comparison, Uniform and Oracle only differ from BOC in the sampling rules and retain the other frameworks. In particular, Uniform samples the M arms in a simple round-robin fashion while Oracle samples the arms based on the optimal oracle sampling rule λ∗ (i.e., the estimate λ∗(t) in Line 8 of Algorithm 1 is replaced by λ∗, which is calculated with the unknown true pair (c, U)). In each setting, the reported sample complexities of different methods are averaged over 256 independent trials and the corresponding standard deviations are also shown as error bars or directly in the table. Finally, we mention that the partitions we learn in all our experiments are always correct. |
| Researcher Affiliation | Academia | Junwen Yang EMAIL Institute of Operations Research and Analytics National University of Singapore 117602, Singapore Zixin Zhong EMAIL Thrust of Data Science and Analytics Hong Kong University of Science and Technology (Guangzhou) 511453, Guangzhou, Guangdong, China Vincent Y. F. Tan EMAIL Department of Mathematics Department of Electrical and Computer Engineering Institute of Operations Research and Analytics National University of Singapore 119076, Singapore |
| Pseudocode | Yes | Algorithm 1 Bandit Online Clustering (BOC). Input: number of clusters K, confidence level δ, and arm set [M]. 1: Sample each arm once, set t = M and initialize µ̂_m(t) and N_m(t) = 1 for all m ∈ [M]. 3: if min_{m∈[M]} N_m(t) ≤ max(√t − M/2, 0) then 4: sample A_{t+1} = arg min_{m∈[M]} N_m(t), and (c^t, U^t) ← (c^{t−1}, U^{t−1}) ▷ forced exploration. 6: else (c^t, U^t) ← K-means Maximin(K, {µ̂_m(t)}_{m∈[M]}, {N_m(t)}_{m∈[M]}) ▷ Algorithm 2. 7: Solve (Proposition 8) λ∗(t) = arg max_{λ∈Δ_M} inf_{(c′,U′)∈Alt(c^t)} Σ_{m=1}^{M} λ_m ‖µ_t(c^t_m) − µ′(c′_m)‖² (4). 8: Sample A_{t+1} = arg max_{m∈[M]} (t λ∗_m(t) − N_m(t)). 10: t ← t + 1, update µ̂_m(t) and N_m(t) for all m ∈ [M]. 11: Compute Z_1(t) = Σ_{m=1}^{M} N_m(t) ‖µ̂_m(t) − µ_{t−1}(c^{t−1}_m)‖² and solve (Proposition 5) Z_2(t) = min_{(c′,U′)∈Alt(c^{t−1})} Σ_{m=1}^{M} N_m(t) ‖µ_{t−1}(c^{t−1}_m) − µ′(c′_m)‖². 13: until Z(t) ≥ β(δ, t). Output: τ_δ = t and c_out = c^{t−1}. Algorithm 2 Weighted K-means with Maximin Initialization (K-means Maximin). Input: number of clusters K, empirical estimate µ̂_m and weight N_m for all m ∈ [M]. 1: Choose the empirical estimate of an arbitrary arm as the first cluster center µ̂^(1). 2: for k = 2 to K do ▷ maximin initialization. 3: Choose as the k-th center µ̂^(k) the empirical estimate of the arm with the greatest Euclidean distance to its nearest existing center: µ̂^(k) = arg max_{m∈[M]} min_{1≤k′≤k−1} ‖µ̂_m − µ̂^(k′)‖. 5: repeat ▷ weighted K-means. 6: Assign each arm to its closest cluster center: ĉ_m = arg min_{k∈[K]} ‖µ̂_m − µ̂^(k)‖. 7: Update each cluster center as the weighted mean of the empirical estimates of the arms in it: µ̂^(k) = (Σ_{m∈[M]} N_m µ̂_m 1{ĉ_m = k}) / (Σ_{m∈[M]} N_m 1{ĉ_m = k}). 8: until the clustering ĉ no longer changes. 9: Set µ_out^(k) = µ̂^(k) for all k ∈ [K]. Output: c_out = ĉ and U_out = [µ_out^(1), µ_out^(2), ..., µ_out^(K)] |
| Open Source Code | No | The paper does not contain an unambiguous statement about releasing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | To complement the experiments on synthetic data and to verify that BOC also excels in the non-asymptotic regime, we conduct experiments on the Iris and Yeast datasets (Dua and Graff, 2017), both of which are ubiquitous in offline clustering and classification tasks. In Appendix E, we consider a dataset with a much higher dimensionality d, namely, the MNIST dataset (LeCun et al., 1998). From Table 3 therein, we observe that the same conclusions apply. |
| Dataset Splits | No | The paper uses standard benchmark datasets (Iris, Yeast, MNIST) but does not explicitly describe how these datasets were split into training, validation, or test sets for their experiments, or if a specific split was used from a reference. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not mention any specific software or library versions used for implementation or experimentation (e.g., Python version, specific machine learning libraries like PyTorch or TensorFlow). |
| Experiment Setup | Yes | To study the asymptotic behavior of the expected sample complexities of different methods, we construct three synthetic instances with varying difficulty levels, where K = 4, M = 11 and d = 3. The partitions and the first three cluster centers of all three instances are the same, while their fourth cluster centers vary. In particular, the three instances can be expressed as follows: c = [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4], µ(1) = [0, 0, 0], µ(2) = [0, 10, 0], µ(3) = [0, 0, 10], and µ(4) = [5, 0, 0] for the easy instance, [1, 0, 0] for the moderate instance, and [0.5, 0, 0] for the challenging instance. Moreover, motivated by Garivier and Kaufmann (2016), we also consider using a heuristic threshold function β̃(δ, t) := log((1 + log(t))^d/δ), which is an approximation to the original threshold function β(δ, t) in Equation (8). To adapt these datasets to be amenable to online clustering tasks, we choose each cluster center to be the mean of the original data points of the arms in it, and then rescale the centers so that the hardness parameter D∗(c, U) is equal to 2 for both datasets. Since the performances of BOC and Oracle are similar (as observed in Section 6.1) and the heuristic threshold function β̃(δ, t) generally achieves lower sample complexities (compared to when the original β(δ, t) is used), we only present the results of the two methods (namely BOC and Uniform) with β̃(δ, t) on the two real datasets. |
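The recommendation rule quoted in the Pseudocode row (Algorithm 2) is concrete enough to sketch in code. The following is a minimal numpy rendering of weighted K-means with maximin initialization, not the authors' implementation; the function name and the convergence cap `max_iter` are our own choices.

```python
import numpy as np

def kmeans_maximin(mu_hat, weights, K, max_iter=100):
    """Weighted K-means with maximin initialization (sketch of Algorithm 2).

    mu_hat  : (M, d) array of empirical arm means
    weights : (M,) array of sample counts N_m used as weights
    Returns the partition (labels) and the K cluster centers.
    """
    # Maximin initialization: the first center is an arbitrary arm's estimate;
    # each subsequent center is the arm farthest from its nearest chosen center.
    centers = [mu_hat[0]]
    for _ in range(K - 1):
        dist_to_nearest = np.min(
            np.linalg.norm(mu_hat[:, None, :] - np.asarray(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(mu_hat[np.argmax(dist_to_nearest)])
    centers = np.asarray(centers, dtype=float)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each arm goes to its closest cluster center.
        new_labels = np.argmin(
            np.linalg.norm(mu_hat[:, None, :] - centers[None, :, :], axis=2), axis=1
        )
        if labels is not None and np.array_equal(new_labels, labels):
            break  # clustering no longer changes
        labels = new_labels
        # Update step: each center becomes the weighted mean of its arms,
        # with weights N_m as in line 7 of Algorithm 2.
        for k in range(K):
            mask = labels == k
            if mask.any():
                centers[k] = np.average(mu_hat[mask], axis=0, weights=weights[mask])
    return labels, centers
```

Weighting the update by N_m means heavily sampled arms (whose estimates are more reliable) pull the centers harder, which matches the algorithm's use of the counts as confidence proxies.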
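The synthetic setup in the Experiment Setup row is fully specified, so it can be reproduced in a few lines. In this sketch, `make_instance` and `heuristic_threshold` are our names, and the exponent d in the heuristic threshold is our reading of the flattened formula; it should be checked against Equation (8) of the paper.

```python
import numpy as np

# Partition of the M = 11 arms into K = 4 clusters (1-indexed as in the paper).
c = np.array([1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4])

def make_instance(level):
    """Return (arm_means, U) for one of the three difficulty levels.

    Only the fourth cluster center differs across the instances.
    """
    fourth_x = {"easy": 5.0, "moderate": 1.0, "challenging": 0.5}[level]
    U = np.array([
        [0.0, 0.0, 0.0],       # mu(1)
        [0.0, 10.0, 0.0],      # mu(2)
        [0.0, 0.0, 10.0],      # mu(3)
        [fourth_x, 0.0, 0.0],  # mu(4): varies with difficulty
    ])
    return U[c - 1], U  # (11, 3) true arm means, (4, 3) cluster centers

def heuristic_threshold(delta, t, d):
    # Heuristic stopping threshold in the style of Garivier and Kaufmann (2016).
    # NOTE: the exponent d is an assumption recovered from the extracted text.
    return np.log((1.0 + np.log(t)) ** d / delta)
```

Shrinking the fourth center toward the origin (5 → 1 → 0.5 on the first coordinate) brings cluster 4 closer to cluster 1, which is what makes the moderate and challenging instances harder to separate.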