Optimal Clustering with Bandit Feedback
Authors: Junwen Yang, Zixin Zhong, Vincent Y. F. Tan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we study the empirical performance of our algorithm BOC and compare it with two baselines, namely Uniform and Oracle. Since our stopping rule is the only computationally tractable one available and the recommendation rule is embedded in either the sampling rule or the stopping rule, we focus on evaluating the efficacy of our sampling rule in terms of the time it takes for the algorithm(s) to stop. As we discussed in Remark 16, any sampling rule and recommendation rule can be combined with our stopping rule. Therefore, for the sake of fairness in comparison, Uniform and Oracle only differ from BOC in the sampling rules and retain the other frameworks. In particular, Uniform samples the M arms in a simple round-robin fashion while Oracle samples the arms based on the optimal oracle sampling rule λ∗ (i.e., the estimate λ∗(t) in Line 8 of Algorithm 1 is replaced by λ∗, which is calculated with the unknown true pair (c, U)). In each setting, the reported sample complexities of different methods are averaged over 256 independent trials and the corresponding standard deviations are also shown as error bars or directly in the table. Finally, we mention that the partitions we learn in all our experiments are always correct. |
| Researcher Affiliation | Academia | Junwen Yang EMAIL Institute of Operations Research and Analytics National University of Singapore 117602, Singapore Zixin Zhong EMAIL Thrust of Data Science and Analytics Hong Kong University of Science and Technology (Guangzhou) 511453, Guangzhou, Guangdong, China Vincent Y. F. Tan EMAIL Department of Mathematics Department of Electrical and Computer Engineering Institute of Operations Research and Analytics National University of Singapore 119076, Singapore |
| Pseudocode | Yes | Algorithm 1 Bandit Online Clustering (BOC). Input: number of clusters K, confidence level δ, and arm set [M]. 1: Sample each arm once, set t = M and initialize µ̂_m(t) and N_m(t) = 1 for all m ∈ [M]. 3: if min_{m∈[M]} N_m(t) ≤ max(√t − M/2, 0) then 4: sample A_{t+1} = arg min_{m∈[M]} N_m(t), and (c^t, U^t) ← (c^{t−1}, U^{t−1}) ▷ forced exploration. 6: else (c^t, U^t) ← K-means Maximin(K, {µ̂_m(t)}_{m∈[M]}, {N_m(t)}_{m∈[M]}) ▷ Algorithm 2. 7: Solve (Proposition 8) λ∗(t) = arg max_{λ∈Δ_M} inf_{(c′,U′)∈Alt(c^t)} Σ_{m=1}^{M} λ_m ‖µ_t(c^t_m) − µ′(c′_m)‖² (4). 8: Sample A_{t+1} = arg max_{m∈[M]} (t λ∗_m(t) − N_m(t)). 10: t ← t + 1, update µ̂_m(t) and N_m(t) for all m ∈ [M]. 11: Compute Z_1(t) = Σ_{m=1}^{M} N_m(t) ‖µ̂_m(t) − µ_{t−1}(c^{t−1}_m)‖² and solve (Proposition 5) Z_2(t) = min_{(c′,U′)∈Alt(c^{t−1})} Σ_{m=1}^{M} N_m(t) ‖µ_{t−1}(c^{t−1}_m) − µ′(c′_m)‖². 13: until Z(t) ≥ β(δ, t). Output: τ_δ = t and c_out = c^{t−1}. Algorithm 2 Weighted K-means with Maximin Initialization (K-means Maximin). Input: number of clusters K, empirical estimate µ̂_m and weight N_m for all m ∈ [M]. 1: Choose the empirical estimate of an arbitrary arm as the first cluster center µ̂^(1). 2: for k = 2 to K do ▷ maximin initialization. 3: Choose as the k-th center µ̂^(k) the empirical estimate of the arm with the greatest Euclidean distance to its nearest existing center: µ̂^(k) = arg max_{m∈[M]} min_{1≤k′≤k−1} ‖µ̂_m − µ̂^(k′)‖. 5: repeat ▷ weighted K-means. 6: Assign each arm to its closest cluster center: ĉ_m = arg min_{k∈[K]} ‖µ̂_m − µ̂^(k)‖. 7: Update each cluster center as the weighted mean of the empirical estimates of the arms in it: µ̂^(k) = (Σ_{m∈[M]} N_m µ̂_m 1{ĉ_m = k}) / (Σ_{m∈[M]} N_m 1{ĉ_m = k}). 8: until the clustering ĉ no longer changes. 9: Set µ_out^(k) = µ̂^(k) for all k ∈ [K]. Output: c_out = ĉ and U_out = [µ_out^(1), µ_out^(2), ..., µ_out^(K)] |
| Open Source Code | No | The paper does not contain an unambiguous statement about releasing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | To complement the experiments on synthetic data and to verify that BOC also excels in the non-asymptotic regime, we conduct experiments on the Iris and Yeast datasets (Dua and Graff, 2017), both of which are ubiquitous in offline clustering and classification tasks. In Appendix E, we consider a dataset with a much higher dimensionality d, namely, the MNIST dataset (LeCun et al., 1998). From Table 3 therein, we observe that the same conclusions apply. |
| Dataset Splits | No | The paper uses standard benchmark datasets (Iris, Yeast, MNIST) but does not explicitly describe how these datasets were split into training, validation, or test sets for their experiments, or if a specific split was used from a reference. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not mention any specific software or library versions used for implementation or experimentation (e.g., Python version, specific machine learning libraries like PyTorch or TensorFlow). |
| Experiment Setup | Yes | To study the asymptotic behavior of the expected sample complexities of different methods, we construct three synthetic instances with varying difficulty levels, where K = 4, M = 11 and d = 3. The partitions and the first three cluster centers of all three instances are the same, while their fourth cluster centers vary. In particular, the three instances can be expressed as follows: c = [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4], µ(1) = [0, 0, 0], µ(2) = [0, 10, 0], µ(3) = [0, 0, 10], and µ(4) = [5, 0, 0] for the easy instance, [1, 0, 0] for the moderate instance, and [0.5, 0, 0] for the challenging instance. Moreover, motivated by Garivier and Kaufmann (2016), we also consider using a heuristic threshold function β̃(δ, t) := log((1 + log(t))^d/δ), which is an approximation to the original threshold function β(δ, t) in Equation (8). To adapt these datasets to be amenable to online clustering tasks, we choose each cluster center to be the mean of the original data points of the arms in it, and then rescale the centers so that the hardness parameter D∗(c, U) is equal to 2 for both datasets. Since the performances of BOC and Oracle are similar (as observed in Section 6.1) and the heuristic threshold function β̃(δ, t) generally achieves lower sample complexities (compared to when the original β(δ, t) is used), we only present the results of the two methods (namely BOC and Uniform) with β̃(δ, t) on the two real datasets. |
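The recommendation rule quoted in the Pseudocode row (Algorithm 2) is concrete enough to sketch in code. The following is a minimal numpy rendering of weighted K-means with maximin initialization, not the authors' implementation; the function name and the convergence cap `max_iter` are our own choices.

```python
import numpy as np

def kmeans_maximin(mu_hat, weights, K, max_iter=100):
    """Weighted K-means with maximin initialization (sketch of Algorithm 2).

    mu_hat  : (M, d) array of empirical arm means
    weights : (M,) array of sample counts N_m used as weights
    Returns the partition (labels) and the K cluster centers.
    """
    # Maximin initialization: the first center is an arbitrary arm's estimate;
    # each subsequent center is the arm farthest from its nearest chosen center.
    centers = [mu_hat[0]]
    for _ in range(K - 1):
        dist_to_nearest = np.min(
            np.linalg.norm(mu_hat[:, None, :] - np.asarray(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(mu_hat[np.argmax(dist_to_nearest)])
    centers = np.asarray(centers, dtype=float)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each arm goes to its closest cluster center.
        new_labels = np.argmin(
            np.linalg.norm(mu_hat[:, None, :] - centers[None, :, :], axis=2), axis=1
        )
        if labels is not None and np.array_equal(new_labels, labels):
            break  # clustering no longer changes
        labels = new_labels
        # Update step: each center becomes the weighted mean of its arms,
        # with weights N_m as in line 7 of Algorithm 2.
        for k in range(K):
            mask = labels == k
            if mask.any():
                centers[k] = np.average(mu_hat[mask], axis=0, weights=weights[mask])
    return labels, centers
```

Weighting the update by N_m means heavily sampled arms (whose estimates are more reliable) pull the centers harder, which matches the algorithm's use of the counts as confidence proxies.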
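The synthetic setup in the Experiment Setup row is fully specified, so it can be reproduced in a few lines. In this sketch, `make_instance` and `heuristic_threshold` are our names, and the exponent d in the heuristic threshold is our reading of the flattened formula; it should be checked against Equation (8) of the paper.

```python
import numpy as np

# Partition of the M = 11 arms into K = 4 clusters (1-indexed as in the paper).
c = np.array([1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4])

def make_instance(level):
    """Return (arm_means, U) for one of the three difficulty levels.

    Only the fourth cluster center differs across the instances.
    """
    fourth_x = {"easy": 5.0, "moderate": 1.0, "challenging": 0.5}[level]
    U = np.array([
        [0.0, 0.0, 0.0],       # mu(1)
        [0.0, 10.0, 0.0],      # mu(2)
        [0.0, 0.0, 10.0],      # mu(3)
        [fourth_x, 0.0, 0.0],  # mu(4): varies with difficulty
    ])
    return U[c - 1], U  # (11, 3) true arm means, (4, 3) cluster centers

def heuristic_threshold(delta, t, d):
    # Heuristic stopping threshold in the style of Garivier and Kaufmann (2016).
    # NOTE: the exponent d is an assumption recovered from the extracted text.
    return np.log((1.0 + np.log(t)) ** d / delta)
```

Shrinking the fourth center toward the origin (5 → 1 → 0.5 on the first coordinate) brings cluster 4 closer to cluster 1, which is what makes the moderate and challenging instances harder to separate.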