Learning-Augmented Hierarchical Clustering

Authors: Vladimir Braverman, Jon C. Ergun, Chen Wang, Samson Zhou

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our main results are algorithms that given learning-augmented oracles, compute efficient approximate HC trees for the celebrated Dasgupta s and Moseley-Wang objectives that overcome known hardness barriers. ... This paper presents work that advances the Machine Learning and Algorithm Design fields. Due to its theoretical nature, we do not see any immediate societal impacts
Researcher Affiliation Collaboration 1Johns Hopkins University 2Rice University 3Google Research 4Carnegie Mellon University 5Texas A&M University. Correspondence to: Vladimir Braverman <EMAIL>, Jon C. Ergun <EMAIL>, Chen Wang <EMAIL>, Samson Zhou <EMAIL>.
Pseudocode Yes Algorithm 1 A polynomial-time algorithm for the Dasgupta s HC objective ... Algorithm 2 An O(n3) time algorithm for the Dasgupta s HC objective ... Algorithm 3 A near-linear time algorithm for the Moseley Wang HC objective ... Algorithm 4 A polynomial-time single-pass semi-streaming algorithm for the Dasgupta s HC objective ... Algorithm 5 A near-linear work, poly-logarithmic depth PRAM algorithm for the Moseley-Wang HC objective ... Algorithm 6 counterpart-tester-strong ... Algorithm 7 strong-partial-tree O(e V ) ... Algorithm 8 weak-partial-tree O(e V , VH) ... Algorithm 9 counterpart-tester(O,VH) ... Algorithm 10 predecessor-tester(O,VH) ... Algorithm 11 test-orphan-predecessor(O,VH,e V ) ... Algorithm 12 vertex-split O(e V , VH) ... Algorithm 13 tree-merge O(TT , Te V \T , u ).
Open Source Code No The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide any links to a code repository.
Open Datasets No The paper discusses hierarchical clustering with 'n data points' and refers to 'datasets' in a general, theoretical context (e.g., 'datasets often have additional auxiliary information'). However, it does not describe using any specific, named datasets for experiments or provide concrete access information for any dataset.
Dataset Splits No The paper is theoretical in nature and does not conduct empirical studies using specific datasets. Therefore, no information on training/test/validation dataset splits is provided.
Hardware Specification No The paper is theoretical and focuses on algorithm design and approximation guarantees. It does not describe any experimental hardware used.
Software Dependencies No The paper is theoretical and focuses on algorithm design. It does not mention any specific software dependencies or versions used for implementation or experimentation.
Experiment Setup No The paper is theoretical and presents algorithms and their approximation guarantees. It does not include an experimental setup with hyperparameters or system-level training settings.