Learning-Augmented Frequent Directions
Authors: Anders Aamand, Justin Chen, Siddharth Gollapudi, Sandeep Silwal, Hao Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally verify the performance of our learning-augmented algorithms on real data. Following prior work, we consider datasets containing numerous problem instances in a temporal order (each instance being either a sequence of items for frequency estimation or a sequence of matrices for matrix streaming). Using predictions trained on past data, we demonstrate the power of incorporating learned structure in our algorithms, achieving state-of-the-art performance in both settings. |
| Researcher Affiliation | Academia | Anders Aamand, University of Copenhagen; Justin Y. Chen, MIT; Siddharth Gollapudi, Independent; Sandeep Silwal, UW-Madison; Hao Wu, University of Waterloo |
| Pseudocode | Yes | Algorithm 1: Frequent Directions (AFD); Algorithm 2: Learning-Augmented Frequent Directions (ALFD); Algorithm 3: Robust Learning-Based Frequent Directions (ARLFD) |
| Open Source Code | No | "Both implementations are based on an existing implementation of Frequent Directions" (https://github.com/edoliberty/frequent-directions). The paper refers to an existing, third-party implementation that was used as a base, but does not explicitly state that the authors' own code for the described methodology is open-sourced. |
| Open Datasets | Yes | We use datasets from Indyk et al. (2019) and Indyk et al. (2021), prior works on learning-based low rank approximation not in the streaming setting. The Hyper dataset (Imamoglu et al., 2018) contains a sequence of hyperspectral images of natural scenes. We test our algorithm and baselines on the CAIDA (CAIDA, 2016) and AOL (Pass et al., 2006) datasets used in prior work (Hsu et al., 2019; Aamand et al., 2023). |
| Dataset Splits | No | The paper discusses evaluating performance on sequences of streams or matrices over time, but does not provide traditional training/test/validation splits with percentages or sample counts. For example: "The CAIDA dataset contains 50 minutes of internet traffic data, with a stream corresponding to the IP addresses associated with packets passing through an ISP over a minute of data." and "The AOL dataset contains 80 days of internet search query data with each stream corresponding to a day." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. It only mentions funding sources for the authors. |
| Software Dependencies | No | The paper mentions various algorithms like Count Min, Count Sketch, and Misra-Gries, and states that recurrent neural networks were used as predictors, but it does not specify any software libraries or packages with version numbers used for implementation. |
| Experiment Setup | Yes | For both Frequent Directions and our learned variant, the space used by the streaming algorithm is twice the rank: this is a choice made in the Frequent Directions implementation to avoid running SVD on every insertion and thus improve the update time. We use half of the space for the learned projection component and half for the orthogonal component in our algorithm. For both datasets, we compare the learning-augmented algorithms by plotting the tradeoff between median error and space as well as error across the sequence of streams for a fixed space of 750 (see Figure 2). Randomized algorithms are averaged across 10 trials and one standard deviation is shaded. |
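For context on the base algorithm the rows above reference: a minimal NumPy sketch of plain Frequent Directions (Liberty's algorithm), using the same doubled-space trick the Experiment Setup row describes (allocating twice the sketch rank so that an SVD is not run on every insertion). This is an illustrative reconstruction, not the authors' code; the function name `frequent_directions` and parameter `ell` are our own, and the learned projection/orthogonal split of the paper's variant is not shown.

```python
import numpy as np

def frequent_directions(rows, ell):
    """Maintain a 2*ell x d sketch B of a row stream A so that
    B^T B approximates A^T A. Doubling the space to 2*ell means the
    SVD "shrink" step runs only once per ell-ish insertions, which is
    the update-time optimization mentioned in the experiment setup.
    Assumes d >= 2*ell for simplicity."""
    d = rows.shape[1]
    B = np.zeros((2 * ell, d))
    next_zero = 0  # index of the first all-zero row in B
    for row in rows:
        if next_zero == 2 * ell:
            # Sketch is full: shrink by subtracting the ell-th squared
            # singular value from all squared singular values.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
            B = np.diag(np.sqrt(s2)) @ Vt
            next_zero = ell - 1  # rows ell-1 .. 2*ell-1 are now zero
        B[next_zero] = row
        next_zero += 1
    return B
```

With sketch parameter `ell`, this variant guarantees the covariance error bound `||A^T A - B^T B||_2 <= ||A||_F^2 / ell`, which is the quantity the paper's learned variant improves on by spending half the rows on a learned projection.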