Variance-Reducing Couplings for Random Features
Authors: Isaac Reid, Stratis Markou, Krzysztof Choromanski, Richard E Turner, Adrian Weller
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our algorithms on UCI datasets and real-world graphs, verifying that OT couplings substantially reduce kernel estimator variance (Secs 3 and 4). |
| Researcher Affiliation | Collaboration | University of Cambridge, Google DeepMind, Columbia University, Alan Turing Institute |
| Pseudocode | No | The paper describes methods and mathematical formulations in prose and equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/cambridge-mlg/learnable-qmc. |
| Open Datasets | Yes | We test our algorithms on UCI datasets and real-world graphs... Table 1: Performance of RFFs and RLFs on kernel estimation with UCI datasets... Fig. 1 demonstrates this on a train-test split of the POWER dataset... Performers trained on ImageNet (Deng et al., 2009)... Fig. 3 shows the results for Cora (N = 2708)... We use mesh graphs made available by Dawson-Haggerty (2023)... traffic flow dataset of the highways of San Jose, California, curated by Borovitskiy et al. (2021) using data from Chen et al. (2001) and OpenStreetMap. |
| Dataset Splits | Yes | For each dataset, we conduct cross validation with 20 splits, splitting each dataset into a training and a test set... We set them to a maximum of 256 points each by sub-sampling at random without replacement. After training the GP, we evaluate the metrics on the test set, and repeat this procedure for all 20 splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimiser (Kingma and Ba, 2014)' but does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, or specific library versions). |
| Experiment Setup | Yes | We train the exact GP using the Adam optimiser (Kingma and Ba, 2014), using a learning rate of 10^-2. The exact GP optimisation stage converges around 1000 steps, and we run it up to 5000 steps. We use a transformer with 12 layers and 12 heads, with hidden size 768 and MLP dimension 3072. We take 16x16 patches, and train with the Adam optimiser for 90 epochs with a compound learning rate (10^4 steps linear warmup, constant, then cosine decay, with base LR 3e-3 and final LR 1e-5). The batch size is 4096. |
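The compound learning-rate schedule quoted in the Experiment Setup row (10^4 steps of linear warmup, a constant phase, then cosine decay from a base LR of 3e-3 to a final LR of 1e-5) can be sketched as below. Note this is an illustrative reconstruction, not the authors' code: the length of the constant phase and the total step count are not stated in the excerpt, so `constant_steps` and `total_steps` here are hypothetical placeholders.

```python
import math


def compound_lr(step, total_steps, warmup_steps=10_000,
                constant_steps=10_000, base_lr=3e-3, final_lr=1e-5):
    """Warmup -> constant -> cosine decay schedule, as described in the paper.

    `constant_steps` and `total_steps` are assumed values for illustration.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / warmup_steps
    if step < warmup_steps + constant_steps:
        # Constant phase at the base learning rate.
        return base_lr
    # Cosine decay from base_lr down to final_lr over the remaining steps.
    decay_steps = max(total_steps - warmup_steps - constant_steps, 1)
    progress = min((step - warmup_steps - constant_steps) / decay_steps, 1.0)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

For example, with `total_steps=100_000` the schedule reaches 3e-3 at the end of warmup and decays smoothly to 1e-5 by the final step.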