Approximately Equivariant Neural Processes
Authors: Matthew Ashman, Cristiana Diaconu, Adrian Weller, Wessel Bruinsma, Richard Turner
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, showing that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts. |
| Researcher Affiliation | Collaboration | Matthew Ashman (University of Cambridge); Cristiana Diaconu (University of Cambridge); Adrian Weller (University of Cambridge; The Alan Turing Institute); Wessel Bruinsma (Microsoft Research AI for Science); Richard E. Turner (University of Cambridge; Microsoft Research AI for Science; The Alan Turing Institute) |
| Pseudocode | Yes | Algorithm 1: Forward pass through the ConvCNP (T) for off-the-grid data. |
| Open Source Code | Yes | An implementation of our models can be found at cambridge-mlg/aenp. |
| Open Datasets | Yes | We consider a synthetic 1-D regression task with datasets drawn from a Gaussian process (GP) with the Gibbs kernel [Gibbs, 1998]. A real-world dataset is derived from ERA5 [Copernicus Climate Change Service, 2020], consisting of surface air temperatures for the years 2018 and 2019. |
| Dataset Splits | Yes | For each task, we sample the number of context points Nc ∼ U{1, 64} and set the number of target points to Nt = 128. The context range [xc,min, xc,max] (from which the context points are uniformly sampled) is an interval of length 4, with its centre randomly sampled according to U[−7, 7] for the ID task, and according to U[13, 27] for the OOD task. The target range is [xt,min, xt,max] = [xc,min − 1, xc,max + 1]. This is also applicable during testing, with the test dataset consisting of 80,000 datasets. |
| Hardware Specification | Yes | We train and evaluate all models on a single 11 GB NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'AdamW [Loshchilov and Hutter, 2017]' as the optimizer but does not specify versions for core software libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For all models, we optimise the model parameters using AdamW [Loshchilov and Hutter, 2017] with a learning rate of 5 × 10⁻⁴ and batch size of 16. Gradient value magnitudes are clipped at 0.5. We train for a maximum of 500 epochs, with each epoch consisting of 16,000 datasets (1,000 iterations per epoch at batch size 16). |
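The task-sampling recipe quoted under "Dataset Splits" can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name `sample_task` and the NumPy API choices are assumptions, and only the context/target *inputs* are sampled here (function values would come from a GP with the Gibbs kernel).

```python
import numpy as np

def sample_task(rng, ood=False):
    """Sketch of the 1-D task sampling described in the paper (illustrative names).

    Nc ~ U{1, ..., 64}; Nt = 128; the context interval has length 4 with its
    centre drawn from U[-7, 7] (ID) or U[13, 27] (OOD); the target range
    extends the context range by 1 on each side.
    """
    n_context = int(rng.integers(1, 65))        # Nc ~ U{1, ..., 64} (inclusive)
    n_target = 128                              # Nt fixed at 128
    centre = rng.uniform(13, 27) if ood else rng.uniform(-7, 7)
    xc_min, xc_max = centre - 2, centre + 2     # context interval of length 4
    xt_min, xt_max = xc_min - 1, xc_max + 1     # target range = context +/- 1
    x_context = rng.uniform(xc_min, xc_max, size=n_context)
    x_target = rng.uniform(xt_min, xt_max, size=n_target)
    return x_context, x_target

rng = np.random.default_rng(0)
x_context, x_target = sample_task(rng)
```

Under this reading, ID context points always lie in [−9, 9] and ID target points in [−10, 10], which matches the interval arithmetic in the quoted split description.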