FLUID: A Unified Evaluation Framework for Flexible Sequential Data
Authors: Matthew Wallingford, Aditya Kusupati, Keivan Alizadeh-Vahid, Aaron Walsman, Aniruddha Kembhavi, Ali Farhadi
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a broad set of methods which shed new insight on the advantages and limitations of current techniques and indicate new research problems to solve. As a starting point towards more general methods, we present two new baselines which outperform other evaluated methods on Fluid. |
| Researcher Affiliation | Collaboration | Matthew Wallingford EMAIL University of Washington Aditya Kusupati EMAIL University of Washington Keivan Alizadeh EMAIL University of Washington Aaron Walsman EMAIL University of Washington Aniruddha Kembhavi EMAIL Allen Institute for Artificial Intelligence Ali Farhadi EMAIL University of Washington |
| Pseudocode | Yes | Algorithm 1: FLUID Procedure. Input: task T; ML system: (pretrained) model f, update strategy S. Output: evaluations E, operation counter C. 1: function Fluid(T, (f, S)) 2: Evaluations E = [ ] 3: Datapoints D = [ ] 4: Operation Counter C = 0. |
| Open Source Code | No | The framework, data and models will be open-sourced. |
| Open Datasets | Yes | Data: In this paper, we evaluate methods with FLUID using a subset of ImageNet-22K (Deng et al., 2009). Traditionally, few-shot learning used datasets like Omniglot (Lake et al., 2011) & MiniImageNet (Vinyals et al., 2016) and continual learning focused on MNIST (LeCun, 1998) & CIFAR (Krizhevsky et al., 2009). ... We evaluate on the ImageNet-22K dataset to present new challenges to existing models. Recently, the iNaturalist (Van Horn et al., 2018; Wertheimer & Hariharan, 2019) and LVIS (Gupta et al., 2019) datasets have advocated for heavy-tailed distributions. |
| Dataset Splits | Yes | The dataset consists of a pretraining dataset and 5 different sequences of images for streaming (3 test and 2 validation sequences). For pretraining we use the standard ImageNet-1K (Russakovsky et al., 2015). ... Each test sequence contains images from 1000 different classes, 750 of which do not appear in ImageNet-1K. ... Each sequence contains 90,000 samples, where head classes contain > 50 and tail classes contain ≤ 50 samples. ... More comprehensive statistics on the data and sequences are in Appendix B. Table 4: Statistics for the sequences of images used in FLUID. Sequences 1-2 are for validation and Sequences 3-5 are for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or processor types used for running its experiments. It mentions 'ResNet18 and ResNet50 models', but these refer to model architectures, not hardware. |
| Software Dependencies | No | We use the PyTorch (Paszke et al., 2019) ResNet18 and ResNet50 models pretrained on supervised ImageNet-1K. |
| Experiment Setup | Yes | For all experiments (Table 3) that require offline training (fine-tuning, Weight Imprinting, standard training, ET and LwF), except OLTR, we train each model for 4 epochs every 5,000 samples observed. ... Fine-tuning experiments use a learning rate of 0.1 and standard training uses 0.01 for supervised pretraining. For MoCo pretraining, fine-tuning uses a learning rate of 30 and standard training uses 0.01. All the experiments use the SGD+Momentum optimizer with a 0.9 momentum. For Prototypical Networks and MAML we meta-train from scratch with the n-shot k-way paradigm. We use 5-shot 30-way in accordance with the original works (Snell et al., 2017; Finn et al., 2017). We meta-train for 100 epochs with a learning rate of 0.01 and reduce it by 0.5 every 40 epochs. |
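The truncated Algorithm 1 excerpt above (sequential evaluation with an operation counter) can be sketched as a minimal Python loop. This is an illustrative reading of the pseudocode, not the authors' released implementation; the `model_predict` and `update_strategy` callables and their signatures are assumptions introduced here for clarity.

```python
from typing import Callable, Iterable, List, Tuple


def fluid(stream: Iterable[Tuple[object, object]],
          model_predict: Callable[[object], object],
          update_strategy: Callable[[list], int]) -> Tuple[List[bool], int]:
    """Minimal sketch of the FLUID loop (Algorithm 1 in the paper).

    Hypothetical interfaces: `model_predict(x)` returns a predicted
    label; `update_strategy(datapoints)` may retrain the model on the
    data seen so far and returns the operations it spent doing so.
    """
    evaluations: List[bool] = []   # E: per-sample correctness
    datapoints: list = []          # D: observed (x, y) pairs
    op_counter = 0                 # C: cumulative compute spent

    for x, y in stream:
        # Score the model on each sample *before* it can learn from it,
        # so evaluation and learning happen on the same stream.
        evaluations.append(model_predict(x) == y)
        datapoints.append((x, y))
        # The update strategy decides when and how to learn, e.g. the
        # paper's baselines fine-tune every 5,000 samples observed.
        op_counter += update_strategy(datapoints)

    return evaluations, op_counter
```

A strategy that returns 0 most of the time and only spends operations at periodic retraining checkpoints would reproduce the "train 4 epochs every 5,000 samples" schedule described in the experiment-setup row.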