FLUID: A Unified Evaluation Framework for Flexible Sequential Data

Authors: Matthew Wallingford, Aditya Kusupati, Keivan Alizadeh-Vahid, Aaron Walsman, Aniruddha Kembhavi, Ali Farhadi

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a broad set of methods which shed new insight on the advantages and limitations of current techniques and indicate new research problems to solve. As a starting point towards more general methods, we present two new baselines which outperform other evaluated methods on FLUID.
Researcher Affiliation | Collaboration | Matthew Wallingford (University of Washington); Aditya Kusupati (University of Washington); Keivan Alizadeh (University of Washington); Aaron Walsman (University of Washington); Aniruddha Kembhavi (Allen Institute for Artificial Intelligence); Ali Farhadi (University of Washington)
Pseudocode | Yes | Algorithm 1 (FLUID Procedure):
    Input: Task T
    Input: ML system: (pretrained) model f, update strategy S
    Output: Evaluations E, Operation Counter C
    1: function Fluid(T, (f, S))
    2:     Evaluations E = [ ]
    3:     Datapoints D = [ ]
    4:     Operation Counter C = 0
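The quoted pseudocode covers only the initialization of Algorithm 1. The following Python sketch fills in how the streaming loop might proceed; the loop body and all names here are assumptions based on the paper's description (each incoming sample is evaluated before the update strategy may adapt the model), not the authors' code:

```python
# Hedged sketch of the FLUID procedure (Algorithm 1). Only the
# initialization lines are quoted above; the loop body is an assumption:
# predict first, record the evaluation, then let the update strategy S
# decide whether (and how) to update the model. All names are illustrative.

def fluid(task, model, update_strategy):
    evaluations = []   # E: per-sample evaluation results
    datapoints = []    # D: samples observed so far
    op_counter = 0     # C: compute spent by the ML system

    for x, y in task:                        # samples arrive one at a time
        pred, ops = model(x)                 # evaluate before any update
        evaluations.append(int(pred == y))
        datapoints.append((x, y))
        op_counter += ops                    # cost of inference
        op_counter += update_strategy(model, datapoints)  # cost of updating

    return evaluations, op_counter
```

A trivial usage: a model that is never updated simply accumulates inference cost while its per-sample accuracy is logged in order.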
Open Source Code | No | The framework, data and models will be open-sourced.
Open Datasets | Yes | Data: In this paper, we evaluate methods with FLUID using a subset of ImageNet-22K (Deng et al., 2009). Traditionally, few-shot learning used datasets like Omniglot (Lake et al., 2011) and Mini-ImageNet (Vinyals et al., 2016), and continual learning focused on MNIST (LeCun, 1998) and CIFAR (Krizhevsky et al., 2009). ... We evaluate on the ImageNet-22K dataset to present new challenges to existing models. Recently, the iNaturalist (Van Horn et al., 2018; Wertheimer & Hariharan, 2019) and LVIS (Gupta et al., 2019) datasets have advocated for heavy-tailed distributions.
Dataset Splits | Yes | The dataset consists of a pretraining dataset and 5 different sequences of images for streaming (3 test and 2 validation sequences). For pretraining we use the standard ImageNet-1K (Russakovsky et al., 2015). ... Each test sequence contains images from 1000 different classes, 750 of which do not appear in ImageNet-1K. ... Each sequence contains 90,000 samples, where head classes contain > 50 samples and tail classes contain ≤ 50 samples. ... More comprehensive statistics on the data and sequences are in Appendix B. Table 4: Statistics for the sequences of images used in FLUID. Sequences 1-2 are for validation and Sequences 3-5 are for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or processor types used for running its experiments. It mentions ResNet18 and ResNet50 models, but these refer to model architectures, not hardware.
Software Dependencies | No | We use the PyTorch (Paszke et al., 2019) ResNet18 and ResNet50 models pretrained on supervised ImageNet-1K.
Experiment Setup | Yes | For all experiments (Table 3) that require offline training (fine-tuning, Weight Imprinting, standard training, ET and LwF), except OLTR, we train each model for 4 epochs every 5,000 samples observed. ... Fine-tuning experiments use a learning rate of 0.1 and standard training uses 0.01 for supervised pretraining. For MoCo pretraining, fine-tuning uses a learning rate of 30 and standard training uses 0.01. All the experiments use the SGD+Momentum optimizer with a momentum of 0.9. For Prototypical Networks and MAML we meta-train from scratch with the n-shot k-way paradigm. We use 5-shot 30-way in accordance with the original works (Snell et al., 2017; Finn et al., 2017). ... We meta-train for 100 epochs with a learning rate of 0.01 and reduce it by 0.5 every 40 epochs.
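The meta-training schedule reported above (SGD with momentum 0.9, learning rate 0.01, halved every 40 epochs over 100 epochs) maps onto a standard PyTorch setup. The sketch below is illustrative only: the stand-in linear model and the empty epoch body are assumptions, not the authors' training code.

```python
from torch import nn, optim

# Hedged sketch of the reported optimization setup: SGD + momentum 0.9,
# meta-training learning rate 0.01, reduced by a factor of 0.5 every
# 40 epochs. The linear layer is a stand-in; the paper uses
# ResNet18/ResNet50 backbones.
model = nn.Linear(512, 1000)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)

for epoch in range(100):  # "we meta-train for 100 epochs"
    # ... one epoch of n-shot k-way meta-training would go here ...
    scheduler.step()
```

After 100 epochs the learning rate has been halved twice (at epochs 40 and 80), ending at 0.0025.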