Assisted Learning for Organizations with Limited Imbalanced Data

Authors: Cheng Chen, Jiaying Zhou, Jie Ding, Yi Zhou

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on deep learning and reinforcement learning tasks, we demonstrate that the learner can achieve a near-oracle performance with AssistDeep and AssistPG as if the model was trained on centralized data (i.e., the learner has full access to the provider’s raw data). In particular, as the learner data’s level of imbalance increases, AssistDeep can help the learner achieve a higher performance gain.
Researcher Affiliation | Academia | Cheng Chen (EMAIL), Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112; Jiaying Zhou (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455; Jie Ding (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455; Yi Zhou (EMAIL), Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112
Pseudocode | Yes | Algorithm 1 AssistDeep. Input: initialization model θ0, learning rate η, assistance rounds R, number of local iterations T (for learner) and T (for provider). for assistance rounds r = 1, . . . , R do ... Algorithm 2 AssistPG. Input: initialization model θ0, learning rate η, assistance rounds R, number of local iterations T (for learner) and T (for provider). for assistance rounds r = 1, . . . , R do ...
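The quoted inputs describe an alternating local-training loop: in each assistance round the learner and then the provider run local iterations on a shared model, recording checkpoints. A minimal toy sketch of that round structure is given below, using scalar quadratic losses and plain gradient steps; the checkpoint-selection rule (keeping the recorded model with the lowest learner-side loss) and all function names are illustrative assumptions, not the paper's exact pseudocode.

```python
def local_train(theta, grad_fn, lr, steps, record_every):
    """Run plain gradient steps, recording a checkpoint every `record_every` iterations."""
    checkpoints = []
    for t in range(1, steps + 1):
        theta = theta - lr * grad_fn(theta)
        if t % record_every == 0:
            checkpoints.append(theta)
    return theta, checkpoints


def assist_deep_sketch(theta0, learner_grad, provider_grad, learner_loss,
                       lr=0.001, rounds=10, total_iters=2000,
                       n_learner=1000, n_provider=1000, record_every=50):
    """Toy sketch of an assistance-round loop (scalar model for illustration)."""
    # Iteration budget split in proportion to local sample sizes, as quoted.
    t_learner = total_iters * n_learner // (n_learner + n_provider)
    t_provider = total_iters - t_learner
    theta = theta0
    for _ in range(rounds):
        # Learner trains locally, then hands the model to the provider.
        theta, ckpts_l = local_train(theta, learner_grad, lr, t_learner, record_every)
        theta, ckpts_p = local_train(theta, provider_grad, lr, t_provider, record_every)
        # Assumed selection rule (hypothetical): start the next round from the
        # recorded checkpoint with the lowest learner-side loss.
        theta = min(ckpts_l + ckpts_p, key=learner_loss)
    return theta
```

With quadratic learner/provider losses centered at different points, the loop pulls the model toward the learner's optimum while provider updates contribute intermediate checkpoints.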
Open Source Code | No | The paper does not provide concrete access to source code for the methodology. It only provides a Dropbox link for videos related to the experiments, not the code itself: "The video version for the Cart Pole and Lunar Lander games can be accessed from the anonymous link https://www.dropbox.com/sh/oz2jswj36li4lkh/AADaQn4Nj67v9mdIHKDLN6nAa?dl=0."
Open Datasets | Yes | We consider two popular datasets CIFAR-10 (Krizhevsky, 2009) and SVHN. ... We demonstrate the effectiveness of AssistPG via two RL applications: Cart Pole (Barto et al., 1983) and Lunar Lander (Brockman et al., 2016). ...the learner samples data from the CIFAR-10 dataset and the provider samples data from the CINIC-10 dataset (Darlow et al., 2018).
Dataset Splits | Yes | We test the performance of AssistDeep by comparing it with two baselines: SGD (using centralized data D(L,P)) and Learner-SGD (using only the learner’s data D(L)). We consider two popular datasets CIFAR-10 (Krizhevsky, 2009) and SVHN. ... To test AssistDeep, we split the classification dataset into two parts and assign them to the learner and the provider, respectively. Specifically, the provider’s data consists of half of the total samples of each classification class, and therefore is balanced and of a large size. On the other hand, the learner’s data is sampled from the rest half of the data (excluding the provider’s data)... We plot the training loss (on centralized data D(L,P)) and the test accuracy (on the 10k test data) against the number of assistance rounds.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only discusses the experimental setup at a software and algorithmic level.
Software Dependencies | No | The paper mentions using "Adam optimizer" and "SGD optimizer" but does not specify versions for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We implement all these algorithms using the Adam optimizer with learning rate 0.001 and batch size 256 to train AlexNet and ResNet-18 on CIFAR-10 and SVHN, respectively. ... We fix the number of assistance rounds to be 10. The total number of local optimization iterations in each assistance round is fixed to be 2000. ... We assign this iteration budget to the learner and provider in proportion to their local sample sizes. Both the learner and provider record their local training models and local loss values for every I = 50 iterations, which is referred to as the sampling period. ... All these algorithms are implemented with the learning rate 5e-3 and episode batch size 32. We model the policy using a three-layer feed-forward neural network with 4 and 32 hidden neurons for Cart Pole and Lunar Lander, respectively. Moreover, for AssistPG, we fix the total number of assistance rounds to be 10 and 5 for Cart Pole and Lunar Lander, respectively. The total number of local PG iterations in each assistance round is fixed to be 20 for both problems. We also set the sampling period to be four, namely, the learner and the provider record their local model and discounted training reward for every four local PG iterations.
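For quick reference, the hyperparameters quoted above can be collected into config dictionaries, with a small helper for the proportional iteration-budget split the setup describes. The values are transcribed from the quotes; the key names and the helper are my own conventions, not the paper's.

```python
# Hyperparameters as reported in the quoted experiment setup.
ASSIST_DEEP_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 256,
    "models": {"CIFAR-10": "AlexNet", "SVHN": "ResNet-18"},
    "assistance_rounds": 10,
    "local_iters_per_round": 2000,  # split between learner and provider
    "sampling_period": 50,          # record model/loss every 50 iterations
}

ASSIST_PG_CONFIG = {
    "learning_rate": 5e-3,
    "episode_batch_size": 32,
    "hidden_neurons": {"Cart Pole": 4, "Lunar Lander": 32},
    "assistance_rounds": {"Cart Pole": 10, "Lunar Lander": 5},
    "local_pg_iters_per_round": 20,
    "sampling_period": 4,           # record model/reward every 4 PG iterations
}


def split_budget(total_iters, n_learner, n_provider):
    """Allocate the per-round iteration budget in proportion to local sample
    sizes, as the quoted setup describes (helper name is hypothetical)."""
    t_learner = round(total_iters * n_learner / (n_learner + n_provider))
    return t_learner, total_iters - t_learner
```

For example, a learner holding three times the provider's samples would receive 1500 of the 2000 per-round iterations under this rule.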