A General Framework for Adversarial Label Learning
Authors: Chidubem Arachie, Bert Huang
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method can train without labels and outperforms other approaches for weakly supervised learning. |
| Researcher Affiliation | Academia | Chidubem Arachie, Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; Bert Huang, Department of Computer Science, Data Intensive Studies Center, Tufts University, Medford, MA 02155, USA |
| Pseudocode | Yes | Algorithm 1 Adversarial Label Learning and Algorithm 2 Multiclass Adversarial Label Learning |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | Fashion-MNIST (Xiao et al., 2017), Breast Cancer (Blake and Merz, 1998; Street et al., 1993), OBS Network (Rajab et al., 2016), Cardiotocography (Ayres-de Campos et al., 2000), Clave Direction (Vurkaç, 2011), Credit Card (Blake and Merz, 1998), Statlog Satellite (Blake and Merz, 1998), Phishing Websites (Mohammad et al., 2012), Wine Quality (Cortez et al., 2009), Microsoft COCO: Common Objects in Context (Plummer et al., 2015), RTE (Snow et al., 2008), Word Similarity (Snow et al., 2008), Street View House Numbers (SVHN) (Netzer et al., 2018) |
| Dataset Splits | Yes | We randomly split each dataset such that 30% is used as weak supervision data, 40% is used as training data, and 30% is used as test data. For our experiments, we use 10 such random splits and report the mean of the results. We assume we have access to a labeled validation set consisting of 1% of the available data. We calculate error and precision bounds for the pseudolabels with four-fold cross-validation on the validation set. |
| Hardware Specification | No | We thank NVIDIA for their support through the GPU Grant Program and Amazon for their support via the AWS Cloud Credits for Research program. (This acknowledges support but does not specify hardware models or configurations used for experiments.) |
| Software Dependencies | No | The paper mentions using Adagrad for optimization and describes model architectures (e.g., six-layer convolutional neural network, logistic regression), and references pre-trained models (BERT, ResNet-101), but it does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We use the sigmoid function as the parameterized model fθ for estimating class probabilities in ALL and GE, i.e., pθ = fθ(x) = 1/(1 + exp(−θᵀx)). We use a fixed upper bound of b1 = b2 = b3 = 0.3. For multiclass classification with deep neural networks, the model uses the standard cross-entropy loss. We optimize the loss functions using Adagrad (Duchi et al., 2011). Across the experiments, Multi-ALL outperforms both Snorkel and averaging in all settings, showing a strong ability to fuse noisy signals and to avoid being confounded by redundant signals. We use the same deep neural network architecture for all experiments: a six-layer convolutional neural network in which each layer contains a max-pooling unit, a ReLU activation, and dropout; the final layer is fully connected with softmax output. |
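The dataset-split protocol reported above (30% weak supervision, 40% training, 30% test, repeated over 10 random splits) can be sketched as follows. This is a minimal illustration of that protocol, not code from the paper; the function name and seeding scheme are assumptions.

```python
import numpy as np

def random_splits(n_samples, n_repeats=10, seed=0):
    """Sketch of the paper's split protocol: 30% weak-supervision data,
    40% training data, 30% test data, over 10 random splits.
    The function name and seed handling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_repeats):
        idx = rng.permutation(n_samples)
        n_weak = int(0.3 * n_samples)
        n_train = int(0.4 * n_samples)
        weak = idx[:n_weak]
        train = idx[n_weak:n_weak + n_train]
        test = idx[n_weak + n_train:]
        splits.append((weak, train, test))
    return splits

splits = random_splits(1000)
weak, train, test = splits[0]
print(len(weak), len(train), len(test))  # 300 400 300
```

Reporting the mean over the 10 splits then amounts to training and evaluating once per `(weak, train, test)` tuple and averaging the resulting metrics.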
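The experiment-setup row describes a sigmoid-parameterized model pθ = 1/(1 + exp(−θᵀx)) optimized with Adagrad (Duchi et al., 2011). A minimal sketch of that combination on plain log loss is below; the learning rate, epoch count, and function names are assumptions for illustration, and this omits the adversarial constraints that define ALL itself.

```python
import numpy as np

def sigmoid(z):
    # p_theta = f_theta(x) = 1 / (1 + exp(-theta^T x)), as in the setup description
    return 1.0 / (1.0 + np.exp(-z))

def train_adagrad(X, y, lr=0.1, eps=1e-8, epochs=100):
    """Logistic model trained with Adagrad on mean log loss.
    Hyperparameters here are illustrative, not the paper's."""
    theta = np.zeros(X.shape[1])
    g2 = np.zeros_like(theta)  # Adagrad's accumulated squared gradients
    for _ in range(epochs):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / len(y)      # gradient of the mean log loss
        g2 += grad ** 2
        theta -= lr * grad / (np.sqrt(g2) + eps)  # per-coordinate adaptive step
    return theta

# Toy usage on a linearly separable 1-D problem
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = train_adagrad(X, y)
print((sigmoid(X @ theta) > 0.5).astype(int))
```

Adagrad's per-coordinate step scaling is what makes it a reasonable default here: features that receive large gradients early are automatically given smaller steps later, without per-dataset tuning.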