To Transfer or Not to Transfer: Suppressing Concepts from Source Representations

Authors: Vijay Sadashivaiah, Keerthiram Murugesan, Ronny Luss, Pin-Yu Chen, Chris Sims, James Hendler, Amit Dhurandhar

TMLR 2024

Reproducibility assessment (each item lists the variable, the result, and the supporting LLM response):
Research Type: Experimental
"We evaluate our approach qualitatively and quantitatively in the visual domain showcasing its efficacy for classification and generative source models."
Researcher Affiliation: Collaboration
"Vijay Sadashivaiah, Department of Computer Science, Rensselaer Polytechnic Institute; Keerthiram Murugesan, IBM Research, Yorktown Heights"
Pseudocode: Yes
"Algorithm 1 (CCT-cat(cs)): Controllable Concept Transfer, Concatenate with Concept Search; Algorithm 2 (TRAIN-CDN): Train Concept Disentangling Network; Algorithm 3 (CONCEPT-SEARCH): Search Concepts to Suppress"
Open Source Code: No
"The paper does not provide an explicit statement about the release of source code for the methodology, nor does it include a link to a code repository."
Open Datasets: Yes
"For the bulk of our experiments in Sections 4 and 5 we use the MNIST (LeCun et al., 1998), EMNIST (Cohen et al., 2017) and CelebFaces Attributes (CelebA) (Liu et al., 2015) datasets. ... Specifically, it has been shown that object recognition models can spuriously rely on the image background instead of the objects themselves (Ribeiro et al., 2016). We study this phenomenon using the Waterbirds dataset (Sagawa et al., 2019)"
Dataset Splits: Yes
"EMNIST. EMNIST is a set of handwritten characters derived from NIST Special Database 19. ... There are 88,800 training examples and 14,800 testing examples. ... CelebFaces Attributes. ... There are 162,770 training examples, 19,962 test examples and 19,867 validation examples. ... The Diabetes Health Indicators Dataset contains healthcare statistics and lifestyle survey information about people along with their diagnosis of diabetes. ... We first divide the dataset into source and target tasks, where the source task has 69,692 samples leaving the target task with 1,000 samples."
Hardware Specification: Yes
"The models were trained in parallel with the specifications shown in Table 8."
- CPU: IBM Power 9 @ 3.15 GHz
- Memory: 512 GB
- GPUs: 1 x NVIDIA Tesla V100 (16 GB)
- Disk: 1.2 TB
- OS: Red Hat 8
Software Dependencies: No
"The paper mentions the use of the Adam optimizer (Kingma & Ba, 2014) but does not provide specific version numbers for any software libraries or dependencies used in the implementation."
Experiment Setup: Yes
"For our experimental analysis in the main paper, we set the number of epochs for training to E = 50 for all models. We train all models using a batch size of 25 and a learning rate of 10^-4 for the Adam optimizer (Kingma & Ba, 2014). All models were randomly initialized before training."
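The quoted setup (Adam, learning rate 10^-4, batch size 25, E = 50 epochs) can be sketched as a minimal, self-contained training loop. This is an illustration only: the quadratic toy objective, the scalar parameter, and the placeholder dataset below are assumptions, not the paper's actual models or losses; only the hyperparameter values are taken from the quote.

```python
# Minimal sketch of the quoted training configuration:
# Adam (Kingma & Ba, 2014) with lr = 1e-4, batch size 25, E = 50 epochs.
# The objective is a toy quadratic on a single scalar parameter.
import math

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

EPOCHS, BATCH_SIZE, LR = 50, 25, 1e-4   # values quoted from the paper

theta, m, v, t = 5.0, 0.0, 0.0, 0       # random-ish init stands in for model weights
data = list(range(100))                 # placeholder dataset

for epoch in range(EPOCHS):
    for start in range(0, len(data), BATCH_SIZE):
        batch = data[start:start + BATCH_SIZE]
        grad = 2 * theta                # gradient of the toy loss theta**2
        t += 1                          # the toy gradient ignores the batch contents
        theta, m, v = adam_step(theta, grad, m, v, t, lr=LR)

print(f"theta after training: {theta:.4f}")
```

With a constant-sign gradient, each Adam step moves theta by roughly the learning rate, so 200 updates shrink theta only slightly, which is consistent with the small learning rate the paper reports.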