reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Catastrophic Inheritance of Large Foundation Models

Authors: Hao Chen, Bhiksha Raj, Xing Xie, Jindong Wang

DMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this position paper, we propose to identify a neglected issue deeply rooted in LFMs: Catastrophic Inheritance, describing the weaknesses and limitations inherited from biased large-scale pre-training data to behaviors of LFMs on the downstream tasks... We discuss the challenges behind this issue and propose UIM, a framework to Understand the catastrophic inheritance of LFMs from both pre-training and downstream adaptation, Interpret the implications of catastrophic inheritance on downstream tasks, and how to Mitigate it.
Researcher Affiliation	Collaboration	Hao Chen, EMAIL, Carnegie Mellon University; Bhiksha Raj, EMAIL, Carnegie Mellon University; Xing Xie, EMAIL, Microsoft Research; Jindong Wang, EMAIL, Microsoft Research, William & Mary
Pseudocode	No	The paper defines concepts and proposes a framework (UIM) but does not include any specific pseudocode or algorithm blocks. It presents a conceptual framework and discussions without formal algorithms.
Open Source Code	No	The paper discusses other models and their training data (e.g., LAION-5B, GPT) as examples to illustrate points about catastrophic inheritance, but it does not provide any specific code for the methodology or framework proposed in this paper.
Open Datasets	No	The paper cites numerous external datasets used in other research (e.g., LAION-5B, RedPajama, ImageNet) to illustrate points about biased pre-training data, but it does not run its own experiments on these or any other datasets in the context of the framework it proposes. Therefore, it does not provide concrete access information for a dataset used in its own work.
Dataset Splits	No	This paper is a position paper proposing a framework and discussing existing research; it does not describe any experiments that would require dataset splits.
Hardware Specification	No	This paper is a position paper proposing a framework and discussing existing research; it does not describe any experiments that would require specific hardware for execution.
Software Dependencies	No	This paper is a position paper proposing a framework and discussing existing research; it does not describe any experiments that would require specific software dependencies for execution.
Experiment Setup	No	This paper is a position paper proposing a framework and discussing existing research; it does not describe any experiments or their setup, including hyperparameters or training configurations.