Fairness on Principal Stratum: A New Perspective on Counterfactual Fairness

Authors: Haoxuan Li, Zeyu Tang, Zhichao Jiang, Zhuangyan Fang, Yue Liu, Zhi Geng, Kun Zhang

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted using synthetic and real-world datasets to verify the effectiveness of the proposed methods. |
| Researcher Affiliation | Collaboration | Peking University, Carnegie Mellon University, Sun Yat-sen University, Xiaomi, Renmin University of China, Beijing Technology and Business University, Mohamed bin Zayed University of Artificial Intelligence. |
| Pseudocode | No | The paper describes its methods in text but includes no explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | No | The paper neither states that code will be released nor links to a source-code repository. |
| Open Datasets | Yes | The STUDENTINFO file from the Open University Learning Analytics Dataset (OULAD) (Kuzilek et al., 2017) is used for the real-world experiment. |
| Dataset Splits | No | The paper reports a sample size of 1,000 for the synthetic data and 32,593 students for OULAD, and it discusses dividing the population into subgroups for analysis, but it gives no train/validation/test split percentages, counts, or methodology needed to reproduce the data partitioning for model training. |
| Hardware Specification | No | The paper gives no details about the hardware (e.g., GPU models or CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions "the PC algorithm in the causal-learn package" and models such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB), but specifies no version numbers for any of these software components or libraries. |
| Experiment Setup | No | The paper states data-generation parameters such as noise ϵ_i ∼ N(0, 2.5) and a sample size of n = 1,000, but provides no hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other training configurations for the models used. |
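The only data-generation details the report could extract from the paper are the noise distribution ϵ_i ∼ N(0, 2.5) and the sample size n = 1,000. A minimal sketch of sampling such noise terms is below; it assumes the second parameter of N(0, 2.5) denotes the variance (so the standard deviation is √2.5), which the paper does not make explicit, and the function name and seed are illustrative, not from the paper.

```python
import math
import random

def generate_noise(n=1000, variance=2.5, seed=0):
    """Sample n i.i.d. noise terms eps_i ~ N(0, 2.5), per the paper's setup.

    Assumption: N(0, 2.5) is parameterized by variance, so the standard
    deviation passed to the sampler is sqrt(2.5). If the paper instead
    means a standard deviation of 2.5, replace sd with `variance` directly.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    sd = math.sqrt(variance)
    return [rng.gauss(0.0, sd) for _ in range(n)]

eps = generate_noise()
print(len(eps))  # one noise draw per simulated unit
```

With n = 1,000 draws, the sample variance should land close to 2.5, which is a quick sanity check on the parameterization assumption.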