Exogenous Isomorphism for Counterfactual Identifiability
Authors: Yikang Chen, Dehui Du
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate that neural TM-SCM can effectively address the counterfactual consistency problem in practice, we conducted experiments on synthetic datasets. These experiments were designed to showcase the model's ability to generate counterfactual results consistent with the test set, using only the endogenous samples drawn from the observational distribution as the training set. |
| Researcher Affiliation | Academia | Shanghai Key Laboratory of Trustworthy Computing, East China Normal University. Correspondence to: Dehui Du <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Pseudo Potential Response for TM-SCM. Input: a TM-SCM M, exogenous value u, intervened value x, vectorization ι under a causal order of M. Output: potential response V_M[x](u). u′ ← u; v ← Γ(u); D ← Σ_{i∈I} d_i {Initialize potential response}. for k = 1 to D do: (i, j) ← ι⁻¹(k); v′ ← v; if i ∈ I_x then (v′)_{i,j} ← (x)_{i,j} {Do-intervention} end if; (u′)_{ι⁻¹[1:k]} ← (Γ⁻¹(v′))_{ι⁻¹[1:k]} {Find exogenous value fully explaining the prefix part}; (v)_{ι⁻¹[k:D]} ← (Γ(u′))_{ι⁻¹[k:D]} {Assume the suffix part is not intervened, and update the potential response}. end for; return v |
| Open Source Code | Yes | Code is available at: https://github.com/cyisk/tmscm |
| Open Datasets | No | Datasets: The experiments involve the following synthetic datasets, with details described in Appendix D.1. TM-SCM-SYM: A collection of four small datasets (BARBELL, STAIR, FORK, BACKDOOR) with up to 4 causal variables, using exogenous distributions that are standard or Markovian multivariate normals and manually defined TM causal mechanisms. ER-DIAG-50 and ER-TRIL-50: Each contains 50 datasets with 3–8 causal variables, Markovian multivariate normal exogenous distributions, and Erdős–Rényi causal graphs (edge probability 0.5). |
| Dataset Splits | Yes | During the initial execution, following the settings described in Appendix D.1, each synthetic dataset is divided into three splits: training, validation, and test datasets. The training dataset comprises observational data with a sample size of 20,000, directly sampled from the exogenous distribution, with exogenous samples propagated through the synthesized TM-SCM to derive endogenous observational values. The validation and test datasets, each containing 2,000 samples, consist of counterfactual data, providing observations, interventions, and counterfactual outcomes. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions several libraries like Zuko (Rozet et al., 2024), torchdiffeq (Chen, 2018), and geomloss (Feydy et al., 2019). However, it does not provide specific version numbers for these or any other key software components used in the implementation. |
| Experiment Setup | Yes | All neural network components in the models, as reported in Appendix C, are configured as MLPs with 2 hidden layers and a width of 128. The outputs of these MLPs include an additional dimension representing the encoding length, and the parameters subsequently derived are averaged over this encoding. ... For experiments on TM-SCM-SYM, we train for 100 epochs with a batch size of 64. Validation is performed every k training steps, where k grows exponentially such that the interval after the t-th validation is k = γ^t. We set γ = 1.25. ... For experiments on ER-DIAG-50 and ER-TRIL-50, we train for 50 epochs with a batch size of 64, performing validation after every epoch. ... The optimizer used is Adam with a learning rate of 0.001 and weight decay of 0.01 as a regularization term. |
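The pseudo potential response in Algorithm 1 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes `gamma`/`gamma_inv` are flat vector maps standing in for the TM causal mechanisms Γ and Γ⁻¹, and it collapses the paper's vectorization ι to a plain list of coordinate indices `order`; the toy linear mechanism at the bottom is hypothetical.

```python
import numpy as np

def pseudo_potential_response(gamma, gamma_inv, u, x, intervened, order):
    """Sketch of Algorithm 1: compute the potential response coordinate by
    coordinate along a causal order.

    gamma / gamma_inv : forward and inverse mechanisms as flat vector maps
                        (a simplification of the paper's Γ and ι).
    u          : factual exogenous value (1-D array).
    x          : intervened endogenous values (only masked entries used).
    intervened : boolean mask, True where do(x) applies.
    order      : coordinate indices in a causal order.
    """
    u_prime = u.copy()
    v = gamma(u_prime)                     # initialize potential response
    for k, idx in enumerate(order):
        v_prime = v.copy()
        if intervened[idx]:
            v_prime[idx] = x[idx]          # do-intervention on this coordinate
        prefix = order[: k + 1]
        # exogenous value fully explaining the prefix of the response
        u_prime[prefix] = gamma_inv(v_prime)[prefix]
        suffix = order[k:]
        # assume the suffix is not intervened and update the response
        v[suffix] = gamma(u_prime)[suffix]
    return v

# Hypothetical linear TM mechanism: v0 = u0, v1 = v0 + u1
def gamma(u):
    return np.array([u[0], u[0] + u[1]])

def gamma_inv(v):
    return np.array([v[0], v[1] - v[0]])

result = pseudo_potential_response(
    gamma, gamma_inv,
    u=np.array([1.0, 2.0]),               # factual exogenous value
    x=np.array([5.0, 0.0]),               # do(v0 = 5); second entry unused
    intervened=np.array([True, False]),
    order=[0, 1],
)
# counterfactual do(v0 = 5) with u1 = 2 gives v = [5, 7]
```

In the toy run, the factual world is v = [1, 3]; forcing v0 = 5 while keeping the inferred exogenous value u1 = 2 propagates to v1 = 5 + 2 = 7, matching the prefix-explain / suffix-update loop of the algorithm.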
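The exponentially growing validation interval (k = γ^t with γ = 1.25) from the experiment setup can be sketched as follows. The rounding rule is an assumption on our part: the paper only states the interval formula, so `round(gamma ** t)` here is one plausible reading.

```python
def validation_steps(total_steps, gamma=1.25):
    """Training steps at which validation runs, assuming the interval after
    the t-th validation is round(gamma**t) steps (rounding rule assumed;
    the paper only gives k = gamma^t)."""
    steps, step, t = [], 0, 0
    while True:
        step += max(1, round(gamma ** t))  # interval after the t-th validation
        if step > total_steps:
            break
        steps.append(step)
        t += 1
    return steps

schedule = validation_steps(20)
# early validations are dense, later ones increasingly sparse
```

Early in training, validation happens almost every step; as t grows the interval stretches geometrically, which keeps validation cost roughly logarithmic in the number of training steps.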