Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective

Authors: Hechuan Wen, Tong Chen, Mingming Gong, Li Kheng Chai, Shazia Sadiq, Hongzhi Yin

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Furthermore, benchmarking FCCM against other baselines demonstrates its superiority across both fully synthetic and semi-synthetic datasets. Code: https://github.com/uqhwen2/FCCM. ... In Figure 5, we observe that our proposed method generally serves as the risk lower bound on all three datasets. Its strong performance empirically demonstrates the superiority of our method...
Researcher Affiliation Academia 1 School of EECS, The University of Queensland, Australia; 2 School of Mathematics and Statistics, The University of Melbourne, Australia; 3 Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates; 4 Health and Wellbeing Queensland, Australia. Correspondence to: Hongzhi Yin <EMAIL>.
Pseudocode Yes Algorithm 1 Greedy Radius Reduction (Sketch) ... Algorithm 2 FCCM
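The paper's Algorithm 1 is named "Greedy Radius Reduction" but its body is not reproduced here. As a point of reference only, a generic k-center-style greedy selection (a standard way to greedily shrink a covering radius, not necessarily the authors' exact procedure) can be sketched as:

```python
import numpy as np

def greedy_radius_reduction(covered, pool, budget):
    """Generic k-center-style greedy sketch: at each step, acquire the pool
    point farthest from the current cover, which greedily reduces the covering
    radius (the max distance from any pool point to its nearest covered point).
    This is an illustrative stand-in, not the paper's Algorithm 1."""
    covered = list(covered)
    pool = list(pool)
    picked = []
    # distance of each pool point to its nearest covered point
    dists = np.array([min(np.linalg.norm(p - c) for c in covered) for p in pool])
    for _ in range(budget):
        i = int(np.argmax(dists))  # farthest point from the current cover
        picked.append(pool[i])
        # the acquired point now also covers its neighbours
        dists = np.minimum(
            dists, np.array([np.linalg.norm(p - pool[i]) for p in pool])
        )
    return picked, float(dists.max())  # selections and residual radius
```

The residual radius returned after each acquisition is the quantity such greedy schemes monotonically reduce.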
Open Source Code Yes Code: https://github.com/uqhwen2/FCCM.
Open Datasets Yes IBM (Shimoni et al., 2018), a high-dimensional tabular dataset based on the publicly available Linked Births and Infant Deaths Database. ... CMNIST (Jesson et al., 2021a) This dataset contains 60,000 image samples (10 classes) of size 28×28, which are adapted from the MNIST (LeCun, 1998) benchmark.
Dataset Splits Yes The details of the data acquisition setup are summarized in Table 1, where we initialize the training set S with all labeled samples (denoted as ALL*) from group t = 0 and start acquisition only on the samples from t = 1, which simulates scenarios with a significant number of missing counterfactual samples. Then, a fixed step length is enforced at each acquisition step, with fifty data acquisition steps.

Table 1. Summary of the Acquisition Setup and Testing Dataset

         Start  Length  Steps  Pool   Val    Test
TOY      ALL*   1       50     7200   2880   1600
IBM      ALL*   50      50     2891   3180   6250
CMNIST   ALL*   50      50     16706  10500  18000
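The fixed-step acquisition schedule in Table 1 can be made concrete with a minimal sketch (the helper name and structure are illustrative, not from the released code):

```python
def acquisition_schedule(pool_size, step_length, steps):
    """Sketch of the fixed-step acquisition loop: the training set starts
    with all labeled t=0 samples, then acquires `step_length` samples from
    the t=1 pool at each of `steps` rounds, never exceeding the pool size."""
    acquired = 0
    sizes = []
    for _ in range(steps):
        take = min(step_length, pool_size - acquired)
        acquired += take
        sizes.append(acquired)  # cumulative acquisitions after this round
    return sizes

# IBM row of Table 1: 50 samples per step, 50 steps, pool of 2891
ibm_sizes = acquisition_schedule(2891, 50, 50)
```

Under the IBM setup this acquires 2,500 of the 2,891 pool samples by the final step; the TOY setup (step length 1) acquires only 50.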
Hardware Specification Yes We conduct all the experiments with a 24GB NVIDIA RTX-3090 GPU on an Ubuntu 22.04 LTS platform with a 12th Gen Intel i7-12700K 12-core, 20-thread CPU.
Software Dependencies No No software versions or dependency list are reported; the paper only restates the hardware setup and notes that, for fair comparison, it takes the hyperparameters tuned in (Jesson et al., 2021b; Wen et al., 2025) for the estimators DUE-DNN (Van Amersfoort et al., 2021) and DUE-CNN (Van Amersfoort et al., 2021) shown in Table 3.
Experiment Setup Yes As stated in the main text, for fair comparison, we take the consistent hyperparameters tuned in (Jesson et al., 2021b; Wen et al., 2025) for the estimators DUE-DNN (Van Amersfoort et al., 2021) and DUE-CNN (Van Amersfoort et al., 2021), shown in Table 3. Additionally, we search for the best hyperparameters, i.e., covering radius δ and edge weight α for counterfactual linkage, for Algorithm 2 with the validation set, shown in Table 4.

Table 3. Hyperparameters for Estimators

Hyperparameters  DNN    CNN
Kernel           RBF    Matern
Inducing Points  100    100
Hidden Neurons   200    200
Depth            3      2
Dropout Rate     0.1    0.05
Spectral Norm    0.95   3.0
Learning Rate    1e-3   1e-3

Table 4. Hyperparameters for Algorithm 2

Hyperparameters    Search Space        Tuned
δ(1,1) for TOY     [0.11, 0.12, 0.13]  0.11
δ(1,0) for TOY     [0.11, 0.12, 0.13]  0.11
δ(1,1) for IBM     [0.11, 0.13, 0.15]  0.11
δ(1,0) for IBM     [0.11, 0.13, 0.15]  0.11
δ(1,1) for CMNIST  [0.40, 0.45, 0.50]  0.50
δ(1,0) for CMNIST  [0.40, 0.45, 0.50]  0.40
Edge weight α      [1.0, 2.5, 5.0]     2.5
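The Table 4 search is a small grid over δ and α scored on the validation set. A hedged sketch of such a search (the `run_fccm` callable is a hypothetical stand-in for one training-plus-validation run, not a function from the released code):

```python
from itertools import product

def tune_hyperparameters(run_fccm, deltas, alphas):
    """Grid-search sketch over covering radius delta and edge weight alpha,
    keeping the setting with the lowest validation risk. `run_fccm` is a
    hypothetical callable mapping one (delta, alpha) setting to a risk."""
    best = None
    for delta, alpha in product(deltas, alphas):
        risk = run_fccm(delta=delta, alpha=alpha)
        if best is None or risk < best[0]:
            best = (risk, delta, alpha)  # keep the current best setting
    return {"val_risk": best[0], "delta": best[1], "alpha": best[2]}
```

With the IBM search spaces from Table 4 this evaluates 3 × 3 = 9 settings per treatment pairing.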