Nonparametric Identification of Latent Concepts
Authors: Yujia Zheng, Shaoan Xie, Kun Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical results in both synthetic and real-world settings. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University 2Mohamed bin Zayed University of Artificial Intelligence. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Procedural steps are described in regular paragraph text. |
| Open Source Code | No | Experiments are conducted using the official implementation of GIN (Sorrenson et al., 2020) with an additional ℓ1 regularization on the Jacobians and FrEIA (Ardizzone et al., 2018–2022) for the flow-based generative function. |
| Open Datasets | Yes | To assess the applicability of our proposed structural condition in complex practical contexts, we performed experiments on five different real-world datasets, i.e., the Fashion-MNIST (Xiao et al., 2017), EMNIST (Cohen et al., 2017), Animal Face (Si & Zhu, 2011), Flower102 (Nilsback & Zisserman, 2008), and FFHQ (Karras et al., 2019) datasets. |
| Dataset Splits | No | In the considered setting, different samples may correspond to different classes selected by a mask. We structure the dataset as {(x, c)^{(i)}}_{i=1}^{N}, where N denotes the sample size, and c^{(i)} is a multi-hot vector representing the classes for the data point x^{(i)}. ... For synthetic settings, the sample size is set as 10,000. The paper specifies the total sample size for synthetic data but does not explicitly mention training/validation/test splits for any dataset. |
| Hardware Specification | Yes | Moreover, all experiments are conducted on 12 CPU cores with 16 GB RAM. |
| Software Dependencies | No | Experiments are conducted using the official implementation of GIN (Sorrenson et al., 2020) with an additional ℓ1 regularization on the Jacobians and FrEIA (Ardizzone et al., 2018–2022) for the flow-based generative function. ... We use Generative Flow (Kingma & Dhariwal, 2018) as the nonlinear generating function. The paper mentions specific software tools but does not provide version numbers for them or any other key software libraries. |
| Experiment Setup | Yes | The objective function is defined as L(θ) = E_{(x,c)}[−log p_{f̂⁻¹}(x \| M_{i,:} c) + λR], where λ is the regularization parameter, and R represents the ℓ1 norm applied to M̂ and, if estimating class-independent concepts, also to D_ẑ f̂. Following previous work, we use the Mean Correlation Coefficient (MCC) to measure the alignment between the ground-truth and the recovered latent concepts. The results are from 10 random trials. Additional details and results are provided in Appx. C, such as identification with general noises and supplementary evaluation metrics across various settings. ... The regularization parameter λ is set according to a search over λ ∈ {0.01, 0.1, 1}, and we select λ = 0.1 according to the average MCCs of experiments conducted on synthetic datasets. |
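The MCC metric quoted in the Experiment Setup row is a standard evaluation for latent-variable recovery. As a rough illustration only (not the authors' code), it can be sketched as the mean absolute Pearson correlation between ground-truth and recovered components after an optimal one-to-one matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_correlation_coefficient(z_true, z_est):
    """Sketch of MCC: mean absolute Pearson correlation between
    ground-truth and recovered latents after an optimal permutation
    matching of components (illustrative, not the paper's code)."""
    d = z_true.shape[1]
    # Cross-correlations between every (true, estimated) component pair
    corr = np.corrcoef(z_true.T, z_est.T)[:d, d:]
    # Match each estimated component to a distinct true component,
    # maximizing the total absolute correlation
    rows, cols = linear_sum_assignment(-np.abs(corr))
    return np.abs(corr[rows, cols]).mean()
```

An MCC near 1 indicates the latents were recovered up to permutation and component-wise scaling, which is the identifiability notion such papers typically target.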
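The quoted objective combines a flow-model negative log-likelihood with an ℓ1 sparsity penalty. A minimal sketch of that combination, assuming the NLL term is already computed by the flow and using illustrative names (`nll`, `M_hat`, `jac` are not from the paper):

```python
import numpy as np

def regularized_objective(nll, M_hat, jac=None, lam=0.1):
    """Sketch of L(θ) = E[−log p_{f̂⁻¹}(x | M_{i,:} c) + λR]:
    `nll` is the (assumed precomputed) expected negative log-likelihood,
    and R is the ℓ1 norm of the mask estimate M̂, plus the Jacobian
    D_ẑ f̂ when estimating class-independent concepts."""
    reg = np.abs(M_hat).sum()
    if jac is not None:
        reg += np.abs(jac).sum()
    return nll + lam * reg
```

The paper selects λ = 0.1 from the grid {0.01, 0.1, 1} by average synthetic-data MCC, so in a reproduction the `lam` argument would be swept over that grid.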