Directed Graph Grammars for Sequence-based Learning
Authors: Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | §4 Experiments. Table 2 reports prior accuracy, validity, uniqueness, and novelty (%), following the same settings as Zhang et al. (2019). Neural architectures (Acc/Val/Uniq/Nov): D-VAE 99.96/100.00/37.26/100.00; S-VAE 99.98/100.00/37.03/99.99; GraphRNN 99.85/99.84/29.77/100.00; GCN 98.70/99.53/34.00/100.00; DeepGMG 94.98/98.66/46.37/99.93; DIGGED (GNN) 100/100/98.7/99.9; DIGGED (TOKEN) 100/100/25.4/37.8. Bayesian networks (Acc/Val/Uniq/Nov): D-VAE 99.94/98.84/38.98/98.01; S-VAE 99.99/100.00/35.51/99.70; GraphRNN 96.71/100.00/27.30/98.57; GCN 99.81/99.02/32.84/99.40; DeepGMG 47.74/98.86/57.27/98.49; DIGGED (GNN) 100/100/97.6/100; DIGGED (TOKEN) 100/100/98.67/26.67. |
| Researcher Affiliation | Collaboration | 1. MIT CSAIL; 2. MIT; 3. University of Notre Dame; 4. MIT-IBM Watson AI Lab, IBM Research. Correspondence to: Michael Sun <EMAIL>. |
| Pseudocode | Yes | Further details and pseudocode are in App. B. In Algo. 4, we give the pseudocode of the disambiguation algorithm. Algorithm 1: function grammar induction(dataset) |
| Open Source Code | Yes | Code is available at https://github.com/shiningsunnyday/induction. |
| Open Datasets | Yes | 1. Neural Architectures (ENAS). The ENAS dataset contains 19,020 neural architectures from the ENAS software and their weight-sharing accuracy (WS-Acc) on CIFAR-10 (Pham et al., 2018). ... 2. Bayesian Networks (BN). The BN dataset contains 200,000 random, 8-node Bayesian networks from the R package bnlearn (Scutari, 2009) and their Bayesian Information Criterion (BIC) score for fitting the Asia dataset (Lauritzen & Spiegelhalter, 1988). ... 3. Analog Circuits (CKT). The CKT dataset contains 10,000 operational amplifiers (op-amps) released by Dong et al. (2023). |
| Dataset Splits | Yes | 2. Predictive Performance. For property prediction, we train a Sparse Gaussian Process (SGP) regressor, following the same setup and hyperparameters as Zhang et al. (2019); Thost & Chen (2021); Dong et al. (2023). 3. Bayesian Optimization. We run batched Bayesian Optimization based on the SGP model for 10 rounds with 50 acquisition samples per round. We follow the same setup as Zhang et al. (2019) for ENAS and BN and Dong et al. (2023) for CKT |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as the Subdue library and networkx's cliques library (implicitly the Python library networkx) in Appendix B.1, but it does not provide version numbers for any of the software dependencies used in its methodology. |
| Experiment Setup | Yes | The optimal parameters for our model were determined using a hyperparameter scan sweeping over various properties of the VAE, using validation loss as the guide. During the scan, we explore varying architecture properties such as: number of encoder layers, number of decoder layers, latent dimension, embedding dimension, batch size, and KL divergence loss coefficient. |
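The generation metrics quoted in the Research Type row (validity, uniqueness, novelty percentages from Table 2) follow a standard recipe. A minimal sketch of that recipe, assuming graphs are represented as canonical strings and `is_valid` is a hypothetical validity checker (e.g., a DAG-validity test for neural architectures); the exact conventions in the paper may differ:

```python
def generation_metrics(samples, training_set, is_valid):
    """Return (validity %, uniqueness %, novelty %) over generated samples.

    validity:   fraction of all samples that pass the validity check
    uniqueness: fraction of valid samples that are distinct
    novelty:    fraction of distinct valid samples absent from training data
    """
    valid = [s for s in samples if is_valid(s)]
    validity = 100.0 * len(valid) / len(samples)
    unique = set(valid)
    uniqueness = 100.0 * len(unique) / len(valid) if valid else 0.0
    novel = [s for s in unique if s not in training_set]
    novelty = 100.0 * len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty
```

Note that uniqueness is computed over valid samples only and novelty over unique valid samples, which is why a method can reach 100% validity while uniqueness stays low (as with DIGGED (TOKEN) on neural architectures).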
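The Dataset Splits row describes batched Bayesian Optimization over an SGP surrogate: 10 rounds with 50 acquisition samples per round. A minimal sketch of that loop, with the SGP replaced by a stand-in `surrogate` callable returning a (mean, std) pair and a simple upper-confidence-bound acquisition; `score` stands for the black-box property (e.g., WS-Acc or BIC). All names here are illustrative, not the paper's implementation:

```python
def batched_bo(candidates, score, surrogate, rounds=10, batch_size=50):
    """Batched BO sketch: rank a candidate pool by UCB, evaluate the top batch,
    repeat for a fixed number of rounds, and return the best point found."""
    evaluated = {}
    pool = list(candidates)
    for _ in range(rounds):
        # Acquisition: upper confidence bound = predicted mean + predicted std.
        pool.sort(key=lambda x: -(surrogate(x)[0] + surrogate(x)[1]))
        batch, pool = pool[:batch_size], pool[batch_size:]
        for x in batch:
            evaluated[x] = score(x)  # query the expensive black-box property
        if not pool:
            break
    return max(evaluated, key=evaluated.get), evaluated
```

In the real setup the surrogate would be refit on `evaluated` after each round; that step is elided here to keep the control flow visible.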
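The Experiment Setup row describes a hyperparameter scan over VAE properties guided by validation loss. A minimal grid-search sketch over the listed properties; the grid values and the `train_and_validate` callable are hypothetical stand-ins for the actual training runs:

```python
import itertools

# Illustrative grid over the properties named in the paper; the actual
# ranges swept are not reported.
GRID = {
    "encoder_layers": [2, 3, 4],
    "decoder_layers": [2, 3, 4],
    "latent_dim": [32, 64, 128],
    "embedding_dim": [64, 128],
    "batch_size": [32, 64],
    "kl_coeff": [0.1, 0.5, 1.0],
}

def scan(train_and_validate, grid=GRID):
    """Return the configuration minimizing validation loss over the grid."""
    keys = sorted(grid)
    best_cfg, best_loss = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = train_and_validate(cfg)  # one full training + validation run
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

In practice a random or Bayesian sweep is often preferred over exhaustive product grids of this size, but the selection criterion (lowest validation loss) is the same.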