reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Discrete Object Generation with Reversible Inductive Construction

Authors: Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics. Quantitative evaluation indicates that the proposed method can effectively model highly structured discrete distributions while adhering to strict validity constraints.
Researcher Affiliation	Academia	Ari Seff (Princeton University, Princeton, NJ); Wenda Zhou (Columbia University, New York, NY); Farhan Damani (Princeton University, Princeton, NJ); Abigail Doyle (Princeton University, Princeton, NJ); Ryan P. Adams (Princeton University, Princeton, NJ)
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks for its own method.
Open Source Code	Yes	We formulate our approach, generative reversible inductive construction (GenRIC), as the equilibrium distribution of a Markov chain that only visits valid objects, without a need for inefficient rejection sampling. Footnote 1: https://github.com/PrincetonLIPS/reversible-inductive-construction
Open Datasets	Yes	For molecules we test the proposed approach on the ZINC dataset, which contains about 250K drug-like molecules from the ZINC database [35]. For Laman graphs, we generate synthetic graphs randomly via Algorithm 7 from Moussaoui [29], originally proposed for evaluating geometric constraint solvers embedded within CAD programs.
Dataset Splits	Yes	The model is trained on 220K molecules according to the same train/test split as in Jin et al. [19], Kusner et al. [21].
Hardware Specification	No	We acknowledge computing resources from Columbia University's Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. This statement describes funding and a facility but lacks specific hardware details (e.g., GPU/CPU models).
Software Dependencies	No	The paper mentions software like 'RDKit [24]' but does not provide specific version numbers for software dependencies used in their own experiments.
Experiment Setup	No	Unless otherwise stated, the results reported in Sections 3 and 4, use a geometric distribution with five expected steps for the corruption sequence length. For each method, we obtain 20K samples either by running pre-trained models [19, 14, 21], by accessing pre-sampled sets [26, 34, 25], or by training models from scratch [33]. While some details are given, comprehensive hyperparameter values (e.g., learning rate, batch size, specific optimizer settings) are not provided in the main text.
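
The corruption-length setting quoted in the Experiment Setup row can be illustrated with a short sketch. It draws sequence lengths from a geometric distribution whose mean is five steps. The function name `sample_corruption_length`, the choice of support {1, 2, ...} (rather than starting at 0), and the seed are assumptions made for illustration; none of them come from the paper or its repository.

```python
import random


def sample_corruption_length(expected_steps=5.0, rng=None):
    """Draw a length from a geometric distribution on {1, 2, ...}.

    Hypothetical helper: assumes the support starts at 1, so the
    per-step stopping probability is p = 1 / expected_steps.
    """
    rng = rng or random.Random()
    p = 1.0 / expected_steps  # mean of this geometric distribution is 1 / p
    steps = 1
    # Keep adding steps until a "stop" event occurs with probability p.
    while rng.random() >= p:
        steps += 1
    return steps


# Empirical check that the sample mean is close to five steps.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
lengths = [sample_corruption_length(5.0, rng) for _ in range(100_000)]
mean_length = sum(lengths) / len(lengths)
print(f"mean corruption length: {mean_length:.2f}")
```

If the paper's convention instead counts from zero steps, the stopping probability would need to be 1 / (expected_steps + 1) to keep the same mean; the quoted text does not say which convention is used.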