reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Neural Task Synthesis for Visual Programming

Authors: Victor-Alexandru Pădurean, Georgios Tzannetos, Adish Singla

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of NeurTaskSyn through an extensive empirical evaluation and a qualitative study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code.org and the Intro to Programming with Karel course by CodeHS.com.
Researcher Affiliation	Academia	Victor-Alexandru Pădurean, Max Planck Institute for Software Systems; Georgios Tzannetos, Max Planck Institute for Software Systems; Adish Singla, Max Planck Institute for Software Systems
Pseudocode	Yes	Algorithm 1: Specification Dataset Collection Procedure
Open Source Code	Yes	We publicly release the implementation and datasets to facilitate future research. (Footnote 1) GitHub repository: https://github.com/machine-teaching-group/tmlr2024_neurtasksyn
Open Datasets	Yes	We publicly release the implementation and datasets to facilitate future research. (Footnote 1) GitHub repository: https://github.com/machine-teaching-group/tmlr2024_neurtasksyn
Dataset Splits	Yes	In our evaluation, we split D as follows: 80% for training the neural models (Dtrain), 10% for calibration (Dcal), and a fixed 10% for evaluation (Dtest).
Hardware Specification	Yes	All the experiments were conducted on a cluster of machines equipped with Intel Xeon Gold 6142 CPUs running at a frequency of 2.60GHz.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	Yes	Hyperparameters (HoCMaze / Karel): Max. Epochs 60 / 100; Batch Size 32 / 32; Learning Rate 5 × 10^-4 / 5 × 10^-4; Dict. Size 59 / 59; Max. Blocks 17 / 17. Figure 18: Illustration of training details for the code generator. (a) and (b) show the training curves with mean epoch loss and validation performance, based on metric M, for both the HoCMaze and Karel domains. (c) shows the hyperparameters employed for the code generator training.
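
The Dataset Splits row describes an 80% / 10% / 10% partition of D into Dtrain, Dcal, and a fixed Dtest. The following is a minimal sketch of one way such a split could be produced, not the authors' implementation. The function name `split_dataset`, the two seed parameters, and the use of a dedicated seed to keep the test set fixed are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    data: Sequence[T], test_seed: int = 0, run_seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: return (train, cal, test) as an 80/10/10 split.

    Assumption: "fixed" test set means the test items do not change
    between runs, so they are selected with their own seed.
    """
    n = len(data)
    n_test = n // 10
    n_cal = n // 10

    # Test indices depend only on test_seed, so they stay the same across runs.
    indices = list(range(n))
    random.Random(test_seed).shuffle(indices)
    test_idx, rest_idx = indices[:n_test], indices[n_test:]

    # The train/calibration assignment is allowed to vary with run_seed.
    random.Random(run_seed).shuffle(rest_idx)
    cal_idx, train_idx = rest_idx[:n_cal], rest_idx[n_cal:]

    train = [data[i] for i in train_idx]
    cal = [data[i] for i in cal_idx]
    test = [data[i] for i in test_idx]
    return train, cal, test
```

Calling `split_dataset(data, test_seed=0, run_seed=k)` for different values of `k` reshuffles only the training and calibration portions, which keeps evaluation numbers comparable across runs under the stated assumption.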