reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Synthesizing Libraries of Programs with Auxiliary Functions

Authors: Habibur Rahman, Thirupathi Reddy Emireddy, Kenneth Tjhia, Elham Parhizkar, Levi Lelis

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Aulile in string manipulation tasks. Aulile improved, in some cases by a large margin, the performance of several base algorithms that use different search and learning strategies: BUS, Bustle, Crossbeam, and Bee Search. Our results suggest that Aulile offers an effective method of injecting domain knowledge into existing systems through a library learning scheme that optimizes for an auxiliary function. We perform three sets of experiments. In the first set, we compare the base synthesizers with their augmented counterparts (Figures 3 and 4). We compare the algorithms in terms of the number of problems solved by the number of programs evaluated. In the second set, we compare the best performing systems of each experiment in the first set (Figure 5). Since the algorithms use different models, we use running time instead of number of evaluations. This is to account for the complexity of the models employed. For example, Crossbeam uses a more complex and thus slower model than Bustle. We used 14 million programs evaluated as the budget B of Aulile for A-BUS, A-Bustle, and A-Bee. In the third set, we perform ablation studies.
Researcher Affiliation	Academia	Habibur Rahman EMAIL Amii, Department of Computing Science, University of Alberta
Pseudocode	Yes	Algorithm 1 Aulile Require: Input-output pairs I, O, DSL G, auxiliary function a, base synthesizer Y , and a budget B Ensure: Solution program p or
Open Source Code	Yes	Our implementation is available online.1 1https://github.com/lelis-research/aulile
Open Datasets	Yes	We evaluate Aulile on string manipulation (Alur et al., 2013; Odena et al., 2021) tasks, in which one is given a set of input-output examples and needs to find a program that maps each input to its corresponding output. We use two datasets from Odena et al. (2021): one with 89 instances of the Sy Gu S competition and another with 38 handcrafted instances.
Dataset Splits	No	We evaluate Aulile on string manipulation (Alur et al., 2013; Odena et al., 2021) tasks, in which one is given a set of input-output examples and needs to find a program that maps each input to its corresponding output. We use two datasets from Odena et al. (2021): one with 89 instances of the Sy Gu S competition and another with 38 handcrafted instances. The paper does not specify training, validation, or test splits for these datasets within its experimental setup.
Hardware Specification	No	We used 1 CPU at 2.4 GHz and 64 GB of RAM in all experiments; for Bustle, A-Bustle, Bee Search, A-Bee, Crossbeam, and A-Crossbeam we also used 1 GPU, since they use a neural model. Specific CPU/GPU models are not mentioned.
Software Dependencies	No	We implemented the DSL for string manipulation, as well as BUS and Bustle. We use the implementations of Crossbeam and Bee Search provided by their authors. No specific version numbers for software or libraries are provided.
Experiment Setup	Yes	We used 14 million programs evaluated as the budget B of Aulile for A-BUS, A-Bustle, and A-Bee. In the second set of experiments, we used 5 million evaluations as B for A-Crossbeam. We plot the mean and standard deviation over five independent runs for Bustle, A-Bustle, Bee Search, A-Bee, Crossbeam, and A-Crossbeam because they depend on the random initialization of the weights of their neural models and also sampling in the case of Crossbeam and A-Crossbeam. BUS and A-BUS are deterministic, therefore we present the results of a single run. We use the Levenshtein distance (Levenshtein, 1966) as the auxiliary function. Figure 6 shows the results of A-BUS when it adds the best k programs to the language in each Aulile iteration, for k in {1, 2, 3}. We define different auxiliary functions that are based on the Levenshtein distance. This is achieved with a parameter 0.0 < l <= 1.0.