Learning the Language of Software Errors
Authors: Hana Chockler, Pascal Kesseli, Daniel Kroening, Ofer Strichman
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments that demonstrate the power of an abstract visual representation of errors and of program segments, accessible via the project's web page. In addition, our experiments in this paper demonstrate that such automata can be learned efficiently over real-world programs. Tab. 1 summarizes the average runtime decrease as well as the percentage of saved CBMC queries when using lazy learning. |
| Researcher Affiliation | Collaboration | Hana Chockler EMAIL Department of Informatics, King's College London; Pascal Kesseli EMAIL; Daniel Kroening EMAIL Department of Computer Science, University of Oxford, UK; Ofer Strichman EMAIL Information Systems Engineering, Technion, Haifa, Israel |
| Pseudocode | Yes | Figure 7: Pseudocode generated for a particular membership query. |
| Open Source Code | No | The paper states: 'We implemented lazy learning in pv, the error/program visualization tool (Chapman et al., 2015).' and mentions 'pv uses the bounded software model checker CBMC 5.4 (Clarke et al., 2004) for answering queries.' It also notes 'Source code for all examples we use is available online (http://www.cprover.org/learning-errors/, 2015).' However, this only refers to examples and not the source code for the 'pv' tool or the lazy learning methodology itself. |
| Open Datasets | Yes | They were drawn from three sources: the Competition on Software Verification (Beyer, 2015), the Software-artifact Infrastructure Repository2, and a Docking Approach program... Footnote 2: http://sir.unl.edu. The JQ3 project is a lightweight command-line JSON processor... a known and at the time of our first experiments open bug from the JQ bug tracker page4. Footnote 3: https://github.com/stedolan/jq Footnote 4: https://github.com/stedolan/jq/issues/817 |
| Dataset Splits | No | The paper focuses on software verification benchmarks and evaluates the tool's performance under different loop unwinding and word length settings. It does not involve splitting datasets into training, validation, or test sets in the traditional sense of machine learning models or data-driven experiments. |
| Hardware Specification | No | The paper discusses runtime performance in Section 8 and Appendix A, including specific time reductions (e.g., 'reduced the runtime of pv from over 59 minutes down to 28 minutes'). However, it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for conducting these experiments. |
| Software Dependencies | Yes | The implementation of the algorithm uses the automata library libalf (Bollig et al., 2010) as the learner. This library supports L*, its improvement RS (Rivest and Schapire, 1993), and several other variants; we experimented with the first two. pv uses the bounded software model checker CBMC 5.4 (Clarke et al., 2004) for answering queries. |
| Experiment Setup | Yes | We let b range between 1 and b_max, where b_max is relatively small (4 in our default configuration). As an initial value for b_wl (b_wl_min), we take a conservative estimation of the shortest word possible... We increase the value of b_wl up to a maximum of b_wl_max, which is user-defined. Each of these benchmarks is executed at different loop unwinding and word length settings, resulting in 83 data points. For this particular benchmark, the lazy learning algorithm described in Sec. 6 reduced the runtime of pv from over 59 minutes down to 28 minutes at user unwind limit 4 and maximum word length 12. |
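The central efficiency claim above is that lazy learning saves repeated CBMC membership queries. A minimal sketch of that idea, assuming a hypothetical `CachingTeacher` wrapper and a toy oracle standing in for the model checker (neither is from the paper's implementation): the learner's membership queries hit an expensive oracle only on cache misses, and repeats are answered for free.

```python
class CachingTeacher:
    """Hypothetical wrapper: answers membership queries from a cache,
    calling the expensive oracle (a stand-in for a CBMC query) only
    on cache misses. Not the paper's actual pv/libalf code."""

    def __init__(self, oracle):
        self.oracle = oracle
        self.cache = {}        # word -> bool
        self.oracle_calls = 0  # expensive queries actually issued
        self.saved = 0         # queries answered from the cache

    def member(self, word):
        if word in self.cache:
            self.saved += 1
            return self.cache[word]
        self.oracle_calls += 1
        answer = self.oracle(word)
        self.cache[word] = answer
        return answer


# Toy oracle: accept words over {'a', 'b'} with an even number of 'a's.
def toy_oracle(word):
    return word.count('a') % 2 == 0


teacher = CachingTeacher(toy_oracle)
queries = ['a', 'aa', 'ab', 'aa', 'a', 'b']
answers = [teacher.member(w) for w in queries]
print(answers)                              # [False, True, False, True, False, True]
print(teacher.oracle_calls, teacher.saved)  # 4 2
```

In the paper's setting the oracle call is a full bounded-model-checking run, so each cached answer removes a CBMC invocation; the percentages of saved queries reported in their Tab. 1 measure exactly this effect at scale.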