APIRL: Deep Reinforcement Learning for REST API Fuzzing
Authors: Myles Foley, Sergio Maffeis
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation of APIRL across 26 REST APIs shows significant improvement over state-of-the-art methods in terms of bugs found, coverage, and test case efficiency. We also study how reward functions, and other key design choices, affect learnt policies with a thorough ablation study. |
| Researcher Affiliation | Academia | Myles Foley (1,2), Sergio Maffeis (1) — (1) Department of Computing, Imperial College London; (2) The Alan Turing Institute. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the REST API testing process using APIRL in Figure 2 and lists mutation actions in Table 1, but it does not contain an explicit pseudocode block or a clearly labeled algorithm section detailing the overall method. |
| Open Source Code | Yes | We release APIRL at https://github.com/ICL-ml4csec/APIRL. |
| Open Datasets | No | The paper mentions training APIRL using an 'open-source REST API containing known bugs: Generic University' and pre-training a RoBERTa transformer using 'HTTP responses from 103 different REST APIs', with API specifications from 'a public Open API specification platform (see Appendix C)'. However, no concrete links, DOIs, or formal citations (with author and year) are provided in the main text for public access to these specific datasets. |
| Dataset Splits | No | The paper describes training strategies, such as training 'on each of Generic University's operations for 10,000 episodes' and testing with 'three' episodes per operation. However, it does not specify explicit percentages or sample counts for training, validation, or test dataset splits for the underlying data. |
| Hardware Specification | Yes | Experiments are run on Ubuntu Linux, with 16GB RAM and an Intel Core i7-8700K processor. |
| Software Dependencies | No | The paper mentions implementing the neural network in 'PyTorch' and using a 'RoBERTa (Liu et al. 2019) transformer model', but it does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | We implement the neural network in PyTorch, with an input layer of size 772, hidden layers of size 64, 96, 64, and an output layer of 23, corresponding to the actions in Table 1. We use a standard ε-greedy decay with ε = 1 (decaying by 0.999 after each episode). We select γ = 0.9, α = 0.005, and batch size 128. APIRL trains on each of Generic University's operations for 10,000 episodes, with a maximum of 10 steps per episode. |
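The hyperparameters quoted in the setup row imply a concrete exploration schedule: ε starts at 1 and is multiplied by 0.999 after each episode, so after 10,000 training episodes it decays to roughly 4.5e-5. A minimal sketch in plain Python (not the authors' PyTorch code; `final_epsilon` and `LAYER_SIZES` are illustrative names, not from the APIRL repository):

```python
def final_epsilon(episodes: int, start: float = 1.0, decay: float = 0.999) -> float:
    """Exploration rate after `episodes` multiplicative decay steps,
    matching the paper's stated schedule (start=1, decay=0.999/episode)."""
    return start * decay ** episodes


# Layer widths reported in the paper: input 772, hidden 64/96/64,
# output 23 (one Q-value per mutation action in Table 1).
LAYER_SIZES = [772, 64, 96, 64, 23]

if __name__ == "__main__":
    # After the full 10,000-episode run per operation, exploration is
    # essentially off (epsilon on the order of 1e-5).
    print(f"epsilon after 10,000 episodes: {final_epsilon(10_000):.3e}")
```

This confirms the schedule is effectively annealed to pure exploitation by the end of training on each Generic University operation.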