Learning Probabilistic Temporal Logic Specifications for Stochastic Systems

Authors: Rajarshi Roy, Yash Pote, Dave Parker, Marta Kwiatkowska

IJCAI 2025

Reproducibility — Variable | Result | LLM Response
Research Type: Experimental — "We evaluate PriTL through two case studies. In the first case study, we consider learning PLTL formulas to distinguish between desirable and undesirable strategies obtained using RL for a variety of temporal tasks. In the second case study, we consider distinguishing between different variants of a probabilistic protocol. In both case studies, PriTL effectively infers concise and descriptive PLTL formulas that explain the probabilistic temporal behaviour of the systems."
Researcher Affiliation: Academia — ¹University of Oxford, Oxford OX1 2JD, UK; ²National University of Singapore
Pseudocode: Yes — Algorithm 1 (inductive step in GBE). Input: F_n, max depth D.
1: for d = 0 to D do
2:   F^d_{n+1} = ∅
3:   for φ ∈ F^{d−1}_n do
4:     construct ψ = ◦φ for ◦ ∈ {X, F, G}
5:     add ψ to F^d_{n+1} if temporal simplification does not hold
6:   end for
7:   for k = 1 to n−1 do
8:     for φ ∈ F^{d−1}_k and φ′ ∈ ⋃_{d′<d} F^{d′}_{n−k} do
9:       ψ = φ U φ′
10:      add ψ to F^d_{n+1} if Boolean simplification does not hold
11:    end for
12:    for φ ∈ F^d_k and φ′ ∈ ⋃_{d′≤d} F^{d′}_{n−k} do
13:      ψ = φ ◦ φ′ for ◦ ∈ {∧, ∨}
14:      add ψ to F^d_{n+1} if Boolean simplification does not hold
15:    end for
16:  end for
17: end for
18: return F_{n+1}
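The inductive enumeration step can be sketched in Python as below. This is a minimal illustration, not the PriTL implementation: formulas are plain strings, `F[size][depth]` holds the formulas enumerated so far, and the two simplification checks (done via the SPOT library in the actual tool) are stubbed out as placeholders.

```python
from itertools import product

UNARY = ["X", "F", "G"]  # next, eventually, globally


def simplifies_temporally(psi):
    # Placeholder for the temporal-simplification check (SPOT in PriTL).
    return False


def simplifies_boolean(psi):
    # Placeholder for the Boolean-simplification check.
    return False


def inductive_step(F, n, max_depth):
    """One inductive step of grammar-based enumeration (GBE):
    from F[size][depth] (sets of formula strings of size <= n),
    build the size-(n+1) formulas, stratified by temporal depth d."""
    new = {d: set() for d in range(max_depth + 1)}
    for d in range(max_depth + 1):
        # Unary temporal operators raise the temporal depth by one.
        for phi in (F[n].get(d - 1, set()) if d >= 1 else set()):
            for op in UNARY:
                psi = f"{op}({phi})"
                if not simplifies_temporally(psi):
                    new[d].add(psi)
        for k in range(1, n):
            # 'Until' raises depth; its operands are strictly shallower.
            left = F[k].get(d - 1, set()) if d >= 1 else set()
            right = set().union(*(F[n - k].get(dp, set()) for dp in range(d)))
            for phi, phi2 in product(left, right):
                psi = f"({phi}) U ({phi2})"
                if not simplifies_boolean(psi):
                    new[d].add(psi)
            # Boolean connectives preserve the temporal depth.
            left = F[k].get(d, set())
            right = set().union(*(F[n - k].get(dp, set()) for dp in range(d + 1)))
            for phi, phi2 in product(left, right):
                for op in ("&", "|"):
                    psi = f"({phi}) {op} ({phi2})"
                    if not simplifies_boolean(psi):
                        new[d].add(psi)
    F[n + 1] = new
    return new
```

Starting from the atomic propositions at size 1 (e.g. `F = {1: {0: {"p", "q"}}}`), one call with `n = 1` yields the six size-2 formulas `X(p), F(p), G(p), X(q), F(q), G(q)` at depth 1.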
Open Source Code: Yes — "To this end, we developed a prototype tool, PriTL³, implemented in Python3, which integrates the three procedures, GBE, PTS and BSC, of the learning algorithm." ³https://github.com/rajarshi008/PriTL
Open Datasets: Yes — "As the underlying MDP, we select the widely used OpenAI Gym frozen lake environment [Brockman et al., 2016], employing the same layout as in [Shao and Kwiatkowska, 2023]."
Dataset Splits: Yes — "For each correct and incorrect task, we generate 10 positive and 10 negative optimal strategy DTMCs, respectively... Overall, for each LTL task, we collected at least 30 positive DTMCs and 30 negative DTMCs corresponding to optimal and suboptimal strategies, respectively. We evaluated our algorithm using varying sample sizes |S|, ranging from 10 to 60 DTMCs per sample, with an equal split between positive and negative examples."
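The sampling protocol above (balanced samples of sizes 10 to 60 drawn from pools of at least 30 positive and 30 negative DTMCs) can be sketched as follows. The helper and the string labels standing in for strategy DTMCs are hypothetical illustrations, not the paper's code.

```python
import random


def balanced_samples(pos, neg, sizes, seed=0):
    """For each requested sample size |S|, draw a sample with an equal
    split between positive and negative examples (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for size in sizes:
        half = size // 2
        out.append((rng.sample(pos, half), rng.sample(neg, half)))
    return out


# Pools of at least 30 positive and 30 negative examples per task.
pos_pool = [f"optimal_dtmc_{i}" for i in range(30)]
neg_pool = [f"suboptimal_dtmc_{i}" for i in range(30)]
samples = balanced_samples(pos_pool, neg_pool, sizes=range(10, 61, 10))
```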
Hardware Specification: Yes — "We conducted all experiments on a MacBook Pro M3 (macOS 14.6.1) with 18 GB RAM."
Software Dependencies: No — The paper states the tool is "implemented in Python3" and that the authors "rely on LTL simplification and satisfaction features from the SPOT library" and "rely on the PRISM tool". While Python3 is named, no specific version numbers for the SPOT library or the PRISM tool are given in the main text.
Experiment Setup: Yes — "For all experiments, we set the maximum depth D = 2, tolerance δ = 0.05, and a Boolean combination limit L = 10."
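The reported hyperparameters can be collected into a single configuration; the dict layout and key names below are a hypothetical illustration, not PriTL's actual configuration format.

```python
# Values as reported in the paper; layout and names are hypothetical.
PRITL_CONFIG = {
    "max_depth": 2,     # D: maximum temporal depth of enumerated formulas
    "tolerance": 0.05,  # delta: tolerance for probability thresholds
    "bool_limit": 10,   # L: limit on Boolean combinations (BSC)
}
```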