Simple Regret Optimization in Online Planning for Markov Decision Processes

Authors: Z. Feldman, C. Domshlak

JAIR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation shows that both BRUE and its generalization, BRUE(α), are also very effective in practice and compare favorably to the state-of-the-art. We have evaluated BRUE empirically on the MDP Sailing domain (Péret & Garcia, 2004), used in previous works for evaluating MCTS algorithms (Péret & Garcia, 2004; Kocsis & Szepesvári, 2006; Tolpin & Shimony, 2012), as well as on an MDP version of random game trees used in the original empirical evaluation of UCT (Kocsis & Szepesvári, 2006).
Researcher Affiliation | Academia | Zohar Feldman (EMAIL), Carmel Domshlak (EMAIL), Faculty of Industrial Engineering & Management, Technion Israel Institute of Technology, Haifa, Israel
Pseudocode | Yes | MCTS [input: ⟨S, A, Tr, R⟩; s0 ∈ S]: while time permits do ρ ← Rollout() // generate rollout; Update(ρ); return arg max_a Q̂(⟨s0, H⟩, a); procedure Rollout ... (Figure 1). procedure Update(ρ): for d = |ρ|, ..., 1 do ... (Figure 2). procedure MC-backup(⟨s, h⟩, a, r): nα ← ⌈α · n(⟨s, h⟩, a)⌉; n ← n(⟨s, h⟩, a); L(⟨s, h⟩, a)[n] ← r; Q̂(⟨s, h⟩, a) ← (1/nα) · Σ_{i=n−nα+1}^{n} L(⟨s, h⟩, a)[i] (Figure 4).
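The MC-backup rule quoted in that row, where the Q-estimate of a (state, depth, action) triple is the mean of only the freshest ⌈α·n⌉ observed rollout returns rather than of all n, can be sketched as follows. This is a minimal illustration, not the paper's (unreleased) implementation; the class and method names are ours.

```python
import math
from collections import defaultdict

class BrueEstimator:
    """Sketch of BRUE(alpha)-style MC-backup: average the last
    ceil(alpha * n) returns observed for (state, depth, action)."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.returns = defaultdict(list)  # (state, h, action) -> observed returns

    def mc_backup(self, state, h, action, r: float) -> float:
        key = (state, h, action)
        self.returns[key].append(r)               # L(<s,h>, a)[n] <- r
        n = len(self.returns[key])                # n <- n(<s,h>, a)
        n_alpha = max(1, math.ceil(self.alpha * n))
        window = self.returns[key][-n_alpha:]     # freshest n_alpha samples
        return sum(window) / n_alpha              # Q-hat(<s,h>, a)
```

With α = 1 this degenerates to the ordinary Monte-Carlo mean; smaller α discards stale samples, e.g. after observing returns 0, 0, 1, 1 with α = 0.5 the estimate is the mean of the last two samples, 1.0.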
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | We have evaluated BRUE empirically on the MDP Sailing domain (Péret & Garcia, 2004), used in previous works for evaluating MCTS algorithms (Péret & Garcia, 2004; Kocsis & Szepesvári, 2006; Tolpin & Shimony, 2012), as well as on an MDP version of random game trees used in the original empirical evaluation of UCT (Kocsis & Szepesvári, 2006).
Dataset Splits | No | For the Sailing domain: Each algorithm was run on 1000 randomly chosen initial states s0, with the target being fixed at one of the corners of the grid. For random game trees: Figure 6 shows the performance of the four algorithms for two game configurations, B = 6, D = 8 and B = 2, D = 22, with each configuration being represented by 1000 game trees. The paper describes using environments (Sailing domain, game trees) and generating instances/initial states for evaluation, but does not provide details on training/test/validation splits for a static dataset.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, memory, or computing infrastructure) used for running the experiments.
Software Dependencies | No | The paper mentions that "All four algorithms were implemented within a single software infrastructure" but does not specify any programming languages, libraries, or solvers with version numbers.
Experiment Setup | Yes | In the Sailing domain, we set H to 4n, where n is the grid-size of the problem instance... the exploration coefficient for UCT and uUCT (parameter c in Eq. 2) was set to the difference between the largest possible and the smallest possible values of the H-step rollouts from the root. In the Sailing domain, this corresponds to the maximal move duration, 6, multiplied by the number of steps-to-go h. ... the exploration coefficient for UCT and uUCT was set to the range of the game values, 127H, since rewards are bounded by the interval [−127H/2, 127H/2]. ... we used α = n_H / n_h, where n_H denotes the average number of samples of leaf nodes, and n_h denotes the average number of samples of nodes at the same depth as the value estimator under consideration.
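The parameter choices quoted in that row reduce to a few one-line formulas. The helpers below are a hedged sketch of those formulas only; the function names are ours, and the paper's own experimental code is not publicly available.

```python
def sailing_exploration_coeff(steps_to_go: int, max_move_duration: float = 6.0) -> float:
    """UCT coefficient c for the Sailing domain: the range of possible
    h-step rollout values, i.e. maximal move duration (6) times steps-to-go h."""
    return max_move_duration * steps_to_go

def game_tree_exploration_coeff(horizon: int) -> float:
    """UCT coefficient c for the random game trees: the range of game
    values, 127 * H, per the quoted reward bounds."""
    return 127.0 * horizon

def brue_alpha(avg_leaf_samples: float, avg_depth_samples: float) -> float:
    """alpha = n_H / n_h: ratio of the average sample count at leaf nodes
    to the average sample count at the estimator's depth."""
    return avg_leaf_samples / avg_depth_samples
```

For example, with 10 steps-to-go the Sailing coefficient is 60, and with n_H = 50 and n_h = 200 the BRUE(α) averaging fraction is 0.25.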