reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multiple-policy Evaluation via Density Estimation

Authors: Yilei Chen, Aldo Pacchiano, Ioannis Paschalidis

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose an algorithm named CAESAR for this problem. Our approach is based on computing an approximately optimal sampling distribution and using the data sampled from it to perform the simultaneous estimation of the policy values. Up to low order and logarithmic terms CAESAR achieves a sample complexity.
Researcher Affiliation	Academia	(1) Boston University, Boston, USA; (2) Broad Institute of MIT and Harvard, Cambridge, USA. Correspondence to: Yilei Chen <EMAIL>.
Pseudocode	Yes	Algorithm 1 Importance Density Estimation (IDES). Input: horizon $H$, accuracy $\epsilon$, target policy $\pi$, coarse estimators $\{\hat d^\pi_h\}_{h=1}^H$, $\{\hat\mu_h\}_{h=1}^H$, and dataset $\mu$. Define feasible sets $\{D_h\}_{h=1}^H$ where $D_h(s,a) = [0, 2\hat d^\pi_h(s,a)]$. Initialize $w_h^0 = 0$ for $h = 1, \dots, H$, and set $\mu_0(s_0,a_0) = 1$, $P_0(s \mid s_0,a_0) = \nu(s)$, $\hat w_0 = \hat\mu_0 = 1$. For $h = 1$ to $H$ do: set the iteration number of optimization, $n_h = [\ldots] \sum_{s,a} \frac{(\hat d^\pi_h(s,a))^2}{\hat\mu_h(s,a)} + \frac{(\hat d^\pi_{h-1}(s,a))^2}{[\ldots]}$, where [...] is a known constant. For $i = 1$ to $n_h$ do: sample $\{s_h^i, a_h^i\}$ from $\mu_h$ and $\{s_{h-1}^i, a_{h-1}^i, s_h^i\}$ from $\mu_{h-1}$. Calculate the gradient $g(w_h^{i-1})$, $g(w_h^{i-1})(s,a) = \frac{w_h^{i-1}(s,a)}{\hat\mu_h(s,a)} \mathbb{I}(s_h^i = s, a_h^i = a) - \frac{\hat w_{h-1}(s_{h-1}^i, a_{h-1}^i)}{\hat\mu_{h-1}(s_{h-1}^i, a_{h-1}^i)} \pi(a \mid s) \mathbb{I}(s_h^i = s)$. Update $w_h^i = \mathrm{Proj}_{w \in D_h}\{w_h^{i-1} - \eta_h^i g(w_h^{i-1})\}$. End for. Output the estimator $\hat w_h = \frac{1}{\sum_{i=1}^{n_h} i} \sum_{i=1}^{n_h} i\, w_h^i$. End for.
Open Source Code	No	The paper does not provide any explicit statements about releasing code or links to source code repositories.
Open Datasets	No	The paper is theoretical and discusses a general 'offline dataset' or 'batch of data' within the problem formulation, but it does not specify or use any particular publicly available dataset for experiments.
Dataset Splits	No	The paper is theoretical and does not describe experiments with specific datasets; it therefore does not mention any training, test, or validation dataset splits.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and sample complexity analysis. It does not describe any experimental setup or the specific hardware used to run experiments.
Software Dependencies	No	The paper mentions theoretical concepts and algorithms such as 'stochastic gradient descent' and 'DualDICE', and references related works, but it does not specify any software libraries or packages with version numbers used for implementation.
Experiment Setup	No	The paper is theoretical and does not present experimental results; it therefore does not include details on experimental setup, hyperparameters, or training configurations.
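
The inner loop of the Pseudocode row (sample, compute the gradient, project, average) can be illustrated with a minimal Python sketch for a single horizon step on a toy tabular problem. This is not the authors' implementation. The function name `ides_step`, the toy MDP, the step size `2 / (i + 1)`, the iteration count, and the use of exact values in place of the coarse estimators are all assumptions made for illustration; the listing above leaves the step size $\eta_h^i$ unspecified and its formula for $n_h$ is incomplete. The two samples are drawn independently here, which is one reading of the sampling line.

```python
import numpy as np


def ides_step(rng, mu_h, mu_prev, P_prev, pi, w_prev, d_coarse, n_iters):
    """One horizon step of projected SGD with weighted averaging (sketch).

    mu_h, mu_prev : (S, A) sampling distributions at steps h and h-1,
                    also used as their own estimates (assumption)
    P_prev        : (S, A, S) transition kernel from step h-1 to step h
    pi            : (S, A) target policy pi(a | s)
    w_prev        : (S, A) estimate carried over from step h-1
    d_coarse      : (S, A) coarse estimate defining the box [0, 2 * d_coarse]
    """
    S, A = mu_h.shape
    # Pre-draw all samples: (s, a) ~ mu_h, (s', a') ~ mu_prev,
    # and s_next ~ P_prev(. | s', a') via inverse-CDF sampling.
    idx_h = rng.choice(S * A, size=n_iters, p=mu_h.ravel())
    idx_p = rng.choice(S * A, size=n_iters, p=mu_prev.ravel())
    cdf = np.cumsum(P_prev.reshape(S * A, S), axis=1)
    u = rng.random(n_iters)
    s_next_all = np.minimum((u[:, None] > cdf[idx_p]).sum(axis=1), S - 1)

    upper = 2.0 * d_coarse
    w = np.zeros((S, A))
    w_sum = np.zeros((S, A))
    for i in range(1, n_iters + 1):
        s, a = divmod(int(idx_h[i - 1]), A)
        sp, ap = divmod(int(idx_p[i - 1]), A)
        s_next = int(s_next_all[i - 1])
        # Stochastic gradient: first term is nonzero only at the sampled (s, a),
        # second term is nonzero only on the row of the sampled next state.
        g = np.zeros((S, A))
        g[s, a] += w[s, a] / mu_h[s, a]
        g[s_next, :] -= w_prev[sp, ap] / mu_prev[sp, ap] * pi[s_next, :]
        eta = 2.0 / (i + 1)  # assumed step size, not taken from the source
        # Projection onto the box is an elementwise clip.
        w = np.clip(w - eta * g, 0.0, upper)
        w_sum += i * w  # iterate i gets weight i in the final average
    return w_sum / (n_iters * (n_iters + 1) / 2)


# Toy problem with 2 states and 2 actions (all numbers are made up).
rng = np.random.default_rng(0)
pi = np.array([[0.7, 0.3], [0.4, 0.6]])
P_prev = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]])
d_prev = np.array([[0.35, 0.15], [0.20, 0.30]])  # used as the step h-1 estimate
mu = np.full((2, 2), 0.25)  # uniform sampling distribution at both steps

# Value at which the expected gradient vanishes when the estimates are exact.
d_true = np.einsum("sa,sat->t", d_prev, P_prev)[:, None] * pi

w_hat = ides_step(rng, mu, mu, P_prev, pi, d_prev, d_true, n_iters=40000)
print(np.round(w_hat, 3))
print(np.round(d_true, 3))
```

Under these assumptions the expected gradient is zero where `w` equals `d_true`, so the two printed arrays should be close. With inexact coarse estimators, the box constraint and the ratio `mu_h / mu_hat_h` would shift that point.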