Situational-Constrained Sequential Resources Allocation via Reinforcement Learning
Authors: Libo Zhang, Yang Chen, Toru Takisaka, Kaiqi Zhao, Weidong Li, Jiamou Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SCRL across two scenarios: medical resource allocation during a pandemic and pesticide distribution in agriculture. Experiments demonstrate that SCRL outperforms existing baselines in satisfying constraints while maintaining high resource efficiency, showcasing its potential for real-world, context-sensitive decision-making tasks. |
| Researcher Affiliation | Academia | Libo Zhang¹,², Yang Chen³, Toru Takisaka¹, Kaiqi Zhao², Weidong Li² and Jiamou Liu² (¹School of Computer Science and Engineering, University of Electronic Science and Technology of China; ²The University of Auckland; ³The University of New South Wales). EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Algorithm Scheme with Punitive Term. 1: Input: an MDP M with a constraint Ψ. 2: Initialize: an initial policy π; a punitive term σ. 3: while not converged do 4: Generate trajectories Dπ ← {τ1, τ2, …} ∼ (η, π, P). 5: Evaluate violation degree Vioπ(Ψ) using Dπ. 6: Update punitive term function σ(Ψ, s). 7: for each transition (s, a, r, s′) ∈ τi, where τi ∈ Dπ, do 8: Apply punitive term on reward: r′ ← r − σ(Ψ, s). 9: end for. 10: Update policy π ← arg maxπ E_Dπ[Σt γᵗ r′(st, at)]. 11: end while. 12: Return π |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link to a repository or an explicit statement of code release) for the methodology described. |
| Open Datasets | Yes | We evaluate SCRL in two real-world-inspired scenarios: medical resource allocation during the COVID-19 pandemic in Beijing, China [Hao et al., 2021] and agricultural resource distribution in Saskatchewan, Canada [Qin et al., 2021]. |
| Dataset Splits | No | The paper mentions spatial divisions of the simulation environments (e.g., "city is divided into modules, grouped into five sub-regions", "divided into five sub-regions based on crop types", "50 x 50 grids"), but it does not specify any training, validation, or test dataset splits in the conventional machine learning sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper does not provide specific hyperparameter values (e.g., learning rates for DDPG, RCPO, CAL, DCRL, batch size, number of epochs) or detailed training configurations for the experimental setup. It mentions a 'learning rate β' for the penalty factor update but gives no concrete values for any of the models. |
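The punitive-term scheme extracted in the Pseudocode row can be sketched concretely. The following is a minimal illustrative sketch only: the toy 1-D chain MDP, the violation check, and the tabular one-step Q-learning update (standing in for the `arg max` policy-improvement step) are all assumptions made here for brevity, not components of the paper's SCRL implementation.

```python
import random

# Illustrative sketch of the punitive-term loop (Algorithm 1) on a toy
# 1-D chain MDP. Dynamics, constraint, and the Q-learning surrogate for
# the policy update are stand-ins, not the paper's actual components.

ACTIONS = (-1, 1)

def step(state, action):
    """Toy dynamics: move left/right on [-5, 5]; reward peaks at state 0."""
    nxt = max(-5, min(5, state + action))
    return nxt, 1.0 - abs(nxt) / 5.0

def violates(state):
    """Toy situational constraint Psi: states with |s| > 3 are forbidden."""
    return abs(state) > 3

def greedy(q):
    """Greedy policy induced by tabular action-value estimates q[(s, a)]."""
    return lambda s: max(ACTIONS, key=lambda a: q.get((s, a), 0.0))

def train(iterations=200, rollouts=5, horizon=20,
          sigma_lr=0.5, gamma=0.9, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}       # tabular action-value estimates
    sigma = {}   # state-dependent punitive term sigma(Psi, s)

    def behaviour(s):  # epsilon-greedy exploration around the current policy
        return rng.choice(ACTIONS) if rng.random() < eps else greedy(q)(s)

    for _ in range(iterations):
        # Line 4: generate trajectories D_pi under the current policy.
        trajs = []
        for _ in range(rollouts):
            s, traj = 0, []
            for _ in range(horizon):
                a = behaviour(s)
                s2, r = step(s, a)
                traj.append((s, a, r, s2))
                s = s2
            trajs.append(traj)
        # Lines 5-6: measure violations along D_pi and grow the punitive
        # term at the violating states that were visited.
        for traj in trajs:
            for (_, _, _, s2) in traj:
                if violates(s2):
                    sigma[s2] = sigma.get(s2, 0.0) + sigma_lr
        # Lines 7-10: punish rewards with sigma(Psi, s), then improve the
        # policy (here via a one-step Q-learning update as a surrogate).
        for traj in trajs:
            for (s, a, r, s2) in traj:
                r_pun = r - sigma.get(s, 0.0)
                target = r_pun + gamma * max(q.get((s2, b), 0.0) for b in ACTIONS)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + alpha * (target - old)
    return q, sigma
```

After training, the punished states should repel the greedy policy: from the boundary state -3, the learned policy moves back toward 0 rather than into the forbidden region.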