Situational-Constrained Sequential Resources Allocation via Reinforcement Learning
Authors: Libo Zhang, Yang Chen, Toru Takisaka, Kaiqi Zhao, Weidong Li, Jiamou Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SCRL across two scenarios: medical resource allocation during a pandemic and pesticide distribution in agriculture. Experiments demonstrate that SCRL outperforms existing baselines in satisfying constraints while maintaining high resource efficiency, showcasing its potential for real-world, context-sensitive decision-making tasks. |
| Researcher Affiliation | Academia | Libo Zhang¹,², Yang Chen³, Toru Takisaka¹, Kaiqi Zhao², Weidong Li² and Jiamou Liu² (¹School of Computer Science and Engineering, University of Electronic Science and Technology of China; ²The University of Auckland; ³The University of New South Wales). EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Algorithm Scheme with Punitive Term. 1: Input: an MDP M with a constraint Ψ. 2: Initialize: an initial policy π; a punitive term σ. 3: while not converged do 4: Generate trajectories Dπ ← {τ1, τ2, …} ∼ (η, π, P). 5: Evaluate violation degree Vioπ(Ψ) using Dπ. 6: Update punitive term function σ(Ψ, s). 7: for each transition (s, a, r, s′) ∈ τi, where τi ∈ Dπ, do 8: Apply punitive term on reward: r′ ← r − σ(Ψ, s). 9: end for. 10: Update policy π ← arg maxπ E_Dπ[Σt γᵗ r′(st, at)]. 11: end while. 12: Return π |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link to a repository or an explicit statement of code release) for the methodology described. |
| Open Datasets | Yes | We evaluate SCRL in two real-world-inspired scenarios: medical resource allocation during the COVID-19 pandemic in Beijing, China [Hao et al., 2021] and agricultural resource distribution in Saskatchewan, Canada [Qin et al., 2021]. |
| Dataset Splits | No | The paper mentions spatial divisions of the simulation environments (e.g., "city is divided into modules, grouped into five sub-regions", "divided into five sub-regions based on crop types", "50 x 50 grids"), but it does not specify any training, validation, or test dataset splits in the conventional machine learning sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper does not provide specific hyperparameter values (e.g., learning rates for DDPG, RCPO, CAL, DCRL, batch size, number of epochs) or detailed training configurations for the experimental setup. It mentions a 'learning rate β' for the penalty factor update but gives no concrete values for any of the models. |
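The punitive-term scheme extracted in the Pseudocode row can be sketched concretely. The following is a minimal illustrative sketch only: the toy 1-D chain MDP, the violation check, and the tabular one-step Q-learning update (standing in for the `arg max` policy-improvement step) are all assumptions made here for brevity, not components of the paper's SCRL implementation.

```python
import random

# Illustrative sketch of the punitive-term loop (Algorithm 1) on a toy
# 1-D chain MDP. Dynamics, constraint, and the Q-learning surrogate for
# the policy update are stand-ins, not the paper's actual components.

ACTIONS = (-1, 1)

def step(state, action):
    """Toy dynamics: move left/right on [-5, 5]; reward peaks at state 0."""
    nxt = max(-5, min(5, state + action))
    return nxt, 1.0 - abs(nxt) / 5.0

def violates(state):
    """Toy situational constraint Psi: states with |s| > 3 are forbidden."""
    return abs(state) > 3

def greedy(q):
    """Greedy policy induced by tabular action-value estimates q[(s, a)]."""
    return lambda s: max(ACTIONS, key=lambda a: q.get((s, a), 0.0))

def train(iterations=200, rollouts=5, horizon=20,
          sigma_lr=0.5, gamma=0.9, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}       # tabular action-value estimates
    sigma = {}   # state-dependent punitive term sigma(Psi, s)

    def behaviour(s):  # epsilon-greedy exploration around the current policy
        return rng.choice(ACTIONS) if rng.random() < eps else greedy(q)(s)

    for _ in range(iterations):
        # Line 4: generate trajectories D_pi under the current policy.
        trajs = []
        for _ in range(rollouts):
            s, traj = 0, []
            for _ in range(horizon):
                a = behaviour(s)
                s2, r = step(s, a)
                traj.append((s, a, r, s2))
                s = s2
            trajs.append(traj)
        # Lines 5-6: measure violations along D_pi and grow the punitive
        # term at the violating states that were visited.
        for traj in trajs:
            for (_, _, _, s2) in traj:
                if violates(s2):
                    sigma[s2] = sigma.get(s2, 0.0) + sigma_lr
        # Lines 7-10: punish rewards with sigma(Psi, s), then improve the
        # policy (here via a one-step Q-learning update as a surrogate).
        for traj in trajs:
            for (s, a, r, s2) in traj:
                r_pun = r - sigma.get(s, 0.0)
                target = r_pun + gamma * max(q.get((s2, b), 0.0) for b in ACTIONS)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + alpha * (target - old)
    return q, sigma
```

After training, the punished states should repel the greedy policy: from the boundary state -3, the learned policy moves back toward 0 rather than into the forbidden region.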