Avoiding Undesired Future with Sequential Decisions

Authors: Lue Tao, Tian-Zuo Wang, Yuan Jiang, Zhi-Hua Zhou

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, experimental results confirm the practical effectiveness of the proposed approach in both simulated and real-world tasks.
Researcher Affiliation Academia Lue Tao, Tian-Zuo Wang, Yuan Jiang and Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China.
Pseudocode Yes Algorithm 1: Multi-Stage Rehearsal
Input: Number of stages M
Output: Sequence of alterations A
1: Initialize the sequence of alterations A = [ ].
2: for m ← 1 to M do
3:     Acquire rehearsal model M⟨G, f, p(ϵ)⟩.
4:     Make a new observation o on O_m.
5:     Obtain the updated noise p_o(ϵ) by incorporating o into p(ϵ) through retrospective inference.
6:     Update rehearsal model M⟨G, f, p_o(ϵ)⟩.
7:     Select an alteration Rh(A = a) from A_m by minimizing the probability of failure.
8:     Obtain the altered graph G_A from G by removing the incoming arrows of A in G.
9:     Obtain the altered equations f_a from f by setting the equation of A to A = a.
10:    Update rehearsal model M⟨G_A, f_a, p_o(ϵ)⟩.
11:    Append the selected alteration Rh(A = a) to the sequence of alterations A.
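The stage loop of Algorithm 1 can be sketched in executable form. This is a minimal toy, not the authors' implementation: it assumes a scalar rehearsal model with Gaussian noise, the helper names `retrospective_inference` and `failure_probability` are hypothetical, the desired range [0.5, 2] is borrowed from the Bermuda task, and the graph-surgery steps (lines 8-10) are elided.

```python
import random

def retrospective_inference(noise, observation):
    """Toy stand-in for retrospective inference (lines 4-5):
    shift the noise mean halfway toward the new observation."""
    mean, scale = noise
    return ((mean + observation) / 2.0, scale)

def failure_probability(equations, noise, alteration, n=200):
    """Monte Carlo estimate of failure under the rehearsal model:
    failure = the outcome leaving an assumed desired range [0.5, 2]."""
    mean, scale = noise
    outcomes = [equations(alteration, mean + scale * random.gauss(0.0, 1.0))
                for _ in range(n)]
    return sum(not 0.5 <= y <= 2.0 for y in outcomes) / n

def multi_stage_rehearsal(num_stages, candidate_sets, observations,
                          equations, noise):
    """Sketch of Algorithm 1: at each stage, update the noise with the
    new observation, then pick the candidate alteration with the lowest
    estimated probability of failure."""
    alterations = []
    for m in range(num_stages):
        noise = retrospective_inference(noise, observations[m])
        best = min(candidate_sets[m],
                   key=lambda a: failure_probability(equations, noise, a))
        alterations.append(best)
    return alterations

random.seed(0)
plan = multi_stage_rehearsal(
    num_stages=2,
    candidate_sets=[[0.0, 1.0], [0.0, 1.0]],
    observations=[0.2, 0.3],
    equations=lambda a, eps: a + eps,  # toy structural equation Y = A + ϵ
    noise=(0.0, 0.5),
)
```

With the toy equation Y = A + ϵ and the range [0.5, 2], the alteration A = 1.0 dominates A = 0.0 at both stages, so the sketch returns `[1.0, 1.0]`.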
Open Source Code No The paper does not contain any explicit statement about providing source code or a link to a code repository.
Open Datasets Yes For the Bermuda data [Aglietti et al., 2020], which includes eleven variables, the goal is to maintain the net coral ecosystem calcification (NEC) within the desired range of [0.5, 2].
Dataset Splits No The paper describes a learning process over 'seasons' and a simulated task, but does not provide specific train/test/validation dataset splits for any explicitly used dataset.
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper mentions that the SRM is learned through Bayesian ridge regression and compares with DDPG, PPO, and SAC, but does not provide specific version numbers for any software dependencies.
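Since the report notes only that the SRM is learned via Bayesian ridge regression (with no library or version specified), the underlying computation can be illustrated with the closed-form posterior for a single-weight linear model. This is a hedged sketch under simplifying assumptions; the paper presumably uses a full multivariate implementation such as scikit-learn's `BayesianRidge`.

```python
def bayesian_ridge_1d(xs, ys, alpha=1.0, noise_var=1.0):
    """Closed-form posterior for y = w*x + eps with prior w ~ N(0, 1/alpha)
    and Gaussian noise eps ~ N(0, noise_var). Returns the posterior mean
    and variance of the weight w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    precision = alpha + sxx / noise_var   # posterior precision of w
    mean = (sxy / noise_var) / precision  # posterior mean (ridge estimate)
    return mean, 1.0 / precision

mean, var = bayesian_ridge_1d([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

On the noiseless data above (y = 2x), the posterior mean is 28/15 ≈ 1.87, shrunk slightly below 2 by the Gaussian prior; a weaker prior (smaller `alpha`) would pull it closer to 2.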
Experiment Setup No The paper mentions desired ranges for outcome variables, that an SRM is learned through Bayesian ridge regression over 100 seasons, and that experiments are repeated 100 times. It also states 'More detailed experimental settings are provided in the appendix.', implying that specific hyperparameters or full system-level settings are not present in the main text.