KIPPO: Koopman-Inspired Proximal Policy Optimization

Authors: Andrei Cozma, Landon Harris, Hairong Qi

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate consistent improvements over the PPO baseline, with 6–60% increased performance while reducing variability by up to 91% when evaluated on various continuous control tasks.
Researcher Affiliation | Academia | Andrei Cozma, Landon Harris and Hairong Qi, University of Tennessee, Knoxville. EMAIL, EMAIL
Pseudocode | No | "We refer readers to the supplementary materials for complete implementation details and pseudocode."
Open Source Code | Yes | An extended version with comprehensive appendices containing ablation studies, hyperparameter analyses, pseudocode, and implementation details is available at: https://andreicozma.com/KIPPO.
Open Datasets | Yes | "We evaluate six continuous control environments from Gymnasium [Towers et al., 2023] using MuJoCo [Todorov et al., 2012] and Box2D [Catto, 2007]."
Dataset Splits | No | The paper does not describe traditional training/validation/test dataset splits for static datasets, as it uses reinforcement learning environments where data is generated dynamically through interaction. It mentions mini-batches for optimization: "The algorithm divides 2,048 steps into 32 mini-batches."
Hardware Specification | No | Hardware specifications and reference runtimes are provided only in the supplementary material, not in the main paper.
Software Dependencies | No | The paper mentions using "PPO and RPO implementations from the CleanRL library [Huang et al., 2022]" but does not specify version numbers for CleanRL or other software components.
Experiment Setup | Yes | "Each rollout phase collects 2,048 environment steps across multiple trajectories... The algorithm divides 2,048 steps into 32 mini-batches... The optimization process runs for 10 epochs... Each training run consists of exactly 1 million environment steps."
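The Experiment Setup row fully determines the PPO-style training schedule. As a quick sanity check, the derived quantities below follow arithmetically from the reported numbers; the variable names are illustrative, not taken from the paper's code, and assume standard CleanRL-style semantics (each rollout split into mini-batches, re-shuffled every epoch):

```python
# Schedule constants reported in the paper's Experiment Setup.
TOTAL_ENV_STEPS = 1_000_000  # each training run: exactly 1 million env steps
ROLLOUT_STEPS = 2_048        # environment steps collected per rollout phase
NUM_MINIBATCHES = 32         # each rollout is divided into 32 mini-batches
UPDATE_EPOCHS = 10           # optimization epochs per rollout phase

# Quantities implied by the above (hypothetical derivation, not quoted figures).
minibatch_size = ROLLOUT_STEPS // NUM_MINIBATCHES      # samples per mini-batch
num_rollouts = TOTAL_ENV_STEPS // ROLLOUT_STEPS        # rollout phases per run
gradient_steps = num_rollouts * UPDATE_EPOCHS * NUM_MINIBATCHES

print(minibatch_size)   # 64
print(num_rollouts)     # 488
print(gradient_steps)   # 156160
```

Note that 1,000,000 is not an exact multiple of 2,048 (488 rollouts cover 999,424 steps), so the final partial rollout's handling would depend on implementation details not stated in the excerpt.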