KIPPO: Koopman-Inspired Proximal Policy Optimization

Authors: Andrei Cozma, Landon Harris, Hairong Qi

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate consistent improvements over the PPO baseline, with 6–60% increased performance while reducing variability by up to 91% when evaluated on various continuous control tasks.
Researcher Affiliation | Academia | Andrei Cozma, Landon Harris and Hairong Qi, University of Tennessee, Knoxville. EMAIL, EMAIL
Pseudocode | No | "We refer readers to the supplementary materials for complete implementation details and pseudocode."
Open Source Code | Yes | An extended version with comprehensive appendices containing ablation studies, hyperparameter analyses, pseudocode, and implementation details is available at: https://andreicozma.com/KIPPO.
Open Datasets | Yes | "We evaluate six continuous control environments from Gymnasium [Towers et al., 2023] using MuJoCo [Todorov et al., 2012] and Box2D [Catto, 2007]."
Dataset Splits | No | The paper does not describe traditional training/validation/test dataset splits for static datasets, as it uses reinforcement learning environments where data is generated dynamically through interaction. It mentions mini-batches for optimization: "The algorithm divides 2,048 steps into 32 mini-batches."
Hardware Specification | No | Hardware specifications and reference runtimes are provided only in the supplementary material, not in the main paper.
Software Dependencies | No | The paper mentions using "PPO and RPO implementations from the CleanRL library [Huang et al., 2022]" but does not specify version numbers for CleanRL or other software components.
Experiment Setup | Yes | "Each rollout phase collects 2,048 environment steps across multiple trajectories... The algorithm divides 2,048 steps into 32 mini-batches... The optimization process runs for 10 epochs... Each training run consists of exactly 1 million environment steps."
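The Experiment Setup row fully determines the PPO-style training schedule. As a quick sanity check, the derived quantities below follow arithmetically from the reported numbers; the variable names are illustrative, not taken from the paper's code, and assume standard CleanRL-style semantics (each rollout split into mini-batches, re-shuffled every epoch):

```python
# Schedule constants reported in the paper's Experiment Setup.
TOTAL_ENV_STEPS = 1_000_000  # each training run: exactly 1 million env steps
ROLLOUT_STEPS = 2_048        # environment steps collected per rollout phase
NUM_MINIBATCHES = 32         # each rollout is divided into 32 mini-batches
UPDATE_EPOCHS = 10           # optimization epochs per rollout phase

# Quantities implied by the above (hypothetical derivation, not quoted figures).
minibatch_size = ROLLOUT_STEPS // NUM_MINIBATCHES      # samples per mini-batch
num_rollouts = TOTAL_ENV_STEPS // ROLLOUT_STEPS        # rollout phases per run
gradient_steps = num_rollouts * UPDATE_EPOCHS * NUM_MINIBATCHES

print(minibatch_size)   # 64
print(num_rollouts)     # 488
print(gradient_steps)   # 156160
```

Note that 1,000,000 is not an exact multiple of 2,048 (488 rollouts cover 999,424 steps), so the final partial rollout's handling would depend on implementation details not stated in the excerpt.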