reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Data Center Cooling System Optimization Using Offline Reinforcement Learning

Authors: Xianyuan Zhan, Xiangyu Zhu, Peng Cheng, Xiao Hu, Ziteng He, Hanfei Geng, Jichao Leng, Huiwen Zheng, Chenhui Liu, Tianshun Hong, Yan Liang, Yunxin Liu, Feng Zhao

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we present a novel physics-informed offline reinforcement learning (RL) framework for energy efficiency optimization of DC cooling systems. ... Our framework has been successfully deployed and verified in a large-scale production DC for closed-loop control of its air-cooling units (ACUs). We conducted a total of 2000 hours of short and long-term experiments in the production DC environment. The results show that our method achieves 14-21% energy savings in the DC cooling system, without any violation of the safety or operational constraints. We have also conducted a comprehensive evaluation of our approach in a real-world DC testbed environment.
Researcher Affiliation	Collaboration	1 Institute for AI Industry Research, Tsinghua University; 2 Shanghai Artificial Intelligence Laboratory; 3 Global Data Solutions Co., Ltd.
Pseudocode	Yes	Algorithm pseudocode. The pseudocode of our proposed physics-informed offline RL framework can be found in Algorithm 1.
Open Source Code	No	The paper mentions developing a 'full-function software system' and 'deployment-friendly software system' to facilitate validation and deployment, but it does not provide any explicit statement or link indicating that the source code for the methodology is open-source or publicly available.
Open Datasets	No	We collected about 20 months historical operational data from the logging system... Similarly, for Server Room B, we collected historical data over 15 months... We collected the historical operational data over 61 days...
Dataset Splits	No	The paper mentions collecting historical operational data and using it to 'train and validate our model on real-world data'. However, it does not specify any particular training, validation, or testing splits (e.g., percentages, sample counts, or specific methodologies for partitioning the data).
Hardware Specification	No	The paper describes the 'real-world small-scale DC testbed environment, which contains 22 servers and an inter-column air conditioner as the ACU'. It also mentions 'a Kubernetes (k8s) cluster architecture' and 'compressor-based ACU'. For the commercial data center, it refers to 'server rooms' and 'ACUs' but does not specify details like CPU or GPU models, memory, or specific ACU models used for computational tasks.
Software Dependencies	No	The software framework for the testbed 'employs a Kubernetes (k8s) cluster architecture and is implemented under the CentOS Stream 9 operating system'. It also mentions 'data collection and database management system using InfluxDB and Telegraf'. While CentOS Stream 9 is a specific version, Kubernetes (k8s), InfluxDB, and Telegraf are mentioned without specific version numbers for the applications themselves.
Experiment Setup	Yes	Table 4: Hyperparameter details. This table lists specific values for Optimizer type (Adam), Learning rate (3e-4), Weight decay (1e-5), Channel number (6), GNN hidden layers (2), TTDM GNN hidden units (256), Forward / reverse model hidden layers (2), Forward / reverse model hidden units (128), Fusion layers (2), Fusion layer units (128), Weight of ℓ_T-sym and ℓ_rec (1), Weight of ℓ_rvs and ℓ_fwd (0.1), α (Tuned in the range of [2.5, 10]), Discount factor γ (0.99), Target update rate (0.005), Policy noise (0.2), Critic neural network layer width (512), Actor neural network layer width (512), Actor learning rate (3e-4), Critic learning rate (3e-4), Policy noise clipping (0.5), Policy update frequency (2), Number of iterations (5e5).
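
For anyone attempting a reimplementation, the Table 4 values quoted in the Experiment Setup row could be collected into a single configuration object. The sketch below is only an illustration: the class name `Table4Hyperparameters`, the field names, the grouping comments, and the helper `check_alpha` are hypothetical and do not come from the paper or from any released code. Only the numeric values are taken from the row above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Table4Hyperparameters:
    """Hypothetical container for the values listed in Table 4.

    Field names and grouping are assumptions made for this sketch.
    """
    # Shared optimizer settings
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    weight_decay: float = 1e-5
    # Model architecture values as listed
    channel_number: int = 6
    gnn_hidden_layers: int = 2
    ttdm_gnn_hidden_units: int = 256
    fwd_rvs_hidden_layers: int = 2
    fwd_rvs_hidden_units: int = 128
    fusion_layers: int = 2
    fusion_layer_units: int = 128
    # Loss weights
    weight_tsym_rec: float = 1.0   # weight of l_T-sym and l_rec
    weight_rvs_fwd: float = 0.1    # weight of l_rvs and l_fwd
    # Policy learning values as listed
    alpha_range: tuple = (2.5, 10.0)  # alpha is tuned within this range
    discount: float = 0.99
    target_update_rate: float = 0.005
    policy_noise: float = 0.2
    policy_noise_clip: float = 0.5
    policy_update_frequency: int = 2
    critic_width: int = 512
    actor_width: int = 512
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    num_iterations: int = 500_000  # 5e5


def check_alpha(cfg: Table4Hyperparameters, alpha: float) -> float:
    """Return alpha if it lies in the reported tuning range, else raise."""
    low, high = cfg.alpha_range
    if not low <= alpha <= high:
        raise ValueError(f"alpha={alpha} is outside the reported range [{low}, {high}]")
    return alpha


cfg = Table4Hyperparameters()
settings = asdict(cfg)  # plain dict, convenient for logging alongside results
```

Because the table reports α only as a tuning range, the sketch stores the range rather than a single value. Any concrete α passed to `check_alpha` is the user's own choice.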