Data Center Cooling System Optimization Using Offline Reinforcement Learning
Authors: Xianyuan Zhan, Xiangyu Zhu, Peng Cheng, Xiao Hu, Ziteng He, Hanfei Geng, Jichao Leng, Huiwen Zheng, Chenhui Liu, Tianshun Hong, Yan Liang, Yunxin Liu, Feng Zhao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present a novel physics-informed offline reinforcement learning (RL) framework for energy efficiency optimization of DC cooling systems. ... Our framework has been successfully deployed and verified in a large-scale production DC for closed-loop control of its air-cooling units (ACUs). We conducted a total of 2000 hours of short and long-term experiments in the production DC environment. The results show that our method achieves 14-21% energy savings in the DC cooling system, without any violation of the safety or operational constraints. We have also conducted a comprehensive evaluation of our approach in a real-world DC testbed environment. |
| Researcher Affiliation | Collaboration | (1) Institute for AI Industry Research, Tsinghua University; (2) Shanghai Artificial Intelligence Laboratory; (3) Global Data Solutions Co., Ltd. |
| Pseudocode | Yes | Algorithm pseudocode. The pseudocode of our proposed physics-informed offline RL framework can be found in Algorithm 1. |
| Open Source Code | No | The paper mentions developing a 'full-function software system' and 'deployment-friendly software system' to facilitate validation and deployment, but it does not provide any explicit statement or link indicating that the source code for the methodology is open-source or publicly available. |
| Open Datasets | No | We collected about 20 months historical operational data from the logging system... Similarly, for Server Room B, we collected historical data over 15 months... We collected the historical operational data over 61 days... |
| Dataset Splits | No | The paper mentions collecting historical operational data and using it to 'train and validate our model on real-world data'. However, it does not specify any particular training, validation, or testing splits (e.g., percentages, sample counts, or specific methodologies for partitioning the data). |
| Hardware Specification | No | The paper describes the 'real-world small-scale DC testbed environment, which contains 22 servers and an inter-column air conditioner as the ACU'. It also mentions 'a Kubernetes (k8s) cluster architecture' and 'compressor-based ACU'. For the commercial data center, it refers to 'server rooms' and 'ACUs' but does not specify details like CPU or GPU models, memory, or specific ACU models used for computational tasks. |
| Software Dependencies | No | The software framework for the testbed 'employs a Kubernetes (k8s) cluster architecture and is implemented under the CentOS Stream 9 operating system'. It also mentions a 'data collection and database management system using InfluxDB and Telegraf'. While CentOS Stream 9 is a specific version, Kubernetes (k8s), InfluxDB, and Telegraf are mentioned without version numbers. |
| Experiment Setup | Yes | Table 4: Hyperparameter details. This table lists specific values for Optimizer type (Adam), Learning rate (3e-4), Weight decay (1e-5), Channel number (6), GNN hidden layers (2), TTDM GNN hidden units (256), Forward / reverse model hidden layers (2), Forward / reverse model hidden units (128), Fusion layers (2), Fusion layer units (128), Weight of ℓ_T-sym and ℓ_rec (1), Weight of ℓ_rvs and ℓ_fwd (0.1), α (tuned in the range [2.5, 10]), Discount factor γ (0.99), Target update rate (0.005), Policy noise (0.2), Policy noise clipping (0.5), Policy update frequency (2), Critic neural network layer width (512), Actor neural network layer width (512), Actor learning rate (3e-4), Critic learning rate (3e-4), Number of iterations (5e5). |
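
For anyone attempting a reimplementation, the Table 4 values quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is only an illustration: the class name, field names, and helper functions are hypothetical and do not come from the paper, whose code is not public. It also assumes that "target update rate" is a Polyak averaging coefficient and that "policy update frequency" means one actor update per that many iterations, as is conventional for TD3-style methods; the summary above does not confirm either reading. The default `alpha` is a placeholder inside the reported tuning range, not a value the paper reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table4Config:
    """Values reported in Table 4; all field names are hypothetical."""
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    weight_decay: float = 1e-5
    channel_number: int = 6
    gnn_hidden_layers: int = 2
    ttdm_gnn_hidden_units: int = 256
    fwd_rvs_hidden_layers: int = 2
    fwd_rvs_hidden_units: int = 128
    fusion_layers: int = 2
    fusion_layer_units: int = 128
    weight_tsym_rec: float = 1.0   # weight of the T-symmetry and reconstruction losses
    weight_rvs_fwd: float = 0.1    # weight of the reverse and forward model losses
    alpha: float = 2.5             # placeholder: the paper only says "tuned in [2.5, 10]"
    discount: float = 0.99
    target_update_rate: float = 0.005
    policy_noise: float = 0.2
    policy_noise_clip: float = 0.5
    policy_update_frequency: int = 2
    critic_width: int = 512
    actor_width: int = 512
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    num_iterations: int = 500_000  # 5e5

    def __post_init__(self):
        # Reject alpha values outside the tuning range reported in Table 4.
        if not 2.5 <= self.alpha <= 10.0:
            raise ValueError("alpha is outside the reported tuning range [2.5, 10]")


def soft_update(target, online, tau):
    """Polyak averaging (assumed): target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def num_policy_updates(cfg):
    """Actor updates, assuming one per `policy_update_frequency` iterations."""
    return cfg.num_iterations // cfg.policy_update_frequency


cfg = Table4Config()
print(num_policy_updates(cfg))
print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.target_update_rate))
```

Freezing the dataclass keeps a run's hyperparameters immutable once constructed, and the range check on `alpha` makes the one tuned value explicit instead of silently accepting an arbitrary setting.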