AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning
Authors: Zhoufan Zhu, Ke Zhu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical applications to real-world datasets demonstrate that our AlphaQCM method significantly outperforms its competitors, particularly when dealing with large datasets comprising numerous stocks. ... We apply our AlphaQCM method to three real-world market datasets to assess its empirical performance, together with baseline methods such as the AlphaGen method and GP-based methods. Extensive experimental results demonstrate that the AlphaQCM method consistently achieves the best performance, with Information Coefficient (IC) values of 8.49%, 9.55%, and 9.16% across the three datasets. Its superior performance is particularly evident when the dataset originates from a complex financial system. Finally, we conduct several ablation studies to investigate the contribution of each component in the AlphaQCM method. |
| Researcher Affiliation | Academia | Zhoufan Zhu 1 2 Ke Zhu 3 1Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, China 2Department of Finance, Xiamen University, Xiamen, China 3Department of Statistics & Actuarial Science, School of Computing & Data Science, University of Hong Kong, Hong Kong. Correspondence to: Ke Zhu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Pseudo code for calculating rt. |
| Open Source Code | Yes | The source code and related source of this work are available at https://github.com/ZhuZhouFan/AlphaQCM for reproducibility. |
| Open Datasets | No | Our experiments are also conducted on Chinese A-share stock market datasets to capture the 20-day future stock returns. ... We consider three different stock pools: (1) the largest 300 stocks (CSI 300), (2) the largest 500 stocks (CSI 500), and (3) all stocks (Market) listed on the Shanghai and Shenzhen Stock Exchanges. ... we conduct additional experiments using the U.S. stock market dataset, specifically the largest 500 stocks (S&P 500). |
| Dataset Splits | Yes | Moreover, each dataset is split chronologically into a training set (2010/01/01 to 2019/12/31), a validation set (2020/01/01 to 2020/12/31), and a test set (2021/01/01 to 2022/12/31). |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., CPU, GPU models, memory, or cloud computing instances with detailed specifications) used for running the experiments. |
| Software Dependencies | No | Following Yu et al. (2023), we implement the MLP, XGBoost, and LightGBM methods using the open-source library Qlib (Yang et al., 2020), with pre-specified hyperparameters. ... Optimizer Adam. |
| Experiment Setup | Yes | Table E.6. Additional hyperparameters: min history to start learning 10,000; ϵ-greedy 0.01; memory size 100,000; learning rate 5e-5; optimizer Adam; online network update interval (replay period) 1; target network update interval 5,000; batch size 128; K (length of τ in the IQN algorithm) 64; κ (constant in the Huber loss) 1.0; η (constant in the prior probability) 0.5; λ (tuning parameter) 0.5, 1, 2; P (alpha pool size) 10, 20, 50, 100; total steps 250,000 (P = 10), 300,000 (P = 20), 350,000 (P = 50), 400,000 (P = 100). ... Specifically, in both the Q network and quantile network, the LSTM feature extractor ψ(·) has a 2-layer structure with a hidden dimension of 128 and a dropout rate of 0.1, and the fully connected heads have two hidden layers of 64 dimensions. Moreover, the τ-embedding network maps each quantile level into a 64-dimensional embedding, as defined in Dabney et al. (2018a). |
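The chronological split reported in the Dataset Splits row (train 2010-2019, validation 2020, test 2021-2022) can be sketched as a simple date-routing function. This is not from the authors' code; the function name and return labels are illustrative, only the date boundaries come from the paper.

```python
from datetime import date

def split_chronologically(d: date) -> str:
    """Assign a trading date to the paper's chronological splits.

    Boundaries from the paper:
      train: 2010/01/01 - 2019/12/31
      valid: 2020/01/01 - 2020/12/31
      test:  2021/01/01 - 2022/12/31
    """
    if date(2010, 1, 1) <= d <= date(2019, 12, 31):
        return "train"
    if d <= date(2020, 12, 31):
        return "valid"
    if d <= date(2022, 12, 31):
        return "test"
    return "unused"  # outside the sample period

print(split_chronologically(date(2015, 6, 1)))   # train
print(split_chronologically(date(2020, 3, 2)))   # valid
print(split_chronologically(date(2022, 11, 30))) # test
```

A strictly chronological split like this avoids look-ahead leakage, which matters when the target is a 20-day future return.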
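The Experiment Setup row says the τ-embedding network maps each of K = 64 sampled quantile levels into a 64-dimensional embedding "as defined in Dabney et al. (2018a)". In that reference the embedding is a cosine basis followed by a learned linear layer and ReLU. Below is a minimal NumPy sketch of that cosine embedding under those assumptions; the weights here are random stand-ins, not the trained parameters, and the function name is our own.

```python
import numpy as np

def tau_embedding(taus, W, b):
    """Cosine quantile embedding in the style of IQN (Dabney et al., 2018a).

    taus: (K,) sampled quantile levels in (0, 1)
    W:    (n_cos, embed_dim) linear weights
    b:    (embed_dim,) bias
    Returns phi(tau): (K, embed_dim), ReLU(cos-basis @ W + b).
    """
    n_cos = W.shape[0]
    i = np.arange(n_cos)
    # Cosine basis: cos(pi * i * tau) for i = 0, ..., n_cos - 1
    basis = np.cos(np.pi * i[None, :] * taus[:, None])  # (K, n_cos)
    return np.maximum(basis @ W + b, 0.0)               # ReLU

rng = np.random.default_rng(0)
K, n_cos, embed_dim = 64, 64, 64          # K and embed dim as in Table E.6
W = rng.normal(scale=0.1, size=(n_cos, embed_dim))
b = np.zeros(embed_dim)
taus = rng.uniform(size=K)                # K quantile levels sampled per update
phi = tau_embedding(taus, W, b)
print(phi.shape)  # (64, 64)
```

In the full model this embedding would be combined (typically elementwise) with the LSTM state features ψ(·) before the fully connected heads; that combination step is not shown here.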