DHMoE: Diffusion Generated Hierarchical Multi-Granular Expertise for Stock Prediction
Authors: Weijun Chen, Yanze Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three stock trading datasets reveal that DHMoE outperforms state-of-the-art methods in terms of both cumulative and risk-adjusted returns. |
| Researcher Affiliation | Academia | School of Computer Science, Peking University, Beijing, China; Wangxuan Institute of Computer Technology, Peking University, Beijing, China |
| Pseudocode | No | The paper describes methods and equations in detail but does not present any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository. |
| Open Datasets | Yes | The stock news data for NASDAQ and NYSE are sourced from Kaggle. We collect some missing daily quote data of all stocks, including normalized opening-high-low-closing prices (OHLC) and trading volumes, from the professional Wind-Financial Terminal. The third dataset, Ashare&HK (Huang et al. 2018), collects 162,976 news headlines in Chinese from major financial websites. |
| Dataset Splits | Yes | Table 2 (dataset statistics): NASDAQ — 112 stocks, train 2014/6/5–2018/6/4, valid 2018/6/5–2019/6/4, test 2019/6/5–2020/6/5; NYSE — 127 stocks, train 2014/6/5–2018/6/4, valid 2018/6/5–2019/6/4, test 2019/6/5–2020/6/5; Ashare&HK — 80 stocks, train 2014/1/1–2014/9/30, valid 2014/10/1–2014/12/31, test 2015/1/1–2015/12/31. |
| Hardware Specification | Yes | We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4. |
| Software Dependencies | No | Our model is implemented with PyTorch. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019). |
| Experiment Setup | Yes | The hidden dimension F is searched within {5, 10, 20, 30, 40, 50} and finally set to 20. For DIT, we investigate the number of blocks, the embedded dimension, and the attention heads within the sets {1, 2, 4, 6, 8, 10}, {32, 64, 128, 256, 512}, and {2, 4, 6, 8, 16}. These parameters are determined to be 4 for the number of blocks, 256 for the embedded dimension, and 4 for the number of attention heads. The weighting coefficients λ and α are searched within {2, 4, 6, 8, 10, 12} and {1, 2, 3, 4, 5, 6}, and are finally set to 8 and 4, respectively. Besides, we adopt the following quadratic variance schedule: β_k = ((K−k)/(K−1)·√β₁ + (k−1)/(K−1)·√β_K)². We set the minimum noise level β₁ = 0.0001 and search the number of diffusion steps K and the maximum noise level β_K from a given parameter space (K ∈ {50, 100, 200} and β_K ∈ {0.1, 0.2, 0.3, 0.4}). K and β_K are set to 100 and 0.2. The numbers of middle-level and bottom-level experts a, b are each searched within {3, 6, 9, 12, 15, 18} and are finally set to 9 and 6, respectively. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4. |
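The quadratic variance schedule in the experiment-setup row interpolates linearly between √β₁ and √β_K and then squares, so that β_1 and β_K are recovered exactly at the endpoints. A minimal sketch of that schedule, using the paper's reported settings (K = 100, β₁ = 0.0001, β_K = 0.2); the function name and NumPy vectorization are illustrative, not from the paper:

```python
import numpy as np

def quadratic_beta_schedule(K=100, beta_1=1e-4, beta_K=0.2):
    """Quadratic diffusion variance schedule:
    beta_k = ((K - k)/(K - 1) * sqrt(beta_1) + (k - 1)/(K - 1) * sqrt(beta_K))**2
    for k = 1..K. Endpoints: beta_1 at k=1, beta_K at k=K.
    """
    k = np.arange(1, K + 1)
    return ((K - k) / (K - 1) * np.sqrt(beta_1)
            + (k - 1) / (K - 1) * np.sqrt(beta_K)) ** 2

betas = quadratic_beta_schedule()
# the schedule rises monotonically from the minimum to the maximum noise level
assert np.isclose(betas[0], 1e-4) and np.isclose(betas[-1], 0.2)
```

Squaring a linear interpolation in √β space concentrates small noise levels at early diffusion steps, a common choice in time-series diffusion models.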