DHMoE: Diffusion Generated Hierarchical Multi-Granular Expertise for Stock Prediction

Authors: Weijun Chen, Yanze Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on three stock trading datasets reveal that DHMoE outperforms state-of-the-art methods in terms of both cumulative and risk-adjusted returns.
Researcher Affiliation Academia 1 School of Computer Science, Peking University, Beijing, China; 2 Wangxuan Institute of Computer Technology, Peking University, Beijing, China
Pseudocode No The paper describes methods and equations in detail but does not present any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository.
Open Datasets Yes The stock news data for NASDAQ and NYSE are sourced from Kaggle. We collect some missing daily quote data of all stocks, including normalized opening-high-low-closing prices (OHLC) and trading volumes, from the professional Wind-Financial Terminal. The third dataset (Huang et al. 2018), Ashare&HK, collects 162,976 news headlines from major financial websites in Chinese.
Dataset Splits Yes Table 2: Dataset statistics.
Datasets  | Stocks | Train Days          | Valid Days          | Test Days
NASDAQ    | 112    | 2014/6/5-2018/6/4   | 2018/6/5-2019/6/4   | 2019/6/5-2020/6/5
NYSE      | 127    | 2014/6/5-2018/6/4   | 2018/6/5-2019/6/4   | 2019/6/5-2020/6/5
Ashare&HK | 80     | 2014/1/1-2014/9/30  | 2014/10/1-2014/12/31 | 2015/1/1-2015/12/31
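The chronological splits quoted from Table 2 can be expressed as a simple date-range lookup. This is a minimal sketch, not the authors' code; the `assign_split` helper name is hypothetical, and the ranges shown are the shared NASDAQ/NYSE calendar from the quoted table:

```python
from datetime import date

# Date ranges from the paper's Table 2 (NASDAQ and NYSE share this calendar).
SPLITS = {
    "train": (date(2014, 6, 5), date(2018, 6, 4)),
    "valid": (date(2018, 6, 5), date(2019, 6, 4)),
    "test":  (date(2019, 6, 5), date(2020, 6, 5)),
}

def assign_split(d):
    """Return the split name whose inclusive date range contains d, else None."""
    for name, (start, end) in SPLITS.items():
        if start <= d <= end:
            return name
    return None
```

Because the splits are contiguous and strictly chronological, every trading day in the study period maps to exactly one partition, which avoids look-ahead leakage between training and evaluation.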
Hardware Specification Yes We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4.
Software Dependencies No Our model is implemented with PyTorch. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019).
Experiment Setup Yes The hidden dimension F is searched within {5, 10, 20, 30, 40, 50} and finally set to 20. For DiT, we investigate the number of blocks, the embedded dimension, and the attention heads within the sets {1, 2, 4, 6, 8, 10}, {32, 64, 128, 256, 512}, and {2, 4, 6, 8, 16}. These parameters are determined to be 4 for the number of blocks, 256 for the embedded dimension, and 4 for the number of attention heads. The weighting coefficients λ and α are searched within {2, 4, 6, 8, 10, 12} and {1, 2, 3, 4, 5, 6}, and are finally set to 8 and 4, respectively. Besides, we adopt the following quadratic variance schedule: β_k = ((K − k)/(K − 1) · √β_1 + (k − 1)/(K − 1) · √β_K)². We set the minimum noise level β_1 = 0.0001 and search the number of diffusion steps K and the maximum noise level β_K from a given parameter space (K ∈ {50, 100, 200} and β_K ∈ {0.1, 0.2, 0.3, 0.4}). K and β_K are set to 100 and 0.2. The numbers of middle-level and bottom-level experts a, b are searched within {3, 6, 9, 12, 15, 18} and {3, 6, 9, 12, 15, 18}, and are finally set to 9 and 6, respectively. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4.
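The quadratic variance schedule quoted above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the √-interpolation form is inferred from the squared expression and from the fact that it reproduces the stated endpoints β_1 = 0.0001 and β_K = 0.2 exactly at k = 1 and k = K:

```python
import math

def quadratic_beta_schedule(K=100, beta_1=0.0001, beta_K=0.2):
    """Quadratic noise schedule: linearly interpolate sqrt(beta) from
    sqrt(beta_1) to sqrt(beta_K) over k = 1..K, then square each value."""
    betas = []
    for k in range(1, K + 1):
        root = ((K - k) / (K - 1)) * math.sqrt(beta_1) \
             + ((k - 1) / (K - 1)) * math.sqrt(beta_K)
        betas.append(root ** 2)
    return betas

betas = quadratic_beta_schedule()
```

With the paper's chosen K = 100 and β_K = 0.2, the schedule increases monotonically from the minimum to the maximum noise level, which is the usual requirement for a forward diffusion process.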