DHMoE: Diffusion Generated Hierarchical Multi-Granular Expertise for Stock Prediction

Authors: Weijun Chen, Yanze Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on three stock trading datasets reveal that DHMoE outperforms state-of-the-art methods in terms of both cumulative and risk-adjusted returns.
Researcher Affiliation Academia 1 School of Computer Science, Peking University, Beijing, China; 2 Wangxuan Institute of Computer Technology, Peking University, Beijing, China
Pseudocode No The paper describes methods and equations in detail but does not present any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository.
Open Datasets Yes The stock news data for NASDAQ and NYSE are sourced from Kaggle. We collect some missing daily quote data of all stocks, including normalized opening-high-low-closing prices (OHLC) and trading volumes, from the professional Wind-Financial Terminal. The third dataset (Huang et al. 2018), Ashare&HK, collects 162,976 news headlines from major financial websites in Chinese.
Dataset Splits Yes Table 2: Dataset statistics.
Datasets  | Stocks | Train Days          | Valid Days          | Test Days
NASDAQ    | 112    | 2014/6/5-2018/6/4   | 2018/6/5-2019/6/4   | 2019/6/5-2020/6/5
NYSE      | 127    | 2014/6/5-2018/6/4   | 2018/6/5-2019/6/4   | 2019/6/5-2020/6/5
Ashare&HK | 80     | 2014/1/1-2014/9/30  | 2014/10/1-2014/12/31 | 2015/1/1-2015/12/31
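The chronological splits quoted from Table 2 can be expressed as a simple date-range lookup. This is a minimal sketch, not the authors' code; the `assign_split` helper name is hypothetical, and the ranges shown are the shared NASDAQ/NYSE calendar from the quoted table:

```python
from datetime import date

# Date ranges from the paper's Table 2 (NASDAQ and NYSE share this calendar).
SPLITS = {
    "train": (date(2014, 6, 5), date(2018, 6, 4)),
    "valid": (date(2018, 6, 5), date(2019, 6, 4)),
    "test":  (date(2019, 6, 5), date(2020, 6, 5)),
}

def assign_split(d):
    """Return the split name whose inclusive date range contains d, else None."""
    for name, (start, end) in SPLITS.items():
        if start <= d <= end:
            return name
    return None
```

Because the splits are contiguous and strictly chronological, every trading day in the study period maps to exactly one partition, which avoids look-ahead leakage between training and evaluation.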
Hardware Specification Yes We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4.
Software Dependencies No Our model is implemented with PyTorch. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019).
Experiment Setup Yes The hidden dimension F is searched within {5, 10, 20, 30, 40, 50} and finally set to 20. For DiT, we investigate the number of blocks, the embedded dimension, and the attention heads within the sets {1, 2, 4, 6, 8, 10}, {32, 64, 128, 256, 512}, and {2, 4, 6, 8, 16}. These parameters are determined to be 4 for the number of blocks, 256 for the embedded dimension, and 4 for the number of attention heads. The weighting coefficients λ and α are searched within {2, 4, 6, 8, 10, 12} and {1, 2, 3, 4, 5, 6}, and are finally set to 8 and 4, respectively. Besides, we adopt the following quadratic variance schedule: β_k = ((K − k)/(K − 1) · √β_1 + (k − 1)/(K − 1) · √β_K)². We set the minimum noise level β_1 = 0.0001 and search the number of diffusion steps K and the maximum noise level β_K from a given parameter space (K ∈ {50, 100, 200} and β_K ∈ {0.1, 0.2, 0.3, 0.4}). K and β_K are set to 100 and 0.2. The numbers of middle-level and bottom-level experts a, b are searched within {3, 6, 9, 12, 15, 18} and {3, 6, 9, 12, 15, 18}, and are finally set to 9 and 6, respectively. We conduct experiments on four GeForce RTX 3090 GPUs with the AdamW optimizer (Loshchilov and Hutter 2019) for 30 epochs; the learning rate is set to 1e-3, and the batch size is set to 4.
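The quadratic variance schedule quoted above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the √-interpolation form is inferred from the squared expression and from the fact that it reproduces the stated endpoints β_1 = 0.0001 and β_K = 0.2 exactly at k = 1 and k = K:

```python
import math

def quadratic_beta_schedule(K=100, beta_1=0.0001, beta_K=0.2):
    """Quadratic noise schedule: linearly interpolate sqrt(beta) from
    sqrt(beta_1) to sqrt(beta_K) over k = 1..K, then square each value."""
    betas = []
    for k in range(1, K + 1):
        root = ((K - k) / (K - 1)) * math.sqrt(beta_1) \
             + ((k - 1) / (K - 1)) * math.sqrt(beta_K)
        betas.append(root ** 2)
    return betas

betas = quadratic_beta_schedule()
```

With the paper's chosen K = 100 and β_K = 0.2, the schedule increases monotonically from the minimum to the maximum noise level, which is the usual requirement for a forward diffusion process.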