Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

Authors: Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin Cui, Bozhi Wu, Bo Li, Yang Liu, Danny Dongning Sun

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations of two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.
Researcher Affiliation | Academia | 1 Nanyang Technological University, Singapore; 2 Hong Kong Polytechnic University, Hong Kong, China; 3 Chinese University of Hong Kong, Hong Kong, China; 4 Singapore Management University, Singapore; 5 Peng Cheng Lab, Shenzhen, China
Pseudocode | No | The paper describes the 'Program Sketch' with conditional statements in Figure 3 as a logical description of market trends, but it does not present this or any other part of the methodology in a formal pseudocode or algorithm block.
Open Source Code | No | No explicit statement or link to the paper's source code is provided. While the paper mentions supplementary materials, it does not specify whether code is included there.
Open Datasets | Yes | 'We conduct all experiments on the historical transaction data of the stocks of the China A-shares market provided by Fang et al. (Fang et al. 2021).' The dataset consists of minute-level intraday price-volume market data of CHINA SECURITIES INDEX 800 (CSI 800) constituent stocks and the order amount of each instrument on each trading day.
Dataset Splits | No | The paper mentions training data, a testing context sequence, and validation data, and notes that results are 'averaged over 5 random seeds', but the main text does not give the split percentages, sample counts, or methodology used to create these splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments.
Software Dependencies | No | The paper refers to several DRL algorithms (e.g., PPO, DDPG, A2C) but does not specify the versions of any underlying software libraries, frameworks, or programming languages used for implementation.
Experiment Setup | No | The paper states that 'The results of the evaluated methods are all averaged over 5 random seeds' and describes the optimization objectives for the different tasks, but it defers the detailed experimental setup to supplementary materials and does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text.