The Adaptive Q-Network for Recommendation Tasks with Dynamic Item Space
Authors: Jianxiang Zhu, Dandan Lai, Zhongcui Ma, Yaxin Peng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach has achieved state-of-the-art performance in the dynamic recommendation task. |
| Researcher Affiliation | Academia | Jianxiang Zhu¹, Dandan Lai¹, Zhongcui Ma¹, Yaxin Peng¹,²,* ¹the Department of Mathematics, College of Sciences, Shanghai University, Shanghai 200444, China ²the School of Future Technology, Shanghai University, Shanghai 200444, China EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Adaptive Q-Network. Training process: Require: Item set in the training environment I_train, the weights of the pre-trained embedding layer. Output: the weights of CEQN. 1: Initialize all trainable parameters in the state-characteristic value function CEQN; load and freeze the weights of the pre-trained embedding layer as function f. 2: Initialize replay buffer B. 3: Project item set I_train onto the characteristic set V_train through f. 4: for each iteration do 5: Apply the current behaviour policy π^b_train through Eq. (2); collect and store samples in B. 6: Sample mini-batch (s_t, v_t, r_t, s_{t+1}) from B. 7: Update CEQN according to Eq. (3). 8: end for. Testing process: Require: Item set in the test environment I_test, the weights of the pre-trained embedding layer, and the weights of CEQN. 1: Project item set I_test onto the characteristic space V_test through f. 2: Apply the current policy π_test based on Eq. (1). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on two representative datasets MovieLens¹ and KuaiRec² (Gao et al. 2022). ¹https://grouplens.org/datasets/movielens/1m/ ²https://kuairec.com/ |
| Dataset Splits | Yes | We primarily assess the dynamic recommendation task, and the ratio of items in the training environment to the testing environment is 0.5. A ratio of 0.5 ensures that the number of items in the training set is similar to that in the test set. Additionally, we provide settings for other ratios in ablation studies on task setting. |
| Hardware Specification | No | The paper mentions running experiments but does not specify any particular hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gymnasium (Brockman et al. 2016)' but does not specify any version numbers for this or other software components. |
| Experiment Setup | Yes | For the training stage, all policies are trained with 100 epochs. The policy is evaluated using 100 interaction trajectories after each epoch, and the maximum recommended sequence length is limited to 30. Following (Yu et al. 2024), we report the mean values of all metrics during the last 25% of training epochs to achieve a fair comparison. |
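
The split and reporting protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `split_items` and `mean_of_last_fraction` are hypothetical, the random partition is an assumption (the quoted text does not say how items are assigned), and the ratio is read here as the fraction of items placed in the training environment.

```python
import random
from statistics import fmean

# Values quoted in the Experiment Setup row.
NUM_EPOCHS = 100            # training epochs per policy
EVAL_TRAJECTORIES = 100     # interaction trajectories evaluated after each epoch
MAX_SEQUENCE_LENGTH = 30    # maximum recommended sequence length


def split_items(item_ids, train_ratio=0.5, seed=0):
    """Partition item ids into disjoint training and test item sets.

    Assumption: `train_ratio` is the fraction of items assigned to the
    training environment, and the assignment is a seeded random shuffle.
    """
    items = sorted(item_ids)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def mean_of_last_fraction(per_epoch_values, fraction=0.25):
    """Average a metric over the final `fraction` of training epochs."""
    n = len(per_epoch_values)
    k = max(1, int(n * fraction))  # always average at least one epoch
    return fmean(per_epoch_values[n - k:])
```

With 100 epochs and a fraction of 0.25, `mean_of_last_fraction` averages epochs 76 through 100, which matches the "last 25% of training epochs" wording; other ratios from the ablation studies would be passed as `train_ratio`.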