Contrastive Representation for Interactive Recommendation

Authors: Jingyu Li, Zhiyong Feng, Dongxiao He, Hongqi Chen, Qinghang Gao, Guoli Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments have been carried out to show our method's superior improvement on the sample efficiency while training a DRL-based IR agent. Extensive experiments conducted on the Virtual-Taobao simulation environment and a simulator based on the ML-1M dataset further verify the effectiveness of the whole proposed CRIR.
Researcher Affiliation | Academia | College of Intelligence and Computing, Tianjin University EMAIL
Pseudocode | No | The paper describes the methods through text and figures (e.g., Figure 1 and Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | So we use Virtual-Taobao (Shi et al. 2019) and a dataset-oriented simulator based on ML-1M¹ to evaluate CRIR and baseline methods. Footnote 1 gives the link: https://grouplens.org/datasets/movielens/1m/
Dataset Splits | No | The paper describes experiments conducted in interactive simulation environments (Virtual-Taobao and an ML-1M-based simulator) under cold-start settings. It does not specify fixed train/validation/test dataset splits, which is common in DRL settings, where data is sampled from a replay buffer rather than a static dataset.
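The absence of fixed splits follows from how DRL agents gather data. A minimal, purely illustrative sketch (not from the paper; buffer capacity and batch size are hypothetical) of online transition collection with uniform minibatch sampling:

```python
import random
from collections import deque

# Illustrative replay buffer: transitions come from live interaction
# with a simulator, so there is no static train/val/test partition.
buffer = deque(maxlen=10_000)  # hypothetical capacity

for step in range(100):
    # (state, action, reward, next_state) placeholder transition
    transition = (f"state_{step}", f"action_{step}", 1.0, f"state_{step + 1}")
    buffer.append(transition)

# Uniform minibatch sampling; PER (used in the paper) would instead
# weight samples by TD error.
batch = random.sample(buffer, k=32)
```

With PER, `random.sample` would be replaced by priority-weighted draws, but the point stands: training data is a moving stream, not a fixed split.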
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., specific GPU or CPU models, memory, or cloud computing instances).
Software Dependencies | No | The paper mentions DRL algorithms and frameworks such as DDPG, PER, SAC, CRR, PPO, DRR, and NICF, but it does not specify any software libraries or their version numbers (e.g., Python, PyTorch, or TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | In our implementation, we use DDPG (Lillicrap et al. 2016) along with the Prioritized Experience Replay mechanism (PER) (Schaul et al. 2015) as our DRL backbone for its effectiveness and stability. The length of an interaction history is n, with max sequence length M (n ≤ M). For the first experiment (RQ2-1), we set the PRCL frequencies in {0, 0.25, 0.5, 0.75, 1.0}. We set all the coefficients to w = (1/(T/2 − 1)) · Σ_{i=2}^{T/2} (1/√i) ≈ 0.3183, where T = 50 is the max sequence length for state representation. We choose γ ∈ {0, 0.5, 1.0}.
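The coefficient value quoted in the Experiment Setup row can be sanity-checked numerically. The formula is partly reconstructed from garbled extraction: the 1/√i summand and the 1/(T/2 − 1) normalization are assumptions, chosen because they reproduce the stated value 0.3183 for T = 50.

```python
import math

# Reconstructed coefficient (assumption, not verbatim from the paper):
# w = (1 / (T/2 - 1)) * sum_{i=2}^{T/2} 1/sqrt(i)
T = 50  # max sequence length for state representation, per the paper
w = sum(1.0 / math.sqrt(i) for i in range(2, T // 2 + 1)) / (T // 2 - 1)
print(round(w, 4))  # → 0.3183
```

Under this reading, w is the mean of 1/√i over i = 2 … T/2 (24 terms), which matches the reported 0.3183 to four decimal places.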