Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Second-Order Online Kernel Learning Through Incremental Matrix Sketching and Decomposition

Authors: Dongxie Wen, Xiao Zhang, Zhewei Wei, Chenping Hou, Shuai Li, Weinan Zhang

IJCAI 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We validate the performance of our method through extensive experiments conducted on real-world datasets, demonstrating its superior scalability and robustness against adversarial attacks. In this section, we conduct experiments to evaluate the performance of FORKS on several datasets." |
| Researcher Affiliation | Academia | "1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2National University of Defense Technology; 3Shanghai Jiao Tong University. EMAIL, EMAIL, EMAIL" |
| Pseudocode | Yes | "We summarize the above stages into Algorithm 1. The incremental update of the feature mapping, combined with the reset mechanism, ensures that we can effectively capture changes in user preferences over time. We summarize the algorithm and provide the pseudo-code for TISVD in Appendix B.2." |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | "In this section, we conduct experiments to evaluate the performance of FORKS on several datasets. The details of datasets and experimental setup are presented in Appendix D.1, D.2. We use KuaiRec, which is a real-world dataset collected from the recommendation logs of the video-sharing mobile app Kuaishou [Gao et al., 2022]." Table 1 lists the classification benchmarks used: german, svmguide3, spambase, codrna, w7a, ijcnn1. |
| Dataset Splits | No | The paper describes its evaluation in an online learning context, where data arrives in a stream and training and testing are intermixed. It does not provide explicit training, testing, or validation splits in the traditional sense, such as percentage-based splits or distinct sample counts for static sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments within the provided text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | "For all the algorithms, we set a fixed budget B = 50 for small datasets (N <= 10000) and B = 100 for large datasets. Furthermore, we set buffer size B = 2B, γ = 0.2, sp = B, sm = γsp, θ = 0.3, and update cycle ρ = θN in SkeGD and FORKS if not specially specified. For algorithms with rank-k approximation, we uniformly set k = 0.1B. Besides, we use the same experimental settings for FOGD (feature dimension = 4B). We set γ = 0.2, sp = 0.75B, sm = γsp, k = 0.1B and update cycle ρ = 0.005(N B) in SkeGD and FORKS. Inspired by the adversarial settings... We set b = 500, r = 10 for codrna-1 and german-1." |
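The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration helper. This is a hypothetical reconstruction for illustration only: the paper releases no code, the function name `forks_config` and the dict layout are invented here, and it follows the first of the two parameter settings quoted above (sp = B, ρ = θN).

```python
def forks_config(n_samples):
    """Hypothetical reconstruction of the FORKS/SkeGD hyperparameter
    settings quoted in the experiment setup; names are illustrative."""
    B = 50 if n_samples <= 10000 else 100        # fixed budget per dataset size
    gamma = 0.2
    sp = B                                       # sketch-size parameter
    sm = int(gamma * sp)
    theta = 0.3
    return {
        "budget": B,
        "gamma": gamma,
        "sp": sp,
        "sm": sm,
        "rank_k": int(0.1 * B),                  # rank-k approximation
        "update_cycle": int(theta * n_samples),  # rho = theta * N
        "fogd_feature_dim": 4 * B,               # FOGD comparison setting
    }
```

For example, a small dataset with N = 5000 yields budget 50, rank 5, and an update cycle of 1500 rounds under these assumed formulas.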
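The Pseudocode row refers to an incremental truncated-SVD routine (TISVD, Appendix B.2) for maintaining the matrix sketch. The paper's actual algorithm is not reproduced here; the sketch below is only a naive, non-incremental stand-in showing what a truncated-SVD sketch update computes (append a row, then re-truncate to a fixed rank), with `update_sketch` being an invented name.

```python
import numpy as np

def update_sketch(sketch, new_row, rank):
    """Naive truncated-SVD sketch maintenance: append a row and
    re-truncate to the given rank. TISVD in the paper updates the
    decomposition incrementally instead of recomputing a full SVD."""
    augmented = np.vstack([sketch, new_row])
    U, s, Vt = np.linalg.svd(augmented, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
S = rng.normal(size=(5, 8))                      # current sketch
S = update_sketch(S, rng.normal(size=(1, 8)), rank=3)
```

A full SVD per update costs far more than an incremental factor update, which is the gap the paper's second-order method is designed to close.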