A Fast-Adaptive Cognitive Diagnosis Framework for Computerized Adaptive Testing Systems

Authors: Yuanhao Liu, Yiya You, Shuo Liu, Hong Qian, Ying Qian, Aimin Zhou

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on real-world datasets show that, compared with existing static CDMs, FACD not only achieves superior prediction performance across various selection strategies, with an improvement of roughly 5%–10% in the early stage of CAT, but also maintains a commendable inference speed.
Researcher Affiliation | Collaboration | 1 Shanghai Institute of AI Education, and School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; 2 Game AI Center, Tencent Inc., Shenzhen 518057, China
Pseudocode | No | The paper describes its methodology using mathematical formulations and descriptive text, accompanied by figures. However, it does not include an explicitly labeled 'Pseudocode' or 'Algorithm' block detailing structured steps in a code-like format.
Open Source Code | Yes | The source code for the implementation can be found at https://github.com/BW297/FACD.
Open Datasets | Yes | The experiments are conducted on three real-world datasets, i.e., FrcSub [De Carlo, 2011; Tatsuoka, 1984], EDMCup2023 [Ethan Prihar, 2023] and NeurIPS2020 [Wang et al., 2020b].
Dataset Splits | No | The paper mentions pretrain ratios pt = {0.0, 0.1, 0.2, 0.3} for analyzing performance at different stages of the CAT process and discusses evaluating models on a 'test set', but it does not provide specific train/validation/test split percentages, sample counts, or an explicit methodology for how the overall datasets were partitioned for model training.
Hardware Specification | No | The paper discusses 'Inference Time (s)' and compares 'CPU time per round' in Figure 3. However, it does not specify any particular CPU models, GPU models, or other hardware used for running the experiments or training the models.
Software Dependencies | No | The paper mentions several algorithms and frameworks, such as 'Xavier [Glorot and Bengio, 2010]', 'Adam [Kingma and Ba, 2015]', 'GRU network [Chung et al., 2014]', and 'LightGCN [He et al., 2020]'. However, it does not provide version numbers for any software libraries (e.g., PyTorch, TensorFlow, scikit-learn) or programming languages used in the implementation.
Experiment Setup | Yes | The batch size is set within the range {32, 64, 128, 256}. The learning rate is chosen from {1e-3, 3e-3, 5e-3, 7e-3, 1e-2}. The dimensions of the MLPs for all methods are consistent, being 512 and 256. The paper studies the impact of the hyperparameters on the dynamic graph layer, GRU layer, embedding dimension and graph mask ratio on the EDMCup2023 dataset. Specifically, the model achieves optimal performance when the number of graph layers is 1 or 2, the number of GRU layers is 3, the embedding dimension is 32 or 64, and the graph mask ratio is 0.1 or 0.2.
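The reported setup can be summarized as a hyperparameter search space. The sketch below is illustrative only: the dictionary keys are assumed names (the paper reports only the value ranges), and the reported optimal values on EDMCup2023 are noted in comments.

```python
from itertools import product

# Hypothetical encoding of the search space reported in the paper.
# Key names are assumptions; only the value ranges come from the text.
search_space = {
    "batch_size": [32, 64, 128, 256],
    "learning_rate": [1e-3, 3e-3, 5e-3, 7e-3, 1e-2],
    "graph_layers": [1, 2],        # optimal on EDMCup2023: 1 or 2
    "gru_layers": [3],             # optimal on EDMCup2023: 3
    "embedding_dim": [32, 64],     # optimal on EDMCup2023: 32 or 64
    "graph_mask_ratio": [0.1, 0.2],  # optimal on EDMCup2023: 0.1 or 0.2
}

# MLP dimensions are fixed at 512 and 256 for all methods, so they are
# not part of the grid.
mlp_dims = (512, 256)

# Size of a full grid search over the tuned ranges.
n_configs = len(list(product(*search_space.values())))
print(n_configs)  # 4 * 5 * 2 * 1 * 2 * 2 = 160
```

Even this modest grid yields 160 configurations, which illustrates why the paper reports only the ranges and the best-performing values rather than exhaustive results.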