Knowledge Is Power: Harnessing Large Language Models for Enhanced Cognitive Diagnosis

Authors: Zhiang Dong, Jingyuan Chen, Fei Wu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on several real-world datasets demonstrate the effectiveness of our proposed framework. Our code and datasets are available at https://github.com/PlayerDza/KCD.
Researcher Affiliation Academia Zhejiang University
Pseudocode No The paper describes the methodology in detail, including the framework overview, LLM diagnosis, and cognitive level alignment, but does not present a clearly labeled pseudocode or algorithm block.
Open Source Code Yes Our code and datasets are available at https://github.com/PlayerDza/KCD.
Open Datasets Yes Experiments on several public datasets with different CDMs demonstrate the effectiveness of our framework. Our code and datasets are available at https://github.com/PlayerDza/KCD.
Dataset Splits Yes The datasets are divided into training, validation, and testing sets, with a ratio of 8:1:1.
Hardware Specification Yes All the experiments are conducted on a GeForce RTX 3090 GPU.
Software Dependencies Yes We utilize PyTorch to implement both the baseline methods and our proposed KCD framework. For the baseline models, we use the default hyper-parameters as stated in their papers, and for KCD, we use the same hyper-parameter settings, such as training epoch, learning rate, and batch size. We employ ChatGPT to represent LLMs (specifically, gpt-3.5-turbo-16k) and text-embedding-ada-002 as the text embedding model.
Experiment Setup Yes We use the default hyper-parameters as stated in their papers, and for KCD, we use the same hyper-parameter settings, such as training epoch, learning rate, and batch size. We employ ChatGPT to represent LLMs (specifically, gpt-3.5-turbo-16k) and text-embedding-ada-002 as the text embedding model. All the experiments are conducted on a GeForce RTX 3090 GPU. We train the model on the training set, and at the end of each epoch, we evaluate the model on the validation set. The hyperparameters α, β, and λ were set to 0.04, 0.015, and 0.2.
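The quoted setup (an 8:1:1 train/validation/test split and the reported hyperparameter values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code; the function and variable names are hypothetical.

```python
import random

def split_dataset(records, seed=42):
    """Shuffle and split records 8:1:1 into train/validation/test,
    mirroring the ratio reported in the paper."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hyperparameter values reported in the experiment setup.
HPARAMS = {"alpha": 0.04, "beta": 0.015, "lambda": 0.2}

train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```

The fixed seed keeps the split reproducible across runs, which matters when the validation set is used for per-epoch model selection as described above.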