Learning Concept Graphs from Online Educational Data

Authors: Hanxiao Liu, Wanli Ma, Yiming Yang, Jaime Carbonell

JAIR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on our newly collected datasets of courses from MIT, Caltech, Princeton and CMU show promising results."
Researcher Affiliation | Academia | Hanxiao Liu, Wanli Ma, Yiming Yang, Jaime Carbonell (email addresses redacted); School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Pseudocode | Yes | Algorithm 1: CGL.Rank with Nesterov's Accelerated Gradient Descent; Algorithm 2: Sparse CGL.Rank with Accelerated PGD; Algorithm 3: trans-CGL.Rank with Accelerated GD
Open Source Code | No | The paper does not provide concrete access to source code. It discusses algorithms and their efficiency but does not state that the code is open source or link to a repository for the described methodology.
Open Datasets | Yes | "We collected course listings, including course descriptions and available prerequisite structure from MIT OpenCourseWare, Caltech, CMU and Princeton. The datasets are available at http://nyc.lti.cs.cmu.edu/teacher/dataset/"
Dataset Splits | Yes | "We used one third of the data for testing, and the remaining two thirds for training and validation. We conducted 5-fold cross validation on the training two-thirds, i.e., trained the model on 80% of the training/validation dataset, and tuned extra parameters on the remaining 20%."
Hardware Specification | Yes | "We tested the efficiency of our proposed algorithms (based on the optimization formulation after variable reduction) on a single machine with an Intel i7 8-core processor and 32 GB RAM."
Software Dependencies | No | The paper mentions various algorithms and methods (e.g., SVM algorithms, accelerated gradient descent, coordinate descent) but does not give version numbers for any software libraries, frameworks, or solvers used in the implementation.
Experiment Setup | Yes | "We set k = 100 in our experiments based on cross validation. Via cross validation, we have found k = 1 (1NN) works best for this problem on the current datasets. CGL.Rank with gradient descent took 37.3 minutes and 1490 iterations to reach the convergence rate of 10^-3. To achieve the same objective value, the accelerated gradient descent took 3.08 minutes with 401 MB memory at 103 iterations, and the inexact Newton method took only 43.4 seconds with 587 MB memory. For sparse CGL, it took the accelerated proximal gradient method 2.07 minutes to reach the convergence rate of 10^-3 on MIT with 3.9 GB peak memory consumption."
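The splitting protocol quoted under Dataset Splits (one third held out for testing, 5-fold cross validation on the remaining two thirds, so each fold trains on 80% and validates on 20% of the train/validation pool) can be sketched as plain Python. This is an illustrative reconstruction of the described procedure, not the authors' code; the function name and seed handling are assumptions.

```python
import random

def split_indices(n, n_folds=5, seed=0):
    """Sketch of the paper's split: 1/3 test, then k-fold CV on the rest.

    Returns (test_indices, folds), where each fold is a (train, val) pair
    drawn from the two-thirds train/validation pool.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)

    n_test = n // 3                      # one third held out for testing
    test, pool = idx[:n_test], idx[n_test:]

    fold_size = len(pool) // n_folds
    folds = []
    for k in range(n_folds):
        val = pool[k * fold_size:(k + 1) * fold_size]   # 20% validation
        val_set = set(val)
        train = [i for i in pool if i not in val_set]   # 80% training
        folds.append((train, val))
    return test, folds
```

With n = 30 documents, this yields 10 test indices and five folds of 16 training / 4 validation indices each.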
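Algorithms 1 and 3 in the paper use Nesterov's accelerated gradient descent, whose speedup over plain gradient descent the Experiment Setup row quantifies (3.08 minutes versus 37.3 minutes on the same objective). The generic momentum scheme can be sketched on a toy least-squares problem; the paper's actual CGL.Rank objective, step sizes, and stopping rule are not reproduced here, so function names and the learning rate below are illustrative assumptions.

```python
import numpy as np

def nesterov_agd(grad, x0, lr, iters=500):
    """Generic Nesterov accelerated gradient descent sketch.

    grad: gradient oracle for a smooth objective
    lr:   step size (should be at most 1/L for L-smooth objectives)
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()     # lookahead (extrapolated) point
    t = 1.0          # momentum scalar, updated each iteration
    for _ in range(iters):
        x_new = y - lr * grad(y)                     # gradient step at y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

# Toy usage: minimize (1/2)||A x - b||^2, whose gradient is A^T (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
x_star = nesterov_agd(lambda x: A.T @ (A @ x - b), np.zeros(2), lr=0.05)
```

The same oracle-based structure accommodates a proximal step after the gradient update, which is the shape of the accelerated PGD used for sparse CGL.Rank (Algorithm 2).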