reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Removing Bias and Incentivizing Precision in Peer-grading

Authors: Anujit Chakraborty, Jatin Jindal, Swaprava Nath

JAIR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Data from our classroom experiments is consistent with our theoretical assumptions and show that PEQA outperforms the popular median mechanism, which is used in several massive open online courses (MOOCs).
Researcher Affiliation	Collaboration	Anujit Chakraborty EMAIL University of California Davis; Jatin Jindal EMAIL Google (India); Swaprava Nath EMAIL Indian Institute of Technology Bombay
Pseudocode	Yes	Algorithm 1 PEQA mechanism; Algorithm 2 PEQA
Open Source Code	Yes	All data and code are available via: https://www.cse.iitb.ac.in/~swaprava/papers/Codes_Peer_Grading.zip
Open Datasets	Yes	All data and code of this paper are available at: https://www.cse.iitb.ac.in/~swaprava/papers/Codes_Peer_Grading.zip
Dataset Splits	Yes	We ran two experimental sessions: one with the median scoring mechanism (27 students), another with the PEQA mechanism (42 students). ... We partitioned each quiz into three sub-quizzes ... In every round, the students were asked to peer-grade five sub-quizzes (each corresponding to one of five of her anonymous peers). ... We randomly chose two of the five sub-quizzes that Median subjects graded in each round, and treated them as probes, and the rest as non-probes.
Hardware Specification	No	We conducted both sessions during the weekly Prog101 labs, that happen in a large computer lab. The paper does not specify particular CPU/GPU models or other detailed hardware components used.
Software Dependencies	No	The paper does not mention any specific software dependencies with version numbers (e.g., library names with versions).
Experiment Setup	Yes	The PEQA performance score on each question you have graded that is worth x points, is assigned on the scale of [0, x/2]. ... The relative weight α that an instructor assigns in PEQA (see Step 7 of the computet function in Algorithm 1) on the peer-grading performance score, determines what percentage of total grades come from own exam-score versus completing the peer-grading exercise ... In Section 6, we extend our analysis to how PEQA deals with social welfare in a world where increasing reliability is costly to the grader. ... we assume that each paper has a single question. All graders face the same reliability-cost function c while grading that paper/question. ... We paid students by the relative ranking of their total scores in the class, in both the sessions. The students who ranked in the first quartile of the total scores received M 650, the next three quartiles received M 450, M 250, and M 50 respectively. They also received a show-up fee of M 50, irrespective of their total score.
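
The probe selection quoted under Dataset Splits and the rank-based payment rule quoted under Experiment Setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `split_probes` and `quartile_payments` are hypothetical, ties are assumed to be broken by input order, and quartile sizes are assumed to be ceil(n / 4), none of which the quoted text specifies. Payment amounts are kept in the units written in the table.

```python
import math
import random

# Amounts quoted in the Experiment Setup row, in the units written there.
QUARTILE_PAYMENTS = [650, 450, 250, 50]
SHOW_UP_FEE = 50


def split_probes(sub_quizzes, rng, num_probes=2):
    """Randomly mark `num_probes` of the graded sub-quizzes as probes.

    Returns (probes, non_probes). The source states that two of five
    sub-quizzes were chosen at random; the sampling routine is an assumption.
    """
    probes = rng.sample(list(sub_quizzes), num_probes)
    non_probes = [q for q in sub_quizzes if q not in probes]
    return probes, non_probes


def quartile_payments(total_scores):
    """Map each student to a payment based on the rank of their total score.

    Assumptions (not from the source): ties are broken by input order and
    each quartile holds ceil(n / 4) students.
    """
    n = len(total_scores)
    if n == 0:
        return {}
    per_quartile = math.ceil(n / 4)
    # Highest total score first; sorted() is stable, so ties keep input order.
    ranked = sorted(total_scores, key=lambda s: total_scores[s], reverse=True)
    payments = {}
    for rank, student in enumerate(ranked):
        quartile = min(rank // per_quartile, 3)
        payments[student] = QUARTILE_PAYMENTS[quartile] + SHOW_UP_FEE
    return payments


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the split is repeatable
    print(split_probes(["q1", "q2", "q3", "q4", "q5"], rng))
    print(quartile_payments({"a": 90, "b": 80, "c": 70, "d": 60}))
```

With four students and distinct scores, each student falls in a separate quartile, so the top-ranked student receives 650 plus the show-up fee of 50. A different tie-breaking rule or quartile boundary would change payments near the cut-offs, so these choices should be checked against the paper and the released code archive before reuse.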