Removing Bias and Incentivizing Precision in Peer-grading

Authors: Anujit Chakraborty, Jatin Jindal, Swaprava Nath

JAIR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Data from our classroom experiments are consistent with our theoretical assumptions and show that PEQA outperforms the popular median mechanism, which is used in several massive open online courses (MOOCs).
Researcher Affiliation | Collaboration | Anujit Chakraborty (University of California, Davis); Jatin Jindal (Google, India); Swaprava Nath (Indian Institute of Technology Bombay)
Pseudocode | Yes | Algorithm 1 PEQA mechanism; Algorithm 2 PEQA
Open Source Code | Yes | All data and code are available via: https://www.cse.iitb.ac.in/~swaprava/papers/Codes_Peer_Grading.zip
Open Datasets | Yes | All data and code of this paper are available at: https://www.cse.iitb.ac.in/~swaprava/papers/Codes_Peer_Grading.zip
Dataset Splits | Yes | We ran two experimental sessions: one with the median scoring mechanism (27 students), another with the PEQA mechanism (42 students). ... We partitioned each quiz into three sub-quizzes ... In every round, the students were asked to peer-grade five sub-quizzes (each corresponding to one of five of her anonymous peers). ... We randomly chose two of the five sub-quizzes that Median subjects graded in each round, and treated them as probes, and the rest as non-probes.
Hardware Specification | No | We conducted both sessions during the weekly Prog101 labs, which take place in a large computer lab. The paper does not specify particular CPU/GPU models or other detailed hardware components used.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., library names with versions).
Experiment Setup | Yes | The PEQA performance score on each question you have graded that is worth x points, is assigned on the scale of [0, x/2]. ... The relative weight α that an instructor assigns in PEQA (see Step 7 of the computet function in Algorithm 1) on the peer-grading performance score, determines what percentage of total grades come from own exam-score versus completing the peer-grading exercise ... In Section 6, we extend our analysis to how PEQA deals with social welfare in a world where increasing reliability is costly to the grader. ... we assume that each paper has a single question. All graders face the same reliability-cost function c while grading that paper/question. ... We paid students by the relative ranking of their total scores in the class, in both the sessions. The students who ranked in the first quartile of the total scores received M 650, the next three quartiles received M 450, M 250, and M 50 respectively. They also received a show-up fee of M 50, irrespective of their total score.
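
The quartile-based payment rule and the random probe selection quoted above can be illustrated with a minimal Python sketch. It is not the authors' code (their code is in the linked archive). The function names `payments_by_rank` and `choose_probes`, the tie-breaking by input order, and the quartile boundary rule `rank * 4 // n` are all assumptions made for illustration, since the quoted text does not specify them. Amounts are in the same unit the table reports.

```python
import random

# Payment per quartile of the class ranking (best quartile first), as quoted above.
QUARTILE_PAYMENTS = [650, 450, 250, 50]
SHOW_UP_FEE = 50  # paid to every student regardless of total score


def payments_by_rank(total_scores):
    """Map each student to a payment based on the quartile of their rank.

    Assumptions (not stated in the source): ties keep input order, and the
    quartile of a 0-based rank is computed as rank * 4 // n.
    """
    n = len(total_scores)
    # Student ids ordered from highest to lowest total score (stable sort).
    ranked = sorted(total_scores, key=lambda s: total_scores[s], reverse=True)
    payments = {}
    for rank, student in enumerate(ranked):
        quartile = rank * 4 // n  # 0 for the top quartile, 3 for the bottom
        payments[student] = QUARTILE_PAYMENTS[quartile] + SHOW_UP_FEE
    return payments


def choose_probes(sub_quizzes, num_probes=2, rng=None):
    """Randomly split the graded sub-quizzes into probes and non-probes."""
    if rng is None:
        rng = random.Random()
    probes = rng.sample(sub_quizzes, num_probes)
    non_probes = [q for q in sub_quizzes if q not in probes]
    return probes, non_probes
```

For example, calling `payments_by_rank` on a class of eight students with distinct scores assigns the two highest scorers the top-quartile amount plus the show-up fee, and `choose_probes(list(range(5)))` returns two probe indices and three non-probe indices.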