Learning to Rank with Top-$K$ Fairness

Authors: Boyang Zhang, Quanqi Hu, Mingxuan Sun, Qihang Lin, Tianbao Yang

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To empirically validate the effectiveness of our method, we conduct a comprehensive set of experiments using popular benchmark datasets. The experimental results demonstrate that our method not only achieves high ranking accuracy but also significantly alleviates exposure disparities at top-K positions when compared to several state-of-the-art methods.
Researcher Affiliation Academia Boyang Zhang EMAIL Computer Science & Engineering, Louisiana State University Quanqi Hu EMAIL Computer Science & Engineering, Texas A&M University Mingxuan Sun EMAIL Computer Science & Engineering, Louisiana State University Qihang Lin EMAIL Business Analytics, University of Iowa Tianbao Yang EMAIL Computer Science & Engineering, Texas A&M University
Pseudocode Yes Algorithm 1 Stochastic Optimization of top-$K$ Ranking with Exposure Disparity (KSO-RED):
1: for $t = 0, \ldots, T-1$ do
2: Draw sample batches $B \subset S$ and let $B_Q$ be the set of $q$'s in $B$.
3: For each $q \in B_Q$, draw sample batches $B_q \subset S_q$, $B_q^a \subset S_a$, $B_q^b \subset S_b$
4: for $(q, x_i^q) \in B$ do
5: Compute $\hat{g}_{q,i}(w_t)$ and $u_{q,i}^{(t+1)}$.
6: end for
7: for $q \in B_Q$ do
8: Compute $\hat{g}_{q,a}(w_t)$, $\hat{g}_{q,b}(w_t)$, $\hat{g}_q(w_t)$, $u_{q,a}^{(t+1)}$, $u_{q,b}^{(t+1)}$, $u_q^{(t+1)}$, $s_{q,t+1}$, $v_{q,t+1}$, and $\lambda_{q,t+1}$.
9: end for
10: Compute $G_1^t$ and $G_2^t$ according to (18) and (22).
11: Update $z_{t+1} = (1-\gamma_5) z_t + \gamma_5 (G_1^t + C\, G_2^t)$
12: Update $w_{t+1} = w_t - \eta_1 z_{t+1}$
13: end for
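Lines 11-12 of Algorithm 1 form the parameter update: a moving average $z$ of the combined stochastic gradient estimators, followed by a descent step. A minimal sketch of that update, assuming $G_1^t$ and $G_2^t$ (the estimators from Eqs. (18) and (22) of the paper) are already computed and passed in as arrays; the function name and default values here are illustrative, not from the paper:

```python
import numpy as np

def kso_red_step(w, z, G1, G2, C=1.0, gamma5=0.2, eta1=4e-4):
    """One KSO-RED parameter update (lines 11-12 of Algorithm 1).

    G1, G2: stochastic gradient estimators (Eqs. (18) and (22));
    C: weight on the fairness gradient; gamma5: moving-average rate;
    eta1: step size. Returns the updated (w, z)."""
    z = (1 - gamma5) * z + gamma5 * (G1 + C * G2)  # line 11: gradient average
    w = w - eta1 * z                               # line 12: descent step
    return w, z
```

The moving average $z$ plays the role of a momentum-style gradient estimate, which is what makes the stochastic compositional estimators usable with small batches.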
Open Source Code Yes KSO-RED: Our top-K Stochastic Optimization for both top-K NDCG and top-K Ranking Exposure Disparity defined in (11). The code is available at: GitHub repository.
Open Datasets Yes MovieLens 20M (Harper & Konstan, 2015): This dataset comprises 20 million ratings from 138,000 users across 27,000 movies. ... Netflix Prize dataset (Bennett et al., 2007): Originally containing 100 million ratings for 17,770 movies from 480,189 users, we use a random subset of 20 million ratings for computational feasibility, maintaining a similar structure including movie name, genre, and year.
Dataset Splits Yes For model training and evaluation, we adopt a conventional split of training, validation, and test as in Wang et al. (2020); Qiu et al. (2022). Specifically, we adopt the testing protocol by sampling 5 rated and 300 unrated items per user to evaluate the NDCG and fairness metrics, whereas the training employs a similar protocol as in Wang et al. (2020).
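The testing protocol above builds a per-user candidate set of 5 rated and 300 unrated items, and NDCG and fairness metrics are computed over that set. A minimal sketch of the per-user sampling, assuming `rated` and `unrated` are sets of item IDs (the function name and seeding are illustrative, not from Wang et al. (2020)):

```python
import random

def sample_eval_items(rated, unrated, n_pos=5, n_neg=300, seed=0):
    """Per-user evaluation candidates: n_pos rated + n_neg unrated items.

    Sorting before sampling makes the draw reproducible for a fixed
    seed regardless of set iteration order."""
    rng = random.Random(seed)
    pos = rng.sample(sorted(rated), min(n_pos, len(rated)))
    neg = rng.sample(sorted(unrated), min(n_neg, len(unrated)))
    return pos, neg
```

Ranking the 305 candidates per user and truncating at position $K$ then yields the top-$K$ NDCG and exposure statistics.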
Hardware Specification Yes Our main experiments were conducted on a system equipped with the following hardware: 24-core Intel CPU 96 GB of memory 1 NVIDIA V100S GPU (with 32 GB memory) 1.5 TB SSD drive
Software Dependencies No The paper mentions using the NeuMF model (He et al., 2017) and refers to existing frameworks but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes Initially, the model undergoes a 20-epoch pre-training with a learning rate of 0.001 and a batch size of 256. Subsequent fine-tuning reinitializes the last layer, adjusting the learning rate to 0.0004 and applying a weight decay of $1 \times 10^{-7}$ over 120 epochs, with the learning rate reduced by a factor of 0.25 after 60 epochs. To streamline our experiment, we leverage the tuned results from K-SONG and adopt the best value for the hyper-parameter γ0, which is set to 0.3, serving as the base model hyper-parameter. The averaging parameters γ1, γ2, and γ3 are tuned from the set {0.2, 0.6, 1}.
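The fine-tuning schedule above is a single step-decay: learning rate 0.0004 for the first 60 of 120 epochs, then multiplied by 0.25. A minimal sketch of that schedule as a plain function (the function name is illustrative; in practice this would typically be expressed via a framework scheduler such as PyTorch's StepLR):

```python
def finetune_lr(epoch, base_lr=4e-4, drop_epoch=60, factor=0.25):
    """Step-decay schedule for the 120-epoch fine-tuning phase:
    base_lr until drop_epoch, then base_lr * factor afterwards."""
    return base_lr if epoch < drop_epoch else base_lr * factor
```

For example, epochs 0-59 train at 0.0004 and epochs 60-119 at 0.0001.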