Learning to Rank with Top-$K$ Fairness

Authors: Boyang Zhang, Quanqi Hu, Mingxuan Sun, Qihang Lin, Tianbao Yang

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To empirically validate the effectiveness of our method, we conduct a comprehensive set of experiments using popular benchmark datasets. The experimental results demonstrate that our method not only achieves high ranking accuracy but also significantly alleviates exposure disparities at top-K positions when compared to several state-of-the-art methods.
Researcher Affiliation Academia Boyang Zhang EMAIL Computer Science & Engineering, Louisiana State University Quanqi Hu EMAIL Computer Science & Engineering, Texas A&M University Mingxuan Sun EMAIL Computer Science & Engineering, Louisiana State University Qihang Lin EMAIL Business Analytics, University of Iowa Tianbao Yang EMAIL Computer Science & Engineering, Texas A&M University
Pseudocode Yes Algorithm 1 Stochastic Optimization of top-$K$ Ranking with Exposure Disparity (KSO-RED):
1: for $t = 0, \ldots, T-1$ do
2: Draw sample batches $B \subset S$ and let $B_Q$ be the set of $q$'s in $B$.
3: For each $q \in B_Q$, draw sample batches $B_q \subset S_q$, $B_q^a \subset S_a$, $B_q^b \subset S_b$
4: for $(q, x_i^q) \in B$ do
5: Compute $\hat{g}_{q,i}(w_t)$ and $u_{q,i}^{(t+1)}$.
6: end for
7: for $q \in B_Q$ do
8: Compute $\hat{g}_{q,a}(w_t)$, $\hat{g}_{q,b}(w_t)$, $\hat{g}_q(w_t)$, $u_{q,a}^{(t+1)}$, $u_{q,b}^{(t+1)}$, $u_q^{(t+1)}$, $s_{q,t+1}$, $v_{q,t+1}$, and $\lambda_{q,t+1}$.
9: end for
10: Compute $G_1^t$ and $G_2^t$ according to (18) and (22).
11: Update $z_{t+1} = (1-\gamma_5) z_t + \gamma_5 (G_1^t + C\, G_2^t)$
12: Update $w_{t+1} = w_t - \eta_1 z_{t+1}$
13: end for
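Lines 11-12 of Algorithm 1 form the parameter update: a moving average $z$ of the combined stochastic gradient estimators, followed by a descent step. A minimal sketch of that update, assuming $G_1^t$ and $G_2^t$ (the estimators from Eqs. (18) and (22) of the paper) are already computed and passed in as arrays; the function name and default values here are illustrative, not from the paper:

```python
import numpy as np

def kso_red_step(w, z, G1, G2, C=1.0, gamma5=0.2, eta1=4e-4):
    """One KSO-RED parameter update (lines 11-12 of Algorithm 1).

    G1, G2: stochastic gradient estimators (Eqs. (18) and (22));
    C: weight on the fairness gradient; gamma5: moving-average rate;
    eta1: step size. Returns the updated (w, z)."""
    z = (1 - gamma5) * z + gamma5 * (G1 + C * G2)  # line 11: gradient average
    w = w - eta1 * z                               # line 12: descent step
    return w, z
```

The moving average $z$ plays the role of a momentum-style gradient estimate, which is what makes the stochastic compositional estimators usable with small batches.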
Open Source Code Yes KSO-RED: Our top-K Stochastic Optimization for both top-K NDCG and top-K Ranking Exposure Disparity defined in (11). The code is available at: GitHub repository.
Open Datasets Yes MovieLens 20M (Harper & Konstan, 2015): This dataset comprises 20 million ratings from 138,000 users across 27,000 movies. ... Netflix Prize dataset (Bennett et al., 2007): Originally containing 100 million ratings for 17,770 movies from 480,189 users, we use a random subset of 20 million ratings for computational feasibility, maintaining a similar structure including movie name, genre, and year.
Dataset Splits Yes For model training and evaluation, we adopt a conventional split of training, validation, and test as in Wang et al. (2020); Qiu et al. (2022). Specifically, we adopt the testing protocol by sampling 5 rated and 300 unrated items per user to evaluate the NDCG and fairness metrics, whereas the training employs a similar protocol as in Wang et al. (2020).
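The testing protocol above builds a per-user candidate set of 5 rated and 300 unrated items, and NDCG and fairness metrics are computed over that set. A minimal sketch of the per-user sampling, assuming `rated` and `unrated` are sets of item IDs (the function name and seeding are illustrative, not from Wang et al. (2020)):

```python
import random

def sample_eval_items(rated, unrated, n_pos=5, n_neg=300, seed=0):
    """Per-user evaluation candidates: n_pos rated + n_neg unrated items.

    Sorting before sampling makes the draw reproducible for a fixed
    seed regardless of set iteration order."""
    rng = random.Random(seed)
    pos = rng.sample(sorted(rated), min(n_pos, len(rated)))
    neg = rng.sample(sorted(unrated), min(n_neg, len(unrated)))
    return pos, neg
```

Ranking the 305 candidates per user and truncating at position $K$ then yields the top-$K$ NDCG and exposure statistics.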
Hardware Specification Yes Our main experiments were conducted on a system equipped with the following hardware: 24-core Intel CPU 96 GB of memory 1 NVIDIA V100S GPU (with 32 GB memory) 1.5 TB SSD drive
Software Dependencies No The paper mentions using the NeuMF model (He et al., 2017) and refers to existing frameworks but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes Initially, the model undergoes a 20-epoch pre-training with a learning rate of 0.001 and a batch size of 256. Subsequent fine-tuning reinitializes the last layer, adjusting the learning rate to 0.0004 and applying a weight decay of $1 \times 10^{-7}$ over 120 epochs, with the learning rate reduced by a factor of 0.25 after 60 epochs. To streamline our experiment, we leverage the tuned results from K-SONG and adopt the best value for the hyper-parameter γ0, which is set to 0.3, serving as the base model hyper-parameter. The averaging parameters γ1, γ2, and γ3 are tuned from the set {0.2, 0.6, 1}.
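The fine-tuning schedule above is a single step-decay: learning rate 0.0004 for the first 60 of 120 epochs, then multiplied by 0.25. A minimal sketch of that schedule as a plain function (the function name is illustrative; in practice this would typically be expressed via a framework scheduler such as PyTorch's StepLR):

```python
def finetune_lr(epoch, base_lr=4e-4, drop_epoch=60, factor=0.25):
    """Step-decay schedule for the 120-epoch fine-tuning phase:
    base_lr until drop_epoch, then base_lr * factor afterwards."""
    return base_lr if epoch < drop_epoch else base_lr * factor
```

For example, epochs 0-59 train at 0.0004 and epochs 60-119 at 0.0001.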