Policy Design for Two-sided Platforms with Participation Dynamics

Authors: Haruka Kiyohara, Fan Yao, Sarah Dean

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our control and game-theoretic findings warn against the use of the standard "myopic-greedy" policy and shed light on the importance of provider-side considerations (i.e., effectively distributing exposure among provider groups) to improve social welfare via population growth. We also present a simple algorithm to optimize long-term social welfare by taking the population effects into account, and demonstrate its effectiveness in synthetic and real-data experiments. Our experiment code is available at https://github.com/sdean-group/dynamics-twosided-market.
Researcher Affiliation Academia ¹Cornell University, ²University of Virginia. Correspondence to: Haruka Kiyohara <EMAIL>, Sarah Dean <EMAIL>.
Pseudocode No The paper describes an algorithm in Section 5 with numbered steps under the subsection "Estimating the dynamics" (1. For t ≤ Tb, deploy some epsilon-greedy policy... 2. For t > Tb, update the dynamics models...), but it is not formatted as a distinct pseudocode block or explicitly labeled as "Algorithm" or "Pseudocode".
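The two-phase procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the burn-in length `T_b`, the fixed utility estimates, and the plain greedy rule in phase 2 are all assumptions (a real implementation would re-fit the dynamics models from the logged data before acting).

```python
import numpy as np

def epsilon_greedy_action(utilities, epsilon, rng):
    """With probability epsilon pick a uniformly random arm;
    otherwise pick the arm with the highest estimated utility."""
    if rng.random() < epsilon:
        return int(rng.integers(len(utilities)))
    return int(np.argmax(utilities))

def run_two_phase(T, T_b, utilities, epsilon, seed=0):
    """Phase 1 (t < T_b): explore with epsilon-greedy to collect data.
    Phase 2 (t >= T_b): exploit; here a placeholder greedy rule stands
    in for the paper's model-based policy update."""
    rng = np.random.default_rng(seed)
    actions = []
    for t in range(T):
        if t < T_b:
            a = epsilon_greedy_action(utilities, epsilon, rng)
        else:
            a = int(np.argmax(utilities))  # placeholder for model-based policy
        actions.append(a)
    return actions
```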
Open Source Code Yes Our experiment code is available at https://github.com/sdean-group/dynamics-twosided-market.
Open Datasets Yes Finally, we propose a simple algorithm that balances the policy and population regrets by projecting the long-term population that will result from the current viewer satisfaction and provider exposure. The proposed look-ahead policy optimizes the utility at the projected long-term population instead of the immediate population. The synthetic and real-data experiments using the KuaiRec dataset (Gao et al., 2022) demonstrate that the proposed algorithm works better than both myopic-greedy and uniform random policies in multiple configurations by better trading off the long- and short-term goals while accounting for population growth.
Dataset Splits No The paper describes how subgroups are created (clustering viewers and providers into K=L=20 subgroups based on embeddings) and how populations are initialized (sampling from normal distributions), but does not provide specific details on training, validation, or test dataset splits for evaluation.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies No The paper mentions "PyTorch (Paszke et al., 2019)" and "SciPy (Virtanen et al., 2020)" with citations, but does not specify exact version numbers for these software libraries.
Experiment Setup Yes We first study the dynamics and the performance of the proposed method in a synthetic experiment. In this task, we use K = L = 20 subgroups. To define the base utility, we first sample 20-dimensional binary feature vectors (u_k, c_l) from a Bernoulli distribution for each viewer and provider group and let their inner products be the base utility b_{k,l} = u_k⊤ c_l, ∀(k, l) ∈ [K] × [L]. Then we simulate the following concave dynamics: λ_k(z) = λ_k^(max) (σ(z / τ_k^(λ)) − 0.5), (12) where σ(z) := 1/(1 + exp(−z)) is the sigmoid function, and λ(·) follows the upper half of the sigmoid function. ... We initialized the subgroup populations by sampling values from a normal distribution, so we have majority and minority subgroups at t = 0. Specifically, we use two initializations: (1) a small population (λ ∼ N(20, 10²)) and (2) a large one (λ ∼ N(100, 30²)) to see how policies perform in both increasing and decreasing dynamics. ... We run the compared methods for 200 timesteps and report the results in Figure 2. ... The lookahead policy is computed with gradient ascent on the objective (Eq. (10)) for 100 iterations. ... In this experiment, we estimate the dynamics functions λ and f using regression. We use the model F(z) = a0(1 − exp(−a1(z − a2))) + a3 due to its concavity and flexibility, and fit the parameters (a0, ..., a3) from interaction data as described in Section 5.1. Note that we add perturbations to the population dynamics, ξ_t, sampled from a normal distribution as ξ_t ∼ N(0.0, 0.01²) · λ_t (i.e., the scale of the perturbation is proportional to the population) to account for the difficulty of learning real-world dynamics. During the burn-in period (10 steps), we deploy epsilon-greedy with the corresponding value of β (= ϵ).
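The quoted setup — base utilities from binary features, the concave dynamics of Eq. (12), the regression model F(z), and the two population initializations — can be sketched as follows. This is an illustrative reconstruction under stated assumptions (a Bernoulli(0.5) feature distribution and arbitrary parameter values), not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def concave_dynamics(z, lam_max, tau):
    """Eq. (12): lambda_k(z) = lambda_k^(max) * (sigmoid(z / tau_k) - 0.5).
    For z >= 0 this is the upper half of a sigmoid: zero at z = 0,
    increasing, concave, and saturating toward lam_max / 2."""
    return lam_max * (sigmoid(z / tau) - 0.5)

def regression_model(z, a0, a1, a2, a3):
    """Fitted family F(z) = a0 * (1 - exp(-a1 * (z - a2))) + a3,
    concave in z for a0, a1 > 0; the four parameters are fit
    to logged interaction data by regression."""
    return a0 * (1.0 - np.exp(-a1 * (z - a2))) + a3

rng = np.random.default_rng(0)
K = L = 20

# 20-dim binary features per viewer/provider group; inner products
# give the base utility matrix b_{k,l} = u_k . c_l.
U = rng.binomial(1, 0.5, size=(K, 20))  # viewer-group features (assumed p=0.5)
C = rng.binomial(1, 0.5, size=(L, 20))  # provider-group features
B = U @ C.T                             # base utilities, shape (K, L)

# The two population initializations from the quoted setup.
lam_small = rng.normal(20, 10, size=K)   # lambda ~ N(20, 10^2)
lam_large = rng.normal(100, 30, size=K)  # lambda ~ N(100, 30^2)
```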