Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Authors: Swapna Buccapatnam, Fang Liu, Atilla Eryilmaz, Ness B. Shroff
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies. Keywords: Multi-armed Bandits, Side Observations, Bipartite Graph, Regret Bounds |
| Researcher Affiliation | Collaboration | Swapna Buccapatnam, AT&T Labs Research, Middletown, NJ 07748, USA; Fang Liu and Atilla Eryilmaz, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA |
| Pseudocode | Yes | Algorithm 1: ϵt-greedy-LP; Algorithm 2: UCB-LP policy |
| Open Source Code | No | The paper does not provide an explicit statement about code release or a link to a code repository. The license mentioned is for the paper itself, not the methodology's implementation. |
| Open Datasets | Yes | We consider the Flixster network dataset for the numerical evaluation of our algorithms. The authors in Jamali and Ester (2010) collected this social network data, which contains about 1 million users and 14 million links. |
| Dataset Splits | Yes | We use graph clustering by Dhillon et al. (2007) to identify two strongly clustered sub-networks of sizes 1000 and 2000 nodes. (Note: this describes a partition of the network into evaluation sub-networks rather than a conventional train/test split.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It only mentions 'simulation' and 'numerical evaluation'. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | For the ϵt-greedy-LP policy, the paper reports two parameter settings: c = 5 and d = 0.2 (d = 0.2 is chosen to show that the algorithm performs well in more general settings, even when the bounds in Proposition 7 are not known or used), and c = 4 and d = 0.05. For both networks, the proposed policies outperform the UCB-N and UCB-MaxN policies of Caron et al. (2012). |
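The c and d parameters above control a decaying exploration schedule. As a point of reference, the classical ϵt-greedy policy of Auer et al. (2002) explores with probability ε_t = min(1, cK/(d²t)); the paper's ϵt-greedy-LP variant replaces uniform exploration with an LP-derived distribution that exploits side-observations, which is not reproduced here. The sketch below shows only the plain schedule and greedy loop, using the paper's reported c = 5, d = 0.2; the Bernoulli arm means are illustrative assumptions, not values from the paper.

```python
import random

def epsilon_t(t, c, d, K):
    # Classical decaying schedule eps_t = min(1, c*K / (d^2 * t)).
    # The -LP variant would draw the exploration arm from an
    # LP-optimized distribution instead of uniformly.
    return min(1.0, c * K / (d * d * t))

def eps_greedy(means, horizon, c=5.0, d=0.2, seed=0):
    """Plain epsilon_t-greedy on Bernoulli arms; returns empirical means."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, horizon + 1):
        if rng.random() < epsilon_t(t, c, d, K):
            arm = rng.randrange(K)  # explore (uniform here; LP-based in the paper)
        else:
            # exploit: highest empirical mean, preferring unpulled arms
            arm = max(range(K),
                      key=lambda i: sums[i] / counts[i] if counts[i] else float("inf"))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(K)]
```

With c = 5, d = 0.2, and K = 2 arms, ε_t stays at 1 until t exceeds cK/d² = 250, so early rounds are pure exploration before the greedy choice dominates.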