Neural Contextual Bandits with Deep Representation and Shallow Exploration

Authors: Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on contextual bandit problems based on real-world datasets, demonstrating better performance and computational efficiency of NeuralLinUCB over LinUCB and existing neural bandit algorithms such as NeuralUCB, which aligns well with our theory."
Researcher Affiliation | Collaboration | Pan Xu (California Institute of Technology), Zheng Wen (DeepMind), Handong Zhao (Adobe Research), Quanquan Gu (University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1: Deep Representation and Shallow Exploration (NeuralLinUCB); Algorithm 2: Update Weight Parameters with Gradient Descent
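To illustrate the "deep representation, shallow exploration" idea named in Algorithm 1, here is a minimal NumPy sketch: LinUCB-style exploration run on the last hidden layer of a ReLU network used as the context representation. The network, reward, and all function names are illustrative assumptions, not the authors' code; only the structure (UCB over learned features) follows the algorithm's description.

```python
import numpy as np

def relu_features(x, W1, W2):
    """Toy 2-layer ReLU network (the paper uses L = 2); the last hidden
    layer serves as the feature map phi(x) for the linear bandit."""
    return np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

def ucb_action(phis, theta, A_inv, alpha):
    """Pick the arm maximizing phi^T theta + alpha * sqrt(phi^T A^{-1} phi)."""
    scores = [phi @ theta + alpha * np.sqrt(phi @ A_inv @ phi) for phi in phis]
    return int(np.argmax(scores))

def linucb_update(A, b, phi, reward):
    """Accumulate the design matrix A and response vector b, re-solve theta."""
    A += np.outer(phi, phi)
    b += reward * phi
    return A, b, np.linalg.solve(A, b)

# Toy simulation (dimensions shrunk for speed; the paper uses m = 100).
rng = np.random.default_rng(0)
d, m, K = 5, 8, 3                      # input dim, feature width, arms
W1, W2 = rng.normal(size=(m, d)), rng.normal(size=(m, m))
A, b, theta = np.eye(m), np.zeros(m), np.zeros(m)   # lambda = 1 prior
for t in range(50):
    xs = [rng.normal(size=d) for _ in range(K)]
    phis = [relu_features(x, W1, W2) for x in xs]
    a = ucb_action(phis, theta, np.linalg.inv(A), alpha=0.02)
    r = float(phis[a].sum() > 0)       # placeholder reward signal
    A, b, theta = linucb_update(A, b, phis[a], r)
```

In the full algorithm the weights `W1, W2` would also be refit by gradient descent every `H` rounds (Algorithm 2); here they stay fixed to keep the exploration step isolated.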
Open Source Code | No | The paper does not provide a statement or link for open-sourcing the code.
Open Datasets | Yes | "Specifically, following the experimental setting in Zhou et al. (2020), we use the datasets Statlog (Shuttle), Magic, and Covertype from the UCI machine learning repository (Dua & Graff, 2017), and the MNIST dataset from LeCun et al. (1998)."
Dataset Splits | No | The paper does not describe train/validation/test dataset splits.
Hardware Specification | Yes | "All numerical experiments were run on a workstation with Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz."
Software Dependencies | No | The paper mentions using a 'ReLU neural network' and 'stochastic gradient descent' but does not specify software versions for libraries such as PyTorch, TensorFlow, or scikit-learn.
Experiment Setup | Yes | "We use a ReLU neural network defined as in (2.3) with L = 2 and m = 100 for the UCI datasets (Statlog, Magic, Covertype). ... We set the time horizon T = 15,000. ... We use stochastic gradient descent to optimize the network weights, with a step size η_q = 1e-5 and maximum iteration number n = 1,000. ... the network parameter w is updated every H = 100 rounds. ... We set λ = 1 and α_t = 0.02 for all algorithms, t ∈ [T]."
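For anyone attempting a reproduction, the quoted hyperparameters can be collected into a single configuration sketch. The values are exactly those stated above; the dictionary layout and key names are assumptions for illustration only.

```python
# Hyperparameters quoted in the reproducibility report above.
# The structure of this dict is illustrative, not from the paper.
experiment_config = {
    "network": {"type": "ReLU", "L": 2, "m": 100},  # for UCI datasets
    "horizon_T": 15_000,
    "optimizer": {
        "name": "SGD",          # stochastic gradient descent
        "step_size": 1e-5,      # eta_q
        "max_iterations": 1_000,  # n
    },
    "weight_update_period_H": 100,  # refit network weights every H rounds
    "regularization_lambda": 1.0,
    "exploration_alpha": 0.02,      # alpha_t, constant for all t in [T]
}
```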