Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Authors: Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that, utilizing an overparameterized two-layer neural network, temporal-difference and Q-learning globally minimize the mean-squared projected Bellman error at a sublinear rate. Moreover, the associated feature representation converges to the optimal one, generalizing the previous analysis of [21] in the neural tangent kernel regime, where the associated feature representation stabilizes at the initial one. The key to our analysis is a mean-field perspective |
| Researcher Affiliation | Academia | Yufeng Zhang (Northwestern University, Evanston, IL 60208); Qi Cai (Northwestern University, Evanston, IL 60208); Zhuoran Yang (Princeton University, Princeton, NJ 08544); Yongxin Chen (Georgia Institute of Technology, Atlanta, GA 30332); Zhaoran Wang (Northwestern University, Evanston, IL 60208) |
| Pseudocode | Yes | For an initial distribution ρ₀ ∈ 𝒫(ℝ^D), we initialize {θᵢ} i.i.d. ∼ ρ₀ (i ∈ [m]). See Algorithm 1 in §A for a detailed description. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments or use datasets, thus no information about public dataset availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, thus no information about training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe an experimental setup that would involve software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |