Lenient Learning in Independent-Learner Stochastic Cooperative Games

Authors: Ermo Wei, Sean Luke

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discuss the existing literature, then compare LMRL2 against other algorithms drawn from the literature that can be used for games of this kind: traditional (Distributed) Q-learning, Hysteretic Q-learning, WoLF-PHC, SOoN, and (for repeated games only) FMQ. We tested against twelve games, either from the literature or of our own devising; this collection was meant to test a diverse array of situations... The results show that LMRL2 is very effective in both of our measures (complete and correct policies), and is found in the top rank more often than any other technique. Table 4 summarizes the rank order among the methods and the statistically significant differences. Table 5 shows the actual results.
Researcher Affiliation | Academia | Ermo Wei (EMAIL), Sean Luke (EMAIL), Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax, VA 22030, USA
Pseudocode | Yes | Section 4 (The LMRL2 Algorithm): LMRL2 then iterates n times through the following four steps. First, it computes a mean temperature T. Second, using this mean temperature, it selects an action to perform. Third, it performs the action and receives a reward resulting from the joint actions of all agents (all agents perform this step simultaneously and synchronously). Fourth, it updates the Q and T tables.
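The four-step loop quoted above can be sketched for a single-state repeated game. This is an illustrative sketch, not the authors' implementation: all class and method names are hypothetical, and the lenience rule used here (ignore a Q-lowering reward with probability exp(-θ/T), so hot actions are forgiven more often) is a simplified stand-in for LMRL2's actual update.

```python
import math
import random

class LenientLearner:
    """Hypothetical sketch of the four-step LMRL2-style loop
    for a single-state repeated game. Not the paper's exact update."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, delta=0.995,
                 max_temp=50.0, min_temp=2.0, omega=1.0, theta=1.0):
        self.alpha, self.gamma, self.delta = alpha, gamma, delta
        self.min_temp, self.omega, self.theta = min_temp, omega, theta
        self.q = [0.0] * n_actions        # Q table
        self.t = [max_temp] * n_actions   # per-action temperature table

    def select_action(self):
        # Step 1: compute a mean temperature over this state's actions.
        mean_t = sum(self.t) / len(self.t)
        # Step 2: Boltzmann selection, moderated by omega.
        prefs = [math.exp(q / (self.omega * mean_t)) for q in self.q]
        total = sum(prefs)
        r, acc = random.random() * total, 0.0
        for a, p in enumerate(prefs):
            acc += p
            if r <= acc:
                return a
        return len(prefs) - 1

    def update(self, action, reward):
        # Step 4: lenient update. While the chosen action's temperature
        # is high, a reward that would lower Q is usually ignored
        # (simplified rule: ignore with probability exp(-theta / T)).
        lenience = math.exp(-self.theta / self.t[action])
        if reward >= self.q[action] or random.random() > lenience:
            self.q[action] += self.alpha * (reward - self.q[action])
        # Cool the chosen action's temperature toward the minimum.
        self.t[action] = max(self.min_temp, self.t[action] * self.delta)
```

Step 3 (performing the joint action and producing the shared reward) belongs to the environment: two such learners each call `select_action()`, the game maps the joint action to a reward, and each learner calls `update()` with it.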
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for LMRL2 or a link to a code repository. It focuses on describing the algorithm and its experimental results.
Open Datasets | Yes | We tested with four repeated-game test problems from the literature. The widely used Climb and Penalty games (Claus and Boutilier, 1998) are designed to test some degree of relative overgeneralization and miscoordination. We also included versions of the Climb game with partially stochastic and fully stochastic rewards, here designated Climb-PS and Climb-FS respectively (Kapetanakis and Kudenko, 2002). We also tested against several stochastic games. The Boutilier game (Boutilier, 1999) is a stochastic game with deterministic transitions that distributes a miscoordination situation among several stages. The Common Interest game (Vrancx et al., 2008) is also recurrent, but with stochastic transitions and some miscoordination.
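For concreteness, the two best-known repeated games named above have the following shared-payoff matrices, as commonly given in the literature (treat the exact entries as a sketch and consult the cited papers for the canonical versions; the helper names are illustrative):

```python
# Climb game (Claus and Boutilier, 1998), as commonly given in the
# literature. Rows index agent 1's action, columns agent 2's action;
# both agents receive the same reward in these cooperative games.
CLIMB = [
    [ 11, -30,   0],   # joint optimum at (0, 0), but miscoordination
    [-30,   7,   6],   # around it is punished severely (-30)
    [  0,   0,   5],
]

def penalty(k):
    """Penalty game with penalty k <= 0 for miscoordinating on the
    two optimal corners (0, 2) and (2, 0)."""
    return [
        [10, 0,  k],
        [ 0, 2,  0],
        [ k, 0, 10],
    ]

def play(payoffs, a1, a2):
    # Step 3 of a learning loop: joint action -> shared reward.
    return payoffs[a1][a2]
```

Climb illustrates relative overgeneralization: averaged over a uniformly random partner, action 0 yields (11 - 30 + 0)/3 while action 2 yields (0 + 0 + 5)/3, so non-lenient independent learners are dragged toward the suboptimal action 2.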
Dataset Splits | No | The paper describes experiments conducted on various 'games' or 'test problems' where agents learn policies. It specifies running '10,000 iterations up front' and '1000 independent runs' for statistical comparison, but it does not describe traditional dataset splits (e.g., training, validation, test sets) for static data.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the LMRL2 algorithm and compares it to other multi-agent reinforcement learning techniques, but it does not specify any software libraries, frameworks, or programming language versions used for its implementation.
Experiment Setup | Yes | LMRL2 for repeated games relies on the following parameters. Except for θ and ω, all of them will be fixed to the defaults shown, and will not be modified: α = 0.1 (learning rate); γ = 0.9 (discount for infinite horizon); δ = 0.995 (temperature decay coefficient); Max Temp = 50.0 (maximum temperature); Min Temp = 2.0 (minimum temperature); ω > 0 (action selection moderation factor, by default 1.0); θ > 0 (lenience moderation factor, by default 1.0). Table 2: Default Parameter Settings for each technique. Table 3: Tuned Parameter Settings.
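Assuming the temperature decays geometrically by δ on each update (a natural reading of "temperature decay coefficient"; the helper names below are illustrative, not from the paper), the defaults above imply a concrete annealing schedule and a point at which the temperature floor is reached:

```python
import math

# Defaults from the parameter table above.
MAX_TEMP, MIN_TEMP, DELTA = 50.0, 2.0, 0.995

def temperature(n):
    """Temperature after n geometric decay steps, floored at MIN_TEMP:
    T_n = max(MIN_TEMP, MAX_TEMP * DELTA**n)."""
    return max(MIN_TEMP, MAX_TEMP * DELTA ** n)

# Smallest n with MAX_TEMP * DELTA**n <= MIN_TEMP, i.e. the number of
# updates before an action's temperature hits the floor.
n_floor = math.ceil(math.log(MIN_TEMP / MAX_TEMP) / math.log(DELTA))
```

Under this reading, an action's temperature stays above the 2.0 floor for roughly 640 updates, which bounds how long lenience and exploration remain in effect for that action.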