A Primal-dual Perspective for Distributed TD-learning
Authors: Han-Dong Lim, Donghwan Lee
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): This section provides the experimental results of Algorithm 1. First, we give an explanation of the MAMDP setup, where the number of states is three and the dimension of the feature is two. An agent can transit to every state with uniform probability. The feature matrix is set as Φ = [1 0; 1 0; 1 0], and rewards are generated uniformly at random in the interval (0, 10). The discount factor is set as 0.8. For each experiment with N ∈ {2^3, 2^5} agents, we construct a cycle, a graph G consisting of V := {1, 2, . . . , N} and E := {(i, i+1)}_{i=1}^{N−1} ∪ {(N, 1)}. The smallest non-zero eigenvalue of the graph Laplacian of a cycle with an even number of vertices decreases as the number of vertices increases, while the maximum eigenvalue remains the same: the smallest non-zero eigenvalue is 2 − 2cos(2π/N), and the largest eigenvalue is four [Mohar, 1997]. As N gets larger, the smallest non-zero eigenvalue gets smaller, becoming 0.59 and 0.04 for N = 2^3 and N = 2^5, respectively. Therefore, as the number of agents increases, the convergence rate becomes slower, as expected from Theorem 5; this can be verified in Figure (1a) and Figure (3) in the Appendix. The plots show results for constant step-sizes α_0 ∈ {2^−3, 2^−4, 2^−5, 2^−6}. Moreover, convergence under a diminishing step-size can be seen in Figure (1b). To investigate the effect of λ_max(L), we construct a star graph, where one vertex has degree N − 1 and the others have degree one. The maximum eigenvalue of a star graph is N and the smallest non-zero eigenvalue is one [Nica, 2016]. Even as N gets larger, Figures (1e) and (1f) show that neither the convergence rate nor the bias term varies. Therefore, we can expect that a tighter bound without λ_max(L) may hold in particular cases. To verify the effect of η, we use a random graph model [Erdős et al., 1960], where, among the N(N−1)/2 possible edges, (N−3)(N−4)/2 edges are randomly selected. Figure (1c) shows the evolution of the mean squared error for N = 32 and step-size 0.1 with different η values. When η = 0.5 or η = 1, the algorithm diverges. Moreover, the bias gets smaller around 2/λ_max(L) ≈ 0.046. This implies that an appropriate choice of η can control the variance when the number of neighbors is large, but if η is too small or too large, Algorithm 1 may diverge or incur a large bias. This matches the bound in Theorem 5. Similar arguments hold for N = 8, with the result given in Figure (1d). Lastly, a comparison with other algorithms is given in Appendix G. In summary, while no single algorithm consistently outperforms the others, the performance of methods that rely on a doubly stochastic matrix is highly sensitive to the choice of that matrix. |
| Researcher Affiliation | Academia | Han-Dong Lim, Donghwan Lee (Department of Electrical Engineering, KAIST) |
| Pseudocode | Yes | Algorithm 1 (Distributed TD-learning): Initialize α_0 ∈ (0, 1), {θ^i_0, w^i_0 ∈ R^q}_{i=1}^N, η ∈ (0, ∞). For k = 1, 2, . . . , T, each agent i = 1, 2, . . . , N observes o^i_k := (s_k, s'_k, r^i_k) and updates: δ(o^i_k; θ^i_k) = r^i_k + γ φ(s'_k)^⊤ θ^i_k − φ(s_k)^⊤ θ^i_k (6); θ^i_{k+1} = θ^i_k + α_k (δ(o^i_k; θ^i_k) φ(s_k) − η(&#124;N_i&#124; θ^i_k − Σ_{j∈N_i} θ^j_k) − η(&#124;N_i&#124; w^i_k − Σ_{j∈N_i} w^j_k)) (7); w^i_{k+1} = w^i_k + α_k η (&#124;N_i&#124; θ^i_k − Σ_{j∈N_i} θ^j_k) (8). |
| Open Source Code | Yes | The code is provided in this link. |
| Open Datasets | No | The paper describes its own MAMDP setup and generates rewards uniformly at random. It does not refer to any publicly available datasets or provide access information for any data used in the experiments. Text: "First, we give an explanation of the MAMDP setup, where the number of states is three and the dimension of the feature is two. An agent can transit to every state with uniform probability. The feature matrix is set as Φ = [1 0; 1 0; 1 0], and rewards are generated uniformly at random in the interval (0, 10). The discount factor is set as 0.8." |
| Dataset Splits | No | The paper uses a simulated multi-agent Markov decision process (MAMDP) setup rather than a traditional dataset with predefined splits. There is no mention of training, validation, or test splits. Text: "First, we give an explanation of the MAMDP setup, where the number of states is three and the dimension of the feature is two. An agent can transit to every state with uniform probability. The feature matrix is set as Φ = [1 0; 1 0; 1 0], and rewards are generated uniformly at random in the interval (0, 10). The discount factor is set as 0.8." |
| Hardware Specification | No | The paper describes the experimental results and setup parameters, but does not provide any specific details about the hardware (e.g., CPU, GPU, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions that code is provided in a link, but does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Section 5 (Experiments): This section provides the experimental results of Algorithm 1. First, we give an explanation of the MAMDP setup, where the number of states is three and the dimension of the feature is two. An agent can transit to every state with uniform probability. The feature matrix is set as Φ = [1 0; 1 0; 1 0], and rewards are generated uniformly at random in the interval (0, 10). The discount factor is set as 0.8. For each experiment with N ∈ {2^3, 2^5} agents, we construct a cycle, a graph G consisting of V := {1, 2, . . . , N} and E := {(i, i+1)}_{i=1}^{N−1} ∪ {(N, 1)}. The smallest non-zero eigenvalue of the graph Laplacian of a cycle with an even number of vertices decreases as the number of vertices increases, while the maximum eigenvalue remains the same: the smallest non-zero eigenvalue is 2 − 2cos(2π/N), and the largest eigenvalue is four [Mohar, 1997]. As N gets larger, the smallest non-zero eigenvalue gets smaller, becoming 0.59 and 0.04 for N = 2^3 and N = 2^5, respectively. Therefore, as the number of agents increases, the convergence rate becomes slower, as expected from Theorem 5; this can be verified in Figure (1a) and Figure (3) in the Appendix. The plots show results for constant step-sizes α_0 ∈ {2^−3, 2^−4, 2^−5, 2^−6}. Moreover, convergence under a diminishing step-size can be seen in Figure (1b). To investigate the effect of λ_max(L), we construct a star graph, where one vertex has degree N − 1 and the others have degree one. The maximum eigenvalue of a star graph is N and the smallest non-zero eigenvalue is one [Nica, 2016]. Even as N gets larger, Figures (1e) and (1f) show that neither the convergence rate nor the bias term varies. Therefore, we can expect that a tighter bound without λ_max(L) may hold in particular cases. To verify the effect of η, we use a random graph model [Erdős et al., 1960], where, among the N(N−1)/2 possible edges, (N−3)(N−4)/2 edges are randomly selected. Figure (1c) shows the evolution of the mean squared error for N = 32 and step-size 0.1 with different η values. When η = 0.5 or η = 1, the algorithm diverges. Moreover, the bias gets smaller around 2/λ_max(L) ≈ 0.046. This implies that an appropriate choice of η can control the variance when the number of neighbors is large, but if η is too small or too large, Algorithm 1 may diverge or incur a large bias. This matches the bound in Theorem 5. Similar arguments hold for N = 8, with the result given in Figure (1d). Lastly, a comparison with other algorithms is given in Appendix G. In summary, while no single algorithm consistently outperforms the others, the performance of methods that rely on a doubly stochastic matrix is highly sensitive to the choice of that matrix. |
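The quoted experiments lean on two spectral facts: a cycle's Laplacian has smallest non-zero eigenvalue 2 − 2cos(2π/N) and largest eigenvalue 4 (for even N), while a star graph has smallest non-zero eigenvalue 1 and largest eigenvalue N. A minimal NumPy sketch can sanity-check both claims; the helper names (`laplacian`, `cycle_edges`, `star_edges`) are illustrative, not from the paper.

```python
import numpy as np

def laplacian(edges, n):
    """Graph Laplacian L = D - A for an undirected graph on n vertices."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def cycle_edges(n):
    # Edges {(i, i+1)} for i = 1..N-1 plus the closing edge (N, 1).
    return [(i, (i + 1) % n) for i in range(n)]

def star_edges(n):
    # One hub (vertex 0) of degree n-1; all other vertices have degree one.
    return [(0, i) for i in range(1, n)]

for n in (8, 32):  # N = 2^3 and N = 2^5, as in the experiments
    ev = np.linalg.eigvalsh(laplacian(cycle_edges(n), n))  # ascending order
    # Smallest non-zero eigenvalue should match 2 - 2*cos(2*pi/n);
    # for even n the largest eigenvalue is exactly 4.
    print(f"cycle  N={n}: lambda_2={ev[1]:.3f}, lambda_max={ev[-1]:.3f}")

ev_star = np.linalg.eigvalsh(laplacian(star_edges(32), 32))
print(f"star   N=32: lambda_2={ev_star[1]:.3f}, lambda_max={ev_star[-1]:.3f}")
```

Running this reproduces the quoted values 0.59 and 0.04 for the cycle at N = 8 and N = 32, with the maximum eigenvalue pinned at 4, and confirms the star graph's spectrum grows with N only at the top.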
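The pseudocode row above flattens Algorithm 1's updates (6)-(8). A minimal Python sketch of one synchronous pass is below, assuming every agent sees the same shared state transition (s_k, s'_k) with its own reward, per the MAMDP setup; the constants, cycle topology, and one-hot-style features are illustrative placeholders, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, q = 4, 2          # agents on a cycle; feature dimension q
n_states = 3                # three states, as in the quoted setup
gamma, eta, alpha = 0.8, 0.1, 0.05

# Cycle-graph neighbor lists: agent i talks to i-1 and i+1 (mod N).
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

theta = [np.zeros(q) for _ in range(n_agents)]  # primal iterates theta^i_k
w = [np.zeros(q) for _ in range(n_agents)]      # dual iterates w^i_k

def step(phi_s, phi_s_next, rewards):
    """One synchronous pass of updates (6)-(8) for all agents."""
    new_theta, new_w = [], []
    for i in range(n_agents):
        # (6) TD error: r^i_k + gamma*phi(s')^T theta^i_k - phi(s)^T theta^i_k
        delta = rewards[i] + gamma * phi_s_next @ theta[i] - phi_s @ theta[i]
        # Local Laplacian terms |N_i| x^i_k - sum_{j in N_i} x^j_k
        lap_theta = len(neighbors[i]) * theta[i] - sum(theta[j] for j in neighbors[i])
        lap_w = len(neighbors[i]) * w[i] - sum(w[j] for j in neighbors[i])
        # (7) primal update and (8) dual update
        new_theta.append(theta[i] + alpha * (delta * phi_s - eta * lap_theta - eta * lap_w))
        new_w.append(w[i] + alpha * eta * lap_theta)
    theta[:], w[:] = new_theta, new_w

# Illustrative driver: uniform transitions, rewards Uniform(0, 10).
phi = np.eye(n_states, q)  # hypothetical 3x2 feature matrix
for _ in range(200):
    s, s_next = rng.integers(n_states), rng.integers(n_states)
    step(phi[s], phi[s_next], rng.uniform(0, 10, size=n_agents))
```

The sketch mirrors the algorithm's structure: the dual variable w^i accumulates the local disagreement η(|N_i|θ^i − Σ_j θ^j), which is what gives the method its primal-dual character.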