Deep Equilibrium Algorithmic Reasoning

Authors: Dobrik Georgiev, Joseph Wilson, Davide Buffelli, Pietro Liò

NeurIPS 2024

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response quoted from the paper.
Research Type: Experimental. "Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium." From Section 5 (Evaluation Setup): "For each algorithm we generated 10^5/100/100-sized training/validation/test datasets."
Researcher Affiliation: Collaboration. Dobrik Georgiev (University of Cambridge), JJ Wilson (Independent Researcher), Davide Buffelli (MediaTek Research), Pietro Liò (University of Cambridge).
Pseudocode: Yes. "Listing 1: BFS algorithm. Clearly implemented as while b do c."
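
The quoted line refers to the paper's Listing 1. As a rough illustration only (this is a standard textbook BFS, not the paper's listing), here is how BFS fits the "while b do c" pattern the paper casts algorithms into:

```python
from collections import deque

def bfs(adj, source):
    """Standard BFS phrased as 'while b do c': repeat the body while the frontier is non-empty."""
    visited = {source}
    frontier = deque([source])
    while frontier:               # b: the frontier is non-empty
        u = frontier.popleft()    # c: expand one node of the frontier
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                frontier.append(v)
    return visited
```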
Open Source Code: Yes. "Source code available here: https://github.com/HekpoMaH/DEAR"
Open Datasets: Yes. "Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium."
Dataset Splits: Yes. "For each algorithm we generated 10^5/100/100-sized training/validation/test datasets. Training sample sizes vary between 8 and 16 elements (uniformly randomly chosen); validation samples are of size 16."
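
A minimal sketch of what those splits imply, in plain Python. The sample counts and the 8-16 training / 16 validation size rule come from the quote; the generator itself is a hypothetical stand-in, not the paper's data-generation code:

```python
import random

random.seed(0)

NUM_TRAIN, NUM_VAL, NUM_TEST = 10**5, 100, 100

# Training graphs: sizes drawn uniformly at random from 8..16 elements.
train_sizes = [random.randint(8, 16) for _ in range(NUM_TRAIN)]
# Validation graphs are fixed at 16 elements; the quote does not state the test size.
val_sizes = [16] * NUM_VAL
```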
Hardware Specification: Yes. "If run on a single 4090 GPU one would need about 3 weeks of total compute."
Software Dependencies: Yes. "The final node embeddings, from which the output is decoded, are the solution to the equation: $H^{(\infty)} = P_{UE}(H^{(\infty)})$ (4), where $P_{UE}(H^{(\infty)}) = P(H^{(\infty)}, U, E)$, $U$/$E$ are the encoded node and edge feature matrices, and $P$ is the processor function. $H^{(t)}$ are the stacked latent states of the nodes at timestep $t$ (with $H^{(0)} = 0$). The above Equation 4 matches the signature of Equation 2, and can be solved via root-finding (we employ torchdeq [19]; MIT License), as if it is $f_\theta$ of a DEQ. Our differences are mostly required by software engineering rather than research, hence they live here. Differences are: a different DL framework (PyTorch [36])."
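
As a hedged sketch of what solving Equation 4 looks like, here is naive fixed-point iteration over a hypothetical processor function `processor(H, U, E)`. The paper itself solves the equation with torchdeq's root-finding solvers rather than plain iteration, so this only illustrates the equilibrium formulation:

```python
import torch

def solve_equilibrium(processor, U, E, num_nodes, dim, max_iter=100, tol=1e-4):
    """Find H* with H* = P(H*, U, E) by naive fixed-point iteration.

    `processor` is a stand-in for the paper's processor network P;
    torchdeq's root-finding solvers would replace this loop in practice.
    """
    H = torch.zeros(num_nodes, dim)           # H^(0) = 0, as in the paper
    for _ in range(max_iter):
        H_next = processor(H, U, E)           # one application of P_UE
        rel_err = torch.norm(H_next - H) / (torch.norm(H) + 1e-8)
        H = H_next
        if rel_err < tol:                     # stop once H is (approximately) a fixed point
            break
    return H                                  # approximate H^(infinity)
```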
Experiment Setup: Yes. "In our experiments the models have a latent dimensionality of 128, the batch size is 32, the learning rate is 3 × 10^-4 and we use the Adam optimizer [31]. We train our algorithmic reasoners for 100 epochs, choosing the model with the lowest task validation loss (discounting any regularisation; focusing on performance only) for testing."
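
A minimal PyTorch sketch of that setup. The hyperparameters are the quoted ones; `build_reasoner`, the data loaders, `train_one_epoch`, and `task_loss` are hypothetical placeholders, not the paper's code:

```python
import torch

# Quoted hyperparameters: latent dim 128, batch size 32, lr 3e-4, 100 epochs.
model = build_reasoner(latent_dim=128)               # hypothetical constructor
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

best_val, best_state = float("inf"), None
for epoch in range(100):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper; batches of 32
    val_loss = task_loss(model, val_loader)          # task loss only, no regularisation terms
    if val_loss < best_val:                          # keep the best-validation checkpoint
        best_val = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)                    # this checkpoint is used for testing
```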