Deep Equilibrium Algorithmic Reasoning

Authors: Dobrik Georgiev, Joseph Wilson, Davide Buffelli, Pietro Liò

NeurIPS 2024

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response quoted from the paper.
Research Type: Experimental. "Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium." From Section 5 (Evaluation Setup): "For each algorithm we generated 10^5/100/100-sized training/validation/test datasets."
Researcher Affiliation: Collaboration. Dobrik Georgiev (University of Cambridge), JJ Wilson (Independent Researcher), Davide Buffelli (MediaTek Research), Pietro Liò (University of Cambridge).
Pseudocode: Yes. "Listing 1: BFS algorithm. Clearly implemented as while b do c."
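
The quoted line refers to the paper's Listing 1. As a rough illustration only (this is a standard textbook BFS, not the paper's listing), here is how BFS fits the "while b do c" pattern the paper casts algorithms into:

```python
from collections import deque

def bfs(adj, source):
    """Standard BFS phrased as 'while b do c': repeat the body while the frontier is non-empty."""
    visited = {source}
    frontier = deque([source])
    while frontier:               # b: the frontier is non-empty
        u = frontier.popleft()    # c: expand one node of the frontier
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                frontier.append(v)
    return visited
```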
Open Source Code: Yes. "Source code available here: https://github.com/HekpoMaH/DEAR"
Open Datasets: Yes. "Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium."
Dataset Splits: Yes. "For each algorithm we generated 10^5/100/100-sized training/validation/test datasets. Training sample sizes vary between 8 and 16 elements (uniformly randomly chosen); validation samples are of size 16."
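
A minimal sketch of what those splits imply, in plain Python. The sample counts and the 8-16 training / 16 validation size rule come from the quote; the generator itself is a hypothetical stand-in, not the paper's data-generation code:

```python
import random

random.seed(0)

NUM_TRAIN, NUM_VAL, NUM_TEST = 10**5, 100, 100

# Training graphs: sizes drawn uniformly at random from 8..16 elements.
train_sizes = [random.randint(8, 16) for _ in range(NUM_TRAIN)]
# Validation graphs are fixed at 16 elements; the quote does not state the test size.
val_sizes = [16] * NUM_VAL
```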
Hardware Specification: Yes. "If run on a single 4090 GPU one would need about 3 weeks of total compute."
Software Dependencies: Yes. "The final node embeddings, from which the output is decoded, are the solution to the equation: $H^{(\infty)} = P_{UE}(H^{(\infty)})$ (4), where $P_{UE}(H^{(\infty)}) = P(H^{(\infty)}, U, E)$, $U$/$E$ are the encoded node and edge feature matrices, and $P$ is the processor function. $H^{(t)}$ are the stacked latent states of the nodes at timestep $t$ (with $H^{(0)} = 0$). The above Equation 4 matches the signature of Equation 2, and can be solved via root-finding (we employ torchdeq [19]; MIT License), as if it is $f_\theta$ of a DEQ. Our differences are mostly required by software engineering rather than research, hence they live here. Differences are: a different DL framework (PyTorch [36])."
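
As a hedged sketch of what solving Equation 4 looks like, here is naive fixed-point iteration over a hypothetical processor function `processor(H, U, E)`. The paper itself solves the equation with torchdeq's root-finding solvers rather than plain iteration, so this only illustrates the equilibrium formulation:

```python
import torch

def solve_equilibrium(processor, U, E, num_nodes, dim, max_iter=100, tol=1e-4):
    """Find H* with H* = P(H*, U, E) by naive fixed-point iteration.

    `processor` is a stand-in for the paper's processor network P;
    torchdeq's root-finding solvers would replace this loop in practice.
    """
    H = torch.zeros(num_nodes, dim)           # H^(0) = 0, as in the paper
    for _ in range(max_iter):
        H_next = processor(H, U, E)           # one application of P_UE
        rel_err = torch.norm(H_next - H) / (torch.norm(H) + 1e-8)
        H = H_next
        if rel_err < tol:                     # stop once H is (approximately) a fixed point
            break
    return H                                  # approximate H^(infinity)
```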
Experiment Setup: Yes. "In our experiments the models have a latent dimensionality of 128, the batch size is 32, the learning rate is 3 × 10^-4 and we use the Adam optimizer [31]. We train our algorithmic reasoners for 100 epochs, choosing the model with the lowest task validation loss (discounting any regularisation; focusing on performance only) for testing."
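
A minimal PyTorch sketch of that setup. The hyperparameters are the quoted ones; `build_reasoner`, the data loaders, `train_one_epoch`, and `task_loss` are hypothetical placeholders, not the paper's code:

```python
import torch

# Quoted hyperparameters: latent dim 128, batch size 32, lr 3e-4, 100 epochs.
model = build_reasoner(latent_dim=128)               # hypothetical constructor
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

best_val, best_state = float("inf"), None
for epoch in range(100):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper; batches of 32
    val_loss = task_loss(model, val_loader)          # task loss only, no regularisation terms
    if val_loss < best_val:                          # keep the best-validation checkpoint
        best_val = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)                    # this checkpoint is used for testing
```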