Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Authors: Andreas Boltres, Niklas Freymuth, Patrick Jahnke, Holger Karl, Gerhard Neumann

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "All findings are backed by extensive experiments in realistic network conditions in our fast and versatile training and evaluation framework."
Researcher Affiliation | Collaboration | Andreas Boltres (Autonomous Learning Robots, Karlsruhe Institute of Technology; SAP SE); Niklas Freymuth (Autonomous Learning Robots, Karlsruhe Institute of Technology); Patrick Jahnke (Turba AI); Holger Karl (Internet-Technology and Softwarization, Hasso-Plattner-Institut Potsdam); Gerhard Neumann (Autonomous Learning Robots, Karlsruhe Institute of Technology)
Pseudocode | No | The paper describes algorithms and models using mathematical formulations and architectural diagrams (e.g., Figure 13 for the MPN architecture), but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | Code available via the project webpage: https://alrhub.github.io/packerl-website/
Open Datasets | No | "We do not work with pre-generated datasets of network scenarios. Instead, using synnet and controllable random seeds for training and evaluation, we generate a new network topology, traffic demands and link failures for all timesteps at the start of every episode."
Dataset Splits | Yes | "The performance values presented in this work are obtained by taking the mean over 100 evaluation episodes, except for nx XL for which we use 30 evaluation episodes."
Hardware Specification | Yes | Training runs "take between 3 and 14 hours of training on 4 cores of an Intel Xeon Gold 6230 CPU."
Software Dependencies | No | The paper mentions several software components and frameworks, such as "ns-3", "ns3-ai", "Gymnasium", "PPO", "TensorFlow", "PyTorch", and the "Adam optimizer", along with their respective originating papers. However, it does not provide specific version numbers for these dependencies, which are required for a reproducible description.
Experiment Setup | Yes | "We train M-Slim and Field Lines on 16 random seeds for 100 iterations of 16 episodes each. We use PPO (Schulman et al., 2017) and refer to Section B.7 for hyperparameter details. Given the episode length H = 100, each training iteration of 16 episodes by default uses 1600 sampled environment transitions to do 10 update epochs with a minibatch size of 400. We multiply the value loss function with a factor of 0.5, clip the gradient norm to 0.5 and use policy and value clip ratios of 0.2 as per Schulman et al. (2018). We use a discount factor of γ = 0.99 and use λ_GAE = 0.95 for Generalized Advantage Estimation (Andrychowicz et al., 2020). ... For the learnable softmax temperature τ_ψ used during exploration by Field Lines' selector module ψ, we use an initial value of 4. For the learnable standard deviation σ_M-Slim, we use an initial value of 1. ... use the Adam optimizer with a learning rate of α = 5e-5 (Kingma & Ba, 2014) for Field Lines, and 3e-3 for M-Slim."
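The reported setup can be collected into a single configuration sketch. This is a hypothetical illustration only: the field names below are invented for readability and do not come from the authors' codebase; the values are the ones quoted above. It also checks the paper's arithmetic that 16 episodes of length H = 100 yield 1600 transitions per iteration, i.e. 4 minibatches of size 400 per update epoch.

```python
# Hypothetical PPO configuration assembled from the quoted experiment setup.
# Key names are illustrative, not taken from the authors' code.
ppo_config = {
    "num_seeds": 16,
    "iterations": 100,
    "episodes_per_iteration": 16,
    "episode_length": 100,             # H
    "update_epochs": 10,
    "minibatch_size": 400,
    "value_loss_coef": 0.5,
    "max_grad_norm": 0.5,
    "clip_ratio": 0.2,                 # policy and value clipping
    "gamma": 0.99,                     # discount factor
    "gae_lambda": 0.95,                # lambda for Generalized Advantage Estimation
    "lr_field_lines": 5e-5,            # Adam learning rate for Field Lines
    "lr_m_slim": 3e-3,                 # Adam learning rate for M-Slim
    "init_softmax_temperature": 4.0,   # tau_psi, Field Lines selector module
    "init_std": 1.0,                   # sigma, M-Slim
}

# Transitions sampled per training iteration: episodes x episode length.
transitions_per_iteration = (
    ppo_config["episodes_per_iteration"] * ppo_config["episode_length"]
)
# Minibatches consumed in each of the 10 update epochs.
minibatches_per_epoch = transitions_per_iteration // ppo_config["minibatch_size"]

print(transitions_per_iteration, minibatches_per_epoch)  # 1600 4
```

This matches the paper's statement that each iteration uses 1600 sampled environment transitions for 10 update epochs with minibatches of 400.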