HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks
Authors: Julia Gusak, Xunyi Zhao, Théotime Le Hellard, Zhe Li, Lionel Eyraud-Dubois, Olivier Beaumont
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4. Experimental Evaluation: The experiments were performed on an NVIDIA Quadro RTX8000 GPU with 48 GB of memory and an NVIDIA V100 GPU with 16 GB of memory, using PyTorch 2.0.1, CUDA 11.6, and Gurobi 9.5.0. We intentionally report performance on settings where the experimental platform can run the original model, so that we can compare our results with the training time obtained with regular PyTorch autodiff. All experiments can be scaled up by increasing image or batch size, to a point where training requires using HIREMATE. Additional experiments, including an ablation study, varying batch sizes and sequence lengths, and a dozen different architectures, are available in the Appendix. |
| Researcher Affiliation | Academia | 1Inria Center at the University of Bordeaux 2École Normale Supérieure, PSL University, Paris. Correspondence to: Julia Gusak <EMAIL>, Lionel Eyraud-Dubois <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 H-Partition Bottom-to-Top algorithm |
| Open Source Code | No | The paper mentions external tools like ROTOR (https://gitlab.inria.fr/hiepacs/rotor) and TW-REMAT (https://github.com/nshepperd/gpt-2/tree/finetuning/twremat) used in their framework, and discusses 'RK-GB module as in ROCKMATE'. However, there is no explicit statement or link providing the source code for HIREMATE itself, nor is HIREMATE identified as an open-source project with a direct repository link. |
| Open Datasets | No | The paper discusses various neural network architectures (e.g., GPT2, UNet, MLPMixer, RegNet32, ResNet101, Transformer, FNO, U-FNO, UNO) on which HIREMATE is evaluated. It also mentions varying batch sizes, sequence lengths, and image resolutions for inputs. However, it does not explicitly state which specific datasets were used for training these models, nor does it provide any concrete access information (links, DOIs, citations to specific datasets) for publicly available data. |
| Dataset Splits | No | The paper does not provide specific details on training/test/validation splits. It mentions experiments on various types of networks and varying input parameters like batch sizes and sequence lengths, but no information regarding how data might have been partitioned for these experiments. |
| Hardware Specification | Yes | The experiments were performed on an NVIDIA Quadro RTX8000 GPU with 48 GB of memory and an NVIDIA V100 GPU with 16 GB of memory, using PyTorch 2.0.1, CUDA 11.6, and Gurobi 9.5.0. All models passed a sanity check: both forward and backward passes produce the exact same result as the original module. Experiments are done on an NVIDIA P100 GPU with 16 GB. |
| Software Dependencies | Yes | The experiments were performed on an NVIDIA Quadro RTX8000 GPU with 48 GB of memory and an NVIDIA V100 GPU with 16 GB of memory, using PyTorch 2.0.1, CUDA 11.6, and Gurobi 9.5.0. |
| Experiment Setup | Yes | We perform a warm-up phase consisting of five initial runs; the subsequent ten runs are used to evaluate the peak memory and computation time, providing reliable estimates of performance. The subgraph sizes are bounded by two main parameters: M_l denotes the maximum number of nodes in a lower-level subgraph, and M_t denotes the maximum number of nodes in the top-level graph. The default value for α is 0.5. The number of binary variables in the H-ILP formulation depends linearly on the total number of options of all nodes. To avoid wasting resources when several very similar options are available for a given node, we include in H-ILP a hyperparameter N_o that imposes a limit on the total number of options. Table 4(c) describes the result of HIREMATE on each model: the budget provided to HIREMATE, and the relative memory usage (compared to the peak memory of the autodiff solution) of the resulting nn.Module created by HIREMATE. |
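The measurement protocol quoted in the Experiment Setup row (five warm-up runs discarded, ten measured runs reported) can be sketched as below. This is an illustrative sketch, not code from the HIREMATE repository: `benchmark` and `run_step` are hypothetical names, `run_step` stands in for one training iteration (forward + backward pass), and a real GPU measurement would additionally record peak memory, e.g. via `torch.cuda.max_memory_allocated`.

```python
import statistics
import time

def benchmark(run_step, warmup_runs=5, measured_runs=10):
    """Time a workload following the paper's quoted protocol:
    discard the first `warmup_runs` executions, then report
    statistics over the next `measured_runs` executions."""
    for _ in range(warmup_runs):
        run_step()  # warm-up: results intentionally discarded

    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_step()
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }
```

For example, `benchmark(lambda: sum(range(100_000)))` returns mean/min/max wall-clock times for the dummy workload; averaging over several post-warm-up runs smooths out one-time costs such as kernel compilation and cache warming.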