Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient
Authors: Vu C. Dinh, Lam S. Ho, Cuong V. Nguyen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments with both synthetic and real datasets, we validate our theoretical analyses and highlight the inefficiency of ReLU neural networks compared to analytical networks. |
| Researcher Affiliation | Academia | Vu C. Dinh, Department of Mathematical Sciences, University of Delaware; Lam Si Tung Ho, Department of Mathematics and Statistics, Dalhousie University; Cuong V. Nguyen, Department of Mathematical Sciences, Durham University |
| Pseudocode | No | The paper describes the leapfrog updates in Equations (1)-(3) but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We would like to keep the code confidential. |
| Open Datasets | Yes | In addition to the synthetic dataset above, we also conduct experiments to validate our theoretical findings on a subset of the real-world UTKFace dataset (Zhang et al., 2017). |
| Dataset Splits | No | The subset is then randomly split into a training set (167 images) and a test set (100 images). The paper does not explicitly mention a validation dataset split. |
| Hardware Specification | No | The experiments were run on a single CPU machine. The paper does not provide specific details such as CPU model, memory, or other hardware specifications. |
| Software Dependencies | No | Our simulations are implemented using the Autograd package (Maclaurin et al., 2015). The paper mentions the software 'Autograd' but does not specify its version number. |
| Experiment Setup | Yes | We choose a standard normal prior π(q) = N(0, I) and sample 2,000 parameter vectors from the posterior after a burn-in period of 100 samples. We vary the number of steps L ∈ {200, 400, 600, 800, 1000} together with the step size ϵ ∈ {0.0005, 0.0010, 0.0015, 0.0020, 0.0025}. We fix the travel time T = ϵL = 0.1 and vary ϵ ∈ {0.0005, 0.0010, 0.0015, …, 0.0040}. For each network, we run HMC with L = 200 and ϵ = 0.001 while keeping other hyper-parameters the same as in previous simulations. In this experiment, we fix T = 0.01 and vary ϵ ∈ {0.00005, 0.00010, 0.00015, …, 0.00040}. For each run of the HMC, we sample 300 parameter vectors from the posterior after a burn-in period of 50 samples. |
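The setup above (standard normal prior, leapfrog steps L, step size ϵ, burn-in, then posterior draws) follows the usual HMC recipe. The sketch below is a minimal, hedged illustration of that recipe, not the authors' confidential code: it uses plain NumPy with a hand-written gradient on a toy standard normal target (the paper uses the Autograd package and neural-network posteriors), and the hyper-parameter values here are chosen for the toy target, not the paper's ϵ = 0.001, L = 200.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """L leapfrog steps of size eps (the updates in the paper's Eqs. (1)-(3))."""
    p = p - 0.5 * eps * grad_U(q)           # initial half-step on momentum
    for _ in range(L - 1):
        q = q + eps * p                      # full-step on position
        p = p - eps * grad_U(q)              # full-step on momentum
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)            # final half-step on momentum
    return q, -p                             # negate momentum for reversibility

def hmc(U, grad_U, q0, eps, L, n_samples, burn_in, rng):
    """HMC with an identity mass matrix; discards `burn_in` initial draws."""
    q = np.asarray(q0, dtype=float)
    samples = []
    for i in range(burn_in + n_samples):
        p = rng.standard_normal(q.shape)     # resample momentum each trajectory
        H0 = U(q) + 0.5 * p @ p
        q_new, p_new = leapfrog(q, p, grad_U, eps, L)
        H1 = U(q_new) + 0.5 * p_new @ p_new
        if rng.random() < np.exp(H0 - H1):   # Metropolis accept/reject
            q = q_new
        if i >= burn_in:
            samples.append(q.copy())
    return np.array(samples)

# Toy target matching the paper's prior N(0, I): U is the negative
# log-density up to an additive constant, so grad_U(q) = q.
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q

rng = np.random.default_rng(0)
draws = hmc(U, grad_U, np.zeros(2), eps=0.1, L=20,
            n_samples=2000, burn_in=100, rng=rng)
print(draws.shape)  # (2000, 2)
```

For a smooth ("analytical") target like this, the Hamiltonian is nearly conserved along each trajectory and almost every proposal is accepted; the paper's point is that the non-differentiable points of ReLU posteriors make the leapfrog error, and hence the rejection rate, much worse at comparable step sizes.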