Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo
Authors: Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we conduct experiments on synthetic data, and MNIST, CIFAR-10 & CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning. We provide promising empirical results across various datasets and models. |
| Researcher Affiliation | Academia | Ziyi Wang, Department of Statistics, Purdue University; Yujie Chen, Department of Statistics, Purdue University; Qifan Song, Department of Statistics, Purdue University; Ruqi Zhang, Department of Computer Science, Purdue University |
| Pseudocode | Yes | Algorithm 1: Low-Precision Training for SGHMC. Algorithm 2: Variance-Corrected Quantization Function Q^vc (Zhang et al., 2022). |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | Empirically, we conduct experiments on synthetic data, and MNIST, CIFAR-10 & CIFAR-100 datasets, which validate our theoretical findings. |
| Dataset Splits | No | We use logistic and multilayer perceptron (MLP) models to represent the class of strongly log-concave and non-log-concave distributions, respectively. The results are shown in Figures 5 and 6. We use N(0, 10^2) as the prior distribution and fixed-point number representation, where we set 2 integer bits and various fractional bits. We consider the image tasks CIFAR-10 and CIFAR-100 on the ResNet-18. We use 8-bit number representation following Zhang et al. (2022). We report the test errors averaged over 3 runs in Tables 2 and 4. Explanation: The paper mentions using well-known datasets (MNIST, CIFAR-10, CIFAR-100) and reporting "test errors" or "training NLL", which implies splits, but it does not explicitly provide specific details about how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or clear references to standard predefined splits with full specification). |
| Hardware Specification | No | When implementing low-precision SGHMC on classification tasks on the MNIST, CIFAR-10, and CIFAR-100 datasets, we observed that the momentum term v tends to concentrate in a small range around zero, in which case the low-precision representation of v ends up using only a few bits; the momentum information is thus largely lost, causing performance degradation. In order to tackle this problem and fully utilize all the low-precision representations, we borrowed the idea of rescaling from the bit-centering trick and adapted the low-precision SGHMC method accordingly. The detailed algorithm is listed in Algorithm 1. Explanation: The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments. It only vaguely mentions "our machine" in a conceptual context rather than as part of the experimental setup. |
| Software Dependencies | No | Throughout all experiments, low-precision arithmetic is implemented using qtorch (Zhang et al., 2019). Explanation: The paper mentions a specific software package, qtorch, but does not provide its version number, which is required for reproducibility. |
| Experiment Setup | Yes | For the standard normal distribution experiment, we use an 8-bit fixed-point low-precision representation with 4 bits representing the fractional part. Moreover, we set the step size η = 0.09, inverse mass u = 2, and friction γ = 3. Similarly, for the Gaussian mixture distribution, we also use an 8-bit fixed-point representation with 4 fractional bits for both low-precision SGHMC and SGLD, but we set the step size η = 0.1, inverse mass u = 1, and friction γ = 3. Next, for both the logistic and MLP models with low-precision SGLD and SGHMC on the MNIST task, we set N(0, 10^2) as the prior distribution and step size η = 0.01. Moreover, for SGHMC, we set the inverse mass u = 2 and friction γ = 2. Then we describe the training details of low-precision SGHMC for CIFAR-10 & CIFAR-100. We use N(0, 10^4) as the prior distribution. Furthermore, we set the step size η = 0.1, and u = 2, γ = 2 for low-precision SGHMC. |
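The 8-bit fixed-point representation with 4 fractional bits that recurs in the setup can be sketched in plain Python. This is a minimal illustration of standard stochastic-rounding fixed-point quantization, not the paper's variance-corrected Q^vc (which additionally corrects the injected quantization variance); the function name and signature are my own.

```python
import math
import random

def fixed_point_quantize(x, wl=8, fl=4, stochastic=True):
    """Quantize a scalar to a signed fixed-point grid.

    wl: total word length in bits (sign + integer + fractional).
    fl: number of fractional bits, so the grid spacing is 2**-fl.
    With wl=8, fl=4 the representable range is [-8, 8 - 2**-4],
    matching the 8-bit / 4-fractional-bit setting quoted above.
    """
    delta = 2.0 ** (-fl)                   # quantization gap
    lower = -(2.0 ** (wl - fl - 1))        # smallest representable value
    upper = 2.0 ** (wl - fl - 1) - delta   # largest representable value
    x = min(max(x, lower), upper)          # clip to the representable range
    scaled = x / delta
    if stochastic:
        # Round up with probability equal to the fractional remainder,
        # so the rounding is unbiased in expectation.
        base = math.floor(scaled)
        scaled = base + (1 if random.random() < scaled - base else 0)
    else:
        scaled = round(scaled)             # deterministic nearest rounding
    return scaled * delta
```

For example, `fixed_point_quantize(0.33, stochastic=False)` returns 0.3125 (the nearest multiple of 2^-4), while the stochastic version returns 0.3125 or 0.375 at random, with probabilities chosen so the expected value equals 0.33.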