Improving Generalization with Flat Hilbert Bayesian Inference

Authors: Tuan Truong, Quyen Tran, Ngoc-Quan Pham, Nhat Ho, Dinh Phung, Trung Le

ICML 2025

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. "To evaluate the effectiveness of FHBI, we conduct comprehensive comparisons against nine baseline methods on the VTAB-1K benchmark, which encompasses 19 diverse datasets across various domains with diverse semantics. Empirical results demonstrate that FHBI consistently outperforms the baselines by notable margins, highlighting its practical efficacy."
Researcher Affiliation: Collaboration. ¹ Movian AI, Vietnam; ² The University of Texas at Austin, USA; ³ Monash University, Australia.
Pseudocode: Yes. Algorithm 1: Flat Hilbert Bayesian Inference (FHBI).
Open Source Code: No. The paper neither provides a direct link to a code repository nor explicitly states that the code for the described methodology is being released; it only mentions that the implementation is based on the V-PETL repository.
Open Datasets: Yes. "To assess the efficacy of FHBI, we experiment on VTAB-1K (Zhai et al., 2020), a challenging image classification/prediction benchmark consisting of 19 datasets from various domains."
Dataset Splits: Yes. "To assess the efficacy of FHBI, we experiment on VTAB-1K (Zhai et al., 2020), a challenging image classification/prediction benchmark consisting of 19 datasets from various domains."
Hardware Specification: Yes. "The experiments were conducted on a single Tesla V100 GPU."
Software Dependencies: No. The experiments were run with PyTorch on a Tesla V100 GPU with 40GB of RAM, but no specific PyTorch version is provided.
Experiment Setup: Yes. "For each experiment, we conducted five runs of FHBI and reported the mean and standard deviation. All Bayesian methods were trained with four particles on the same set of LoRA parameters. We used ten warm-up epochs, batch size 64, the Gaussian kernel, and the cosine annealing learning rate scheduler for all settings. [...] FHBI involves three hyperparameters: the learning rate ϵ, the ascent step size ρ, and the kernel width σ. We tune them by grid search on the validation set, with candidate sets ϵ ∈ {0.15, 1, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5}, ρ ∈ {0.01, 0.03, 0.05}, and σ ∈ {0.7, 1, 1.2}."
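The hyperparameter tuning described above can be sketched as a plain grid search over the three stated candidate sets. This is an illustrative sketch only: `train_and_validate` is a hypothetical stand-in for training FHBI with a given configuration and returning a validation score, not part of the paper's released code.

```python
from itertools import product

# Candidate sets reported in the paper's experiment setup.
EPSILONS = [0.15, 1, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5]  # initial learning rate ϵ
RHOS = [0.01, 0.03, 0.05]                            # ascent step size ρ
SIGMAS = [0.7, 1, 1.2]                               # Gaussian kernel width σ


def grid_search(train_and_validate):
    """Return the (ϵ, ρ, σ) triple with the highest validation score.

    `train_and_validate` is a hypothetical callable that trains the model
    with the given hyperparameters and returns a validation metric.
    """
    best_score, best_cfg = float("-inf"), None
    for eps, rho, sigma in product(EPSILONS, RHOS, SIGMAS):
        score = train_and_validate(lr=eps, ascent_step=rho, kernel_width=sigma)
        if score > best_score:
            best_score, best_cfg = score, (eps, rho, sigma)
    return best_cfg, best_score
```

This exhaustive search covers 8 × 3 × 3 = 72 configurations, which is tractable given the small per-task training sets in VTAB-1K.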