Local Function Complexity for Active Learning via Mixture of Gaussian Processes

Authors: Danny Panknin, Stefan Chmiela, Klaus-Robert Müller, Shinichi Nakajima

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess the effectiveness of our LFC estimate in an AL application on a prototypical low-dimensional synthetic dataset, before taking on the challenging real-world task of reconstructing a quantum chemical force field for a small organic molecule and demonstrating state-of-the-art performance with a significantly reduced training demand. (Abstract) 5 Experiments: In this section, we will first analyze our approach on toy data, regarding the MoE model, LFC, and the superior training density. Then, we apply our approach to a high-dimensional MD simulation dataset from quantum chemistry, by which we can deduce deeper insights into this regression problem.
Researcher Affiliation | Academia | Danny Panknin (EMAIL): Uncertainty, Inverse Modeling and Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany; Physikalisch-Technische Bundesanstalt, 10587 Berlin, Germany. Stefan Chmiela (EMAIL): Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany. Klaus-Robert Müller (EMAIL): Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany; Department of Artificial Intelligence, Korea University, Seoul 136-713, South Korea; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany. Shinichi Nakajima (EMAIL): Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany; RIKEN AIP, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Superior training data generation, producing a process (Xn, Yn), n ∈ ℕ, with labels Yn of training inputs Xn ∼ p^Sup_GPR,n. Algorithm 2: (ΘH, ΣE) ← hyper_init(Xn0, Yn0, p0, Xval, Yval).
Open Source Code | Yes | https://github.com/DPanknin/modelagnostic_superior_training
Open Datasets | No | The Doppler function (see, for example, Donoho & Johnstone (1994)), which was also discussed in related work that deals with inhomogeneous complexity (Panknin et al., 2021; Bull et al., 2013). For x ∈ X = [0, 1], let P(y|x) = N(y; f(x), 1), f(x) = C √(x(1 − x)) sin(2π(1 + ϵ)/(x + ϵ)), where ϵ = 0.05, C is chosen such that ‖f‖₂ = 7, and N(·; µ, σ²) denotes the Gaussian distribution with mean µ and variance σ². We assume a uniform test distribution q ≡ U(X) in all Doppler function experiments. Experimental setup: All experiments use an extensive pre-computed reference trajectory (almost a million data points (Xpool, Ypool)) as ground truth, as opposed to generating new data points on demand. This test setup allows a post-hoc verification of the training distribution generated by our AL approach, while still providing ample redundancy and therefore sampling freedom. (Section 5.2)
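The quoted Doppler setup can be sketched in Python. Computing the normalization constant C from the L2(q) norm on a fine grid is an assumption about how ‖f‖₂ = 7 is enforced; the function names are illustrative, not the authors' code:

```python
import numpy as np

def doppler(x, eps=0.05):
    """Unnormalized Doppler function on [0, 1] (Donoho & Johnstone, 1994 variant)."""
    return np.sqrt(x * (1.0 - x)) * np.sin(2.0 * np.pi * (1.0 + eps) / (x + eps))

# Choose C so that the L2 norm of f = C * doppler under the uniform test
# density q = U([0, 1]) equals 7, approximated on a fine grid.
grid = np.linspace(0.0, 1.0, 100_001)
C = 7.0 / np.sqrt(np.mean(doppler(grid) ** 2))

def f(x):
    return C * doppler(x)

# Noisy labels according to P(y|x) = N(y; f(x), 1).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=512)
y_train = f(x_train) + rng.standard_normal(512)
```

The oscillation frequency of f grows as x approaches 0, which is exactly the inhomogeneous local complexity the LFC estimate is designed to pick up.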
Dataset Splits | No | Prior to the AL procedure, we separate the validation samples Xval and test samples XT at random from the pool Xpool. We apply an initial expert training size of n0 = 2⁹, doubling the sample size with each iteration of the AL procedure. The initial expert training set Xn0 and the gate training set X^G_nG are drawn via importance sampling from the remaining pool Xpool \ (Xval ∪ XT) with weights p̂_X^(−1/2). By this it is Xn0 ∼ q^(1/2), which is more in alignment with the superior training density (21) than sampling Xn0 ∼ q. (Section 5.2)
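The described split-then-sample step might look as follows. `split_and_sample` and `density_estimate` are hypothetical names; the only substantive assumption is that weighting pool points by an estimated density to the power −1/2 yields draws approximately distributed as q^(1/2), as the quote states:

```python
import numpy as np

def split_and_sample(x_pool, n_val, n_test, n0, density_estimate, rng=None):
    """Separate validation/test sets at random, then draw the initial
    training set from the remaining pool via importance sampling with
    weights proportional to p_hat(x)^(-1/2), so the selected points are
    approximately distributed as q^(1/2)."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x_pool))
    val_idx, test_idx, rest_idx = np.split(idx, [n_val, n_val + n_test])

    rest = x_pool[rest_idx]
    w = density_estimate(rest) ** -0.5   # weights proportional to p_hat^(-1/2)
    w /= w.sum()
    train_idx = rng.choice(len(rest), size=n0, replace=False, p=w)
    return x_pool[val_idx], x_pool[test_idx], rest[train_idx]
```

For a uniform pool the weights are constant and the draw degenerates to uniform subsampling; the reweighting only matters when the pool density is inhomogeneous, as in the MD trajectory data.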
Hardware Specification | No | We implement our model in PyTorch (Paszke et al., 2019), using the GPyTorch package (Gardner et al., 2018). (Section 4.4.2) ... (GPyTorch is stated to support GPU acceleration, but no specific hardware models are mentioned.)
Software Dependencies | No | We implement our model in PyTorch (Paszke et al., 2019), using the GPyTorch package (Gardner et al., 2018). (Section 4.4.2) ... we deploy the DGP model of Sauer et al. (2023b) using the CRAN package deepgp. (Section 5.1)
Experiment Setup | Yes | We apply 512, respectively 128, IPs for the experts and the gate, which are chosen via SVGD (see Appendix E). Furthermore, we apply σj = 10^((j−10)/3), 1 ≤ j ≤ 7, as the expert bandwidths, λE = 20 as the initial expert regularization, and σG = 0.05 and λG = 10 for the gate. For the training, we apply a batch size of B = 512, a terminal expert sparsity κ = 2, a penalty factor of ϑσ = 0.5 for small bandwidth choices, gate noise parameters s0 = 0.1 and ηs = 1/2, and learning rate parameters η = 0.01, ηH = 0.2, ηG = 1. (Section 5.1) We apply dense sGDML experts with Σj = σjΣE, where σj = 2^(−5/4+j/2), 1 ≤ j ≤ 8, as the individual expert bandwidths, λE = 1 as the initial expert regularization, and σG = 0.1 and λG = 10⁴ for the sparse gate with 1024 IPs. For the training, we apply a batch size of B = 1024, a terminal expert sparsity κ = 8, a penalty factor of ϑσ = 0.01 for small bandwidth choices, gate noise parameters s0 = 0.01 and ηs = 1/2, and learning rate parameters η = 0.005, ηH = 0.05, ηG = 0.1. (Section 5.2)
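The reported hyperparameters of the two experiments can be collected in configuration dictionaries. This is a minimal sketch: the key names are illustrative and not the authors' API, and the exponent −5/4 in the MD bandwidth grid is a reading of the garbled source formula:

```python
# Hypothetical config for the Doppler toy experiment (Section 5.1).
toy_config = {
    "n_inducing_experts": 512,
    "n_inducing_gate": 128,
    "expert_bandwidths": [10 ** ((j - 10) / 3) for j in range(1, 8)],  # sigma_j
    "lambda_expert": 20.0,        # initial expert regularization lambda_E
    "sigma_gate": 0.05,
    "lambda_gate": 10.0,
    "batch_size": 512,
    "terminal_expert_sparsity": 2,    # kappa
    "bandwidth_penalty": 0.5,         # penalty factor for small bandwidths
    "gate_noise": {"s0": 0.1, "eta_s": 0.5},
    "learning_rates": {"eta": 0.01, "eta_H": 0.2, "eta_G": 1.0},
}

# Hypothetical config for the MD force-field experiment (Section 5.2).
md_config = {
    "expert_bandwidth_scales": [2 ** (-5 / 4 + j / 2) for j in range(1, 9)],
    "lambda_expert": 1.0,
    "sigma_gate": 0.1,
    "lambda_gate": 1e4,
    "n_inducing_gate": 1024,
    "batch_size": 1024,
    "terminal_expert_sparsity": 8,
    "bandwidth_penalty": 0.01,
    "gate_noise": {"s0": 0.01, "eta_s": 0.5},
    "learning_rates": {"eta": 0.005, "eta_H": 0.05, "eta_G": 0.1},
}
```

Note the pattern shared by both setups: a geometric grid of expert bandwidths, a single gate bandwidth, and per-component learning rates for the model (η), the hyperparameters (ηH), and the gate (ηG).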