Local Function Complexity for Active Learning via Mixture of Gaussian Processes
Authors: Danny Panknin, Stefan Chmiela, Klaus-Robert Müller, Shinichi Nakajima
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the effectiveness of our LFC estimate in an AL application on a prototypical low-dimensional synthetic dataset, before taking on the challenging real-world task of reconstructing a quantum chemical force field for a small organic molecule and demonstrating state-of-the-art performance with a significantly reduced training demand. (Abstract) 5 Experiments: In this section, we will first analyze our approach on toy-data, regarding the MoE model, LFC, and the superior training density. Then, we apply our approach to a high-dimensional MD simulation dataset from quantum chemistry, by which we can deduce deeper insights into this regression problem. |
| Researcher Affiliation | Academia | Danny Panknin (Uncertainty, Inverse Modeling and Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany; Physikalisch-Technische Bundesanstalt, 10587 Berlin, Germany); Stefan Chmiela (Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany); Klaus-Robert Müller (Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany; Department of Artificial Intelligence, Korea University, Seoul 136-713, South Korea; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany); Shinichi Nakajima (Machine Learning Department, Berlin Institute of Technology, 10587 Berlin, Germany; BIFOLD-Berlin Institute for the Foundations of Learning and Data, Germany; RIKEN AIP, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, Japan) |
| Pseudocode | Yes | Algorithm 1: Superior training data process (Xn, Yn)n∈N with labels Yn of training inputs Xn ∼ p_Sup^(GPR,n). Algorithm 2: (ΘH, ΣE) ← hyper_init(Xn0, Yn0, p0, Xval, Yval) |
| Open Source Code | Yes | https://github.com/DPanknin/modelagnostic_superior_training |
| Open Datasets | No | The Doppler function (see, for example, Donoho & Johnstone (1994)), which was also discussed in related work that deals with inhomogeneous complexity (Panknin et al., 2021; Bull et al., 2013). For x ∈ X = [0, 1], let P(y|x) = N(y; f(x), 1), f(x) = C √(x(1 − x)) sin(2π(1 + ϵ)/(x + ϵ)), where ϵ = 0.05, C is chosen such that ‖f‖₂ = 7 and N(·; µ, σ²) denotes the Gaussian distribution with mean µ and variance σ². We assume a uniform test distribution q ∼ U(X) in all Doppler function experiments. Experimental setup: All experiments use an extensive pre-computed reference trajectory (almost a million data points (Xpool, Ypool)) as ground truth, as opposed to generating new data points on demand. This test setup allows a post-hoc verification of the training distribution generated by our AL approach, while still providing ample redundancy and therefore sampling freedom. (Section 5.2) |
| Dataset Splits | No | Prior to the AL procedure, we separate the validation samples Xval and test samples XT at random from the pool Xpool. We apply an initial expert training size of n0 = 29, doubling the sample size with each iteration of the AL procedure. The initial expert training set Xn0 and the gate training set X^G_{n_G} are drawn via importance sampling from the remaining pool with weights p̂_X^(−1/2)(Xpool \ (Xval ∪ XT)). By this it is Xn0 ∼ q^(1/2), which is more in alignment with the superior training density (21) than sampling Xn0 ∼ q. (Section 5.2) |
| Hardware Specification | No | We implement our model in PyTorch (Paszke et al., 2019), using the GPyTorch package (Gardner et al., 2018). (Section 4.4.2) ... (GPyTorch is stated to support GPU acceleration, but no specific hardware models are mentioned). |
| Software Dependencies | No | We implement our model in PyTorch (Paszke et al., 2019), using the GPyTorch package (Gardner et al., 2018). (Section 4.4.2) ...we deploy the DGP model of Sauer et al. (2023b) using the CRAN package deepgp. (Section 5.1) |
| Experiment Setup | Yes | We apply 512, respectively 128 IPs for the experts and the gate, which are chosen via SVGD (see Appendix E). Furthermore we apply σj = 10^((j − 10)/3), 1 ≤ j ≤ 7, as the expert bandwidths, λE = 20 as the initial expert regularization, and σG = 0.05 and λG = 10 for the gate. For the training, we apply a batch size of B = 512, a terminal expert sparsity κ = 2, a penalty factor of ϑσ = 0.5 for small bandwidth choices, gate noise parameters s0 = 0.1 and ηs = 1/2, and learning rate parameters η = 0.01, ηH = 0.2, ηG = 1. (Section 5.1) We apply dense sGDML experts with Σj = σjΣE, where σj = 2^(−5/4 + j/2), 1 ≤ j ≤ 8, as the individual expert bandwidths, λE = 1 as the initial expert regularization, and σG = 0.1 and λG = 104 for the sparse gate with 1024 IPs. For the training, we apply a batch size of B = 1024, a terminal expert sparsity κ = 8, a penalty factor of ϑσ = 0.01 for small bandwidth choices, gate noise parameters s0 = 0.01 and ηs = 1/2, and learning rate parameters η = 0.01 / 2, ηH = 0.05, ηG = 0.1. (Section 5.2) |
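
The synthetic Doppler benchmark quoted under Open Datasets is fully specified by a formula, so it can be regenerated without a data download. The following is a minimal sketch of that, not the authors' code: the function names, the grid-based normalization, and the reading of the norm constraint as an L2 norm over [0, 1] are assumptions made for illustration. It depends only on NumPy.

```python
import numpy as np

EPS = 0.05  # epsilon from the quoted Doppler definition


def doppler_unscaled(x, eps=EPS):
    """sqrt(x(1-x)) * sin(2*pi*(1+eps)/(x+eps)), i.e. f(x) without the constant C."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x * (1.0 - x)) * np.sin(2.0 * np.pi * (1.0 + eps) / (x + eps))


def doppler_scale(target_norm=7.0, grid_size=200_001, eps=EPS):
    """Constant C such that the L2 norm of f over [0, 1] equals target_norm.

    Assumption: the norm is taken with respect to the uniform measure on
    [0, 1]; the integral is approximated by the mean of f^2 on a dense grid.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    l2_norm = np.sqrt(np.mean(doppler_unscaled(grid, eps) ** 2))
    return target_norm / l2_norm


def sample_doppler(n, rng, noise_std=1.0):
    """Draw x ~ U[0, 1] and y ~ N(f(x), noise_std^2) (hypothetical helper)."""
    c = doppler_scale()
    x = rng.uniform(0.0, 1.0, size=n)
    y = c * doppler_unscaled(x) + rng.normal(0.0, noise_std, size=n)
    return x, y


if __name__ == "__main__":
    x, y = sample_doppler(1000, np.random.default_rng(0))
    print(x.shape, y.shape)
```

Sampling inputs uniformly here mirrors the uniform test distribution stated in the excerpt; the training inputs in the paper are chosen by the active learning procedure instead.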