Robust Hybrid Learning With Expert Augmentation
Authors: Antoine Wehenkel, Jens Behrmann, Hsiang Hsu, Guillermo Sapiro, Gilles Louppe, Joern-Henrik Jacobsen
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the expert augmentation on three controlled experiments modelling dynamical systems with ordinary and partial differential equations. Finally, we assess the potential real-world applicability of expert augmentation on a dataset of a real double pendulum. Our experiments on various controlled problems demonstrate that AHMs improve the generalization capabilities of state-of-the-art hybrid learning algorithms on synthetic and real-world data in the amortized setting. Section 4 is titled 'Experiments' and includes subsections such as 'Synthetic experiments' and 'A real-world dataset: the double pendulum', along with a 'Results' subsection that reports 'average log-MSEs over 10 runs' (Figure 5) and 'mean relative precision (in %, ± indicates one standard deviation) over 10 runs' (Figure 6). |
| Researcher Affiliation | Collaboration | Antoine Wehenkel (Apple), Jens Behrmann (Apple), Hsiang Hsu (Harvard), Guillermo Sapiro (Apple), Gilles Louppe (University of Liège), Jörn-Henrik Jacobsen (Apple) |
| Pseudocode | Yes | Algorithm 1 Expert augmented hybrid learning |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or indicate code availability in supplementary materials. The Open Review link is for peer review, not code. |
| Open Datasets | Yes | The dataset of a double pendulum introduced by Asseman et al. (2018) contains 21 videos of the pendulum shown in Figure 4a. |
| Dataset Splits | Yes | We create a dataset with many initial conditions by splitting the videos into consecutive chunks of 20 frames sub-sampled at 100 Hz, i.e., 200 ms of video. We construct a distribution shift, as shown in Figure 11 from Appendix C.5, over the expert variables z_e by splitting each 40-second sequence into three parts. The training set only contains chunks from the last 16 seconds of each run. It corresponds to configurations with smaller energy and, thus, slower angular speeds than the test set, which only contains frames from the first 12 seconds. The validation set contains the remaining 12 seconds of frames in the middle. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It mentions training models but does not describe the hardware environment. |
| Software Dependencies | No | The paper mentions methods like 'Neural ODEs' and algorithms like 'APHYNITY' and 'Hybrid-VAE' but does not specify any software libraries, frameworks, or solvers with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | No | The paper states: 'For all experiments we train the models to maximize p(y = y_{1:t_1} \| x = y_0) on the training data. We validate and test the models on the predictive distribution p(y = y_{1:t_2} \| x = y_0, x_o = y_0, y_o = y_{1:t_1}), where t_2 > t_1 assesses the generalization over time. The best models are always selected based on validation performance, that is, with samples from Ω. In our experiments we use a Gaussian distribution for the posterior, which is equivalent to a mean squared error (MSE) loss on the physical parameters.' However, it lacks specific hyperparameters such as learning rate, batch size, optimizer type, or number of training epochs, which are crucial for reproducibility. |
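The dataset-split procedure quoted in the table (each 40-second run cut into a first-12-seconds test segment, a middle-12-seconds validation segment, and a last-16-seconds training segment, each then chunked into 20-frame windows sub-sampled at 100 Hz) can be sketched as follows. This is a minimal illustration, not the authors' code; the `fps` value and the function name are assumptions, since the paper only states the 100 Hz sub-sampling rate.

```python
def chunk_and_split(run, fps=400):
    """Split one 40 s pendulum run (a sequence of frames) into the three
    temporal segments described in the paper, then cut each segment into
    20-frame chunks sub-sampled at 100 Hz (i.e. 200 ms of video per chunk).

    `fps` is an assumed native frame rate; any multiple of 100 works here.
    """
    step = fps // 100                      # keep every `step`-th frame -> 100 Hz
    sec = lambda s: int(s * fps)           # seconds -> frame index
    # first 12 s -> test, middle 12 s -> validation, last 16 s -> train
    segments = {
        "test": run[:sec(12)],
        "val": run[sec(12):sec(24)],
        "train": run[sec(24):sec(40)],
    }
    chunks = {}
    for name, seg in segments.items():
        sub = seg[::step]                  # sub-sample to 100 Hz
        n_chunks = len(sub) // 20          # drop any incomplete trailing chunk
        chunks[name] = [sub[i * 20:(i + 1) * 20] for i in range(n_chunks)]
    return chunks
```

With a 400 fps run this yields 80 training chunks and 60 chunks each for test and validation, matching the 16 s / 12 s / 12 s split.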
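The quoted claim in the last row, that a Gaussian posterior makes maximum likelihood equivalent to an MSE loss on the physical parameters, can be checked numerically: with fixed variance, the negative Gaussian log-likelihood is an affine function of the squared error, so both objectives share a minimizer. A minimal sketch (all names and values are illustrative, not from the paper):

```python
import numpy as np

def gaussian_nll(z, mu, sigma=1.0):
    """Negative log-likelihood of targets z under N(mu, sigma^2)."""
    return 0.5 * np.sum(((z - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma**2))

def mse(z, mu):
    """Mean squared error of a constant prediction mu."""
    return np.mean((z - mu) ** 2)

rng = np.random.default_rng(0)
z = rng.normal(size=50)                   # stand-in "physical parameters"
candidates = np.linspace(-2, 2, 401)      # candidate constant predictions
nll = [gaussian_nll(z, c) for c in candidates]
err = [mse(z, c) for c in candidates]

# Both objectives pick the same minimizer (the sample mean):
assert candidates[np.argmin(nll)] == candidates[np.argmin(err)]
```

This only demonstrates the equivalence for a fixed, known variance; if the variance is also learned, the correspondence to plain MSE no longer holds exactly.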