Integrating Random Effects in Deep Neural Networks
Authors: Giora Simchoni, Saharon Rosset
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach, which we call LMMNN, is demonstrated to improve performance over natural competitors in various correlation scenarios on diverse simulated and real datasets. |
| Researcher Affiliation | Academia | Giora Simchoni (EMAIL), Saharon Rosset (EMAIL), Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel 69978 |
| Pseudocode | No | The paper describes mathematical formulations and procedures but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Our code is available at https://github.com/gsimchoni/lmmnn. |
| Open Datasets | Yes | Table 6: Real datasets with K categorical features: Imdb (Kaggle, free; Wrandrall, 2021), News (UCI ML, free; Moniz and Torgo, 2018), InstEval (lme4, free; Bates et al., 2015), Spotify (Tidy Tuesday, free; Mock, 2022), UKB-blood (UK Biobank, authorized; Sudlow et al., 2015). |
| Dataset Splits | Yes | We perform 5 iterations for each (q, σ2 b, g) combination (27 combinations in total), in which we sample the data, randomly split it into training (80%) and testing (20%). |
| Hardware Specification | Yes | All experiments in this paper were implemented in Python using Keras (Chollet et al., 2015) and Tensorflow (Abadi et al., 2015), run on Google Colab with NVIDIA Tesla V100 GPU machines. |
| Software Dependencies | No | The paper mentions 'Python using Keras (Chollet et al., 2015) and Tensorflow (Abadi et al., 2015)' but does not specify exact version numbers for Keras or TensorFlow. |
| Experiment Setup | Yes | We use the same DNN architecture for all neural networks, that is, 4 hidden layers with 100, 50, 25, and 12 neurons, a Dropout of 25% in each, a ReLU activation, and a final output layer with a single neuron. ... In all experiments in this paper we use a batch size of 100 and an early stopping rule where training is stopped if no improvement in the loss on a 10% validation set is seen within 10 epochs, up to a maximum of 500 epochs. For prediction in LMMNN, in case g(Z) = Z the formula in (7) is used, adjusted for the LMMNN output f̂(X_tr); when g(Z) = ZW we sample 10000 observations when calculating (17), in order to avoid inverting V(θ), which is of dimension 80000 × 80000. We initialize both σ̂²_e and σ̂²_b to 1.0 where appropriate (R's lme4 and LMMNN) and compare the resulting final estimates for these two methods. |
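The experiment-setup row above fully specifies the shared DNN: hidden layers of 100, 50, 25, and 12 neurons, 25% dropout after each, ReLU activations, a single output neuron, batch size 100, and early stopping with patience 10 (up to 500 epochs) on a 10% validation split. A minimal sketch of that architecture in Keras (the library the paper reports using) is below; the function and variable names are our own, and the layer type (fully connected `Dense`) is an assumption, since the paper only states neuron counts.

```python
# Architecture spec quoted from the paper's experiment setup.
HIDDEN = [100, 50, 25, 12]  # neurons per hidden layer
DROPOUT = 0.25              # dropout rate after each hidden layer


def n_params(input_dim):
    """Parameter count of the described MLP (weights + biases per Dense layer)."""
    total, prev = 0, input_dim
    for units in HIDDEN + [1]:          # hidden layers plus single-neuron output
        total += (prev + 1) * units     # weight matrix + bias vector
        prev = units
    return total


def build_lmmnn_dnn(input_dim):
    """Hedged sketch of the shared DNN; requires TensorFlow/Keras at call time."""
    from tensorflow import keras  # imported here so the spec above is usable without TF

    model = keras.Sequential([keras.layers.Input(shape=(input_dim,))])
    for units in HIDDEN:
        model.add(keras.layers.Dense(units, activation="relu"))
        model.add(keras.layers.Dropout(DROPOUT))
    model.add(keras.layers.Dense(1))  # single-neuron output for regression
    return model


# Early stopping as described: patience of 10 epochs, max 500 epochs, 10% validation.
# (Typical usage, assuming val_loss monitoring as in the paper's description:)
#   early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
#   model.fit(X, y, batch_size=100, epochs=500,
#             validation_split=0.1, callbacks=[early_stop])
```

For example, with 10 input features the network has `n_params(10) == 7750` trainable parameters (1100 + 5050 + 1275 + 312 + 13 across the five Dense layers).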