General Latent Feature Models for Heterogeneous Datasets
Authors: Isabel Valera, Melanie F. Pradier, Maria Lomeli, Zoubin Ghahramani
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. [...] In this section, we apply the proposed model to solve two different tasks on several real-world datasets. In Section 5.1, we focus on a prediction task in which we aim to estimate and replace the missing data, which is assumed to be missing completely at random. [...] In Section 5.2, we focus on a data analysis task on several real-world datasets from different application domains such as medicine, psychiatry, clinical trials and politics. |
| Researcher Affiliation | Collaboration | Isabel Valera (EMAIL), Department of Computer Science, Saarland University, Saarbrücken, Germany; and Max Planck Institute for Intelligent Systems, Tübingen, Germany; Melanie F. Pradier (EMAIL), School of Engineering and Applied Sciences, Harvard University, Cambridge, USA; Maria Lomeli (EMAIL), Department of Engineering, University of Cambridge, Cambridge, UK; Zoubin Ghahramani (EMAIL), Department of Engineering, University of Cambridge, Cambridge, UK; and Uber AI, San Francisco, California, USA |
| Pseudocode | Yes | Algorithm 1 (Inference Algorithm). Input: $X$. Initialize: $Z$ and $\{Y^d\}_{d=1}^D$. 1: for each iteration do 2: Update $Z$ given $\{Y^d\}_{d=1}^D$ as detailed in Section 4.1. 3: for $d = 1, \ldots, D$ do 4: Sample $B^d$ given $Z$ and $Y^d$ according to (9). 5: Sample $Y^d$ given $X$, $Z$ and $B^d$ as shown in Section 4.2. 6: Sample $\Psi^d$ (if needed) as shown in Section 4.2. 7: end for 8: end for. Output: $Z$, $\{B^d\}_{d=1}^D$ and $\{\Psi^d\}_{d=1}^D$. |
| Open Source Code | Yes | Finally, a software package, called the GLFM toolbox, is made publicly available for other researchers to use and extend. It is available at https://ivaleram.github.io/GLFM/. [...] The source software package is publicly available at https://github.com/ivaleraM/GLFM, which provides users with the necessary functions and scripts to use the GLFM for both missing data estimation and data exploration tasks. |
| Open Datasets | Yes | We evaluate the predictive power of the proposed model at estimating missing data on five real datasets, which are summarized in Table 1. The datasets contain different numbers of objects and attributes, which cover all the discrete and continuous variables described in Section 3. ... Statlog German credit dataset (Eggermont et al., 2004) ... QSAR biodegradation dataset (Mansouri et al., 2013) ... Internet usage survey dataset (Centre, 2014) ... Wine quality dataset (Cortez et al., 2009) ... Nesarc dataset (Ruiz et al., 2013) |
| Dataset Splits | Yes | Each value in Figure 2 was obtained by averaging the results across 20 independently split sets where the missing values were randomly chosen. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) were mentioned for running experiments. The paper focuses on the software implementation and theoretical aspects of the model, along with experimental results on various datasets. |
| Software Dependencies | No | The core inference algorithm is developed in C++, and the corresponding user interfaces are provided in Matlab, Python and R. [...] Finally, our implementation of the GLFM makes use of the GNU Scientific Library (GSL). No specific version numbers were provided for C++, Matlab, Python, R, or GSL. |
| Experiment Setup | Yes | In the GLFM model, for real positive and/or count data, we consider the following transformation that maps from the real numbers to the real positive numbers, $f(x) = \log(\exp(wx) + 1)$. We select the parameter $w$ such that the data is scaled to a common range. For each dataset we run 5,000 iterations of the proposed MCMC sampler from Section 4. ... In our experiments, we sample the variance of the pseudo-observations in each dimension and choose the parameter values as follows: $\alpha = 5$, $\sigma_B^2 = 1$, and $\sigma_\theta^2 = 1$. We also consider the following transformation that maps from the real numbers to the positive real numbers, for the positive real and count data: $f(x) = \log(w(x - \mu) + 1)$, where $\mu = \min(x_d)$ and $w = 2/\mathrm{std}(x_d)$ are data-driven parameters whose objective is to shift and scale the data. In order to obtain more interpretable results, we also activated the bias term, as explained in Section 5.2.1. |
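
The Pseudocode row gives Algorithm 1 only as flattened text. The minimal Python sketch below shows its control flow and nothing more.

- The callables `update_Z`, `sample_B`, `sample_Y` and `sample_Psi`, and their argument lists, are hypothetical placeholders supplied by the caller.
- The actual conditional distributions are those described in Sections 4.1 and 4.2 of the paper and are not reproduced here.
- This is not the GLFM toolbox's API.

```python
from typing import Any, Callable, List, Optional, Tuple


def glfm_gibbs_skeleton(
    X: Any,
    Z: Any,
    Y: List[Any],
    n_iters: int,
    update_Z: Callable[[Any, List[Any]], Any],
    sample_B: Callable[[Any, Any, int], Any],
    sample_Y: Callable[[Any, Any, Any, int], Any],
    sample_Psi: Optional[Callable[[Any, Any, Any, Any, int], Any]] = None,
) -> Tuple[Any, List[Any], List[Any]]:
    """Control flow of Algorithm 1; every update is a hypothetical caller-supplied function.

    X: observed data; Z: initial latent feature matrix;
    Y: initial pseudo-observations, one entry per dimension d.
    """
    Y = list(Y)                  # copy so the caller's list is not mutated
    D = len(Y)
    B: List[Any] = [None] * D    # per-dimension weights B^d
    Psi: List[Any] = [None] * D  # per-dimension auxiliary parameters Psi^d
    for _ in range(n_iters):                  # 1: for each iteration
        Z = update_Z(Z, Y)                    # 2: update Z given {Y^d}
        for d in range(D):                    # 3: for d = 1, ..., D
            B[d] = sample_B(Z, Y[d], d)       # 4: sample B^d given Z, Y^d
            Y[d] = sample_Y(X, Z, B[d], d)    # 5: sample Y^d given X, Z, B^d
            if sample_Psi is not None:        # 6: sample Psi^d if needed
                Psi[d] = sample_Psi(X, Z, B[d], Y[d], d)
    return Z, B, Psi                          # output: Z, {B^d}, {Psi^d}
```

Passing the per-dimension samplers in as arguments keeps the sketch independent of data type. A dimension with no auxiliary parameters $\Psi^d$ can have `sample_Psi` return `None`.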
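
The Dataset Splits row says each reported value averages 20 independently split sets with randomly chosen missing values. The Research Type row says these values are missing completely at random. The Python sketch below shows one way to generate such splits.

Several details are assumptions for illustration, because the quoted text does not specify them:

- the missing fraction;
- the seeding scheme;
- the hypothetical `score_fn` callback, which would fit the model on the observed cells and return an error on the held-out ones.

```python
import random
from statistics import mean
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index of one entry in the data matrix


def mcar_split(n_rows: int, n_cols: int, missing_frac: float,
               rng: random.Random) -> Set[Cell]:
    """Choose held-out cells uniformly at random, i.e. missing completely at random."""
    cells: List[Cell] = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_missing = round(missing_frac * len(cells))
    return set(rng.sample(cells, n_missing))


def average_over_splits(n_rows: int, n_cols: int, missing_frac: float,
                        score_fn: Callable[[Set[Cell]], float],
                        n_splits: int = 20, seed: int = 0) -> float:
    """Average a hypothetical score_fn over n_splits independent random masks."""
    rng = random.Random(seed)
    scores = [score_fn(mcar_split(n_rows, n_cols, missing_frac, rng))
              for _ in range(n_splits)]
    return mean(scores)
```

Each split draws a fresh mask from the same generator, so the 20 splits are independent draws but reproducible from a single seed.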
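
The Experiment Setup row quotes two transformations for positive real and count data: $f(x) = \log(\exp(wx) + 1)$ and $f(x) = \log(w(x - \mu) + 1)$ with $\mu = \min(x_d)$ and $w = 2/\mathrm{std}(x_d)$. Below is a minimal Python sketch of both, together with their inverses.

The sketch rests on these assumptions:

- `std` is taken as the population standard deviation (`statistics.pstdev`), since the excerpt does not name the estimator.
- All function names are hypothetical.
- The excerpt does not spell out in which direction the model applies each map, so both directions are provided.

The first map is computed in a numerically stable form that is algebraically identical to $\log(\exp(wx) + 1)$.

```python
import math
from statistics import pstdev
from typing import Sequence, Tuple


def softplus_w(x: float, w: float) -> float:
    """f(x) = log(exp(w*x) + 1): maps the real line to the positive reals.

    Computed as max(z, 0) + log1p(exp(-|z|)) with z = w*x to avoid overflow.
    """
    z = w * x
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softplus_w_inv(y: float, w: float) -> float:
    """Inverse of softplus_w for y > 0: log(exp(y) - 1) / w, rewritten stably."""
    return (y + math.log(-math.expm1(-y))) / w


def shift_scale_params(column: Sequence[float]) -> Tuple[float, float]:
    """Data-driven parameters mu = min(x_d), w = 2 / std(x_d) (population std assumed)."""
    return min(column), 2.0 / pstdev(column)


def log_shift(x: float, mu: float, w: float) -> float:
    """f(x) = log(w * (x - mu) + 1); equals 0 at x = mu, defined for x > mu - 1/w."""
    return math.log(w * (x - mu) + 1.0)


def log_shift_inv(y: float, mu: float, w: float) -> float:
    """Inverse of log_shift: x = mu + (exp(y) - 1) / w."""
    return mu + math.expm1(y) / w
```

For example, for a column `[1.0, 2.0, 3.0, 4.0]`, `shift_scale_params` returns $\mu = 1.0$. The column minimum therefore maps to `log_shift(1.0, mu, w) == 0.0`, which is the "shift" half of the stated objective. The $2/\mathrm{std}$ factor supplies the "scale" half.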