Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning

Authors: Daniel C. Castro, Jeremy Tan, Bernhard Kainz, Ender Konukoglu, Ben Glocker

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To address this issue we introduce Morpho-MNIST, a framework that aims to answer: to what extent has my model learned to represent specific factors of variation in the data? We extend the popular MNIST dataset by adding a morphometric analysis enabling quantitative comparison of trained models, identification of the roles of latent variables, and characterisation of sample diversity. We further propose a set of quantifiable perturbations to assess the performance of unsupervised and supervised methods on challenging tasks such as outlier detection and domain adaptation."
Researcher Affiliation | Academia | Daniel C. Castro (EMAIL), Jeremy Tan (EMAIL), Bernhard Kainz (EMAIL), Biomedical Image Analysis Group, Imperial College London, London SW7 2AZ, United Kingdom; Ender Konukoglu (EMAIL), Computer Vision Laboratory, ETH Zürich, 8092 Zürich, Switzerland; Ben Glocker (EMAIL), Biomedical Image Analysis Group, Imperial College London, London SW7 2AZ, United Kingdom
Pseudocode | No | The paper describes a "Processing Pipeline" in Section 2.1 as a sequence of numbered steps, but does not present it as formal pseudocode or an algorithm block; the steps are given in natural language.
Open Source Code | Yes | "Data and code are available at https://github.com/dccastro/Morpho-MNIST."
Open Datasets | Yes | "We extend the popular MNIST dataset by adding a morphometric analysis... Data and code are available at https://github.com/dccastro/Morpho-MNIST. ...The MNIST (modified NIST) dataset (LeCun et al., 1998) was constructed from handwritten digits in NIST Special Databases 1 and 3, now released as Special Database 19 (Grother and Hanaoka, 2016)."
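Since the released data extend MNIST, the image and label files are presumably distributed in MNIST's IDX binary format (this is an assumption about the release, not something the quoted text states). A minimal, self-contained reader for that format looks like:

```python
import gzip
import struct

import numpy as np


def load_idx(path):
    """Read an IDX file (the binary format used by MNIST), plain or gzipped.

    Header layout: two zero bytes, a dtype code (0x08 = uint8 for both
    MNIST images and labels), the number of dimensions, then one
    big-endian uint32 per dimension.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        zeros, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zeros != 0 or dtype_code != 0x08:
            raise ValueError(f"{path} is not a uint8 IDX file")
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

For example, `load_idx("train-images-idx3-ubyte.gz")` would return a `(60000, 28, 28)` uint8 array for standard MNIST-style releases; the Morpho-MNIST repository also provides its own I/O utilities.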
Dataset Splits | Yes | "Figure A.1: Distribution of morphological attributes for plain MNIST digits. Top: training set; bottom: test set. ...We take the global dataset, restrict the training data to digits whose thickness is below 3 px, and evaluate the model's response to test digits across the entire thickness range..."
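The quoted domain-shift split is straightforward to reproduce once per-image morphometric attributes are available. A minimal sketch, assuming `thickness` is a per-image attribute array in pixels (the array names and threshold default are ours; only the "below 3 px" rule comes from the paper):

```python
import numpy as np


def thickness_restricted_split(train_images, train_thickness, threshold_px=3.0):
    """Restrict the training set to thin digits, as in the quoted setup.

    Training uses only digits with thickness below `threshold_px`;
    evaluation is then run on test digits across the entire thickness
    range, so the test set is left untouched.
    """
    mask = train_thickness < threshold_px
    return train_images[mask], mask
```

Calling it with the full training images and their measured thicknesses yields the restricted training subset plus the boolean mask, which is also handy for subsetting labels or other attribute columns consistently.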
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions scikit-image and "scikit-image defaults (van der Walt et al., 2014)" but does not provide version numbers for it or any other software library used.
Experiment Setup | Yes | "We train a β-VAE with β = 4, as in Higgins et al.'s (2017) experiments on small binary images, and a vanilla GAN with non-saturating loss, both with 64-dimensional latent space. ...Models were trained for 20 epochs using 64 images per batch, with no hyperparameter tuning. ...k-nearest-neighbours (kNN) using k = 5 neighbours and ℓ1 distance weighting, a support vector machine (SVM) with polynomial kernel and penalty parameter C = 100, a multi-layer perceptron (MLP) with 784-200-200-L architecture (L: number of outputs), and a LeNet-5 (LeCun et al., 1998) convolutional neural network (CNN)."
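The supervised baselines quoted above can be instantiated directly in scikit-learn. This is a sketch, not the authors' code: only the hyperparameters named in the quote (k = 5, ℓ1 metric with distance weighting, polynomial-kernel SVM with C = 100, two 200-unit hidden layers) are taken from the paper; everything else is a library default, and LeNet-5 is omitted since it needs a deep-learning framework.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_baselines():
    """Classical baselines matching the paper's quoted hyperparameters."""
    return {
        # k = 5 neighbours, l1 (Manhattan) metric, distance-based weighting
        "kNN": KNeighborsClassifier(
            n_neighbors=5, metric="manhattan", weights="distance"
        ),
        # polynomial kernel, penalty parameter C = 100
        "SVM": SVC(kernel="poly", C=100),
        # 784 -> 200 -> 200 -> L fully connected network; input and output
        # sizes are inferred by scikit-learn from the training data
        "MLP": MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=50),
    }
```

Each model exposes the usual `fit`/`predict` interface, so all three can be trained and evaluated in a single loop over `make_baselines().items()`.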