Estimating Density Models with Truncation Boundaries using Score Matching
Authors: Song Liu, Takafumi Kanamori, Daniel J. Williams
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the usefulness of our method via numerical experiments and a study on the Chicago crime data set. We also show that the proposed density estimation can correct the outlier-trimming bias caused by aggressive outlier detection methods. Section 8 is titled "Numerical and Real-world Data Analysis" and includes various experiments with datasets and performance comparisons. |
| Researcher Affiliation | Academia | Song Liu EMAIL University of Bristol; Takafumi Kanamori EMAIL Tokyo Institute of Technology, RIKEN AIP; Daniel J. Williams EMAIL University of Bristol. All listed affiliations are academic institutions. |
| Pseudocode | No | The paper describes its methodology using mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and data sets to reproduce our experiments are available at https://github.com/anewgithubname/Truncated-Score-Matching. |
| Open Datasets | Yes | We demonstrate the usefulness of our method via numerical experiments and a study on the Chicago crime data set. The code and data sets to reproduce our experiments are available at https://github.com/anewgithubname/Truncated-Score-Matching. We also experiment on a real-world data set, CIFAR-10, which contains ten different classes of 32 by 32 images. |
| Dataset Splits | No | The paper mentions generating samples (e.g., "We generate 10,000 samples, only 1417 of which can be used for parameter estimation"), and for CIFAR-10, it states using a "hold-out likelihood". However, specific percentages, absolute counts, or explicit methodologies for splitting data into training, validation, and test sets for reproducibility are not provided in the main text. |
| Hardware Specification | Yes | Our experiments are run on a workstation with an AMD Ryzen 1700 CPU with 32GB memory. |
| Software Dependencies | No | We optimize both objective functions using MATLAB's fminunc function with default settings. While MATLAB is mentioned, no version number is given for it. |
| Experiment Setup | Yes | Our unnormalized density model is a Gaussian mixture model with four components (parametrized by $\theta_1, \dots, \theta_4$) and the unit variance-covariance matrix: $p_{\theta_1,\dots,\theta_4}(x) = \sum_{i=1}^{4} \mathcal{N}_x(\theta_i, I)$. In this experiment, 500,000 particles are used to approximate $Z_V(\theta)$. We fit a Gaussian mixture model with two components on this data set. The standard deviations of the two components are fixed to the same value, roughly half of the width of the city. The outlier percentage (ν) in OSVM is set to 20%. |
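
The Experiment Setup row describes an unnormalized four-component Gaussian mixture with identity covariance and a particle approximation of its normalizing constant over the truncation domain. The following is a minimal Python sketch of those two quantities, not the authors' MATLAB code. The function names, the `in_domain` callback, and the sampling scheme (drawing particles from each component and counting the fraction that falls inside the domain) are assumptions made for illustration. The paper's actual domain and estimator may differ.

```python
import numpy as np


def unnormalized_gmm_density(x, thetas):
    """Evaluate sum_i N(x; theta_i, I) for points x of shape (n, d)
    and component means thetas of shape (k, d)."""
    x = np.atleast_2d(x)
    thetas = np.atleast_2d(thetas)
    d = x.shape[1]
    # Squared Euclidean distance from every point to every mean, shape (n, k).
    sq_dist = ((x[:, None, :] - thetas[None, :, :]) ** 2).sum(axis=2)
    # Identity covariance, so the Gaussian constant is (2*pi)^(-d/2).
    return ((2 * np.pi) ** (-d / 2) * np.exp(-0.5 * sq_dist)).sum(axis=1)


def mc_normalizer(thetas, in_domain, n_particles=500_000, seed=0):
    """Monte Carlo estimate of Z_V(theta), the integral of the unnormalized
    mixture over a domain V.

    in_domain is a hypothetical callback mapping an (m, d) array of points
    to a boolean array of length m (True where the point lies inside V).
    """
    rng = np.random.default_rng(seed)
    thetas = np.atleast_2d(thetas)
    k, d = thetas.shape
    per_component = n_particles // k
    total = 0.0
    for theta in thetas:
        # Each component integrates to P(x in V) with x ~ N(theta, I).
        particles = theta + rng.standard_normal((per_component, d))
        total += in_domain(particles).mean()
    return total
```

As a usage example, `mc_normalizer(thetas, lambda p: p[:, 0] >= 0)` estimates the normalizer for a half-plane domain. The default of 500,000 particles mirrors the particle count quoted in the table.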