Megaman: Scalable Manifold Learning in Python
Authors: James McQueen, Marina Meilă, Jacob VanderPlas, Zhongyue Zhang
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In benchmarks, even on a single-core desktop computer, our code embeds millions of data points in minutes, and takes just 200 minutes to embed the main sample of galaxy spectra from the Sloan Digital Sky Survey, consisting of 0.6 million samples in 3750 dimensions, a task which has not previously been possible. We display total embedding time (including time to compute the graph G, the Laplacian matrix and the embedding) for megaman versus scikit-learn, as the number of samples N varies or the data dimension D varies (Figure 1). All benchmark computations were performed on a single desktop computer running Linux with 24.68GB RAM and a Quad-Core 3.07GHz Intel Xeon CPU. We use a relatively weak machine to demonstrate that our package can be reasonably used without high performance hardware. The experiments show that megaman scales considerably better than scikit-learn. |
| Researcher Affiliation | Academia | James McQueen, EMAIL, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA; Marina Meilă, EMAIL, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA; Jacob VanderPlas, EMAIL, e-Science Institute, University of Washington, Seattle, WA 98195-4322, USA; Zhongyue Zhang, EMAIL, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-4322, USA |
| Pseudocode | No | The paper provides actual Python code snippets in Section 4 'Quick start' to demonstrate library usage, rather than pseudocode or algorithm blocks outlining the methodology. |
| Open Source Code | Yes | megaman is publicly available at: https://github.com/mmp2/megaman. |
| Open Datasets | Yes | The word2vec data used were from GoogleNews-vectors-negative300.bin.gz, which can be downloaded from https://code.google.com/archive/p/word2vec/. The Sloan Digital Sky Survey data can be downloaded from www.sdss.org. |
| Dataset Splits | No | The paper mentions using datasets like 'Swiss Roll', 'word2vec', and 'Sloan Digital Sky Survey' for benchmarking, but it does not specify any training, validation, or test splits. It only refers to varying the number of samples (N) or dimensions (D). |
| Hardware Specification | Yes | All benchmark computations were performed on a single desktop computer running Linux with 24.68GB RAM and a Quad-Core 3.07GHz Intel Xeon CPU. |
| Software Dependencies | No | megaman's required dependencies are numpy, scipy, and scikit-learn, but for optimal performance FLANN, cython, pyamg and the C compiler gcc are also required. The paper lists software dependencies but does not provide specific version numbers for them. |
| Experiment Setup | Yes | `from megaman.geometry import Geometry` `from megaman.embedding import SpectralEmbedding` ... `radius = 1.1` ... `geom = Geometry(adjacency_kwds={'radius': 3*radius},` ... `affinity_kwds={'radius': radius},` ... `SE = SpectralEmbedding(n_components=2, eigen_solver='amg', geom=geom)` |
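
The setup quoted above uses two radii: the neighbor graph is cut off at `3 * radius`, while the affinity uses `radius` itself. The following is a minimal pure-Python sketch of how those two settings could interact. It is not megaman's implementation: the helper names `radius_neighbors` and `gaussian_affinity` are hypothetical, the brute-force pairwise search stands in for the FLANN-based search the package lists as an optional dependency, the random 3-D points are illustrative data, and the kernel form exp(-d²/radius²) is an assumption about the Gaussian affinity.

```python
import math
import random


def radius_neighbors(points, cutoff):
    """Return {(i, j): distance} for every pair i < j no farther apart than `cutoff`.

    Brute-force O(N^2) search, for illustration only.
    """
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= cutoff:
                edges[(i, j)] = d
    return edges


def gaussian_affinity(edges, bandwidth):
    """Map each edge distance d to a weight exp(-(d / bandwidth)^2).

    The kernel form is an assumption, not taken from the paper.
    """
    return {pair: math.exp(-(d / bandwidth) ** 2) for pair, d in edges.items()}


radius = 1.1  # value quoted in the Experiment Setup row

# Illustrative data: 200 random points in a 10 x 10 x 10 cube.
random.seed(0)
points = [tuple(random.uniform(0, 10) for _ in range(3)) for _ in range(200)]

# Adjacency is cut off at 3 * radius; the affinity bandwidth is radius.
edges = radius_neighbors(points, cutoff=3 * radius)
affinity = gaussian_affinity(edges, bandwidth=radius)

print(len(edges), "edges; smallest weight:", min(affinity.values()))
```

Under the assumed kernel, an edge exactly at the cutoff distance `3 * radius` gets weight exp(-9), roughly 1.2e-4, so truncating the graph there discards only pairs whose affinity would be close to zero while keeping the graph sparse.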