Kernel Operations on the GPU, with Autodiff, without Memory Overflows
Authors: Benjamin Charlier, Jean Feydy, Joan Alexis Glaunès, François-David Collin, Ghislain Durif
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate this, we compare KeOps to similar scientific computing libraries PyTorch, TensorFlow, Halide (Ragan-Kelley et al., 2017) and TVM (Chen et al., 2018) on a simple benchmark: the Gaussian kernel matrix-vector product of Eq. (1) with an increasing number of points M = N in dimension D = 3. All experiments are performed with float32 precision on a Nvidia RTX 2080 Ti GPU. |
| Researcher Affiliation | Academia | Benjamin Charlier (EMAIL), IMAG, Université de Montpellier, CNRS, Montpellier, France; Jean Feydy (EMAIL), DMA, École Normale Supérieure, Paris, France; Joan Alexis Glaunès (EMAIL), MAP5, Université de Paris, CNRS, Paris, France; François-David Collin (EMAIL) and Ghislain Durif (EMAIL), IMAG, Université de Montpellier, CNRS, Montpellier, France |
| Pseudocode | No | For instance, we can specify the Gaussian matrix-vector product of Eq. (1) with: `from pykeops.torch import LazyTensor  # Wrapper for PyTorch Tensors`<br>`x_i = LazyTensor(x[:,None,:])  # (M,D) Tensor -> (M,1,D) Symbolic Tensor`<br>`y_j = LazyTensor(y[None,:,:])  # (N,D) Tensor -> (1,N,D) Symbolic Tensor`<br>`D_ij = ((x_i - y_j)**2).sum(dim=2)  # (M,N,1) Symbolic matrix of squared distances`<br>`K_ij = (- D_ij / (2 * s**2)).exp()  # (M,N,1) Symbolic Gaussian kernel matrix`<br>`a = K_ij @ b  # Genuine torch Tensor. (M,N,1) @ (N,D) = (M,D)` The paper provides a Python code snippet demonstrating usage rather than formal pseudocode or an algorithm block. |
| Open Source Code | Yes | KeOps brings graphics-like performances for kernel methods and is freely available on standard repositories (PyPI, CRAN). To showcase its versatility, we provide tutorials in a wide range of settings online at www.kernel-operations.io. Code is available on our repository in the benchmarks folder. |
| Open Datasets | No | The paper states "on a simple benchmark: the Gaussian kernel matrix-vector product of Eq. (1) with an increasing number of points M = N in dimension D = 3". This describes synthetically generated benchmark data, not a publicly available external dataset with access information. |
| Dataset Splits | No | The paper uses a synthetic benchmark dataset ("Gaussian kernel matrix-vector product of Eq. (1) with an increasing number of points M = N in dimension D = 3") and therefore does not discuss train/test/validation splits. |
| Hardware Specification | Yes | All experiments are performed with float32 precision on a Nvidia RTX 2080 Ti GPU, with the exception of the PyTorch-TPU column that was run in Google Colab |
| Software Dependencies | No | KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (NumPy and PyTorch), Matlab and GNU R. Binaries are then compiled and stored on the hard drive for later use: compilation relies on the standard CUDA stack (nvcc, gcc and/or clang compilers) and is only performed once per reduction. The paper mentions software components and compilation tools but does not provide specific version numbers for these dependencies to enable exact reproduction. |
| Experiment Setup | Yes | All experiments are performed with float32 precision on a Nvidia RTX 2080 Ti GPU, with the exception of the PyTorch-TPU column that was run in Google Colab. The benchmark itself is defined as "the Gaussian kernel matrix-vector product of Eq. (1) with an increasing number of points M = N in dimension D = 3". |
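The Gaussian kernel matrix-vector product of Eq. (1), which the pykeops snippet in the Pseudocode row computes symbolically, can be sketched at small scale with a dense NumPy reference. This is not the KeOps implementation (which never materializes the M×N kernel matrix); it is a minimal sketch assuming the standard Gaussian kernel K(x_i, y_j) = exp(-||x_i - y_j||² / (2s²)), with function and variable names chosen here for illustration.

```python
import numpy as np

def gaussian_kernel_matvec(x, y, b, s):
    """Dense reference for Eq. (1): a_i = sum_j exp(-|x_i - y_j|^2 / (2 s^2)) b_j.

    x: (M, D) query points, y: (N, D) source points, b: (N, E) signal, s: bandwidth.
    Returns a: (M, E). Materializes the full (M, N) kernel matrix, so only
    suitable for small M, N -- exactly the memory overflow KeOps avoids.
    """
    # (M,1,D) - (1,N,D) broadcasts to (M,N,D); summing over D gives squared distances.
    D_ij = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)   # (M, N)
    K_ij = np.exp(-D_ij / (2 * s ** 2))                         # (M, N) Gaussian kernel
    return K_ij @ b                                             # (M, N) @ (N, E) -> (M, E)

rng = np.random.default_rng(0)
M, N, D = 5, 4, 3
x = rng.standard_normal((M, D))
y = rng.standard_normal((N, D))
b = rng.standard_normal((N, D))
a = gaussian_kernel_matvec(x, y, b, s=1.0)
print(a.shape)  # (5, 3)
```

A dense reference like this is useful for validating a symbolic LazyTensor pipeline on small inputs before scaling M and N into the range where the (M, N) matrix no longer fits in GPU memory.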