Pseudo-Spherical Contrastive Divergence
Authors: Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of PS-CD on several 1-D and 2-D synthetic datasets as well as commonly used image datasets. |
| Researcher Affiliation | Academia | Lantao Yu, Computer Science Department, Stanford University, EMAIL; Jiaming Song, Computer Science Department, Stanford University, EMAIL; Yang Song, Computer Science Department, Stanford University, EMAIL; Stefano Ermon, Computer Science Department, Stanford University, EMAIL |
| Pseudocode | Yes | Algorithm 1 Pseudo-Spherical Contrastive Divergence. 1: Input: Empirical data distribution $p_{\text{data}}$; pseudo-spherical scoring rule hyperparameter $\gamma$. 2: Initialize energy function $E_\theta$. 3: repeat 4: Draw a minibatch of samples $\{x_1^+, \ldots, x_N^+\}$ from $p_{\text{data}}$. 5: Draw a minibatch of samples $\{x_1^-, \ldots, x_N^-\}$ from $q_\theta \propto \exp(-E_\theta)$ (e.g., using Langevin dynamics with a sample replay buffer). 6: Update the energy function by stochastic gradient descent: $\nabla_\theta \hat{L}_N^{\gamma}(\theta; p) = \nabla_\theta \frac{1}{\gamma} \log\left(\frac{1}{N}\sum_{i=1}^N \exp(\gamma E_{\theta}(x_i^+))\right) - \frac{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x_i^-)) \nabla_{\theta} E_{\theta}(x_i^-)}{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x_i^-))}$ 7: until Convergence |
| Open Source Code | Yes | In Appendix A, we also provide a simple PyTorch implementation for stochastic gradient descent (SGD) with the gradient estimator in Equation (19). |
| Open Datasets | Yes | To test the practical usefulness, we use MNIST [54], CIFAR-10 [48] and CelebA [57] in our experiments for modeling natural images. |
| Dataset Splits | Yes | For quantitative evaluation of the 2-D synthetic data experiments, we follow [79] and report the maximum mean discrepancy (MMD, [5]) between the generated samples and validation samples in Table 3 in App. D.1, which demonstrates that PS-CD outperforms its CD counterpart on all but the Funnel dataset. We conduct similar experiments on MNIST and CIFAR-10 datasets, where we use uniform noise as the contamination distribution and the contamination ratio is 0.1 (i.e. 10% images in the training set are replaced with random noise). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch' for implementation but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | More experimental details about the data processing, model architectures, sampling strategies and additional experimental results can be found in App. D. For CelebA, we use a simple 5-layer CNN architecture. For all experiments, we use Langevin dynamics with K=100 MCMC steps to sample from EBMs. We use Adam optimizer with learning rate 1e-4 and batch size 64. |
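The gradient estimator quoted in the Pseudocode row (Algorithm 1, step 6) can be sketched as a PyTorch surrogate loss whose autograd gradient matches the estimator: the positive term is a log-mean-exp of $\gamma E_\theta(x^+)$ (differing from the paper's form only by an additive constant), and the negative term uses detached self-normalized weights $\exp(-\gamma E_\theta(x^-))$. This is a minimal sketch under those assumptions, not the authors' Appendix A code; the function and argument names here are illustrative.

```python
import torch

def ps_cd_loss(energy_fn, x_pos, x_neg, gamma):
    """Surrogate loss whose gradient matches the PS-CD estimator (sketch).

    energy_fn: maps a batch of samples to per-sample energies E_theta(x).
    x_pos: minibatch drawn from the data distribution.
    x_neg: minibatch drawn from q_theta ∝ exp(-E_theta) (e.g., Langevin dynamics).
    gamma: pseudo-spherical hyperparameter (gamma > 0).
    """
    e_pos = energy_fn(x_pos)  # shape (N,)
    e_neg = energy_fn(x_neg)  # shape (N,)
    # Positive term: (1/gamma) * log-sum-exp of gamma * E(x+); its gradient is
    # the self-normalized average of grad E(x+) with weights exp(gamma * E(x+)).
    pos_term = torch.logsumexp(gamma * e_pos, dim=0) / gamma
    # Negative term: self-normalized importance weights exp(-gamma * E(x-)),
    # detached so gradients flow only through E_theta(x-) itself.
    w = torch.softmax(-gamma * e_neg, dim=0).detach()
    neg_term = (w * e_neg).sum()
    return pos_term - neg_term
```

As gamma approaches 0 both weighted averages reduce to plain means, recovering the standard contrastive divergence gradient.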
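The Experiment Setup row states that negatives are drawn with Langevin dynamics using K=100 MCMC steps. A minimal sketch of such a sampler is below; the step size is a hypothetical choice and the paper's sample replay buffer is omitted, so this illustrates the update rule rather than the authors' exact procedure.

```python
import torch

def langevin_sample(energy_fn, x_init, n_steps=100, step_size=0.01):
    """Draw approximate samples from q ∝ exp(-E) via unadjusted Langevin dynamics."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
        noise = torch.randn_like(x)
        # x_{k+1} = x_k - (step/2) * grad E(x_k) + sqrt(step) * noise
        x = (x - 0.5 * step_size * grad + step_size ** 0.5 * noise)
        x = x.detach().requires_grad_(True)
    return x.detach()
```

With a quadratic energy E(x) = 0.5 * x^2, the chain's stationary distribution is approximately standard normal, which gives a quick sanity check of the update.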