Unsupervised Learning of Neurosymbolic Encoders

Authors: Eric Zhan, Jennifer J. Sun, Ann Kennedy, Yisong Yue, Swarat Chaudhuri

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on learning latent representations for real-world trajectory data from animal biology and sports analytics. We show that our approach offers significantly better separation of meaningful categories than standard VAEs and leads to practical gains on downstream analysis tasks, such as behavior classification."
Researcher Affiliation | Academia | Eric Zhan* (California Institute of Technology), Jennifer J. Sun* (California Institute of Technology), Ann Kennedy (Northwestern University Feinberg School of Medicine), Yisong Yue (California Institute of Technology), Swarat Chaudhuri (University of Texas at Austin)
Pseudocode | Yes |
Algorithm 1: Learning a neurosymbolic encoder
  1: Input: program space P, program graph G
  2: initialize ϕ, ψ, θ; α = α0 (empty architecture)
  3: while α is not complete do
  4:   ϕ, ψ, θ ← optimize Eq. 2 with α fixed
  5:   (α, ψ) ← optimize Eq. 3
  6: end while
  7: ϕ, ψ, θ ← optimize Eq. 2 with complete α
  8: Return: encoder {qϕ, q(α,ψ)}

Algorithm 2: Learning a neurosymbolic encoder with k programs
  1: Input: program space P, program graph G, k
  2: for i = 1..k do
  3:   fix programs {q(α1,ψ1), . . . , q(αi−1,ψi−1)}
  4:   execute Algorithm 1 to learn q(αi,ψi)
  5:   remove q(αi,ψi) from P to avoid redundancies
  6: end for
  7: Return: encoder {qϕ, q(α1,ψ1), . . . , q(αk,ψk)}
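The control flow of Algorithms 1 and 2 above can be sketched as follows. This is a minimal structural sketch only: `optimize_eq2`, `optimize_eq3`, the list-of-ints architecture `alpha`, and `max_depth` as the completeness criterion are all hypothetical stand-ins for the paper's VAE objective (Eq. 2), program-architecture search (Eq. 3), program graph G, and program space P.

```python
def optimize_eq2(params, alpha):
    """Stand-in for gradient steps on the VAE objective (Eq. 2) with alpha fixed."""
    return {k: v + 1 for k, v in params.items()}  # dummy parameter update


def optimize_eq3(alpha, psi):
    """Stand-in for program search (Eq. 3): grow the architecture by one node."""
    return alpha + [len(alpha)], psi


def learn_program(program_space, max_depth=3):
    """Algorithm 1: alternate neural optimization and program-architecture search.

    `program_space` is unused in this toy sketch; in the real method it
    constrains which architectures Eq. 3 may propose.
    """
    params = {"phi": 0, "psi": 0, "theta": 0}
    alpha = []                              # alpha_0: empty architecture
    while len(alpha) < max_depth:           # "while alpha is not complete"
        params = optimize_eq2(params, alpha)
        alpha, params["psi"] = optimize_eq3(alpha, params["psi"])
    params = optimize_eq2(params, alpha)    # final fit with complete alpha
    return alpha, params


def learn_k_programs(program_space, k):
    """Algorithm 2: learn k programs, removing each from the space afterward."""
    programs, space = [], set(program_space)
    for _ in range(k):
        alpha, _ = learn_program(space)
        programs.append(alpha)
        space.discard(tuple(alpha))         # avoid redundant programs
    return programs
```

The key structural point the sketch preserves is the alternation: the neural parameters are re-optimized with the partial program fixed, then the program architecture is extended, and only once the architecture is complete is a final joint fit performed.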
Open Source Code | Yes | Code can be found at https://github.com/ezhan94/neurosymbolic-encoders.
Open Datasets | Yes | "Our primary real-world dataset is the CalMS21 dataset (Sun et al., 2021a), containing trajectories of socially interacting mice captured for neuroscience experiments. We use the same basketball dataset as in Shah et al. (2020) and Zhan et al. (2020) that tracks professional basketball players."
Dataset Splits | Yes | "We generate 10k/2k/2k trajectories of length 25 for train/validation/test." "We have 231k/52k/262k trajectories of length 21 for train/val/test." "In total, we have 177k/31k/27k trajectories for train/val/test."
Hardware Specification | Yes | Experiments were run locally on a 4-core Intel 3.6 GHz i7-7700 CPU with an NVIDIA GTX 1080 Ti GPU (3584 CUDA cores), and on Amazon EC2 with a 4-core Intel 2.3 GHz Xeon CPU and an NVIDIA Tesla M60 GPU (2048 CUDA cores).
Software Dependencies | No | "We used the Adam (Kingma & Ba, 2014) optimizer for all training runs." (No library names or versions are reported.)
Experiment Setup | Yes | The hyperparameters for the proposed approach are in Tables 6 and 7, and the hyperparameters for baselines are in Table 8. The Adam (Kingma & Ba, 2014) optimizer was used for all training runs.
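The paper states that Adam (Kingma & Ba, 2014) was used for all training runs. As a self-contained illustration of that update rule, here is a plain-Python Adam loop minimizing a toy 1-D quadratic; the learning rate and step count are assumptions for this example, and the remaining hyperparameters are Adam's standard defaults, not the paper's settings (those live in Tables 6 through 8).

```python
import math

def adam_minimize(grad, x0, steps=300, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize a 1-D objective from its gradient using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered var) estimate
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize (x - 3)^2, whose gradient is 2(x - 3); the minimizer is x = 3.
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

In practice one would call a library implementation (e.g. `torch.optim.Adam`) rather than hand-rolling the loop; the sketch just makes the reported optimizer choice concrete.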