Unsupervised Learning of Neurosymbolic Encoders

Authors: Eric Zhan, Jennifer J. Sun, Ann Kennedy, Yisong Yue, Swarat Chaudhuri

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on learning latent representations for real-world trajectory data from animal biology and sports analytics. We show that our approach offers significantly better separation of meaningful categories than standard VAEs and leads to practical gains on downstream analysis tasks, such as behavior classification."
Researcher Affiliation | Academia | Eric Zhan* (California Institute of Technology), Jennifer J. Sun* (California Institute of Technology), Ann Kennedy (Northwestern University Feinberg School of Medicine), Yisong Yue (California Institute of Technology), Swarat Chaudhuri (University of Texas at Austin)
Pseudocode | Yes |
Algorithm 1: Learning a neurosymbolic encoder
  1: Input: program space P, program graph G
  2: initialize ϕ, ψ, θ; α = α0 (empty architecture)
  3: while α is not complete do
  4:   ϕ, ψ, θ ← optimize Eq. 2 with α fixed
  5:   (α, ψ) ← optimize Eq. 3
  6: end while
  7: ϕ, ψ, θ ← optimize Eq. 2 with complete α
  8: Return: encoder {qϕ, q(α,ψ)}

Algorithm 2: Learning a neurosymbolic encoder with k programs
  1: Input: program space P, program graph G, k
  2: for i = 1..k do
  3:   fix programs {q(α1,ψ1), . . . , q(αi−1,ψi−1)}
  4:   execute Algorithm 1 to learn q(αi,ψi)
  5:   remove q(αi,ψi) from P to avoid redundancies
  6: end for
  7: Return: encoder {qϕ, q(α1,ψ1), . . . , q(αk,ψk)}
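The control flow of Algorithms 1 and 2 above can be sketched as follows. This is a minimal structural sketch only: `optimize_eq2`, `optimize_eq3`, the list-of-ints architecture `alpha`, and `max_depth` as the completeness criterion are all hypothetical stand-ins for the paper's VAE objective (Eq. 2), program-architecture search (Eq. 3), program graph G, and program space P.

```python
def optimize_eq2(params, alpha):
    """Stand-in for gradient steps on the VAE objective (Eq. 2) with alpha fixed."""
    return {k: v + 1 for k, v in params.items()}  # dummy parameter update


def optimize_eq3(alpha, psi):
    """Stand-in for program search (Eq. 3): grow the architecture by one node."""
    return alpha + [len(alpha)], psi


def learn_program(program_space, max_depth=3):
    """Algorithm 1: alternate neural optimization and program-architecture search.

    `program_space` is unused in this toy sketch; in the real method it
    constrains which architectures Eq. 3 may propose.
    """
    params = {"phi": 0, "psi": 0, "theta": 0}
    alpha = []                              # alpha_0: empty architecture
    while len(alpha) < max_depth:           # "while alpha is not complete"
        params = optimize_eq2(params, alpha)
        alpha, params["psi"] = optimize_eq3(alpha, params["psi"])
    params = optimize_eq2(params, alpha)    # final fit with complete alpha
    return alpha, params


def learn_k_programs(program_space, k):
    """Algorithm 2: learn k programs, removing each from the space afterward."""
    programs, space = [], set(program_space)
    for _ in range(k):
        alpha, _ = learn_program(space)
        programs.append(alpha)
        space.discard(tuple(alpha))         # avoid redundant programs
    return programs
```

The key structural point the sketch preserves is the alternation: the neural parameters are re-optimized with the partial program fixed, then the program architecture is extended, and only once the architecture is complete is a final joint fit performed.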
Open Source Code | Yes | Code can be found at https://github.com/ezhan94/neurosymbolic-encoders.
Open Datasets | Yes | "Our primary real-world dataset is the CalMS21 dataset (Sun et al., 2021a), containing trajectories of socially interacting mice captured for neuroscience experiments. We use the same basketball dataset as in Shah et al. (2020) and Zhan et al. (2020) that tracks professional basketball players."
Dataset Splits | Yes | "We generate 10k/2k/2k trajectories of length 25 for train/validation/test." "We have 231k/52k/262k trajectories of length 21 for train/val/test." "In total, we have 177k/31k/27k trajectories for train/val/test."
Hardware Specification | Yes | Experiments were run locally on a 4-core Intel 3.6 GHz i7-7700 CPU with an NVIDIA GTX 1080 Ti GPU (3584 CUDA cores), and on Amazon EC2 with a 4-core Intel 2.3 GHz Xeon CPU and an NVIDIA Tesla M60 GPU (2048 CUDA cores).
Software Dependencies | No | "We used the Adam (Kingma & Ba, 2014) optimizer for all training runs." (No library names or versions are reported.)
Experiment Setup | Yes | The hyperparameters for the proposed approach are in Tables 6 and 7, and the hyperparameters for baselines are in Table 8. The Adam (Kingma & Ba, 2014) optimizer was used for all training runs.
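The paper states that Adam (Kingma & Ba, 2014) was used for all training runs. As a self-contained illustration of that update rule, here is a plain-Python Adam loop minimizing a toy 1-D quadratic; the learning rate and step count are assumptions for this example, and the remaining hyperparameters are Adam's standard defaults, not the paper's settings (those live in Tables 6 through 8).

```python
import math

def adam_minimize(grad, x0, steps=300, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize a 1-D objective from its gradient using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered var) estimate
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize (x - 3)^2, whose gradient is 2(x - 3); the minimizer is x = 3.
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

In practice one would call a library implementation (e.g. `torch.optim.Adam`) rather than hand-rolling the loop; the sketch just makes the reported optimizer choice concrete.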