A Self-Supervised Framework for Function Learning and Extrapolation

Authors: Simon Segert, Jonathan Cohen

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We demonstrate that our choice of encoder and training procedure learns representations that perform better on a collection of downstream function learning and generalization tasks than do comparison models for learning and/or representing time series. This should be of particular interest to the field of semi-supervised learning, since works in that field have not yet systematically analyzed time series that correspond to intuitive functions. Moreover, we directly compare the generalization patterns of the model with those of humans asked to perform a multiple-choice extrapolation paradigm modeled after an empirical study by Schulz et al. (2017). We find that the model exhibits a qualitatively similar bias as people in this setting, namely, a greater accuracy on functions that are compositionally structured."
Researcher Affiliation: Academia. "Simon N. Segert (EMAIL), Princeton Neuroscience Institute, Princeton University; Jonathan D. Cohen (EMAIL), Princeton Neuroscience Institute, Princeton University."
Pseudocode: No. The paper describes methods and architectures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide any explicit statement about open-sourcing the code, nor a link to a code repository for the described methodology.
Open Datasets: No. "To evaluate the ability of the encoder to learn a representation of intuitive functions, we generated and trained it on two types of functions: one generated from the family of 13 kernels defined by the CG (see Section 2.2); and the other (used as a control) from a non-compositional SM Kernel, for a total of 14 kernels."
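The quoted setup generates training curves by sampling from Gaussian-process kernels. The paper's 13 compositional (CG) kernels and the spectral-mixture control are not reproduced here; as an illustrative stand-in, the sketch below draws one curve from a zero-mean GP with a single RBF kernel, which is one way such curves are typically sampled.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_curve(n_points=100, lengthscale=1.0, seed=0):
    """Draw one curve from a zero-mean GP with an RBF kernel."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)
    # Small diagonal jitter keeps the covariance numerically positive-definite.
    K = rbf_kernel(x, lengthscale) + 1e-8 * np.eye(n_points)
    y = rng.multivariate_normal(np.zeros(n_points), K)
    return x, y

x, y = sample_gp_curve()
print(x.shape, y.shape)  # (100,) (100,)
```

Compositional kernels would be built by adding or multiplying such base kernel matrices before sampling; the function names and parameters above are illustrative, not the paper's.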
Dataset Splits: Yes. "We report the accuracy of each such classifier on a collection of 2800 held-out curves (200 per class). ... We report the accuracy on a collection of 400 held-out curves (200 per class). ... We report the average values for 4200 held-out curves (300 per class). ... In our design, half of the prompt curves were sampled from the CG, meaning that z_i assumes each of the two values with 50 percent probability."
Hardware Specification: No. The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. "All encoders were trained using a batch size of 512, with an Adam optimizer with learning rate of .001 and weight decay of 10^-6. ... We fit the head using the SGDClassifier class from scikit-learn. ... All heads were trained on a cross-entropy loss using the Adam optimizer with a learning rate of .01." The paper mentions software tools such as the Adam optimizer and scikit-learn, but does not provide version numbers for these or any other software dependencies.
Experiment Setup: Yes. "All encoders were trained using a batch size of 512, with an Adam optimizer with learning rate of .001 and weight decay of 10^-6. All encoders were exposed to 500,000 curves during training. ... We set n_2 = n_3 = 128 and = .5. ... All heads were trained on a cross-entropy loss using the Adam optimizer with a learning rate of .01. ... We set L = 20 in all cases. ... Additionally, we separately chose an L2 penalty for each head using cross validation."