A Self-Supervised Framework for Function Learning and Extrapolation

Authors: Simon Segert, Jonathan Cohen

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We demonstrate that our choice of encoder and training procedure learns representations that perform better on a collection of downstream function learning and generalization tasks than do comparison models for learning and/or representing time series. This should be of particular interest to the field of semi-supervised learning, since works in that field have not yet systematically analyzed time series that correspond to intuitive functions. Moreover, we directly compare the generalization patterns of the model with those of humans asked to perform a multiple-choice extrapolation paradigm modeled after an empirical study by Schulz et al. (2017). We find that the model exhibits a qualitatively similar bias as people in this setting, namely, a greater accuracy on functions that are compositionally structured."
Researcher Affiliation: Academia. "Simon N. Segert (EMAIL), Princeton Neuroscience Institute, Princeton University; Jonathan D. Cohen (EMAIL), Princeton Neuroscience Institute, Princeton University."
Pseudocode: No. The paper describes methods and architectures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide any explicit statement about open-sourcing the code, nor a link to a code repository for the described methodology.
Open Datasets: No. "To evaluate the ability of the encoder to learn a representation of intuitive functions, we generated and trained it on two types of functions: one generated from the family of 13 kernels defined by the CG (see Section 2.2); and the other (used as a control) from a non-compositional SM Kernel, for a total of 14 kernels."
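The quoted setup generates training curves by sampling from Gaussian-process kernels. The paper's 13 compositional (CG) kernels and the spectral-mixture control are not reproduced here; as an illustrative stand-in, the sketch below draws one curve from a zero-mean GP with a single RBF kernel, which is one way such curves are typically sampled.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_curve(n_points=100, lengthscale=1.0, seed=0):
    """Draw one curve from a zero-mean GP with an RBF kernel."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)
    # Small diagonal jitter keeps the covariance numerically positive-definite.
    K = rbf_kernel(x, lengthscale) + 1e-8 * np.eye(n_points)
    y = rng.multivariate_normal(np.zeros(n_points), K)
    return x, y

x, y = sample_gp_curve()
print(x.shape, y.shape)  # (100,) (100,)
```

Compositional kernels would be built by adding or multiplying such base kernel matrices before sampling; the function names and parameters above are illustrative, not the paper's.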
Dataset Splits: Yes. "We report the accuracy of each such classifier on a collection of 2800 held-out curves (200 per class). ... We report the accuracy on a collection of 400 held-out curves (200 per class). ... We report the average values for 4200 held-out curves (300 per class). ... In our design, half of the prompt curves were sampled from the CG, meaning that z_i assumes each of the two values with 50 percent probability."
Hardware Specification: No. The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. "All encoders were trained using a batch size of 512, with an Adam optimizer with learning rate of .001 and weight decay of 10^-6. ... We fit the head using the SGDClassifier class from scikit-learn. ... All heads were trained on a cross-entropy loss using the Adam optimizer with a learning rate of .01." The paper mentions software tools such as the Adam optimizer and scikit-learn, but does not provide version numbers for these or any other software dependencies.
Experiment Setup: Yes. "All encoders were trained using a batch size of 512, with an Adam optimizer with learning rate of .001 and weight decay of 10^-6. All encoders were exposed to 500,000 curves during training. ... We set n_2 = n_3 = 128 and = .5. ... All heads were trained on a cross-entropy loss using the Adam optimizer with a learning rate of .01. ... We set L = 20 in all cases. ... Additionally, we separately chose an L2 penalty for each head using cross validation."