CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings

Authors: Gabriel Skantze, Bram Willemsen

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the model's performance on two different tasks of identifying the targets of referring expressions, where it has to learn new language use. The results show that the model can efficiently learn and generalize from only a few examples, with little interference with the model's original zero-shot performance.
Researcher Affiliation | Academia | Gabriel Skantze EMAIL Bram Willemsen EMAIL KTH Royal Institute of Technology, Stockholm, Sweden
Pseudocode | No | The paper describes the CoLLIE transformation using mathematical equations and figures (Figures 4 and 5) to illustrate its components (adjustment function a(T), scaling function s(T)) and how they are trained (e.g., "a(T) = βT + m", "We learn s using a regression model"), but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The code used for running the experiments and reproducing the results in this paper is provided on GitHub (https://github.com/gabriel-skantze/CoLLIE), including necessary data or pointers to data.
Open Datasets | Yes | First, we use the LAD dataset (Large-scale Attribute Dataset) by Zhao et al. (2018), from which we selected a set of 200 categories... we also use the images from the KTH Tangrams dataset (Shore et al., 2018), which were used for the task depicted in Figure 2.
Dataset Splits | Yes | We randomly select a set of N categories, Ctrain (out of the 200 categories), for which we want to teach the model new names. ... At the beginning of each round, we randomly select one image for each of the 200 categories, without ever reusing images between rounds. ... At the end of each round, we add the images from Ctrain and their associated pseudo-words as training examples (i.e., one example per category) to the model, and retrain it. ... This whole procedure is repeated over 50 iterations (with new pseudo-words and categories randomly selected and assigned), in order to get a smooth average performance per round.
Hardware Specification | Yes | On an Intel Core i7-1065G7 CPU, one iteration of Experiment II (i.e., 30 model updates) takes about 1 second for the standard CoLLIE model.
Software Dependencies | No | The models were implemented using scikit-learn (https://scikit-learn.org/) with standard parameters unless stated otherwise.
Experiment Setup | Yes | We learn A using linear regression: a(T) = βT + m, with β ∈ R^(512×512) and m ∈ R^(512). To avoid overfitting (given the limited number of training examples), we use ridge regression (L2 regularization with λ = 0.001). ... For our initial tests, we learn s using support vector regression (SVR) with a linear kernel (coerced in the range [0, 1]). ... The KNN regressor also shows a good performance.
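The setup above can be sketched with scikit-learn using the stated settings (ridge regression with λ = 0.001 for the adjustment function a(T), linear-kernel SVR clipped to [0, 1] for the scaling function s(T)). The synthetic embeddings, the `collie` helper name, and the final interpolation between T and a(T) are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(0)
d = 512  # CLIP text-embedding dimensionality

# Toy training data (assumption: a few text/image embedding pairs plus
# scaling targets in [0, 1]).
T_train = rng.normal(size=(20, d))    # original text embeddings
I_train = rng.normal(size=(20, d))    # target image embeddings
s_train = rng.uniform(0, 1, size=20)  # scaling targets

# a(T) = beta T + m, learned with L2 regularization (lambda = 0.001).
adjust = Ridge(alpha=0.001).fit(T_train, I_train)

# s(T): support vector regression with a linear kernel, coerced to [0, 1].
scale = SVR(kernel="linear").fit(T_train, s_train)

def collie(T):
    """Apply the learned transformation to a batch of text embeddings."""
    a_T = adjust.predict(T)
    s_T = np.clip(scale.predict(T), 0.0, 1.0)[:, None]
    # Assumed combination: interpolate between the original and adjusted
    # embedding using the predicted scaling factor.
    return s_T * a_T + (1.0 - s_T) * T

out = collie(T_train[:5])
print(out.shape)  # (5, 512)
```

Ridge regression handles the multi-output target (one regression per embedding dimension) in a single `fit` call, which is why a full 512x512 β plus bias m can be learned from only a handful of examples without extra code.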