Learning Visually Grounded Domain Ontologies via Embodied Conversation and Explanation

Authors: Jonghyuk Park, Alex Lascarides, Subramanian Ramamoorthy

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that teacher-learner pairs utilizing explanations and corrections are more data-efficient than those without such a faculty. The paper states: "Our experiments demonstrate that strategies exploiting agent explanations in this way accomplish significantly better performance after the same number of training examples, compared to baseline strategies that do not, especially when the learner's initial model for recognizing object parts is deficient."
Researcher Affiliation | Academia | Jonghyuk Park, Alex Lascarides, Subramanian Ramamoorthy; School of Informatics, The University of Edinburgh.
Pseudocode | No | The paper describes the architecture and processes in detail, including how information is translated into logic programs and factor graphs, but it does not present any structured pseudocode or algorithm blocks with formal steps.
Open Source Code | Yes | Code: https://github.com/jpstyle/ns-arch-unity
Open Datasets | No | The paper states: "The classification target concepts in our experiments are fine-grained types of toy truck as illustrated in Fig. 1a." and "We instantiate the framework in an example testbed scenario where a learner agent must acquire and distinguish among a set of novel visual concepts, namely a range of fine-grained types of toy vehicle." It refers to 3D models and a simulated domain, but no explicit dataset name, link, or public-availability information is provided for the toy truck data.
Dataset Splits | No | The paper only specifies: "Our evaluation metric is cumulative regret; i.e., the accumulated number of mistakes made across a series of 120 interaction episodes, where subtypes and visual features are randomly sampled for each training instance." No explicit train/validation/test split is described.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the conceptual modules of the neurosymbolic architecture (vision processing, language processing, long-term memory, symbolic reasoning) and mentions using probabilistic graphical models and normal logic programs, but it does not specify any particular software libraries, frameworks, or version numbers.
Experiment Setup | Yes | "We fix Ud = Ua = 0.99 in our experiments. We conduct a suite of experiments to assess the data-efficiency of the three different interaction strategies during FGVC training: Vis-Only, Vis+Genr and Vis+Genr+Expl. We run experiments with three initial part recognition accuracies: LQ/MQ/HQ (low-/medium-/high-quality). Our evaluation metric is cumulative regret; i.e., the accumulated number of mistakes made across a series of 120 interaction episodes, where subtypes and visual features are randomly sampled for each training instance."
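The cumulative-regret metric quoted in the Experiment Setup row is a simple running count of mistakes across episodes. As a minimal sketch of how that metric is computed — the 0/1 episode outcomes below are invented placeholders, not results from the paper:

```python
# Sketch of the cumulative-regret metric: regret after episode t is the
# accumulated number of mistaken classifications over episodes 1..t.
# The outcome values here are illustrative placeholders only.

def cumulative_regret(mistakes):
    """Given a per-episode 0/1 mistake indicator, return the running total curve."""
    total, curve = 0, []
    for m in mistakes:
        total += m
        curve.append(total)
    return curve

# Example: 10 illustrative episodes, 6 of which end in a mistake.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
curve = cumulative_regret(outcomes)
print(curve[-1])  # final cumulative regret for this toy run
```

A data-efficient learner shows a curve that flattens early: fewer accumulated mistakes after the same number of training episodes.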
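The Software Dependencies row notes that the architecture combines normal logic programs with probabilistic graphical models. As a hypothetical sketch of that general idea — the rule heads, part names, and weights below are invented for illustration and are not taken from the paper — weighted rules whose bodies match the observed parts can contribute factor weights to a label's score:

```python
# Illustrative sketch (not the paper's implementation): score candidate labels
# with weighted logic-program-style rules, factor-graph fashion. Each rule whose
# body is satisfied by the observed parts adds its weight to the head's score.
import math

# Hypothetical weighted rules: (head concept, body parts, weight).
rules = [
    ("dumper_truck", {"has_bed", "has_cabin"}, 2.0),
    ("fire_truck",   {"has_ladder", "has_cabin"}, 2.5),
]

def score_labels(observed_parts, rules):
    """Sum weights of satisfied rules per head, then normalize softmax-style."""
    raw = {}
    for head, body, weight in rules:
        if body <= observed_parts:  # body fully satisfied by observations
            raw[head] = raw.get(head, 0.0) + weight
    z = sum(math.exp(v) for v in raw.values()) or 1.0
    return {head: math.exp(v) / z for head, v in raw.items()}

probs = score_labels({"has_bed", "has_cabin"}, rules)
```

In this toy run only the first rule fires, so all probability mass goes to its head; with ambiguous observations the weights would split the mass instead.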