Learning Visually Grounded Domain Ontologies via Embodied Conversation and Explanation

Authors: Jonghyuk Park, Alex Lascarides, Subramanian Ramamoorthy

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that teacher-learner pairs utilizing explanations and corrections are more data-efficient than those without such a faculty. The paper states: "Our experiments demonstrate that strategies exploiting agent explanations in this way accomplish significantly better performance after the same number of training examples, compared to baseline strategies that do not, especially when the learner's initial model for recognizing object parts is deficient."
Researcher Affiliation | Academia | Jonghyuk Park, Alex Lascarides, Subramanian Ramamoorthy; School of Informatics, The University of Edinburgh.
Pseudocode | No | The paper describes the architecture and processes in detail, including how information is translated into logic programs and factor graphs, but it does not present any structured pseudocode or algorithm blocks with formal steps.
Open Source Code | Yes | Code: https://github.com/jpstyle/ns-arch-unity
Open Datasets | No | The paper states: "The classification target concepts in our experiments are fine-grained types of toy truck as illustrated in Fig. 1a." and "We instantiate the framework in an example testbed scenario where a learner agent must acquire and distinguish among a set of novel visual concepts, namely a range of fine-grained types of toy vehicle." It refers to 3D models and a simulated domain, but no explicit dataset name, link, or public-availability information is provided for the toy truck data.
Dataset Splits | No | The paper only specifies: "Our evaluation metric is cumulative regret; i.e., the accumulated number of mistakes made across a series of 120 interaction episodes, where subtypes and visual features are randomly sampled for each training instance." No explicit train/validation/test split is described.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the conceptual modules of the neurosymbolic architecture (vision processing, language processing, long-term memory, symbolic reasoning) and mentions using probabilistic graphical models and normal logic programs, but it does not specify any particular software libraries, frameworks, or version numbers.
Experiment Setup | Yes | "We fix Ud = Ua = 0.99 in our experiments. We conduct a suite of experiments to assess the data-efficiency of the three different interaction strategies during FGVC training: Vis-Only, Vis+Genr and Vis+Genr+Expl. We run experiments with three initial part recognition accuracies: LQ/MQ/HQ (low-/medium-/high-quality). Our evaluation metric is cumulative regret; i.e., the accumulated number of mistakes made across a series of 120 interaction episodes, where subtypes and visual features are randomly sampled for each training instance."
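The cumulative-regret metric quoted in the Experiment Setup row is a simple running count of mistakes across episodes. As a minimal sketch of how that metric is computed — the 0/1 episode outcomes below are invented placeholders, not results from the paper:

```python
# Sketch of the cumulative-regret metric: regret after episode t is the
# accumulated number of mistaken classifications over episodes 1..t.
# The outcome values here are illustrative placeholders only.

def cumulative_regret(mistakes):
    """Given a per-episode 0/1 mistake indicator, return the running total curve."""
    total, curve = 0, []
    for m in mistakes:
        total += m
        curve.append(total)
    return curve

# Example: 10 illustrative episodes, 6 of which end in a mistake.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
curve = cumulative_regret(outcomes)
print(curve[-1])  # final cumulative regret for this toy run
```

A data-efficient learner shows a curve that flattens early: fewer accumulated mistakes after the same number of training episodes.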
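The Software Dependencies row notes that the architecture combines normal logic programs with probabilistic graphical models. As a hypothetical sketch of that general idea — the rule heads, part names, and weights below are invented for illustration and are not taken from the paper — weighted rules whose bodies match the observed parts can contribute factor weights to a label's score:

```python
# Illustrative sketch (not the paper's implementation): score candidate labels
# with weighted logic-program-style rules, factor-graph fashion. Each rule whose
# body is satisfied by the observed parts adds its weight to the head's score.
import math

# Hypothetical weighted rules: (head concept, body parts, weight).
rules = [
    ("dumper_truck", {"has_bed", "has_cabin"}, 2.0),
    ("fire_truck",   {"has_ladder", "has_cabin"}, 2.5),
]

def score_labels(observed_parts, rules):
    """Sum weights of satisfied rules per head, then normalize softmax-style."""
    raw = {}
    for head, body, weight in rules:
        if body <= observed_parts:  # body fully satisfied by observations
            raw[head] = raw.get(head, 0.0) + weight
    z = sum(math.exp(v) for v in raw.values()) or 1.0
    return {head: math.exp(v) / z for head, v in raw.items()}

probs = score_labels({"has_bed", "has_cabin"}, rules)
```

In this toy run only the first rule fires, so all probability mass goes to its head; with ambiguous observations the weights would split the mass instead.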