MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Authors: Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first conduct experiments on our Ego Memoria benchmark, primarily comparing three models: LLaVA-OV (Li et al., 2024a), its fine-tuned version using our MM-Ego SFT data mixture (referred to as Ego SFT), and our MM-Ego model, which incorporates the proposed Memory Pointer Prompting mentioned in Section 2.2.2. We show the Ego Memoria accuracy in the first row of Table 3. |
| Researcher Affiliation | Collaboration | Hanrong Ye1, Haotian Zhang2, Erik Daxberger2, Lin Chen2, Zongyu Lin3, Yanghao Li2, Bowen Zhang2, Haoxuan You2, Dan Xu1, Zhe Gan2, Jiasen Lu2, Yinfei Yang2 — 1CSE, HKUST; 2Apple; 3UCLA |
| Pseudocode | No | The paper describes methods in prose and with diagrams (e.g., Figure 4 describes the Memory Pointer Prompting mechanism in two steps: global glimpse and fallback), but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like formatting for its procedures. |
| Open Source Code | No | The REPRODUCIBILITY STATEMENT says: 'We provide a detailed explanation of the data synthesis process in our data engine in Section 2.1. We also elaborate on our model design in Section 2.2.2. Additionally, we outline the implementation details, including the training hyperparameters in Section 3.2.' This describes reproducibility details but does not state that source code is released or provide a link to it. |
| Open Datasets | Yes | First, as there is a lack of QA data for egocentric video understanding, we automatically generate 7M high-quality QA samples for egocentric videos ranging from 30 seconds to one hour long in Ego4D (Grauman et al., 2022) based on human-annotated data. |
| Dataset Splits | Yes | We partition the dataset into training and testing sets according to the official Ego4D episodic memory task. ... We divide the videos into seven different length ranges: 0.5 to 1 min, 1 to 2 min, 2 to 4 min, 4 to 10 min, 10 to 20 min, 20 to 40 min, and 40 to 60 min. We aim to balance the number of samples in different video lengths. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or cloud resources) used for running its experiments. |
| Software Dependencies | No | The paper mentions specific models like Qwen2-7B, LLaVA-OV 7B, SigLIP-so400M ViT, and GPT-4o, but does not provide specific version numbers for underlying software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used for implementation. |
| Experiment Setup | Yes | The model is trained for 1 epoch with a base learning rate of 1e-5, using a cosine scheduler. The batch size is set to 128. We sample a maximum of 300 frames (N = 300) and select 32 visual embeddings in the proposed memory pointer prompting mechanism. By default, we set the explore-exploit balancing parameter α to 0.1. Greedy decoding is used in generation. |
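The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary only: the paper releases no code, so every field name below is hypothetical, and only the values come from the quoted text.

```python
# Hypothetical training configuration mirroring the setup quoted above.
# Field names are illustrative; the paper does not release code.
train_config = {
    "epochs": 1,
    "base_learning_rate": 1e-5,
    "lr_scheduler": "cosine",
    "batch_size": 128,
    "max_sampled_frames": 300,          # N = 300 frames sampled per video
    "selected_visual_embeddings": 32,   # chosen by memory pointer prompting
    "explore_exploit_alpha": 0.1,       # balancing parameter alpha
    "decoding": "greedy",               # greedy decoding at generation time
}

# Sanity checks that the sketch matches the reported values.
assert train_config["base_learning_rate"] == 1e-5
assert train_config["max_sampled_frames"] / train_config["selected_visual_embeddings"] == 9.375
```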