3D-SPATIAL MULTIMODAL MEMORY
Authors: Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate M3, we conduct comprehensive quantitative evaluations of feature similarity and downstream tasks, as well as qualitative visualizations to highlight the pixel trace of Gaussian memory attention. Our approach encompasses a diverse range of foundation models, including vision-language models (VLMs), perception models, and large multimodal and language models (LMMs/LLMs). Furthermore, to demonstrate real-world applicability, we deploy M3's feature field in indoor scenes on a quadruped robot. We report the main quantitative results in Tab. 1, where the average training time and the auxiliary low-level metrics are reported. The downstream evaluation results of grounding and retrieval are shown in Tab. 2. Tab. 3 shows the ablation of the number of foundation models involved in M3. |
| Researcher Affiliation | Collaboration | Xueyan Zou1, Yuchen Song1, Ri-Zhao Qiu1, Xuanbin Peng1, Jianglong Ye1, Sifei Liu2, Xiaolong Wang1,2 (1UC San Diego, 2NVIDIA) |
| Pseudocode | Yes | Algorithm 1 Raw Feature (R) Similarity Reduction Algorithm |
| Open Source Code | No | The paper provides a project website link (https://m3-spatial-memory.github.io) which is a high-level overview page, not a direct link to a code repository. There is no explicit statement confirming the release of the code for the methodology described. |
| Open Datasets | Yes | To support extensive quantitative and qualitative evaluation, we perform experiments using several existing scene datasets [3; 18; 10] and collected a custom robot dataset (M3-Robot) using a quadruped robot and a drone. Specifically, we use Garden (an outdoor scene) from Mip-NeRF360 [3], Train from the Tanks & Temples dataset [18], and Playroom as well as Dr Johnson from the Deep Blending dataset [10]. |
| Dataset Splits | No | The paper states: "We evaluate all the images in the validation sets of the three datasets." However, it does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits) for reproducibility. |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, memory specifications, or detailed computer configurations) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library versions like PyTorch 1.9, specific compilers, or operating system versions) are mentioned in the paper. |
| Experiment Setup | Yes | For fair comparisons, we train all the methods for approximately 30,000 iterations (29,993 iterations for M3 due to last-batch data loader roundoffs). ... To compensate, we use a point-based loss, where we sample 2,000 points from both the predicted and ground-truth features for distance loss computation. In Tab. 4, we ablate the computation budget for training M3, balancing memory footprint, training iterations, and performance. |
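
The point-based loss quoted in the Experiment Setup row can be sketched as follows. This is a hypothetical reconstruction, not the authors' released code: the function name, the L1 distance, and the assumption that predicted and ground-truth feature maps share the same `(H, W, C)` layout and sample indices are all illustrative choices.

```python
import numpy as np

def point_sampled_feature_loss(pred, gt, num_points=2000, rng=None):
    """Hedged sketch of a point-based feature distance loss: sample the
    same random pixel locations from the predicted and ground-truth
    feature maps and compare them with a mean L1 distance.

    pred, gt: (H, W, C) feature maps for one rendered view.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = pred.shape
    idx = rng.integers(0, h * w, size=num_points)   # shared sample indices
    pred_pts = pred.reshape(h * w, c)[idx]          # (num_points, C)
    gt_pts = gt.reshape(h * w, c)[idx]
    return float(np.abs(pred_pts - gt_pts).mean())  # mean per-point L1
```

Sampling a fixed number of points rather than comparing full feature maps keeps the per-iteration cost independent of image resolution, which is consistent with the paper's stated concern about the training compute budget.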