Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval
Authors: Wenrui Li, Wei Han, Yandu Chen, Yeyu Chai, Yidan Lu, Xingtao Wang, Xiaopeng Fan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments We conducted comparative experiments on the T3DR-HIT dataset, utilizing different text and point cloud feature extractors while keeping the retrieval framework unchanged. The experimental results demonstrated the superior retrieval performance of our model. Table 1 summarizes the performance of various models on the T3DR-HIT dataset, including their respective hyperparameter configurations. |
| Researcher Affiliation | Academia | Wenrui Li¹, Wei Han¹, Yandu Chen¹, Yeyu Chai¹, Yidan Lu¹, Xingtao Wang¹²*, Xiaopeng Fan¹²³. ¹Harbin Institute of Technology; ²Harbin Institute of Technology Suzhou Research Institute; ³Peng Cheng Laboratory |
| Pseudocode | No | The paper describes the components of RMARN using mathematical formulations and textual explanations (e.g., equations for Attention, FFN, similarity calculation) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/liwrui/RMARN |
| Open Datasets | Yes | In this paper, to address the scarcity of paired text-3D data, we developed a large-scale, high-quality open-source dataset named T3DR-HIT, containing over 3,380 pairs of text and point cloud data. The dataset comprises two main parts: one part contains coarse-grained alignments between indoor 3D scenes and text, consisting of 1,380 text-3D pairs; the other part contains fine-grained alignments between Chinese cultural heritage scenes and text, with over 2,000 text-3D pairs. The release of the T3DR-HIT dataset provides robust support for multi-scale text-3D retrieval tasks. ... Building on the open-source Stanford 2D-3D-Semantics Dataset, we developed the Indoor Text Point Pairs dataset... |
| Dataset Splits | No | The paper describes the composition of the T3DR-HIT dataset, including its division into 'coarse-grained Indoor 3D Scenes' and 'fine-grained Chinese Artifact Scenes', and the total number of pairs. However, it does not provide specific details on how these datasets are split into training, validation, and test sets for experimentation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. |
| Software Dependencies | No | The paper mentions several software components, models, or functions such as CLIP text encoder, Point Net, Adam optimizer (with beta values), GELU activation function (with parameters), dropout rate, Open3D, and LLaVA (v1.6-mistral-7b-hf). While some parameters are provided for the optimizer and activation, and a specific LLaVA model version is named, the paper does not list multiple key software libraries or frameworks with their specific version numbers that are critical for reproducing the RMARN model's implementation. |
| Experiment Setup | Yes | We trained the model for 100 epochs, utilizing the Adam optimizer, which is well-regarded for its ability to adapt learning rates during training. The learning rate was set to 0.008, providing a balance between making steady progress and avoiding potential overshooting of minima. The β values for the Adam optimizer were configured as (0.91, 0.9993). ... For the activation function, we utilized GELU (Gaussian Error Linear Unit) with the parameters 0.5 and 0.044715... A dropout rate of 0.1 is applied... Both the Attention layer and the Feed-Forward Network (FFN) in the self-attention encoder are configured with a dimensionality of 512. ... Table 1 also shows hyperparameters for the best-performing model: Low Rank (256), Epochs (100), Batch Size (64), Nhead (32), SA Layer (8). |
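The reported setup can be summarized in a short sketch. The constants 0.5 and 0.044715 quoted for GELU match the standard tanh approximation of GELU, so that formula is shown here as an assumption; the dictionary key names are illustrative, not identifiers from the released RMARN code at https://github.com/liwrui/RMARN.

```python
import math


def gelu_tanh_approx(x: float) -> float:
    """Tanh approximation of GELU, whose constants 0.5 and 0.044715
    match the parameters quoted in the paper's experiment setup:
    0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


# Training hyperparameters as reported in the table above.
# Key names are illustrative assumptions, not from the released code.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "betas": (0.91, 0.9993),   # Adam beta values from the paper
    "learning_rate": 0.008,
    "dropout": 0.1,
    "hidden_dim": 512,         # Attention and FFN dimensionality
    "low_rank": 256,
    "epochs": 100,
    "batch_size": 64,
    "nhead": 32,
    "sa_layers": 8,
}
```

As a sanity check, this approximation satisfies gelu(0) = 0 and approaches the identity for large positive inputs, consistent with GELU's usual behavior.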