Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation

Authors: Yifei Su, Dong An, Kehan Chen, Weichen Yu, Baiyang Ning, Yonggen Ling, Yan Huang, Liang Wang

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our explicit entity-landmark alignment learning is beneficial for AVDN. As a result, FELA achieves leading performance with 3.2% SR and 4.9% GP improvements over prior arts. The proposed method is evaluated on the ANDH task (Fan et al. 2023a), which is the only available benchmark for Aerial Vision-Dialog Navigation. The ANDH task splits the AVDN dataset into 6269 sub-trajectories according to dialog rounds. These sub-trajectories are further divided into 4 splits via their scene types, including 4591 for training, 370 for seen validation, 411 for unseen validation, and others for unseen testing. Evaluation Metrics. We use the standard metrics for evaluation (Fan et al. 2023a), including: 1) Success Rate (SR): the ratio of predicted paths being regarded as successful; 2) Success weighted by inverse Path Length (SPL): SR weighted by the total length of the navigation path; 3) Goal Progress (GP): the distance of the navigation progress towards the destination area.
Researcher Affiliation | Collaboration | Yifei Su1,2, Dong An3, Kehan Chen1,2, Weichen Yu4, Baiyang Ning1,2, Yonggen Ling5, Yan Huang1,2, Liang Wang1,2; 1School of Artificial Intelligence, University of Chinese Academy of Sciences; 2MAIS, Institute of Automation of Chinese Academy of Sciences; 3Mohamed bin Zayed University of Artificial Intelligence; 4Electrical and Computer Engineering Department, Carnegie Mellon University; 5Robotics X, Tencent, Shenzhen, China
Pseudocode | No | The paper does not contain an explicit pseudocode block or algorithm section; it describes the methods in paragraph form and through equations.
Open Source Code | Yes | Code: https://github.com/yifeisu/FELA
Open Datasets | Yes | Aerial Vision-Dialog Navigation (AVDN) is a new task... Fan et al. (Fan et al. 2023a) propose a challenging ANDH task... The proposed method is evaluated on the ANDH task (Fan et al. 2023a), which is the only available benchmark for Aerial Vision-Dialog Navigation. The ANDH task splits the AVDN dataset into 6269 sub-trajectories according to dialog rounds.
Dataset Splits | Yes | The ANDH task splits the AVDN dataset into 6269 sub-trajectories according to dialog rounds. These sub-trajectories are further divided into 4 splits via their scene types, including 4591 for training, 370 for seen validation, 411 for unseen validation, and others for unseen testing.
Hardware Specification | Yes | Our experiments are conducted on two NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions software components like YOLOv5-x, RoBERTa, and Swin-Tiny backbones, and the AdamW optimizer, but does not provide specific version numbers for these or other ancillary software (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | All models are optimized for 200,000 iterations (~50 hours) with a batch size of 8 and a learning rate of 1e-5 via the AdamW optimizer. The hidden size of dialog encoding, history encoding, and semantic grid representation D is uniformly set to 768. The number of transformer layers for the text encoder and episodic transformer is set to 9 and 3, respectively. For weight coefficients, we set κ1, κ2 in Formula 10 to 1, 0.1, respectively. The τ in Formula 8 is set to 0.02 following (Jiang and Ye 2023).
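The SR/SPL/GP metrics named in the Research Type row can be sketched in a few lines. This is a minimal illustration of the standard navigation-metric definitions, not the paper's evaluation code; the episode field names (`path_length`, `shortest_length`, `final_dist`, `start_dist`) and the success radius are our own assumptions.

```python
def navigation_metrics(episodes, success_radius=5.0):
    """Average SR, SPL, and GP over a list of episodes.

    Each episode is a dict with hypothetical fields (in meters):
      path_length     - length of the agent's executed path
      shortest_length - geodesic length of the ground-truth path
      final_dist      - final distance to the destination area
      start_dist      - initial distance to the destination area
    """
    sr = spl = gp = 0.0
    for ep in episodes:
        # SR: an episode succeeds if it ends within the success radius.
        success = 1.0 if ep["final_dist"] <= success_radius else 0.0
        sr += success
        # SPL: success weighted by shortest length over the longer of
        # the executed and shortest path lengths.
        spl += success * ep["shortest_length"] / max(
            ep["path_length"], ep["shortest_length"]
        )
        # GP: distance progressed toward the destination.
        gp += ep["start_dist"] - ep["final_dist"]
    n = len(episodes)
    return sr / n, spl / n, gp / n
```

A longer executed path thus drags SPL below SR even when the episode still counts as a success.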
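The hyperparameters listed in the Experiment Setup row can be collected into a single config fragment, which makes a re-implementation attempt easier to check against the paper. The key names and the loss-combination helper below are our own illustration; only the values come from the paper, and the loss term names are hypothetical.

```python
# Hyperparameters transcribed from the reported experiment setup
# (key names are ours, not the paper's).
CONFIG = {
    "iterations": 200_000,
    "batch_size": 8,
    "learning_rate": 1e-5,
    "optimizer": "AdamW",
    "hidden_size": 768,              # D: dialog/history/semantic-grid size
    "text_encoder_layers": 9,
    "episodic_transformer_layers": 3,
    "kappa1": 1.0,                   # weight coefficient in Formula 10
    "kappa2": 0.1,                   # weight coefficient in Formula 10
    "tau": 0.02,                     # temperature in Formula 8
}


def total_loss(l_main, l_aux, cfg=CONFIG):
    """Weighted sum in the style of Formula 10 (term names hypothetical)."""
    return cfg["kappa1"] * l_main + cfg["kappa2"] * l_aux
```

With κ2 = 0.1, the auxiliary alignment objective contributes at a tenth of the weight of the main term.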