Multimodal Knowledge Retrieval-Augmented Iterative Alignment for Satellite Commonsense Conversation

Authors: Qian Li, Xuchen Li, Zongyu Chang, Yuzheng Zhang, Cheng Ji, Shangguang Wang

IJCAI 2025

Reproducibility Assessment (Variable: Result. LLM Response)
Research Type: Experimental. Experimental results demonstrate that Sat-RIA outperforms existing large language models and provides more comprehensible answers with fewer hallucinations. The paper includes a dedicated section '5 Experiments' with subsections '5.1 Evaluation Datasets', '5.2 Evaluation Metrics', '5.3 Comparison Methods', and '5.5 Main Results', the last of which features a performance comparison table.
Researcher Affiliation: Academia. 1) School of Computer Science, Beijing University of Posts and Telecommunications, China; 2) Institute of Automation, Chinese Academy of Sciences and Zhongguancun Academy, China; 3) SKLCCSE, School of Computer Science and Engineering, Beihang University, China.
Pseudocode: No. The paper describes its methods through text and mathematical formulas (Equations 1-7) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper contains no explicit statement about releasing the source code for the methodology, nor does it provide a link to a code repository.
Open Datasets: No. 'To evaluate our models on satellite commonsense conversation, we construct two datasets: one for satellite multi-turn dialogues (Sat Diag) and one for satellite visual question-answering (Sat VQA) (more details in Appendix C).' The paper does not provide concrete access information (link, DOI, repository, or external citation) for these constructed datasets in the main body.
Dataset Splits: No. The paper describes the size and content of the constructed datasets (e.g., 'The Sat Diag dataset includes 2,000 dialogues'; 'The Sat VQA dataset consists of 2,000 labeled examples') but does not specify training, validation, or test splits, and mentions neither cross-validation nor any other splitting methodology.
Hardware Specification: Yes. 'We have trained our model through the method of full parameter fine-tuning, using a 2x A800 80G machine'; all experiments were conducted on the same machine.
Software Dependencies: No. The paper mentions the PyTorch framework and specific models such as InternVL2-8B and LLaMA3-8B, but it does not provide version numbers for PyTorch or any other ancillary software libraries or tools.
Experiment Setup: Yes. A total batch size of 1 is used throughout the training process. The AdamW optimizer [Loshchilov and Hutter, 2019] is applied with cosine learning-rate decay and a warm-up period. In the training stage, each alignment epoch count is 1, with a learning rate of 1e-5 and a warmup ratio of 0.05.
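The quoted schedule (AdamW with cosine learning-rate decay, a warm-up ratio of 0.05, and a peak learning rate of 1e-5) can be sketched as a plain-Python learning-rate function. This is a minimal illustration of the standard warmup-plus-cosine pattern, not the paper's code; the function name and step counts are illustrative assumptions.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=1e-5, warmup_ratio=0.05):
    """Learning rate at a given optimizer step: linear warm-up to base_lr
    over the first warmup_ratio fraction of steps, then cosine decay
    toward zero over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warm-up from near zero up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps (progress goes 0 -> 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak LR is reached at the end of warm-up, then decays smoothly.
peak = cosine_lr_with_warmup(49, 1000)   # last warm-up step (warmup_steps = 50)
tail = cosine_lr_with_warmup(999, 1000)  # near the end of training, close to 0
```

In a PyTorch training loop this corresponds to pairing `torch.optim.AdamW` with a scheduler implementing the same curve, e.g. `torch.optim.lr_scheduler.LambdaLR` wrapping a function like the one above, or a ready-made warmup-cosine scheduler from a training library.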