Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Authors: Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Manvendra Kumar Nema, Raj Jaiswal, Rajiv Ratn Shah

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments result in notable performance enhancements: Wizard Math 7B exceeds Gemini's accuracy on English datasets by +6% and matches Gemini's performance on Hindi datasets.
Researcher Affiliation | Academia | Indraprastha Institute of Information Technology, Delhi
Pseudocode | No | The paper describes methodologies such as the Decomposition Strategy and the Structured Solution Approach with Curriculum Learning in text and through figures, but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and Dataset: https://github.com/midasresearch/Multilingual-Mathematical-Reasoning.git
Open Datasets | Yes | Our evaluations on the GSM8K (Cobbe et al. 2021) and MATH (Hendrycks et al. 2021) datasets reveal a stark contrast in their capabilities. ... (Sharma, Mishra, and Sharma 2022) released HAWP (Hindi Arithmetic Word Problems), which is the only publicly available dataset of Hindi mathematical questions. ... Code and Dataset: https://github.com/midasresearch/Multilingual-Mathematical-Reasoning.git
Dataset Splits | Yes | These refined solutions, with a 70%/30% training/testing split, were then used to fine-tune the models OpenHathi 7B, WizardMath-v1.1 7B, and LLeMMa 7B. ... Each dataset was divided into 70% for training and 30% for testing, ensuring this split was consistently applied across all problem categories: easy, medium, and hard.
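The per-category 70/30 split described above can be sketched as a stratified split. This is a minimal illustration, not the authors' code; the `difficulty` field name and the fixed seed are assumptions.

```python
import random

def stratified_split(problems, train_frac=0.7, seed=0):
    """Split problems 70/30 within each difficulty category (easy,
    medium, hard), so the ratio is consistent across categories."""
    rng = random.Random(seed)
    by_category = {}
    for p in problems:
        by_category.setdefault(p["difficulty"], []).append(p)
    train, test = [], []
    for items in by_category.values():
        rng.shuffle(items)                      # shuffle within the category
        cut = int(len(items) * train_frac)      # 70% boundary per category
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test
```

Splitting the pooled dataset instead of each category would not guarantee the paper's stated property that the 70/30 ratio holds for easy, medium, and hard problems alike.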
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using GPT-4 and LLAMA 3 (405B) for generating and translating data, but does not provide specific version numbers for these or other ancillary software components (e.g., programming languages, libraries, frameworks) required for reproducibility.
Experiment Setup | No | The paper describes the general methodology, including zero-shot, few-shot chain-of-thought, supervised fine-tuning, curriculum learning, and instruction-tuning. However, it does not provide concrete hyperparameter values such as learning rates, batch sizes, number of epochs, or optimizer settings used during training.
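To make the reproducibility gap concrete, the fragment below lists the fine-tuning settings a reproduction would need the paper to report. Every value here is hypothetical and invented for illustration; none appears in the paper.

```python
# Hypothetical fine-tuning configuration. The keys show what a full
# experiment-setup section would need to specify; the values are
# placeholders, NOT taken from the paper.
finetune_config = {
    "model": "WizardMath-7B-v1.1",  # one of the three fine-tuned models
    "learning_rate": 2e-5,          # hypothetical
    "batch_size": 16,               # hypothetical
    "num_epochs": 3,                # hypothetical
    "optimizer": "adamw",           # hypothetical
    "lr_scheduler": "cosine",       # hypothetical
    "max_seq_length": 2048,         # hypothetical
    "seed": 42,                     # hypothetical
}
```

Without such values being stated, two reproductions of the same paper can diverge substantially in accuracy even on identical data.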