Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English
Authors: Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Manvendra Kumar Nema, Raj Jaiswal, Rajiv Ratn Shah
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments result in notable performance enhancements. Wizard Math 7B exceeds Gemini's accuracy on English datasets by +6% and matches Gemini's performance on Hindi datasets. |
| Researcher Affiliation | Academia | Indraprastha Institute of Information Technology, Delhi |
| Pseudocode | No | The paper describes methodologies such as the Decomposition Strategy and Structured Solution Approach with Curriculum Learning in text and through figures, but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and Dataset https://github.com/midasresearch/Multilingual-Mathematical-Reasoning.git |
| Open Datasets | Yes | Our evaluations on the GSM8K (Cobbe et al. 2021) and MATH (Hendrycks et al. 2021) datasets reveal a stark contrast in their capabilities. ... (Sharma, Mishra, and Sharma 2022) released HAWP (Hindi Arithmetic Word Problems), which is the only publicly available dataset of Hindi mathematical questions. ... Code and Dataset https://github.com/midasresearch/Multilingual-Mathematical-Reasoning.git |
| Dataset Splits | Yes | These refined solutions, with a 70%/30% training/testing split, were then used to fine-tune the models OpenHathi 7B, Wizard Math-v1.1 7B, and LLeMMa 7B. ... Each dataset was divided into 70% for training and 30% for testing, ensuring this split was consistently applied across all problem categories: easy, medium, and hard. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using GPT-4 and LLAMA 3 (405B) for generating and translating data, but does not provide specific version numbers for these or other ancillary software components (e.g., programming languages, libraries, frameworks) required for reproducibility. |
| Experiment Setup | No | The paper describes the general methodology, including zero-shot, few-shot chain-of-thought, supervised fine-tuning, curriculum learning, and instruction-tuning. However, it does not provide concrete hyperparameter values such as learning rates, batch sizes, number of epochs, or optimizer settings used during training. |
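The per-category 70%/30% split described under "Dataset Splits" can be reproduced with a simple stratified split. The sketch below is illustrative only: the paper does not publish its splitting code, and the `difficulty` field name and record layout are assumptions, not the authors' actual schema.

```python
import random

def stratified_split(problems, train_frac=0.70, seed=0):
    """Split problems 70/30 within each difficulty category
    ('easy', 'medium', 'hard'), as the paper describes.
    NOTE: field names are hypothetical, not from the paper's code."""
    rng = random.Random(seed)
    train, test = [], []
    by_category = {}
    for p in problems:
        by_category.setdefault(p["difficulty"], []).append(p)
    for items in by_category.values():
        rng.shuffle(items)          # shuffle within the category
        cut = round(len(items) * train_frac)
        train.extend(items[:cut])   # 70% of each category to training
        test.extend(items[cut:])    # remaining 30% to testing
    return train, test

# Toy example: 10 problems per category -> 7 train / 3 test from each
data = [{"difficulty": d, "id": i}
        for d in ("easy", "medium", "hard") for i in range(10)]
train, test = stratified_split(data)
print(len(train), len(test))  # 21 9
```

Splitting inside each category (rather than over the pooled dataset) is what keeps the easy/medium/hard proportions identical across the train and test sets, which the paper states was applied consistently.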