RMath: A Logic Reasoning-Focused Dataset Toward Mathematical Multistep Reasoning Tasks

Authors: Ziyi Hu, Jun Liu, Zhongzhi Liu, Yuzhong Liu, Zheng Xie, Yiping Song

AAAI 2025

Reproducibility Variable — Result — Evidence from LLM Response
Research Type — Experimental. Evidence: "Finally, we evaluate RMath on several popular LLMs and present the corresponding results." From the Experiments section (LLMs on RMath): "Experimental setting. We use RMath to train and test a range of large models with parameter sizes ranging from 7 billion, 8 billion, and 13 billion to 70 billion. ... Results. Table 2 shows the performance of various LLMs on our dataset RMath and other datasets related to mathematical problems, which assess the abilities of LLMs from different perspectives."
Researcher Affiliation — Academia. Evidence: Ziyi Hu1, Jun Liu2, Zhongzhi Liu1, Yuzhong Liu1, Zheng Xie1, Yiping Song1* — 1National University of Defense Technology, Changsha, China; 2Sun Yat-sen University, Zhuhai, China
Pseudocode — Yes. Evidence: "The flow chart of proposition connection and judgment is shown in Figure 3, and the process is divided into nine steps. Initially, input the three types of propositions (step 1). Then, starting with the propositions in Class C, assume their truth values and connect them with the propositions in Class A to check for contradictions. If there is a contradiction, revise the assumption for the Class C propositions; if not, based on this assumption and according to the problem's requirements on the number of true or false propositions, hypothesize the truth values of all Class B propositions one by one. Check whether the hypotheses for the Class B propositions contradict one another, then connect them with the propositions in Class A (Loop-A-Contradict-B: steps 4-6) and Class C and check for contradictions. If there is a contradiction, check whether all hypothesis combinations of truth values for the Class B propositions have been cycled; if not, re-assume the Class B propositions; otherwise, re-assume the Class C propositions (Loop-C-Tra-B: steps 4-8). Finally, output the correct answer: the proposition in Class C that is true and consistent with the problem's requirements. Step 1: Input propositions in Classes A, B, and C. Step 2: Assume a truth value for the propositions in Class C. ... Step 9: Output the true propositions in Class C as the correct answer."
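The nine-step loop quoted above amounts to a search over truth assignments with a consistency check. The sketch below is a hypothetical illustration of that search, not the authors' implementation: propositions are modeled as named booleans, and the `consistent` callback stands in for the paper's contradiction checks against Class A and Class C.

```python
from itertools import product

def solve(class_b_names, consistent, required_true):
    """Brute-force version of the proposition-connection loop:
    try each truth assignment for Class C (step 2) and every
    combination for Class B (steps 4-8), keeping assignments that
    pass the contradiction check and match the required count of
    true propositions (step 9 outputs the survivors)."""
    solutions = []
    for c_value in (True, False):  # step 2: assume Class C's truth value
        # steps 4-8: cycle through all Class B hypothesis combinations
        for b_values in product((True, False), repeat=len(class_b_names)):
            assignment = dict(zip(class_b_names, b_values), C=c_value)
            # steps 3-7: connect with Class A / C and check for contradictions
            if not consistent(assignment):
                continue
            # problem requirement on the number of true propositions
            if sum(assignment.values()) == required_true:
                solutions.append(assignment)
    return solutions  # step 9: output the consistent assignments
```

For example, with two hypothetical Class B propositions B1 = "C is true" and B2 = "B1 is false", `solve(["B1", "B2"], lambda a: a["B1"] == a["C"] and a["B2"] != a["B1"], 2)` returns the single assignment in which C and B1 are true and B2 is false.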
Open Source Code — Yes. Evidence: "Our dataset and code are available at: https://github.com/huziyi19/RMath"
Open Datasets — Yes. Evidence: "In this paper, we construct RMath, a dataset specifically for multistep reasoning tasks..." and "Our dataset and code are available at: https://github.com/huziyi19/RMath"
Dataset Splits — No. The paper mentions creating a training set, "RMath-train", from RMath for prompt tuning, but it does not specify the percentages, sample counts, or methodology for splitting RMath into training, validation, or test sets, either for its evaluations or for the construction of RMath-train.
Hardware Specification — No. The paper evaluates LLMs with parameter sizes from 7 billion to 70 billion but does not describe the hardware (e.g., GPU models, CPU types, memory) used for these evaluations or for training.
Software Dependencies — No. The paper lists the LLMs used (e.g., Llama2, Llama3, WizardMath, MetaMath, ToRA) and mentions prompt tuning, but it specifies no software dependencies with version numbers, such as the Python, PyTorch, or CUDA versions.
Experiment Setup — No. The paper describes the general experimental approach (prompt tuning on RMath-train) and the LLMs used, but it omits specific setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) that would be needed for reproduction.