Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems

Authors: Junyi Ye, Jingyi Gu, Xinyun Zhao, Wenpeng Yin, Guiling Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that, while LLMs perform well on standard mathematical tasks, their capacity for creative problem-solving varies considerably. Notably, the Gemini-1.5-Pro model outperformed other LLMs in generating novel solutions. This research opens a new frontier in evaluating AI creativity, shedding light on both the strengths and limitations of LLMs in fostering mathematical innovation, and setting the stage for future developments in AI-assisted mathematical discovery.
Researcher Affiliation | Academia | New Jersey Institute of Technology, Newark, USA; The Pennsylvania State University, State College, PA, USA
Pseudocode | No | The paper describes its methodology in textual form and refers to a figure (Figure 3) for illustration, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/NJIT-AI-Center/CreativeMath
Open Datasets | No | The paper introduces the CREATIVEMATH dataset and states it was sourced from Art of Problem Solving (AoPS), with a link to the AoPS wiki. However, it does not provide a direct link, DOI, or specific repository for the curated CREATIVEMATH dataset itself; the provided GitHub link is explicitly labeled as code only.
Dataset Splits | Yes | We selected a subset from our CREATIVEMATH dataset for this study. For each competition, 50 samples were randomly chosen to ensure a representative evaluation of the LLMs' performance. The datasets were meticulously curated to ensure that when the problem and all reference solutions were included in the novel-solution generation prompt, the total token count did not exceed 3K tokens. In total, the dataset comprises 400 math problems and 605 solutions, forming 605 distinct samples with k varying from 1 to 5.
Hardware Specification | Yes | Open-source LLMs were run using the Hugging Face library on one to four NVIDIA A100 (80 GB) GPUs, depending on the model's memory requirements.
Software Dependencies | No | The paper mentions using the "Hugging Face library" but does not specify version numbers for it or any other software dependency.
Experiment Setup | Yes | To ensure reproducibility, all experiments were conducted using the greedy decoding strategy, adhering to the recommended settings provided on the official Hugging Face pages or in the models' respective papers. The system prompt followed the guidelines outlined in the models' documentation, with the maximum number of new tokens set to 1024.
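The decoding setup in the row above can be sketched as a set of Hugging Face `generate` keyword arguments. This is a minimal illustration, not the paper's actual code: the helper name is hypothetical, and system prompts remain model-specific as the paper notes.

```python
def greedy_generation_kwargs(max_new_tokens: int = 1024) -> dict:
    """Keyword arguments for Hugging Face `model.generate` that reproduce
    the reported setup: greedy decoding with up to 1024 new tokens.
    Illustrative sketch; not taken from the paper's repository."""
    return {
        "do_sample": False,            # greedy: always take the argmax token
        "num_beams": 1,                # single beam, i.e. no beam search
        "max_new_tokens": max_new_tokens,
    }

# With a loaded transformers model, usage would look like (not run here):
#   outputs = model.generate(**inputs, **greedy_generation_kwargs())
```

Greedy decoding makes generations deterministic for a fixed model and prompt, which is why the authors cite it as the basis for reproducibility.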