Identifying Axiomatic Mathematical Transformation Steps using Tree-Structured Pointer Networks
Authors: Sebastian Wankerl, Jan Pfister, Andrzej Dulny, Gerhard Götz, Andreas Hotho
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark our model against various baselines and perform an ablation study to quantify the influence of our custom embeddings and the copy-pointer component. Furthermore, we test the robustness of our model on data of unseen complexity. Our results clearly show that incorporating the hierarchical structure, embeddings and copy-pointer into a single model is highly beneficial for solving the SETI task. |
| Researcher Affiliation | Academia | Sebastian Wankerl, Center for Artificial Intelligence and Data Science (CAIDAS), Julius-Maximilians-Universität Würzburg, Germany, and Baden-Württemberg Cooperative State University Mosbach, Germany; Jan Pfister, CAIDAS, Julius-Maximilians-Universität Würzburg, Germany; Andrzej Dulny, CAIDAS, Julius-Maximilians-Universität Würzburg, Germany; Gerhard Götz, Baden-Württemberg Cooperative State University Mosbach, Germany; Andreas Hotho, CAIDAS, Julius-Maximilians-Universität Würzburg, Germany |
| Pseudocode | No | The paper describes the model architecture and data generation process in detail within sections 2 and 3, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code for our experiments is available at https://github.com/LSX-UniWue/axiomatic-steps-treepointer. |
| Open Datasets | Yes | Furthermore, we create and release a new dataset, consisting of equations and axiomatic steps needed to show their equivalence. This is necessary since to the best of our knowledge no well-suited dataset for the SETI task exists in literature. Our dataset contains a wide range of mathematical functions, such as polynomials, logarithms, exponentiation, and trigonometric functions. In addition, it contains equations of varying complexity measured by the depth of the parse tree and the number of axiomatic steps needed to show the equivalence. ... The code for our experiments is available at https://github.com/LSX-UniWue/axiomatic-steps-treepointer. |
| Dataset Splits | Yes | We create about 8.5 million samples for training, 400,000 samples for validation and 14,000 samples for testing. |
| Hardware Specification | No | The paper mentions 'modern multi-core CPU' for evaluation time but does not provide specific details such as CPU model, GPU models, memory, or other hardware specifications used for training or running experiments. |
| Software Dependencies | No | The paper mentions software tools like 'Fairseq', 'Adam optimizer', and 'Optuna framework' but does not specify their version numbers. |
| Experiment Setup | Yes | We train all models using the Adam optimizer (Kingma & Ba, 2015) on batches of 16 samples for up to 100 epochs or until the loss stagnates or deteriorates over a period of 10 epochs. We ran a hyper-parameter search using the Optuna framework (Akiba et al., 2019), exploring 100 configurations each for our model and the baselines. Our search space included the number of layers in the encoder and decoder, the number of attention heads used per layer, the number of attention heads used for pointing (where applicable), the size of embeddings and the hidden representations in the encoder and decoder, the dropout rates and the learning rate of the optimizer. The exact parameters and search spaces are given in appendix C. |
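The hyper-parameter search quoted in the last row can be illustrated with a small, self-contained sketch. The paper used the Optuna framework; the snippet below instead uses plain uniform random search so it runs without dependencies, and every parameter name and value range here is an illustrative assumption, not the actual search space from appendix C.

```python
import random

# Illustrative search space mirroring the dimensions named in the paper
# (layer counts, attention heads, pointer heads, embedding/hidden sizes,
# dropout, learning rate). All concrete values are assumptions.
SEARCH_SPACE = {
    "encoder_layers": [2, 4, 6],
    "decoder_layers": [2, 4, 6],
    "attention_heads": [4, 8],
    "pointer_heads": [1, 2, 4],       # only used by pointer-based models
    "embedding_dim": [128, 256, 512],
    "hidden_dim": [256, 512, 1024],
    "dropout": [0.1, 0.2, 0.3],
    "learning_rate": [1e-4, 3e-4, 1e-3],
}


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(objective, n_trials: int = 100, seed: int = 0):
    """Evaluate n_trials sampled configurations; keep the lowest score.

    `objective` stands in for training a model with the given
    configuration and returning its validation loss.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Optuna differs from this sketch mainly in that it replaces uniform sampling with a smarter sampler (TPE by default) and can prune unpromising trials early, but the structure — 100 trials, each drawing one configuration from the declared space and scoring it on validation data — is the same.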