Inverting Cryptographic Hash Functions via Cube-and-Conquer

Authors: Oleg Zaikin

JAIR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inverse problems for MD4 by gradually modifying the Dobbertin s constraints. The second algorithm tries the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with the minimum runtime estimate of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of a given propositional Boolean formula; (ii) incomplete SAT solving of a given satisfiable propositional Boolean formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and is therefore applicable to a variety of classes of hard SAT instances. In this study, 40-, 41-, 42-, and 43-step MD4 are inverted for the first time via the first algorithm and the estimating mode of the second algorithm. Also, 28-step MD5 is inverted for four hashes via the incomplete SAT solving mode of the second algorithm. For three hashes out of them, it is done for the first time.
Researcher Affiliation Academia Oleg Zaikin EMAIL Novosibirsk State University Novosibirsk, 630090, Russia Matrosov Institute for System Dynamics and Control Theory SB RAS Irkutsk, 664033, Russia
Pseudocode Yes Algorithm 1 MD4 compression function on the first 512-bit message block. Algorithm 2 MD4 compression function on the first 512-bit message block with applied Dobbertin s constraints. Algorithm 3 Invert step-reduced MD4 via Dobbertin-like constraints. Algorithm 4 Preselect promising thresholds for the cubing phase of Cube-and-Conquer. Algorithm 5 Find a cutoff threshold with the minimum estimated runtime of the conquer phase.
Open Source Code Yes The sources are available at github3, while the sources, benchmarks, and solutions are available at Zenodo (Zaikin, 2024). 3. https://github.com/olegzaikin/En Cn C Zaikin, O. (2024). Inverting Cryptographic Hash Functions via Cube-and-Conquer Results and Source code, Dataset on Zenodo. https://doi.org/10.5281/zenodo.13766981.
Open Datasets Yes All the constructed CNFs and the corresponding TA programs are available online as a dataset (Zaikin, 2024). Zaikin, O. (2024). Inverting Cryptographic Hash Functions via Cube-and-Conquer Results and Source code, Dataset on Zenodo. https://doi.org/10.5281/zenodo.13766981.
Dataset Splits No The paper focuses on inverting cryptographic hash functions (MD4 and MD5) by solving SAT problems. It does not involve traditional machine learning datasets that require train/test/validation splits. Instead, it generates CNFs for specific hash inversion problems. Therefore, the concept of dataset splits as commonly understood in ML contexts is not applicable.
Hardware Specification Yes All experiments were executed on a personal computer equipped with the 12-core CPU AMD 3900X. The implementations are multithreaded, so all 12 CPU cores were employed.
Software Dependencies Yes Kissat CDCL solver of version sc2021 (Biere, Fleury, & Heisinger, 2021a) because this solver and its modifications won the SAT Competition in 2020 2023. The CDCL solver Ca Di Ca L of version 1.5.0 (Froleyks & Biere, 2021) was used to simplify the CNFs.
Experiment Setup Yes In case of Algorithm 5, values of a cutoff threshold and then subproblems from samples are processed in parallel. In case of the conquer phase, subproblems are processed in parallel. The inputs of Algorithm 3 in case of 40-step MD4 were discussed in Section 5. As for Algorithm 5, the inputs are as follows: March cu lookahead solver (Heule et al., 2011) since it has been recently successfully applied to several hard problems (Heule et al., 2016; Heule, 2018b). nstep = 5. It was chosen in preliminary experiments. If this parameter is equal to 1, then a better threshold usually can be found, but at the same time Algorithm 5 becomes quite time consuming. On the other hand, if nstep is quite large, e.g., 50, then as a rule almost all most promising thresholds are skipped. maxc = 2 000 000. On the considered CNFs, March cu reaches 2 000 000 cubes in about 30 minutes, so this value of maxc looks reasonable. Higher values were also tried, but it did not give any improvement. minr = 1 000. If it is less then 1 000, then subproblems are too hard because they are not simplified enough by lookahead. At the same time, higher value of this parameter did not allow collecting enough amount of promising thresholds. N = 1 000. First N = 100 was tried, but it led to too optimistic estimates which were several times lower than real solving time. On the other hand, N = 10 000 is too time consuming and gives just modest improvement in accuracy compared to N = 1 000. The accuracy of obtained estimates is discussed later in Subsection 7.2. Kissat CDCL solver of version sc2021 (Biere, Fleury, & Heisinger, 2021a) because this solver and its modifications won the SAT Competition in 2020 2023. maxst = 5 000 seconds. It is a standard time limit in SAT Competitions (Balyo, Froleyks, Heule, Iser, J arvisalo, & Suda, 2021), so modern CDCL solvers are designed to show all their power within this time. cores = 12. mode = estimating.