Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models
Authors: Xiongye Xiao, Heng Ping, Chenyu Zhou, Defu Cao, Yaxing Li, Yi-Zhuo Zhou, Shixuan Li, Nikos Kanakaris, Paul Bogdan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed method yields a comprehensive measure of the network's evolving heterogeneity and organization, offering theoretical foundations and a new perspective for investigating emergent abilities in large models. In this section, we describe the experiments conducted to analyze the NINs to observe the structural dynamics process during training and evaluate our proposed metrics. |
| Researcher Affiliation | Academia | 1University of Southern California, CA, USA 2University of California, Riverside, CA, USA |
| Pseudocode | Yes | Algorithm 1 Multiple Rounds of Weight Adjustment, Algorithm 2 Calculation of the Partition Function Z(q), Algorithm 3 Calculation of the Mass Exponent τ(q) in Eq. 6, Algorithm 4 Calculation of the Multifractal Spectrum f(α) in Eq. 9, Algorithm 5 Calculation of the Precise Shortest-Path in Neural Networks, Algorithm 6 Calculation of the Estimated Shortest-Path in Neural Networks |
| Open Source Code | Yes | B.1 CODE AVAILABILITY The source code is available at https://github.com/joshuaxiao98/Neuron_LLM. |
| Open Datasets | Yes | Our experiments are conducted on Pythia models from 14M to 2.8B (Biderman et al., 2023) (except for specific requirements like different architectures). Pythia models provide checkpoints during training phases and different model scales, which helps us reveal the rules behind these training steps and model sizes. LAMBADA (Paperno et al., 2016), PIQA (Bisk et al., 2020), WinoGrande (Sakaguchi et al., 2021), WSC (Kocijan et al., 2020), ARC (Clark et al., 2018), SciQ (Welbl et al., 2017), LogiQA (Liu et al., 2020), Hendrycks Test (Hendrycks et al., 2020) |
| Dataset Splits | No | The paper uses pre-trained Pythia models and existing benchmarks. While it mentions the structure of these benchmarks (e.g., ARC's Easy and Challenge sets), it does not explicitly provide details about the training/test/validation dataset splits for its own experimental analysis or for the Pythia models it uses, beyond referring to pre-existing model training checkpoints and standard benchmark evaluations. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU models, CPU models, memory) used to run the experiments for the Neuro MFA analysis. |
| Software Dependencies | No | The paper mentions 'GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch' and refers to its source code, but does not provide specific version numbers for PyTorch or other key software dependencies. |
| Experiment Setup | Yes | In the following experiments, we sample 10 networks with 64 nodes per layer and average the results to obtain precise, model-independent calculations. We set parameter λ to 1 and γ to 5 in Eq. 1 to achieve an appropriate distance between neurons. (See Table 2: SNIN sample parameters.) |
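The pseudocode listed above (Algorithms 2 through 4) covers the standard multifractal-analysis pipeline: a partition function Z(q), a mass exponent τ(q) obtained from its log-log scaling, and the spectrum f(α) via a Legendre transform. A minimal NumPy sketch of these three quantities is given below. It uses a toy 1-D binomial cascade as the measure, not the paper's neuron-interaction networks, and the box sizes and q range are illustrative choices, so this is only a sketch of the generic computation, not the authors' Neuro-MFA implementation.

```python
import numpy as np

def binomial_cascade(p=0.3, levels=10):
    """Toy multiplicative binomial measure with 2**levels cells (stand-in
    for the box masses one would extract from a network)."""
    m = np.array([1.0])
    for _ in range(levels):
        m = np.concatenate([m * p, m * (1 - p)])
    return m

def partition_function(measure, q, box):
    """Z(q) at a given box size: sum of (box mass)**q over non-empty boxes."""
    masses = measure.reshape(-1, box).sum(axis=1)
    masses = masses[masses > 0]
    return np.sum(masses ** q)

def mass_exponent(measure, q, boxes):
    """tau(q): slope of log Z(q) against log(epsilon), epsilon = box/N."""
    n = len(measure)
    log_eps = np.log([b / n for b in boxes])
    log_z = np.log([partition_function(measure, q, b) for b in boxes])
    slope, _ = np.polyfit(log_eps, log_z, 1)
    return slope

def spectrum(measure, qs, boxes):
    """f(alpha) via the Legendre transform: alpha = dtau/dq, f = q*alpha - tau."""
    taus = np.array([mass_exponent(measure, q, boxes) for q in qs])
    alphas = np.gradient(taus, qs)
    return alphas, qs * alphas - taus

measure = binomial_cascade()
qs = np.linspace(-3, 3, 13)
boxes = [2 ** k for k in range(1, 6)]
alphas, fs = spectrum(measure, qs, boxes)
```

Two sanity checks follow directly from the definitions: τ(1) = 0 because Z(1) is the total (normalized) mass at every box size, and f(α) peaks at the support's box-counting dimension (here 1) at q = 0. The width of the resulting α range is the kind of heterogeneity measure the paper tracks over training checkpoints.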