MIND over Body: Adaptive Thinking using Dynamic Computation

Authors: Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of this method on language modeling and computer vision tasks. Notably, our model achieves 96.62% accuracy on ImageNet with just a three-layer network, surpassing much larger ResNet-50 and EfficientNet. When applied to a transformer architecture, the approach achieves 95.8%/88.7% F1 scores on the SQuAD v1.1/v2.0 datasets at negligible parameter cost. These results showcase the potential for dynamic and reflective computation, contributing to the creation of intelligent systems that efficiently manage resources based on input data complexity.
Researcher Affiliation | Academia | Mrinal Mathur, TReNDS Center, Georgia State University, Atlanta, GA, USA; Barak A. Pearlmutter, Dept of Computer Science, Maynooth University, Co. Kildare, W23 A3HY, Ireland; Sergey Plis, TReNDS Center, Georgia State University, Atlanta, GA, USA
Pseudocode | Yes | Algorithm 1: Training Procedure for the MIND Model; Algorithm 2: Backward Propagation in the MIND Model; Algorithm 3: Forward Propagation for the MIND Model
Open Source Code | No | The paper does not contain any explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | In language modeling, we evaluate on SQuAD (Rajpurkar et al., 2016) and WikiText (Gardent et al., 2017). To demonstrate that it is domain-agnostic, we also validate on vision benchmarks including CIFAR-100 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). ... WikiText-2 and WikiText-103 datasets (Merity et al., 2016).
Dataset Splits | Yes | All models were validated using 9-fold cross-validation with 10 different random seeds to ensure stability and robustness of results. ... For vision tasks, we evaluated the MIND model on the CIFAR-100 and ImageNet datasets. CIFAR-100 consists of 60,000 32x32 images in 100 classes, while ImageNet has 1.28M images in 1,000 classes.
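The quoted protocol (9-fold cross-validation repeated over 10 random seeds) can be made concrete with a short stdlib-only sketch. This is not the authors' code; the function names and the round-robin fold assignment are illustrative assumptions, with 60,000 samples chosen to match the CIFAR-100 figure above.

```python
import random

def kfold_indices(n_samples, n_folds=9, seed=0):
    """Shuffle sample indices with a fixed seed, then split them
    into n_folds folds whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]  # round-robin assignment

def cv_splits(n_samples, n_folds=9, seeds=range(10)):
    """Yield (seed, fold, train_idx, val_idx) for every seed/fold pair."""
    for seed in seeds:
        folds = kfold_indices(n_samples, n_folds, seed)
        for f, val_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield seed, f, train_idx, val_idx

# 9 folds x 10 seeds = 90 train/validation splits per model.
splits = list(cv_splits(n_samples=60000))
```

Repeating the folds under 10 seeds, as the paper describes, is what lets a mean and variance be reported per metric rather than a single point estimate.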
Hardware Specification | Yes | All experiments were conducted using PyTorch (Paszke et al., 2019) on NVIDIA A40 GPUs with 20GB memory. ... the experiments were conducted in a controlled environment using the ImageNet dataset for classification tasks and NVIDIA A100 GPUs for training and inference.
Software Dependencies | No | All experiments were conducted using PyTorch (Paszke et al., 2019) on NVIDIA A40 GPUs with 20GB memory. The paper mentions PyTorch but does not specify a version number.
Experiment Setup | Yes | The MIND model was optimized using the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 1 x 10^-3, decayed by a factor of 0.1 every 30 epochs. The batch size was set to 64. Hyperparameters α, β, γ, and δ in Equation 5 were fine-tuned to 0.5, 0.2, 0.2, and 0.1 respectively, while λ for L_introspect was set to 0.6. Each model was trained for 100 epochs with early stopping, triggered when validation loss did not improve over 10 epochs. Fixed-point iteration (FPI) tolerance for the MIND architecture was set to 1 x 10^-5, with a maximum of 100 iterations per layer.
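Three mechanical details in this setup — the step learning-rate decay, the patience-based early stopping, and the fixed-point iteration cutoff — can be sketched in a few stdlib-only lines. These are illustrative reimplementations of the stated rules, not the authors' code; the function names are assumptions, and the toy contraction map stands in for a MIND layer update.

```python
import math

def step_lr(epoch, base_lr=1e-3, gamma=0.1, step=30):
    """Learning rate of 1e-3 decayed by a factor of 0.1 every 30 epochs."""
    return base_lr * gamma ** (epoch // step)

def should_stop(val_losses, patience=10):
    """Early stopping: True once the best validation loss has not
    improved during the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

def fixed_point_iterate(f, x0, tol=1e-5, max_iter=100):
    """Iterate x <- f(x) until the update falls below tol (the paper's
    1e-5 FPI tolerance) or max_iter (100) iterations are reached."""
    x = x0
    for i in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next, i + 1   # converged
        x = x_next
    return x, max_iter             # hit the per-layer iteration cap

# Toy contraction: cos(x) has a unique fixed point near 0.739.
x_star, n_iters = fixed_point_iterate(math.cos, x0=1.0)
```

The tolerance/max-iteration pair is what makes the per-layer compute adaptive: easy inputs converge in a few iterations, while the cap of 100 bounds the worst case.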