Aioli: A Unified Optimization Framework for Language Model Data Mixing

Authors: Mayee Chen, Michael Hu, Nicholas Lourie, Kyunghyun Cho, Christopher Ré

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
  We evaluate AIOLI in two settings by training 160M models on various combinations of data sources from Slim Pajama (Soboleva et al., 2023) (Section 6). First, we compare AIOLI to existing data mixing methods and find that AIOLI consistently outperforms stratified sampling on all 6 datasets, by an average of 0.274 and up to 0.439 points in test perplexity.
Researcher Affiliation: Collaboration
  1 Computer Science Department, Stanford University; 2 Center for Data Science, NYU; 3 Computer Science Department, NYU; 4 Prescient Design, Genentech
Pseudocode: Yes
  Algorithm 1 (AIOLI) and Algorithm 2 (LEARNPARAMS)
Open Source Code: No
  The paper does not explicitly state that the source code for the methodology described is publicly available or provide a link to a code repository.
Open Datasets: Yes
  We use a sampled version of Slim Pajama (Soboleva et al., 2023; Yoon, 2023), a pre-processed version of the Red Pajama pretraining dataset (Together.ai, 2023).
Dataset Splits: Yes
  To obtain a test set, we shuffle and split the validation set from Slim Pajama-6B (Soboleva et al., 2023; Yoon, 2023) in half.
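The split described above can be sketched as follows. The function name, the fixed seed, and the exact shuffling procedure are assumptions; the excerpt only states that the validation set is shuffled and split in half.

```python
import random

def shuffle_and_split_half(examples, seed=0):
    """Hypothetical sketch: shuffle a list of validation examples and
    split it into two equal halves (e.g., validation set and test set)."""
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

Together the two halves cover every original example exactly once, so no data is dropped or duplicated by the split.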
Hardware Specification: Yes
  For the m=2,3 settings, experiments were run on an NVIDIA RTX 6000 Ada Generation GPU. For the m=7 setting, experiments were run on an NVIDIA A100 80 GB GPU.
Software Dependencies: No
  The paper mentions software like 'PyTorch' and 'Flash Attention' but does not provide specific version numbers for any key software components.
Experiment Setup: Yes
  We train 160M-parameter GPT-style decoder-only LLMs. All settings use Flash Attention (Dao et al., 2022), a batch size of 8, a context length of 2048, and cosine learning rate decay from a starting learning rate of 5e-5 to 1e-5 with 500 steps of learning rate warmup.
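The learning rate schedule in the setup above (warmup over 500 steps to 5e-5, then cosine decay to 1e-5) can be sketched as follows. The function name, the total step count, and the linear shape of the warmup are assumptions not stated in the excerpt.

```python
import math

def lr_at_step(step, total_steps, lr_max=5e-5, lr_min=1e-5, warmup_steps=500):
    """Hypothetical sketch of the schedule: linear warmup to lr_max,
    then cosine decay from lr_max down to lr_min."""
    if step < warmup_steps:
        # Linear warmup (the warmup shape is an assumption).
        return lr_max * step / warmup_steps
    # Cosine decay over the remaining steps: progress goes from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

Under this sketch the rate peaks at 5e-5 exactly when warmup ends (step 500) and reaches the floor of 1e-5 at the final step.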