PICASO: Permutation-Invariant Context Composition with State Space Models
Authors: Tian Yu Liu, Alessandro Achille, Matthew Trager, Aditya Golatkar, Luca Zancato, Stefano Soatto
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest-performing baseline while enjoying on average a 5.4× speedup. |
| Researcher Affiliation | Collaboration | Tian Yu Liu (UCLA); Alessandro Achille, Matthew Trager, Aditya Golatkar, Luca Zancato, Stefano Soatto (AWS AI Labs) |
| Pseudocode | Yes | A ALGORITHMS: PICASO-S AND PICASO-R. We show in Algorithm 1 how PICASO-S is computed in polynomial time via a dynamic programming approach based on Algorithm 2. In Algorithm 3, we also show how PICASO-R can be computed with linear time complexity. Time complexity is measured as the number of arithmetic operations required as a function of the number of context states. |
| Open Source Code | No | The paper references third-party tools and models (e.g., HuggingFace, Mamba-2, Sentence-Transformers) and their repositories, but does not provide specific links or statements regarding the release of the authors' own implementation code for PICASO. |
| Open Datasets | Yes | We evaluate our method on two large-scale datasets: WikiText-V2 (Merity et al., 2016) and MSMARCO (Nguyen et al., 2016). |
| Dataset Splits | Yes | We use the training splits as our fine-tuning data, and the testing/validation splits respectively for evaluation. To pre-process WikiText-V2 for our use case, we split each passage in the dataset into two equal context segments... For WikiText, we select k ∈ {0, . . . , 10} uniformly at random for each batch. For MSMARCO, we use all the available passages (both relevant and irrelevant) associated with each training example. |
| Hardware Specification | Yes | We used the official benchmark with an A100 GPU for our timing experiments in Figure 1 to ensure the fairest comparison. |
| Software Dependencies | No | The paper mentions software like 'Hugging Face trainer' and 'Mamba-2 2.7B model', and uses 'CUDA graphs' and 'flash attention', but does not provide specific version numbers for these software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For fine-tuning experiments using BPTC and BP2C, we base our implementation on the official Hugging Face trainer with default hyperparameters, and retrieve the k most relevant context segments for each query sample for composition. For WikiText, we select k ∈ {0, . . . , 10} uniformly at random for each batch. For both datasets, we fine-tune for only 1 epoch. |
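The pseudocode row above describes PICASO-S as a permutation-invariant composition of context states computed in polynomial time by dynamic programming. As a rough illustration of the quantity being averaged, here is a brute-force sketch (not the paper's algorithm: the function names, the scalar-state simplification, and the `s' = a*s + x` toy recurrence are our assumptions, and the n! enumeration is exactly what the paper's DP avoids):

```python
import itertools
import math

def compose(contributions, decays):
    """Sequentially compose context states for ONE ordering.

    Toy linear-SSM model (our simplification): appending context block i
    to a running scalar state s gives s' = a_i * s + x_i, where a_i is the
    block's decay and x_i its state contribution.
    """
    s = 0.0
    for x, a in zip(contributions, decays):
        s = a * s + x
    return s

def permutation_averaged_state(contributions, decays):
    """Average the composed state over all orderings of the context blocks.

    This is the permutation-invariant target; PICASO-S computes it in
    polynomial time via dynamic programming (Algorithms 1-2 in the paper)
    rather than enumerating all n! permutations as done here.
    """
    n = len(contributions)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        total += compose([contributions[i] for i in perm],
                         [decays[i] for i in perm])
    return total / math.factorial(n)

# Plain sequential composition depends on the order of the contexts,
# while the permutation average does not:
x, a = [1.0, 2.0, 3.0], [0.5, 0.9, 0.7]
print(compose(x, a))                       # order-dependent
print(permutation_averaged_state(x, a))    # invariant to input order
```

Because the averaged state treats the context blocks as a set rather than a sequence, retrieved passages can be composed without committing to any particular concatenation order, which is the property the paper's speedup claims rest on.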