Permissive Information-Flow Analysis for Large Language Models
Authors: Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results in an LLM agent setting show that our label propagator assigns a more permissive label over the baseline in more than 85% of the cases, which underscores the practicality of our approach. We demonstrate the effectiveness of our proposed label propagator by evaluating it on three datasets |
| Researcher Affiliation | Collaboration | Shoaib Ahmed Siddiqui¹, Radhika Gaonkar², Boris Köpf², David Krueger³, Andrew Paverd², Ahmed Salem², Shruti Tople², Lukas Wutschitz², Menglin Xia², Santiago Zanella-Béguelin². ¹University of Cambridge, ²Microsoft, ³Mila |
| Pseudocode | Yes | Algorithm 1 describes this idea in pseudocode. We represent the powerset of possible labels as a directed acyclic graph (DAG) where nodes represent labels and edges represent the lattice order (see Figure 2 for an illustration). Starting from the root node L = ⊔_{c ∈ C} ℓ(c) that corresponds to the full context, we traverse the DAG depth-first to identify λ-similar labels. For each label L, the function minimal_labels() returns Λ, the set of λ-similar labels at or below L (i.e., at least as permissive as L). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | Starting from an existing fake news dataset (Pérez-Rosas et al., 2018), we create pairs of high and low-integrity news articles that discuss similar topics to each other. |
| Dataset Splits | Yes | For the introspection baseline, we use one-shot in-context learning (Wei et al., 2022) based on the first example in the dataset and use the remaining 63 questions for evaluation. |
| Hardware Specification | No | The paper mentions using Llama-2 model family (7B and 70B variants) but does not specify the hardware (e.g., specific GPU models, CPUs) on which these models were run for their experiments. |
| Software Dependencies | No | The paper mentions using Llama-2 models and GPT-4 for data generation, and metrics like ROUGE-L, but it does not specify any software libraries, frameworks, or programming languages with their version numbers. |
| Experiment Setup | Yes | We fix a threshold of λ = 0.2 for this experiment. We use the base model without instruction tuning in this case due to the use of a custom chat format. For the kNN-LM implementation, we follow (Khandelwal et al., 2020) and use the model's penultimate-layer representation of the last token (conditioned on all preceding tokens in the document) as the context representation for kNN search. kNN-LM prediction uses γ = 0.5. |
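The depth-first lattice traversal quoted in the Pseudocode row can be sketched as follows. This is a hypothetical reconstruction, not the paper's code: labels are modeled as frozensets of context elements (dropping an element yields a more permissive label), `utility` stands in for whatever λ-similarity score the propagator uses, and we assume utility only degrades further below a non-λ-similar node, so pruning there is safe.

```python
def minimal_labels(full_label, utility, lam):
    """Depth-first search over the powerset lattice of labels.

    Hypothetical sketch: starting from the full-context label (the root),
    explore sublabels and collect those whose utility stays within `lam`
    of the root's utility (lambda-similar), returning only the most
    permissive (set-minimal) ones.
    """
    root = frozenset(full_label)
    base = utility(root)
    similar = set()
    stack, seen = [root], {root}
    while stack:
        label = stack.pop()
        if base - utility(label) > lam:
            continue  # assumption: utility only drops below this node, prune
        similar.add(label)
        for elem in label:
            child = label - {elem}  # one step more permissive
            if child and child not in seen:
                seen.add(child)
                stack.append(child)
    # keep only labels with no lambda-similar proper subset
    return {l for l in similar if not any(s < l for s in similar)}
```

With a toy utility that only cares whether element `'a'` is in the context, the propagator correctly discards `'b'` and `'c'` and returns the single most permissive label `{'a'}`.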
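The Experiment Setup row references the kNN-LM recipe of Khandelwal et al. (2020) with γ = 0.5. A minimal numerical sketch of that interpolation, under illustrative assumptions about the datastore layout (L2 distance, softmax over negative distances), looks like this:

```python
import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=4, gamma=0.5, temp=1.0):
    """Hypothetical sketch of kNN-LM interpolation (Khandelwal et al., 2020).

    p_lm:   base LM next-token distribution, shape (vocab_size,)
    query:  context representation (e.g., penultimate-layer vector of the last token)
    keys:   datastore context representations, shape (n, d)
    values: next-token id stored with each key, shape (n,)
    """
    dists = np.linalg.norm(keys - query, axis=1)   # L2 distance to each key
    nn = np.argsort(dists)[:k]                     # k nearest neighbors
    weights = np.exp(-dists[nn] / temp)
    weights /= weights.sum()                       # softmax over negative distance
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], weights)          # aggregate weight per token id
    return gamma * p_knn + (1.0 - gamma) * p_lm    # interpolate the two distributions
```

For example, with a uniform base LM over a 3-token vocabulary and a single retrieved neighbor voting for token 1, the mixture at γ = 0.5 puts 2/3 of the mass on token 1 and 1/6 on each other token.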