Permissive Information-Flow Analysis for Large Language Models

Authors: Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "Our experimental results in an LLM agent setting show that our label propagator assigns a more permissive label over the baseline in more than 85% of the cases, which underscores the practicality of our approach. We demonstrate the effectiveness of our proposed label propagator by evaluating it on three datasets."
Researcher Affiliation | Collaboration | "Shoaib Ahmed Siddiqui (1), Radhika Gaonkar (2), Boris Köpf (2), David Krueger (3), Andrew Paverd (2), Ahmed Salem (2), Shruti Tople (2), Lukas Wutschitz (2), Menglin Xia (2), Santiago Zanella-Béguelin (2); (1) University of Cambridge, (2) Microsoft, (3) Mila"
Pseudocode | Yes | "Algorithm 1 describes this idea in pseudocode. We represent the powerset of possible labels as a directed acyclic graph (DAG) where nodes represent labels and edges represent the lattice order (see Figure 2 for an illustration). Starting from the root node L = ⊔_{c ∈ C} ℓ(c) that corresponds to the full context, we traverse the DAG depth-first to identify λ-similar labels. For each label L, the function minimal_labels() returns Λ, the set of λ-similar labels at or below L (i.e., at least as permissive as L)."
Open Source Code | No | "The paper does not provide an explicit statement about releasing code or a direct link to a code repository for the methodology described."
Open Datasets | Yes | "Starting from an existing fake news dataset (Pérez-Rosas et al., 2018), we create pairs of high- and low-integrity news articles that discuss similar topics to each other."
Dataset Splits | Yes | "For the introspection baseline, we use one-shot in-context learning (Wei et al., 2022) based on the first example in the dataset and use the remaining 63 questions for evaluation."
Hardware Specification | No | The paper mentions using the Llama-2 model family (7B and 70B variants) but does not specify the hardware (e.g., specific GPU models or CPUs) on which these models were run for the experiments.
Software Dependencies | No | The paper mentions using Llama-2 models and GPT-4 for data generation, and metrics such as ROUGE-L, but does not specify any software libraries, frameworks, or programming languages with their version numbers.
Experiment Setup | Yes | "We fix a threshold of λ = 0.2 for this experiment. We use the base model without instruction tuning in this case due to the use of a custom chat format. For the kNN-LM implementation, we follow Khandelwal et al. (2020) and use the model's penultimate-layer representation of the last token (conditioned on all preceding tokens in the document) as the context representation for kNN search. The kNN-LM prediction uses γ = 0.5."
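The Pseudocode row above summarizes Algorithm 1: the powerset of labels forms a DAG ordered by the lattice, the search starts from the join of all context labels, and a depth-first traversal identifies the λ-similar labels. A minimal sketch of that search, under two assumptions that are ours rather than the paper's: labels are modeled as frozensets of context items (a subset is one step more permissive), and λ-similarity is supplied by the caller as a hypothetical `is_similar` predicate.

```python
def minimal_labels(context_labels, is_similar):
    # Hypothetical sketch of the DFS described in Algorithm 1 (not the
    # authors' implementation). Labels are frozensets of context items;
    # a strict subset sits lower in the lattice, i.e. is more permissive.
    root = frozenset().union(*context_labels)  # join of all context labels
    minimal, seen = [], set()

    def dfs(label):
        if label in seen:
            return
        seen.add(label)
        # Children drop one context item each: one step more permissive.
        similar_children = [label - {x} for x in label
                            if is_similar(label - {x})]
        if not similar_children:
            # No more-permissive label is still lambda-similar, so this
            # label is one of the most permissive similar labels.
            minimal.append(label)
            return
        for child in similar_children:
            dfs(child)

    if is_similar(root):  # root corresponds to the full context
        dfs(root)
    return minimal
```

For instance, with context labels {a, b} and {c} and a toy predicate that accepts any label with at least two items, the search descends from the root {a, b, c} and returns the three two-element labels, since none of their children remain similar.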
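The Experiment Setup row refers to the kNN-LM of Khandelwal et al. (2020), which mixes a nearest-neighbor distribution over retrieved datastore values with the base LM's next-token distribution. A minimal sketch of that interpolation, assuming the standard softmax-over-negative-distances weighting; the function name and arguments are illustrative, and the retrieval step itself (the kNN search over penultimate-layer representations) is omitted.

```python
import numpy as np

def knn_lm_probs(p_lm, distances, neighbor_ids, vocab_size, gamma=0.5):
    # Sketch of kNN-LM interpolation (Khandelwal et al., 2020).
    # p_lm: base LM next-token distribution, shape (vocab_size,).
    # distances: distances from the context representation to the k
    #   retrieved datastore keys.
    # neighbor_ids: token ids stored as values for those keys.
    weights = np.exp(-np.asarray(distances, dtype=float))
    weights /= weights.sum()          # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    for w, tok in zip(weights, neighbor_ids):
        p_knn[tok] += w               # aggregate weight per token id
    # Interpolate: gamma * p_kNN + (1 - gamma) * p_LM.
    return gamma * p_knn + (1.0 - gamma) * np.asarray(p_lm, dtype=float)
```

With γ = 0.5, as in the quoted setup, the retrieved-neighbor distribution and the base LM distribution contribute equally to the final prediction.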