Controlled LLM Decoding via Discrete Auto-regressive Biasing

Authors: Patrick Pynadath, Ruqi Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the advantages of our controlled decoding method on sentiment control, language detoxification, and keyword-guided generation. We evaluate DAB on three distinct controlled-generation tasks: sentiment-guided generation, language detoxification, and keyword-guided generation."
Researcher Affiliation | Academia | "Patrick Pynadath, Ruqi Zhang, Department of Computer Science, Purdue University, West Lafayette, Indiana, 47906, USA. EMAIL"
Pseudocode | Yes | "We include the pseudo-code for our algorithm in Algorithm 1; information on hyper-parameter settings in Appendix C.1 and further details on each experiment in Appendix D.2, D.3, D.4. Additionally, we include the code-base used to produce our results at the following repository: https://github.com/patrickpynadath1/dab." (Referring to Algorithm 1, Discrete Autoregressive Biasing, in Appendix B)
Open Source Code | Yes | "We make our code available at the following url: https://github.com/patrickpynadath1/dab." (Abstract) "Additionally, we include the code-base used to produce our results at the following repository: https://github.com/patrickpynadath1/dab." (Reproducibility section)
Open Datasets | Yes | "Additionally we confirm that our experiments use only public datasets." (Reproducibility) "We use 1,000 prompts sampled from the Real Toxicity Prompts dataset and generate continuations of length 20 tokens (Gehman et al., 2020; Kumar et al., 2022; Liu et al., 2023a). We use a RoBERTa fine-tuned on the Jigsaw toxic comment dataset, following Kumar et al. (2022); Liu et al. (2023a). The internal model is a RoBERTa with GPT2-Large embeddings fine-tuned on the Yelp polarity dataset. We train this model following the codebase of Liu et al. (2023a). We train the steering matrix using the SST2 dataset, as done in Han et al. (2024)."
Dataset Splits | No | The paper mentions using "1,000 prompts sampled from the Real Toxicity Prompts" and generating sequences of specific lengths (12, 20, 50). It also refers to datasets used for fine-tuning models (Yelp polarity, Jigsaw toxic comment, SST2). However, it does not explicitly provide the training/validation/test splits (e.g., percentages or exact counts for each split) for these datasets, which would be necessary to reproduce the data partitioning.
Hardware Specification | No | The paper mentions evaluating efficiency by timing operations on a "GPU" (Table 3), but it does not specify any particular GPU model (e.g., NVIDIA A100, RTX 3090) or other hardware details such as CPU, RAM, or server configuration used to run the experiments.
Software Dependencies | No | The paper mentions several software components, including the "fine-tuned RoBERTa model from Morris et al. (2020)", "GPT2-XL", the "Hugging Face evaluate package", and the "Auto Grad profiler within Pytorch". However, it does not provide specific version numbers for these components (e.g., PyTorch 1.x, Hugging Face Transformers 4.x, Python 3.x), which are essential for reproducibility.
Experiment Setup | Yes | "Here we include additional details on the experiment setup. We provide the hyper-parameter settings for our algorithm for each experiment in Table 4." (Appendix D) Table 4 reports, for each task (Sentiment, Detoxify, Topic), specific values of the hyper-parameters: Proposal Temp, Top-k, Bias Weight Value, and Number of Sample Steps.