Controlled LLM Decoding via Discrete Auto-regressive Biasing
Authors: Patrick Pynadath, Ruqi Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of our controlled decoding method on sentiment control, language detoxification, and keyword-guided generation. We evaluate DAB on three distinct controlled-generation tasks: sentiment-guided generation, language detoxification, and keyword-guided generation. |
| Researcher Affiliation | Academia | Patrick Pynadath, Ruqi Zhang Department of Computer Science Purdue University West Lafayette, Indiana, 47906, USA EMAIL |
| Pseudocode | Yes | We include the pseudo-code for our algorithm in Algorithm 1; information on hyper-parameter settings in Appendix C, and further details on each experiment in Appendix D.2, D.3, D.4. Additionally, we include the code-base used to produce our results at the following repository: https://github.com/patrickpynadath1/dab. (Referring to Algorithm 1 Discrete Autoregressive Biasing in Appendix B) |
| Open Source Code | Yes | We make our code available at the following url: https://github.com/patrickpynadath1/dab. (Abstract) Additionally, we include the code-base used to produce our results at the following repository: https://github.com/patrickpynadath1/dab. (Reproducibility section) |
| Open Datasets | Yes | Additionally we confirm that our experiments use only public datasets. (Reproducibility) We use 1,000 prompts sampled from the Real Toxicity Prompts dataset and generate continuations of length 20 tokens (Gehman et al., 2020; Kumar et al., 2022; Liu et al., 2023a). We use a RoBERTa fine-tuned on the Jigsaw toxic comment dataset, following Kumar et al. (2022); Liu et al. (2023a). The internal model is a RoBERTa with GPT2-Large embeddings fine-tuned on the Yelp polarity dataset. We train this model following the codebase of Liu et al. (2023a). We train the steering matrix using the SST2 dataset, as done in Han et al. (2024). |
| Dataset Splits | No | The paper mentions using "1,000 prompts sampled from the Real Toxicity Prompts" and generating sequences of specific lengths (12, 20, 50). It also refers to datasets used for fine-tuning models (yelp polarity dataset, Jigsaw toxic comment dataset, SST2 dataset). However, it does not explicitly provide the training/validation/test splits (e.g., percentages or exact counts for each split) for these datasets, which is necessary to reproduce the data partitioning. |
| Hardware Specification | No | The paper mentions evaluating efficiency by timing operations on a "GPU" (Table 3), but it does not specify any particular GPU model (e.g., NVIDIA A100, RTX 3090) or other hardware details such as CPU, RAM, or specific server configurations used for running experiments. |
| Software Dependencies | No | The paper mentions several software components, including "fine-tuned RoBERTa model from Morris et al. (2020)", "GPT2-XL", the Hugging Face evaluate package, and the Autograd profiler within PyTorch. However, it does not provide specific version numbers for these software components (e.g., PyTorch 1.x, Hugging Face Transformers 4.x, Python 3.x), which are essential for reproducibility. |
| Experiment Setup | Yes | Here we include additional details on the experiment setup. We provide the hyper-parameter settings for our algorithm for each experiment in Table 4. (Appendix D). Table 4 lists: Hyper-parameter Sentiment Detoxify Topic, with specific values for Proposal Temp, Top-k, Bias Weight Value, and Number Sample Steps. |
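The Dataset Splits row notes that the paper samples 1,000 prompts but never specifies how that subset or any train/validation partition is drawn. A minimal sketch of the kind of seeded sampling and splitting that would close this gap, assuming a plain list of prompts; the seed values, function names, and toy corpus here are illustrative and not taken from the paper:

```python
import random


def sample_prompts(prompts, n=1000, seed=0):
    """Deterministically sample n prompts: a fixed, reported seed makes
    the subset reproducible across runs and machines."""
    rng = random.Random(seed)
    return rng.sample(prompts, n)


def train_val_split(items, val_fraction=0.1, seed=0):
    """Seeded shuffle-then-split; reporting the seed and fraction is
    enough for a reader to reconstruct the exact partition."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy corpus standing in for the real prompt set.
corpus = [f"prompt-{i}" for i in range(5000)]
subset = sample_prompts(corpus, n=1000, seed=42)
train, val = train_val_split(subset, val_fraction=0.1, seed=42)
```

Because both helpers take explicit seeds, re-running the script with the same arguments reproduces the identical 1,000-prompt subset and the identical 900/100 partition, which is exactly the information the report finds missing.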