PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Authors: Zhuotong Chen, Zihu Wang, Yifan Yang, Qianxiao Li, Zheng Zhang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 3 presents Numerical Experiments, including a detailed experimental setup (Section 3.1), evaluation against adversarial examples (Section 3.2), robustness against adversarial datasets (Section 3.3), and an ablation study (Section 3.4). The paper presents performance comparisons using radar plots and tables (e.g., Figure 2, Tables 1-7), discusses accuracy improvements, and examines computational wall time, all indicative of empirical evaluation.
Researcher Affiliation | Academia | All listed authors are affiliated with universities: University of California, Santa Barbara, and National University of Singapore. One author, Qianxiao Li, also lists the Institute of High Performance Computing, A*STAR, Singapore, which is a public research institution. All associated email domains are academic (.edu, .edu.sg).
Pseudocode | Yes | Algorithm 1: Tucker Decomposition.
Input: an I-way tensor X. Output: core tensor G, orthogonal bases V1, V2, ..., VI.
for i = 1 to I do
    Xi = Reshape(X, i)  // reshape the tensor along the i-th mode
    Ui, Si, Vi = SVD(Xi)  // perform singular value decomposition on the reshaped tensor; save the singular vectors Vi as the orthogonal basis
end for
G = X  // initialize the core tensor with the I-way tensor X
for i = 1 to I do
    G = G ×i Vi  // multiply the core tensor by the i-th orthogonal basis along mode i
end for
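The quoted Algorithm 1 can be rendered as a minimal NumPy sketch (illustrative only; the function and variable names are mine, not from the paper's code, and I take the left singular vectors of each mode-i unfolding as the orthogonal basis, which corresponds to the algorithm's Vi up to the orientation of the reshape):

```python
import numpy as np

def hosvd(X):
    """Tucker decomposition via HOSVD: return core tensor G and per-mode bases."""
    factors = []
    for i in range(X.ndim):
        # Unfold X along mode i: mode-i fibers become the columns of Xi.
        Xi = np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))
        U, S, Vt = np.linalg.svd(Xi, full_matrices=False)
        factors.append(U)  # orthogonal basis for the mode-i subspace
    G = X
    for i, U in enumerate(factors):
        # Mode-i product of the core with U_i^T (project onto the basis).
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, i, 0), axes=1), 0, i)
    return G, factors
```

Multiplying the core back by each basis along the corresponding mode recovers the original tensor, which is a quick sanity check for the decomposition.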
Open Source Code | Yes | A detailed implementation can be found at: https://github.com/zhuotongchen/PID-Control-Based-Self-Healing-to-Improve-the-Robustness-of-Large-Language-Models.
Open Datasets | Yes | Evaluation methods: We consider both adversarial attack algorithms (e.g., A2T, PSO, TextBugger, TextFooler), applied to the SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018) datasets, and adversarial datasets (e.g., ANLI) to evaluate the robustness of the proposed PID control and the baselines. ... Adversarial NLI (ANLI) (Nie et al., 2020) is a large-scale NLI benchmark...
Dataset Splits | No | The paper mentions using the SNLI, MNLI, and ANLI datasets and refers to "training data" and a "testing dataset," but it does not explicitly provide the specific percentages, absolute sample counts, or splitting methodology used for the experiments. While it notes that ANLI has "development and test datasets," it does not specify how these were used or divided by the authors for their evaluation.
Hardware Specification | No | The paper discusses computational wall time (Table 5) and the computational cost of the proposed method, but it does not specify any particular hardware components such as specific GPU or CPU models, or memory configurations used for running the experiments.
Software Dependencies | No | The paper mentions using models such as DistilBERT, BERT-large, RoBERTa Base, RoBERTa Large, and OPT-1.3B, and refers to LoRA for fine-tuning. However, it does not provide specific version numbers for the software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | PID control implementation details: Using a pre-trained model (e.g., BERT), we select training data that this model can accurately predict. Next, we simulate forward propagation with the pre-trained model on this specific set of training data, which generates a collection of 3-dimensional tensors, denoted as {X_t}_{t=0}^{T-1}. Following this, we employ Algorithm 1 on each tensor to determine the basis for a linear embedding subspace (see Section 2.4). The dimension of this subspace is chosen based on the criterion that it must account for 99% of the total variance observed (this is done by accumulating the singular values). Finally, the optimal solution outlined in Proposition 3 is implemented to generate a time-dependent control regularization parameter. ... Given that our hyperparameter search space only contains 0 and 0.5 for each control gain, this results in the values Kp = 0.5, Kd = 0.5, and Ki = 0.