Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification

Authors: Shichen Li, Zhongqing Wang, Zheyu Zhao, Yue Zhang, Peifeng Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our in-domain and out-of-domain experiments demonstrate that this approach achieves competitive results compared to the currently strongest methods with significantly fewer trainable parameters, highlighting a more efficient and interpretable fine-tuning strategy."
Researcher Affiliation | Academia | Shichen Li1, Zhongqing Wang1*, Zheyu Zhao1, Yue Zhang2, Peifeng Li1. 1Natural Language Processing Lab, Soochow University, Suzhou, China; 2Westlake University.
Pseudocode | No | The paper describes the methodology using textual explanations and figures, but no explicit "Pseudocode" or "Algorithm" blocks are provided.
Open Source Code | No | The paper cites "https://huggingface.co/meta-llama/Llama-2-7b-hf" in footnote 1, but this refers to the base LLM used, not the authors' own implementation of the proposed method. There is no statement about releasing their source code, and no link to a code repository is provided.
Open Datasets | Yes | "The labeled dataset used in our experiments includes reviews from four different domains: Restaurant (R), Laptop (L), Device (D), and Service (S). Restaurant (R) is a combination of the restaurant reviews from SemEval 2014/2015/2016 (Pontiki et al. 2014, 2015, 2016). Laptop (L) is sourced from SemEval 2014 (Pontiki et al. 2014). Device (D) consists of all the digital device reviews collected by Toprak, Jakob, and Gurevych (2010). Service (S) contains reviews from web services introduced by Hu and Liu (2004)."
Dataset Splits | Yes | Table 1 (distribution of reviews across different domains): Device: 1,394 train / 691 test; Laptop: 2,297 train / 631 test; Restaurant: 4,284 train / 2,252 test; Service: 1,840 train / 886 test.
Hardware Specification | Yes | "All comparison experiments are conducted on a single NVIDIA 3090 GPU and we take accuracy as the evaluation metric."
Software Dependencies | No | The paper mentions "AdamW (Loshchilov and Hutter 2018) is used as the optimizer" and "Llama-2-7b (Touvron et al. 2023) as our primary base large language model," but it does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "AdamW (Loshchilov and Hutter 2018) is used as the optimizer, with a learning rate of 3×10⁻⁴ for the low-rank weight projection part and 1×10⁻⁵ for the representation editing part. For the comparison methods, we adopt standard experimental settings and commonly used parameters. Specifically, LoRA and DoRA utilize a learning rate of 1×10⁻⁴ with a rank of 32. Additionally, we include LoReFT with a learning rate of 2×10⁻⁵ and a rank of 8. All comparison experiments are conducted on a single NVIDIA 3090 GPU and we take accuracy as the evaluation metric. The experimental results are obtained by averaging three runs with random initialization. The PEFT methods are trained for one epoch, while the full-parameter methods are trained for three epochs."
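The hyperparameters reported in the experiment setup can be collected into a single reference structure. This is a minimal illustrative sketch, not the authors' code (which is not released); the method keys and field names are my own, and only the numeric values come from the paper.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Keys and structure are illustrative; the authors' implementation is not public.
CONFIGS = {
    "proposed": {
        "optimizer": "AdamW",
        "lr_low_rank_projection": 3e-4,     # low-rank weight projection part
        "lr_representation_editing": 1e-5,  # representation editing part
        "epochs": 1,                        # PEFT methods: one epoch
    },
    "LoRA":   {"optimizer": "AdamW", "lr": 1e-4, "rank": 32, "epochs": 1},
    "DoRA":   {"optimizer": "AdamW", "lr": 1e-4, "rank": 32, "epochs": 1},
    "LoReFT": {"optimizer": "AdamW", "lr": 2e-5, "rank": 8,  "epochs": 1},
    "full_finetune": {"optimizer": "AdamW", "epochs": 3},  # full-parameter: three epochs
}

def lr_of(method: str) -> float:
    """Return the single learning rate of a comparison method (hypothetical helper)."""
    return CONFIGS[method]["lr"]
```

Note that the proposed method uses two learning rates (one per component), so it does not fit the single-`lr` shape of the comparison methods; results in the paper are averages over three runs with random initialization.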