Graphmax for Text Generation

Authors: Bin Liu, Guosheng Yin

JAIR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments, we demonstrate that the new GTV-based regularization can improve performances in various natural language processing (NLP) tasks in comparison with existing methods.
Researcher Affiliation Academia Bin Liu, The Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu, China; Guosheng Yin (corresponding author), Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
Pseudocode Yes Algorithm 1: Projected Gradient Descent for graphmax. Input: learning rate α, maximum number of iteration steps T. While t < T: (1) a^{t+1} = x^t − α∇f(x^t); (2) sort a^{t+1} = (a^{t+1}_1, …, a^{t+1}_N) in descending order as (b^{t+1}_1, …, b^{t+1}_N); (3) find k = max{1 ≤ j ≤ N : b^{t+1}_j + (1 − Σ_{i=1}^{j} b^{t+1}_i)/j > 0}; (4) compute γ = (1 − Σ_{i=1}^{k} b^{t+1}_i)/k; (5) set x^{t+1}_i = max{a^{t+1}_i + γ, 0}; (6) break if ‖x^{t+1} − x^t‖₂ < 10⁻⁴. Output: a near-optimal minimum x* of the graphmax objective.
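The inner step of Algorithm 1 is the standard Euclidean projection onto the probability simplex (the sort-and-threshold rule in steps 3–5), wrapped in projected gradient descent. A minimal NumPy sketch under that reading is below; the function names and the quadratic test objective are illustrative assumptions, not from the paper:

```python
import numpy as np

def project_to_simplex(a):
    """Euclidean projection of a onto {x : x >= 0, sum(x) = 1}."""
    b = np.sort(a)[::-1]                     # sort in descending order
    cssum = np.cumsum(b)                     # prefix sums of sorted values
    j = np.arange(1, len(a) + 1)
    # k = max{1 <= j <= N : b_j + (1 - sum_{i<=j} b_i)/j > 0}
    k = np.max(j[b + (1.0 - cssum) / j > 0])
    gamma = (1.0 - cssum[k - 1]) / k         # shift so the support sums to 1
    return np.maximum(a + gamma, 0.0)

def projected_gradient_descent(grad_f, x0, alpha=1e-4, T=20, tol=1e-4):
    """Minimize f over the simplex: gradient step, then project (Algorithm 1)."""
    x = x0
    for _ in range(T):
        a = x - alpha * grad_f(x)            # unconstrained gradient step
        x_new = project_to_simplex(a)        # pull back onto the simplex
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x
```

With the paper's reported settings (α = 10⁻⁴, T = 20, tolerance 10⁻⁴), each call costs O(N log N) for the sort plus one gradient evaluation per iteration.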
Open Source Code No The paper does not explicitly state that the authors are releasing their own code for the methodology described. It mentions third-party models like GPT-2, BART, LLAMA2, and references a HuggingFace link for OPT, but this is not a release of their own implementation.
Open Datasets Yes Datasets and Baseline Methods We choose the Amazon (Zhang, Zhao, & LeCun, 2015) and Yelp (McAuley & Leskovec, 2013; Zhang et al., 2015) datasets for general text generation, and the WMT 16 (Bojar et al., 2016) and WMT 21 corpora (Tran et al., 2021) for machine translation.
Dataset Splits Yes The reported results are averaged over 100 independent runs. Table 2: The BLEU scores of GPT-2, CTRL, PPLM, OPT, LLAMA2, and the proposed methods on generating product reviews (± standard deviation), averaged over five cross-validation folds.
Hardware Specification Yes All models are implemented using Pytorch 1.7 on an Intel(R) Xeon(R) CPU E5-2680 v4 2.40GHz, Tesla K80 GPU, and 128G memory, based on the Ubuntu 16.04 platform.
Software Dependencies Yes All models are implemented using Pytorch 1.7 on an Intel(R) Xeon(R) CPU E5-2680 v4 2.40GHz, Tesla K80 GPU, and 128G memory, based on the Ubuntu 16.04 platform.
Experiment Setup Yes The maximum number of iteration steps for the graphmax algorithm is set as T = 20, the learning rate α is 10⁻⁴, and the convergence threshold is 10⁻⁴. Specifically, the iteration process stops when the l2 norm of the difference between consecutive iterations, ‖x^{t+1} − x^t‖₂, is less than 10⁻⁴.