Persona-aware Generative Model for Code-mixed Language

Authors: Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the personification capabilities of PARADOX, we propose four new metrics: CM BLEU, CM Rouge-1, CM Rouge-L and CM KS. On average, PARADOX achieves 1.6% better CM BLEU, 57% better perplexity and 32% better semantic coherence than the non-persona-based counterparts. The source code is available at: https://github.com/victor7246/PARADOX."
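The reported 57% improvement refers to language-model perplexity, the exponential of the mean per-token negative log-likelihood. A minimal sketch of the standard formula (the token log-probabilities below are illustrative, not from the paper):

```python
import math

def perplexity(token_log_probs):
    """Corpus-level perplexity: exp of the mean negative
    log-likelihood over all predicted tokens (natural log)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A uniform per-token probability of 0.25 yields perplexity 4.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))  # -> 4.0
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why a 57% reduction is a substantial gain.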
Researcher Affiliation | Academia | Ayan Sengupta (Department of Electrical Engineering, Indian Institute of Technology Delhi); Md. Shad Akhtar (Department of Computer Science & Engineering, Indraprastha Institute of Information Technology Delhi); Tanmoy Chakraborty (Department of Electrical Engineering, Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi)
Pseudocode | Yes | "Algorithm 1: Code-Mixed Text Generation with PARADOX"
Open Source Code | Yes | "The source code is available at: https://github.com/victor7246/PARADOX."
Open Datasets | Yes | "Finally, we collect two large-scale longitudinal datasets from Twitter and YouTube, primarily monolingual Hindi and Hindi-English code-mixed texts. The datasets will be valuable for code-mixing research. ... We release our curated datasets to encourage research on personalized code-mixed text generation."
Dataset Splits | Yes | "We use a 75-25 split for training and validation with stratified sampling. Therefore, we can ensure at least one training and validation sample for each user."
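The split described above can be sketched as a per-user stratified partition. This is a generic reconstruction under stated assumptions, not the authors' code: the `"user"` field name and the handling of single-sample users are assumptions, chosen so that every user with at least two samples appears in both splits.

```python
import random

def per_user_split(samples, val_frac=0.25, seed=13):
    """75-25 train/validation split, stratified by user.
    Users with fewer than two samples go to training only,
    since they cannot cover both splits."""
    rng = random.Random(seed)
    by_user = {}
    for s in samples:
        by_user.setdefault(s["user"], []).append(s)

    train, val = [], []
    for user, items in by_user.items():
        if len(items) < 2:
            train.extend(items)
            continue
        rng.shuffle(items)
        # Guarantee at least one sample on each side for this user.
        n_val = min(len(items) - 1, max(1, round(len(items) * val_frac)))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

For example, a user with four samples contributes one to validation and three to training, matching the 75-25 ratio and the per-user coverage guarantee.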
Hardware Specification | Yes | "We use one Tesla P100 and one Tesla V100 GPU to run all our experiments."
Software Dependencies | No | The paper mentions using a pre-trained language model open-sourced with Huggingface and calculating metrics with NLTK, but does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch/TensorFlow).
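When versions are unspecified, a reproduction attempt can at least record what it ran with. A small sketch using the standard-library `importlib.metadata`; the package names listed are the usual PyPI names, assumed here rather than taken from the paper:

```python
from importlib import metadata

def report_versions(packages):
    """Return the installed version of each named package,
    or 'not installed' if it cannot be found."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

print(report_versions(["torch", "transformers", "nltk"]))
```

Logging such a report alongside results makes an otherwise unpinned environment at least auditable after the fact.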
Experiment Setup | Yes | "PARADOX consists of six encoder and decoder layers, with hidden sizes of 768 in all the layers. For multi-headed FAME and masked multi-headed FAME blocks, we use a total of eight heads with dropout probability set to 0.1. The total number of parameters is 296M. We set the persona encoding variational weight λ = 0.5. All the models are trained for 50 epochs with an early stopping condition on validation loss with a patience of 10. We set batch_size = 4 in all experiments during training and validation. We use the Adam optimizer with a learning rate of 4e-4 and β1 = 0.9, β2 = 0.98 for both PARADOX and Transformer. We fine-tune the MuRIL and BLOOMZ models on autoregressive language modeling tasks for 10 epochs with learning rates 3e-5 and 3e-6, respectively."
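The early-stopping criterion in the setup (stop when validation loss has not improved for 10 consecutive epochs, within a 50-epoch budget) can be sketched generically; this is a standard pattern, not the authors' implementation:

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve
    for `patience` consecutive epochs (patience = 10 in the paper)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, `stopper.step(val_loss)` is checked once per epoch; with patience 10, a run can terminate well before the 50-epoch cap if validation loss plateaus early.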