Persona-aware Generative Model for Code-mixed Language
Authors: Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the personification capabilities of PARADOX, we propose four new metrics: CM BLEU, CM Rouge-1, CM Rouge-L, and CM KS. On average, PARADOX achieves 1.6% better CM BLEU, 57% better perplexity, and 32% better semantic coherence than the non-persona-based counterparts. The source code is available at: https://github.com/victor7246/PARADOX. |
| Researcher Affiliation | Academia | Ayan Sengupta, Department of Electrical Engineering, Indian Institute of Technology Delhi; Md. Shad Akhtar, Department of Computer Science & Engineering, Indraprastha Institute of Information Technology Delhi; Tanmoy Chakraborty, Department of Electrical Engineering and Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi |
| Pseudocode | Yes | Algorithm 1: Code-Mixed Text Generation with PARADOX |
| Open Source Code | Yes | The source code is available at: https://github.com/victor7246/PARADOX. |
| Open Datasets | Yes | Finally, we collect two large-scale longitudinal datasets from Twitter and YouTube, primarily monolingual Hindi and Hindi-English code-mixed texts. The datasets will be valuable for code-mixing research. ... We release our curated datasets to encourage research on personalized code-mixed text generation. |
| Dataset Splits | Yes | We use a 75-25 split for training and validation with stratified sampling. Therefore, we can ensure at least one training and validation sample for each user. |
| Hardware Specification | Yes | We use one Tesla P100 and one Tesla V100 GPU to run all our experiments. |
| Software Dependencies | No | The paper mentions using a pre-trained language model open-sourced via Hugging Face and computing metrics with NLTK, but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow). |
| Experiment Setup | Yes | PARADOX consists of six encoder and decoder layers, with hidden sizes of 768 in all the layers. For multi-headed FAME and masked multi-headed FAME blocks, we use a total of eight heads with dropout probability set to 0.1. The total number of parameters is 296M. We set the persona encoding variational weight λ = 0.5. All the models are trained for 50 epochs with an early stopping condition on validation loss with a patience of 10. We set batch_size = 4 in all experiments during training and validation. We use the Adam optimizer with a learning rate of 4e-4 and β1 = 0.9, β2 = 0.98 for both PARADOX and Transformer. We fine-tune the MuRIL and BLOOMZ models on autoregressive language modeling tasks for 10 epochs with learning rates 3e-5 and 3e-6, respectively. |
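The dataset-split row states a 75-25 train/validation split with stratified sampling that guarantees at least one training and one validation sample per user. The paper does not show the splitting code, so the sketch below is only one plausible reading of that description; the function and variable names are assumptions, not the authors' implementation.

```python
import random

def per_user_split(posts, train_frac=0.75, seed=13):
    """Split (user_id, text) pairs so every user with at least two
    posts appears in both the training and validation sets.

    A sketch of the reported stratified 75-25 split; all names and
    the tie-breaking logic here are assumptions.
    """
    by_user = {}
    for user, text in posts:
        by_user.setdefault(user, []).append(text)
    rng = random.Random(seed)
    train, valid = [], []
    for user, texts in by_user.items():
        rng.shuffle(texts)
        # Clamp so each user keeps at least one sample on each side.
        n_train = min(max(1, round(len(texts) * train_frac)), len(texts) - 1)
        train += [(user, t) for t in texts[:n_train]]
        valid += [(user, t) for t in texts[n_train:]]
    return train, valid
```

With a user who wrote four posts, three land in training and one in validation, matching the 75-25 ratio while preserving per-user coverage.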
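The experiment-setup row specifies training for 50 epochs with early stopping on validation loss at a patience of 10. The paper gives only those hyperparameters, so the minimal helper below is a sketch of that stopping rule; everything beyond the patience value (class name, tie handling on equal losses) is an assumption.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without a
    new best validation loss (patience = 10 in the paper's setup)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            # No strict improvement counts against the patience budget.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop this would be called once per epoch after computing validation loss, breaking out of the 50-epoch loop as soon as `step` returns True.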