Persona-aware Generative Model for Code-mixed Language

Authors: Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the personification capabilities of PARADOX, we propose four new metrics: CM BLEU, CM Rouge-1, CM Rouge-L and CM KS. On average, PARADOX achieves 1.6% better CM BLEU, 57% better perplexity and 32% better semantic coherence than the non-persona-based counterparts. The source code is available at: https://github.com/victor7246/PARADOX."
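The reported 57% improvement refers to language-model perplexity, the exponential of the mean per-token negative log-likelihood. A minimal sketch of the standard formula (the token log-probabilities below are illustrative, not from the paper):

```python
import math

def perplexity(token_log_probs):
    """Corpus-level perplexity: exp of the mean negative
    log-likelihood over all predicted tokens (natural log)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A uniform per-token probability of 0.25 yields perplexity 4.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))  # -> 4.0
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why a 57% reduction is a substantial gain.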
Researcher Affiliation | Academia | Ayan Sengupta (Department of Electrical Engineering, Indian Institute of Technology Delhi); Md. Shad Akhtar (Department of Computer Science & Engineering, Indraprastha Institute of Information Technology Delhi); Tanmoy Chakraborty (Department of Electrical Engineering, Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi)
Pseudocode | Yes | "Algorithm 1: Code-Mixed Text Generation with PARADOX"
Open Source Code | Yes | "The source code is available at: https://github.com/victor7246/PARADOX."
Open Datasets | Yes | "Finally, we collect two large-scale longitudinal datasets from Twitter and YouTube, primarily monolingual Hindi and Hindi-English code-mixed texts. The datasets will be valuable for code-mixing research. ... We release our curated datasets to encourage research on personalized code-mixed text generation."
Dataset Splits | Yes | "We use a 75-25 split for training and validation with stratified sampling. Therefore, we can ensure at least one training and validation sample for each user."
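The split described above can be sketched as a per-user stratified partition. This is a generic reconstruction under stated assumptions, not the authors' code: the `"user"` field name and the handling of single-sample users are assumptions, chosen so that every user with at least two samples appears in both splits.

```python
import random

def per_user_split(samples, val_frac=0.25, seed=13):
    """75-25 train/validation split, stratified by user.
    Users with fewer than two samples go to training only,
    since they cannot cover both splits."""
    rng = random.Random(seed)
    by_user = {}
    for s in samples:
        by_user.setdefault(s["user"], []).append(s)

    train, val = [], []
    for user, items in by_user.items():
        if len(items) < 2:
            train.extend(items)
            continue
        rng.shuffle(items)
        # Guarantee at least one sample on each side for this user.
        n_val = min(len(items) - 1, max(1, round(len(items) * val_frac)))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

For example, a user with four samples contributes one to validation and three to training, matching the 75-25 ratio and the per-user coverage guarantee.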
Hardware Specification | Yes | "We use one Tesla P100 and one Tesla V100 GPU to run all our experiments."
Software Dependencies | No | The paper mentions using a pre-trained language model open-sourced with Huggingface and calculating metrics with NLTK, but does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch/TensorFlow).
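When versions are unspecified, a reproduction attempt can at least record what it ran with. A small sketch using the standard-library `importlib.metadata`; the package names listed are the usual PyPI names, assumed here rather than taken from the paper:

```python
from importlib import metadata

def report_versions(packages):
    """Return the installed version of each named package,
    or 'not installed' if it cannot be found."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

print(report_versions(["torch", "transformers", "nltk"]))
```

Logging such a report alongside results makes an otherwise unpinned environment at least auditable after the fact.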
Experiment Setup | Yes | "PARADOX consists of six encoder and decoder layers, with hidden sizes of 768 in all the layers. For multi-headed FAME and masked multi-headed FAME blocks, we use a total of eight heads with dropout probability set to 0.1. The total number of parameters is 296M. We set the persona encoding variational weight λ = 0.5. All the models are trained for 50 epochs with an early stopping condition on validation loss with a patience of 10. We set batch_size = 4 in all experiments during training and validation. We use the Adam optimizer with a learning rate of 4e-4 and β1 = 0.9, β2 = 0.98 for both PARADOX and Transformer. We fine-tune the MuRIL and BLOOMZ models on autoregressive language modeling tasks for 10 epochs with learning rates 3e-5 and 3e-6, respectively."
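The early-stopping criterion in the setup (stop when validation loss has not improved for 10 consecutive epochs, within a 50-epoch budget) can be sketched generically; this is a standard pattern, not the authors' implementation:

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve
    for `patience` consecutive epochs (patience = 10 in the paper)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, `stopper.step(val_loss)` is checked once per epoch; with patience 10, a run can terminate well before the 50-epoch cap if validation loss plateaus early.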