Introducing "Forecast Utterance" for Conversational Data Science
Authors: Md. Mahadi Hassan, Alex Knipper, Shubhra Kanti Karmaker Santu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Specifically, we frame the task as a slot-filling problem, where each slot corresponds to a specific aspect of the goal prediction task. We then employ two methods based on self-supervision with synthetic examples for solving the slot-filling task, namely: 1) Entity Extraction (EE), and 2) Question-Answering (QA) techniques. Our experiments, evaluated with three meticulously crafted data sets, validate the viability of our ambitious goal and demonstrate the effectiveness of both EE and QA techniques in interpreting Forecast Utterances. |
| Researcher Affiliation | Academia | Md Mahadi Hassan (EMAIL), Department of Computer Science and Software Engineering, Auburn University; Alex Knipper (EMAIL), Department of Computer Science and Software Engineering, Auburn University; Shubhra Kanti Karmaker (Santu) (EMAIL), Department of Computer Science and Software Engineering, Auburn University |
| Pseudocode | Yes | Algorithm 1: Forecasting Goal Extraction from User Utterances via Slot-Filling. Algorithm 2: Artificial Training Data Generation through Heuristic for Fine-tuning EE/QA Models for Target Attribute slot. Algorithm 3: Generating K synthetic utterances using T5 model variants based on attributes: The algorithm takes attributes, a T5 model, templates, and K as input, and generates a synthetic dataset containing K balanced utterances from three T5 model variants, each with different levels of template conformity. |
| Open Source Code | Yes | Submission of a zip file containing source code, with specification of all dependencies, including external libraries, or a link to such resources (while still anonymized) Description of computing infrastructure used: A zip file will be submitted in the submission panel along with the manual. |
| Open Datasets | Yes | In the course of our research, we put our system to the test using three publicly available Kaggle datasets: Flight Delay (FD), Online Food Delivery Preferences (OD), and Student Performance (SP) (details in Appendix A.3). FD: https://www.kaggle.com/usdot/flight-delays; OD: https://www.kaggle.com/datasets/benroshan/online-food-delivery-preferencesbangalore-region; SP: https://www.kaggle.com/datasets/larsen0966/student-performance-data-set |
| Dataset Splits | Yes | We split the data into training and test sets at an 8:2 ratio. We created handcrafted validation sets for each of the three datasets by actively engaging human volunteers with solid data science expertise and asking them to create utterances expressing forecasting goals. Each instance consists of a user utterance and associated ground-truth slot-value labels. ... The final test sets contain 344, 170, and 209 sentences for the FD, OD, and SP datasets, respectively. |
| Hardware Specification | Yes | We have used one Nvidia Quadro RTX 5000 GPU and reported the time needed to fine-tune the models in our case study. |
| Software Dependencies | No | The paper states: "Submission of a zip file containing source code, with specification of all dependencies, including external libraries, or a link to such resources (while still anonymized) Description of computing infrastructure used: A zip file will be submitted in the submission panel along with the manual." This indicates that dependencies will be specified in a supplementary file, but the paper itself does not list specific software component names with version numbers in its main text. |
| Experiment Setup | Yes | We performed an exhaustive hyperparameter search using a subset of the artificially generated dataset. We present the search space for our hyperparameter tuning in Table 10 and Table 11, where we varied the learning rate and weight decay. ... In Table 12 and Table 13 we present the final set of hyperparameters selected for the Entity Extraction and Question-Answering tasks, respectively, using the heuristic-based dataset. In Table 14 and Table 15 we present the final set of hyperparameters selected for the Entity Extraction and Question-Answering tasks, respectively, using the T5-based dataset. |
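The Pseudocode row describes generating synthetic training utterances from templates for the slot-filling task (Algorithms 2 and 3). The paper's actual templates and heuristics are not reproduced here; the following is a minimal sketch of the idea, with entirely hypothetical template strings and attribute values:

```python
import random

# Hypothetical templates and target attributes for illustration only;
# the paper's real templates and heuristics live in its Algorithms 2-3.
TEMPLATES = [
    "Can you predict {target} for the next {horizon} days?",
    "I want to forecast {target} over {horizon} days.",
]
TARGETS = ["arrival delay", "order volume", "final grade"]

def generate_utterances(k, seed=0):
    """Generate k synthetic (utterance, slot-labels) pairs from templates.

    Each pair carries ground-truth slot values, mirroring the paper's
    setup where every slot corresponds to one aspect of the goal
    prediction task (e.g., target attribute, forecast horizon).
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(k):
        target = rng.choice(TARGETS)
        horizon = rng.randint(1, 30)
        template = rng.choice(TEMPLATES)
        utterance = template.format(target=target, horizon=horizon)
        labels = {"target_attribute": target, "horizon": str(horizon)}
        examples.append((utterance, labels))
    return examples
```

Pairs like these could then serve as self-supervised fine-tuning data for either the EE or the QA model, with the slot labels providing the answer spans.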
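The Dataset Splits row reports an 8:2 train/test split. A seeded split of that shape can be sketched as follows (the seed value and function name are illustrative assumptions, not taken from the paper):

```python
import random

def train_test_split(items, train_ratio=0.8, seed=42):
    """Shuffle and split items at an 8:2 train/test ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed makes the split reproducible across runs, which matters when comparing the EE and QA variants on identical test instances.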