Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

JAIR 2022

Reproducibility Variables: Result and Supporting LLM Response
Research Type: Experimental. Evidence: "This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems. ... Finally, we give experimental details and extensive ablation studies on the 2020 VizWiz Grand Challenge and the competition results in Section 5."
Researcher Affiliation: Industry. Evidence: "Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff: IBM Research AI, T.J. Watson Research Center, Yorktown Heights, NY, USA. Richard A. Young: IBM Research South Africa, Johannesburg, South Africa. Brian Belgodere: IBM Research, T.J. Watson Research Center, Yorktown Heights, NY, USA."
Pseudocode: No. The paper includes architectural diagrams (Figure 2 and Figure 3) and describes the methodology in prose, but does not contain any structured pseudocode or algorithm blocks.
Open Source Code: Yes. Evidence: "We also visualize the image, caption, objects and the words detected by the OCR to the screen (see the video of the real time demo on https://github.com/IBM/IBMVizWiz)."
Open Datasets: Yes. Evidence: "Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. ... This gap motivated the introduction of the novel VizWiz dataset... We use the VizWiz-Captions dataset for all our experiments."
Dataset Splits: Yes. Evidence: Table 1 (VizWiz-Captions dataset information):
  Training: 23,431 images, 117,155 captions
  Validation: 7,750 images, 38,750 captions
  Testing: 8,000 images, 40,000 captions
Hardware Specification: No. The paper mentions GPUs in the context of the real-time demo pipeline ("sent to the first GPU in the pipeline", "sent to the second GPU") and also "cloud machines", but does not provide specific models or specifications for this hardware.
Software Dependencies: No. The paper mentions using a BERT tokenizer, fastText (Bojanowski et al., 2016), the ADAM optimizer, and flask (Grinberg, 2018), but it does not specify version numbers for these software components or libraries.
Experiment Setup: Yes. Evidence: "CE Training: The CE training is run for 10 epochs, using a batch size of 80. We employ the ADAM optimizer with (β1, β2) = (0.9, 0.98). We warm up the learning rate with a factor of 1 for 2000 iterations (minibatches) and then decay it proportionally to 1/i, where i is the iteration step. ... We used a batch size of 80 images/captions as well as the states learned in the ADAM optimizer during CE training."
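The warmup-then-1/i decay quoted above can be sketched as a small schedule function. This is a minimal sketch only: the paper states a warmup factor of 1 for 2000 iterations followed by 1/i decay, but the exact warmup shape (assumed linear here) and the base learning rate are assumptions, not details given in the quoted setup.

```python
def lr_multiplier(step, warmup=2000, factor=1.0):
    """Learning-rate multiplier for iteration `step` (1-indexed).

    Assumed linear warmup to `factor` over the first `warmup`
    iterations, then decay proportional to 1/step, matching the
    1/i decay described in the quoted CE-training setup.
    """
    step = max(step, 1)  # guard against division by zero at step 0
    return factor * min(step / warmup, warmup / step)
```

In a typical training loop this multiplier would scale a base learning rate each minibatch (e.g. via a per-step scheduler such as PyTorch's LambdaLR), with the ADAM betas set to (0.9, 0.98) as quoted.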