IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Authors: Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We show that training our model with text conditioning and scaling the dataset size improves in-context learning for computer vision tasks by over +10% AP for Foreground Segmentation, over +5% gains in AP for Single Object Detection, and almost 20% lower LPIPS in Colorization. Our empirical results suggest that vision and language prompts are complementary and it is advantageous to use both to achieve better in-context learning performance.
Researcher Affiliation Collaboration Jiarui Xu, UC San Diego; Yossi Gandelsman, UC Berkeley; Amir Bar, UC Berkeley; Jianwei Yang, Microsoft Research; Jianfeng Gao, Microsoft Research; Trevor Darrell, UC Berkeley; Xiaolong Wang, UC San Diego
Pseudocode No The paper describes the model architecture and training process in text and a diagram (Figure 2), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a link to a code repository. The Open Review link is for peer review, not code.
Open Datasets Yes We train an IMProv with a ViT-L backbone on a combination of our CCVF, S2CV dataset and LAION-400M (Schuhmann et al., 2021)... We use the Pascal VOC 2012 dataset (Everingham et al., 2015)... We randomly sampled 1000 example pairs and image query from ImageNet (Russakovsky et al., 2015) validation set... We follow the evaluation protocol of Bar et al. (2022) and test IMProv on four splits of Pascal-5i dataset (Shaban et al., 2017).
Dataset Splits Yes We follow the evaluation protocol of Bar et al. (2022) and test IMProv on four splits of Pascal-5i dataset (Shaban et al., 2017)... We randomly sampled 1000 example pairs and image query from ImageNet (Russakovsky et al., 2015) validation set and converted them to gray-scale to obtain gray-scale and color version for each image.
Hardware Specification Yes We train our models on one machine with 8 A100 GPUs with a batch size of 2048 for 150k iterations.
Software Dependencies No The paper mentions using the AdamW optimizer and pre-trained models like CLIP and VQGAN, but it does not specify version numbers for any core software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA versions.
Experiment Setup Yes We use AdamW (Loshchilov & Hutter, 2017) optimizer with a learning rate of 2e-4 and weight decay of 0.05. We train our models on one machine with 8 A100 GPUs with a batch size of 2048 for 150k iterations. Our learning-rate schedule consists of 2k linear warm-up steps followed by a cosine learning rate decay. During training, we drop the text conditioning with a probability of 0.1. During training, the input image x is split into patches and randomly masked by dropping a fixed percent of the patches (75% in our experiments).
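The quoted setup fully pins down the learning-rate schedule (2k linear warm-up steps, cosine decay, base LR 2e-4, 150k total iterations). A minimal sketch of that schedule, assuming a standard cosine decay to zero after warm-up (the paper does not state the final LR, so decay-to-zero is an assumption):

```python
import math

# Values taken from the quoted experiment setup.
BASE_LR = 2e-4
WARMUP_STEPS = 2_000
TOTAL_STEPS = 150_000

def lr_at(step: int) -> float:
    """Learning rate at a given 0-indexed training step."""
    if step < WARMUP_STEPS:
        # Linear warm-up from ~0 up to BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from BASE_LR toward 0 over the remaining steps
    # (final-LR-of-zero is an assumption, not stated in the paper).
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The LR rises linearly for the first 2k steps, peaks at 2e-4, then follows a half-cosine down toward zero at step 150k.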