Re-Thinking Inverse Graphics With Large Language Models
Authors: Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Fernandez Abrevaya, Michael J. Black
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research. ... To evaluate the ability of our proposed framework to generalize across distribution shifts, we design a number of focused evaluation settings. We conduct experiments on synthetic data in order to quantitatively analyze model capability under controlled shifts. |
| Researcher Affiliation | Academia | Peter Kulits* EMAIL Max Planck Institute for Intelligent Systems, Tübingen, Germany. Haiwen Feng* EMAIL Max Planck Institute for Intelligent Systems, Tübingen, Germany. Weiyang Liu EMAIL Max Planck Institute for Intelligent Systems, Tübingen, Germany, University of Cambridge. Victoria Abrevaya EMAIL Max Planck Institute for Intelligent Systems, Tübingen, Germany. Michael J. Black EMAIL Max Planck Institute for Intelligent Systems, Tübingen, Germany. |
| Pseudocode | No | The paper describes a framework and its components using figures (e.g., Figure 1 and 2) but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research. |
| Open Datasets | Yes | CLEVR (Johnson et al., 2017) is a procedurally generated dataset of simple 3D objects on a plane. ...incorporating objects sourced from ShapeNet (Chang et al., 2015). |
| Dataset Splits | Yes | We train both our proposed framework and NS-VQA, our neural-scene de-rendering baseline, on 4k images from the ID condition and evaluate them on 1k images from both the ID and OOD conditions. ... we create a dataset comprising 10k images... render a training dataset of one-million images. ...render 100k training images and evaluate the framework on three conditions, each with 1k images |
| Hardware Specification | No | The paper mentions using "DeepSpeed ZeRO-2" as a memory optimization technique, but it does not specify any particular hardware components like GPU models, CPU types, or cloud computing instances used for the experiments. |
| Software Dependencies | Yes | We finetune the LLaMA 1-based Vicuna 1.3 model (footnote 2) with LoRA (Hu et al., 2022a). We use the Hugging Face Transformers and PEFT libraries, along with DeepSpeed ZeRO-2 (Rajbhandari et al., 2020). ... The frozen CLIP visual tokenizer from footnote 3. (Footnote 2 points to https://huggingface.co/lmsys/vicuna-7b-v1.3 and Footnote 3 points to https://huggingface.co/openai/clip-vit-large-patch14-336) |
| Experiment Setup | Yes | In all experiments, we use a lora_r of 128, a lora_alpha of 256, a LoRA learning rate of 2e-05, a linear projector learning rate of 2e-05, a numeric head learning rate of 2e-04, and a cosine learning-rate schedule. All models are trained with an effective batch size of 32 with bfloat16 mixed-precision training. Both the cross-entropy next-token-prediction and mean-square-error (MSE) losses are given a weight of 1. The models for the CLEVR and parameter-space generalization experiments are trained for 40k steps. The single-object 6-DoF pose-estimation model is trained for 200k and the scene-level ShapeNet model for 500k steps. |
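The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal, standalone illustration, not the authors' code: the constant names are our own, and the cosine schedule below is a plain cosine decay to zero, assuming no warmup since the excerpt does not report one.

```python
import math

# Hyperparameters as reported in the paper's experiment setup
# (names are illustrative, not taken from the authors' repository).
LORA_R = 128
LORA_ALPHA = 256            # LoRA scaling factor alpha / r = 2.0
LORA_LR = 2e-05
PROJECTOR_LR = 2e-05        # linear projector learning rate
NUMERIC_HEAD_LR = 2e-04
EFFECTIVE_BATCH_SIZE = 32   # with bfloat16 mixed precision
TRAIN_STEPS_CLEVR = 40_000  # CLEVR / parameter-space experiments


def cosine_lr(step: int, base_lr: float, total_steps: int) -> float:
    """Cosine learning-rate schedule decaying from base_lr to 0.

    Warmup is omitted because the excerpt does not specify one.
    """
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With these values the schedule starts at 2e-05, passes through roughly 1e-05 at the midpoint (20k steps), and decays to 0 at 40k steps; in the Hugging Face Transformers stack this would typically be selected via `lr_scheduler_type="cosine"` rather than implemented by hand.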