VLMaterial: Procedural Material Generation with Large Vision-Language Models
Authors: Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, Wojciech Matusik
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation, we show that our method outperforms previous methods on both synthetic and real-world examples. We report quantitative comparison results in Table 1, and show qualitative examples in Figure 3 and Figure 5. Both graph structure augmentation and parameter augmentation are crucial to improving the matching quality on in-distribution and out-of-distribution test sets. We also conduct a user study among professional artists and domain researchers to validate how the generated materials substitute an artist's creation process (see Appendix C for details). |
| Researcher Affiliation | Collaboration | Beichen Li1, Rundi Wu2, Armando Solar-Lezama1, Changxi Zheng2, Liang Shi1, Bernd Bickel3,4, Wojciech Matusik1 — 1MIT CSAIL, 2Columbia University, 3ETH Zürich, 4Google Research |
| Pseudocode | Yes | Algorithm 1 MCMC-based local parameter search. |
| Open Source Code | Yes | Our dataset and code are available at https://github.com/mit-gfx/VLMaterial. |
| Open Datasets | Yes | We contribute the first open-source procedural material dataset in Blender (Blender, 2024a) to promote future research in this area. Our dataset and code are available at https://github.com/mit-gfx/VLMaterial. We first collected 3,663 free Blender materials from three online sources: 1) 2,411 materials in BlenderKit, an online repository of Blender 3D assets; 2) the 60 base materials of Infinigen (Raistrick et al., 2023), a procedural 3D scene generation framework; 3) 1,192 materials from individually published Blender procedural material packs. |
| Dataset Splits | Yes | We evaluate our method on in-distribution and out-of-distribution test images collected from three different sources: 1) 44 synthetic materials from Blender (Blender, 2024a), randomly selected and separated from our training dataset (before augmentation); 2) 64 synthetic materials from Substance (Adobe, 2024), which are randomly sampled from the evaluation set of Hu et al. (2023); 3) 64 real photographs gathered from Shi et al. (2020) and Zhou et al. (2023), captured using smartphone cameras. |
| Hardware Specification | Yes | We use an AdamW optimizer (Loshchilov, 2017) with a 1e-4 learning rate and a cosine annealing schedule. The initial linear warmup period accounts for 3% of training steps. To effectively utilize GPU memory, we apply FlashAttention-2 (Dao, 2023) and train the model in BF16 precision on 8 NVIDIA H100 80GB GPUs using DeepSpeed ZeRO-3 (Rasley et al., 2020; Rajbhandari et al., 2020). |
| Software Dependencies | Yes | Rules: 1. Create no more than 30 nodes. Make sure your code can be correctly executed in Blender 3.3. Refer to the Blender Python API documentation for valid node types and parameters. The trainable modules include the MLP projector and an array of LoRA adapters (Hu et al., 2021) applied to all attention-related linear layers in the LLaMA 3 model (with r = 8, α = 32, and a 0.05 dropout probability), amounting to 40M trainable parameters in total. To effectively utilize GPU memory, we apply FlashAttention-2 (Dao, 2023) and train the model in BF16 precision on 8 NVIDIA H100 80GB GPUs using DeepSpeed ZeRO-3 (Rasley et al., 2020; Rajbhandari et al., 2020). |
| Experiment Setup | Yes | The trainable modules include the MLP projector and an array of LoRA adapters (Hu et al., 2021) applied to all attention-related linear layers in the LLaMA 3 model (with r = 8, α = 32, and a 0.05 dropout probability), amounting to 40M trainable parameters in total. We use an AdamW optimizer (Loshchilov, 2017) with a 1e-4 learning rate and a cosine annealing schedule. The initial linear warmup period accounts for 3% of training steps. We train the model in BF16 precision... Each GPU accommodates a batch size of 4 training samples, resulting in an overall batch size of 32. The fine-tuning process lasts 5 epochs over three days. We run MCMC sampling for each predicted material for N_iters = 200 iterations, accepting a worse sample in each iteration at a small probability p_acc = 0.05. We use N = 50 and K = 20 for the following quantitative and qualitative results. |
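The training schedule quoted above (base learning rate 1e-4, linear warmup over the first 3% of steps, then cosine annealing) can be sketched as a standalone function. This is a minimal illustration of that schedule, not the authors' code; the function name and the choice of annealing to zero are assumptions.

```python
import math

def lr_schedule(step, total_steps, base_lr=1e-4, warmup_frac=0.03):
    """Linear warmup over the first `warmup_frac` of steps,
    then cosine annealing from base_lr down to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For a 1,000-step run this gives a 30-step warmup, a peak of 1e-4 at the end of warmup, and a learning rate that decays smoothly toward zero by the final step.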
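The MCMC-based local parameter search (Algorithm 1 in the paper, with N_iters = 200 and p_acc = 0.05 per the setup above) can be sketched as a simple accept/reject loop: keep a proposed parameter set when it lowers the loss, and accept a worse one with a small fixed probability to escape local minima. This is a generic sketch under those assumptions, not the paper's implementation; the function names, the Gaussian proposal in the usage below, and the loss are all hypothetical placeholders.

```python
import random

def mcmc_parameter_search(init_params, loss_fn, propose_fn,
                          n_iters=200, p_accept_worse=0.05, seed=0):
    """Local search: accept improving proposals; accept worse ones
    with probability p_accept_worse. Returns the best params found."""
    rng = random.Random(seed)
    params = list(init_params)
    cur_loss = loss_fn(params)
    best, best_loss = list(params), cur_loss
    for _ in range(n_iters):
        cand = propose_fn(params, rng)
        cand_loss = loss_fn(cand)
        if cand_loss < cur_loss or rng.random() < p_accept_worse:
            params, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(params), cur_loss
    return best, best_loss

# Hypothetical usage: minimize a quadratic with Gaussian perturbations.
def example_loss(p):
    return sum((x - 1.0) ** 2 for x in p)

def example_propose(p, rng):
    return [x + rng.gauss(0.0, 0.1) for x in p]

best, best_loss = mcmc_parameter_search([0.0, 0.0],
                                        example_loss, example_propose)
```

In the paper this search refines the continuous parameters of a predicted node graph against an image-space loss; here the quadratic merely stands in for that objective.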