VLMaterial: Procedural Material Generation with Large Vision-Language Models
Authors: Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, Wojciech Matusik
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation, we show that our method outperforms previous methods on both synthetic and real-world examples. We report quantitative comparison results in Table 1, and show qualitative examples in Figure 3 and Figure 5. Both graph structure augmentation and parameter augmentation are crucial to improving the matching quality on in-distribution and out-of-distribution test sets. We also conduct a user study among professional artists and domain researchers to validate how the generated materials substitute an artist's creation process (see Appendix C for details). |
| Researcher Affiliation | Collaboration | Beichen Li1, Rundi Wu2, Armando Solar-Lezama1, Changxi Zheng2, Liang Shi1, Bernd Bickel3,4, Wojciech Matusik1 — 1MIT CSAIL, 2Columbia University, 3ETH Zürich, 4Google Research |
| Pseudocode | Yes | Algorithm 1 MCMC-based local parameter search. |
| Open Source Code | Yes | Our dataset and code are available at https://github.com/mit-gfx/VLMaterial. |
| Open Datasets | Yes | We contribute the first open-source procedural material dataset in Blender (Blender, 2024a) to promote future research in this area. Our dataset and code are available at https://github.com/mit-gfx/VLMaterial. We first collected 3,663 free Blender materials from three online sources: 1) 2,411 materials in BlenderKit, an online repository of Blender 3D assets; 2) the 60 base materials of Infinigen (Raistrick et al., 2023), a procedural 3D scene generation framework; 3) 1,192 materials from individually published Blender procedural material packs. |
| Dataset Splits | Yes | We evaluate our method on in-distribution and out-of-distribution test images collected from three different sources: 1) 44 synthetic materials from Blender (Blender, 2024a), randomly selected and separated from our training dataset (before augmentation); 2) 64 synthetic materials from Substance (Adobe, 2024), which are randomly sampled from the evaluation set of Hu et al. (2023); 3) 64 real photographs gathered from Shi et al. (2020) and Zhou et al. (2023), captured using smartphone cameras. |
| Hardware Specification | Yes | We use an AdamW optimizer (Loshchilov, 2017) with a 1e-4 learning rate and a cosine annealing schedule. The initial linear warmup period accounts for 3% of training steps. To effectively utilize GPU memory, we apply FlashAttention-2 (Dao, 2023) and train the model in BF16 precision on 8 NVIDIA H100 80GB GPUs using DeepSpeed ZeRO-3 (Rasley et al., 2020; Rajbhandari et al., 2020). |
| Software Dependencies | Yes | Rules: 1. Create no more than 30 nodes. Make sure your code can be correctly executed in Blender 3.3. Refer to the Blender Python API documentation for valid node types and parameters. The trainable modules include the MLP projector and an array of LoRA adapters (Hu et al., 2021) applied to all attention-related linear layers in the LLaMA 3 model (with r = 8, α = 32, and a 0.05 dropout probability), amounting to 40M trainable parameters in total. To effectively utilize GPU memory, we apply FlashAttention-2 (Dao, 2023) and train the model in BF16 precision on 8 NVIDIA H100 80GB GPUs using DeepSpeed ZeRO-3 (Rasley et al., 2020; Rajbhandari et al., 2020). |
| Experiment Setup | Yes | The trainable modules include the MLP projector and an array of LoRA adapters (Hu et al., 2021) applied to all attention-related linear layers in the LLaMA 3 model (with r = 8, α = 32, and a 0.05 dropout probability), amounting to 40M trainable parameters in total. We use an AdamW optimizer (Loshchilov, 2017) with a 1e-4 learning rate and a cosine annealing schedule. The initial linear warmup period accounts for 3% of training steps. We train the model in BF16 precision... Each GPU accommodates a batch size of 4 training samples, resulting in an overall batch size of 32. The fine-tuning process lasts 5 epochs over three days. We run MCMC sampling for each predicted material for N_iters = 200 iterations, accepting a worse sample in each iteration at a small probability p_acc = 0.05. We use N = 50 and K = 20 for the following quantitative and qualitative results. |
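The training schedule quoted above (base learning rate 1e-4, linear warmup over the first 3% of steps, then cosine annealing) can be sketched as a standalone function. This is a minimal illustration of that schedule, not the authors' code; the function name and the choice of annealing to zero are assumptions.

```python
import math

def lr_schedule(step, total_steps, base_lr=1e-4, warmup_frac=0.03):
    """Linear warmup over the first `warmup_frac` of steps,
    then cosine annealing from base_lr down to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For a 1,000-step run this gives a 30-step warmup, a peak of 1e-4 at the end of warmup, and a learning rate that decays smoothly toward zero by the final step.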
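The MCMC-based local parameter search (Algorithm 1 in the paper, with N_iters = 200 and p_acc = 0.05 per the setup above) can be sketched as a simple accept/reject loop: keep a proposed parameter set when it lowers the loss, and accept a worse one with a small fixed probability to escape local minima. This is a generic sketch under those assumptions, not the paper's implementation; the function names, the Gaussian proposal in the usage below, and the loss are all hypothetical placeholders.

```python
import random

def mcmc_parameter_search(init_params, loss_fn, propose_fn,
                          n_iters=200, p_accept_worse=0.05, seed=0):
    """Local search: accept improving proposals; accept worse ones
    with probability p_accept_worse. Returns the best params found."""
    rng = random.Random(seed)
    params = list(init_params)
    cur_loss = loss_fn(params)
    best, best_loss = list(params), cur_loss
    for _ in range(n_iters):
        cand = propose_fn(params, rng)
        cand_loss = loss_fn(cand)
        if cand_loss < cur_loss or rng.random() < p_accept_worse:
            params, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(params), cur_loss
    return best, best_loss

# Hypothetical usage: minimize a quadratic with Gaussian perturbations.
def example_loss(p):
    return sum((x - 1.0) ** 2 for x in p)

def example_propose(p, rng):
    return [x + rng.gauss(0.0, 0.1) for x in p]

best, best_loss = mcmc_parameter_search([0.0, 0.0],
                                        example_loss, example_propose)
```

In the paper this search refines the continuous parameters of a predicted node graph against an image-space loss; here the quadratic merely stands in for that objective.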