GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
Authors: Jiawei Lu, YingPeng Zhang, Zengjun Zhao, He Wang, Kun Zhou, Tianjia Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted extensive evaluations on a variety of 3D objects. The evidence demonstrates that our approach significantly surpasses the performance of the baseline methods by better preserving the generative potential of the original T2I models in aspects of details and color richness while maintaining multi-view consistency. We conduct comparisons with four available methods for text-to-texture synthesis, including Text2Tex (Chen et al. 2023b), TEXTure (Richardson et al. 2023), SyncMVD (Liu et al. 2023c), and GenesisTex (Gao et al. 2024). We also conducted a user study to analyze the results in three aspects: 1) consistency, 2) diversity, and 3) overall quality. |
| Researcher Affiliation | Collaboration | Jiawei Lu1,2*, Yingpeng Zhang2* , Zengjun Zhao2, He Wang3, Kun Zhou1, Tianjia Shao1 1 State Key Lab of CAD&CG, Zhejiang University 2 Tencent IEG 3 AI Centre, Computer Science, University College London |
| Pseudocode | No | The paper mentions an algorithm in Section 3.3: 'The detailed algorithm could be found in the supplement.' However, no structured pseudocode or algorithm block is present in the main text of the paper. |
| Open Source Code | No | The paper states: 'Importantly, our framework does not require additional training or fine-tuning, making it highly adaptable to a wide range of models available on public platforms.' This indicates the use of existing models but does not provide concrete access information (e.g., a specific repository link or explicit code release statement) for the authors' own implementation of GenesisTex2. |
| Open Datasets | Yes | The dataset for evaluation contains 35 meshes with 63 mesh-prompt pairs. These meshes are collected from publicly open datasets, including Objaverse (Deitke et al. 2023), ShapeNet (Chang et al. 2015), and the Stanford 3D Scanning Repository (Turk and Levoy 1994). |
| Dataset Splits | No | The paper describes the generation of evaluation data from publicly open datasets by rendering depth maps from 12 different viewpoints for 35 meshes. However, it does not specify traditional training/test/validation splits for machine learning models, as their method utilizes pre-trained diffusion models and does not involve training on these specific datasets with such splits. |
| Hardware Specification | Yes | We test our method on an NVIDIA A800 GPU, and the entire process was able to finish in 1 minute. |
| Software Dependencies | No | The paper mentions using 'SDXL (Podell et al. 2023) as our base model and ControlNet-Depth (Zhang, Rao, and Agrawala 2023) trained for SDXL for spatial control'. While these refer to specific models/frameworks, they do not include specific software version numbers (e.g., Python 3.x, PyTorch 1.x) for the overall implementation environment. |
| Experiment Setup | Yes | The CFG scale is set to 12. We linearly interpolate the view-dependent weight ω for the first 8 steps. The parameters γ and ω_min are set to 8 and 1e-3. We replace the self-attention layers in the output layers of SDXL with our proposed 3D-aware local attention mechanism, and we set o, r, δ as 2, 20 and 0.1 to achieve the best performance. |
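The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a hypothetical helper, not the authors' released code: the class and function names (`GenesisTex2Config`, `view_weight`) are assumptions, and since the paper only states that ω is linearly interpolated for the first 8 steps, the interpolation endpoint and direction shown here are assumptions as well.

```python
from dataclasses import dataclass


@dataclass
class GenesisTex2Config:
    """Hyperparameters reported in the paper's experiment setup."""
    cfg_scale: float = 12.0   # classifier-free guidance scale
    interp_steps: int = 8     # steps over which ω is linearly interpolated
    gamma: float = 8.0        # γ
    omega_min: float = 1e-3   # ω_min
    # 3D-aware local attention parameters (o, r, δ)
    o: int = 2
    r: int = 20
    delta: float = 0.1


def view_weight(cfg: GenesisTex2Config, step: int, omega_max: float = 1.0) -> float:
    """Linearly interpolate the view-dependent weight ω from ω_min up to
    omega_max over the first `interp_steps` denoising steps, then hold it
    constant.  (The ramp direction and omega_max value are assumptions;
    the paper states only that ω is interpolated for the first 8 steps.)"""
    if step >= cfg.interp_steps:
        return omega_max
    t = step / cfg.interp_steps
    return cfg.omega_min + t * (omega_max - cfg.omega_min)
```

For example, with the default configuration the weight starts at 1e-3 at step 0 and reaches its maximum at step 8, matching the reported schedule length.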