MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods. |
| Researcher Affiliation | Collaboration | 1S-Lab, Nanyang Technological University 2Shanghai AI Lab 3Fudan University 4Peking University 5University of Chinese Academy of Sciences 6SenseTime Research 7Stepfun 8Westlake University |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. Methodological steps are described in prose and through diagrams. |
| Open Source Code | No | The paper provides a project website URL (https://buaacyw.github.io/mesh-anything/) but no explicit statement about releasing the source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | MeshAnything is trained on a combined dataset of Objaverse (Deitke et al., 2023b) and ShapeNet (Chang et al., 2015), selected for their complementary characteristics. |
| Dataset Splits | Yes | Our final filtered dataset consists of 51k meshes from Objaverse and 5k meshes from ShapeNet. We randomly select 10% of this dataset as the evaluation dataset, with the remaining 90% used as the training set for all our experiments. |
| Hardware Specification | Yes | The VQ-VAE is trained on 8 A100 GPUs for 12 hours, after which we separately finetune the decoder part of the VQ-VAE into a noise-resistant decoder, as detailed in Section 4.2. Following this, the transformer is trained on 8 A100 GPUs for 4 days. |
| Software Dependencies | No | The paper mentions using BERT and OPT-350M as architectures but does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used in the implementation. |
| Experiment Setup | Yes | The training batch size for both the VQ-VAE and the transformer is set to 8 per GPU. The residual vector quantization (Zeghidour et al., 2021) depth is set to 3, with a codebook size of 8,192. We sample 4096 points for each point cloud. During training, we apply on-the-fly scaling, shifting, and rotation augmentations, normalizing each mesh to a unit bounding box from -0.5 to 0.5. We use top-k and top-p values of 50 and 0.95, respectively. |
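The reported split (51k + 5k meshes, 10% held out for evaluation) is straightforward to reproduce in spirit, though the paper does not specify its splitting code or a random seed. A minimal sketch, with `mesh_paths` and the seed as illustrative assumptions:

```python
import random

def split_meshes(mesh_paths, eval_fraction=0.10, seed=42):
    """Randomly hold out a fraction of meshes for evaluation; the rest train.

    The seed and path naming are hypothetical; the paper only states a
    random 10%/90% evaluation/training split.
    """
    paths = list(mesh_paths)
    random.Random(seed).shuffle(paths)
    n_eval = int(len(paths) * eval_fraction)
    return paths[n_eval:], paths[:n_eval]  # (train, eval)

# 56k meshes, approximating 51k Objaverse + 5k ShapeNet after filtering
train, evaluation = split_meshes([f"mesh_{i}.obj" for i in range(56000)])
```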
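The normalization step quoted above (each mesh scaled into a unit bounding box from -0.5 to 0.5) can be sketched as follows; the uniform-scale choice is an assumption, since the paper does not say whether aspect ratio is preserved:

```python
import numpy as np

def normalize_to_unit_box(vertices):
    """Scale and shift mesh vertices into the [-0.5, 0.5] bounding box.

    Uses a single uniform scale (longest axis spans exactly [-0.5, 0.5]),
    which preserves aspect ratio -- an assumption, not stated in the paper.
    """
    v = np.asarray(vertices, dtype=np.float64)
    lo, hi = v.min(axis=0), v.max(axis=0)
    center = (lo + hi) / 2.0
    scale = (hi - lo).max()
    return (v - center) / scale
```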
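The quoted sampling configuration (top-k = 50, top-p = 0.95) is a standard decoding scheme for autoregressive transformers. A self-contained NumPy sketch of the combined filter, independent of any particular framework:

```python
import numpy as np

def top_k_top_p_sample(logits, k=50, p=0.95, rng=None):
    """Sample a token id from logits after top-k, then nucleus (top-p), filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Top-k: keep only the k highest-scoring tokens.
    kth = np.sort(logits)[-k] if k < logits.size else logits.min()
    logits = np.where(logits >= kth, logits, -np.inf)
    # Softmax (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens whose cumulative mass reaches p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return int(rng.choice(probs.size, p=filtered))
```

Frameworks such as Hugging Face Transformers expose the same behavior via `top_k` and `top_p` generation arguments, which is likely how the paper's values were applied.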