MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.
Researcher Affiliation | Collaboration | 1 S-Lab, Nanyang Technological University; 2 Shanghai AI Lab; 3 Fudan University; 4 Peking University; 5 University of Chinese Academy of Sciences; 6 SenseTime Research; 7 Stepfun; 8 Westlake University
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. Methodological steps are described in prose and through diagrams.
Open Source Code | No | The paper provides a project website URL (https://buaacyw.github.io/mesh-anything/) but no explicit statement about releasing the source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | MeshAnything is trained on a combined dataset of Objaverse (Deitke et al., 2023b) and ShapeNet (Chang et al., 2015), selected for their complementary characteristics.
Dataset Splits | Yes | Our final filtered dataset consists of 51k meshes from Objaverse and 5k meshes from ShapeNet. We randomly select 10% of this dataset as the evaluation dataset, with the remaining 90% used as the training set for all our experiments.
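The quoted 90/10 split can be sketched as a simple seeded random partition. This is a minimal illustration, not the paper's code; the function name, seed, and use of mesh IDs are assumptions:

```python
import random

def split_dataset(mesh_ids, eval_fraction=0.1, seed=0):
    """Randomly partition mesh IDs into (train, eval) sets.

    Mirrors the quoted protocol: shuffle once, hold out 10% for
    evaluation, keep the remaining 90% for training.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    ids = list(mesh_ids)
    rng.shuffle(ids)
    n_eval = int(len(ids) * eval_fraction)
    return ids[n_eval:], ids[:n_eval]

# 51k Objaverse + 5k ShapeNet meshes -> 56k total (counts from the quote)
train, eval_set = split_dataset(range(56_000))
```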
Hardware Specification | Yes | The VQ-VAE is trained on 8 A100 GPUs for 12 hours, after which we separately finetune the decoder part of the VQ-VAE into a noise-resistant decoder, as detailed in Section 4.2. Following this, the transformer is trained on 8 A100 GPUs for 4 days.
Software Dependencies | No | The paper mentions using BERT and OPT-350M as architectures but does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used in the implementation.
Experiment Setup | Yes | The training batch size for both the VQ-VAE and the transformer is set to 8 per GPU. The residual vector quantization (Zeghidour et al., 2021) depth is set to 3, with a codebook size of 8,192. We sample 4096 points for each point cloud. During training, we apply on-the-fly scaling, shifting, and rotation augmentations, normalizing each mesh to a unit bounding box from -0.5 to 0.5. We used top-k and top-p values set to 50 and 0.95, respectively.
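The quoted top-k/top-p values (50 and 0.95) describe the standard nucleus-sampling filter applied to the transformer's output logits at decoding time. A minimal sketch of that filtering step, assuming a NumPy implementation of my own (the function name and details are not from the paper):

```python
import numpy as np

def top_k_top_p_filter(logits, k=50, p=0.95):
    """Return a sampling distribution after top-k then top-p filtering.

    Top-k keeps only the k largest logits; top-p (nucleus) then keeps
    the smallest set of surviving tokens whose cumulative probability
    reaches p. The result is renormalized to sum to 1.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # top-k: mask everything below the k-th largest logit
    kth = np.sort(logits)[-k] if k < len(logits) else -np.inf
    masked = np.where(logits >= kth, logits, -np.inf)
    # softmax over the surviving logits (masked entries become 0)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    # top-p: sort descending, keep tokens until cumulative prob >= p
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()
```

With a sharply peaked distribution, the p = 0.95 cutoff can collapse the candidate set to a single token even when k = 50 tokens survive the first filter, which is why the two thresholds are typically used together.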