PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Authors: Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, Jun Zhu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate the proposed PivotMesh on the commonly used benchmark ShapeNet with four selected categories: chair, table, bench, and lamp. Following the previous setting (Siddiqui et al., 2023; Alliegro et al., 2023), we first pretrain our model on the mixture dataset of the four selected categories and then finetune on each category separately. We report the generation results both on the mixed dataset and on each subset in Table 2. Furthermore, we train our model on the larger-scale datasets Objaverse and Objaverse-xl and report the performance in Table 3. Across all these experiments, our method achieves state-of-the-art performance on all evaluation metrics. As shown in Figure 4 and Figure 5, our model generates meshes with the best visual quality and geometric complexity. MeshGPT can produce complete meshes but is confined to simple geometry due to its network capacity and the complexity of the mesh sequence. With the hierarchical autoencoder and pivot vertices guidance, our model can produce compact meshes with sharp details and complex geometry. Besides unconditional generation, we also compare with MeshAnything on point cloud conditioning, as shown in Figure 6. Our model shows the advantages of modeling complex mesh geometry. |
| Researcher Affiliation | Collaboration | Haohan Weng1 Yikai Wang2 Tong Zhang1 C. L. Philip Chen1 Jun Zhu23 1South China University of Technology 2Tsinghua University 3ShengShu |
| Pseudocode | No | The paper describes methods and procedures in paragraph text and figures, but no explicitly labeled pseudocode or algorithm blocks are present. |
| Open Source Code | No | Project Page: https://whaohan.github.io/pivotmesh |
| Open Datasets | Yes | Our model is trained on datasets of various classes and scales, including ShapeNet V2 (Chang et al., 2015), Objaverse (Deitke et al., 2023), and Objaverse-xl (Deitke et al., 2024). |
| Dataset Splits | Yes | For each dataset, we split 1k samples for testing, and leave the rest as the training data. |
| Hardware Specification | Yes | It is trained on a machine with 8 A100-80GB GPUs for around 1 day with a batch size of 64 per GPU. The auto-regressive Transformer has 24 layers with a hidden size of 1024. It is trained on a machine with 8 A100-80GB GPUs for around 3 days with a batch size of 12 per GPU. |
| Software Dependencies | No | The paper mentions the optimizer (AdamW) and techniques (flash attention, fp16 mixed precision) but does not provide specific version numbers for software libraries or frameworks such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the autoencoder, the face encoder has 12 layers with a hidden size of 512, the face decoder has 6 layers with a hidden size of 512, and the vertex decoder has 6 layers with a hidden size of 256. For vector quantization, the number of residual quantizers is r = 2, and the codebook is dynamically updated by exponential moving averaging, with a codebook size of 16384 and a codebook dimension of 256. It is trained on a machine with 8 A100-80GB GPUs for around 1 day with a batch size of 64 per GPU. The auto-regressive Transformer has 24 layers with a hidden size of 1024. It is trained on a machine with 8 A100-80GB GPUs for around 3 days with a batch size of 12 per GPU. The sampling temperature is set to 0.5 to balance quality and diversity. We use AdamW (Loshchilov & Hutter, 2017) as the optimizer with β1 = 0.9 and β2 = 0.99 and a learning rate of 10^-4 for all experiments. In our experiments, the select ratio η_select = 15% and the dropping ratio η_drop = 5%, yielding the final pivot vertex ratio η = 10%. |
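The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration, assuming plain dataclasses; the class and field names (`AutoencoderConfig`, `TransformerConfig`, `final_pivot_ratio`) are hypothetical and not taken from the paper's released code. The pivot ratio computation assumes the final ratio is simply the select ratio minus the drop ratio, which matches the reported 15% − 5% = 10%.

```python
# Hedged sketch of the reported PivotMesh hyperparameters; all names here
# are illustrative, not from the authors' code release.
from dataclasses import dataclass


@dataclass
class AutoencoderConfig:
    face_encoder_layers: int = 12
    face_encoder_dim: int = 512
    face_decoder_layers: int = 6
    face_decoder_dim: int = 512
    vertex_decoder_layers: int = 6
    vertex_decoder_dim: int = 256
    num_residual_quantizers: int = 2   # r = 2
    codebook_size: int = 16384         # updated via exponential moving average
    codebook_dim: int = 256


@dataclass
class TransformerConfig:
    layers: int = 24
    hidden_dim: int = 1024
    sampling_temperature: float = 0.5  # balances quality vs. diversity


# AdamW settings shared across all experiments.
OPTIMIZER = {"betas": (0.9, 0.99), "lr": 1e-4}


def final_pivot_ratio(eta_select: float = 0.15, eta_drop: float = 0.05) -> float:
    """Assumed relation: pivots are selected at eta_select, then eta_drop
    of the vertices are dropped, leaving the final pivot ratio eta."""
    return round(eta_select - eta_drop, 4)


print(final_pivot_ratio())  # 0.1
```

Grouping the reported values this way makes it easy to check them against the paper's Table 2/3 settings when attempting a reproduction.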