EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation
Authors: Jiaxiang Tang, Max Li, Zekun Hao, Xian Liu, Gang Zeng, Ming-Yu Liu, Qinsheng Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior quality, diversity, and generalization capabilities of our model in both point cloud and image-conditioned mesh generation tasks. Section 4 is dedicated to "Experiments", detailing qualitative and quantitative results, user studies, and ablation studies. |
| Researcher Affiliation | Collaboration | State Key Laboratory of General Artificial Intelligence, Peking University; NVIDIA Research. |
| Pseudocode | Yes | A.1.1 MESH TOKENIZER Algorithm 1: Tokenization |
| Open Source Code | No | The paper includes a project page link (https://research.nvidia.com/labs/dir/edgerunner/) but does not explicitly state that the code for the methodology described in this paper is open-sourced or provide a direct repository link for their implementation. It does mention sharing of 'mesh visualization code' by others, but not their own source code. |
| Open Datasets | Yes | For the ArAE model, we use meshes from the Objaverse and Objaverse-XL datasets (Deitke et al., 2023b;a). For the image-conditioned diffusion model, we use meshes from the Objaverse dataset along with rendered images from the G-Objaverse dataset (Qiu et al., 2024). |
| Dataset Splits | No | The paper states that the ArAE training set comprises approximately 112K meshes, with 44K higher-quality Objaverse meshes used for finetuning, and that the image-conditioned diffusion model is trained on approximately 75K 3D meshes. However, it does not provide explicit training/validation/test splits (e.g., percentages or exact counts) for its experimental evaluations, nor does it refer to standard predefined splits. |
| Hardware Specification | Yes | We train the ArAE model on 64 A100 (80GB) GPUs for approximately one week. The batch size is 4 per GPU, leading to an effective batch size of 256... We train the DiT model on 16 A100 (40GB) GPUs for approximately one week. |
| Software Dependencies | No | The paper mentions several components like the AdamW optimizer (Loshchilov & Hutter, 2017), flash-attention (Dao, 2024), CLIP (Ilharco et al., 2021), the DDPM framework (Ho et al., 2020), and the DDIM scheduler (Song et al., 2020). However, it does not provide specific version numbers for any of these software libraries or frameworks (e.g., PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | We train the ArAE model on 64 A100 (80GB) GPUs for approximately one week. The batch size is 4 per GPU, leading to an effective batch size of 256. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a cosine-decayed learning rate that ranges from 5e-5 to 5e-6, a weight decay of 0.1, and betas of (0.9, 0.95). Gradient clipping is applied with a maximum norm of 1.0... We train the DiT model on 16 A100 (40GB) GPUs for approximately one week... The batch size is set to 32 per GPU, resulting in an effective batch size of 512. We use the same optimizer as for the ArAE model. Training employs the min-SNR strategy (Hang et al., 2023) with a weight of 5.0. The DDPM is configured with 1,000 timesteps and utilizes a scaled linear beta schedule. For classifier-free guidance during inference, 10% of the image conditions are randomly set to zero for unconditional training. During inference, we use the DDIM scheduler (Song et al., 2020) with 100 denoising steps and a Classifier-Free Guidance (CFG) scale of 7.5. |
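The hyperparameters quoted in the Experiment Setup row can be made concrete with a minimal sketch. The cosine-decayed learning rate (5e-5 to 5e-6) and the CFG scale of 7.5 come directly from the paper; the exact beta endpoints of the "scaled linear" schedule are an assumption here (the Stable-Diffusion-style convention of linear interpolation in sqrt-beta space), since the paper does not state them:

```python
import math

def cosine_decay_lr(step, total_steps, lr_max=5e-5, lr_min=5e-6):
    """Cosine-decayed learning rate from lr_max down to lr_min, as in the paper."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term

def scaled_linear_betas(num_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """'Scaled linear' beta schedule: linear in sqrt(beta), then squared.
    beta_start/beta_end values are assumed defaults, not stated in the paper."""
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(s + (e - s) * t / (num_timesteps - 1)) ** 2
            for t in range(num_timesteps)]

def cfg_combine(uncond_pred, cond_pred, scale=7.5):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return uncond_pred + scale * (cond_pred - uncond_pred)
```

For example, `cosine_decay_lr(0, 1000)` returns 5e-5 at the start of training and `cosine_decay_lr(1000, 1000)` returns 5e-6 at the end, matching the quoted range.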