Scaling Diffusion Mamba with Bidirectional SSMs for Efficient 3D Shape Generation

Authors: Shentong Mo

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical results on the ShapeNet benchmark demonstrate that DiM-3D achieves state-of-the-art performance in generating high-fidelity and diverse 3D shapes. Additionally, DiM-3D shows superior capabilities in tasks like 3D point cloud completion.
Researcher Affiliation Academia Shentong Mo, Carnegie Mellon University
Pseudocode No The paper describes the architecture and method in detail, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper mentions a third-party package 'thop' and its GitHub link for Gflops calculation, but does not provide any specific link or statement regarding the open-source code for the DiM-3D methodology itself.
Open Datasets Yes We employ the ShapeNet dataset, focusing on the Chair, Airplane, and Car categories, consistent with benchmarks used in previous works. Each 3D shape in our experiments is represented by 2,048 points sampled from a set of 5,000 points, following the preprocessing protocol established in PointFlow (Yang et al. 2019). This normalization is applied globally across the entire dataset to maintain consistency. ... The training was performed on the large-scale ShapeNet-55 (Chang et al. 2015) dataset, which comprises 55 diverse classes encompassing vehicles, furniture, and daily necessities.
Dataset Splits No The paper mentions using the ShapeNet and ShapeNet-55 datasets and describes preprocessing steps like sampling points, but it does not provide specific details on how these datasets were split into training, validation, or test sets for reproduction.
Hardware Specification No The paper discusses Giga Floating Point Operations (Gflops) as a metric for computational efficiency and presents Table 3 comparing Gflops for different voxel sizes, but it does not specify the actual hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies No DiM-3D is implemented in the PyTorch (Paszke et al. 2019) framework. This lists a software framework but does not include a specific version number for PyTorch or any other key libraries.
Experiment Setup Yes We use a voxel input size of 32 × 32 × 32 × 3 and train our models for 10,000 epochs using the Adam optimizer with a learning rate of 1e-4. The training employs a batch size of 128, and we set the total number of diffusion steps, T, to 1,000.
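The reported setup pins down the main hyperparameters, which can be collected into a single configuration for a reproduction attempt. The values below are copied from the excerpt; the linear diffusion noise schedule is an assumption (the common DDPM default), since the excerpt states only the number of steps T, not the schedule DiM-3D uses.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "voxel_size": (32, 32, 32, 3),  # input resolution
    "epochs": 10_000,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 128,
    "diffusion_steps": 1_000,  # T
}

# Assumed linear beta schedule over T steps (standard DDPM choice;
# the paper does not specify the schedule).
betas = np.linspace(1e-4, 0.02, CONFIG["diffusion_steps"])
alphas_cumprod = np.cumprod(1.0 - betas)  # signal retention per step
```

Everything needed to instantiate the optimizer and the diffusion loop is stated except the noise schedule and any learning-rate decay, which a reproducer would have to guess or obtain from the authors.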