Memory Efficient Matting with Adaptive Token Routing
Authors: Yiheng Lin, Yihan Hu, Chenyi Zhang, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Yunchao Wei
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MEMatte outperforms existing methods on both high-resolution and real-world datasets, reducing memory usage by approximately 88% and latency by 50% on the Composition-1K benchmark. Quantitative results on the synthetic dataset are shown in Table 2; an ablation study compares different token compression methods. |
| Researcher Affiliation | Collaboration | 1 Institute of Information Science, Beijing Jiaotong University; 2 Visual Intelligence + X International Joint Laboratory of the Ministry of Education; 3 Pengcheng Laboratory, Shenzhen, China; 4 MT Lab, Meitu Inc |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but no explicit pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | Code: https://github.com/linyiheng123/MEMatte |
| Open Datasets | Yes | As illustrated in Table 1, we select several widely used image matting datasets for comparison, including DIM (Xu et al. 2017), Distinctions-646 (Qiao et al. 2020), AIM-500 (Li, Zhang, and Tao 2021), PPM-100 (Ke et al. 2022), and Transparent-460 (Cai et al. 2022). DIM and Distinctions-646 composite foreground images with background images from the COCO (Lin et al. 2014) and VOC2012 (Everingham et al. 2010) datasets. |
| Dataset Splits | Yes | We divide these objects into 355 for training and 40 for testing, producing 35,500 training images and 1,000 test images following the rules in DIM. Following other matting methods, we use Composition-1K to represent the DIM test set. |
| Hardware Specification | Yes | All experiments are performed on the RTX 3090. MEMatte can process over-4K-resolution images on commonly used consumer-level GPUs, such as the GTX 1060, and 8K-resolution images on the RTX 3090 GPU. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers, such as programming languages, libraries, or solvers. |
| Experiment Setup | Yes | To reduce the number of tokens routed to the global attention branch, we introduce a target compression degree ρ ∈ [0, 1] to constrain the value of γ. Here, ρ is a predefined hyperparameter. Consequently, we select ρ = 0.25 as the default setting during training for a better trade-off between performance and efficiency. L_total = L_matting + L_distill + L_compress. |
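The training objective quoted above sums a matting loss, a distillation loss, and a compression loss that constrains the routed-token ratio γ toward the target ρ. The sketch below illustrates one plausible way such a combined objective could be wired up; the squared-deviation form of `compression_loss` and all function names are assumptions for illustration, not the paper's exact formulation.

```python
def compression_loss(gamma_ratios, rho=0.25):
    # Hypothetical L_compress: penalize the squared deviation of the mean
    # fraction of tokens routed to the global attention branch (gamma)
    # from the target compression degree rho. The exact functional form
    # in the paper is not reproduced here; this is an assumed stand-in.
    mean_gamma = sum(gamma_ratios) / len(gamma_ratios)
    return (mean_gamma - rho) ** 2

def total_loss(l_matting, l_distill, gamma_ratios, rho=0.25):
    # L_total = L_matting + L_distill + L_compress (unit weights, as in
    # the quoted equation).
    return l_matting + l_distill + compression_loss(gamma_ratios, rho)

# Example: a batch of 16 layers each routing 30% of tokens globally,
# with placeholder matting/distillation loss values.
loss = total_loss(1.2, 0.4, [0.3] * 16, rho=0.25)
```

With ρ = 0.25 and a realized ratio of 0.30, the compression term contributes only (0.05)² = 0.0025, so the matting and distillation terms dominate early in training while γ is still gently pulled toward the target.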