Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception

Authors: Xiang Zhang, Yufei Cui, Chenchen Fu, Zihao Wang, Yuyang Sun, Xue Liu, Weiwei Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed model outperforms existing state-of-the-art methods, even in single-frame detection scenarios, by leveraging a transformer-based methodology. It demonstrates robust performance across a range of devices, from the powerful V100 to the modest 2080Ti, achieving the highest level of perceptual accuracy on all platforms. The experimental results emphasize the system's adaptability and its potential to significantly improve the safety and reliability of many real-world systems, such as autonomous driving. In our experiment, we trained and tested our method on Argoverse-HD... We verify the effectiveness of four proposed components: RTPE, TAT, Planner, and Buffer. For RTPE, we remove it from our model and observe a 0.4% decrease in the test's AP. The results of the ablation study are listed in Table 4.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Southeast University; 2 School of Computer Science, McGill University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology and algorithms in textual form and through diagrams (e.g., Figure 3, Figure 4) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/DecAngel/Transtreaming
Open Datasets | Yes | Dataset: In our experiment, we trained and tested our method on Argoverse-HD, a common urban driving dataset composed of the front camera video sequence and bounding-box annotations for common road objects (e.g. cars, pedestrians, traffic lights).
Dataset Splits | Yes | We follow the train and validation split as in (He et al. 2023).
Hardware Specification | Yes | The proposed model demonstrates robust performance across a spectrum of devices, ranging from the high-performance V100 to the modest 2080Ti. As shown in Figure 1, ... on a server equipped with an NVIDIA GeForce RTX 4080. ... using a single Nvidia RTX4080 GPU ... Table 5: The hardware specifications of the four devices used in the experiment.
Device Name | CPU | GPU | Memory
2080Ti server | Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz x48 | NVIDIA GeForce RTX 2080 Ti x4 | 257547 MB
3090 server | Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz x36 | NVIDIA GeForce RTX 3090 x2 | 128527 MB
4080 server | Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz x20 | NVIDIA GeForce RTX 4080 x2 | 257420 MB
v100 cluster | Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz x24 | Tesla V100-SXM2-32GB x8 | 256235 MB
Software Dependencies | No | The paper mentions using specific models and methodologies that imply underlying software frameworks (e.g., Transformer, deep learning models), but it does not specify any particular software versions, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | The base backbone of our proposed model is pretrained on the COCO dataset, consistent with the approach of (He et al. 2023). Other parameters are initialized following LeCun weight initialization. The model is then fine-tuned on the Argoverse-HD dataset for 8 epochs using a single Nvidia RTX4080 GPU with a batch size of 4 and half-resolution input (600 × 960). ...we adopt a mixed speed training scheme that samples PP from [-24, -1] and PF from [1, 16]. The loss for each predicted future object is given equal weights.
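The mixed speed training scheme quoted above can be sketched as follows. This is a minimal illustration, assuming PP and PF denote past- and future-frame offsets drawn uniformly per training step; the uniform-integer sampling, the sign convention, and the function name `sample_speed_offsets` are assumptions for illustration, not details stated in the excerpt.

```python
import random

def sample_speed_offsets(pp_range=(-24, -1), pf_range=(1, 16)):
    """Sample one (past, future) frame-offset pair for mixed speed training.

    Hypothetical sketch: ranges follow the quoted setup; uniform integer
    sampling is an assumption, as the excerpt does not name a distribution.
    """
    pp = random.randint(*pp_range)  # past-frame offset (negative = frames back)
    pf = random.randint(*pf_range)  # future-frame offset (frames ahead)
    return pp, pf

# Example: draw one offset pair per sample in a batch of 4
offsets = [sample_speed_offsets() for _ in range(4)]
```

Varying the sampled offsets per step exposes the model to many effective stream speeds, which is what lets one trained model adapt to hardware as different as a 2080Ti and a V100.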