Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset
Authors: Sithu Aung, Min-Cheol Sagong, Junghyun Cho
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through in-depth analysis, we identify and evaluate the key elements of our proposed model, highlighting their specific contributions and importance. Our experiments cover conventional evaluations on the same scene and also address the challenging task of synthetic-to-real transfer with the Wildtrack dataset (Chavdarova et al. 2018), for which we have generated the ground-truth segmentation data. The results underscore the superiority of our approach over previous multi-view detection methods, particularly its strength in synthetic-to-real evaluation, where existing methods falter in transferring knowledge across disparate scenes. |
| Researcher Affiliation | Academia | Sithu Aung1, Min-Cheol Sagong1, Junghyun Cho1,2,3 (1Korea Institute of Science and Technology; 2AI-Robotics, KIST School, University of Science and Technology; 3Yonsei-KIST Convergence Research Institute, Yonsei University). EMAIL |
| Pseudocode | No | The paper describes methods and mathematical formulations, such as those for the view transformer and pedestrian instance grouping, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a "Project page https://sithu31296.github.io/mvpocc", which is a project demonstration page and not explicitly stated as a direct code repository for the methodology described in the paper. |
| Open Datasets | Yes | Hence, we propose a novel synthetic Multi-View Pedestrian Occupancy dataset, MVP-Occ, comprising five large-scale scenes designed to mimic real-world environments. In our dataset, the entire scene is represented by voxels, and each voxel is annotated with one of five classes, indicating whether it belongs to a pedestrian, the background environment, or is empty. Furthermore, we present MVP-Occ, our new synthetic dataset, and describe how we added new labels to the existing real-world Wildtrack dataset (Chavdarova et al. 2018) for evaluation purposes. Wildtrack (Chavdarova et al. 2018) is a real-world dataset captured using seven cameras with significantly overlapping fields of view. |
| Dataset Splits | Yes | Each scene in our dataset is divided into training and testing sets, allocating 80% of the initial frames to the training split and reserving the remaining 20% for testing purposes, while Wild Track uses a 90/10 split. |
| Hardware Specification | Yes | The experiments were performed using four NVIDIA A100 GPUs, with a batch size of 4. |
| Software Dependencies | No | The paper mentions "ResNet-18 is used as the backbone network" and "CARLA simulator (Dosovitskiy et al. 2017)" for dataset generation, but it does not specify software dependencies with version numbers for the model's implementation, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | τ is set to 0.5 to filter out low-confidence detections in 2D occupancy prediction. The optimization process employs the AdamW optimizer with an initial learning rate of $1 \times 10^{-3}$ and a decay rate of $1 \times 10^{-2}$. Training is conducted over 5 epochs, with a cosine learning rate scheduler dynamically adjusting the learning rate. The experiments were performed using four NVIDIA A100 GPUs, with a batch size of 4. $\mathcal{L}_{3D} = \lambda_{wce}\,\mathcal{L}_{wce} + \lambda_{lovasz}\,\mathcal{L}_{lovasz} + \lambda_{affinity}\,\mathcal{L}_{affinity}$ (7), where $\lambda_{wce} = 0.4$, $\lambda_{lovasz} = 0.3$, and $\lambda_{affinity} = 0.3$ are hyperparameters that balance the loss components. $\mathcal{L} = (1 - \lambda)\,\mathcal{L}_{3D} + \lambda\,\mathcal{L}_{2D}$ (8), where $\lambda = 0.3$ is a weighting coefficient that prioritizes $\mathcal{L}_{3D}$ due to its higher complexity and difficulty in optimization. |
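The loss weighting reported in the Experiment Setup row (Eqs. 7 and 8) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name and arguments are hypothetical, and it assumes the four individual loss terms have already been computed as scalars.

```python
def combined_loss(l_wce, l_lovasz, l_affinity, l_2d,
                  lam_wce=0.4, lam_lovasz=0.3, lam_affinity=0.3, lam=0.3):
    """Combine the 3D segmentation losses (Eq. 7) and mix in the 2D loss (Eq. 8).

    Default weights follow the paper: lambda_wce = 0.4, lambda_lovasz = 0.3,
    lambda_affinity = 0.3, and lambda = 0.3, which prioritizes the 3D term.
    """
    # Eq. 7: weighted sum of the three 3D loss components.
    l_3d = lam_wce * l_wce + lam_lovasz * l_lovasz + lam_affinity * l_affinity
    # Eq. 8: convex combination of the 3D and 2D losses.
    return (1 - lam) * l_3d + lam * l_2d
```

With all four loss terms equal to 1.0, the default weights sum to 1 in each equation, so the combined loss is also 1.0, which is a quick sanity check on the weighting.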
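The 80/20 temporal split described in the Dataset Splits row (the first 80% of each scene's frames for training, the remainder for testing) can be sketched as a simple cut on the frame order. The helper below is hypothetical and assumes frames are already sorted chronologically.

```python
def temporal_split(frames, train_ratio=0.8):
    """Split an ordered frame list: the initial train_ratio fraction goes to
    training, the remaining frames to testing (no shuffling, preserving time
    order as described in the paper's split protocol)."""
    cut = int(len(frames) * train_ratio)
    return frames[:cut], frames[cut:]
```

For the Wildtrack evaluation the paper uses a 90/10 split instead, which corresponds to calling the same helper with `train_ratio=0.9`.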