Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset
Authors: Sithu Aung, Min-Cheol Sagong, Junghyun Cho
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through in-depth analysis, we identify and evaluate the key elements of our proposed model, highlighting their specific contributions and importance. Our experiments cover conventional evaluations on the same scene and also address the challenging task of synthetic-to-real transfer with the Wildtrack dataset (Chavdarova et al. 2018), for which we have generated the ground-truth segmentation data. The results underscore the superiority of our approach over previous multi-view detection methods, particularly its strength in synthetic-to-real evaluation, where existing methods falter in transferring knowledge across disparate scenes. |
| Researcher Affiliation | Academia | Sithu Aung1, Min-Cheol Sagong1, Junghyun Cho1,2,3 (1Korea Institute of Science and Technology; 2AI-Robotics, KIST School, University of Science and Technology; 3Yonsei-KIST Convergence Research Institute, Yonsei University). EMAIL |
| Pseudocode | No | The paper describes methods and mathematical formulations, such as those for the view transformer and pedestrian instance grouping, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a "Project page https://sithu31296.github.io/mvpocc", which is a project demonstration page and not explicitly stated as a direct code repository for the methodology described in the paper. |
| Open Datasets | Yes | Hence, we propose a novel synthetic Multi-View Pedestrian Occupancy dataset, MVP-Occ, comprising five large-scale scenes designed to mimic real-world environments. In our dataset, the entire scene is represented by voxels, and each voxel is annotated with one of five classes, indicating whether it belongs to a pedestrian, the background environment, or is empty. Furthermore, we present MVP-Occ, our new synthetic dataset, and describe how we added new labels to the existing real-world Wildtrack dataset (Chavdarova et al. 2018) for evaluation purposes. Wildtrack (Chavdarova et al. 2018) is a real-world dataset captured using seven cameras with significantly overlapping fields of view. |
| Dataset Splits | Yes | Each scene in our dataset is divided into training and testing sets, allocating 80% of the initial frames to the training split and reserving the remaining 20% for testing purposes, while Wild Track uses a 90/10 split. |
| Hardware Specification | Yes | The experiments were performed using four NVIDIA A100 GPUs, with a batch size of 4. |
| Software Dependencies | No | The paper mentions "ResNet-18 is used as the backbone network" and "CARLA simulator (Dosovitskiy et al. 2017)" for dataset generation, but it does not specify software dependencies with version numbers for the model's implementation, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | τ is set to 0.5 to filter out low-confidence detections in 2D occupancy prediction. The optimization process employs the AdamW optimizer with an initial learning rate of $1 \times 10^{-3}$ and a decay rate of $1 \times 10^{-2}$. Training is conducted over 5 epochs, with a cosine learning rate scheduler dynamically adjusting the learning rate. The experiments were performed using four NVIDIA A100 GPUs, with a batch size of 4. $\mathcal{L}_{3D} = \lambda_{wce}\,\mathcal{L}_{wce} + \lambda_{lovasz}\,\mathcal{L}_{lovasz} + \lambda_{affinity}\,\mathcal{L}_{affinity}$ (7), where $\lambda_{wce} = 0.4$, $\lambda_{lovasz} = 0.3$, and $\lambda_{affinity} = 0.3$ are hyperparameters that balance the loss components. $\mathcal{L} = (1 - \lambda)\,\mathcal{L}_{3D} + \lambda\,\mathcal{L}_{2D}$ (8), where $\lambda = 0.3$ is a weighting coefficient that prioritizes $\mathcal{L}_{3D}$ due to its higher complexity and difficulty in optimization. |
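The loss weighting reported in the Experiment Setup row (Eqs. 7 and 8) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name and arguments are hypothetical, and it assumes the four individual loss terms have already been computed as scalars.

```python
def combined_loss(l_wce, l_lovasz, l_affinity, l_2d,
                  lam_wce=0.4, lam_lovasz=0.3, lam_affinity=0.3, lam=0.3):
    """Combine the 3D segmentation losses (Eq. 7) and mix in the 2D loss (Eq. 8).

    Default weights follow the paper: lambda_wce = 0.4, lambda_lovasz = 0.3,
    lambda_affinity = 0.3, and lambda = 0.3, which prioritizes the 3D term.
    """
    # Eq. 7: weighted sum of the three 3D loss components.
    l_3d = lam_wce * l_wce + lam_lovasz * l_lovasz + lam_affinity * l_affinity
    # Eq. 8: convex combination of the 3D and 2D losses.
    return (1 - lam) * l_3d + lam * l_2d
```

With all four loss terms equal to 1.0, the default weights sum to 1 in each equation, so the combined loss is also 1.0, which is a quick sanity check on the weighting.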
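The 80/20 temporal split described in the Dataset Splits row (the first 80% of each scene's frames for training, the remainder for testing) can be sketched as a simple cut on the frame order. The helper below is hypothetical and assumes frames are already sorted chronologically.

```python
def temporal_split(frames, train_ratio=0.8):
    """Split an ordered frame list: the initial train_ratio fraction goes to
    training, the remaining frames to testing (no shuffling, preserving time
    order as described in the paper's split protocol)."""
    cut = int(len(frames) * train_ratio)
    return frames[:cut], frames[cut:]
```

For the Wildtrack evaluation the paper uses a 90/10 split instead, which corresponds to calling the same helper with `train_ratio=0.9`.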