Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning 3D Perception from Others' Predictions
Authors: Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Weinberger, Bharath Hariharan, Wei-Lun Chao
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. |
| Researcher Affiliation | Academia | Jinsu Yoo¹, Zhenyang Feng¹, Tai-Yu Pan¹, Yihong Sun², Cheng Perng Phoo², Xiangyu Chen², Mark Campbell², Kilian Q. Weinberger², Bharath Hariharan², Wei-Lun Chao¹ (¹The Ohio State University, ²Cornell University) |
| Pseudocode | No | The paper describes the overall pipeline in Section 3.5 using numbered steps but does not present it as formal pseudocode or an algorithm block. |
| Open Source Code | No | We plan to make our implementation publicly available to promote reproducibility. |
| Open Datasets | Yes | We conduct experiments primarily on the V2V4Real dataset (Xu et al., 2023), which consists of 40 clips with a total of 18k frames collected by driving two cars, a Tesla and a Honda, together within 100m. (Please see additional results on the OPV2V dataset (Xu et al., 2022c) in the supplementary.) |
| Dataset Splits | Yes | To align with our research purpose, we re-split the original data into three portions: "R pretraining", "R prediction/E training", and "E validation/test" (Fig. 8). Specifically, we split them into two subsets containing 20 clips each and use the first subset to pre-train R's detector f_R. Then we run inference on the second subset to provide pseudo labels Y_R for training E's detector f_E together with E's point clouds. We validate and test E's performance on the first subset by splitting it into 20% and 80%. Our re-split gives 4,488 frames for R pretraining, 4,463 frames for E training, and 870 and 3,618 frames for E validation/test, respectively. (See the frame-count sketch after the table.) |
| Hardware Specification | Yes | We train it for 60 epochs with a batch size of 64 on 8 NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | We conduct experiments with PointPillars (Lang et al., 2019) as the default detector. We adopt a neural network similar to PointNet (Qi et al., 2017) for the ranker for its simplicity. |
| Experiment Setup | Yes | We train it for 60 epochs with a batch size of 64 on 8 NVIDIA Tesla P100 GPUs. We use the Adam optimizer with an initial learning rate of 2e-3, decayed to 2e-5 by a cosine annealing schedule (Loshchilov & Hutter, 2017). For curriculum learning, we set T_E-R to 40m. Also, we set the ranker threshold to 0.5, and λ for the distance-based threshold to 1 with a fixed confidence threshold T_c of 0.2. (A hedged training-configuration sketch follows the table.) |
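As a quick check on the Dataset Splits row, the reported frame counts are internally consistent: the 870/3,618 validation/test division is roughly a 20%/80% split of the first 4,488-frame subset. A minimal sketch of that arithmetic follows; the variable names are ours, not the paper's.

```python
# Reported V2V4Real re-split (40 clips, ~18k frames total).
subset1_frames = 4488  # R pretraining; reused for E validation/test
subset2_frames = 4463  # R prediction / E training (pseudo-labeled)

e_val_frames, e_test_frames = 870, 3618  # ~20% / ~80% of subset1

assert e_val_frames + e_test_frames == subset1_frames
print(f"validation fraction: {e_val_frames / subset1_frames:.1%}")  # ~19.4%
```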
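The Experiment Setup row maps onto a standard PyTorch optimizer configuration. Below is a minimal, hedged sketch: the paper reports only Adam, an initial learning rate of 2e-3 annealed to 2e-5 by cosine decay, 60 epochs, and a batch size of 64; the placeholder model and the per-epoch annealing period (`T_max`) are assumptions, not the authors' implementation.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder model standing in for the PointPillars detector (assumption).
model = torch.nn.Linear(64, 7)

EPOCHS = 60       # reported in the paper
BATCH_SIZE = 64   # reported in the paper

optimizer = Adam(model.parameters(), lr=2e-3)  # initial LR 2e-3
scheduler = CosineAnnealingLR(
    optimizer,
    T_max=EPOCHS,   # anneal over the full training run (assumption)
    eta_min=2e-5,   # reported learning-rate floor
)

for epoch in range(EPOCHS):
    # ... forward/backward passes over batches of size BATCH_SIZE go here ...
    optimizer.step()   # placeholder step so the sketch runs end to end
    scheduler.step()   # decay the learning rate once per epoch
```

With `T_max=EPOCHS` and per-epoch stepping, the learning rate follows a single cosine arc from 2e-3 at the start of training down to 2e-5 at epoch 60, matching the quoted schedule.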