ITSC 2025 Paper Abstract

Paper VP-VP.63

Sun, Han (Nanjing University of Science and Technology), Song, Zhenbo (Nanjing University of Science and Technology), Lin, Xiao (Nanjing University of Science and Technology), Lu, Jianfeng (Nanjing University of Science and Technology)

MonoSC: Enhancing Monocular 3D Object Detection by 2D Segmentation and Completion

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00−18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on April 2, 2026

Keywords Real-time Object Detection and Tracking for Dynamic Traffic Environments, Deep Learning for Scene Understanding and Semantic Segmentation in Autonomous Vehicles

Abstract

Monocular 3D object detection is a fundamental problem in autonomous driving, yet it remains challenging because of the inherent depth ambiguity of monocular imagery, especially when objects are occluded or distant. Although current 2D object detectors provide accurate bounding boxes and class labels, these outputs remain underexploited in the 3D detection task. To address this gap, we propose MonoSC, a feature augmentation framework that systematically exploits 2D detection priors through advanced vision models. The framework comprises three core modules: (1) a SAM-driven instance segmentation module for precise object boundary delineation, (2) a generative adversarial completion network that reconstructs occluded or corrupted object regions, and (3) an object-of-interest detection module optimized for enhanced feature representation. The three modules are applied in sequence to form the full pipeline. Our contributions are threefold: a pioneering application of the Segment Anything Model (SAM) to monocular 3D detection, a car instance dataset named KITTI-Seg-Car, and a feature refinement pipeline that combines instance segmentation with generative completion. Together, these contributions improve the robustness and accuracy of monocular 3D detection in challenging scenarios. Extensive experiments on the KITTI dataset show that our method improves detection performance and achieves state-of-the-art results on certain metrics. Our code and dataset will be made publicly available soon.
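The sequential flow described in the abstract (segment an object inside a 2D detection box, then complete its occluded region before further processing) can be illustrated with a toy sketch. This is not the MonoSC implementation, which has not been released at the time of writing: the SAM segmenter is replaced here by a simple threshold, the generative adversarial completion network by a mean fill, and all names (`segment_in_box`, `complete_object`, `refine_object_crop`) are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Mask = List[List[bool]]          # same shape as the image
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def segment_in_box(image: Image, box: Box, threshold: float) -> Mask:
    """Toy stand-in for a box-prompted segmenter (assumption, not SAM).

    A pixel belongs to the object if it lies inside the 2D box and its
    value exceeds the threshold.
    """
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 and value > threshold
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


def complete_object(image: Image, box: Box, mask: Mask) -> Image:
    """Toy stand-in for generative completion (assumption, not a GAN).

    Pixels inside the box that are not part of the mask are treated as
    occluded and filled with the mean of the visible object pixels.
    Returns the completed crop of the box region.
    """
    x0, y0, x1, y1 = box
    visible = [image[y][x]
               for y in range(y0, y1) for x in range(x0, x1) if mask[y][x]]
    if not visible:
        # Nothing was segmented, so return the raw crop unchanged.
        return [row[x0:x1] for row in image[y0:y1]]
    fill = sum(visible) / len(visible)
    return [
        [image[y][x] if mask[y][x] else fill for x in range(x0, x1)]
        for y in range(y0, y1)
    ]


def refine_object_crop(image: Image, box: Box, threshold: float = 1.0) -> Image:
    """Run the two stand-in stages in sequence: segment, then complete."""
    mask = segment_in_box(image, box, threshold)
    return complete_object(image, box, mask)


if __name__ == "__main__":
    image = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 2.0, 0.0, 0.0],  # pixel (x=2, y=1) plays the occluded part
        [0.0, 4.0, 6.0, 0.0],
    ]
    print(refine_object_crop(image, box=(1, 1, 3, 3)))
```

In a real system the refined crop would then be passed to the downstream detection module; the sketch stops at the crop because the abstract does not specify how that module consumes it.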