Paper ThAT5.3
Bauer, Adrian (University of Wuppertal), Krabbe, Jan-Christoph (University of Wuppertal), Kummert, Anton (University of Wuppertal)
ChangeSAM: Adapting the Segment Anything Model to Street Scene Image Change Detection
Scheduled for presentation during the Regular Session "Sensing, Vision, and Perception III" (ThAT5), Thursday, September 26, 2024, 11:10–11:30, Salon 13
2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada
This information is tentative and subject to change. Compiled on October 8, 2024
Keywords: Sensing, Vision, and Perception; Automated Vehicle Operation, Motion Planning, Navigation; Data Management and Geographic Information Systems
Abstract
In the dynamic field of machine learning, foundation models have recently gained prominence, particularly for their application in natural language processing and computer vision. The foundational Segment Anything Model (SAM), known for its interactive image segmentation via prompts, serves as the basis for this study. We introduce ChangeSAM, a tailored adaptation of SAM for street scene image change detection (CD). ChangeSAM utilizes the versatility and vast knowledge of SAM, adapting it to effectively identify semantic changes in image pairs. Two architectural adaptations are introduced – Pre Decoder Fusion (PreDF) and Post Decoder Fusion (PostDF) – enabling ChangeSAM to process dual images. Enhancements through Prompt Tuning and Low-Rank Adaptation (LoRA) are integrated, achieving a balance between reusability, computational efficiency, and accuracy. Our evaluation on the VL-CMU-CD dataset shows that with minimal parameter adjustments, ChangeSAM achieves accuracy on par with fully fine-tuned models. This work contributes to the ongoing development of foundation models in practical applications, illustrating the viability and potential of adaptable, efficient models in scenarios with limited computational resources, such as intelligent vehicles.
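To illustrate why Low-Rank Adaptation needs only "minimal parameter adjustments", the following dependency-free sketch shows the generic LoRA formulation: a frozen weight W is augmented with a trainable update (alpha / r) * B A, where B starts at zero so the adapted layer initially matches the pretrained one. It is not the ChangeSAM implementation; the abstract does not say which SAM layers are adapted or which rank is used, so the class name `LoRALinear`, the layer size, and the rank below are assumptions chosen purely for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen W (d_out x d_in) plus (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # pretrained weight, kept frozen
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A gets a small random init; B is zero, so the update starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A) as a d_out x d_in matrix."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Number of entries in A and B, the only trained parameters."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)

    def frozen_params(self):
        """Number of entries in the frozen weight W."""
        return sum(len(r) for r in self.weight)


if __name__ == "__main__":
    d = 64  # assumed layer width, for illustration only
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(identity, rank=4, alpha=4.0)
    print(layer.trainable_params(), layer.frozen_params())  # 512 4096
```

In this toy setting the trainable factors hold 512 values against 4096 frozen ones. The ratio shrinks further as the layer width grows, because the trainable count scales with r * (d_in + d_out) rather than d_in * d_out.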