ITSC 2024 Paper Abstract

Paper ThBT13.6

Le Roux, François (Sirris), Cabral, Henrique (Sirris), Yarroudh, Anass (University of Liège), Nlemba, Laurent (GIM Wallonie), Campling, Matthias (KU Leuven), Tsiporkova, Elena (EluciDATA Lab of Sirris)

Object Localization and Tracking Pipeline for the Realistic Rendering of Railway Environments

Scheduled for presentation during the Poster Session "Railway systems and applications" (ThBT13), Thursday, September 26, 2024, 14:30–16:30, Foyer

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24–27, 2024, Edmonton, Canada

This information is tentative and subject to change. Compiled on December 26, 2024

Keywords: Simulation and Modeling, Off-line and Online Data Processing Techniques

Abstract

There is a growing demand for the virtualization of real-world scenes to become more realistic, driven by improvements in data acquisition systems and by the availability of analytical tools and computational power to process the resulting data. Despite the growing number of studies tackling specific steps in this process, there is still a lack of solutions that leverage recent advances in computer vision to facilitate the digital reconstruction of a real-world environment. Such an approach has obvious advantages, e.g., it can operate on simple video data and does not require an expensive data acquisition system such as LiDAR. However, it comes with several challenges, including the need to individualize and categorize objects in an image, estimate their position with respect to the camera, and deal with artifacts. To tackle these challenges, we conceived an innovative workflow for extracting relevant elements from a video scene and positioning them in real-world coordinates, in our case from video captured from the front of a moving train. Starting from a semantic segmentation task, our method leverages the object segmentation model SegmentAnything to assign a unique identifier to each instance of each class; these instances are subsequently tracked along the video sequence using a video segmentation algorithm. Each unique object instance is then positioned in real-world coordinates by integrating this output with its depth, calibrated to metric units via scene reconstruction from structure from motion (SfM). The full approach is validated by comparing the estimated positions of buildings and traffic signs with their real positions from an open-source database, achieving sub-meter accuracy in both cases. This novel approach provides a comprehensive framework for positioning any object from a sequence of video frames and can be applied to a wide range of domains beyond the one tackled here.
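As a rough illustration of the positioning step the abstract describes, the sketch below back-projects a tracked object's pixel location into 3D camera coordinates with a pinhole camera model, after rescaling a relative (up-to-scale) depth to meters using one distance known from the SfM reconstruction. All function names, parameter names, and values here are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the "depth calibrated to metric units via SfM"
# plus back-projection step. Names and numbers are illustrative only.

def calibrate_depth_scale(sfm_distance_m, relative_depth):
    """Scale factor mapping relative (up-to-scale) depth to meters,
    given one distance known in metric units from the SfM reconstruction."""
    return sfm_distance_m / relative_depth

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole-model back-projection of pixel (u, v) at metric depth depth_m,
    with focal lengths (fx, fy) and principal point (cx, cy) in pixels."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Example: an object instance tracked at the image center (960, 540),
# relative depth 0.5, with one SfM landmark at 10 m whose relative depth is 0.25.
scale = calibrate_depth_scale(10.0, 0.25)   # 40.0 meters per depth unit
point = backproject(960, 540, 0.5 * scale,
                    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(point)  # (0.0, 0.0, 20.0): on the optical axis, 20 m ahead
```

A camera-to-world pose transform (from the same SfM reconstruction) would then map such camera-frame points into the real-world coordinates the paper validates against.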

All Content © PaperCept, Inc.
