Last updated on June 24, 2025. This conference program is tentative and subject to change
Technical Program for Monday June 23, 2025

MoA1 Regular Session, Plenary Room
Oral 1
Chair: López, Antonio M. | Universitat Autònoma De Barcelona |
Co-Chair: Nashashibi, Fawzi | INRIA |

09:15-09:33, Paper MoA1.1
UDA4Inst: Unsupervised Domain Adaptation for Instance Segmentation
Guo, Yachan | Universitat Autònoma De Barcelona |
Xiao, Yi | Computer Vision Center, Universitat Autònoma De Barcelona |
Xue, Danna | Computer Vision Center, Universitat Autònoma Barcelona |
Gomez Zurita, Jose Luis | Computer Vision Center (CVC) |
López, Antonio M. | Universitat Autònoma De Barcelona |
Keywords: Techniques for Dataset Domain Adaptation, Instance and Panoptic Segmentation Techniques, Data Annotation and Labeling Techniques
Abstract: Instance segmentation is crucial for autonomous driving but is hindered by the lack of annotated real-world data due to expensive labeling costs. Unsupervised Domain Adaptation (UDA) offers a solution by transferring knowledge from labeled synthetic data to unlabeled real-world data. While UDA methods for synthetic to real-world domains (synth-to-real) excel in tasks such as semantic segmentation and object detection, their application to instance segmentation for autonomous driving remains underexplored and often relies on suboptimal baselines. We introduce UDA4Inst, a powerful framework for synth-to-real UDA in instance segmentation. Our framework enhances instance segmentation through Semantic Category Training and Bidirectional Mixing Training. Semantic Category Training groups semantically related classes for separate training, improving pseudo-label quality and segmentation accuracy. Bidirectional Mixing Training combines instance-wise and patch-wise data mixing, creating realistic composites that enhance generalization across domains. Extensive experiments show UDA4Inst sets a new state-of-the-art on the SYNTHIA→Cityscapes benchmark (mAP 31.3) and introduces results on novel datasets, using UrbanSyn and Synscapes as sources and Cityscapes and KITTI360 as targets. Code and models are available at https://github.com/gyc-code/UDA4Inst.
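The Bidirectional Mixing Training described above combines instance-wise and patch-wise data mixing; the instance-wise half amounts to copy-pasting selected source instances onto a target image. A minimal sketch of that idea (the function name and interface are illustrative assumptions, not the released UDA4Inst API):

```python
import numpy as np

def instance_mix(src_img, src_inst_mask, tgt_img, instance_ids):
    """Paste the selected source instances onto the target image.

    src_inst_mask holds one integer instance id per pixel; pixels whose
    id is in `instance_ids` are copied from the source onto the target,
    producing a cross-domain composite.
    """
    assert src_img.shape == tgt_img.shape
    mixed = tgt_img.copy()
    sel = np.isin(src_inst_mask, instance_ids)  # boolean pixel mask
    mixed[sel] = src_img[sel]
    return mixed
```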

09:33-09:51, Paper MoA1.2
Adaptive Semantic Segmentation of Traffic Scenes Via Frequency Domain Analysis
Zhang, Tengwen | Xi'an Jiaotong University |
Li, Yaochen | Xi'an Jiaotong University |
Zou, Runlin | Xi'an Jiaotong University |
Gao, Yuan | Xi'an Jiaotong University |
Qiu, Chao | Xi'an Jiaotong University |
Ni, Hong | Xi'an Jiaotong University |
He, Ziyuan | Xi'an Jiaotong University |
Keywords: Techniques for Dataset Domain Adaptation
Abstract: High-precision semantic segmentation is an important research topic in the communities of computer vision and intelligent transportation. The existing unsupervised domain adaptation methods based on image translation often lead to artifacts and structural distortions. To overcome this problem, a novel adaptive semantic segmentation method for traffic scenes via frequency domain analysis is proposed. Firstly, we leverage the frequency domain space to decouple style and semantic features. The Fast Fourier Transform is applied to achieve structure-preserving style alignment. Subsequently, a content enhancement module is proposed based on the Wavelet transform, which utilizes the original source images to correct and enhance high-frequency structural and semantic details. Furthermore, a convolutional enhancement attention module is proposed, which utilizes depthwise separable convolution to capture more local details. Experiments on the GTA5→Cityscapes and SYNTHIA→Cityscapes tasks attain state-of-the-art mIoU scores of 76.4 and 67.7, respectively, convincingly demonstrating the effectiveness of the method.
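The structure-preserving style alignment via the Fast Fourier Transform can be pictured as swapping the low-frequency amplitude spectrum (style) while keeping the phase (structure), in the spirit of Fourier-based domain adaptation. This is a sketch of the general technique, not the paper's exact module; the swapped band width `beta` is a hypothetical hyperparameter:

```python
import numpy as np

def fft_style_align(source, target, beta=0.1):
    """Transfer the low-frequency amplitude (style) of `target` onto
    `source` while keeping the source phase (structure/content)."""
    fs = np.fft.fft2(source, axes=(0, 1))
    ft = np.fft.fft2(target, axes=(0, 1))
    amp_s, pha_s = np.abs(fs), np.angle(fs)
    amp_t = np.abs(ft)
    # Centre the spectra so low frequencies sit in the middle.
    amp_s = np.fft.fftshift(amp_s, axes=(0, 1))
    amp_t = np.fft.fftshift(amp_t, axes=(0, 1))
    h, w = source.shape[:2]
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    # Replace only the low-frequency band of the source amplitude.
    amp_s[ch - bh:ch + bh, cw - bw:cw + bw] = \
        amp_t[ch - bh:ch + bh, cw - bw:cw + bw]
    amp_s = np.fft.ifftshift(amp_s, axes=(0, 1))
    out = np.fft.ifft2(amp_s * np.exp(1j * pha_s), axes=(0, 1))
    return np.real(out)

src = np.random.rand(64, 64)
tgt = np.random.rand(64, 64)
stylised = fft_style_align(src, tgt)
print(stylised.shape)  # (64, 64)
```

With `beta=0` no amplitude is swapped and the input is reconstructed unchanged, which makes the structure-preserving property easy to verify.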

09:51-10:09, Paper MoA1.3
Map-Free Trajectory Prediction Via Deformable Attention in Bird’s-Eye View Space
Kong, Minsang | Kookmin University |
Kim, Myeong jun | Kookmin University |
Sung, Jinwook | Kookmin University |
Kang, Sang Gu | Kookmin University |
Park, Kyu min | Kookmin University |
Park, Minseo | Kookmin University |
Jeong, Dahun | Kookmin University |
Lee, Sang Hun | Kookmin University |
Keywords: Advanced Multisensory Data Fusion Algorithms, Motion Forecasting, Deep Learning Based Approaches
Abstract: In autonomous driving, trajectory prediction is crucial for safe navigation. While many recent methods rely on pre-built high definition (HD) maps, these are limited to specific regions and cannot reflect real-time changes. We propose a novel framework that constructs bird's-eye view representations from real-time sensor data and selectively extracts critical features using deformable attention, eliminating the need for HD maps. We also introduce a sparse goal candidate proposal module for fully end-to-end prediction without post-processing. Experiments demonstrate that our model achieves competitive performance compared to HD map-based methods.

10:09-10:27, Paper MoA1.4
TPK: Trustworthy Trajectory Prediction Integrating Prior Knowledge for Interpretability and Kinematic Feasibility
Abouelazm, Ahmed | FZI Research Center for Information Technology |
Baden, Marius | Karlsruhe Institute of Technology |
Hubschneider, Christian | FZI Research Center for Information Technology |
Wu, Yin | Karlsruhe Institute of Technology |
Slieter, Daniel | CARIAD SE |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe In |
Keywords: Predictive Trajectory Models and Motion Forecasting, Motion Forecasting, Trust and Acceptance of Autonomous Technologies
Abstract: Trajectory prediction is crucial for autonomous driving, enabling vehicles to navigate safely by anticipating the movements of surrounding road users. However, current deep learning models often lack trustworthiness as their predictions can be physically infeasible and illogical to humans. To make predictions more trustworthy, recent research has incorporated prior knowledge, like the social force model for modeling interactions and kinematic models for physical realism. However, these approaches focus on priors that suit either vehicles or pedestrians and do not generalize to traffic with mixed agent classes. We propose incorporating interaction and kinematic priors of all agent classes (vehicles, pedestrians, and cyclists) with class-specific interaction layers to capture agent behavioral differences. To improve the interpretability of the agent interactions, we introduce DG-SFM, a rule-based interaction importance score that guides the interaction layer. To ensure physically feasible predictions, we propose suitable kinematic models for all agent classes with a novel pedestrian kinematic model. We benchmark our approach on the Argoverse 2 dataset, using the state-of-the-art transformer HPTR as our baseline. Experiments demonstrate that our method improves interaction interpretability, revealing a correlation between incorrect predictions and divergence from our interaction prior. Even though incorporating the kinematic models causes a slight decrease in accuracy, they eliminate infeasible trajectories found in the dataset and the baseline model. Thus, our approach fosters trust in trajectory prediction as its interaction reasoning is interpretable, and its predictions adhere to physics.

10:27-10:45, Paper MoA1.5
HotShot: A Loss-Guided Data Augmentation and Curriculum Learning Technique for the Task of Semantic Segmentation
Frickenstein, Lukas | BMW AG |
Thoma, Moritz | BMW AG |
Mori, Pierpaolo | BMW AG |
Balamuthu Sampath, Shambhavi | BMW AG |
Fasfous, Nael | BMW AG |
Vemparala, Manoj Rohit | BMW AG |
Frickenstein, Alexander | BMW AG |
Unger, Christian | BMW Group |
Passerone, Claudio | Dipartimento Di Elettronica E Telecomunicazioni Politecnico Di |
Stechele, Walter | Technical University of Munich (TUM) |
Keywords: Data Augmentation Techniques Using Neural Networks, Semantic Segmentation Techniques, Deep Learning Based Approaches
Abstract: Semantic segmentation is an important computer vision task that requires costly pixel-level annotations to train deep neural networks (DNNs). Especially for applications like autonomous driving, precise pixel-level understanding of scenes is a decisive factor between success and failure of the application. It follows that every labeled sample of an existing dataset is highly valuable and should be optimally used during training to maximize its value. This is achieved using (1) augmentation of the same labeled sample to help the model learn it in different ways, and (2) curriculum learning to introduce training samples to the model in a strategic order to ease the learning process. In this work, we present HotShot, a loss-guided cropping technique that assesses the DNN’s prediction capability during training to derive probability scores of potential cropping regions. This effectively combines augmentation and curriculum learning in one technique, where a single sample is cropped (augmentation) in regions selected based on the DNN’s loss throughout the training (curriculum learning). For UperNet using a ConvNeXt-tiny backbone and DeepLabV3+ architecture using a ResNet-50 backbone, applying HotShot provides a +0.41 p.p. and +0.43 p.p. mIoU improvement over randomly cropping regions on the Cityscapes and BDD100K datasets, respectively. More interestingly, the analysis shows HotShot primarily boosts the classes that are most challenging for the model. For example, the rider and motorcycle classes on the BDD100K dataset improve by 163% and 129% using DeepLabV3+ with a ResNet-50 backbone. HotShot achieves improved mIoU in almost all cases and normalizes imbalances in learning challenging classes in datasets.
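HotShot's loss-guided cropping is described as deriving probability scores for candidate crop regions from the DNN's loss during training. A generic sketch of that idea, assuming a per-pixel loss map is available; the integral-image scoring and the `temperature` knob are illustrative, not the paper's exact procedure:

```python
import numpy as np

def loss_guided_crop(loss_map, crop, rng=None, temperature=1.0):
    """Sample a crop window with probability proportional to the mean
    per-pixel loss inside it, so high-loss (hard) regions are cropped
    more often as training progresses."""
    rng = rng or np.random.default_rng()
    h, w = loss_map.shape
    ch, cw = crop
    # Window sums for every valid top-left corner via an integral image.
    ii = np.pad(loss_map, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = ii[ch:, cw:] - ii[:-ch, cw:] - ii[ch:, :-cw] + ii[:-ch, :-cw]
    scores = (sums / (ch * cw)) ** (1.0 / temperature)
    probs = scores.ravel() / scores.sum()
    idx = rng.choice(probs.size, p=probs)
    y, x = np.unravel_index(idx, sums.shape)
    return y, x  # top-left corner of the sampled crop
```

Windows with zero accumulated loss receive zero probability, so sampling concentrates on regions the model currently gets wrong, which is the curriculum aspect of the technique.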

MoBT1 Poster Session, Caravaggio Room
Poster 1.1 >> Planning, Trajectory Prediction & Motion Forecasting
Chair: Betz, Johannes | Technical University of Munich |
Co-Chair: Malis, Ezio | INRIA |

11:15-12:30, Paper MoBT1.1
Hybrid Machine Learning Model with a Constrained Action Space for Trajectory Prediction
Fertig, Alexander | Technische Hochschule Ingolstadt |
Balasubramanian, Lakshman | MoiiAi |
Botsch, Michael | Technische Hochschule Ingolstadt |
Keywords: End-to-End Neural Network Architectures and Techniques, Predictive Trajectory Models and Motion Forecasting, Safety Verification and Validation Techniques
Abstract: Trajectory prediction is crucial for advancing autonomous driving and improving safety and efficiency. Although end-to-end models based on deep learning have great potential, they often do not consider vehicle dynamic limitations, leading to unrealistic predictions. To address this problem, this work introduces a novel hybrid model that combines deep learning with a kinematic motion model. It is able to predict object attributes such as acceleration and yaw rate and generate trajectories based on them. A key contribution is the incorporation of expert knowledge into the learning objective of the deep learning model. This results in the constraint of the available action space, thus enabling the prediction of physically feasible object attributes and trajectories, thereby increasing safety and robustness. The proposed hybrid model facilitates enhanced interpretability, thereby reinforcing the trustworthiness of deep learning methods and promoting the development of safe planning solutions. Experiments conducted on the publicly available real-world Argoverse dataset demonstrate realistic driving behaviour, with benchmark comparisons and ablation studies showing promising results.
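The hybrid model predicts object attributes such as acceleration and yaw rate and turns them into trajectories through a kinematic motion model with a constrained action space. A minimal sketch of that rollout, assuming a unicycle model and illustrative actuation bounds (`a_max`, `yaw_rate_max` are assumed values, not the paper's):

```python
import numpy as np

def rollout_kinematic(state, actions, dt=0.1,
                      a_max=8.0, yaw_rate_max=0.6):
    """Integrate predicted (acceleration, yaw rate) pairs with a simple
    unicycle model, clipping each action to physically feasible bounds
    so the resulting trajectory cannot violate vehicle dynamics."""
    x, y, yaw, v = state
    traj = []
    for a, yr in actions:
        a = np.clip(a, -a_max, a_max)          # constrain the action space
        yr = np.clip(yr, -yaw_rate_max, yaw_rate_max)
        v = max(v + a * dt, 0.0)               # no reversing in this sketch
        yaw += yr * dt
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        traj.append((x, y))
    return np.array(traj)
```

Because the network's raw outputs are clipped before integration, every emitted trajectory is feasible by construction, which is the core of the hybrid approach.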

11:15-12:30, Paper MoBT1.2
PPP: Planning with Path-Informed Prediction for Autonomous Driving
Xi, Ning | Wuhan University of Technology |
Chu, Duanfeng | Wuhan University of Technology |
Deng, Zejian | University of Waterloo |
Cao, Yongxing | Wuhan University |
Feng, Feng | Wuhan University of Technology |
Huang, Yanjun | Tongji University |
Wang, Jinxiang | Southeast University |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Predictive Trajectory Models and Motion Forecasting, Deep Learning Based Approaches
Abstract: With the rapid advancement of end-to-end autonomous driving, the integration of prediction and planning has increasingly become a research focus in the field of autonomous driving. However, most existing methods do not adequately consider the robustness of driving trajectories during trajectory generation, making them less effective in handling complex driving scenarios. To address this issue, this paper introduces Planning with Path-Informed Prediction for Autonomous Driving (PPP), which constructs a prediction-decision module that fuses multi-dimensional information by integrating the ego vehicle's potential multimodal future paths with environmental features. Moreover, we introduce a multi-stage trajectory evaluation mechanism during the trajectory generation process, which significantly enhances the system's performance in dynamic environments, thereby achieving improvements in both accuracy and robustness in complex driving scenarios. Through experiments on the nuPlan dataset, our method demonstrates exceptional competitiveness in closed-loop tests. Notably, in complex scenario tests, PPP outperforms learning-based and hybrid methods. Code will be available at https://github.com/Keria0812/PPP.

11:15-12:30, Paper MoBT1.3
Dynamic Intent Queries for Motion Transformer-Based Trajectory Prediction
Demmler, Tobias | Robert Bosch GmbH |
Hartung, Lennart | Robert Bosch GmbH |
Tamke, Andreas | Bosch |
Dang, Thao | University of Applied Sciences, Esslingen |
Hegai, Alexander | Robert Bosch GmbH |
Haug, Karsten | Robert Bosch GmbH |
Mikelsons, Lars | Augsburg University |
Keywords: Motion Forecasting, Predictive Trajectory Models and Motion Forecasting
Abstract: In autonomous driving, accurately predicting the movements of other traffic participants is crucial, as it significantly influences a vehicle's planning processes. Modern trajectory prediction models strive to interpret complex patterns and dependencies from agent and map data. The Motion Transformer (MTR) architecture and subsequent work define the most accurate methods in common benchmarks such as the Waymo Open Motion Benchmark. The MTR model employs pre-generated static intention points as initial goal points for trajectory prediction. However, the static nature of these points frequently leads to misalignment with map data in specific traffic scenarios, resulting in unfeasible or unrealistic goal points. Our research addresses this limitation by integrating scene-specific dynamic intention points into the MTR model. This adaptation of the MTR model was trained and evaluated on the Waymo Open Motion Dataset. Our findings demonstrate that incorporating dynamic intention points has a significant positive impact on trajectory prediction accuracy, especially for predictions over long time horizons. Furthermore, we analyze the impact on ground truth trajectories which are not compliant with the map data or are illegal maneuvers.

11:15-12:30, Paper MoBT1.4
Negotiating Cooperative Ordering Problems with Bimodal Planning
Wenzel, Raphael | HRI Europe GmbH; TU Darmstadt |
Probst, Malte | Honda Research Institute Europe |
Puphal, Tim | Honda Research Institute Europe GmbH |
Amann, Markus | Honda Research Institute Europe GmbH |
Eggert, Julian | Honda Research Institute Europe GmbH |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Level 4-5 Autonomous Driving Systems Architecture, Multi-Objective Planning Approaches
Abstract: In Automated Driving (AD), traffic scenarios where two agents must resolve an ordering without knowing each other's intention are critical for expanding the operational design domain of automated vehicles to urban environments. These scenarios require negotiation to determine who passes first through an interaction zone. We present a novel agreement measure and negotiation approach to resolve these ordering problems across a wide range of common scenarios. Our method emphasizes detecting and deciding when to switch between potential negotiation outcomes. Our approach extends existing behavior planners to cope with bimodal cooperative interactions, where two potentially desirable outcomes need to be considered. We evaluate our approach by providing both an illustrative scenario and extensive statistical experiments across various geometries, including oncoming narrow passages, crossing and merging scenarios. The results demonstrate that our system considerably improves the behavior in cooperative ordering scenarios compared to the baseline. Furthermore, it is also robust in the sense that it effectively handles dynamic situations where the other agent's intentions change during the negotiation process.

11:15-12:30, Paper MoBT1.5
Efficient Data Representation for Motion Forecasting: A Scene-Specific Trajectory Set Approach
Vivekanandan, Abhishek | FZI Research Center for Information Technology; KIT Karlsruhe In |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe In |
Keywords: Profile Extraction and Discovery from Datasets, Techniques for Dataset Domain Adaptation, Integration Methods for HD Maps and Onboard Sensors
Abstract: Representing diverse and plausible future trajectories is critical for motion forecasting in autonomous driving. However, efficiently capturing these trajectories in a compact set remains challenging. This study introduces a novel approach for generating scene-specific trajectory sets tailored to different contexts, such as intersections and straight roads, by leveraging map information and actor dynamics. A deterministic goal sampling algorithm identifies relevant map regions, while our Recursive In-Distribution Subsampling (RIDS) method enhances trajectory plausibility by condensing redundant representations. Experiments on the Argoverse 2 dataset demonstrate that our method achieves up to a 45% improvement in Driving Area Compliance (DAC) compared to baseline methods while maintaining competitive displacement errors. Our work highlights the benefits of mining such scene-aware trajectory sets and how they could capture the complex and heterogeneous nature of actor behavior in real-world driving scenarios.
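The paper's Recursive In-Distribution Subsampling (RIDS) condenses redundant trajectory representations; the abstract does not give its exact procedure, but the general idea of condensing a trajectory set can be sketched with a greedy nearest-neighbour subsampling. This is entirely illustrative and not the authors' algorithm:

```python
import numpy as np

def condense_trajectory_set(trajs, k):
    """Greedily drop the most redundant trajectory (the one closest to
    its nearest neighbour) until only k representatives remain.

    trajs: array of shape (N, T, 2) — N candidate trajectories of T
    waypoints each; returns a (k, T, 2) condensed set.
    """
    keep = list(range(len(trajs)))
    flat = trajs.reshape(len(trajs), -1)
    while len(keep) > k:
        sub = flat[keep]
        d = np.linalg.norm(sub[:, None] - sub[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nearest = d.min(axis=1)
        # The trajectory whose nearest neighbour is closest contributes
        # the least diversity, so it is removed first.
        keep.pop(int(np.argmin(nearest)))
    return trajs[keep]
```

Near-duplicate trajectories are eliminated first, so the surviving set covers the behavior space with fewer, more distinct candidates.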

11:15-12:30, Paper MoBT1.6
Reliability Comparison of Vessel Trajectory Prediction Models Via Probability of Detection
Rastin, Zahra | Chair of Dynamics and Control, University of Duisburg-Essen, Dui |
Donandt, Kathrin | University of Duisburg-Essen |
Soeffker, Dirk | University of Duisburg-Essen |
Keywords: Predictive Trajectory Models and Motion Forecasting, Safety Verification and Validation Techniques, Deep Learning Based Approaches
Abstract: This contribution addresses vessel trajectory prediction (VTP), focusing on the evaluation of different deep learning-based approaches. The objective is to assess model performance in diverse traffic complexities and compare the reliability of the approaches. While previous VTP models overlook the specific traffic situation complexity and lack reliability assessments, this research uses a probability of detection analysis to quantify model reliability in varying traffic scenarios, thus going beyond common error distribution analyses. All models are evaluated on test samples categorized according to their traffic situation during the prediction horizon, with performance metrics and reliability estimates obtained for each category. The results of this comprehensive evaluation provide a deeper understanding of the strengths and weaknesses of the different prediction approaches, along with their reliability in terms of the prediction horizon lengths for which safe forecasts can be guaranteed. These findings can inform the development of more reliable vessel trajectory prediction approaches, enhancing safety and efficiency in future inland waterway navigation.

11:15-12:30, Paper MoBT1.7
Toward Unified Practices in Trajectory Prediction Research on Bird's-Eye-View Datasets
Westny, Theodor | Linköping University |
Olofsson, Björn | Linköping University |
Frisk, Erik | Linköping University |
Keywords: Motion Forecasting, Predictive Trajectory Models and Motion Forecasting, UAV Datasets
Abstract: The availability of high-quality datasets is crucial for developing behavior prediction algorithms in autonomous vehicles. This paper highlights the need to standardize the use of certain datasets for motion forecasting research to simplify comparative analysis and proposes a set of tools and practices to achieve this. Drawing on extensive experience and a comprehensive review of current literature, we summarize our proposals for preprocessing, visualization, and evaluation in the form of an open-sourced toolbox designed for researchers working on trajectory prediction problems. The clear specification of necessary preprocessing steps and evaluation metrics is intended to alleviate development efforts and facilitate the comparison of results across different studies. The toolbox is available at: https://github.com/westny/dronalize.

11:15-12:30, Paper MoBT1.8
A Generalized Waypoint Loss for End-To-End Autonomous Driving (I)
Stelzer, Malte | Technische Universität Braunschweig |
Bartels, Timo | Technische Universität Braunschweig |
Bickerdt, Jan | Volkswagen AG |
Schomerus, Volker Patricio | Volkswagen AG |
Piewek, Jan | Volkswagen AG |
Bagdonat, Thorsten | Volkswagen AG |
Fingscheidt, Tim | Technische Universität Braunschweig |
Keywords: End-to-End Neural Network Architectures and Techniques, Deep Learning Based Approaches, Level 4-5 Autonomous Driving Systems Architecture
Abstract: Many approaches in autonomous driving generate future waypoints to form trajectories, which are then used to derive driving commands. During imitation learning for end-to-end autonomous driving, these trajectories are typically learned using a straightforward L1 loss, which compares the model's predictions to those of an expert. In this paper, we propose a separation of longitudinal and lateral components of the L1 loss that weights these independently, thereby aligning more closely with the separate handling of longitudinal and lateral control by PID controllers in the model pipeline. We employ this novel generalized waypoint loss with the TransFuser architecture in the CARLA simulator and show that we can control and improve on certain infraction types, without a performance loss in any other metric. Additionally, we investigate a novel ensemble technique that produces a more cautious ensemble, reducing infractions while maintaining overall performance. For future work, our novel loss formulation enables the definition of a time-variant loss tailored to specific traffic scenarios in the training data.
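The proposed loss separates the longitudinal and lateral components of the L1 waypoint error and weights them independently. A sketch of that decomposition (in NumPy rather than a deep learning framework; the ego heading is assumed to define the longitudinal axis):

```python
import numpy as np

def generalized_waypoint_loss(pred, target, heading,
                              w_long=1.0, w_lat=1.0):
    """L1 waypoint loss split into longitudinal and lateral components
    w.r.t. the ego heading, each weighted independently.

    pred, target: (N, 2) arrays of waypoints in the ego frame.
    With w_long == w_lat == 1 this reduces to the plain per-axis L1 loss.
    """
    err = pred - target                                   # (N, 2) errors
    lon_dir = np.array([np.cos(heading), np.sin(heading)])
    lat_dir = np.array([-np.sin(heading), np.cos(heading)])
    e_long = err @ lon_dir                                # along heading
    e_lat = err @ lat_dir                                 # across heading
    return w_long * np.abs(e_long).mean() + w_lat * np.abs(e_lat).mean()
```

Raising `w_lat`, for instance, penalizes lane-deviation errors more than speed-tracking errors, which is how the loss can target specific infraction types.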

11:15-12:30, Paper MoBT1.9
Time-Efficient Dynamic Urban Global Planner (I)
Arquero, Juan | Universidad Politécnica De Madrid |
Naranjo, Jose | Universidad Politecnica De Madrid |
Molinos, Eduardo | Karlsruher Institut Für Technologie |
Milanés, Vicente | Renault |
Valle, Alfredo | Universidad Politécnica De Madrid |
Jiménez, Felipe | Universidad Politécnica De Madrid |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Level 3 Driving Systems Architecture and Techniques, User Experience in Autonomous Vehicles
Abstract: This paper presents a global route planning algorithm designed as part of the navigation module embedded in an Autonomous Driving System (ADS). Unlike trajectory planners, which focus on local maneuvering and vehicle control, this algorithm determines optimal routes at a higher level, prioritizing dynamic adaptation to traffic conditions and regulatory elements. The planner aims to minimize travel time rather than merely reducing the total distance traveled, making it particularly effective in urban environments where traffic signals, vehicle interactions, and road regulations significantly impact journey duration. To achieve this, the algorithm dynamically adjusts to real-time variations in traffic flow and control measures. Additionally, it integrates risk-aware routing by imposing penalties on roads with higher pedestrian interaction, enhancing safety and increasing public acceptance of ADS technology. Designed for efficiency and scalability, the algorithm is lightweight enough to run on microcontroller-based embedded systems, ensuring feasibility for real-world deployment in constrained computing environments. The algorithm was tested using a Renault mass-production car, demonstrating its applicability in real-world driving scenarios.

11:15-12:30, Paper MoBT1.10
GripMap: An Efficient, Spatially Resolved Constraint Framework for Offline and Online Trajectory Planning in Autonomous Racing
Werner, Frederik | Technische Universität München |
Schwehn, Ann-Kathrin | Technical University of Munich |
Lienkamp, Markus | Technische Universität München |
Betz, Johannes | Technical University of Munich |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Adaptive Vehicle Control Techniques, Multi-Objective Planning Approaches
Abstract: Conventional trajectory planning approaches for autonomous vehicles often assume a fixed vehicle model that remains constant regardless of the vehicle's location. This overlooks the critical fact that the tires and the surface are the two force-transmitting partners in vehicle dynamics; while the tires stay with the vehicle, surface conditions vary with location. Recognizing these challenges, this paper presents a novel framework for spatially resolving dynamic constraints in both offline and online planning algorithms applied to autonomous racing. We introduce the GripMap concept, which provides a spatial resolution of vehicle dynamic constraints in the Frenet frame, allowing adaptation to locally varying grip conditions. This enables compensation for location-specific effects, more efficient vehicle behavior, and increased safety, unattainable with spatially invariant vehicle models. The focus is on low storage demand and quick access through perfect hashing. This framework proved advantageous in real-world applications in the presented form. Experiments inspired by autonomous racing demonstrate its effectiveness. In future work, this framework can serve as a foundational layer for developing future interpretable learning algorithms that adjust to varying grip conditions in real-time.

11:15-12:30, Paper MoBT1.11
A Glimpse into the Future: An Inverse Soft Q-Learning’s Soft Actor-Critic Approach for Pedestrian Path Prediction
Dietl, Laura | Technische Hochschule Ingolstadt |
Facchi, Christian | Technische Hochschule Ingolstadt |
Keywords: Predictive Trajectory Models and Motion Forecasting, Reinforcement Learning for Planning, Deep Learning Based Approaches
Abstract: Anticipating the future trajectories of pedestrians is an essential ability in autonomous vehicles to perform proactive actions and thus reduce dangerous encounters. However, predicting human motion is a task that is inherently challenging due to the influence of social and environmental factors, and the multimodality of future predictions based solely on partial history of their trajectory. The underlying scene and the past trajectory of a pedestrian provide useful indicators for predicting their future steps. Unlike other approaches that utilize supervised learning or generative modeling, Inverse Reinforcement Learning enables a model to learn the pedestrian's reward function that encodes their intentions. This work proposes a framework based on Inverse soft Q-Learning's Soft Actor-Critic Version. The framework utilizes the information about the scene and the past trajectory of a pedestrian together with an attention mechanism to learn the pedestrian's behavior policy. Quantitative and qualitative evaluation on existing pedestrian trajectory prediction benchmarks shows comparable results to state-of-the-art baselines.

11:15-12:30, Paper MoBT1.12
Multi-Rules Reachability Analysis for Road Agents Using Graph-Based Maps and Real-Time Kinematics
Fossati, Monica | Inria |
Malis, Ezio | INRIA |
Martinet, Philippe | INRIA |
Keywords: Integration Methods for HD Maps and Onboard Sensors, Decision Making
Abstract: Automated vehicles perform well in simple environments with clear rules, but urban traffic presents significant challenges due to the unpredictable behavior of road users, sometimes beyond traffic rules. Achieving full autonomy in such settings requires a systematic approach to modeling the possible actions of all agents. This paper presents a multi-rules reachability analysis framework that integrates graph-based maps with real-time perception data to dynamically characterize the surrounding space. By leveraging the semantic richness and modularity of Lanelet2 maps, our method provides a structured representation that enhances situational awareness. This allows for the extraction of navigation-relevant information, with the goal of supporting safer and more efficient decision-making in complex urban environments.

11:15-12:30, Paper MoBT1.13
Sampling-Based Motion Planning with Preordered Objectives
Halder, Patrick | ZF Friedrichshafen AG |
Althoff, Matthias | Technische Universität München |
Keywords: Multi-Objective Planning Approaches, Motion Planning Algorithms for Autonomous Vehicles, Decision Making
Abstract: Motion planning for cyber-physical systems requires addressing numerous system objectives and constraints, including satisfying physical limitations, ensuring safety, reaching goal areas, or reducing energy consumption. Typically, it is only possible to achieve some of the objectives simultaneously since they may conflict. The objectives are usually weighted to specify which plans are preferred in such situations, resulting in a cumbersome tuning process. In this work, we use a weight-free prioritization of the objectives through preorders and introduce a novel sampling-based motion planner designed to efficiently generate trajectories optimizing preordered objectives. We ensure that only the smallest number of required objectives is evaluated to reduce computational time. Our approach can holistically define and solve many types of multi-objective optimization problems, and its usefulness is demonstrated for a Mars rover and an autonomous vehicle.
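Weight-free prioritization through preorders can be illustrated by lazy lexicographic comparison: a higher-priority objective decides the outcome before any lower-priority one is evaluated, mirroring the paper's goal of evaluating only the smallest number of required objectives. The helper names are illustrative; the actual planner is sampling-based and more involved:

```python
import functools

def compare_preordered(a, b, objectives):
    """Compare two candidates under a strict priority order of
    objectives (lower cost is better). A higher-priority objective
    decides the comparison before any lower-priority one is computed."""
    for objective in objectives:          # highest priority first
        ca, cb = objective(a), objective(b)
        if ca != cb:
            return -1 if ca < cb else 1
    return 0                              # indifferent under all objectives

def best_candidate(candidates, objectives):
    """Pick the preferred candidate under the preordered objectives."""
    key = functools.cmp_to_key(
        lambda a, b: compare_preordered(a, b, objectives))
    return min(candidates, key=key)
```

Because the comparison returns as soon as one objective differs, a cheap safety objective placed first can settle most comparisons without ever evaluating an expensive comfort or energy objective.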

11:15-12:30, Paper MoBT1.14
Dynamic Objective MPC for Motion Planning of Seamless Docking Maneuvers
Schumann, Oliver | Ulm University |
Buchholz, Michael | Universität Ulm |
Dietmayer, Klaus | University of Ulm |
Keywords: Multi-Objective Planning Approaches, Motion Planning Algorithms for Autonomous Vehicles
Abstract: Automated vehicles and logistics robots used in warehouses and similar environments must often position themselves in narrow environments with high precision in front of a specific target, such as a package or their charging station. Often, these docking scenarios are solved in two steps: path following and rough positioning followed by a high-precision motion planning algorithm. This can generate suboptimal trajectories caused by bad positioning in the first phase and, therefore, prolong the time it takes to reach the goal. In this work, we propose a unified approach, which is based on a Model Predictive Control (MPC) that unifies the advantages of Model Predictive Contouring Control (MPCC) with a Cartesian MPC to reach a specific goal pose. This paper's main contributions are the handling of very narrow scenarios, the adaptation of the dynamic weight allocation method to reach path ends and goal poses inside driving corridors, and the development of the so-called dynamic objective MPC. The latter is an improvement of the dynamic weight allocation method, which can inherently switch, depending on the state, from an MPCC to a Cartesian MPC to solve the path-following problem and the high-precision positioning tasks independently of the location of the goal pose seamlessly with one algorithm. This leads to foresighted, feasible, and safe motion plans, which can decrease the mission time and result in smoother trajectories.
|
|
11:15-12:30, Paper MoBT1.15 | Add to My Program |
Safety Trajectory Planning for Autonomous Vehicles in Unstructured Narrow Environments: A Perception Error Compatible Approach |
|
Li, Zhaopeng | Beijing Institute of Technology |
Guo, Zijun | Beijing Institute of Technology |
Yu, Huilong | Beijing Institute of Technology |
Xi, JunQiang | Beijing Institute of Technology |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Collision Avoidance Algorithms
Abstract: Planning in unstructured narrow environments with perception errors is highly challenging, as inaccurately perceived obstacle positions may lead planners to generate collision-prone trajectories. Typically, constraints are applied to the collision probability of the potential trajectory to ensure safety. However, due to the oversimplified modeling of vehicles or obstacles when estimating collision probabilities, these methods are often unsuitable for unstructured environments. To address this issue, we propose a computationally efficient risk zone generation method and introduce a Gaussian error function-based evaluation method for collision risk assessment. The resulting risk-aware cost term is integrated into a trajectory optimization framework based on optimal control theory, effectively enabling obstacle avoidance by explicitly penalizing collision risk along the trajectory. Additionally, to address the inherent limitations of penalty-based obstacle avoidance, we introduce a geometric safety constraint rigorously derived from duality principles. In the simulation scenarios of the Trajectory Planning Competition for Automated Parking organized by the IEEE Intelligent Transportation Systems Conference 2022, we introduce perception uncertainty to better reflect real-world conditions. The results demonstrate that our algorithm significantly improves trajectory safety while maintaining an effective balance between safety and efficiency.
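A Gaussian error function-based risk term of the kind mentioned above can be illustrated generically (this is not the paper's exact formulation; the one-dimensional clearance model, threshold `d_safe`, and penalty weight are assumptions): if the clearance d to an obstacle is modeled as Gaussian with mean mu and standard deviation sigma, the probability of violating a safety margin is the normal CDF, expressible via erf:

```python
import math

def collision_risk(mu, sigma, d_safe=0.5):
    """P(d < d_safe) for clearance d ~ N(mu, sigma^2),
    using Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    z = (d_safe - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk_cost(mu, sigma, d_safe=0.5, weight=100.0):
    """Risk-aware penalty term that could be added to a trajectory
    optimization cost at each point along the trajectory."""
    return weight * collision_risk(mu, sigma, d_safe)
```

A perceived clearance exactly at the margin gives a risk of 0.5, and the risk shrinks rapidly as the mean clearance grows relative to the perception uncertainty.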
|
|
11:15-12:30, Paper MoBT1.16 | Add to My Program |
Adaptive Path Planning for Skill-Based Personalization in Parking Maneuvers |
|
Speidel, Piet | Robert Bosch GmbH |
Hilsch, Michael | Robert Bosch GmbH |
Alt, Benedikt | Robert Bosch GmbH |
Schildbach, Georg | University of Luebeck |
Keywords: Multi-Objective Planning Approaches, User-Centric Intelligent Vehicle Technologies, User Experience in Autonomous Vehicles
Abstract: This work introduces the idea of skill-based personalization for Advanced Driver Assistance Systems, aiming to address the limitations of traditional imitation-based personalization methods. To this end, an adaptive path planning approach is developed as a crucial intermediate step in realizing this concept, exemplified through automated parking. For this purpose, the Hybrid A∗ and Elastic-Band methods were integrated and modified to accommodate flexible target positions and to incorporate additional Key Performance Indicators (KPIs) relevant for personalized parking algorithms into their cost functions. Additionally, two novel shortcut algorithms are proposed to address some of the limitations in adjusting these KPIs. The result is a path planner capable of producing customized paths aligned with the user's specific skills and preferences.
|
|
11:15-12:30, Paper MoBT1.17 | Add to My Program |
A Trajectory Optimisation Approach for Motorcycles |
|
Abdallah, Mohammad | Loughborough University |
Hubbard, Peter | Loughborough University |
Fleming, James | Loughborough University |
Keywords: Motion Planning Algorithms for Autonomous Vehicles
Abstract: Trajectory planning and optimisation for motorcycles is an underdeveloped area in autonomous vehicle research. The Kinematic Bicycle Model (KBM) is widely studied as a tool for path planning, trajectory optimisation and control in autonomous vehicle applications. However, due to the nonminimum phase properties of motorcycle systems, it is unsuitable as a model for path planning in motorcycles. This paper therefore presents an alternative trajectory planner that uses a reduced-order approximation of the motorcycle dynamics to express the trajectory optimisation problem as a quadratic program that can be solved efficiently. This planner captures the nonminimum phase behaviour of motorcycles and enforces the dynamic constraints needed for safety guarantees. Tracking accuracy is assessed in a detailed non-linear motorcycle simulation, which shows a reduced root mean square error of 0.386 m compared to 1.091 m when using a KBM-based planner. Successful collision avoidance is also achieved during tracking, unlike with the KBM-based planner. This provides a proof of concept for future research into real-time trajectory optimisation and planning for motorcycles.
|
|
MoBT2 Poster Session, Leonardo + Lobby Left |
Add to My Program |
Poster 1.2 >> Sensing and Perception: Objects Detection & Tracking |
|
|
Chair: Sotelo Vázquez, Miguel Ángel | University of Alcalá |
Co-Chair: Brehar, Raluca | Technical University of Cluj-Napoca, Computer Science Department |
|
11:15-12:30, Paper MoBT2.1 | Add to My Program |
3D Shape Adaptation across Datasets for Weakly Supervised Monocular 3D Object Detection |
|
Zhang, Xiaoning | Xi’an Jiaotong University |
Su, Yuanqi | Xi'an Jiaotong University |
Lu, HaoAng | Xi'an Jiaotong University |
Zhang, Chi | Xi'an Jiaotong University |
Liu, Yuehu | Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University |
Keywords: Static and Dynamic Object Detection Algorithms, Techniques for Dataset Domain Adaptation
Abstract: Monocular 3D object detection (M3D) is a key yet challenging task that usually involves extensive and expensive manual annotation of 3D boxes. To eliminate the dependence on 3D box labels, weakly supervised M3D (WM3D) has recently been explored using only 2D annotations, which necessitates the use of extra resources such as LiDAR data, stereo images, and video sequences. However, the strict correspondence and complex calibration between the target image and additional resources limit their applicability. In this work, we propose a simple yet effective framework, 3D Shape Adaptation across datasets for Weakly supervised Monocular 3D Detection (SAWM3D). We observed that directly applying a source-dataset detector to the target dataset results in a significant domain gap, primarily attributable to the 3D location, while orientation and dimensions have a smaller impact. This allows us to view WM3D as a 3D shape adaptation optimization on the target dataset. Directly scaling the predicted shape significantly reduces the adaptation gap; fine-tuning the 3D detector with only 2D annotations also yields impressive results. Experiments on the KITTI benchmark demonstrate the effectiveness of our strategies.
|
|
11:15-12:30, Paper MoBT2.2 | Add to My Program |
TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing |
|
Reichlin, Neil | ETH |
Baumann, Nicolas | ETH |
Ghignone, Edoardo | ETH Zurich |
Magno, Michele | ETH Zurich |
Keywords: Deep Learning Based Approaches, Level 3 Driving Systems Architecture and Techniques, Real-Time Data Processing for UAVs
Abstract: Perception within autonomous driving is nearly synonymous with Neural Networks (NNs). Yet, in the domain of autonomous racing — often characterized by scaled, computationally limited robots used for cost-effectiveness and safety — opponent detection and tracking typically resort to traditional computer vision techniques due to computational constraints. This paper introduces TinyCenterSpeed, a streamlined adaptation of the seminal CenterPoint method, optimized for real-time performance on 1:10 scale autonomous racing platforms. This adaptation is viable even on OBCs powered solely by Central Processing Units (CPUs), as it incorporates the use of an external Tensor Processing Unit (TPU). We demonstrate that, compared to the Adaptive Breakpoint Detector (ABD), the current State-of-the-Art (SotA) in scaled autonomous racing, TinyCenterSpeed not only improves detection and velocity estimation by up to 61.38% but also supports multi-opponent detection and estimation. It achieves real-time performance with an inference time of just 7.88 ms on the TPU, significantly reducing CPU utilization 8.3-fold.
|
|
11:15-12:30, Paper MoBT2.3 | Add to My Program |
Efficient Extrinsic Manual-Calibration Method for Vehicle-Mounted Surround View Cameras Using Relative Pose Estimation |
|
Nakashima, Hiroyuki | Honda Motor Co., Ltd |
Oshiyama, Hiroki | Artner Co., Ltd |
Saigusa, Shigenobu | Honda R&D Americas, Inc |
Keywords: Level 2 ADAS Control Techniques, 3D Scene Reconstruction Methods
Abstract: In this paper, we propose a simple and accurate manual calibration method for the extrinsics of vehicle-mounted surround view cameras, utilizing relative camera pose estimation. This method addresses the labor and complexity associated with manual calibration of multi-camera systems. Our proposed method mitigates the accumulation of calculation errors that occurs when sequentially estimating relative poses for multiple cameras.
|
|
11:15-12:30, Paper MoBT2.4 | Add to My Program |
MonoSORT3D: A Monocular Approach for Online Auxiliary-Free Multi-Object Tracking |
|
Khonsari, Rana | University of Saarland |
Eisemann, Leon | Porsche Engineering Group GmbH |
Vozniak, Igor | DFKI |
Müller, Christian | German Research Center for Artificial Intelligence |
Maucher, Johannes | Stuttgart Media University |
Keywords: Dynamic Object Tracking, Static and Dynamic Object Detection Algorithms, Data Annotation and Labeling Techniques
Abstract: Using a mono camera in ADAS as the primary sensor offers a significant advantage by minimizing system complexity, since no extra fusion step is needed. This approach also leverages recent advancements in computer vision and deep learning, which enable high levels of environmental understanding and scene analysis from visual data alone. As such, mono-camera setups hold promise for achieving reliable perception at scale and have led to growing interest in monocular approaches, particularly for detecting and tracking dynamic objects. However, traditional mono-camera methods are often dependent on auxiliary inputs from GPS or maps, which may be unreliable in complex terrain or areas with poor signal coverage. In this work, we introduce an effective monocular 3D multi-object tracking approach, called MonoSORT3D, which operates without requiring additional auxiliary inputs. Evaluation of our method on the KITTI and MOT17 datasets demonstrates competitive performance against state-of-the-art methods. Additionally, we provide an in-depth analysis of the MonoSORT3D architecture by conducting an ablation study on its components.
|
|
11:15-12:30, Paper MoBT2.5 | Add to My Program |
Calibrating the Full Predictive Class Distribution of 3D Object Detectors for Autonomous Driving |
|
Schröder, Cornelius | Technical University Munich (TUM) |
Schlüter, Marius-Raphael | Technical University Munich (TUM) |
Lienkamp, Markus | Lehrstuhl Für Fahrzeugtechnik, TU München |
Keywords: Deep Learning Based Approaches, Static and Dynamic Object Detection Algorithms
Abstract: In autonomous systems, precise object detection and uncertainty estimation are critical for self-aware and safe operation. This work addresses confidence calibration for the classification task of 3D object detectors. We argue that it is necessary to consider the calibration of the full predictive confidence distribution over all classes, and deduce a metric which captures the calibration of dominant and secondary class predictions. We propose two auxiliary regularizing loss terms which introduce either calibration of the dominant prediction or of the full prediction vector as a training goal. We evaluate a range of post-hoc and train-time methods for CenterPoint, PillarNet and DSVT-Pillar and find that combining our loss term, which regularizes for calibration of the full class prediction, with isotonic regression leads to the best calibration of CenterPoint and PillarNet with respect to both dominant and secondary class predictions. We further find that DSVT-Pillar cannot be jointly calibrated for dominant and secondary predictions using the same method.
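The isotonic regression mentioned above as a post-hoc calibration step has the pool-adjacent-violators algorithm at its core. A minimal pure-Python sketch (not the paper's implementation; in practice the fit maps predicted confidences, sorted by score, onto empirical correctness rates):

```python
def pool_adjacent_violators(y):
    """Fit a nondecreasing sequence to y in the least-squares sense,
    the core step of isotonic regression used for confidence calibration."""
    # Each block stores [sum, count]; merge backwards whenever the
    # running block means violate monotonicity.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # expand each block to its mean
    return out
```

For example, `pool_adjacent_violators([0.2, 0.1, 0.4, 0.3])` returns `[0.15, 0.15, 0.35, 0.35]`: each violating pair is pooled to its mean, yielding a monotone calibration map.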
|
|
11:15-12:30, Paper MoBT2.6 | Add to My Program |
Adversarial Attacked Teacher for Domain Adaptive Object Detection under Poor Visibility Conditions |
|
Wang, Kaiwen | Karlsruhe Institute of Technology |
Shen, Yinzhe | Karlsruhe Institute of Technology |
Lauer, Martin | Karlsruher Institut Für Technologie |
Keywords: Perception Algorithms for Adverse Weather Conditions, Techniques for Dataset Domain Adaptation
Abstract: Camera-based object detection encounters challenges in adverse weather, which can compromise the robustness of the perception module within autonomous driving systems. Cutting-edge domain adaptive object detection methods use the teacher-student framework and domain adversarial learning to generate domain-invariant pseudo-labels for self-training. However, the pseudo-labels generated by the teacher model often exhibit a bias toward the majority class, incorporating overconfident false positives and underconfident false negatives. We reveal that pseudo-labels vulnerable to adversarial attacks are more likely to be of low quality. To address this issue, we propose a simple yet effective framework named Adversarial Attacked Teacher (AAT) to improve pseudo-label quality. Specifically, we apply adversarial attacks on the teacher model, prompting it to generate adversarial pseudo-labels to correct bias, suppress overconfidence, and encourage underconfident proposals. We introduce an adaptive pseudo-label regularization to emphasize the influence of pseudo-labels with high certainty and reduce the negative impacts of uncertain predictions. Moreover, reliable minority pseudo-labels, verified by pseudo-label regularization, are oversampled to minimize dataset imbalance without introducing false positives. AAT establishes a new state-of-the-art, achieving 53.0 mAP on the Cityscapes to Foggy Cityscapes benchmark. The code is publicly available at https://github.com/KIT-MRT/AAT/.
|
|
11:15-12:30, Paper MoBT2.7 | Add to My Program |
FADet: A Multi-Sensor 3D Object Detection Network Based on Local Featured Attention |
|
Guo, Ziang | Skolkovo Institute of Science and Technology |
Yagudin, Zakhar | Skolkovo Institute of Science and Technology |
Asfaw, Selamawit | Skolkovo Institute of Science and Technology |
Lykov, Artem | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Keywords: Advanced Multisensory Data Fusion Algorithms, Static and Dynamic Object Detection Algorithms, Deep Learning Based Approaches
Abstract: Camera, LiDAR, and radar are common perception sensors for autonomous driving tasks. Robust prediction of 3D object detection is optimally based on the fusion of these sensors. Taking advantage of their abilities remains a challenge, because each of these sensors has its own characteristics. Specifically, different sensors present different scales in their corresponding extracted features. To address this problem, considering the feature alignment in different scales, in this paper, we propose FADet, a multi-sensor 3D detection network, which specifically studies the characteristics of different sensors across the dimensions of their data input based on our local featured attention modules. For camera images, we propose a dual-attention-based submodule. For LiDAR point clouds, the triple-attention-based submodule is utilized, while the mixed-attention-based submodule is applied for features of radar points. With local featured attention submodules, our FADet has effective detection results in long-tail and complex scenes from camera, LiDAR and radar input. In the NuScenes validation dataset, FADet achieves state-of-the-art performance on LiDAR-camera object detection tasks with 71.8% NDS and 69.0% mAP, at the same time, on radar-camera object detection tasks with 51.7% NDS and 40.3% mAP.
|
|
11:15-12:30, Paper MoBT2.8 | Add to My Program |
HiLO: High-Level Object Fusion for Autonomous Driving Using Transformers |
|
Osterburg, Timo | TU Dortmund |
Albers, Franz | Technical University of Dortmund |
Diehl, Christopher | Technische Universität Dortmund |
Pushparaj, Rajesh | TU Dortmund |
Bertram, Torsten | Technische Universität Dortmund |
Keywords: Deep Learning Based Approaches, Static and Dynamic Object Detection Algorithms
Abstract: The fusion of sensor data is essential for a robust perception of the environment in autonomous driving. Learning-based fusion approaches mainly use feature-level fusion to achieve high performance, but their complexity and hardware requirements limit their applicability in near-production vehicles. High-level fusion methods offer robustness with lower computational requirements. Traditional methods, such as the Kalman filter, dominate this area. This paper modifies the Adapted Kalman Filter (AKF) and proposes a novel transformer-based high-level object fusion method called HiLO. Experimental results demonstrate improvements of 25.9 percentage points in F1-score and 6.1 percentage points in mean IoU. Evaluation on a new large-scale real-world dataset demonstrates the effectiveness of the proposed approaches. Their generalizability is further validated by cross-domain evaluation between urban and highway scenarios. Code, data, and models are available at https://github.com/rst-tu-dortmund/HiLO
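The Kalman-filter baseline for high-level (object-list) fusion mentioned above can be illustrated by covariance-weighted averaging of two sensors' estimates of the same object state. This is a generic sketch of the classical approach, not HiLO or the paper's AKF; the scalar state is an assumption:

```python
def fuse(x1, var1, x2, var2):
    """Fuse two independent estimates of the same object state
    (e.g., longitudinal position) by inverse-variance weighting,
    the scalar form of the Kalman measurement update."""
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return x, var
```

For instance, `fuse(10.0, 1.0, 12.0, 3.0)` returns `(10.5, 0.75)`: the fused estimate lies closer to the more certain sensor and its variance is lower than either input's.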
|
|
11:15-12:30, Paper MoBT2.9 | Add to My Program |
Edge-Deployable Spatiotemporal Modeling Network for Vehicle Behavior Recognition |
|
Li, Gaojie | Xi'an Jiaotong University |
Li, Yaochen | Xi'an Jiaotong University |
Zhang, Ying | Xi'an Jiaotong University |
Wang, Yutong | Xi'an Jiaotong University |
Hao, Sibo | Xi'an Jiaotong University |
Su, Yuanqi | Xi'an Jiaotong University |
Keywords: End-to-End Neural Network Architectures and Techniques, Deep Learning Based Approaches
Abstract: Vehicle behavior recognition is essential for autonomous vehicles to quickly perceive and respond to their driving environment. In this paper, an edge-deployable spatiotemporal modeling network for vehicle behavior recognition is developed. Firstly, an Efficient SpatioTemporal Modeling (ESTM) Block is designed to extract both long-term evolution features and short-term motion information. Secondly, a Channel-Enhanced Spatial Modeling (CESM) Block is developed to capture the interdependencies among channels in spatial modeling. The proposed network can effectively process video input from onboard cameras while minimizing computational parameters and FLOPs. The combination of the ESTM and CESM blocks produces rich and effective features for vehicle behavior recognition on edge devices. The experimental results demonstrate the effectiveness of the proposed method in real-world driving scenarios.
|
|
11:15-12:30, Paper MoBT2.10 | Add to My Program |
SAM-Maps: Road Map Generation for Automated Vehicles in Urban Areas |
|
van Andel, Matthijs Pieter | Delft University of Technology |
Boekema, Hidde | TU Delft |
Gavrila, Dariu M. | TU Delft |
Keywords: Geometric vs. Semantic Mapping, Motion Forecasting, Foundation Models Based Approaches
Abstract: Automated Vehicles (AVs) rely on up-to-date map information to inform trajectory prediction and planning modules, but these maps are expensive to obtain and update as they are usually annotated by humans. We propose SAM-Maps, a method for automatically generating road maps from aerial images of urban areas that takes advantage of the power of foundation models, requiring no human annotation or additional training to map unseen areas. This method extracts a coarse road graph from the images and then estimates the geometry of the roads from this graph. We evaluate our model on the challenging road layouts of the recent View-of-Delft Prediction dataset by comparing the maps generated using our model to the human-annotated maps, achieving an IoU of 33.3% with our automatic method and an IoU of 56.1% with some human corrections in our method. We also evaluate a trajectory prediction model on our maps to test whether they are sufficiently accurate for downstream tasks. The performance of this model using the map from our automatic method is 37.9% better on the minADE6 metric than not using map data as input. To the best of our knowledge, this is the first method that extracts both the drivable area and road connections of European urban areas from aerial images. The code will be publicly released for research purposes.
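The IoU figures above compare generated road maps against human-annotated ones. For reference, intersection-over-union on rasterized masks can be computed as follows (a generic sketch under the assumption that both maps are rasterized to sets of occupied grid cells; not the authors' evaluation code):

```python
def mask_iou(pred_cells, gt_cells):
    """Intersection over union of two rasterized road masks,
    each given as a set of occupied (row, col) grid cells."""
    pred, gt = set(pred_cells), set(gt_cells)
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: define as perfect agreement
    return len(pred & gt) / len(union)
```

For example, masks `{(0,0), (0,1), (1,0)}` and `{(0,0), (1,0), (1,1)}` share 2 cells out of 4 in their union, giving an IoU of 0.5.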
|
|
11:15-12:30, Paper MoBT2.11 | Add to My Program |
Beyond Object Detection with Existence Maps for Anchor-Based Deep Learning Models |
|
Ramos Ferreira, Filipa | FEUP |
Rossetti, Rosaldo | University of Porto - FEUP |
Keywords: Deep Learning Based Approaches, Static and Dynamic Object Detection Algorithms
Abstract: LiDAR-based object detection models have achieved impressive accuracy in autonomous driving benchmarks. However, despite improvements in efficiency, these models lack interpretability, and measured accuracies are heavily dataset-dependent, reflecting closed-set performance. Further, in the real world, many dynamic situations arise, inducing inevitable perception errors. To address these limitations, we propose a twofold approach: first, we introduce the Existence Map, a method to visualise the internal knowledge of deep learning models that suggests the existence of objects, and second, we propose a methodology to merge this information with the standard output, supplementing detections and calibrating final confidences, reducing the mean absolute error by 12.3%. Further, our experiments on the KITTI dataset demonstrate that the merging strategy can enhance precision and recall by 0.03 and 0.04, respectively, when evaluated across all ground-truth classes, despite the model being trained only on cars, pedestrians and cyclists. Additionally, we show that existence maps can help identify missed objects, reduce false positives, and capture location uncertainties, leading to improved performance and increased interpretability in safety-critical object detection applications.
|
|
11:15-12:30, Paper MoBT2.12 | Add to My Program |
Cross-Level Sensor Fusion with Object Lists Via Transformer for 3D Object Detection |
|
Liu, Xiangzhong | Fortiss GmbH Research Institute of the Free State of Bavaria Associated with Technical University of Munich |
Zhang, Jiajie | Technical University of Munich |
Shen, Hao | Fortiss GmbH |
Keywords: Advanced Multisensory Data Fusion Algorithms, Deep Learning Based Approaches, Cooperative Perception and Localization Techniques
Abstract: In automotive sensor fusion systems, smart sensors and Vehicle-to-Everything (V2X) modules are commonly utilized. Sensor data from these systems are typically available only as processed object lists rather than raw sensor data from traditional sensors. Instead of processing other raw data separately and then fusing them at the object level, we propose an end-to-end cross-level fusion concept with Transformer, which integrates highly abstract object list information with raw camera images for 3D object detection. Object lists are fed into a Transformer as denoising queries and propagated together with learnable queries through the latter feature aggregation process. Additionally, a deformable Gaussian mask, derived from the positional and size dimensional priors from the object lists, is explicitly integrated into the Transformer decoder. This directs attention toward the target area of interest and accelerates model training convergence. Furthermore, as there is no public dataset containing object lists as a standalone modality, we propose an approach to generate pseudo object lists from ground-truth bounding boxes by simulating state noise and false positives and negatives. As the first work to conduct cross-level fusion, our approach shows substantial performance improvements over the vision-based baseline on the nuScenes dataset. It demonstrates its generalization capability over diverse noise levels of simulated object lists and real detectors.
|
|
11:15-12:30, Paper MoBT2.13 | Add to My Program |
PDB-Eval: An Evaluation of Large Multimodal Models for Description and Explanation of Personalized Driving Behavior |
|
Wu, Junda | UCSD |
Echterhoff, Jessica | UC San Diego |
Han, Kyungtae | Toyota Motor North America |
Abdelraouf, Amr | Toyota North America R&D |
Gupta, Rohit | Toyota Motor North America R&D |
McAuley, Julian | UC San Diego |
Keywords: Synthetic Data Generation for Training, Foundation Models Based Approaches, Data Augmentation Techniques Using Neural Networks
Abstract: Understanding a driver's behavior and intentions is important for potential risk assessment and early accident prevention. Safety and driver assistance systems can be tailored to individual drivers' behavior, significantly enhancing their effectiveness. However, existing datasets are limited in describing and explaining general vehicle movements based on external visual evidence. This paper introduces a benchmark, PDB-Eval, for a detailed understanding of Personalized Driver Behavior, and aligning Large Multimodal Models (MLLMs) with driving comprehension and reasoning. Our benchmark consists of two main components, PDB-X and PDB-QA. PDB-X can evaluate MLLMs' understanding of temporal driving scenes. Our dataset is designed to find valid visual evidence from the external view to explain the driver's behavior from the internal view. To align MLLMs' reasoning abilities with driving tasks, we propose PDB-QA as a visual explanation question-answering task for MLLM instruction fine-tuning. As a generic learning task for generative models like MLLMs, PDB-QA can bridge the domain gap without harming MLLMs' generalizability. Our evaluation indicates that fine-tuning MLLMs on fine-grained descriptions and explanations can effectively bridge the gap between MLLMs and the driving domain, which improves zero-shot performance on question-answering tasks by up to 73.2%. We further evaluate the MLLMs fine-tuned on PDB-X in Brain4Cars' intention prediction and AIDE's recognition tasks. We observe up to 12.5% performance improvements on the turn intention prediction task in Brain4Cars, and consistent performance improvements up to 11.0% on all tasks in AIDE.
|
|
11:15-12:30, Paper MoBT2.14 | Add to My Program |
Weight Pruning to Mitigate Class-Specific Accuracy Degradation for LiDAR-Based 3D Object Detection |
|
Ito, Tenshi | Chubu University |
Hirakawa, Tsubasa | Chubu University |
Yamashita, Takayoshi | Chubu University |
Fujiyoshi, Hironobu | Chubu University |
Keywords: Static and Dynamic Object Detection Algorithms, Deep Learning Based Approaches
Abstract: The realization of autonomous driving systems requires efficient and accurate 3D object detection to identify objects such as vehicles, pedestrians, and cyclists in the driving environment from point cloud data. To achieve both high-speed processing and high accuracy, model size must be reduced with model compression techniques, such as pruning, while maintaining performance. However, pruning for 3D object detection tasks has not been studied extensively, and the effects of applying existing pruning methods to 3D object detection models remain unclear. In this paper, we clarify the problems of pruning 3D object detection models with existing methods through preliminary experiments and propose a pruning method for 3D object detection models that solves these problems. Our preliminary experiments reveal that existing pruning methods significantly degrade detection performance for specific object classes. To address this issue, we propose a pruning method that preserves class-specific knowledge, mitigating biased accuracy degradation across object classes. Experimental results on the KITTI dataset demonstrate that the proposed method can be combined with existing pruning methods without conflicts and achieves higher accuracy than existing methods.
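For context, the common baseline such work builds on is global magnitude pruning, which zeroes the smallest-magnitude weights. A minimal sketch (illustrative only, not the class-preserving method proposed in the paper; the flat weight list is an assumption — real models prune per-layer tensors):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest |w|
    (global magnitude pruning, a common compression baseline).
    Note: ties at the threshold magnitude are all pruned."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest |w|.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

For example, `magnitude_prune([0.5, -0.1, 0.3, -0.05], 0.5)` returns `[0.5, 0.0, 0.3, 0.0]`; applied uniformly across classes, such pruning can disproportionately remove weights important to rare classes, which is the degradation the paper targets.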
|
|
11:15-12:30, Paper MoBT2.15 | Add to My Program |
Infrastructure Based Detection, Tracking and Modelling of Traffic Participants for Realistic Digital Twin Representation and Behavior Prediction (I) |
|
Kocsis, Mihai | Heilbronn University |
Heinrich, Erik | Heilbronn University |
Schnepf, Florian | Heilbronn University of Applied Sciences |
Zöllner, Raoul | University of Heilbronn |
Keywords: Cooperative Perception and Localization Techniques, Dynamic Object Tracking, Synthetic Data Generation for Training
Abstract: Traffic congestion and inefficiencies in urban mobility remain significant challenges, necessitating advanced solutions that leverage real-world data and intelligent control strategies. To optimize traffic flows, e.g., through traffic light control, an analysis of real-world scenarios in urban hotspots is required. This research integrates data on real traffic participants and traffic light status recorded in the Test Field Autonomous Driving Baden-Württemberg (TAF-BW) into a digital twin. The focus lies on object detection and tracking in the test field, the integration of this data into a digital twin, and accurate modelling of pedestrians. The real and simulated data form the basis for training behavioral models of various traffic participants. Based on the intentions of these participants, traffic light signal control can be used to optimize traffic flow.
|
|
11:15-12:30, Paper MoBT2.16 | Add to My Program |
LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring (I) |
|
Domínguez Sánchez, Raul David | Technical University of Munich/SETLabs Research GmbH |
Díaz, Xavier | Setlabs Research GmbH |
Zhou, Xingcheng | Technical University of Munich |
Ronecker, Max Peter | SETLabs Research GmbH |
Karner, Michael | SETLabs Research GmbH |
Watzenig, Daniel | Virtual Vehicle Research Center |
Knoll, Alois | Technische Universität München |
Keywords: Advanced Multisensory Data Fusion Algorithms, Deep Learning Based Approaches, Lidar-Based Environment Mapping
Abstract: Railway systems, particularly in Germany, require high levels of automation to address legacy infrastructure challenges and increase train traffic safely. A key component of automation is robust long-range perception, essential for early hazard detection, such as obstacles at level crossings or pedestrians on tracks. Unlike automotive systems with braking distances of ~70 meters, trains require perception ranges exceeding 1 km. This paper presents a deep-learning-based approach for long-range 3D object detection tailored for autonomous trains. The method relies solely on monocular images, inspired by the Faraway-Frustum approach, and incorporates LiDAR data during training to improve depth estimation. The proposed pipeline consists of four key modules: (1) a modified YOLOv9 for 2.5D object detection, (2) a depth estimation network, and (3-4) dedicated short- and long-range 3D detection heads. Evaluations on the OSDaR23 dataset demonstrate the effectiveness of the approach in detecting objects up to 250 meters away. Results highlight its potential for railway automation and outline areas for future improvement.
|
|
11:15-12:30, Paper MoBT2.17 | Add to My Program |
MH-CDNet: Map and History-Aided Change Detection of Traffic Signs in High-Definition Maps |
|
Zhong, Yangyi | Wuhan University |
Guo, Yuxiang | Wuhan University |
Yue, Peng | Wuhan University |
Cai, Chuanwei | Wuhan University |
Li, Jian | State Key Laboratory of Intelligent Vehicle Safety Technology Cho |
Kai, Yan | Chongqing ChangAn Auto |
|
|
11:15-12:30, Paper MoBT2.18 | Add to My Program |
Q-Loc: Visual Cue-Based Ground Vehicle Localization Using Long Short-Term Memory |
|
Malinchock, Cole | North Carolina State University |
Yu, Jimin | North Carolina State University |
Thapa, Pratik | North Carolina State University |
Ungrupulithaya, Dhruva | North Carolina State University |
Yoon, Man-Ki | North Carolina State University |
Keywords: Continuous Localization Solutions, 3D Scene Reconstruction Methods, Dynamic Object Tracking
Abstract: Mobile autonomous systems are increasingly being deployed in controlled environments worldwide, with large fleets of ground robots performing tasks such as delivery and surveillance. These systems require reliable localization to navigate through such environments. While the Global Positioning System (GPS) is commonly implemented in these systems, urban environments can introduce inaccuracies due to signal blockages caused by large buildings and structures, or even complete signal loss. This paper proposes a rapid and cost-effective localization method using a sensor ubiquitous in autonomous systems: cameras. We introduce a system that uses vision-based machine learning techniques to detect common landmarks in camera streams and subsequently predict location. The system employs advanced object detection models for landmark identification and recurrent neural networks for vehicle localization based on the detected landmarks. We prototype these techniques on a small-scale autonomous vehicle platform to demonstrate the system’s capabilities and evaluate its accuracy and execution efficiency in real-world scenarios.
|
|
MoBT3 Poster Session, Raffaello + Lobby Right |
Add to My Program |
Poster 1.3 >> Datasets & Neural Scene Representation |
|
|
Chair: Hornauer, Sascha | MINES Paristech |
Co-Chair: López, Antonio M. | Universitat Autònoma De Barcelona |
|
11:15-12:30, Paper MoBT3.1 | Add to My Program |
Enhancing Data Efficiency for Training Object Detectors |
|
Höhne, Mirco Oliver | Robert Bosch GmbH |
Menke, Maximilian | Robert Bosch GmbH |
Bieshaar, Maarten | Robert Bosch GmbH |
Keywords: Profile Extraction and Discovery from Datasets, Data Annotation and Labeling Techniques, Deep Learning Based Approaches
Abstract: Deep learning has transformed object detection in autonomous driving and robotics. Yet, it requires training with large datasets, driving up costs and resource demands. While data reduction techniques offer a solution, most research has focused on image classification, leaving object detection largely unexplored. This paper introduces, adapts, and evaluates different data reduction strategies for 2D object detection. Experiments with Faster R-CNN on nuImages and BDD100K reveal that: (1) Reduction methods based on loss prove to be both simple and effective, achieving up to 40% dataset reduction while preserving model performance, (2) We introduce a novel predictive measure for dataset quality, leveraging intrinsic dimensionality to evaluate dataset diversity. This metric achieves up to 80% alignment (Spearman correlation) with the final performance on the full dataset, enabling efficient pre-evaluation of potential reductions and streamlining the reduction process, (3) We investigate the impact of label errors on data reduction, revealing their influence, especially at high dataset compression rates, and offering key insights for developing robust reduction strategies.
|
|
11:15-12:30, Paper MoBT3.2 | Add to My Program |
PedGT: Enhancing Pedestrian Intention Prediction Using a Skeleton-Based Graph-Transformer |
|
Riaz, Muhammad Naveed | Computer Vision Center (CVC), Universitat Autònoma De Barcelona |
Wielgosz, Maciej | Computer Vision Center (CVC) |
Xie, Chen | Jilin University |
López, Antonio M. | Universitat Autònoma De Barcelona |
Keywords: Synthetic Data Generation for Training, Data Annotation and Labeling Techniques, Vulnerable Road User Protection Strategies
Abstract: Accurately predicting pedestrian crossings in front of ego-vehicles is essential for intelligent transportation systems (ITS) to enhance road safety. Many existing approaches rely on multiple input modalities, such as scene images, segmentation maps, and trajectory data, which introduce complexity and inefficiencies, thus limiting real-time applicability. To address these challenges, we propose PedGT, a graph-based transformer model that integrates a graph convolutional network (GCN) for spatial feature extraction and a transformer encoder for temporal modeling. Unlike multi-modal methods, PedGT simplifies the pipeline by utilizing only pedestrian pose keypoints and bounding box center points, achieving superior performance on two benchmark datasets. On PIE, it achieves an F1 score of 91% and a recall of 93%, surpassing the previous best of 89% and 88% by PCPNet. On JAAD, PedGT improves F1 and recall to 70%, outperforming PedFormer’s 54% and 48%. Ablation studies highlight the impact of data normalization on accuracy, while frame importance analysis identifies keyframes influencing predictions. This work demonstrates that selecting optimal inputs and leveraging an efficient spatial-temporal model enable PedGT to outperform multi-modal solutions, providing a more streamlined and effective approach for pedestrian intention prediction.
|
|
11:15-12:30, Paper MoBT3.3 | Add to My Program |
Really, Pedestrian Trajectories: How Realistic Are the Datasets? |
|
Dietl, Laura | Technische Hochschule Ingolstadt |
Facchi, Christian | Technische Hochschule Ingolstadt |
Keywords: Automotive Datasets, Motion Forecasting
Abstract: The accurate prediction of pedestrian trajectories is a crucial ability for systems to safely navigate in real-world traffic scenarios. However, pedestrian trajectory prediction is challenging due to the social, environmental, and intra-personal context, as well as the multimodal nature of future paths. The effectiveness of path prediction models in accounting for these factors is highly dependent on the dataset on which they are trained. Moreover, in order for such models to be utilized in real-world traffic scenarios, the datasets need to reflect their characteristics, such as various infrastructure elements or pedestrian interactions with a range of road user classes. This work examines how realistic four common pedestrian trajectory datasets are, namely the BIWI Walking Pedestrians (ETH) dataset, the Crowds UCY/Zara (UCY) dataset, the Intersection Drone (inD) dataset and the Stanford Drone Dataset (SDD). To this end, a complexity classification scheme is defined that categorizes pedestrian trajectories based on a newly developed Social Attention (SA) factor, trajectory non-linearity, and the number of starts and stops in a trajectory. The datasets are evaluated based on these factors, the complexity classification, and the environmental context, and then compared to one another. This work provides a source of information about strengths and limitations of the datasets, the complexity of individual pedestrian paths, the behaviors that may be learnable, and the application areas for which they may be best suited.
|
|
11:15-12:30, Paper MoBT3.4 | Add to My Program |
IDD-CRS: A Comprehensive Video Dataset for Critical Road Scenarios in Unstructured Environments |
|
Mishra, Ravi Shankar | International Institute of Information Technology, Hyderabad |
Parikh, Chirag | International Institute of Information Technology, Hyderabad |
Subramanian, Anbumani | INAI, International Institute of Information Technology, Hyderabad |
Jawahar, C. V. | IIIT Hyderabad |
Sarvadevabhatla, Ravi Kiran | International Institute of Information Technology, Hyderabad |
Keywords: Data Annotation and Labeling Techniques, Vulnerable Road User Protection Strategies, Advanced Passive Safety Systems
Abstract: In this work, we present IDD-CRS, a large-scale dataset focused on critical road scenarios, captured using Advanced Driver Assistance Systems (ADAS) and dash cameras. Unlike existing datasets that predominantly emphasize pedestrian safety and vehicle safety separately, IDD-CRS incorporates both vehicle and pedestrian behaviors, offering a more comprehensive view of road safety. The dataset includes diverse scenarios, such as high-speed lane changes, unsafe vehicle approaches to pedestrians and cyclists, and complex interactions between ego vehicles and other road agents. Leveraging ADAS technology allows us to accurately define the temporal boundaries of actions, resulting in precise annotations and more reliable safety analysis. With 90 hours of video footage, consisting of 5400 one-minute-long videos and 135,000 frames, IDD-CRS introduces new vehicle-related classes and hard negative classes, establishing baselines for action recognition and long-tail action recognition tasks. Our benchmarks reveal the limitations of current models, pointing toward future advancements needed for improving road safety technology.
|
|
11:15-12:30, Paper MoBT3.5 | Add to My Program |
RealDriveSim: A Realistic Multi-Modal Multi-Task Synthetic Dataset for Autonomous Driving |
|
Jadon, Arpit | German Aerospace Center |
Wang, Haoran | Max-Planck Institut Für Informatik, Saarland Informatics Campus, |
Thomas, Phillip | Parallel Domain |
Stanley, Michael | Zoox, San Francisco, California |
Cibik, S. Nathaniel | Parallel Domain, San Francisco, California |
Laurat, Rachel | Parallel Domain, San Francisco, California |
Maher, Omar | Monta AI, Sacramento, California |
Hoyer, Lukas | ETH Zurich |
Unal, Ozan | Huawei Technologies, Zurich Research Center, Zurich |
Dai, Dengxin | Huawei Technologies, Zurich Research Center, Zurich |
Keywords: Automotive Datasets, Data Annotation and Labeling Techniques, Synthetic Data Generation for Training
Abstract: As perception models continue to develop, the need for large-scale datasets increases. However, data annotation remains far too expensive to effectively scale and meet the demand. Synthetic datasets provide a solution to boost model performance with substantially reduced costs. However, current synthetic datasets remain limited in scope and realism, and are designed for specific tasks and applications. In this work, we present RealDriveSim, a realistic multi-modal synthetic dataset for autonomous driving that not only supports popular 2D computer vision applications but also their LiDAR counterparts, providing fine-grained annotations for up to 64 classes. We extensively evaluate our dataset for a wide range of applications and domains, demonstrating state-of-the-art results compared to existing synthetic benchmarks. The dataset is publicly available at https://realdrivesim.github.io/.
|
|
11:15-12:30, Paper MoBT3.6 | Add to My Program |
ICF-Body: A Multimodal Sensor Fusion Dataset for In-Cabin Estimation of Occupant Body Pose and Anthropometric Measurements |
|
Preu, Victor | Volkswagen AG |
Dihora, Savan | Volkswagen AG |
Rygol, Tim | Volkswagen AG |
Pauer, Daniel | Volkswagen AG |
Almeida, Pedro | Volkswagen AG |
Hecker, Peter | Technische Universität Braunschweig |
Keywords: Automotive Datasets, Advanced Multisensory Data Fusion Algorithms, Advanced Passive Safety Systems
Abstract: We introduce ICF-Body, a multimodal in-cabin sensor fusion dataset. Our dataset aims to allow the development and benchmarking of in-cabin sensor fusion algorithms for estimating (a) occupant body pose and (b) anthropometric measurements. Both tasks are closely related, as knowledge of body dimensions and proportions ensures a consistent body model across different poses. ICF-Body is the first dataset that contains temporal data from a memory seat configuration sensor (SCS) and a belt webbing extraction sensor (WES), in addition to near-infrared (NIR) images, RGB images, ultra-wideband (UWB), and 60 GHz radar. A time-of-flight (ToF) camera was used for dynamic body pose ground truth labeling, providing thirteen 3D keypoints. Seven body measurements were obtained as static anthropometric ground truth labels for each participant. The availability of accurate estimates of (a) body pose and (b) anthropometric measurements will be crucial for future restraint systems to allow the precise adaptation of deployment characteristics.
|
|
11:15-12:30, Paper MoBT3.7 | Add to My Program |
Cross-Cultural Analysis of Car-Following Dynamics: A Comparative Study of Open-Source Trajectory Datasets |
|
Taourarti, Imane | Ensta Paris / Renault Group |
Tapus, Adriana | ENSTA ParisTech |
Monsuez, Bruno | Ecole Nationale Supérieure Des Techniques Avancées |
Ibanez Guzman, Javier | Renault S.A.S. |
Ramaswamy, Arunkumar | Renault |
Keywords: Level 2 ADAS Control Techniques, Automotive Datasets, Profile Extraction and Discovery from Datasets
Abstract: This study addresses the critical need for refined, reliable, and complete real-world trajectory data in the development of Advanced Driver Assistance Systems (ADAS), particularly for Adaptive Cruise Control (ACC) functions. We conducted a comprehensive comparison of car-following and deceleration scenarios across ten open-source datasets from multiple countries, encompassing both highway and urban environments. Focusing on key kinematic variables crucial for longitudinal behavior, we employed statistical measures and safety metrics to compare datasets across different driving regulations and road designs. Our findings reveal substantial overlaps in the distributions of logical parameters, despite the varied data sources and cultural contexts. However, we noted significant differences in safety-critical metrics, such as Time Headway and Time To Collision (TTC), highlighting culture-specific driving behaviors. Interestingly, Chinese datasets consistently exhibited the smallest distance headways across all scenarios, yet maintained high TTC values (around 16s) compared to other datasets, suggesting a unique approach to risk management. To quantify these differences, we calibrated the Intelligent Driver Model using U.S. data and evaluated its transferability, demonstrating remarkable performance degradation when applied to non-U.S. datasets. These results provide crucial insights for developing globally applicable, yet culturally sensitive safety assessment methodologies for next-generation automated vehicles, highlighting the need for adaptive ADAS technologies that can accommodate regional driving norms while maintaining consistent safety standards.
|
|
11:15-12:30, Paper MoBT3.8 | Add to My Program |
An Effective and Robust Driving Scenario Identification Framework Utilizing Unsupervised Covariance Clustering |
|
Zeng, Zifan | Huawei Technologies Duesseldorf GmbH; Technical University of Munich |
Liu, Shiming | Huawei Technologies Co., Ltd |
Bao, Zhenyu | Huawei Technologies Co., Ltd |
Zhang, Qunli | Huawei Technologies Duesseldorf GmbH |
Wang, Peng | Huawei Technologies Co., Ltd., RAMS Lab |
Hu, Zheng | Huawei |
Keywords: Profile Extraction and Discovery from Datasets, Data Annotation and Labeling Techniques, Safety Verification and Validation Techniques
Abstract: The technology of autonomous driving vehicles has made rapid progress over the last decade, but the challenge of proving these systems' safety still exists. Compared to conventional mile-based testing, scenario-based testing (SBT) is a more promising solution since scenarios covering diverse and rare driving conditions in real traffic can be simulated to assess the system's performance in safety-critical scenarios. Furthermore, understanding run-time scenarios is vital to trigger safety mechanisms designed for the Safety Of The Intended Functionality (SOTIF). However, a challenging task is to extract or generate scenarios in the design phase and recognize driving scenarios in the run-time phase due to the complexity and diversity of driving scenarios, especially the interaction with other driving agents. In this study, we propose a complete framework for offline extraction and online identification of all kinds of interaction scenarios. A covariance-clustering-based method is adopted to identify the meta-driving actions, which uses the Toeplitz matrix to achieve more interpretable clustering results than distance-based methods. Experiments demonstrate the effectiveness of our method by its robust identification results for cut-in and merge-in scenarios. With a lightweight design and a theoretically valid confidence estimation method, our approach is computationally efficient for reliable online applications.
|
|
11:15-12:30, Paper MoBT3.9 | Add to My Program |
BEV-LLM: Leveraging Multimodal BEV Maps for Scene Captioning in Autonomous Driving |
|
Brandstätter, Felix | University of Applied Science Munich |
Schuetz, Erik | Munich University of Applied Sciences |
Winter, Katharina | Munich University of Applied Sciences |
Flohr, Fabian | Munich University of Applied Sciences |
Keywords: Foundation Models Based Approaches, Feedback Systems for Driver Interaction, Automotive Datasets
Abstract: Autonomous driving technology has the potential to transform transportation, but its wide adoption depends on the development of interpretable and transparent decision-making systems. Scene captioning, which generates natural language descriptions of the driving environment, plays a crucial role in enhancing transparency, safety, and human-AI interaction. We introduce BEV-LLM, a lightweight model for 3D captioning of autonomous driving scenes. BEV-LLM leverages BEVFusion to combine 3D LiDAR point clouds and multi-view images, incorporating a novel absolute positional encoding for view-specific scene descriptions. Despite using a small 1B parameter base model, BEV-LLM achieves competitive performance on the nuCaption dataset, surpassing the state of the art by up to 5% in BLEU scores. Additionally, we release two new datasets — nuView (focused on environmental conditions and viewpoints) and GroundView (focused on object grounding) — to better assess scene captioning across diverse driving scenarios and address gaps in current benchmarks, along with initial benchmarking results demonstrating their effectiveness.
|
|
11:15-12:30, Paper MoBT3.10 | Add to My Program |
Design and Development of a Digital Twin for Monitoring Railway Infrastructure |
|
Fuentes, Javier | University of Alcala |
Fierro, Franck | University of Alcala |
Barea, Rafael | University of Alcala |
López-Guillén, Elena | University of Alcalá |
Bergasa, Luis M. | University of Alcala |
Keywords: Synthetic Data Generation for Training, Dataset Augmentation Using Neural Field
Abstract: The aim of this work is to develop a digital twin application to ensure an optimal level of reliability when launching a larger project based on the identification of trains and detection of defects that allow for safe freight transport on Spanish trains. The digital twin framework consists of three parts: the “physical product”, which consists of a scanning camera placed on a track gantry; the “virtual product”, which includes a model based on real-time data representing the freight car detected by the perception system; and the data flow connections. The camera images will be post-processed through an artificial intelligence detection model (YOLOv8), trained to detect all the elements necessary for the safety of the vehicle and the cargo. Field studies have demonstrated the effectiveness of the proposed digital twin framework and its potential to identify railcars and detect defects in freight wagons.
|
|
11:15-12:30, Paper MoBT3.11 | Add to My Program |
How Hard Is Snow? A Paired Domain Adaptation Dataset for Clear and Snowy Weather: CADC+ |
|
Tang, Mei Qi | University of Waterloo |
Sedwards, Sean | University of Waterloo |
Huang, Chengjie | University of Waterloo |
Czarnecki, Krzysztof | University of Waterloo |
Keywords: Automotive Datasets, Techniques for Dataset Domain Adaptation, Data Annotation and Labeling Techniques
Abstract: Evaluating the impact of snowfall on 3D object detection requires a dataset with sufficient labelled data from both weather conditions, ideally captured in the same driving environment. Current datasets with LiDAR point clouds either do not provide enough labelled data in both domains, or rely on de-snowing methods to generate synthetic clear weather. Synthetic data often lacks realism and introduces an additional domain shift that confounds accurate evaluations. To address these challenges, we present CADC+, the first paired weather domain adaptation dataset for autonomous driving in winter conditions. CADC+ extends the Canadian Adverse Driving Conditions (CADC) dataset using clear weather data that was recorded on the same roads and in the same period as CADC. To create CADC+, we pair each CADC sequence with a clear weather sequence that matches the snowy sequence as closely as possible. CADC+ thus minimizes the domain shift resulting from factors unrelated to the presence of snow. We also present some preliminary results using CADC+ to evaluate the effect of snow on 3D object detection performance. We observe that snow introduces a combination of aleatoric and epistemic uncertainties, acting as both noise and a distinct data domain.
|
|
11:15-12:30, Paper MoBT3.12 | Add to My Program |
Methodology for Scalable LiDAR Datasets |
|
Sanchez Guitierrez-Cabello, Guillermo | Institute for Automobile Research (INSIA), Universidad Politécnica De Madrid |
Jiménez, Felipe | Universidad Politécnica De Madrid |
Talavera, Edgar | Universidad Politecnica De Madrid |
Keywords: Automotive Datasets, Deep Learning Based Approaches, Infrastructure Requirements for Automated Vehicles
Abstract: The use of Light Detection and Ranging (LiDAR) in intelligent transportation systems has gained increasing attention due to its ability to provide accurate three-dimensional representations of the environment and its robustness under adverse weather conditions compared to camera-based systems. However, traditional classification approaches, whether based on geometric features or deep learning, are highly dependent on sensor configuration and can suffer from occlusions or incomplete representations. This work proposes a classification- and retrieval-based methodology leveraging embedding vectors in a latent space generated through PointNet and a KNN-based label assignment strategy. A novel auto-labeling process is introduced, incorporating ID consistency filtering to ensure coherent label propagation across multiple captures of the same vehicle. Additionally, a combined proximity measure is used in KNN retrieval, integrating both latent space similarity and sensor distance correction, enhancing classification robustness, particularly for underrepresented categories. The proposed approach is validated on real-world LiDAR data captured from fixed infrastructure, addressing the scarcity of publicly available datasets in this configuration. Results demonstrate that this method provides an accurate, scalable, and sensor-independent classification framework, ensuring reliable label assignment across diverse traffic conditions.
|
|
11:15-12:30, Paper MoBT3.13 | Add to My Program |
TIAND-SLAM: A Multi-Modal SLAM Dataset for Autonomous Navigation |
|
Thakur, Abhishek | IIT Hyderabad |
S, Abhilash | Indian Institute of Technology, Hyderabad |
V, Samuktha | Indian Institute of Technology Hyderabad |
Pachamuthu, Rajalakshmi | Indian Institute of Technology, Hyderabad |
Keywords: Automotive Datasets
Abstract: Simultaneous Localization and Mapping (SLAM) is fundamental to autonomous navigation, relying on multimodal datasets to generate high-definition (HD) maps for accurate localization and path planning. However, most existing SLAM datasets primarily feature structured environments with abundant landmarks, such as urban areas with buildings and well-defined road infrastructures. In contrast, highways in semi-urban areas pose significant challenges due to the limited availability of prominent features. To bridge this gap, we introduce TIAND-SLAM (TiHAN-IITH Autonomous Navigation Dataset), a novel multi-modal dataset collected from highway roads, both under and above a flyover in Hyderabad, India, as well as from within the IIT Hyderabad (IITH) campus and the TiHAN testbed. TIAND-SLAM is designed to facilitate research on SLAM generalization in feature-scarce environments. The dataset includes 30 trajectories ranging from 50 meters to 2.5 kilometers, recorded using a sensor suite comprising a LiDAR, six cameras, radar, RTK-GNSS, and an Inertial Measurement Unit (IMU) sensor. Ground truth localization is obtained using RTK-GNSS, ensuring precise benchmarking. Additionally, we evaluate SLAM performance by generating maps using LiDAR data. TIAND-SLAM serves as a valuable resource for advancing SLAM research on highways and unstructured terrains, promoting robustness and adaptability in real-world autonomous navigation scenarios.
|
|
11:15-12:30, Paper MoBT3.14 | Add to My Program |
Highly Accurate and Diverse Traffic Data: The DeepScenario Open 3D Dataset |
|
Dhaouadi, Oussema | Technical University of Munich |
Meier, Johannes Michael | TU Munich |
Wahl, Luca | DeepScenario |
Kaiser, Jacques | DeepScenario GmbH |
Scalerandi, Luca | Technical University of Munich, DeepScenario |
Wandelburg, Nick | DeepScenario GmbH |
Zhuolun, Zhou | DeepScenario |
Berinpanathan, Nijanthan | DeepScenario GmbH |
Banzhaf, Holger | DeepScenario GmbH |
Cremers, Daniel | TU Munich |
Keywords: UAV Datasets, Automotive Datasets, Data Annotation and Labeling Techniques
Abstract: Accurate 3D trajectory data is crucial for advancing autonomous driving. Yet, traditional datasets are usually captured by fixed sensors mounted on a car and are susceptible to occlusion. Additionally, such an approach can precisely reconstruct the dynamic environment in the close vicinity of the measurement vehicle only, while neglecting objects that are further away. In this paper, we introduce the DeepScenario Open 3D Dataset (DSC3D), a high-quality, occlusion-free dataset of 6-degrees-of-freedom bounding box trajectories acquired through a novel monocular camera drone tracking pipeline. Our dataset includes more than 175,000 trajectories of 14 types of traffic participants and significantly exceeds existing datasets in terms of diversity and scale, containing many unprecedented scenarios such as complex vehicle-pedestrian interaction on highly populated urban streets and comprehensive parking maneuvers from entry to exit. The DSC3D dataset was captured at five locations in Europe and the United States: a parking lot, a crowded inner city, a steep urban intersection, a federal highway, and a suburban intersection. Our 3D trajectory dataset aims to enhance autonomous driving systems by providing detailed environmental 3D representations, which could lead to improved obstacle interactions and safety. We demonstrate its utility across multiple applications including motion prediction, motion planning, scenario mining, and generative reactive traffic agents. Our interactive online visualization platform and the complete dataset are publicly available at app.deepscenario.com, facilitating research in motion prediction, behavior modeling, and safety validation.
|
|
11:15-12:30, Paper MoBT3.15 | Add to My Program |
Prediction of Occluded Pedestrians in Road Scenes Using Human-Like Reasoning: Insights from the OccluRoads Dataset |
|
Melo Castillo, Angie Nataly | University of Alcala |
Martin Serrano, Sergio | University of Alcala |
Salinas Maldonado, Carlota | University of Alcala |
Sotelo, Miguel A. | University of Alcala |
Keywords: Automotive Datasets, Static and Dynamic Object Detection Algorithms, Synthetic Data Generation for Training
Abstract: Pedestrian detection is a critical task in autonomous driving, aimed at improving safety and reducing risks on the road. In recent years, significant advancements have been made in detection performance. However, these achievements still fall short of human perception, particularly in cases involving occluded pedestrians, especially those entirely invisible. In this work, we present the Occlusion-Rich Road Scenes with Pedestrians (OccluRoads) dataset, a diverse collection of road scenes with partially and fully occluded pedestrians in both real-world and virtual environments. All scenes are meticulously labeled and enriched with contextual information that encapsulates human perception in such scenarios. Leveraging this dataset, we developed a pipeline to predict the presence of occluded pedestrians using Knowledge Graph (KG), Knowledge Graph Embedding (KGE), and a Bayesian inference process. Our approach achieves an F1 score of 0.91, representing an improvement of up to 42% compared to traditional machine learning models.
|
|
11:15-12:30, Paper MoBT3.16 | Add to My Program |
Synthetic Dataset Generation Using Logical Scenario Files for Automotive Perception Testing |
|
García, Mikel | Vicomtech |
Iglesias, Aitor | Fundación Vicomtech |
Sánchez, Martí | Fundación Vicomtech |
Naranjo, Ruben | Vicomtech |
Iñiguez de Gordoa, Jon Ander | Vicomtech |
Nieto, Marcos | Vicomtech |
Aginako Bengoa, Naiara | UPV/EHU |
Keywords: Data Annotation and Labeling Techniques, Automotive Datasets, Synthetic Data Generation for Training
Abstract: Conducting extensive recording campaigns to assess the safety of newly developed Automated Driving Systems (ADS) or perception algorithms has proven to be a costly and time-consuming process. This is one of the reasons why the automotive industry is adopting the scenario-based testing methodology to verify and validate the safety of the developed ADS in their expected operating domain. The exterior perception system is the first component in the sense-plan-act process of Connected Cooperative and Automated Vehicles (CCAVs). In this context, high-fidelity simulation engines are used to replicate sensor setups at reduced cost and higher scalability than driving and capturing data from real sensors. The use of logical automotive scenario descriptions allows defining certain parameter ranges, contexts and actions to execute simulations that fulfill the desired conditions. This work proposes a methodology for generating synthetic labelled datasets to test and validate automotive perception systems using logical scenario files. Decoupling the desired sensor setup from the simulation allows reproducing and testing the same situation under different sensor setups and conditions. We implement the methodology to validate three 3D LiDAR-based object detectors in three different sensor setups. The generated sample dataset will be made public here.
|
|
11:15-12:30, Paper MoBT3.17 | Add to My Program |
The DLR Urban Traffic Dataset (DLR-UT): A Comprehensive Traffic Dataset from an Urban Research Intersection |
|
Schicktanz, Clemens | German Aerospace Center (DLR), Institute of Transportation Systems |
Klitzke, Lars | German Aerospace Center (DLR), Institute of Transportation Systems |
Gimm, Kay | German Aerospace Center (DLR), Institute of Transportation Systems |
Rizzo, Giancarlo | German Aerospace Center (DLR) |
Liesner, Karsten | German Aerospace Center (DLR) |
Mosebach, Henning | German Aerospace Center |
Knake-Langhorst, Sascha | DLR |
Keywords: Automotive Datasets
Abstract: Current trajectory datasets of traffic participants often lack detailed environmental information, which is crucial for developing effective data-driven methods for future mobility solutions. To address this gap, we introduce the comprehensive DLR Urban Traffic dataset. The dataset includes 32,296 trajectories of traffic participants, along with traffic light data, local weather data, air quality data, and road condition data collected at a research intersection during a single day. A comparison with other publicly available datasets reveals that our dataset offers more comprehensive information about the traffic environment than existing alternatives. In addition, since version 1.2.0, the dataset includes metadata such as traffic volume per lane and the trajectory data in the OpenSCENARIO format, enabling data replay in simulation. An analysis of our dataset shows that trajectories of motorized road users (MRU) are available for all possible 16 routes at the intersection. Most interactions between MRU occur during unprotected left turns with oncoming traffic. However, there are also interactions between MRU and vulnerable road users, particularly during right turns. All in all, the dataset provides researchers with the resources needed to improve urban mobility solutions. Available for non-commercial use, the dataset can be directly downloaded from https://doi.org/10.5281/zenodo.14773161.
|
|
11:15-12:30, Paper MoBT3.18 | Add to My Program |
ViewpointDepth: A New Dataset for Monocular Depth Estimation under Viewpoint Shifts |
|
Pjetri, Aurel | Verizon Connect, University of Florence |
Caprasecca, Stefano | Verizon Connect |
Taccari, Leonardo | Verizon Connect |
Simoncini, Matteo | Verizon Connect |
Piñeiro Monteagudo, Henrique | Verizon Connect; University of Bologna |
Walter, Wallace | Retired |
Coimbra De Andrade, Douglas | Verizon Connect |
Sambo, Francesco | Verizon Connect |
Bagdanov, Andrew David | University of Florence |
Keywords: Automotive Datasets, 3D Scene Reconstruction Methods, Deep Learning Based Approaches
Abstract: Monocular depth estimation is a critical task for autonomous driving and many other computer vision applications. While significant progress has been made in this field, the effects of viewpoint shifts on depth estimation models remain largely underexplored. This paper introduces a novel dataset and evaluation methodology to quantify the impact of different camera positions and orientations on monocular depth estimation performance. We propose a ground truth strategy based on homography estimation and object detection, eliminating the need for expensive LIDAR sensors. We collect a diverse dataset of road scenes from multiple viewpoints and use it to assess the robustness of a modern depth estimation model to geometric shifts. After assessing the validity of our strategy on a public dataset, we provide valuable insights into the limitations of current models and highlight the importance of considering viewpoint variations in real-world applications.
|
|
11:15-12:30, Paper MoBT3.19 | Add to My Program |
Leveraging Bounding Box Annotations and Boolean Map Saliency for Traffic Light Detection in Foggy Night |
|
Tabassam, Nadra | University of Oldenburg |
Moulaeifard, Mohammad | University of Oldenburg |
Franzle, Martin | University of Oldenburg |
Fleck, Sven | Obsurver UG |
Keywords: Automotive Datasets, Deep Learning Based Approaches, Data Annotation and Labeling Techniques
Abstract: Object detection is a fundamental task in computer vision, relying heavily on bounding box (BB) annotations with ground truth labels to train deep learning models. This approach produces impactful results when the boundaries of objects are identifiable, but struggles in adverse weather conditions such as rain, fog, and snow, where object outlines are fuzzy. One such case is the detection of traffic lights (TLs) in fog at night, as these conditions cause the light to scatter in different directions, creating a halo effect. Creating BBs manually for TL annotation is therefore inaccurate. Dense fog makes it difficult for annotators to determine the state (such as red, green, or yellow) and exact location of TLs. Annotation tools are also not designed for blurred images and require annotators to manually adjust parameters like brightness and zoom. To address the challenges of manual annotation, a Boolean Map Saliency (BMS) method is employed to automatically generate annotations that highlight TLs, thereby improving object detection in scenarios where manual BBs are insufficient. Results based on automated BBs and manually annotated bounding boxes are compared using Faster-RCNN; the SHIFT dataset is used for both. Our proposed approach generates superior-quality BBs compared to previous approaches, along with an improved TL detection algorithm, especially on foggy nights when it is difficult for the sensors to detect the TLs.
|
|
11:15-12:30, Paper MoBT3.20 | Add to My Program |
Generating Synthetic Deviation Maps for Prior-Enhanced Vectorized HD Map Construction |
|
Xu, Haoming | University of Chinese Academy of Sciences |
Xiao, Yiyang | Institute of Computing Technology, Chinese Academy |
Li, Wei | Institute of Computing Technology, Chinese Academy of Sciences |
Hu, Yu | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: Synthetic Data Generation for Training
Abstract: High-definition (HD) maps are essential for autonomous driving, providing detailed and accurate environmental information. Recent advancements in online vectorized HD map construction have shown great promise, particularly methods that leverage existing maps as prior knowledge to improve performance. However, the robustness of these prior-enhanced methods under varying deviations between the priors and the real world remains a critical concern. This paper introduces a novel framework for generating synthetic maps, which allows controllable magnitudes of diverse deviations, including geometric distortions, topological errors, and semantic inconsistencies, simulating real-world scenarios where prior maps may be outdated or inaccurate. Furthermore, lane group constraints are designed to avoid positional conflicts when map elements are modified. The synthesis method can overcome the time-consuming challenge of collecting real road changes. We demonstrate the utility of the synthetic deviation maps by incorporating them into state-of-the-art prior-enhanced construction methods. The results reveal how different types and degrees of deviations affect the prediction accuracy, providing worthwhile insights into their robustness. Overall, this work contributes a data augmentation method and provides a valuable tool for developing more robust and reliable autonomous driving systems. The code is open-source and available at https://github.com/healenrens/Syn-D-maps.
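The controllable geometric deviations described above can be illustrated with a small sketch. The function name and the Gaussian-offset model are illustrative assumptions, not the paper's actual deviation generators:

```python
import numpy as np

def perturb_polyline(poly, magnitude, rng):
    """Apply a geometric deviation of controllable magnitude to a vectorized
    map element (illustrative stand-in for the paper's deviation generators;
    topological and semantic deviations would be handled separately)."""
    offset = rng.normal(scale=magnitude, size=poly.shape)  # per-vertex noise
    return poly + offset

# Hypothetical lane centerline as a 2-D polyline.
rng = np.random.default_rng(0)
lane = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
deviated = perturb_polyline(lane, magnitude=0.2, rng=rng)
```

Larger `magnitude` values emulate more outdated priors, which is the knob the robustness study turns.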
|
|
MoBT4 Poster Session, Bernini Room |
Add to My Program |
Poster 1.4 >> Localisation and Mapping |
|
|
Chair: Bonnifait, Philippe | University of Technology of Compiegne |
Co-Chair: Giosan, Ion | Technical University of Cluj-Napoca |
|
11:15-12:30, Paper MoBT4.1 | Add to My Program |
PL-RAS: A Robust Localization System with Real Time Protection Level Calculation and Adaptive Kernel for Enhanced Integrity (I) |
|
Maharmeh, Elias | Valeo |
Nashashibi, Fawzi | INRIA |
Alsayed, Zayed | Valeo - VMTC |
Keywords: Sensor Fusion for Accurate Localization, Fault Detection and Isolation (FDI) and Protection Level Determination
Abstract: Uncertainty in perception tasks, such as localization, is critical for autonomous systems. Many localization systems fail to ensure that their reported uncertainties encompass the true pose. This paper addresses this issue using the integrity framework. We focus on two main aspects: first, fault-tolerant localization through qualitative evaluation; second, quantitative estimation of error bounds using (horizontal) protection levels. We introduce PL-RAS (Protection Level-based Robust and Adaptive Solver). This solver improves robustness in non-linear least squares optimization, including factor graph-based localization systems. PL-RAS improves uncertainty awareness and enhances system integrity, strengthening both its qualitative and quantitative aspects. We test the approach on urban road data collected using an acquisition vehicle at Valeo’s Creteil VMTC site. The results confirm PL-RAS’s effectiveness. In one dataset, the integrity risks are 4.0×10⁻⁴ (lateral) and 34.0×10⁻³ (longitudinal). In a more challenging dataset, the lateral risk becomes 3.0×10⁻⁴, while the longitudinal risk increases to 92.3×10⁻³. These findings demonstrate PL-RAS’s robustness in fault tolerance and protection level estimation.
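Integrity risk figures of this kind are conventionally the empirical rate at which the position error exceeds the reported protection level. A minimal sketch of that bookkeeping (the data and function name are hypothetical, not the paper's implementation):

```python
def integrity_risk(errors, protection_levels):
    """Empirical integrity risk: fraction of epochs where the true position
    error exceeds the reported protection level (misleading information)."""
    assert len(errors) == len(protection_levels)
    violations = sum(1 for e, pl in zip(errors, protection_levels) if abs(e) > pl)
    return violations / len(errors)

# Hypothetical lateral errors (m) and protection levels (m) over ten epochs.
lat_err = [0.1, 0.2, 0.15, 0.9, 0.1, 0.05, 0.2, 0.3, 0.1, 0.12]
lat_pl  = [0.5] * 10
risk = integrity_risk(lat_err, lat_pl)  # one violation in ten epochs -> 0.1
```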
|
|
11:15-12:30, Paper MoBT4.2 | Add to My Program |
LoopNet: A Multitasking Few-Shot Learning Approach for Loop Closure in Large Scale SLAM |
|
Nakshbandi, Mohammad Maher | Transilvania University of Brasov |
Sharawy, Ziad | Transilvania University of Brasov |
Grigorescu, Sorin Mihai | Transilvania University of Brasov |
Keywords: Real-Time SLAM Algorithms for Dynamic Environments
Abstract: One of the main challenges in the Simultaneous Localization and Mapping (SLAM) loop closure problem is the recognition of previously visited places. In this work, we tackle the two main problems of real-time SLAM systems: 1) loop closure detection accuracy and 2) real-time computation constraints on the embedded hardware. Our LoopNet method is based on a multitasking variant of the classical ResNet architecture, adapted for online retraining on a dynamic visual dataset and optimized for embedded devices. The online retraining is designed using a few-shot learning approach. The architecture provides both an index into the queried visual dataset and a measurement of the prediction quality. Moreover, by leveraging DISK (DIStinctive Keypoints) descriptors, LoopNet surpasses the limitations of handcrafted features and traditional deep learning methods, offering better performance under varying conditions. Code is available at https://github.com/RovisLab/LoopNet. Additionally, we introduce a new loop closure benchmarking dataset, coined LoopDB, which is available at https://github.com/RovisLab/LoopDB.
|
|
11:15-12:30, Paper MoBT4.3 | Add to My Program |
Semantic SLAM with Rolling-Shutter Cameras and Low-Precision INS in Outdoor Environments |
|
Zhang, Yuchen | Beijing NavInfo Technology Co., Ltd |
Fan, Miao | NavInfo Co., Ltd |
Jiao, Yi | NavInfo Co. Ltd |
Xu, Shengtong | Autohome Inc |
Liu, Xiangzeng | Xidian University |
Xiong, Haoyi | Baidu Inc |
Keywords: Real-Time SLAM Algorithms for Dynamic Environments
Abstract: Accurate localization and mapping in outdoor environments remains challenging when using consumer-grade hardware, particularly with rolling-shutter cameras and low-precision inertial navigation systems (INS). We present a novel semantic SLAM approach that leverages road elements such as lane boundaries, traffic signs, and road markings to enhance localization accuracy. Our system integrates real-time semantic feature detection with a graph optimization framework, effectively handling both rolling-shutter effects and INS drift. Using a practical hardware setup which consists of a rolling-shutter camera (3840×2160@30fps), IMU (100Hz), and wheel encoder (50Hz), we demonstrate significant improvements over existing methods. Compared to state-of-the-art approaches, our method achieves higher recall (up to 5.35%) and precision (up to 2.79%) in semantic element detection, while maintaining mean relative error (MRE) within 10cm and mean absolute error (MAE) around 1m. Extensive experiments in diverse urban environments demonstrate the robust performance of our system under varying lighting conditions and complex traffic scenarios, making it particularly suitable for autonomous driving applications. The proposed approach provides a practical solution for high-precision localization using affordable hardware, bridging the gap between consumer-grade sensors and production-level performance requirements.
|
|
11:15-12:30, Paper MoBT4.4 | Add to My Program |
BALO: A Novel Point to Plane BAlanced Lidar Odometry |
|
Azzini, Matteo | INRIA |
Malis, Ezio | INRIA |
Martinet, Philippe | INRIA |
|
|
11:15-12:30, Paper MoBT4.5 | Add to My Program |
SD++: Enhancing Standard Definition Maps by Incorporating Road Knowledge Using LLMs |
|
Diwanji, Hitvarth | University of California, San Diego |
Liao, Jing-Yan | University of California San Diego |
Tumu, Akshar | University of California, San Diego |
Christensen, Henrik | UC San Diego |
Vazquez-Chanlatte, Marcell | University of California, Berkeley |
Tsuchiya, Chikao | Nissan North America |
Keywords: Crowdsourced Localization and Mapping, Geometric vs. Semantic Mapping
Abstract: High-definition maps (HD maps) are detailed and informative maps capturing lane centerlines and road elements. Although very useful for autonomous driving, HD maps are costly to build and maintain. Furthermore, access to these high-quality maps is usually limited to the firms that build them. On the other hand, standard definition (SD) maps provide road centerlines with an accuracy of a few meters. In this paper, we explore the possibility of enhancing SD maps by incorporating information from road manuals using LLMs. We develop SD++, an end-to-end pipeline to enhance SD maps with location-dependent road information obtained from a road manual. We suggest and compare several ways of using LLMs for such a task. Furthermore, we show the generalization ability of SD++ by showing results from both California and Japan.
|
|
11:15-12:30, Paper MoBT4.6 | Add to My Program |
Metro-Rail-SLAM for Automated Inspection Vehicles with Path-Aided Constraints |
|
Feng, Feng | Wuhan University of Technology |
Meng, Jie | Wuhan University of Technology |
Zhang, Jianan | Wuhan University of Technology |
Xiao, Hanbiao | Wuhan University of Technology |
Hu, Zhaozheng | Wuhan University of Technology |
Keywords: Continuous Localization Solutions, Sensor Fusion for Accurate Localization, 3D Scene Reconstruction Methods
Abstract: Automated vehicles mounted on rails are widely applied for metro infrastructure inspection and maintenance. However, the long-distance enclosed tunnel environment and severe feature degradation pose significant challenges to accurate vehicle localization. In this paper, we propose Metro-Rail-SLAM for automated inspection vehicles. The designed path constraints are modeled as a path-aided likelihood model (PA-LM) by applying Kernel Density Estimation to the designed path from the construction drawings. In addition, we detect emergency bays from LiDAR as landmarks. The PA-LM and landmarks, together with the odometry, are integrated into a particle filter to develop a fast, accurate, and robust SLAM. The proposed models and method were validated with a prototyped inspection robot in an actual metro construction scenario located in Chengdu. Experimental results demonstrate that the proposed Metro-Rail-SLAM has good localization and mapping capability.
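The path-aided likelihood idea can be sketched as a kernel density over designed-path waypoints: particle positions near the drawn rail path score high. The isotropic Gaussian kernel, bandwidth, and all names below are illustrative assumptions rather than the paper's PA-LM:

```python
import numpy as np

def path_likelihood(query_xy, path_xy, bandwidth=0.5):
    """Kernel density over designed-path waypoints, evaluated at a query
    position. A toy stand-in for a path-aided likelihood model."""
    diffs = path_xy - query_xy                      # (N, 2) offsets to waypoints
    sq = np.sum(diffs**2, axis=1)                   # squared distances
    kern = np.exp(-sq / (2.0 * bandwidth**2))       # Gaussian kernel per waypoint
    return kern.mean() / (2.0 * np.pi * bandwidth**2)

# Designed path: a straight track segment sampled every 0.1 m (hypothetical).
path = np.column_stack([np.arange(0, 10, 0.1), np.zeros(100)])
on_path  = path_likelihood(np.array([5.0, 0.0]), path)
off_path = path_likelihood(np.array([5.0, 3.0]), path)  # 3 m off the rails
```

In a particle filter, this likelihood would down-weight particles that drift away from the designed path.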
|
|
11:15-12:30, Paper MoBT4.7 | Add to My Program |
Advancements in Enhancing GNSS RTK Positioning Accuracy and Integrity for Automated Driving |
|
Schön, Steffen | Leibniz University Hannover |
Baasch, Kai-Niklas | Leibniz University Hannover |
Karimidoona, Ali | Leibniz University Hannover |
Kulemann, Dennis | Institut für Erdmessung, Leibniz Universität Hannover |
Ruwisch, Fabian | Leibniz Universität Hannover |
Schaper, Anat | Leibniz Universität Hannover |
Su, Jingyao | Leibniz University Hannover |
Keywords: Fault Detection and Isolation (FDI) and Protection Level Determination, Cooperative Perception and Localization Techniques, Sensor Fusion for Accurate Localization
Abstract: For safety-critical applications like autonomous driving, high trust in the navigation solution is essential, primarily measured by integrity. Multipath and other propagation-specific errors in GNSS observations present significant challenges, as they can only be partially corrected. To ensure high integrity in urban navigation, it is crucial to understand the signal propagation mechanisms and potential error sources in these complex environments. Our group has made recent progress in this area, conducting various experiments in urban areas to analyze GNSS positioning performance. Using ray tracing, GNSS channel models, and 3D city models, the signal propagation conditions can be classified and errors quantified. We create GNSS Feature Maps to analyse the spatio-temporal similarity of the geometry-related error features and develop a Feature Map aided robust GNSS RTK algorithm, yielding improved accuracy and fulfilling our newly defined alert limits for German roads. We show how collaborative positioning can further improve this situation.
|
|
11:15-12:30, Paper MoBT4.8 | Add to My Program |
A Concise Survey on Lane Topology Reasoning for HD Mapping |
|
Yao, Yi | NavInfo Co., Ltd |
Fan, Miao | NavInfo Co., Ltd |
Xu, Shengtong | Autohome Inc |
Xiong, Haoyi | Baidu Inc |
Liu, Xiangzeng | Xidian University |
Hu, Wenbo | Hefei University of Technology |
Huang, Wenbing | Renmin University of China |
Keywords: Crowdsourced Localization and Mapping, Geometric vs. Semantic Mapping
Abstract: Lane topology reasoning techniques play a crucial role in high-definition (HD) mapping and autonomous driving applications. While recent years have witnessed significant advances in this field, there has been limited effort to consolidate these works into a comprehensive overview. This survey systematically reviews the evolution and current state of lane topology reasoning methods, categorizing them into three major paradigms: procedural modeling-based methods, aerial imagery-based methods, and onboard sensors-based methods. We analyze the progression from early rule-based approaches to modern learning-based solutions utilizing transformers, graph neural networks (GNNs), and other deep learning architectures. The paper examines standardized evaluation metrics, including road-level measures (APLS and TLTS score), and lane-level metrics (DET and TOP score), along with performance comparisons on benchmark datasets such as OpenLane-V2. We identify key technical challenges, including dataset availability and model efficiency, and outline promising directions for future research. This comprehensive review provides researchers and practitioners with insights into the theoretical frameworks, practical implementations, and emerging trends in lane topology reasoning for HD mapping applications.
|
|
11:15-12:30, Paper MoBT4.9 | Add to My Program |
Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition |
|
Peng, Jianyi | Tongji University |
Lu, Fan | Tongji University |
Li, Bin | Tongji University |
Huang, Yuan | Beijing Institute of Control Engineering |
Qu, Sanqing | Tongji University |
Chen, Guang | Tongji University |
Keywords: Global vs. Local Localization Techniques, Map-Matching Techniques, Lidar-Based Environment Mapping
Abstract: Image-to-point cloud cross-modal Visual Place Recognition (VPR) is a challenging task where the query is an RGB image, and the database samples are LiDAR point clouds. Compared to single-modal VPR, this approach benefits from the widespread availability of RGB cameras and the robustness of point clouds in providing accurate spatial geometry and distance information. However, current methods rely on intermediate modalities that capture either the vertical or horizontal field of view, limiting their ability to fully exploit the complementary information from both sensors. In this work, we propose an innovative initial retrieval + re-rank method that effectively combines information from range (or RGB) images and Bird's Eye View (BEV) images. Our approach relies solely on a computationally efficient global descriptor similarity search process to achieve re-ranking. Additionally, we introduce a novel similarity label supervision technique to maximize the utility of limited training data. Specifically, we employ the average distance between points to approximate appearance similarity and incorporate an adaptive margin, based on similarity differences, into the vanilla triplet loss. Experimental results on the KITTI dataset demonstrate that our method significantly outperforms state-of-the-art approaches.
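The adaptive-margin triplet loss described above can be sketched as follows; the margin scaling, similarity inputs, and all names are illustrative, not the authors' exact formulation:

```python
import numpy as np

def adaptive_margin_triplet(anchor, pos, neg, sim_pos, sim_neg,
                            base_margin=0.3, scale=1.0):
    """Triplet loss whose margin grows with the similarity gap between the
    positive and negative samples (sim_* in [0, 1], e.g. derived from
    average point distances). Toy version of an adaptive-margin loss."""
    margin = base_margin + scale * (sim_pos - sim_neg)  # adaptive margin
    d_pos = np.linalg.norm(anchor - pos)                # anchor-positive distance
    d_neg = np.linalg.norm(anchor - neg)                # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings: close positive, far negative.
a = np.array([0.0, 0.0]); p = np.array([0.1, 0.0]); n = np.array([1.0, 0.0])
# A very similar positive vs a dissimilar negative demands a larger margin.
loss = adaptive_margin_triplet(a, p, n, sim_pos=0.9, sim_neg=0.2)
```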
|
|
11:15-12:30, Paper MoBT4.10 | Add to My Program |
Infrastructure-Based Smart Positioning System for Automated Shuttles Using 3D Object Detection |
|
Araluce, Javier | TECNALIA Research & Innovation |
Justo, Alberto | TECNALIA Research & Innovation, Basque Research and Technology Alliance |
Rodriguez-Arozamena, Mario | TECNALIA Research & Innovation, Basque Research and Technology Alliance |
Sarabia, Joseba | University of the Basque Country; Tecnalia, Basque Research and Technology Alliance |
Matute, Jose | Virginia Tech |
Diaz Briceño, Sergio Enrique | Tecnalia, Basque Research and Technology Alliance |
Keywords: Static and Dynamic Object Detection Algorithms, Vehicle-to-Infrastructure (V2I) Communication, Cooperative Perception and Localization Techniques
Abstract: Automated vehicles need high positioning accuracy to execute driving maneuvers effectively. This accuracy is crucial for the viability of dependent systems such as planning, decision-making, and perception. However, achieving precise localization typically necessitates expensive onboard sensors that increase vehicle costs, complicate maintenance, and pose significant scalability challenges for large fleets of trucks or buses. To address these issues without compromising vehicle interoperability, this work proposes an infrastructure-based positioning system for critical areas. The system utilizes off-board sensors to collect data from a shuttle moving on a test track. The data collection is automated through a custom-designed labeling tool, eliminating the need for manual tagging. A deep learning model based on 3D object detection has been trained to localize the vehicle accurately during normal operation. Rigorous assessments have been conducted to evaluate localization performance, achieving an Average Trajectory Error of 0.17 m for position and 9.4 deg for rotation. To demonstrate real-world applicability, a complete architecture based on ROS2 was developed and tested with actual data, confirming its functionality in practical scenarios.
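A straightforward reading of the position part of an Average Trajectory Error is the mean Euclidean error over matched poses; a minimal sketch (the paper's exact metric definition may differ):

```python
import math

def avg_trajectory_error(est, gt):
    """Mean Euclidean position error between estimated and ground-truth
    2-D positions, one simple interpretation of 'Average Trajectory Error'."""
    assert len(est) == len(gt)
    return sum(math.dist(e, g) for e, g in zip(est, gt)) / len(est)

# Hypothetical estimated vs ground-truth positions (metres).
est = [(0.0, 0.0), (1.1, 0.0), (2.0, 0.2)]
gt  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ate = avg_trajectory_error(est, gt)   # (0 + 0.1 + 0.2) / 3 = 0.1 m
```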
|
|
11:15-12:30, Paper MoBT4.11 | Add to My Program |
Offline Map Updating and Validation for Autonomous Driving Using Crowdsourced Data |
|
Moawad, Mark | Hamburg University of Technology |
Stührenberg, Jan | Hamburg University of Technology |
Tandon, Aditya | Hamburg University of Technology |
Abdulaaty, Omar AbdelAziz | IAV GmbH |
Mendoza, Ricardo Carillo | IAV GmbH |
Hussein, Ahmed | IAV GmbH |
Smarsly, Kay | Hamburg University of Technology |
Keywords: Crowdsourced Localization and Mapping, Real-Time SLAM Algorithms for Dynamic Environments, Advanced Multisensory Data Fusion Algorithms
Abstract: Autonomous driving promises safer and more comfortable transportation with less traffic congestion than human driving. Autonomous driving can be achieved using landmark-based maps, which allow for precise localization and collision-free path planning. Therefore, it is essential to keep the maps updated and validated. Traditional approaches towards map updating and validation often fail to robustly keep pace with environmental changes, causing localization errors. Current research addresses the map updating and validation problem using either graph-based methods or feature-based methods online, i.e. running while the vehicles are traversing the environment, which is computationally demanding and unscalable. In this paper, an offline map updating and validation framework is presented using crowdsourced data, which is abundantly available and ubiquitous. To integrate multiple observations and improve map accuracy and reliability, the framework couples data fusion techniques, including the density-based spatial clustering of applications with noise (DBSCAN) algorithm, the K-D tree data structure, and Dempster-Shafer theory. The framework is validated through multiple test scenarios, including adding new landmarks and removing deleted ones. As a result, the map updating and validation framework effectively integrates crowdsourced data, enhancing the accuracy and reliability of map updating and validation. The findings highlight the potential of crowdsourced data to improve map validation processes in autonomous driving.
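The Dempster-Shafer fusion step can be illustrated on a single landmark over the frame {exists, removed}, with ignorance mass assigned to the whole frame; the mass values and names are hypothetical, not the paper's implementation:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination on the frame {E, R}, with 'ER'
    denoting mass on the whole frame (ignorance). Toy illustration of
    the evidence-fusion step for a single landmark."""
    # Conflict: one source says the landmark exists, the other that it was removed.
    k = m1["E"] * m2["R"] + m1["R"] * m2["E"]
    norm = 1.0 - k
    return {
        "E":  (m1["E"]*m2["E"] + m1["E"]*m2["ER"] + m1["ER"]*m2["E"]) / norm,
        "R":  (m1["R"]*m2["R"] + m1["R"]*m2["ER"] + m1["ER"]*m2["R"]) / norm,
        "ER": (m1["ER"]*m2["ER"]) / norm,
    }

# Two crowdsourced observations of the same landmark (hypothetical masses).
obs1 = {"E": 0.7, "R": 0.1, "ER": 0.2}
obs2 = {"E": 0.6, "R": 0.2, "ER": 0.2}
fused = dempster_combine(obs1, obs2)   # belief in 'exists' strengthens
```

In the framework, clustering (e.g. DBSCAN) would first group crowdsourced detections into landmark candidates before their evidence is combined.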
|
|
11:15-12:30, Paper MoBT4.12 | Add to My Program |
A Chef's KISS - Utilizing Semantic Information in Both ICP and SLAM Framework |
|
Ochs, Sven | FZI Research Center for Information Technology |
Heinrich, Marc | FZI Research Center for Information Technology |
Schörner, Philip | FZI Research Center for Information Technology |
Zofka, Marc René | FZI Research Center for Information Technology |
Zöllner, J. Marius | FZI Research Center for Information Technology; Karlsruhe Institute of Technology (KIT) |
Keywords: Real-Time SLAM Algorithms for Dynamic Environments, Geometric vs. Semantic Mapping, Global vs. Local Localization Techniques
Abstract: For the use of autonomous vehicles in urban areas, reliable localization is needed. Especially when HD maps are used, a precise and repeatable method has to be chosen. Therefore, accurate map generation as well as re-localization against these maps is necessary. Owing to its accurate 3D reconstruction of the surroundings, LiDAR has become a reliable modality for localization. The latest LiDAR odometry estimators are based on iterative closest point (ICP) approaches, namely KISS-ICP and SAGE-ICP. We extend the capabilities of KISS-ICP by incorporating semantic information into the point alignment process using a generalizable approach with minimal parameter tuning. This enhancement allows us to surpass KISS-ICP in terms of absolute trajectory error (ATE), the primary metric for map accuracy. Additionally, we improve the Cartographer mapping framework to handle semantic information. Cartographer facilitates loop closure detection over larger areas, mitigating odometry drift and further enhancing ATE accuracy. By integrating semantic information into the mapping process, we enable the filtering of specific classes, such as parked vehicles, from the resulting map. This filtering improves relocalization quality by addressing temporal changes, such as vehicles being moved.
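One simple way to fold semantic information into ICP point alignment is to gate nearest-neighbour correspondences on label agreement. The sketch below is a toy illustration of that idea, not KISS-ICP's actual weighting scheme:

```python
import numpy as np

def semantic_gate(src_pts, src_labels, tgt_pts, tgt_labels, max_dist=1.0):
    """Keep nearest-neighbour correspondences only when the semantic labels
    of the matched points agree (label-gated ICP association, toy version)."""
    pairs = []
    for i, p in enumerate(src_pts):
        d = np.linalg.norm(tgt_pts - p, axis=1)   # distances to all target points
        j = int(np.argmin(d))                     # nearest neighbour
        if d[j] <= max_dist and src_labels[i] == tgt_labels[j]:
            pairs.append((i, j))
    return pairs

# Hypothetical labelled scans: the 'car'/'building' mismatch is rejected.
src = np.array([[0.0, 0.0], [5.0, 5.0]])
tgt = np.array([[0.1, 0.0], [5.1, 5.0]])
pairs = semantic_gate(src, ["road", "car"], tgt, ["road", "building"])
```

Rejecting label-inconsistent matches is also what makes it possible to drop dynamic classes, such as parked vehicles, from the alignment.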
|
|
11:15-12:30, Paper MoBT4.13 | Add to My Program |
Onboard Train Localization Assisted by Surrounding Structure Identification Using One-Dimensional LiDAR Sensor |
|
Nagai, Kensuke | The University of Tokyo |
Chang, Haw-Shyang | The University of Tokyo |
Ohnishi, Wataru | The University of Tokyo |
Koseki, Takafumi | The University of Tokyo |
Setoguchi, Yusuke | Nippon Signal Co., Ltd |
Kiyosawa, Daichi | Nippon Signal Co., Ltd |
Morita, Shunji | Nippon Signal Co., Ltd |
Tanaka, Kazuhiro | Nippon Signal Co., Ltd |
Keywords: Continuous Localization Solutions, Sensor Fusion for Accurate Localization, Lidar-Based Environment Mapping
Abstract: Train localization is a crucial technology in the railway industry, with increasing demand for cost-effective methods that eliminate reliance on ground-based equipment to reduce both costs and maintenance requirements. In this study, we propose a versatile train localization method applicable across diverse environments, including high-speed railways, conventional lines, urban settings, and rural areas. By integrating a high-speed, cost-effective one-dimensional LiDAR sensor with GNSS, MEMS IMU, and a Tachometer generator, the system can effectively recognize the surrounding environment and accurately determine the train's position. The proposed method ensures low-cost and high-accuracy train localization even in areas with dense surrounding structures or open-sky environments. Experimental results conducted on operational railway lines demonstrated a high success rate of 97 % in recognizing the surrounding environment and detecting train location using this approach. Moreover, the accuracy of train localization achieved through this method was found to be comparable to that of a loosely-coupled GNSS approach.
|
|
11:15-12:30, Paper MoBT4.14 | Add to My Program |
Learning Explicit Uncertainty Estimation in Cross Modality Localization |
|
Schütte, Stefan | TU Dortmund University |
Bertram, Torsten | Technische Universität Dortmund |
Keywords: Map-Matching Techniques, Deep Learning Based Approaches
Abstract: Metric localization of automated vehicles using exteroceptive sensors involves finding reliable spatial features in both the sensor data and the map. In real-world scenarios, methods have to deal with unknown and changing environments and noisy sensor measurements, making feature selection considerably harder. If the map is created using a different sensor modality, localization methods also have to deal with the characteristics of the available sensor. Machine learning methods promise a solution to these problems by extracting the same features from maps created from different sensors. In this work, we compare an approach for learning-based cross-modality localization with a classical method on different types of maps. Furthermore, we enhance the learned model by estimating its uncertainty directly from the measurement data.
|
|
11:15-12:30, Paper MoBT4.15 | Add to My Program |
Deep Deterministic Policy Gradient Method for Autonomous Vehicle Maneuvering through Multimodal LiDAR and RADAR Sensor Fusion |
|
Lodhi, Shikhar Singh | Indian Institute of Technology, Roorkee |
Kumar, Neetesh | Indian Institute of Technology-Roorkee |
Sharma, Teena | University of Quebec at Chicoutimi, Saguenay, QC |
Keywords: Sensor Fusion for Accurate Localization, Continuous Localization Solutions, Real-Time SLAM Algorithms for Dynamic Environments
Abstract: Autonomous Vehicle (AV) driving involves complex maneuvers, often constrained by poor environmental perception. While Deep Reinforcement Learning (DRL) and advanced sensor technologies like LiDAR and RADAR have improved AV performance, high-dimensional sensor data poses challenges in critical tasks like lane changes and turns. To overcome these challenges, we propose a multimodal fusion of LiDAR and RADAR sensors with the Deep Deterministic Policy Gradient (DDPG) algorithm. Our approach preprocesses sensor data into low-dimensional representations, enhancing the RL agent's environmental perception and decision making. This fusion, combined with Temporal Difference (TD) updates in the actor-critic network, improves maneuvering efficiency in the CARLA simulator. Results show a 100% task completion rate with adequate speed and time, achieving a 25% higher peak reward compared to state-of-the-art methods. The simulation videos are available at https://www.youtube.com/playlist?list=PLnWGKVuAZgq1rdm21CEKW-S-Nr78qmkSX.
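At DDPG's core, the TD update computes the critic target from the target actor and target critic, while Polyak soft updates keep those targets slowly tracking the learned networks. A linear-approximator sketch of that machinery (all weights, shapes, and values are illustrative, standing in for the paper's deep networks):

```python
import numpy as np

def td_target(reward, next_state, w_q_target, w_mu_target, gamma=0.99):
    """TD target for the DDPG critic: y = r + gamma * Q'(s', mu'(s'))."""
    next_action = w_mu_target @ next_state              # deterministic target policy
    q_next = w_q_target @ np.concatenate([next_state, next_action])
    return reward + gamma * q_next

def soft_update(target, source, tau=0.005):
    """Polyak averaging of target-network parameters toward the learned ones."""
    return (1.0 - tau) * target + tau * source

s_next = np.array([0.5, -0.2])          # low-dimensional fused LiDAR/RADAR state
w_mu_t = np.array([[1.0, 0.0]])         # target actor weights (1-D action)
w_q_t  = np.array([0.2, 0.1, 0.3])      # target critic weights over (state, action)
y = td_target(reward=1.0, next_state=s_next, w_q_target=w_q_t, w_mu_target=w_mu_t)
```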
|
|
11:15-12:30, Paper MoBT4.16 | Add to My Program |
Enhanced LIO-Based Localization System with Online Map Update for Robust Mining Tunnel Operations |
|
Zhang, Zufeng | Department of Automation, Tsinghua University, Beijing |
Yin, Jialun | Suzhou Automobile Research Institute, Tsinghua University |
Tao, Qianwen | Wuhan University of Technology |
Chen, Feng | Tsinghua University |
Zhang, Xuefeng | Institute for Artificial Intelligence, Peking University, Beijing |
Keywords: Sensor Fusion for Accurate Localization, Continuous Localization Solutions
Abstract: Coal mines, crucial for energy in the face of rapid economic growth, grapple with challenges like unsafe manual mining in narrow tunnels, which significantly impact efficiency and safety. In response, recent research has intensified the development of unmanned mining technologies, with robust localization emerging as a foundational requirement for autonomous navigation. However, localization in underground mines remains difficult due to sparse and degraded geometric features, compounded by dynamic environmental changes such as route adjustments and structural modifications during mining operations. To address these challenges, we propose an integrated localization system that emphasizes robust state estimation, environment reconstruction, and long-term adaptability. The system combines multi-LiDAR sensing, with complementary field-of-view and resolution characteristics, and incorporates contextual infrastructure features commonly encountered in underground environments to enhance localization stability. Furthermore, we introduce a map inconsistency detection and correction module, which enables the system to adapt to long-term environmental changes and maintain map relevance over time. The efficacy of our proposed system is rigorously evaluated across various mine tunnel environments over an extended duration, affirming its reliability and performance.
|
|
11:15-12:30, Paper MoBT4.17 | Add to My Program |
Lidar Pole Detection Training Using Vector Maps for Localization (I) |
|
Noizet, Maxime | Université De Technologie De Compiègne |
Xu, Philippe | ENSTA, Institut Polytechnique De Paris |
Bonnifait, Philippe | University of Technology of Compiegne |
Keywords: Integration Methods for HD Maps and Onboard Sensors, Sensor Fusion for Accurate Localization, Map-Matching Techniques
Abstract: Autonomous navigation requires accurate and reliable localization. In urban environments, infrastructure such as buildings and bridges disrupts Global Navigation Satellite Systems (GNSS), which requires the implementation of robust perception systems combined with inertial navigation. Roadside poles like traffic signs or light poles can serve as stable landmarks for map-based localization. When georeferenced in high-definition vector maps, these features enable reliable localization through detection pipelines and data association methods. While lidar captures their 3D geometry, distinguishing mapped poles in raw point clouds remains challenging. To train pole detectors tailored to the specific map used, we propose an automatic annotation framework that integrates lidar data, a vector map, and offline semantic segmentation to generate precise labeled data. By combining annotated pole clusters from the map with semantic segmentation, annotation errors can be minimized. This enables the training of a map-specific classifier optimized to detect mapped poles while filtering out irrelevant structures. It eliminates the need for manual labeling and ensures adaptability to the map used for online localization. Using data acquired in real-world urban scenarios, we show that this approach significantly enhances localization accuracy.
|
|
11:15-12:30, Paper MoBT4.18 | Add to My Program |
Distance Estimation in Outdoor Driving Environments Using Phase-Only Correlation Method with Event Cameras |
|
Kobayashi, Masataka | Nagoya University |
Shiba, Shintaro | Woven by Toyota |
Kong, Quan | Woven by Toyota, Inc |
Kobori, Norimasa | Woven by Toyota Inc |
Shimizu, Tsukasa | Toyota Motor Corporation |
Lu, Shan | Nagoya University |
Yamazato, Takaya | Nagoya University |
Keywords: Continuous Localization Solutions, V2X Communication Protocols and Standards, Vehicle-to-Infrastructure (V2I) Communication
Abstract: This study focuses on event cameras, exploring their potential among various sensor technologies. Event cameras possess characteristics such as high dynamic range, low latency, and high temporal resolution, and they can also leverage visible light communication. This enables high visibility in low-light and backlit environments, as well as excellent performance in detecting pedestrian movements and acquiring traffic information between traffic lights and vehicles. These characteristics are particularly beneficial for autonomous driving systems. Furthermore, if distance estimation functionality can be integrated into event cameras, they can serve as a multi-functional sensor for autonomous vehicles, providing significant cost efficiency benefits. In this study, we achieved distance estimation based on triangulation using an event camera and two points on an LED bar installed along a road. Furthermore, by employing the phase-only correlation method, we achieved sub-pixel precision in estimating the distance between two points on the LED bar, enabling even more accurate distance estimation. This approach performed monocular distance estimation in outdoor driving environments at distances ranging from 20 to 60 meters, achieving a success rate of over 90% with errors of less than 0.5 meters. We aim to implement position estimation using our distance estimation technology. High-precision measurements will determine the vehicle’s position relative to ITS smart poles, enabling real-time localization and optimal route selection. This technology will contribute to smart urban transportation.
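The phase-only correlation (POC) named in this abstract is a standard Fourier-domain technique; the following is a minimal 1-D NumPy sketch of the idea, illustrative only and not the authors' implementation (the function name, signal length, and toy shift are assumptions):

```python
import numpy as np

def phase_only_correlation(f, g):
    """Phase-only correlation of two equal-length 1-D signals.
    The peak of the POC function sits at the translational shift
    between f and g; fitting around that peak is what gives the
    sub-pixel precision the paper exploits."""
    F = np.fft.fft(f)
    G = np.fft.fft(g)
    cross = np.conj(F) * G            # cross-power spectrum
    cross /= np.abs(cross) + 1e-12    # discard magnitude, keep phase only
    return np.real(np.fft.ifft(cross))

# toy example: g is f circularly shifted by 5 samples
rng = np.random.default_rng(0)
f = rng.standard_normal(256)
g = np.roll(f, 5)
poc = phase_only_correlation(f, g)
print(int(np.argmax(poc)))  # -> 5, with poc[5] close to 1.0
```

Because only the phase spectrum is kept, the correlation peak is sharp and largely insensitive to illumination changes, which suits outdoor driving scenes.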
|
|
11:15-12:30, Paper MoBT4.19 | Add to My Program |
Temperature-Dependent Baro-Aided INS for GNSS-Denied Intelligent Vehicle Applications |
|
Silva, Felipe | Federal University of Lavras |
Hernandez Villalobos, Guillermo | Technology Innovation Institute |
Souza Junior, Cristino | Technology Innovation Institute |
Keywords: Sensor Fusion for Accurate Localization, UAV Sensor Integration, Global vs. Local Localization Techniques
Abstract: Sensor fusion is of paramount importance for Intelligent Autonomous Vehicles (IAVs) nowadays. Global Navigation Satellite Systems (GNSSs), in particular, are present in most strategic applications, bounding the drift of Inertial Navigation Systems (INSs). When the former are not available, either due to signal blockage or deliberate jamming/spoofing, barometers are sensors that can maintain INS vertical channel accuracy in the long-term. In most baro-aided INS integrations currently seen in the literature, however, a standard constant-temperature gradient is assumed for the barometric atmospheric pressure model, which might provide less-than-optimal performance in environments subject to extreme temperature variations (such as deserts). As the main contribution of this paper, we propose an improved Tightly-Coupled (TC) Extended Kalman Filter (EKF)-based baro-INS integration that employs actual Outside Air Temperature (OAT) measurements from an external temperature probe. Results from experimental tests conducted in a desert area show that the proposed approach outperforms the traditional ones, particularly when the GNSS is unavailable for long periods of time, and the IAV is subject to large altitude excursions.
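The effect of feeding measured outside air temperature into the barometric model can be illustrated with the hypsometric equation (a generic sketch under an isothermal-layer assumption, not the paper's EKF integration; all numeric values are hypothetical):

```python
import math

R_AIR = 287.05   # specific gas constant of dry air [J/(kg*K)]
G0 = 9.80665     # standard gravity [m/s^2]

def baro_altitude(p, p0, t_kelvin):
    """Height above the reference level from static pressure via
    the hypsometric equation, using a measured outside air
    temperature instead of the ISA standard 15 degrees C."""
    return (R_AIR * t_kelvin / G0) * math.log(p0 / p)

p0, p = 101325.0, 95000.0                # reference / current static pressure [Pa]
h_isa = baro_altitude(p, p0, 288.15)     # standard-atmosphere assumption
h_oat = baro_altitude(p, p0, 318.15)     # 45 degrees C measured by an OAT probe
print(round(h_isa, 1), round(h_oat, 1))
```

For the same pressure ratio, the warm desert air column yields roughly a 10% larger altitude than the ISA assumption, which is the bias the proposed OAT-aided integration removes.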
|
|
MoC1 Regular Session, Plenary Room |
Add to My Program |
Oral 2 |
|
|
Chair: Martinet, Philippe | INRIA |
Co-Chair: Petrovai, Andra | Technical University of Cluj-Napoca |
|
13:30-13:48, Paper MoC1.1 | Add to My Program |
DOC-Depth: A Novel Approach for Dense Depth Ground Truth Generation |
|
de Moreau, Simon | Mines Paris - PSL & Valeo |
Corsia, Mathias | Exwayz |
Bouchiba, Hassan | Exwayz |
Almehio, Yasser | Valeo |
Bursuc, Andrei | Valeo |
El-Idrissi, Hafid | Valeo |
Moutarde, Fabien | MINES Paris - PSL |
Keywords: Data Annotation and Labeling Techniques, Static and Dynamic Object Detection Algorithms, 3D Scene Reconstruction Methods
Abstract: Accurate depth information is essential for many computer vision applications. Yet, no available dataset recording method allows for fully dense, accurate depth estimation in a large-scale dynamic environment. In this paper, we introduce DOC-Depth, a novel, efficient and easy-to-deploy approach for dense depth generation from any LiDAR sensor. After reconstructing a consistent dense 3D environment using LiDAR odometry, we address dynamic object occlusions automatically thanks to DOC, our state-of-the-art dynamic object classification method. Additionally, DOC-Depth is fast and scalable, allowing for the creation of datasets unbounded in size and time. We demonstrate the effectiveness of our approach on the KITTI dataset, improving its density from 16.1% to 71.2%, and release this new fully dense depth annotation to facilitate future research in the domain. We also showcase results using various LiDAR sensors and in multiple environments. All software components are publicly available for the research community at https://simondemoreau.github.io/DOC-Depth/
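Rendering a dense depth image from an aggregated, odometry-aligned point cloud amounts to a pinhole projection with nearest-depth selection per pixel; a simplified sketch (not the authors' code, and omitting the dynamic-object handling that DOC provides; intrinsics and sizes are made up):

```python
import numpy as np

def project_to_depth(points_cam, fx, fy, cx, cy, h, w):
    """Render a depth image from 3-D points given in the camera
    frame (pinhole model, z forward). Keeping the nearest point
    per pixel resolves the overlaps that appear once many
    odometry-aligned scans are aggregated."""
    depth = np.full((h, w), np.inf)
    x, y, z = points_cam.T
    valid = z > 0                                   # points in front of the camera
    u = np.round(fx * x[valid] / z[valid] + cx).astype(int)
    v = np.round(fy * y[valid] / z[valid] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[inside], v[inside], z[valid][inside]):
        depth[vi, ui] = min(depth[vi, ui], zi)      # simple z-buffering
    return depth

pts = np.array([[0.0, 0.0, 5.0], [0.12, 0.0, 4.0]])  # two toy points
depth = project_to_depth(pts, 100.0, 100.0, 32.0, 24.0, 48, 64)
print(depth[24, 32], depth[24, 35])
```

In practice the density gain reported in the abstract comes from aggregating many scans before this projection step, which is exactly why the occlusion handling for moving objects becomes necessary.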
|
|
13:48-14:06, Paper MoC1.2 | Add to My Program |
LiDPM: Rethinking Point Diffusion for Lidar Scene Completion |
|
Martyniuk, Tetiana | Valeo.ai, Inria |
Puy, Gilles | Valeo.ai |
Boulch, Alexandre | Valeo.ai |
Marlet, Renaud | Valeo |
De Charette, Raoul | INRIA |
Keywords: 3D Scene Reconstruction Methods
Abstract: Training diffusion models that work directly on lidar points at the scale of outdoor scenes is challenging due to the difficulty of generating fine-grained details from white noise over a broad field of view. The latest works addressing scene completion with diffusion models tackle this problem by reformulating the original DDPM as a local diffusion process. This contrasts with the common practice of operating at the level of objects, where vanilla DDPMs are currently used. In this work, we close the gap between these two lines of work. We identify approximations in the local diffusion formulation, show that they are not required to operate at the scene level, and that a vanilla DDPM with a well-chosen starting point is enough for completion. Finally, we demonstrate that our method, LiDPM, leads to better results in scene completion on SemanticKITTI. The project page is https://astra-vision.github.io/lidpm.
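The "well-chosen starting point" can be sketched with the standard DDPM forward-noising formula: instead of sampling pure white noise at step T, reverse diffusion starts at an intermediate step from a noised version of the known scene. This is an illustrative reading of the abstract, not the released code; `noised_start`, the schedule, and the array shapes are assumptions:

```python
import numpy as np

def noised_start(x_init, t_start, betas, seed=0):
    """Sample q(x_t | x_0) of a vanilla DDPM at t = t_start, using
    the (partially known) scene x_init as x_0, so that the reverse
    process can begin mid-way instead of from pure white noise."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t_start]    # cumulative noise level
    eps = np.random.default_rng(seed).standard_normal(x_init.shape)
    return np.sqrt(alpha_bar) * x_init + np.sqrt(1.0 - alpha_bar) * eps

betas = np.linspace(1e-4, 2e-2, 1000)   # a common linear beta schedule
x_init = np.ones(64)                    # stand-in for coarse scene coordinates
x_t = noised_start(x_init, t_start=500, betas=betas)
print(x_t.shape)
```

Starting mid-schedule preserves the global scene layout while leaving enough noise for the model to synthesize fine detail, which is the intuition behind replacing the local reformulation.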
|
|
14:06-14:24, Paper MoC1.3 | Add to My Program |
UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection |
|
Li, Wei | Hunan University |
Tang, Jiaman | Hunan University |
Li, Yang | Hunan University, College of Mechanical and Vehicle Engineering |
Xia, Beihao | Huazhong University of Science and Technology |
Tan, Ligang | Hunan University |
Qin, Hongmao | Hunan University |
Keywords: Remote Sensing Techniques for UAVs
Abstract: Unmanned Aerial Vehicle (UAV) object detection has been widely used in traffic management, agriculture, emergency rescue, etc. However, it faces significant challenges, including occlusions, small object sizes, and irregular shapes. These challenges highlight the necessity for a robust and efficient multimodal UAV object detection method. Mamba has demonstrated considerable potential in multimodal image fusion. Leveraging this, we propose UAVD-Mamba, a multimodal UAV object detection framework based on Mamba architectures. To improve geometric adaptability, we propose the Deformable Token Mamba Block (DTMB) to generate deformable tokens by incorporating adaptive patches from deformable convolutions alongside normal patches from normal convolutions, which serve as the inputs to the Mamba Block. To optimize the multimodal feature complementarity, we design two separate DTMBs for the RGB and infrared (IR) modalities, with the outputs from both DTMBs integrated into the Mamba Block for feature extraction and into the Fusion Mamba Block for feature fusion. Additionally, to improve multiscale object detection, especially for small objects, we stack four DTMBs at different scales to produce multiscale feature representations, which are then sent to the Detection Neck for Mamba (DNM). The DNM module, inspired by the YOLO series, includes modifications to the SPPF and C3K2 of YOLOv11 to better handle the multiscale features. In particular, we employ cross-enhanced spatial attention before the DTMB and cross-channel attention after the Fusion Mamba Block to extract more discriminative features. Experimental results on the DroneVehicle dataset show that our method outperforms the baseline OAFA method by 3.6% in the mAP metric. Codes will be released at https://github.com/Great
|
|
14:24-14:42, Paper MoC1.4 | Add to My Program |
Intersection Safety Modeling Using Semantic Scene Graph and Graph Neural Network |
|
Sarkar, Abhijit | Virginia Tech |
Sonth, Akash | Virginia Tech |
Abbott, Amos | Virginia Tech |
Keywords: Data Augmentation Techniques Using Neural Networks, Decision Making, Vulnerable Road User Protection Strategies
Abstract: Traffic intersections are critical zones where vehicle and pedestrian interactions significantly impact road safety. This study presents a novel graph-based approach to model and analyze intersection traffic dynamics, leveraging Graph Neural Networks (GNNs) for risk assessment. By representing traffic participants and road infrastructure as a structured graph, we capture spatial-temporal relationships that influence crash likelihood. Using real-world intersection video data, we construct semantic scene graphs to encode actor interactions and road topology, enabling a data-driven understanding of risk factors. Two GNN models, TransformerConv and GINEConv, are employed to assess safety risks, where TransformerConv captures dynamic interactions through adaptive attention weighting, and GINEConv models structured dependencies within the intersection network. Our findings demonstrate that this framework can effectively classify high-risk scenarios, assess the threat posed by each actor (node), characterize their interactions (edges), and provide near-real-time safety analysis with 79.8% accuracy. This provides a scalable method for proactive intersection safety monitoring.
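The node/edge reading of the scene graph can be made concrete with one message-passing layer: each actor aggregates features from its neighbours along interaction edges before a shared transformation. This is a deliberately minimal mean-aggregation stand-in, not the TransformerConv/GINEConv layers the paper uses:

```python
import numpy as np

def gnn_layer(node_feats, edges, w_self, w_msg):
    """One message-passing layer over a semantic scene graph:
    each actor (node) mean-aggregates features from neighbours
    whose edges encode interactions, then a shared linear map
    with ReLU is applied."""
    n = node_feats.shape[0]
    agg = np.zeros_like(node_feats)
    deg = np.zeros(n)
    for src, dst in edges:            # accumulate neighbour messages
        agg[dst] += node_feats[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1.0)[:, None]
    return np.maximum(node_feats @ w_self + agg @ w_msg, 0.0)

# two actors with one directed interaction 0 -> 1, identity weights
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
out = gnn_layer(feats, [(0, 1)], np.eye(2), np.eye(2))
print(out)  # node 1 has absorbed node 0's features
```

TransformerConv replaces the uniform mean with learned attention weights over neighbours, and GINEConv injects edge features into the messages; both refine this same node-aggregation skeleton.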
|
|
14:42-15:00, Paper MoC1.5 | Add to My Program |
SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection |
|
Paek, Dong-Hee | Korea Advanced Institute of Science and Technology |
Kong, Seung-Hyun | Korea Advanced Institute for Science and Technology |
Keywords: Radar Object Detection and Tracking, Static and Dynamic Object Detection Algorithms, Deep Learning Based Approaches
Abstract: Recently, 4D Radar has emerged as a crucial sensor for 3D object detection in autonomous vehicles, offering both stable perception in adverse weather and high-density point clouds for object shape recognition. However, processing such high-density data demands substantial computational resources and energy consumption. We propose SpikingRTNH, the first spiking neural network (SNN) for 3D object detection using 4D Radar data. By replacing conventional ReLU activation functions with leaky integrate-and-fire (LIF) spiking neurons, SpikingRTNH achieves significant energy efficiency gains. Furthermore, inspired by human cognitive processes, we introduce biological top-down inference (BTI), which processes point clouds sequentially from higher to lower densities. This approach effectively utilizes points with lower noise and higher importance for detection. Experiments on the K-Radar dataset demonstrate that SpikingRTNH with BTI significantly reduces energy consumption by 78% while achieving comparable detection performance to its ANN counterpart (51.1% AP 3D, 57.0% AP BEV). These results establish the viability of SNNs for energy-efficient 4D Radar-based object detection in autonomous driving systems. All codes are available at https://github.com/kaist-avelab/k-radar.
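The ReLU-to-LIF substitution at the heart of this paper can be sketched for a single neuron over discrete time steps (a textbook LIF formulation, not the authors' code; the decay factor, threshold, and input currents are illustrative assumptions):

```python
def lif_forward(currents, decay=0.5, v_th=1.0):
    """Discrete leaky integrate-and-fire neuron, the drop-in
    replacement for ReLU described in the abstract: the membrane
    potential leaks and integrates the input, and the neuron emits
    a binary spike (then hard-resets) on crossing the threshold.
    Binary spikes turn dense multiply-accumulates into sparse
    additions, which is where the energy savings come from."""
    v = 0.0
    spikes = []
    for i_t in currents:
        v = decay * v + i_t          # leaky integration of input current
        if v >= v_th:
            spikes.append(1)
            v = 0.0                  # hard reset after spiking
        else:
            spikes.append(0)
    return spikes

print(lif_forward([0.6] * 6))  # -> [0, 0, 1, 0, 0, 1]
```

A constant sub-threshold input thus produces a periodic spike train whose rate encodes the input magnitude, the usual rate-coding view of SNN activations.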
|
|
MoDT1 Poster Session, Caravaggio Room |
Add to My Program |
Poster 2.1 >> Planning, Trajectory Prediction & Motion Forecasting |
|
|
Chair: Stevanovic, Aleksandar | University of Pittsburgh |
Co-Chair: Atoui, Hussam | Valeo |
|
15:00-16:15, Paper MoDT1.1 | Add to My Program |
A Roadmap towards Dynamic Conflict Management for Autonomous Traffic Agents |
|
Schwammberger, Maike | Karlsruhe Institute of Technology |
Keywords: Multi-Agent Coordination Strategies, Decision Making, Trust and Acceptance of Autonomous Technologies
Abstract: Semi-automated vehicles and driver assistance systems promise improvements in traffic safety, the sustainability of transportation systems, and road comfort. For a desirable future with the autonomous traffic agents (ATAs) that steer these automated mobility systems, it is of paramount importance to draw our attention to dynamic consistency management. A dynamic inconsistency is a run-time conflict, where an agent cannot choose an action without violating existing traffic rules or central safety goals. We suggest a step-wise engineering methodology to enable ATAs to cope with such run-time conflicts. For this, known run-time conflicts must first be identified and formalised. We propose to sort similar conflicts into conflict clusters. For each conflict cluster, a conflict resolution strategy must be derived. Finally, we discuss explainability as a means to justify a conflict resolution strategy to involved stakeholders.
|
|
15:00-16:15, Paper MoDT1.2 | Add to My Program |
Boundary-Guided Trajectory Prediction for Road Aware and Physically Feasible Autonomous Driving |
|
Abouelazm, Ahmed | FZI Research Center for Information Technology |
Liu, Mianzhi | Karlsruhe Institute of Technology |
Hubschneider, Christian | FZI Research Center for Information Technology |
Wu, Yin | Karlsruhe Institute of Technology |
Slieter, Daniel | CARIAD SE |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe In |
Keywords: Predictive Trajectory Models and Motion Forecasting, Motion Forecasting, Trust and Acceptance of Autonomous Technologies
Abstract: Accurate trajectory prediction is essential for safe and efficient autonomous driving. While deep learning models have improved performance, challenges remain in preventing off-road predictions and ensuring kinematic feasibility. Existing methods incorporate road-awareness modules and enforce kinematic constraints but lack plausibility guarantees and often introduce trade-offs in complexity and flexibility. This paper proposes a novel framework that formulates trajectory prediction as a constrained regression guided by permissible driving directions and their boundaries. Using the agent’s current state and an HD map, our approach defines the valid boundaries and ensures on-road predictions by training the network to learn superimposed paths between left and right boundary polylines. To ensure feasibility, the model predicts acceleration profiles that determine the vehicle’s travel distance along these paths while adhering to kinematic constraints. We evaluate our approach on the Argoverse-2 dataset against the HPTR baseline. Our approach shows a slight decrease in benchmark metrics compared to HPTR but notably improves final displacement error and eliminates infeasible trajectories. Moreover, the proposed approach has a superior generalization to less prevalent maneuvers and unseen out-of-distribution scenarios, reducing the off-road rate under adversarial attacks from 66% to just 1%. These results highlight the effectiveness of our approach in generating feasible and robust predictions.
|
|
15:00-16:15, Paper MoDT1.3 | Add to My Program |
Human-Aided Trajectory Planning for Automated Vehicles through Teleoperation and Arbitration Graphs |
|
Le Large, Nick | KIT |
Brecht, David | Technical University of Munich |
Poh, Willi | Karlsruhe Institute of Technology |
Pauls, Jan-Hendrik | Karlsruhe Institute of Technology (KIT) |
Lauer, Martin | Karlsruher Institut Für Technologie |
Diermeyer, Frank | Technische Universität München |
Keywords: Teleoperation Control Systems for Vehicles, Decision Making, Motion Planning Algorithms for Autonomous Vehicles
Abstract: Teleoperation enables remote human support of automated vehicles in scenarios where the automation is not able to find an appropriate solution. Remote assistance concepts, where operators provide discrete inputs to aid specific automation modules like planning, are gaining interest due to their reduced workload on the human remote operator and improved safety. However, these concepts are challenging to implement and maintain due to their deep integration and interaction with the automated driving system. In this paper, we propose a solution to facilitate the implementation of remote assistance concepts that intervene on the planning level and extend the operational design domain of the vehicle at runtime. Using arbitration graphs, a modular decision-making framework, we integrate remote assistance into an existing automated driving system without modifying the original software components. Our simulative implementation demonstrates this approach in two use cases, allowing operators to adjust planner constraints and enable trajectory generation beyond nominal operational design domains.
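The property that remote assistance can be spliced in "without modifying the original software components" can be caricatured with a priority arbitrator, the basic building block of arbitration graphs (a deliberately tiny sketch, not the actual framework API; all behaviour names are hypothetical):

```python
def arbitrate(behaviors):
    """Priority-based arbitrator: behaviours are ordered by
    priority, and the first one whose invocation condition holds
    wins. A remote-assistance behaviour can be inserted at a
    higher priority at runtime without touching the nominal
    entries below it."""
    for name, applicable, command in behaviors:
        if applicable:
            return name, command
    raise RuntimeError("no applicable behavior")

# remote operator has not intervened, so the nominal planner wins
stack = [
    ("remote_assistance", False, "operator corridor"),
    ("nominal_planner", True, "nominal trajectory"),
    ("emergency_stop", True, "full brake"),
]
print(arbitrate(stack))
```

When the operator provides an input, only the `remote_assistance` entry's applicability flips; the nominal planner and fallback remain untouched, which is the modularity argument the paper makes.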
|
|
15:00-16:15, Paper MoDT1.4 | Add to My Program |
A Human-Like Trajectory Learning Approach Fusing Unstructured Scene Feature Extraction with Predictive Goal Point Guidance |
|
Chen, Sien | Beijing Institute of Technology |
Zhao, Lifei | Beijing Institute of Technology |
Li, Shihao | Beijing Institute of Technology |
Zhang, Xiao | Beijing Institute of Technology |
Wang, Boyang | Beijing Institute of Technology |
Liu, Haiou | Beijing Institute of Technology |
Keywords: Motion Planning Algorithms for Autonomous Vehicles
Abstract: The essence of human-like trajectory learning is to construct correspondences between scene elements and temporal trajectory points. Extracting key scene features and setting proper guidance during the learning process are crucial to improving the accuracy of human-like trajectory learning. Therefore, this paper proposes a graph feature extraction method for unstructured scene elements combined with a learning-based two-stage trajectory planner for human-like trajectory generation. The construction of the graph structure considers environmental, trajectory, and waypoint features, with environmental features specifically constructed through pixel clustering and motion compensation to enhance efficiency. In the first stage of the two-stage trajectory planning, feature extraction is performed using Spatial-Temporal Graph Convolutional Networks (ST-GCN), followed by proposal trajectory generation with a sequence-to-sequence (Seq2Seq) network. In the second stage, the proposed trajectory from the first stage serves as the input, with predicted goal points obtained through a Multilayer Perceptron (MLP) network. The final trajectory is then generated by fusing graph and guidance features. The results demonstrate that the proposed scene graph structure effectively reduces the complexity of the learning network, thereby improving algorithm efficiency. Additionally, heatmap-guided features, jointly generated with the learned predicted goal points and the regularization method, effectively guide trajectory generation and improve the accuracy of human-like trajectory generation.
|
|
15:00-16:15, Paper MoDT1.5 | Add to My Program |
LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction |
|
Yan, Yixin | Hunan University |
Li, Yang | Hunan University, College of Mechanical and Vehicle Engineering |
Wang, Yuanfan | Hunan University |
Zhou, Xiaozhou | Hunan University |
Xia, Beihao | Huazhong University of Science and Technology |
Hu, Manjiang | Hunan University |
Qin, Hongmao | Hunan University |
Keywords: Predictive Trajectory Models and Motion Forecasting
Abstract: Modeling the complex spatio-temporal dependencies among agents for trajectory prediction has long been a challenge. Since each state of an agent is closely related to its states at adjacent time steps, capturing local temporal dependencies is beneficial for prediction, yet most studies often overlook it. Moreover, learning higher-order motion state attributes is expected to enhance spatial interaction modeling, but this is rarely seen in previous work. To address this, we propose a lightweight framework, i.e., LTMSformer, to extract spatio-temporal interaction features for multimodal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism that captures local temporal dependencies by leveraging a convolutional attention mechanism with hierarchical local time boxes …
|
|
15:00-16:15, Paper MoDT1.6 | Add to My Program |
Lane-Level Navigation: A Local Drive Guide Sitting by the Roadside |
|
Li, Hongchen | Tongji University |
Lei, Mingyue | Tongji University |
Lin, Weimeng | COSCO SHIPPING Ports Limited |
Hu, Jia | Tongji University |
Keywords: Decision Making, Vehicle-to-Infrastructure (V2I) Communication, Real-Time Control Strategies
Abstract: In this research, a lane-level navigation system is designed to enhance vehicle mobility at signalized intersections. A deep learning-based lane-level long-term speed prediction (LLSP) predictor was developed to forecast traffic conditions for the upcoming planning horizon. Additionally, a lane-level navigation with speed guidance (LNSG) planner was introduced to determine the optimal lane-level route and the recommended travel speed for the ego vehicle. The performance of the proposed system was assessed using a software-in-the-loop simulation platform, considering various scenarios such as different traffic demands, vehicle arrival times at the control area, and planning resolutions. The evaluation results demonstrate that the proposed navigation system effectively improves the mobility of the ego vehicle by providing optimal lane and speed recommendations. Compared to the lane-keeping strategy, the system can reduce travel time by up to 30.2% in various traffic conditions.
|
|
15:00-16:15, Paper MoDT1.7 | Add to My Program |
CommonRoad Global Planner: A Toolbox for Global Motion Planning on Roads with Formal Guarantees |
|
Mascetta, Tobias Falco Wolfgang | Technical University Munich |
Northoff, Kilian | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Motion Planning Algorithms for Autonomous Vehicles
Abstract: Motion planning for autonomous driving depends on or greatly benefits from global information, such as routes, reference paths, and velocity profiles. Existing global planning toolboxes (1) do not use provably unique curvilinear coordinates, (2) are mostly limited to racing scenarios, and (3) are not compatible with large scenario benchmarks. We present CommonRoad Global Planner, an open-source toolbox within the CommonRoad framework for global motion planning on roads, comprising the CommonRoad Route Planner for generating routes and smooth reference paths as well as the CommonRoad Velocity Planner, which implements several algorithms for planning velocity profiles. Our contributions are threefold: (1) our toolbox uses reference paths with provably correct curvilinear coordinates and returns velocity profiles that meet user-specified constraints; (2) the implemented algorithms are compatible with the CommonRoad benchmark suite; and (3) to the best of our knowledge, our toolbox for global planning is the first which is evaluated on large-scale numerical experiments on an open-source benchmark.
|
|
15:00-16:15, Paper MoDT1.8 | Add to My Program |
Online Velocity Profile Generation and Tracking for Sampling-Based Local Planning Algorithms in Autonomous Racing Environments |
|
Langmann, Alexander | Technical University of Munich |
Ögretmen, Levent | Technical University of Munich |
Werner, Frederik | Technische Universität München |
Betz, Johannes | Technical University of Munich |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Multi-Objective Planning Approaches
Abstract: This work presents an online velocity planner for autonomous racing that adapts to changing dynamic constraints, such as grip variations from tire temperature changes and rubber accumulation. The method combines a forward-backward solver for online velocity optimization with a novel spatial sampling strategy for local trajectory planning, utilizing a three-dimensional track representation. The computed velocity profile serves as a reference for the local planner, ensuring adaptability to environmental and vehicle dynamics. We demonstrate the approach’s robust performance and computational efficiency in racing scenarios and discuss its limitations, including sensitivity to deviations from the predefined racing line and high jerk characteristics of the velocity profile.
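A forward-backward velocity solver of the kind referenced here is a classic construction: a forward pass caps acceleration, a backward pass caps braking, and both stay under a pointwise speed limit (e.g. from lateral-grip constraints). A generic 1-D sketch over discretised arc length, not the authors' adaptive-constraint implementation (all numbers are arbitrary):

```python
import math

def forward_backward_velocity(v_limit, ds, a_max, d_max, v0=0.0):
    """Forward-backward velocity profile: the forward pass enforces
    v[i]^2 <= v[i-1]^2 + 2*a_max*ds (acceleration limit), the
    backward pass v[i]^2 <= v[i+1]^2 + 2*d_max*ds (braking limit),
    both capped by the pointwise speed limit v_limit."""
    n = len(v_limit)
    v = list(v_limit)
    v[0] = min(v[0], v0)
    for i in range(1, n):                  # forward pass: acceleration
        v[i] = min(v[i], math.sqrt(v[i - 1] ** 2 + 2.0 * a_max * ds))
    for i in range(n - 2, -1, -1):         # backward pass: braking
        v[i] = min(v[i], math.sqrt(v[i + 1] ** 2 + 2.0 * d_max * ds))
    return v

# a slow corner (10 m/s limit) in the middle of a fast section
profile = forward_backward_velocity([30, 30, 10, 30, 30],
                                    ds=10.0, a_max=3.0, d_max=5.0, v0=20.0)
print([round(x, 2) for x in profile])
```

Online adaptability, as in the paper, then reduces to re-running these two cheap passes whenever the grip-dependent limits `v_limit`, `a_max`, or `d_max` change.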
|
|
15:00-16:15, Paper MoDT1.9 | Add to My Program |
Biasing the Driving Style of an Artificial Race Driver for Online Time-Optimal Maneuver Planning |
|
Taddei, Sebastiano | University of Trento - DII, Politecnico Di Bari - DEI |
Piccinini, Mattia | Technical University of Munich |
Biral, Francesco | University of Trento |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Real-Time Control Strategies, Adaptive Vehicle Control Techniques
Abstract: In this work, we present a novel approach to bias the driving style of an artificial race driver (ARD) for online time-optimal trajectory planning. Our method leverages a nonlinear model predictive control (MPC) framework that combines time minimization with exit speed maximization at the end of the planning horizon. We introduce a new MPC terminal cost formulation based on the trajectory planned in the previous MPC step, enabling ARD to adapt its driving style from early to late apex maneuvers in real-time. Our approach is computationally efficient, allowing for low replan times and long planning horizons. We validate our method through simulations, comparing the results against offline minimum-lap-time (MLT) optimal control and online minimum-time MPC solutions. The results demonstrate that our new terminal cost enables ARD to bias its driving style, and achieve online lap times close to the MLT solution and faster than the minimum-time MPC solution. Our approach paves the way for a better understanding of the reasons behind human drivers' choice of early or late apex maneuvers.
|
|
15:00-16:15, Paper MoDT1.10 | Add to My Program |
Enhanced DACER Algorithm with Multimodal Q-Value Distribution for Risk-Sensitive Stochastic Vehicle Environments |
|
Liu, Tong | Tsinghua University |
Song, Xujie | Tsinghua University |
Wang, Yinuo | Tsinghua University |
Zou, Wenjun | Tsinghua University |
Shuai, Bin | Tsinghua University |
Gao, Haoyu | Tsinghua University |
He, Weixian | Tsinghua University |
Duan, Jingliang | University of Science and Technology Beijing |
Li, Shengbo Eben | Tsinghua University |
Keywords: Adaptive Vehicle Control Techniques, Real-Time Control Strategies, Decision Making
Abstract: Reinforcement learning demonstrates strong capabilities in handling complex control tasks, especially in the field of autonomous driving, where vehicles cope with uncertain environments. Existing reinforcement learning methods attempt to model the value distribution as unimodal, but this modeling process loses a significant amount of the complete distribution information. In response to this problem, we propose DACER++, an online multimodal distributional RL algorithm. Characterizing the value distribution as multimodal enhances the accuracy of the value distribution representation and improves algorithm performance. We construct a quantile value network and use quantile regression to approximate the full quantile function of the state-action return distribution. This method allows for the precise modeling of multimodal distributions and formulates risk-sensitive policies adaptable to different environments. We then integrate the quantile value network with the actor-critic algorithm DACER. Experiments on multi-goal tasks and MuJoCo benchmarks show that DACER++ not only has multimodal policy representation capability but also achieves state-of-the-art performance. In stochastic vehicle-meeting environments, DACER++ can learn different multimodal value distributions according to various risk preferences, including conservative and aggressive driving styles.
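Quantile regression over a set of quantile fractions, as used for the quantile value network, typically minimizes a Huber-smoothed pinball loss; a generic QR-DQN-style sketch (not necessarily the exact DACER++ loss; shapes and values are illustrative):

```python
import numpy as np

def quantile_huber_loss(quantiles, target, taus, kappa=1.0):
    """Huber-smoothed pinball loss for a quantile value network:
    the asymmetric weight |tau - 1{u < 0}| bends each output
    toward its quantile fraction of the return distribution,
    which is what lets the network represent multimodal returns."""
    u = target - quantiles                      # residuals, one per quantile head
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    return np.mean(np.abs(taus - (u < 0.0).astype(float)) * huber)

taus = (np.arange(5) + 0.5) / 5.0               # 5 evenly spaced quantile fractions
loss = quantile_huber_loss(np.zeros(5), 1.0, taus)
print(loss)
```

Risk-sensitive policies then follow by weighting the learned quantiles asymmetrically, e.g. averaging only the lower quantiles for a conservative driving style.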
|
|
15:00-16:15, Paper MoDT1.11 | Add to My Program |
Learning to Predict Mixed-Traffic Trajectories in Urban Scenarios from Little Training Data with Refined Environment Modeling |
|
Prutsch, Alexander | Graz University of Technology |
Possegger, Horst | Graz University of Technology |
Keywords: Motion Forecasting, Predictive Trajectory Models and Motion Forecasting, Deep Learning Based Approaches
Abstract: Trajectory prediction for autonomous driving has been extensively studied using large-scale datasets from the US and Asia. These datasets typically have a strong bias toward predicting vehicle motion. Recently, the View-of-Delft Prediction (VoD-P) dataset introduced a collection of European urban mixed-traffic scenarios, posing unique challenges due to its diversity and relatively small dataset size. In this work, we conduct a detailed study on trajectory prediction on the VoD-P dataset. We show that state-of-the-art trajectory prediction models, which perform well on large-scale vehicle-biased datasets, struggle to generalize to these scenarios. To address this limitation, we propose a simple yet effective transformer-based trajectory prediction model, specifically designed to handle the challenges posed by diverse urban scenarios. Combining a strong baseline with refined environment modeling, our approach significantly outperforms all existing methods on the VoD-P dataset.
|
|
15:00-16:15, Paper MoDT1.12 | Add to My Program |
Validation of a POMDP Framework for Interaction-Aware Trajectory Prediction in Vehicle Safety |
|
Elter, Tim | Technische Hochschule Ingolstadt |
Dirndorfer, Tobias | CARIAD SE |
Botsch, Michael | Technische Hochschule Ingolstadt |
Utschick, Wolfgang | Technische Universität München |
Keywords: Collision Avoidance Algorithms, Predictive Trajectory Models and Motion Forecasting, Decision Making
Abstract: Predicting the motion of traffic participants accurately remains a challenging task in the field of automated driving. Especially interactions between traffic participants introduce high complexity and interdependencies into the environment prediction. This work presents the remarkable performance of a Partially Observable Markov Decision Process (POMDP) framework to stochastically predict and safely respond to an interacting environment. The framework is validated for its ability to increase the overall Ego-Vehicle safety by preemptively triggering a de-escalation maneuver. The performance of the framework is analyzed on a publicly available dataset with real-world traffic (Argoverse) and on highly critical simulation scenarios specified by Euro-NCAP for emergency braking functions. The results show quantitatively that the proposed framework significantly contributes to an early de-escalation of critical scenarios. Such an early de-escalation increases the safety and comfort of automated vehicles.
|
|
15:00-16:15, Paper MoDT1.13 | Add to My Program |
DI3: Dynamic Insertable Intention Interval Based Future Motion Prediction for Autonomous Driving |
|
Wen, Lu | University of Michigan, Ann Arbor |
D'sa, Jovin | Honda Research Institute, USA |
Chalaki, Behdad | Honda Research Institute USA Inc |
Nourkhiz Mahjoub, Hossein | Honda Research Institute, US |
Moradi-Pari, Ehsan | Honda Research Institute USA |
Keywords: Predictive Trajectory Models and Motion Forecasting, Motion Forecasting
Abstract: In this paper, we address the challenges of limited interpretability and scalability in traditional trajectory prediction models for autonomous driving decision-making. We present the Dynamic Insertable Intention Interval framework (DI3), which introduces a novel representation of driving intentions by accounting for dynamic interactions with the surrounding environment. Our hierarchical approach integrates intention queries within a motion decoder, enabling the generation of multimodal predictions that closely replicate human driving behavior. Through comprehensive experiments on the highway on-ramp merging scenario using the exiD dataset, we show that DI3 enhances trajectory prediction accuracy and reduces joint prediction overlap rates compared to the Motion Transformer (MTR) baseline, demonstrating its effectiveness in high-interaction scenarios. Our work lays the foundation for more reliable and interpretable prediction models that are valuable for decision-making in autonomous driving applications.
|
|
15:00-16:15, Paper MoDT1.14 | Add to My Program |
Safe and Efficient CAV Lane Changing Using Decentralised Safety Shields |
|
Hegde, Bharathkumar | School of Computer Science and Statistics, Trinity College Dublin
Bouroche, Melanie | School of Computer Science and Statistics, Trinity College Dublin
Keywords: Collision Avoidance Algorithms, Reinforcement Learning for Planning, Multi-Objective Planning Approaches
Abstract: Lane changing is a complex decision-making problem for Connected and Autonomous Vehicles (CAVs) as it requires balancing traffic efficiency with safety. Although traffic efficiency can be improved by using vehicular communication for training lane change controllers using Multi-Agent Reinforcement Learning (MARL), ensuring safety is difficult. To address this issue, we propose a decentralised Hybrid Safety Shield (HSS) that combines optimisation and a rule-based approach to guarantee safety. Our method applies control barrier functions to constrain longitudinal and lateral control inputs of a CAV to ensure safe manoeuvres. Additionally, we present an architecture to integrate HSS with MARL, called MARL-HSS, to improve traffic efficiency while ensuring safety. We evaluate MARL-HSS using a gym-like environment that simulates an on-ramp merging scenario with two levels of traffic density: light and moderate. The results show that HSS provides a safety guarantee by strictly enforcing a dynamic safety constraint defined on a time headway, even in moderate traffic density, which presents challenging lane change scenarios. Moreover, the proposed method learns stable policies compared to the baseline, a state-of-the-art MARL lane change controller without a safety shield. Further policy evaluation shows that our method achieves a balance between safety and traffic efficiency with zero crashes and comparable average speeds in light and moderate traffic densities.
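A time-headway constraint enforced by a shield of this kind can be illustrated with a one-dimensional control-barrier-function filter: require the headway barrier h = gap - tau*v to satisfy h_dot >= -alpha*h, which yields an upper bound on acceleration. The gains, headway `tau`, and braking bound below are invented for illustration, not taken from the paper:

```python
def safe_accel(a_des, gap, v_ego, v_lead, tau=1.5, alpha=0.5, a_min=-6.0):
    """Clip a desired longitudinal acceleration to keep the headway barrier safe."""
    h = gap - tau * v_ego                       # time-headway barrier value
    # h_dot = (v_lead - v_ego) - tau*a, so h_dot >= -alpha*h gives:
    a_max = ((v_lead - v_ego) + alpha * h) / tau
    return max(a_min, min(a_des, a_max))

# Large gap, matched speeds: the desired acceleration passes through unchanged.
print(safe_accel(a_des=2.0, gap=40.0, v_ego=20.0, v_lead=20.0))  # 2.0
# Tight headway, closing on a slower lead vehicle: braking is enforced.
print(safe_accel(a_des=2.0, gap=25.0, v_ego=20.0, v_lead=15.0))  # -5.0
```

The filter is decentralised in the same spirit as the abstract describes: it needs only the ego state and the measured gap and speed of the lead vehicle, not any coordination with other agents.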
|
|
15:00-16:15, Paper MoDT1.15 | Add to My Program |
Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving |
|
Tariq, Faizan M. | Honda Research Institute USA, Inc |
Yeh, Zheng-Hang | Honda Research Institute |
Singh, Avinash | Honda Research Institute, USA |
Isele, David | Honda Research Institute USA |
Bae, Sangjae | Honda Research Institute, USA |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Multi-Objective Planning Approaches, Decision Making
Abstract: Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have widely been adopted for autonomous driving applications. While a global route can be planned offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.
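Path-speed decomposition methods like this operate in the Frenet frame. A self-contained sketch of the underlying Cartesian-to-Frenet projection onto a reference polyline follows; this is a generic textbook construction, not the paper's FCP, and the polyline discretization is an assumption:

```python
import math

def to_frenet(pt, ref):
    """Return (s, d): arc length of the closest reference point and signed lateral offset."""
    best = (float("inf"), 0.0, 0.0)   # (distance, s, d)
    s = 0.0
    for (x0, y0), (x1, y1) in zip(ref, ref[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg = math.hypot(dx, dy)
        # Projection parameter of pt onto this segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((pt[0] - x0) * dx + (pt[1] - y0) * dy) / seg**2))
        px, py = x0 + t * dx, y0 + t * dy
        dist = math.hypot(pt[0] - px, pt[1] - py)
        if dist < best[0]:
            # Sign of d from the cross product: left of the path is positive.
            side = dx * (pt[1] - py) - dy * (pt[0] - px)
            best = (dist, s + t * seg, math.copysign(dist, side))
        s += seg
    return best[1], best[2]

ref = [(0, 0), (10, 0), (20, 0)]
print(to_frenet((12.0, 2.0), ref))   # (12.0, 2.0): 12 m along the path, 2 m to the left
```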
|
|
15:00-16:15, Paper MoDT1.16 | Add to My Program |
Knowledge Integration Strategies in Autonomous Vehicle Prediction and Planning: A Comprehensive Survey |
|
Manas, Kumar | Freie Universität Berlin |
Paschke, Adrian | FU Berlin |
Keywords: Decision Making, Motion Planning Algorithms for Autonomous Vehicles, Safety Verification and Validation Techniques
Abstract: This comprehensive survey examines the integration of knowledge-based approaches in autonomous driving systems, specifically focusing on trajectory prediction and planning. We extensively analyze various methodologies for incorporating domain knowledge, traffic rules, and commonsense reasoning into autonomous driving systems. The survey categorizes and analyzes approaches based on their knowledge representation and integration methods, ranging from purely symbolic to hybrid neuro-symbolic architectures. We examine recent developments in logic programming, foundation models for knowledge representation, reinforcement learning frameworks, and other emerging technologies incorporating domain knowledge. This work systematically reviews recent approaches, identifying key challenges, opportunities, and future research directions in knowledge-enhanced autonomous driving systems. Our analysis reveals emerging trends in the field, including the increasing importance of interpretable AI, the role of formal verification in safety-critical systems, and the potential of hybrid approaches that combine traditional knowledge representation with modern machine learning techniques.
|
|
15:00-16:15, Paper MoDT1.17 | Add to My Program |
Graph-Based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking |
|
Savvas Sadiq Ali, Farhad Nawaz | University of Pennsylvania |
Sung, Minjun | University of Illinois Urbana Champaign |
Gadginmath, Darshan | University of California Riverside |
D'sa, Jovin | Honda Research Institute, USA |
Bae, Sangjae | Honda Research Institute, USA |
Isele, David | Honda Research Institute USA |
Figueroa, Nadia | University of Pennsylvania |
Matni, Nikolai | University of Pennsylvania |
Tariq, Faizan M. | Honda Research Institute USA, Inc |
Keywords: Motion Planning Algorithms for Autonomous Vehicles, Collision Avoidance Algorithms
Abstract: Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A* algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A* algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential to greatly improve efficiency and safety when compared to the state-of-the-art spline-based planning method for parking situations.
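The key idea of a time-indexed variant — carrying time in the search state so that node expansion can consult obstacle predictions — can be sketched on a grid. This is plain A* over (x, y, t) rather than the paper's Hybrid A* over continuous poses, and the occupancy function is a stand-in for the obstacle predictor:

```python
import heapq

def plan(start, goal, occupied_at, t_max=50):
    """A* over (x, y, t); occupied_at(cell, t) encodes predicted dynamic obstacles."""
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # Manhattan heuristic
    open_set = [(h(start), 0, start, 0, [start])]             # (f, g, cell, t, path)
    seen = set()
    while open_set:
        f, g, cell, t, path = heapq.heappop(open_set)
        if cell == goal:
            return path
        if (cell, t) in seen or t >= t_max:
            continue
        seen.add((cell, t))
        # (0, 0) lets the planner wait in place until an obstacle clears.
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]:
            nxt = (cell[0] + dx, cell[1] + dy)
            if not occupied_at(nxt, t + 1):
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, t + 1, path + [nxt]))
    return None

# A dynamic obstacle predicted to block (1, 0) only at t = 1: the planner waits it out.
blocked = lambda cell, t: cell == (1, 0) and t == 1
path = plan((0, 0), (2, 0), blocked)
print(path)  # [(0, 0), (0, 0), (1, 0), (2, 0)] — waits one step, then proceeds
```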
|
|
MoDT2 Poster Session, Leonardo + Lobby Left |
Add to My Program |
Poster 2.2 >> Safety, Criticality and Risk Awareness |
|
|
Chair: Bergasa, Luis M. | University of Alcala |
Co-Chair: Abuhadrous, Iyad | INRIA |
|
15:00-16:15, Paper MoDT2.1 | Add to My Program |
A Safety Margin-Based Automatic Emergency Braking Model |
|
Ji, Xin | Beihang University |
Lu, Guangquan | Beihang University |
Wang, Jinghua | Beihang University |
Liang, Jinhao | Southeast University |
Tang, RenJing | Beihang University |
Keywords: Collision Avoidance Algorithms, Level 2 ADAS Control Techniques
Abstract: The Automatic Emergency Braking (AEB) system is capable of assessing driving risks, alerting the driver to potential collision hazards, and, in the absence of driver response to the collision risk, autonomously activating braking to mitigate the occurrence of collision accidents. Most existing AEB systems rely on Time to Collision (TTC) for risk assessment and decision-making. However, TTC fails to account for the impact of absolute velocity on driving safety when assessing risk, leading to inaccurate risk descriptions, particularly in high-speed scenarios with minor speed differences. The Safety Margin (SM) takes into account key factors affecting driving risk, such as relative velocity and distance, and is capable of accurately quantifying driving risks. Based on the SM, this study proposes a full-speed range single-threshold AEB model. The model comprises two components: traffic environment risk quantification and road surface friction coefficient estimation. It is applicable to automatic emergency braking tasks under varying speeds and road surface conditions. Simulation experiments were conducted by constructing three typical scenarios: stationary lead vehicle, slow-moving lead vehicle, and braking lead vehicle, determining a braking threshold of 0.2. The safety performance of the proposed safety margin-based AEB model is evaluated by comparing it with the traditional TTC-based AEB model across the specified scenarios. The results demonstrate that the safety margin-based AEB model proposed in this study achieves 100% safe braking in all scenarios, successfully performing emergency braking and outperforming the TTC-based AEB model.
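The TTC limitation that motivates a safety margin can be shown numerically: two situations with identical TTC but very different absolute speeds leave very different stopping distances. The `stopping_margin` function below is an illustrative distance-and-velocity proxy, not the paper's SM definition:

```python
def ttc(gap, v_ego, v_lead):
    """Time to collision: gap over closing speed (infinite when not closing)."""
    rel = v_ego - v_lead
    return gap / rel if rel > 0 else float("inf")

def stopping_margin(gap, v_ego, v_lead, a_brake=7.0):
    """Distance left after both vehicles brake to a stop (illustrative SM proxy)."""
    return gap + v_lead**2 / (2 * a_brake) - v_ego**2 / (2 * a_brake)

# Same 20 m gap and 2 m/s closing speed => identical TTC of 10 s ...
print(ttc(20.0, 12.0, 10.0), ttc(20.0, 32.0, 30.0))
# ... but at high absolute speed the remaining stopping margin is much smaller.
print(stopping_margin(20.0, 12.0, 10.0), stopping_margin(20.0, 32.0, 30.0))
```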
|
|
15:00-16:15, Paper MoDT2.2 | Add to My Program |
A Generative Self-Diagnosis Disengagement Reporting System for Autonomous Shuttles |
|
Kong, Xiangrui | The University of Western Australia |
Liang, Li | The University of Western Australia |
Li, Jichunyang | The University of Western Australia |
Quirke-Brown, Kieran | The University of Western Australia |
Lai, Zhihui | The University of Western Australia |
Olaru, Doina | University of Western Australia |
Braunl, Thomas | The University of Western Australia |
Keywords: Self-Diagnostic Systems for Vehicle Safety, Semantic Segmentation Techniques, Real-World Testing Methodologies for Safety Systems
Abstract: The increasing presence of autonomous vehicles on public roads has highlighted the limitations of traditional incident reporting systems, which rely on human-generated tables and descriptions. To address this, we propose a generative reporting framework that integrates a large language model (LLM) with semantic scene generation models. This framework utilizes perception snapshots and self-diagnostic data to generate detailed incident reports, addressing environmental blind spots. Our 3D scene completion network, combining diffusion and state-space models, reconstructs blind zones undetected by exterior sensors, achieving IoU scores of 41.92 on SSCBench-KITTI360 and 44.13 on SemanticKITTI. Public road experiments validate the system's ability to improve incident report quality while maintaining performance.
|
|
15:00-16:15, Paper MoDT2.3 | Add to My Program |
A Necessary Criterion for Evaluating Scene-Level Criticality Metrics in Safety Verification of Autonomous Driving |
|
Cheng, Hao | Tsinghua University |
Ge, Qiang | Tsinghua University |
Jiang, Yanbo | Tsinghua University |
Li, Haoran | Suzhou Automotive Research Institute, Tsinghua University |
Chen, Keyu | Tsinghua University |
Wang, Jianqiang | Tsinghua University |
Zheng, Sifa | Tsinghua University |
Keywords: Safety Verification and Validation Techniques, Real-World Testing Methodologies for Safety Systems, Self-Diagnostic Systems for Vehicle Safety
Abstract: Effective, reliable, and efficient measurement of autonomous driving safety performance is essential for demonstrating its trustworthiness. Criticality metrics offer an objective assessment of autonomous driving safety. However, the wide variety of criticality metrics, each with distinct characteristics, lacks a unified standard for evaluation and selection. We contend that a criticality metric should accurately reflect the true danger level of vehicle pairs at risk. This paper focuses on scene-level criticality metrics and proposes a necessary criterion: a robust criticality metric should accurately distinguish between 'collision-unavoidable' and 'collision-avoidable' states. To achieve this, we employ Monte Carlo sampling to systematically explore the state space of two-vehicle conflict scenes (>10^6 samples) and use intention-sharing Distributed Model Predictive Control (DMPC) to determine the ground truth of collision states. We analyze failure cases of three classical and two state-of-the-art scene-level criticality metrics and quantify their performance using the Receiver Operating Characteristic (ROC) method. Our approach has the potential to establish a necessary standard for evaluating criticality metrics, facilitating accurate assessment, analysis, and enhancement of autonomous vehicle safety.
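The ROC-based quantification can be sketched as follows: score each sampled state with a criticality metric and measure how well the score separates the ground-truth 'collision-unavoidable' labels. The scores and labels below are synthetic stand-ins, not the paper's metrics or DMPC-derived ground truth:

```python
def roc_auc(scores, labels):
    """AUC = P(score of a positive > score of a negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: higher criticality score should mean "collision unavoidable" (label 1).
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.1]
labels = [1,   1,   0,    1,   0,   0]
print(roc_auc(scores, labels))  # 8/9 ≈ 0.889: good but imperfect separation
```

An AUC of 1.0 would mean the metric always ranks unavoidable states above avoidable ones — exactly the necessary criterion the abstract proposes; 0.5 means the metric is no better than chance.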
|
|
15:00-16:15, Paper MoDT2.4 | Add to My Program |
Good Enough to Learn: LLM-Based Anomaly Detection in ECU Logs without Reliable Labels |
|
Bogdan, Bogdan Mihai | Porsche Engineering Romania SRL |
Cazacu, Arina Ioana | Porsche Engineering Romania SRL |
Vasilie, Laura Ana | Porsche Engineering Romania SRL |
Keywords: Safety Verification and Validation Techniques, Fault Detection and Isolation (FDI) and Protection Level Determination, Foundation Models Based Approaches
Abstract: Anomaly detection often relies on supervised or clustering approaches, with limited success in specialized domains like automotive communication systems where scalable solutions are essential. We propose a novel decoder-only Large Language Model (LLM) to detect anomalies in Electronic Control Unit (ECU) communication logs. Our approach addresses two key challenges: the lack of LLMs tailored for ECU communication and the complexity of inconsistent ground truth data. By learning from UDP communication logs, we formulate anomaly detection simply as identifying deviations in time from normal behavior. We introduce an entropy regularization technique that increases the model's uncertainty on known anomalies while maintaining consistency in similar scenarios. Our solution offers three novelties: a decoder-only anomaly detection architecture, a way to handle inconsistent labeling, and an adaptable LLM for different ECU communication use cases. By leveraging the generative capabilities of decoder-only models, we present a new technique that addresses the high cost and error-prone nature of manual labeling through a more scalable system that is able to learn from a minimal set of examples, while improving detection accuracy in complex communication environments.
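The entropy-regularization idea — reward predictive uncertainty on positions labeled as known anomalies while ordinary cross-entropy keeps normal behavior consistent — can be sketched with a toy softmax output. The `lam` weight and the three-logit example are illustrative assumptions, not the paper's architecture:

```python
import math

def softmax(logits):
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def loss(logits, target, is_anomaly, lam=0.1):
    """Cross-entropy, minus an entropy bonus on known-anomaly positions."""
    p = softmax(logits)
    ce = -math.log(p[target])
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    # Subtracting entropy rewards uncertainty where labels are unreliable,
    # so the model is not forced to commit to a possibly wrong ground truth.
    return ce - lam * entropy if is_anomaly else ce

confident = [5.0, 0.0, 0.0]
print(loss(confident, 0, is_anomaly=False))  # plain cross-entropy
print(loss(confident, 0, is_anomaly=True))   # lower: entropy bonus applied
```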
|
|
15:00-16:15, Paper MoDT2.5 | Add to My Program |
Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors |
|
Pavlitska, Svetlana | FZI Research Center for Information Technology |
Robb, Jamie | FZI Research Center for Information Technology |
Polley, Nikolai | Karlsruhe Institute of Technology |
Yazgan, Melih | FZI Research Center for Information Technology |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe Institute of Technology
Keywords: Safety Verification and Validation Techniques, Deep Learning Based Approaches, Cybersecurity Measures for Connected Vehicles
Abstract: Realistic adversarial attacks on various camera-based perception tasks of autonomous vehicles have been successfully demonstrated so far. However, only a few works considered attacks on traffic light detectors. This work shows how CNNs for traffic light detection can be attacked with printed patches. We propose a threat model, where each instance of a traffic light is attacked with a patch placed under it, and describe a training strategy. We demonstrate successful adversarial patch attacks in universal settings. Our experiments show realistic targeted red-to-green label-flipping attacks and attacks on pictogram classification. Finally, we perform a real-world evaluation with printed patches and demonstrate attacks in the lab settings with a mobile traffic light for construction sites and in a test area with stationary traffic lights. Our code will be made publicly available upon acceptance.
|
|
15:00-16:15, Paper MoDT2.6 | Add to My Program |
Evaluating Pedestrian Risks in Shared Spaces through Autonomous Vehicle Experiments on a Fixed Track (I) |
|
Del Re, Enrico | Johannes Kepler Universität Linz |
Certad, Novel | Department of Intelligent Transport Systems, Johannes Kepler University
Varughese, Joshua Cherian | Johannes Kepler University |
Olaverri-Monreal, Cristina | Johannes Kepler University Linz, Austria |
Keywords: Vulnerable Road User Protection Strategies, Trust and Acceptance of Autonomous Technologies, Collision Avoidance Algorithms
Abstract: The majority of research on safety in autonomous vehicles has been conducted in structured and controlled environments. However, there is a scarcity of research on safety in unregulated pedestrian areas, especially when interacting with public transport vehicles like trams. This study investigates pedestrian responses to an alert system in this context by replicating the real-world scenario in a controlled environment using an autonomous vehicle. The results show that safety measures from other contexts can be adapted to shared spaces with trams, where fixed tracks heighten risks in unregulated crossings.
|
|
15:00-16:15, Paper MoDT2.7 | Add to My Program |
Experimental Results in Cyber-Physical Transportation Systems: A Case Study in Cybersecurity |
|
Ha, Won Yong | New York University |
Chakraborty, Sayan | New York University |
Ozbay, Kaan | New York University |
Jiang, Zhong-Ping | New York University |
Keywords: Reinforcement Learning for Planning, Control Strategies for Autonomous UAVs, Cybersecurity Measures for Connected Vehicles
Abstract: This paper presents experimental results from a learning-based control framework for cyber-physical transportation systems. Building on theoretical guarantees that establish an upper bound on denial-of-service (DoS) attack durations to maintain closed-loop stability, we deploy a resilient learning-based lane-changing control algorithm on a remote-controlled (RC) autonomous vehicle equipped with GPS, IMU, and camera sensors, interfaced with an Nvidia Jetson AGX Xavier board. The algorithm leverages real-time sensor data to make suboptimal yet robust lane-change decisions while enduring intermittent DoS attacks that disrupt communication. Our experiments confirm the resilience of this learning-based approach, demonstrating safe and efficient maneuvers under adversarial conditions in obstacle-rich driving scenarios. By highlighting these experimental findings, this work underscores the importance of cybersecurity in next-generation vehicle control algorithms for autonomous transportation applications.
|
|
15:00-16:15, Paper MoDT2.8 | Add to My Program |
Monitoring Operational Design Domain Compliance in Intelligent Vehicles |
|
Charmet, Thibault | Renault, Université De Technologie De Compiègne |
Cherfaoui, Véronique | Universite De Technologie De Compiegne |
Ibanez Guzman, Javier | Renault S.A.S, |
Armand, Alexandre | Renault SA |
Keywords: Safety Verification and Validation Techniques, Real-World Testing Methodologies for Safety Systems, Self-Diagnostic Systems for Vehicle Safety
Abstract: Advanced driver assistance systems (ADAS) and automated driving functions are becoming integral to modern vehicles. Ensuring their safety and reliability requires validating their operation within well-defined Operational Design Domains (ODD). Monitoring ODD compliance is crucial to determine when these functions can operate safely. This paper presents a systematic approach to ODD monitoring using a formalized, machine-readable ODD description and fuzzy logic. The method evaluates compliance and provides explanations for non-compliance. The approach introduces a two-level hierarchical ODD representation, a membership score quantifying compliance, and an explanation mechanism for identifying the primary factors contributing to non-compliance. The monitoring results are integrated into a Conditional Activation Control System (CACS), which governs function activation based on ODD compliance. The proposed system was implemented within a production vehicle and validated using real-world data, demonstrating its feasibility for deployment. By enabling clear, real-time assessments of ODD adherence, this approach supports safer and more reliable automated driving, promoting user confidence and regulatory compliance.
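A fuzzy-logic ODD monitor of this shape reduces to per-attribute membership functions, an aggregated compliance score, and an explanation naming the weakest attribute. The attributes and trapezoid parameters below are invented for illustration, not the formalized ODD of the paper:

```python
def trapezoid(x, a, b, c, d):
    """Fuzzy membership: 0 below a, ramp up on [a, b], 1 on [b, c], ramp down on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def odd_compliance(state):
    """Min-aggregated compliance score plus the attribute that limits it."""
    members = {
        "speed":      trapezoid(state["speed_kph"], 0, 0, 110, 130),
        "visibility": trapezoid(state["visibility_m"], 50, 200, 1e9, 2e9),
        "rain":       trapezoid(state["rain_mmh"], -1, 0, 2, 8),
    }
    score = min(members.values())            # a single bad attribute drags it down
    worst = min(members, key=members.get)    # explanation: weakest contributor
    return score, worst

score, why = odd_compliance({"speed_kph": 90, "visibility_m": 120, "rain_mmh": 5})
print(score, why)  # degraded compliance, dominated by reduced visibility
```

A downstream activation gate (the CACS role in the abstract) would then compare `score` to a threshold and surface `why` to the driver or logs when the function must be declined.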
|
|
15:00-16:15, Paper MoDT2.9 | Add to My Program |
Diffusion Models for Safety Validation of Autonomous Driving Systems |
|
Wang, Juanran | Stanford University |
Schlichting, Marc René | Stanford University |
Delecki, Harrison | Stanford University |
Kochenderfer, Mykel | Stanford University |
Keywords: Safety Verification and Validation Techniques, Collision Avoidance Algorithms
Abstract: Safety validation of autonomous driving systems is extremely challenging due to the high risks and costs of real-world testing as well as the rarity and diversity of potential failures. To address these challenges, we train a denoising diffusion model to generate potential failure cases of an autonomous vehicle given any initial traffic state. Experiments on a four-way intersection problem show that in a variety of scenarios, the diffusion model can generate realistic failure samples while capturing a wide variety of potential failures. Our model does not require any external training dataset, can perform training and inference with modest computing resources, and does not assume any prior knowledge of the system under test, with applicability to safety validation for traffic intersections.
|
|
15:00-16:15, Paper MoDT2.10 | Add to My Program |
An ISO 26262-Derived Evaluation Methodology for Automated Fault Injection Test Case Generators |
|
Benkendorf, Nina | Technical University of Munich |
Ganahl, Carolin | Technical University of Munich |
Munaro, Tiziano | Fortiss |
Keywords: Real-World Testing Methodologies for Safety Systems, Safety Verification and Validation Techniques
Abstract: Fault Injection (FI) is a well-established method to assess the effect of failures within elements of a system under test. Where FI test cases can be executed automatically, such as in simulation-based or Hardware-in-the-Loop (HiL) FI, numerous test case generators (TCGs) have been proposed that aim to uncover more 'critical' test cases or to accomplish this using fewer resources. However, the evaluations of these approaches do not allow for direct comparisons: Experiments are often not reproducible, metrics are commonly specific to use cases, and key properties, such as test case distribution, are often not captured. Further, the authors are not aware of any suitable comparison frameworks. Hence, to support practitioners in selecting the most suitable FI TCG for their use case, test setup, and individual goal, this work introduces a set of use case-independent metrics derived from the ISO 26262 safety standard and identifies how these metrics can be applied and analyzed to capture decisive characteristics such as the distribution, criticality, and coverage of generated test cases. We incorporate these metrics and analyses in a start-to-finish methodology and provide their implementation as an open-source tool to effectively and reproducibly evaluate TCGs for automated FI. The evaluation methodology is assessed in a case study with an industry-oriented cyber-physical system, demonstrating its ability to support practitioners in making an informed decision about the TCG providing the most appropriate balance of coverage and efficiency for their particular use case.
|
|
15:00-16:15, Paper MoDT2.11 | Add to My Program |
Exploring Communication and Roadside Perception Requirements for Cooperative Warning Systems at Intersections |
|
Wang, Tinghan | University of Michigan |
Meng, Depu | University of Michigan |
Li, Boqi | Univ. of Michigan |
Zhang, Rusheng | University of Michigan |
Zuo, Yukun | Hunan University |
Shen, Shengyin | University of Michigan |
Hogue, Darian | Mcity - University of Michigan
Maile, Michael | Ivie Communications |
Shulman, Michael | Shulman Technology Consultants, LLC
Liu, Henry X. | University of Michigan |
Keywords: Collision Avoidance Algorithms, Vulnerable Road User Protection Strategies
Abstract: Infrastructure-based cooperative perception has been researched for several years, but few automotive warning or control applications using this information have been published. Infrastructure sensing with cameras or lidars, combined with a communication system, allows connected vehicles to receive information about all observed objects. An SAE standard, "V2X Sensor-Sharing for Cooperative and Automated Driving" (J3224), released in 2022, introduces the Sensor Data Sharing Message (SDSM) as the standard communication message for cooperative perception. This paper investigates the use of the SDSM for a vehicle application that provides warnings of potential collisions with vulnerable road users about to cross the street at the intersection. The application was tested in CARLA simulation under various roadside detection errors and communication conditions to assess the impact on the on-board application and estimate the minimum detection and communication requirements for effective use. In addition, the system was implemented and evaluated at the Mcity test facility. The results demonstrate that the proposed warning system can accurately and promptly warn the driver, given specific communication conditions, and show that the SDSM is viable for real-time on-board usage.
|
|
15:00-16:15, Paper MoDT2.12 | Add to My Program |
A Dynamic Priority-Based Batch Verification Scheme for V2X Communication in Vehicular Networks |
|
Yang, Yang | Beihang University |
Yu, Haiyang | Beihang University |
Fu, Xiang | Beihang University |
Ren, Yilong | Beihang University |
Zhao, Yanan | Beihang University |
Shi, Yuqi | Tongji University |
Keywords: Safety Verification and Validation Techniques, Vehicle-to-Infrastructure (V2I) Communication, Cybersecurity Measures for Connected Vehicles
Abstract: V2X technology facilitates real-time communication between vehicles, enabling collision avoidance systems, proactive hazard warnings, and cooperative maneuvers to prevent potential accidents. Due to the inherent openness of wireless communication channels, vehicular networks are highly susceptible to various security threats. Digital signatures have been widely adopted as an effective verification mechanism to ensure message integrity and authenticity. However, in high-density traffic environments, the sheer volume of messages imposes a significant computational burden on the verification process, leading to excessive delays and potential packet loss, which compromise the timeliness and reliability of safety-critical applications. To address this issue, we propose DPBV, a priority-aware signature verification scheme that dynamically prioritizes V2X messages based on their urgency and relevance. By leveraging clustering-based classification and batch verification techniques, the proposed approach optimizes the processing efficiency of safety messages while maintaining stringent security guarantees. Simulation results demonstrate that our scheme significantly reduces verification latency and improves message authentication throughput, making it well-suited for real-time V2X communication in high-density vehicular networks.
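The dynamic prioritization can be sketched as a priority queue feeding fixed-size verification batches. The message fields and the additive urgency-plus-relevance score are invented stand-ins for the paper's clustering-based classification:

```python
import heapq

class PriorityBatchVerifier:
    def __init__(self, batch_size=4):
        self.queue = []           # min-heap: lower priority value = more critical
        self.batch_size = batch_size
        self.counter = 0          # tie-breaker preserving arrival order

    def submit(self, msg, urgency, relevance):
        priority = urgency + relevance        # toy criticality score
        heapq.heappush(self.queue, (priority, self.counter, msg))
        self.counter += 1

    def next_batch(self):
        """Pop the most critical messages; their signatures are verified together."""
        return [heapq.heappop(self.queue)[2]
                for _ in range(min(self.batch_size, len(self.queue)))]

v = PriorityBatchVerifier(batch_size=2)
v.submit("beacon A", urgency=3, relevance=2)
v.submit("hard-brake warning", urgency=0, relevance=0)
v.submit("beacon B", urgency=2, relevance=3)
print(v.next_batch())  # ['hard-brake warning', 'beacon A'] — the warning jumps the queue
```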
|
|
15:00-16:15, Paper MoDT2.13 | Add to My Program |
Steering into Danger: Security Vulnerabilities in Steer-By-Wire and Steering Wheel-Less Vehicles |
|
Yedla Ravi, Bhagawat Baanav | University of Florida |
Ray, Sandip | University of Florida |
Keywords: Level 4-5 Autonomous Driving Systems Architecture, Vulnerable Road User Protection Strategies
Abstract: Steer-by-Wire (SbW) systems revolutionize automotive technology by eliminating the mechanical linkage between the steering wheel and tires, enhancing design flexibility and performance, especially in autonomous vehicles. However, this reliance on sensors and electronic data channels introduces critical security vulnerabilities. Exposed sensor locations in modern vehicles increase susceptibility to cyberattacks and physical interference, yet prior research has largely overlooked SbW-specific threats, particularly position encoders. This paper is the first to experimentally analyze SbW security vulnerabilities, presenting a novel attack methodology that disrupts SbW sensors. Our findings demonstrate how these vulnerabilities can compromise steering operations, posing severe risks to vehicle dynamics and occupant safety. As autonomous vehicles eliminate manual intervention, addressing these security risks becomes urgent. This study lays a foundation for developing more resilient SbW systems, ensuring safer and more secure automotive technologies.
|
|
15:00-16:15, Paper MoDT2.14 | Add to My Program |
Formalization and Online Monitoring of Right-Of-Way Laws for Autonomous Vehicles at Intersections |
|
Zhang, LingJun | Tsinghua University |
Zhao, Chengxiang | Beijing Institute of Technology |
Yang, Lei | Tsinghua University |
Song, Lei | Tsinghua University |
Song, Ziying | Beijing Jiaotong University |
Yu, Wenhao | Tsinghua University |
Wang, Hong | Tsinghua University |
Keywords: Smart City Mobility Integration Strategies, User-Centric Intelligent Vehicle Technologies, Collision Avoidance Algorithms
Abstract: With the rapid advancement of autonomous driving, safety concerns have become the primary barrier to its commercialization. Compliance with traffic laws is crucial for ensuring road safety. However, the current laws, formulated for human drivers, present challenges for autonomous systems due to ambiguous language, complicating accurate judgment and government monitoring. It is imperative to transform traffic laws into machine-interpretable logical frameworks while simultaneously resolving ambiguities in legal terminology to ensure clarity and precision. This study focuses on urban intersections, characterized by high traffic complexity and diverse participants. We propose a formalization method for right-of-way laws and develop a threshold analysis framework based on processed data from SIND, which rigorously defines the prioritization of right-of-way. The optimal compliance threshold is determined through sensitivity analysis, evaluated using the proposed Weighted TPN score (WTPNs). The threshold was then applied for online monitoring at intersections. The dataset is available online via: https://github.com/SOTIF-AVLab/SinD
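Formalizing a right-of-way law means turning prose into a machine-checkable predicate. A toy sketch follows; the rule set, compass encoding, and priority-to-the-right convention are invented for illustration, and the paper's actual thresholds are derived from SIND data:

```python
def right_of(approach):
    """Approach direction of a vehicle on the right of one coming from `approach`."""
    order = ["N", "E", "S", "W"]   # clockwise compass order of approach arms
    return order[(order.index(approach) - 1) % 4]

def has_priority(ego, other):
    """Unsignalized-intersection sketch: going straight beats turning;
    otherwise, the vehicle approaching from the other's right has priority."""
    if ego["maneuver"] == "straight" and other["maneuver"] != "straight":
        return True
    if ego["maneuver"] != "straight" and other["maneuver"] == "straight":
        return False
    # Both straight or both turning: priority-to-the-right convention.
    return ego["approach"] == right_of(other["approach"])

ego = {"maneuver": "straight", "approach": "N"}
other = {"maneuver": "left", "approach": "W"}
print(has_priority(ego, other))  # True: straight beats a turning vehicle
```

An online monitor would evaluate such predicates on tracked trajectories each frame and flag vehicles that proceed without holding priority.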
|
|
15:00-16:15, Paper MoDT2.15 | Add to My Program |
Model-Based Development of a Hardware-In-The-Loop Setup for Assessing Cybersecurity of Vehicle-To-Robot Communication |
|
Behrens, Theodor | Volkswagen AG |
Heinrich, Lukas | Volkswagen AG |
Pannek, Jürgen | Institute for Intermodal Transportation and Logistic System, Tec |
Keywords: User-Centric Intelligent Vehicle Technologies, Vehicle-to-Infrastructure (V2I) Communication, Cybersecurity Measures for Connected Vehicles
Abstract: In automotive engineering, the integration of the vehicle into mobility ecosystems adds a new dimension of complexity to development and testing. As a consequence, existing test environments such as Hardware-in-the-Loop (HiL) have to be adapted. In this paper, we propose a systematic approach to adapting the HiL testbench specification, ensuring that the test environment considers the new interactions within the vehicle and with its ecosystem. We illustrate this method by specifying a testbench able to assess the cybersecurity of a recently developed Universal Vehicle-to-Robot Communication Interface (UVCI).
|
|
15:00-16:15, Paper MoDT2.16 | Add to My Program |
ISO 34505 Based Test Evaluation Methodology for ADAS/AD |
|
Yetkin, Sarp Kaya | AVL Türkiye Research and Engineering |
Günaydın, Batuhan | AVL Research & Engineering Turkey |
Tomruk, Mert | AVL Research & Engineering Turkey |
Bahar, Saadet | AVL |
Azak, Kaan | AVL Research and Engineering Turkey |
Keywords: Safety Verification and Validation Techniques, Level 4-5 Autonomous Driving Systems Architecture, Synthetic Data Generation for Training
Abstract: The development of Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) technologies involves rigorous verification processes to enhance driver and passenger safety. Although Society of Automotive Engineers (SAE) Level 3+ (L3+) systems are promoted as improving safety, significant gaps remain in the test evaluation and validation processes. The International Organization for Standardization (ISO) 34505 standard provides methodologies for evaluating Automated Driving Systems (ADS); however, the defined methodology is not yet fully refined or practically applicable. This study systematically implements the evaluation steps defined in ISO 34505, with a particular focus on enhancing test prioritization, microscopic analysis, and simulation environment validation. Our approach is specifically tailored for L3+ systems, addressing key limitations in current validation techniques and improving the applicability of ISO 34505.
|
|
15:00-16:15, Paper MoDT2.17 | Add to My Program |
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems |
|
Kirchner, Sven | TU München |
Knoll, Alois | Technische Universität München |
Keywords: Safety Verification and Validation Techniques, Level 3 Driving Systems Architecture and Techniques, User-Centric Intelligent Vehicle Technologies
Abstract: Developing safety-critical automotive software presents significant challenges due to increasing system complexity and strict regulatory demands. This paper proposes a novel framework integrating Generative Artificial Intelligence (GenAI) into the Software Development Lifecycle (SDLC). The framework uses Large Language Models (LLMs) to automate code generation in languages such as C++, incorporating safety-focused practices such as static verification, test-driven development and iterative refinement. A feedback-driven pipeline ensures the integration of test, simulation and verification for compliance with safety standards. The framework is validated through the development of an Adaptive Cruise Control (ACC) system. Comparative benchmarking of LLMs ensures optimal model selection for accuracy and reliability. Results demonstrate that the framework enables automatic code generation while ensuring compliance with safety-critical requirements, systematically integrating GenAI into automotive software engineering. This work advances the use of AI in safety-critical domains, bridging the gap between state-of-the-art generative models and real-world safety requirements.
|
|
15:00-16:15, Paper MoDT2.18 | Add to My Program |
Introducing Spatial Residual Risk for Information Degradation in Automated Driving |
|
Gehrke, Nils | Technische Universität München |
Diermeyer, Frank | Technische Universität München |
Keywords: Teleoperation Control Systems for Vehicles, Level 4-5 Autonomous Driving Systems Architecture, Vulnerable Road User Protection Strategies
Abstract: Misperception of surrounding objects and traffic participants can lead to critical situations. Autonomous driving systems must be able to assess the safety impact of a degraded sensing and perception pipeline at any time. This assessment should be based on an independent risk evaluation framework. This work introduces a residual risk that quantifies the potential risk originating from misperception compared to a response with non-degraded information. Evaluations are possible online at an average rate of 10 Hz. Furthermore, exemplary scenarios are analyzed and provided in this paper for discussion.
|
|
MoDT3 Poster Session, Raffaello + Lobby Right |
Add to My Program |
Poster 2.3 >> Perception: Segmentation & Scene Interpretation |
|
|
Chair: Fremont, Vincent | Ecole Centrale De Nantes, CNRS, LS2N, UMR 6004 |
Co-Chair: Petrovai, Andra | Technical University of Cluj-Napoca |
|
15:00-16:15, Paper MoDT3.1 | Add to My Program |
Rethinking SSIM-Based Optimization in Neural Field Training |
|
Zhang, Xiaoning | Xi’an Jiaotong University |
Su, Yuanqi | Xi'an Jiaotong University |
Lu, HaoAng | Xi'an Jiaotong University |
Zhang, Chi | Xi'an Jiaotong University |
Liu, Yuehu | Institute of Artificial Intelligence and Robotics, Xi'an Jiaoton |
Keywords: 3D Scene Reconstruction Methods, Scalable Neural Scene Representation
Abstract: The Structural Similarity (SSIM) index is a widely used metric for evaluating image quality, with broad applications in areas such as image restoration, 3D reconstruction, and novel view synthesis. A number of previous works have introduced SSIM-based optimization into neural field training to enhance the model's performance. Despite its widespread use, there has been limited research on how to effectively incorporate SSIM loss into the training process. In this work, we explore this gap and provide insights into the role of SSIM loss in neural field training. Our key finding is that SSIM loss is particularly beneficial during the early phase of training, before the model fully learns the luminance information. We show that SSIM loss acts as an effective "guidance" mechanism in the initial training phase, and removing it after the model has learned the luminance does not harm the final performance; in fact, it may improve it. Our experiments demonstrate the effectiveness of our strategy, offering new insights into how SSIM loss can be more efficiently used in neural field training. We believe these findings will not only enhance SSIM's application in neural field training but also inspire further research into more adaptive loss functions for deep learning models.
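The early-phase-only SSIM strategy described in the abstract can be sketched as a simple loss schedule. This is an illustrative sketch, not the paper's implementation: `ssim_global` uses a single global window (real SSIM slides a Gaussian window), and the `ssim_cutoff_step` parameter is an assumption.

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    # Simplified SSIM with one global window over the whole image;
    # production implementations use a sliding Gaussian window.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def combined_loss(pred, target, step, ssim_cutoff_step=5000):
    # The SSIM term guides only the early phase and is then dropped,
    # mirroring the finding that late-phase SSIM is unnecessary.
    l1 = np.abs(pred - target).mean()
    if step < ssim_cutoff_step:
        return l1 + (1.0 - ssim_global(pred, target))
    return l1
```

For identical images the SSIM term vanishes, so the schedule only changes the gradient signal while the reconstruction is still imperfect.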
|
|
15:00-16:15, Paper MoDT3.2 | Add to My Program |
3D Shape Transfer Learning for Enhanced Monocular 3D Object Detection |
|
Zhang, Xiaoning | Xi’an Jiaotong University |
Su, Yuanqi | Xi'an Jiaotong University |
Lu, HaoAng | Xi'an Jiaotong University |
Zhang, Chi | Xi'an Jiaotong University |
Wang, Xiangyu | Xi'an Jiaotong University |
Liu, Yuehu | Institute of Artificial Intelligence and Robotics, Xi'an Jiaoton |
Keywords: Static and Dynamic Object Detection Algorithms
Abstract: Monocular 3D object detection (M3D) is challenging due to the lack of depth information in the RGB image. To enhance detection performance, existing works resort to various additional resources, including depth information, LiDAR data, CAD models, stereo images, video sequences, and others. However, they often require close correspondence and strict synchronization between the target RGB image and extra resources, limiting their applicability and scalability. In this work, we propose a simple yet effective framework, 3D Shape Transfer Learning for Enhanced Monocular 3D Object Detection (STLM3D). It views M3D as 3D shape reconstruction and leverages 3D shape transfer learning (STL) across datasets to enhance its reconstruction capability, thereby enhancing M3D performance. In addition, we design a plug-and-play 3D detection branch that focuses on 3D attribute prediction and also facilitates 3D shape transfer learning. Experimental results on the KITTI benchmark demonstrate that our STLM3D leads to new state-of-the-art and surpasses existing methods by a significant margin.
|
|
15:00-16:15, Paper MoDT3.3 | Add to My Program |
Attention-Based Two-Stage 3D Lane Detection and Topological Prediction |
|
Fu, Xiaohan | Tongji University |
Han, Yi | Tongji University |
Tian, Wei | Tongji University |
Yu, Xianwang | Tongji University |
Keywords: Deep Learning Based Approaches, Static and Dynamic Object Detection Algorithms, Semantic Segmentation Techniques
Abstract: The increasing demand for accurate perception of static road information in autonomous driving systems has drawn significant attention to 3D lane detection and topology prediction. This paper introduces a two-stage 3D lane detection and topology prediction model based on an attention mechanism. The proposed model employs 3D Bézier curves to represent lane lines and an adjacency matrix to represent the topological relationships, which facilitates end-to-end learning of detection and topology prediction for 3D lanes. Experimental results on the OpenLaneV2 dataset demonstrate that the proposed method achieves improvements in both 3D lane detection and topology prediction compared to current leading methods.
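As a sketch of the lane representation above, a cubic 3D Bézier curve maps a parameter t in [0, 1] to points along the lane. The function name and the (4, 3) control-point layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bezier3d(ctrl, t):
    # Evaluate a cubic Bezier lane: ctrl is a (4, 3) array of 3D
    # control points, t an array of parameters in [0, 1];
    # returns a (len(t), 3) array of lane points.
    p0, p1, p2, p3 = np.asarray(ctrl, dtype=float)
    t = np.asarray(t, dtype=float)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
```

Collinear, evenly spaced control points reduce the curve to a straight lane segment, which makes a convenient sanity check.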
|
|
15:00-16:15, Paper MoDT3.4 | Add to My Program |
Improving 3D Occupancy Estimation Using Driver Gaze Estimation |
|
Baltaxe, Michael | General Motors R&D |
Ben Ezra, Shahar | General Motors |
Tsimhoni, Omer | General Motors |
Telpaz, Ariel | General Motors R&D |
Levi, Dan | General Motors, Advanced Technical Center, Israel |
Celniker, Gershon | General Motors |
Hecht, Ron | General Motors |
Keywords: 3D Scene Reconstruction Methods, Advanced Multisensory Data Fusion Algorithms, Feedback Systems for Driver Interaction
Abstract: Camera-only 3D occupancy estimation aims to cost-effectively reconstruct the occupancy state of a grid of voxels in three-dimensional space, based on input from several cameras. One limitation of this approach is detecting occupied voxels of objects located far away, since the cameras' resolution at such distances is relatively low. In this work, we boost performance by introducing gaze map estimation. Specifically, we show that although no additional sensor is used, gaze map estimation is strong enough to enhance basic occupancy estimation networks, yielding better Chamfer distance (CD), F-score, and intersection over union (IoU) metrics. At long distances, we found an improvement over the baseline of more than 20% in CD, 24% in F-score, and 15% in IoU.
|
|
15:00-16:15, Paper MoDT3.5 | Add to My Program |
Empirical Spatial Error Bounds for Reliable Semantic Segmentation of Pedestrians and Riders (I) |
|
Bartels, Timo | Technische Universität Braunschweig |
Stelzer, Malte | Technische Universität Braunschweig |
Bickerdt, Jan | Volkswagen AG |
Schomerus, Volker Patricio | Volkswagen AG |
Piewek, Jan | Volkswagen AG |
Bagdonat, Thorsten | Volkswagen AG |
Fingscheidt, Tim | Technische Universität Braunschweig |
Keywords: Semantic Segmentation Techniques, Deep Learning Based Approaches, Vulnerable Road User Protection Strategies
Abstract: The mean intersection over union (mIoU) is a standard metric for evaluating semantic segmentation models. While steady improvements in mIoU have been achieved on automotive benchmarks like Cityscapes, their impact on reliably detecting vulnerable road users, such as pedestrians and riders, remains unclear. This study empirically analyzes 167 semantic segmentation models w.r.t. the spatial distribution of the false positive rate and false negative rate in the Cityscapes dataset. Our analysis reveals that many segmentation errors occur at object contours, which hardly influence driving decisions and road user safety. Accordingly, we propose to exclude such irrelevant errors. We define spatial error bounds within which models reliably detect pedestrians and riders. Since time-to-collision is strongly related to distance, and the vertical pixel position is roughly related to distance, the vertical position of segmentation errors provides an effective way to evaluate the reliability of semantic segmentation models on an entire dataset. Our evaluation of such empirical spatial error bounds reveals that strong models (w.r.t. mIoU) are related to an improved detection of existing pedestrians (false negative rate, FNR). On the other hand, mIoU in general is only weakly related to hallucinations of pedestrians and riders (false positive rate, FPR). Some models even exhibit a higher FPR despite having an 11.2% absolute higher mIoU.
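The row-wise error analysis described in the abstract can be sketched as follows. The binary per-class masks and the function name are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def rowwise_fnr(pred_mask, gt_mask):
    # Per-row false negative rate for one class (e.g. pedestrian).
    # The vertical pixel position acts as a coarse proxy for distance,
    # so rows near the image bottom correspond to close-by road users.
    fn = (~pred_mask & gt_mask).sum(axis=1)   # missed pixels per row
    pos = gt_mask.sum(axis=1)                 # ground-truth pixels per row
    return np.divide(fn, pos, out=np.zeros(len(pos)), where=pos > 0)
```

Aggregating this profile over a dataset gives the vertical band within which a model stays below a chosen error bound.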
|
|
15:00-16:15, Paper MoDT3.6 | Add to My Program |
Robust Checkpoint Selection by Exponential Moving Averaging for Domain Generalized Segmentation |
|
Bätje, Marc | University of Luebeck, Institute for Software Engineering and Pr |
Schwonberg, Manuel | CARIAD SE and Technische Universität Berlin |
Bohlke, Henrik | Volkswagen AG |
Leucker, Martin | University of Luebeck, Institute for Software Engineering and Pr |
Keywords: Data Augmentation Techniques Using Neural Networks, Semantic Segmentation Techniques, Deep Learning Based Approaches
Abstract: Deep learning has seen significant progress in applications such as autonomous driving and healthcare, with synthetic data playing an increasingly important role. However, models trained on synthetic data often suffer performance drops in real-world settings due to domain shifts. Domain Generalization (DG) aims to address this problem by developing models that can robustly generalize to unseen domains, with a key challenge being the selection of an appropriate checkpoint. For the checkpoint selection problem, we introduce a standardized approach that simultaneously enhances model robustness through the combination of Exponential Moving Average (EMA) and data augmentation. Our method reduces performance variability and improves generalization in 25 out of 26 experiments, highlighting EMA as a promising technique for more stable and reliable DG performance in real-world applications.
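One EMA step over a dictionary of weights can be sketched as below; the decay value and the dict-of-tensors layout are illustrative assumptions rather than the paper's exact setup.

```python
def ema_update(ema, params, decay=0.999):
    # Blend the running average toward the current weights. The EMA
    # copy, rather than the latest raw checkpoint, is evaluated and
    # selected, which smooths out per-iteration variability.
    return {k: decay * ema[k] + (1.0 - decay) * params[k] for k in params}
```

Applied after every optimizer step, this yields a shadow model whose validation score varies far less across checkpoints than the raw weights.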
|
|
15:00-16:15, Paper MoDT3.7 | Add to My Program |
Adaptive Neural Networks for Intelligent Data-Driven Development |
|
Shoeb, Youssef Omar | Technical University of Berlin, Continental AG |
Nowzad, Azarm | Continental |
Gottschalk, Hanno | Institute of Mathematics, TU Berlin |
Keywords: User-Centric Intelligent Vehicle Technologies, Deep Learning Based Approaches
Abstract: Advances in machine learning methods for computer vision tasks have led to their consideration for safety-critical applications like autonomous driving. However, effectively integrating these methods into the automotive development lifecycle remains challenging. Since the performance of machine learning algorithms relies heavily on the training data provided, the data and model development lifecycle play a key role in successfully integrating these components into the product development lifecycle. Existing models frequently encounter difficulties recognizing or adapting to novel instances not present in the original training dataset. This poses a significant risk for reliable deployment in dynamic environments. To address this challenge, we propose an adaptive neural network architecture and an iterative development framework that enables users to efficiently incorporate previously unknown objects into the current perception system. Our approach builds on continuous learning, emphasizing the necessity of dynamic updates to reflect real-world deployment conditions. Specifically, we introduce a pipeline with three key components: (1) a scalable network extension strategy to integrate new classes while preserving existing performance, (2) a dynamic OoD detection component that requires no additional retraining for newly added classes, and (3) a retrieval-based data augmentation process tailored for safety-critical deployments. The integration of these components establishes a pragmatic and adaptive pipeline for the continuous evolution of perception systems in the context of autonomous driving.
|
|
15:00-16:15, Paper MoDT3.8 | Add to My Program |
3D Segment-Based Road Boundary Extraction Method Via Spatio-Temporal Analysis |
|
Yang, Jimin | Tsinghua University |
Nan, Jiangang | Tsinghua University |
Wang, Jianqiang | Tsinghua University |
Xu, Shaobing | Tsinghua University |
Keywords: Static and Dynamic Object Detection Algorithms, Lidar-Based Environment Mapping, Representation Learning for Driving Scenarios
Abstract: Accurate and effective road boundary extraction plays a significant role in the navigation and decision-making processes of self-driving cars. Nevertheless, reliable detection of road boundaries via 3D LiDAR is particularly difficult due to uneven point cloud density and chaotic vegetation areas. Conventional methods often require time-consuming fitting or clustering algorithms to enhance performance. To this end, this paper presents a road boundary extraction approach that particularly focuses on curbs and vegetation areas, utilizing LiDAR data without fitting or clustering and enabling the generation of detailed road maps. We exploit both single and multiple frames of data in the design, which enables spatio-temporal feature extraction. This strategy is realized by integrating the road boundary detection algorithm with SLAM technology. The algorithm contains three stages: 1) Coarse Ground Segmentation (CGS), 2) Adaptive Spatial Feature Extraction (A-SFE), and 3) Iterative Multi-scale Refinement (I-MSR). Experiments on the KITTI dataset are conducted for verification. The proposed method not only outperforms traditional methods with an average of 85% in key metrics but also demonstrates comparable performance to state-of-the-art deep learning models.
|
|
15:00-16:15, Paper MoDT3.9 | Add to My Program |
Self-Supervised Pretraining for Aerial Road Extraction (I) |
|
Polley, Rupert | FZI Research Center for Information Technology |
Deenadayalan, Sai Vignesh Abishek | FZI Research Cen Ter for Information Technology, |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe In |
Keywords: End-to-End Neural Network Architectures and Techniques, Data Augmentation Techniques Using Neural Networks, Semantic Segmentation Techniques
Abstract: Deep neural networks for aerial image segmentation require large amounts of labeled data, but high-quality aerial datasets with precise annotations are scarce and costly to produce. To address this limitation, we propose a self-supervised pretraining method that improves segmentation performance while reducing reliance on labeled data. Our approach uses inpainting-based pretraining, where the model learns to reconstruct missing regions in aerial images, capturing their inherent structure before being fine-tuned for road extraction. This method improves generalization, enhances robustness to domain shifts, and is agnostic to model architecture and dataset choice. Experiments show that our pretraining significantly boosts segmentation accuracy, especially in low-data regimes, making it a scalable solution for aerial image analysis.
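The inpainting pretext task can be sketched as random patch masking: the network receives the corrupted image and is trained to reconstruct the original. Patch size, patch count, and the function name are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def mask_random_patches(img, num_patches=8, patch=16, rng=None):
    # Zero out random square patches; the pretraining objective is to
    # reconstruct the original aerial image from this corrupted input.
    if rng is None:
        rng = np.random.default_rng(0)
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(num_patches):
        y = int(rng.integers(0, h - patch + 1))
        x = int(rng.integers(0, w - patch + 1))
        out[y:y + patch, x:x + patch] = 0
    return out
```

Because the corruption is synthetic, unlimited pretraining pairs can be generated from unlabeled aerial imagery before any road labels are needed.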
|
|
15:00-16:15, Paper MoDT3.10 | Add to My Program |
LiDAR and Camera Fusion for Joint Depth Completion and Panoptic Segmentation Tasks in a Unified Network for 3D Semantic Segmentation (I) |
|
Choi, Youn-ho | Chungbuk National University |
Bong, Eunjung | Chungbuk National University |
Kee, Seok-Cheol | Chungbuk National University |
Keywords: 3D Scene Reconstruction Methods, Advanced Multisensory Data Fusion Algorithms, Deep Learning Based Approaches
Abstract: Autonomous driving heavily relies on advanced perception systems composed of various sensor fusions, including LiDAR, cameras, GPS, and IMU. Among these, LiDAR excels at providing 3D information, making it a critical input for many autonomous driving algorithms. However, LiDAR faces challenges like data sparsity, limited long-range detection performance, and high computational requirements. In 3D object detection, achieving high accuracy for objects beyond 70 meters is typically difficult, as shown in Figure 1, making it challenging to respond to sudden appearances or abrupt stops of objects in real driving scenarios. Conversely, cameras offer rich texture and color information but lack depth estimation. To overcome the limitations of these sensors while leveraging their strengths, this paper proposes an integrated network architecture that harnesses the advantages of both LiDAR and cameras. In this study, we designed an efficient fusion network through the early fusion of the two sensors, enabling simultaneous high-precision camera-based Panoptic Segmentation and depth completion to complement the sparse LiDAR data. The lightweight network structure achieved an inference speed exceeding 9 FPS on a single RTX 4070 GPU. To implement real-time 3D Semantic Segmentation, we fuse the outputs from each task and implement the algorithm using ROS2 and Autoware. Our results demonstrate that the proposed model, while being lightweight, effectively handles multiple tasks simultaneously and can perform real-time inference, making it highly suitable for real-world applications in autonomous driving and 3D scene reconstruction.
|
|
15:00-16:15, Paper MoDT3.11 | Add to My Program |
FuseRoad: Enhancing Lane Shape Prediction through Semantic Knowledge Integration and Cross-Dataset Training |
|
Hsiao, Heng-Chih | National Chung Cheng University |
Cai, Yi-Chang | National Chung Cheng University |
Lin, Huei-Yung | National Taipei University of Technology |
Wei-Chen, Chiu | National Chiao Tung University |
Chan, Chiao-Tung | National Yang Ming Chiao Tung University |
Wang, Chieh-Chih | National Yang Ming Chiao Tung University |
Keywords: Semantic Segmentation Techniques, Automotive Datasets, Perception Algorithms for Adverse Weather Conditions
Abstract: The rapid evolution of advanced driver assistance systems (ADAS) has been driven by the advances of deep neural networks, and multi-tasking is essential for autonomous driving systems. This paper presents FuseRoad, a new multi-task model that leverages cross-dataset learning to address the dependency on specific multi-task datasets and reduce the annotation costs. It integrates semantic segmentation and lane detection into an end-to-end framework while providing an effective approach to utilize multiple single-task datasets. By incorporating a Semantic Road Knowledge Extractor (SRKE) to direct more attention to the roadway, FuseRoad enhances the accuracy and reliability of lane detection. The model also employs the logit normalization loss to address the issue of overconfidence commonly faced by conventional lane detection methods. In experiments, FuseRoad outperforms state-of-the-art approaches in both accuracy and F1 score. The evaluation on semantic segmentation metrics also demonstrates that the proposed technique is highly effective for multi-task road scene analysis. Code and datasets are available at https://github.com/HengChihHsiao/FuseRoad.
|
|
15:00-16:15, Paper MoDT3.12 | Add to My Program |
False Positive Sampling-Based Data Augmentation for Enhanced 3D Object Detection Accuracy |
|
Oh, Jiyong | Kookmin University |
Lee, Junhaeng | Kookmin University |
Woongchan, Byun | Kookmin University |
Kong, Minsang | Kookmin University |
Lee, Sang Hun | Kookmin University |
Keywords: Data Augmentation Techniques Using Neural Networks, Deep Learning Based Approaches
Abstract: 3D object detection plays a pivotal role in autonomous driving, and its accuracy has improved substantially in recent years. Among existing augmentation methods for training 3D object detection models, ground truth sampling significantly improves model performance by increasing the number of positive samples in a scene and alleviating class imbalance. However, our experiments reveal that ground truth sampling can excessively expand the model's decision boundary, leading to a notable increase in false positives. To address this issue, we propose a novel data augmentation technique called false positive sampling, which retrains the model using point clouds misclassified as positive during inference. This approach effectively reduces false positives without sacrificing the number of true positives, resulting in considerable performance gains. Furthermore, false positive sampling maximizes the class imbalance mitigation effect of ground truth sampling by leveraging these challenging examples. Our method also improves the model's ability to semantically understand difficult samples that typically cause confusion. Experimental results on standard 3D object detection benchmarks demonstrate the effectiveness of the proposed algorithm in achieving robust and accurate detection.
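The core banking step of false positive sampling can be sketched as matching detections against ground truth. As a simplification, the sketch uses 2D axis-aligned boxes in place of the paper's 3D point-cloud objects; the function names and IoU threshold are illustrative assumptions.

```python
def iou_2d(a, b):
    # Axis-aligned IoU on (x1, y1, x2, y2) boxes; a 2D stand-in for
    # the 3D overlap test used with point-cloud objects.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def collect_false_positives(detections, gt_boxes, iou_thresh=0.1):
    # Detections matching no ground-truth box are banked and later
    # pasted back into training scenes as hard negative samples.
    return [d for d in detections
            if all(iou_2d(d, g) < iou_thresh for g in gt_boxes)]
```

The banked false positives then play the same paste-into-scene role as ground truth sampling, but as negatives that tighten the decision boundary.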
|
|
15:00-16:15, Paper MoDT3.13 | Add to My Program |
Label Correction for Road Segmentation Using Roadside Cameras |
|
Toikka, Henrik | Aalto University |
Alamikkotervo, Eerik | Aalto University |
Ojala, Risto | Aalto University |
Keywords: Data Annotation and Labeling Techniques, Perception Algorithms for Adverse Weather Conditions, Semantic Segmentation Techniques
Abstract: Reliable road segmentation in all weather conditions is critical for intelligent transportation applications, autonomous vehicles, and advanced driver assistance systems. For robust performance, all weather conditions should be included in the training data of deep learning-based perception models. However, collecting and annotating such a dataset requires extensive resources. In this paper, existing roadside camera infrastructure is utilized for automatically collecting road data in varying weather conditions. Additionally, a novel semi-automatic annotation method for roadside cameras is proposed. For each camera, only one frame is labeled manually and the label is then transferred to other frames of that camera feed. The small camera movements between frames are compensated using frequency domain image registration. The proposed method is validated with roadside camera data collected from 927 cameras across Finland over a 4-month period during winter. Training on the semi-automatically labeled data boosted the segmentation performance of several deep learning segmentation models. Testing was carried out on two different datasets to evaluate the robustness of the resulting models: an in-domain roadside camera dataset and an out-of-domain dataset captured with a vehicle on-board camera. Code used for this study is available here: htoik.github.io/toikka2025label
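The frequency-domain registration step can be sketched with phase correlation: the estimated shift between two frames is then applied to the manually drawn road label. The function name and wrap-around handling are illustrative assumptions; the paper's implementation may differ in detail.

```python
import numpy as np

def phase_correlation_shift(ref, img):
    # Estimate the (dy, dx) translation that maps ref onto img via the
    # normalized cross-power spectrum; exact for circular shifts.
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks in the upper half of each axis to negative shifts.
    return tuple(p - n if p > n // 2 else p
                 for p, n in zip(peak, corr.shape))
```

Because only small camera movements occur between frames, a pure-translation model is usually enough to keep the transferred label aligned.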
|
|
15:00-16:15, Paper MoDT3.14 | Add to My Program |
RAILS: Radar Range-Azimuth Map Estimation from Image, LiDAR and Semantic Descriptions |
|
Rangaraj, Pavan Aakash | NXP Semiconductors |
Alkanat, Tunc | NXP Semiconductors |
Pandharipande, Ashish | NXP Semiconductors |
Keywords: Synthetic Data Generation for Training, Automotive Datasets, Deep Learning Based Approaches
Abstract: Synthetic generation of radar range-azimuth (RA) maps remains challenging due to limited automotive radar datasets. While camera images are abundant across diverse driving environments, translating visual information directly to radar representations requires sophisticated multi-modal fusion techniques. In this paper, we introduce RAILS, a novel convolutional autoencoder architecture that synthesizes realistic RA maps by integrating RGB camera images, LiDAR depth information, and semantic scene segmentation. By leveraging a U-Net inspired architecture with convolutional block attention mechanisms, our approach transforms multi modal inputs into accurate radar representations. We demonstrate the model's effectiveness across various driving scenarios using the RADIal dataset, showing superior performance in target localization and scene reconstruction compared to existing methods. Experimental results highlight the potential of using auxiliary depth and semantic information to address the scarcity of radar training data, offering a promising approach for enhancing machine learning based radar perception in autonomous driving systems.
|
|
15:00-16:15, Paper MoDT3.15 | Add to My Program |
A Novel Beam Prediction Scheme Based on Multimodal Data with High Robustness |
|
Lei, Jiahao | Northwestern Polytechnical University |
Jin, Ziteng | Northwestern Polytechnical University |
Li, Xiang | Northwestern Polytechnical University |
Liu, Jiajia | Northwestern Polytechnical University |
Keywords: Advanced Multisensory Data Fusion Algorithms, Cooperative Perception and Localization Techniques, Perception Algorithms for Adverse Weather Conditions
Abstract: With the advancement of the Internet of Vehicles, accurate beam prediction is crucial for maintaining stable and high-quality wireless communication in dynamic environments. A large number of beam prediction schemes have been proposed, which can be broadly categorized into two types: schemes based on channel state and schemes based on side information. However, beam prediction schemes based on channel state require significant training overhead and computational complexity. In addition, most schemes based on side information neither make effective use of multimodal data (such as camera, LiDAR, and position), nor consider the impact of environmental noise on prediction accuracy. To address these weaknesses, we propose a novel beam prediction scheme with high robustness based on multimodal data. Specifically, the scheme first uses a variety of data augmentation methods to reduce the noise interference caused by adverse environments. Then, we employ ResNet to map heterogeneous data into a unified linear space to achieve effective feature alignment and correspondence. Finally, we exploit the distinctive multi-head attention mechanism of the Transformer model to guarantee that the fused features are both representative and informative. Extensive numerical results demonstrate that the proposed scheme offers both high robustness and accuracy across various scenarios.
|
|
15:00-16:15, Paper MoDT3.16 | Add to My Program |
RNOSMamba: Boosting Road Negative Obstacles Segmentation Via Vision Mamba from RGB and Depth Images |
|
Dai, Yuqi | Tsinghua University |
Cui, Zhoujuan | Tsinghua University |
Keywords: Advanced Multisensory Data Fusion Algorithms, Deep Learning Based Approaches, Semantic Segmentation Techniques
Abstract: The fusion of RGB and depth information holds significant potential for accurate road negative obstacle identification. However, effectively leveraging these multimodal data to distinguish fine-grained road surface defects, such as potholes and cracks, remains a challenge. Inspired by the recent progress of multimodal fusion in a variety of computer vision tasks, this paper proposes a novel Vision Mamba-based Road Negative Obstacle Segmentation framework (RNOSMamba) that leverages the complementary strengths of optical and depth images. Toward this end, optical and depth images are appropriately fused in the feature domain to boost the performance of road negative obstacle segmentation (RNOS). The hierarchical decoder incorporates Cross-Modality State Space (CMSS) blocks and Cross-Scale Feature Fusion (CSFF) modules to refine features and produce precise segmentation masks. Extensive experiments demonstrate that the proposed RNOSMamba achieves 68.5% mIoU and 80.7% mF1, highlighting its potential to significantly boost the accuracy of road negative obstacle segmentation.
|
|
15:00-16:15, Paper MoDT3.17 | Add to My Program |
Target-Driven and Student-Centered Knowledge Distillation for Traffic Object Tracking |
|
Ding, Zhicheng | Bowling Green State University |
Lan, Qizhen | University of Alabama at Birmingham |
Tian, Qing | University of Alabama at Birmingham |
Keywords: Dynamic Object Tracking, Deep Learning Based Approaches
Abstract: Visual Object Tracking is crucial for autonomous driving, enabling real-time monitoring of dynamic environments. While Transformer-based trackers achieve state-of-the-art performance by modeling long-range dependencies, their high computational cost limits deployment in real-world autonomous systems. To address this, we propose Target-Driven and Student-Centered Knowledge Distillation (TDSC-KD), a novel framework designed to improve the efficiency of Transformer-based trackers while maintaining accuracy. Our framework consists of (1) target-driven distillation, which leverages a ground-truth query to guide knowledge transfer toward relevant and consistent regions, filtering out background noise, and (2) student-centered distillation, which employs a mask-and-reconstruct mechanism to encourage more active student learning and reduce over-reliance on the teacher. Experiments on the LaSOT-Traffic dataset demonstrate our TDSC-KD's efficacy, narrowing the gap between the strong performance of Transformer trackers and the strict efficiency constraints of real-world deployment.
|
|
MoDT4 Poster Session, Bernini Room |
Add to My Program |
Poster 2.4 >> Cooperation & Communication |
|
|
Chair: Weidl, Galia | University of Applied Sciences Aschaffenburg |
Co-Chair: Tsukada, Manabu | The University of Tokyo |
|
15:00-16:15, Paper MoDT4.1 | Add to My Program |
Optimized Cooperative Car-Following through Lightweight Vehicle-To-Vehicle Intent Sharing |
|
Li, Hangyu | University of Wisconsin-Madison |
Oh, Juyoung | University of Wisconsin-Madison |
Ma, Ke | University of Wisconsin-Madison |
Liang, Zhaohui (Vito) | University of Wisconsin-Madison |
Zhang, Peng | University of Wisconsin-Madison |
Li, Xiaopeng | University of Wisconsin-Madison |
Keywords: Multi-Agent Coordination Strategies, Cooperative Planning Strategies in Vehicle Networks
Abstract: Cooperative driving systems are expected to enhance safety, mobility, and efficiency through vehicle connectivity technologies. Lower-level vehicle-to-vehicle (V2V) communication transmits high-frequency status information, such as location, velocity, and acceleration, between vehicles. This approach contributes little to prediction accuracy, requires high-frequency hardware, and is sensitive to communication delays. Recent studies have shown that intent sharing, which conveys planned trajectories, significantly improves prediction accuracy and control performance but requires higher bandwidth. However, mainstream vehicle communication methods struggle to balance cost and bandwidth for effective intent sharing: high-bandwidth wireless methods such as dedicated short-range communication (DSRC) and cellular vehicle-to-everything (C-V2X) require costly devices, while low-cost visible light communication (VLC) can hardly support the necessary bandwidth. To address this challenge, we propose a lightweight intent sharing approach that reduces data transmission volume while maintaining prediction accuracy. Specifically, intended velocity trajectories are represented by regressed polynomial functions over a fixed time period, requiring only the transmission of the polynomial coefficients and a timestamp for synchronization. The feasibility of this approach is demonstrated through simulations of car-following behavior using a Linear-Quadratic Regulator (LQR). Additionally, real vehicle experiments using a designated velocity cycle further validate the method. Results show that both planned and actual trajectories of the following vehicle closely align with those obtained under ideal intent sharing, at a significantly reduced communication data volume.
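The polynomial compression idea is concrete enough to sketch end to end. The quadratic degree, sampling grid, and speed values below are hypothetical choices, not the paper's actual parameters; the point is that the sender transmits only a few coefficients and a timestamp, and the receiver reconstructs the full intended velocity profile.

```python
def polyfit2(ts, vs):
    """Least-squares quadratic fit v(t) = c0 + c1*t + c2*t^2 via normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]           # sums of t^0..t^4
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(3)]
    for col in range(3):                                      # Gaussian elimination
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c2 in range(col, 3):
                A[r][c2] -= f * A[col][c2]
            b[r] -= f * b[col]
    c = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                       # back substitution
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, 3))) / A[r][r]
    return c

# Sender: compress a planned velocity profile into 3 coefficients + a timestamp.
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
vs = [10.0, 10.575, 11.1, 11.575, 12.0]       # planned speeds in m/s
coeffs = polyfit2(ts, vs)                      # the only payload besides t0
# Receiver: reconstruct the intended speed anywhere in the horizon.
v_hat = lambda t: coeffs[0] + coeffs[1] * t + coeffs[2] * t ** 2
```

Three floats plus a timestamp replace the full sampled trajectory, which is what makes low-bandwidth channels such as VLC viable for intent sharing.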
|
|
15:00-16:15, Paper MoDT4.2 | Add to My Program |
Occlusion-Aware Planning for Connected and Automated Vehicles with Cooperative Perception at Unsignalized Intersection (I) |
|
Su, Hao | Osaka University |
Arakawa, Shin'ichi | Osaka University |
Murata, Masayuki | Osaka University |
Keywords: Cooperative Planning Strategies in Vehicle Networks, Behavior Assessment Using Cooperative Data, Data Sharing and Privacy in V2X Systems
Abstract: Achieving safe and efficient navigation in urban environments, where occlusions frequently occur, is a persistent challenge for autonomous driving. As a promising solution, cooperative perception (CP) has attracted significant attention due to its advantages in enhancing individual sensing capabilities. In this context, this paper proposes an occlusion-aware motion planner integrated with CP that assists individual vehicles in optimizing their speed, minimizing risk while ensuring swift arrival. Specifically, the proposed planning framework is designed as a sequential pipeline. At each timestamp, sensor features, vehicle motion, and map contextual information from the edge server are shared among vehicles for cooperative object tracking and occlusion analysis. Subsequently, the potential risk of each occluded area is represented by the appearance probability of phantom objects, which adapts to changes across multiple viewpoints. Next, an association module identifies correspondences between phantom and existent objects, enhancing risk assessment performance. Finally, predicted information from the observation and motion spaces is fed into the planning and control module for reference speed planning and vehicle action control. Simulated evaluations demonstrate that our approach delivers safer and more efficient driving policies in challenging occlusion scenarios than the baseline, which uses only onboard sensors, or methods that fuse only single-view perceptions.
|
|
15:00-16:15, Paper MoDT4.3 | Add to My Program |
Multi-Agent Service Migration Strategy under VEC |
|
Ye, Lei | Chongqing University |
Chen, Yulan | Chongqing University |
Han, Qingwen | Chongqing University |
Zeng, Lingqiu | Chongqing University |
Ling, Kaiwen | Chongqing University |
Keywords: Multi-Agent Coordination Strategies, V2X Communication Protocols and Standards, Vehicle-to-Infrastructure (V2I) Communication
Abstract: As Intelligent Transport Systems (ITS) advance, the Internet of Vehicles (IoV) improves traffic efficiency and safety, and Vehicle Edge Computing (VEC) provides strong computing and storage capabilities. However, high vehicle speeds require efficient service migration to maintain service continuity. This study proposes a real-time service migration strategy that optimizes Quality of Service (QoS) and response latency. To support dynamic decision-making in the VEC framework, this study introduces Priority Experience Replay and Four-Trajectory exploration (PERFT), recurrent neural networks, and attention mechanisms into the Proximal Policy Optimization (PPO) algorithm. The resulting PERFT-PPO addresses long-sequence handling, sparse rewards, and the limited exploration of long trajectories in single-agent scenarios. To address multi-agent competition in VEC, this study integrates Centralized Training and Distributed Execution (CTDE) into PERFT-PPO, creating the Prioritized Experience Replay Four-Trajectory Multi-Agent Proximal Policy Optimization (PERFT-MAPPO). Experimental results demonstrate significant improvements in real-time decision-making and overall system performance.
|
|
15:00-16:15, Paper MoDT4.4 | Add to My Program |
Evaluation of Mobile Environment for Vehicular Visible Light Communication Using Multiple LEDs and Event Cameras |
|
Soga, Ryota | Nagoya University |
Shiba, Shintaro | Woven by Toyota |
Kong, Quan | Woven by Toyota, Inc |
Kobori, Norimasa | Woven by Toyota Inc |
Shimizu, Tsukasa | Toyota Motor Corporation |
Lu, Shan | Nagoya University |
Yamazato, Takaya | Nagoya University |
Keywords: V2X Communication Protocols and Standards, Vehicle-to-Infrastructure (V2I) Communication, Data Sharing and Privacy in V2X Systems
Abstract: In the fields of Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD), sensors that serve as the "eyes" for sensing the vehicle's surrounding environment are essential. Traditionally, image sensors and LiDAR have played this role. However, a new type of vision sensor, the event camera, has recently attracted attention. Event cameras respond to changes in the surrounding environment (e.g., motion), exhibit strong robustness against motion blur, and perform well in high dynamic range environments, properties that are desirable in robotics applications. Furthermore, the asynchronous and low-latency principles of data acquisition make event cameras suitable for optical communication. By adding communication functionality to event cameras, it becomes possible to use I2V communication to immediately share information about forward collisions, sudden braking, and road conditions, thereby contributing to hazard avoidance. Additionally, receiving information such as signal timing and traffic volume enables speed adjustment and optimal route selection, facilitating more efficient driving. In this study, we construct a vehicular visible light communication system in which event cameras are the receivers and multiple LEDs are the transmitters. In driving scenes, the system tracks the transmitter positions and separates densely packed LED light sources using pilot sequences based on Walsh-Hadamard codes. Outdoor vehicle experiments demonstrate error-free communication at transmitter-receiver distances within 40 meters and a driving speed of 30 km/h (8.3 m/s).
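The Walsh-Hadamard pilot separation named in the abstract can be sketched with a toy superposition model: orthogonal code rows let a receiver untangle several LED sources that land on overlapping pixels. The code length, amplitudes, and noiseless channel are simplifying assumptions.

```python
def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H = hadamard(4)
# Assign orthogonal pilot rows (skipping the all-ones row 0) to two LEDs.
pilot_a, pilot_b = H[1], H[2]
amp_a, amp_b = 3.0, 1.5            # per-LED intensities to recover

# The event camera observes the superposition of both modulated sources.
mixed = [amp_a * a + amp_b * b for a, b in zip(pilot_a, pilot_b)]

# Correlating with each pilot isolates its source, since the rows are orthogonal.
est_a = sum(m * p for m, p in zip(mixed, pilot_a)) / len(pilot_a)
est_b = sum(m * p for m, p in zip(mixed, pilot_b)) / len(pilot_b)
```

Because distinct Hadamard rows have zero inner product, each correlation cancels every other LED's contribution exactly; with noise, the estimate degrades gracefully instead.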
|
|
15:00-16:15, Paper MoDT4.5 | Add to My Program |
MAGNNET: Multi-Agent Graph Neural Network-Based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning |
|
Ratnabala, Lavanya | Student, Skolkovo Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Peter vimalathas, Robinroy | Skoltech |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Keywords: Cooperation between UAVs and Ground Vehicles, Multi-Agent Coordination Strategies, Reinforcement Learning for Planning
Abstract: This paper addresses the challenge of decentralized task allocation within heterogeneous multi-agent systems operating under communication constraints. We introduce a novel framework that integrates Graph Neural Networks (GNNs) with a centralized training and decentralized execution (CTDE) paradigm, further enhanced by a tailored Proximal Policy Optimization (PPO) algorithm for multi-agent deep reinforcement learning (MARL). Our approach enables unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to allocate tasks dynamically and efficiently in a 3D grid environment without central coordination. The framework minimizes total travel time while avoiding conflicts in task assignments. For cost calculation and routing, we employ reservation-based A* and R* path planners. Experimental results show that our method achieves a 92.5% conflict-free success rate, with only a 7.49% performance gap compared to the centralized Hungarian method, while outperforming a heuristic decentralized baseline based on a greedy approach. Additionally, the framework scales to 20 agents with an allocation processing time of 2.8 s and responds robustly to dynamically generated tasks, underscoring its potential for real-world applications in complex multi-agent scenarios.
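The paper's reservation-based A* is not specified in detail; a minimal time-expanded variant on a toy 2D grid, with a hypothetical reservation table, conveys the idea: cells reserved by other agents at a given time step are treated as transient obstacles.

```python
import heapq

def reserve_astar(grid, start, goal, reservations):
    """A* over (cell, time) states; a cell reserved at time t by another
    agent is treated as an obstacle at that step."""
    rows, cols = len(grid), len(grid[0])
    max_t = rows * cols * 4                    # crude horizon to guarantee termination
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # Manhattan heuristic
    open_q = [(h(start), 0, start, [start])]
    seen = set()
    while open_q:
        f, t, cell, path = heapq.heappop(open_q)
        if cell == goal:
            return path
        if (cell, t) in seen or t >= max_t:
            continue
        seen.add((cell, t))
        r, c = cell
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)):  # moves + wait
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and ((nr, nc), t + 1) not in reservations:
                heapq.heappush(open_q, (t + 1 + h((nr, nc)), t + 1,
                                        (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],
        [0, 1, 0],    # 1 = static obstacle
        [0, 0, 0]]
# Another agent has reserved cell (0, 1) at time step 1.
path = reserve_astar(grid, (0, 0), (2, 2), {((0, 1), 1)})
```

The wait move lets an agent pause for a reservation to expire when waiting is cheaper than detouring, which is the essence of reservation-based planning.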
|
|
15:00-16:15, Paper MoDT4.6 | Add to My Program |
Sticky-PRE: A Sticky Proxy Re-Encryption Protocol for Persistent Vehicle Data Privacy |
|
Ashutosh, Ashish | University of Passau |
Hasan, Omar | INSA Lyon |
Baishnav, Pratik | University of Passau |
Kosch, Harald | University of Passau |
Brunie, Lionel | INSA Lyon |
Keywords: Data Sharing and Privacy in V2X Systems, V2X Communication Protocols and Standards, Decision Making
Abstract: Connected vehicles generate a vast amount of data and share it with external entities such as the cloud, neighboring vehicles, Road-Side Units (RSUs), and other third-party services in a Vehicle-to-Everything (V2X) setting. This data is vulnerable: its leakage can expose vehicle owners' personal information, such as driving habits and travel routes, and can enable identity theft. Moreover, with the implementation of the General Data Protection Regulation (GDPR), it becomes imperative to empower users with control over their data and the ability to choose whom they share it with. To this end, we present a protocol that combines sticky policies with a proxy re-encryption scheme. The protocol ensures that user-defined access controls on the data persist even when crossing organizational boundaries, and it addresses the confidentiality, integrity, and accountability of vehicle data. Furthermore, we assess our protocol under a semi-honest threat model and analyze its vulnerabilities. Lastly, we perform a quantitative analysis of the data flow model to observe the system's performance.
|
|
15:00-16:15, Paper MoDT4.7 | Add to My Program |
A Vehicle-Infrastructure Multi-Layer Cooperative Decision-Making Framework |
|
Cui, Yiming | Tongji University |
Fang, Shiyu | Tongji University |
Hang, Peng | Tongji University |
Sun, Jian | Tongji University |
Keywords: Cooperative Planning Strategies in Vehicle Networks, Multi-Agent Coordination Strategies, Vehicle-to-Infrastructure (V2I) Communication
Abstract: Autonomous driving has entered the testing phase, but due to the limited decision-making capabilities of individual vehicle algorithms, safety and efficiency issues have become more apparent in complex scenarios. With the advancement of connected communication technologies, autonomous vehicles equipped with connectivity can leverage vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications, offering a potential solution to the decision-making challenges faced from the individual vehicle's perspective. We propose a multi-level vehicle-infrastructure cooperative decision-making framework for complex conflict scenarios at unsignalized intersections. First, based on vehicle states, we define a method for quantifying vehicle impacts and their propagation relationships, using accumulated impact to group vehicles through motif-based graph clustering. Next, within and between vehicle groups, a pass-order negotiation process based on Large Language Models (LLMs) determines the vehicle passage order, resulting in planned vehicle actions. Simulation results from ablation experiments show that our approach reduces negotiation complexity and ensures safer, more efficient vehicle passage at intersections, aligning with natural decision-making logic.
|
|
15:00-16:15, Paper MoDT4.8 | Add to My Program |
Robust Collaborative Perception: Combining Adversarial Training with Consensus Mechanism for Enhanced V2X Security |
|
Poibrenski, Atanas | German Research Center for Artificial Intelligence (DFKI) |
Nozarian, Farzad | German Research Center for Artificial Intelligence (DFKI) |
Rezaeianaran, Farzaneh | Saarland University, DFKI - Saarbrücken |
Müller, Christian | German Research Center for Artificial Intelligence |
Keywords: Misbehavior Detection Using Shared Data and Messages, Cooperative Perception and Localization Techniques
Abstract: Collaborative perception enhances the robustness and accuracy of autonomous systems by leveraging shared perceptual data across agents, particularly through feature-level fusion, which balances communication efficiency with contextual preservation. However, this data-sharing introduces vulnerabilities, as adversaries can inject malicious perturbations, compromising system reliability in safety-critical scenarios. In this work, we address the adversarial robustness of feature-level fusion in collaborative perception under white-box untargeted attack settings. We propose a novel framework that combines adversarial training with a consensus mechanism, enhancing resilience to adversarial perturbations in a model-agnostic manner. Our approach not only improves robustness against attacks but also enhances performance on clean data, achieving at least 5% improvement in average precision. Extensive experiments on the V2XSet dataset with four adversarial attack types and two collaborative perception methods demonstrate the effectiveness of our method, outperforming consensus defense and adversarial training alone consistently under different adversarial perturbation magnitudes. These findings underscore the potential of our approach to advance secure and reliable collaborative perception systems.
|
|
15:00-16:15, Paper MoDT4.9 | Add to My Program |
Towards V2X HD Mapping for Autonomous Driving: A Concise Review |
|
Xiao, Xu | Navinfo Co., Ltd |
Yang, Suhui | Navinfo Co., LTD |
Fan, Miao | NavInfo Co., Ltd |
Xu, Shengtong | Autohome Inc |
Liu, Xiangzeng | Xidian University |
Hu, Wenbo | Hefei University of Technology |
Xiong, Haoyi | Baidu Inc |
Keywords: Cooperative Perception and Localization Techniques, Crowdsourced Localization and Mapping
Abstract: High-definition (HD) maps are fundamental components of autonomous driving systems, providing essential centimeter-level accuracy and lane-level semantic information. While traditional HD mapping methods have evolved into online learning approaches, current solutions face significant challenges due to sensor limitations and environmental constraints. This paper presents a systematic review of HD map construction methods, tracing their evolution from conventional techniques to advanced Vehicle-to-Everything (V2X) cooperative mapping enabled by edge computing and communication technologies. Through a comprehensive analysis of methodologies, algorithms, and datasets, we identify critical challenges in current HD mapping systems. Our review encompasses three key domains: traditional mapping methods, online learning approaches, and V2X cooperative construction of HD maps. We evaluate existing solutions against standardized metrics, compare their effectiveness, and outline promising directions for future research. This work provides researchers and practitioners with a structured understanding of the HD mapping landscape and highlights opportunities for advancing autonomous driving systems.
|
|
15:00-16:15, Paper MoDT4.10 | Add to My Program |
Systematic Derivation of Generic Scenarios for Cooperative Perception Systems |
|
Stang, Christopher | ZF Friedrichshafen AG |
Hay, Julian | ZF Friedrichshafen AG |
Bogenberger, Klaus | Technical University of Munich |
Weidl, Galia | University of Applied Sciences Aschaffenburg |
Keywords: Cooperative Perception and Localization Techniques, Safety Verification and Validation Techniques, Cooperative Planning Strategies in Vehicle Networks
Abstract: As one means of addressing present and future mobility challenges, intelligent transportation systems (ITS) have been developed over the years. The overall goal of such systems is to improve traffic efficiency and safety. As an integral part of ITS for fulfilling the safety objective, cooperative perception realized by Vehicle-to-Everything (V2X) communication is considered a main contributor to enhancing safety. In recent years, several studies have explored the potential of such systems in safety use cases. As with driving assistance and automation systems, a common way of testing cooperative driving systems relies on the scenario-based testing approach. However, a systematic strategy is necessary to identify scenarios covering the majority of situations relevant to such applications. To define an extensive set of scenarios, this study comprehensively analyzes the German In-Depth Accident Study (GIDAS) data set from the perspective of V2X relevance. The proposed methodology provides a systematic approach for identifying relevant accident types and deriving generic scenarios. In addition, this approach offers the possibility of defining standard test cases for future regulations.
|
|
15:00-16:15, Paper MoDT4.11 | Add to My Program |
Connected Vehicle Experiments on Virtual Rings: Unveiling Bistable Behavior |
|
Szaksz, Bence | Budapest University of Technology and Economics, Department of A |
Molnar, Tamas Gabor | Wichita State University |
Avedisov, Sergei | Toyota Motor North America R&D - InfoTech Labs |
Stepan, Gabor | Budapest University of Technology and Economics, Department of A |
Orosz, Gabor | University of Michigan |
Keywords: Cooperative Planning Strategies in Vehicle Networks, Multi-Agent Coordination Strategies, Behavior Assessment Using Cooperative Data
Abstract: The nonlinear dynamics of vehicles on a virtual ring are investigated. A vehicle chain is considered in which a connected automated vehicle (CAV) driving at the head of the chain receives the state of a connected human-driven vehicle (CHV) at the tail. The controller of the CAV is constructed so that the CHV is projected in front of it; this closes a virtual ring. We construct the corresponding mathematical model and analyze the effect of nonlinearities with numerical continuation. Then, we present real car experiments with two CHVs and one CAV. Both the theoretical results and the experiments show bistable behavior for certain control parameters. The results provide essential support for parameter tuning during the control design of CAVs.
|
|
15:00-16:15, Paper MoDT4.12 | Add to My Program |
Evaluation of Coordination Strategies for Underground Automated Vehicle Fleets in Mixed Traffic |
|
Mironenko, Olga | Örebro University |
Banaee, Hadi | Örebro University, Sweden |
Loutfi, Amy | Örebro University |
Keywords: Multi-Agent Coordination Strategies
Abstract: This study investigates the efficiency and safety outcomes of implementing different adaptive coordination models for automated vehicle (AV) fleets, managed by a centralized coordinator that dynamically responds to human-controlled vehicle behavior. The simulated scenarios replicate an underground mining environment characterized by narrow tunnels with limited connectivity. To address the unique challenges of such settings, we propose a novel metric, Path Overlap Density (POD), to predict the efficiency and, potentially, the safety performance of AV fleets. The study also explores the impact of map features on AV fleet performance. The results demonstrate that both AV fleet coordination strategies and underground tunnel network characteristics significantly influence overall system performance. While map features are critical for optimizing efficiency, adaptive coordination strategies are essential for ensuring safe operations.
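The precise definition of POD is given in the paper; purely as an illustration, one plausible reading (the fraction of traversed tunnel segments used by more than one vehicle's path, with hypothetical node labels) can be sketched as:

```python
from collections import Counter

def path_overlap_density(paths):
    """Hypothetical reading of POD: fraction of traversed segments that are
    used by more than one vehicle's path (higher = more potential conflict)."""
    use = Counter()
    for path in paths:
        for seg in set(zip(path, path[1:])):   # directed segments, deduplicated per path
            use[seg] += 1
    if not use:
        return 0.0
    shared = sum(1 for n in use.values() if n > 1)
    return shared / len(use)

# Two AV routes through a narrow tunnel network (nodes as labels).
p1 = ["A", "B", "C", "D"]
p2 = ["E", "B", "C", "F"]
pod = path_overlap_density([p1, p2])   # segment B->C is shared
```

In narrow single-lane tunnels, shared segments are exactly where vehicles must take turns, so a density of this kind plausibly correlates with throughput loss.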
|
|
15:00-16:15, Paper MoDT4.13 | Add to My Program |
Robust V2I Channel Prediction: A Generative Approach with Implicit State Evolution |
|
Jin, Ziteng | Northwestern Polytechnical University |
Liu, Jiajia | Northwestern Polytechnical University |
Keywords: Vehicle-to-Infrastructure (V2I) Communication, Motion Forecasting, Perception Algorithms for Adverse Weather Conditions
Abstract: Intelligent transportation plays a vital role in modern urban sustainability by enhancing traffic efficiency, ensuring safety, and mitigating environmental impact. Vehicle-to-Everything (V2X) technology, particularly Vehicle-to-Infrastructure (V2I), is fundamental to this vision, enabling seamless collaboration between vehicles and infrastructure. However, reliable communication in V2I faces challenges due to high-speed mobility, dynamic environments, and stringent latency requirements. While channel alignment methods such as scanning, tracking, and prediction offer some solutions, they struggle with efficiency and adaptability in real-world conditions. To address these limitations, this paper proposes a channel prediction model based on generative learning with implicit state evolution, capturing nonlinear mappings between channel state information and communication dynamics. Additionally, a side information module incorporating temporal and spatial data enhances adaptability to varying traffic and environmental conditions. Extensive evaluations across different weather and traffic densities demonstrate the model’s robustness and superior predictive accuracy. The proposed approach provides a reliable and efficient solution for dynamic V2I communication, offering new insights for future intelligent transportation systems.
|
|
15:00-16:15, Paper MoDT4.14 | Add to My Program |
V2X-Gaussians: Gaussian Splatting for Multi-Agent Cooperative Dynamic Scene Reconstruction |
|
Jagtap, Abhishek | Technische Hochschule Ingolstadt |
Song, Rui | Fraunhofer IVI |
Tiptur Sadashivaiah, Sanath | Fraunhofer IVI |
Festag, Andreas | Technische Hochschule Ingolstadt |
Keywords: Scalable Neural Scene Representation, Cooperative Perception and Localization Techniques, 3D Scene Reconstruction Methods
Abstract: Recent advances in neural rendering, such as NeRF and Gaussian Splatting, have shown great potential for dynamic scene reconstruction in intelligent vehicles. However, existing methods rely on a single ego vehicle, suffering from limited field-of-view and occlusions, leading to incomplete reconstructions. While V2X communication may provide additional information from roadside infrastructure, it often degrades reconstruction quality due to sparse overlapping views. In this paper, we propose V2X-Gaussians, the first framework integrating V2X communication into Gaussian Splatting. Specifically, by leveraging deformable Gaussians and an iterative V2X-aware cross-ray densification approach, we enhance infrastructure-aided neural rendering and address view sparsity in V2X scenarios. In addition, to support systematic evaluation, we introduce a standardized benchmark for V2X scene reconstruction. Experiments on real-world data show that our method outperforms state-of-the-art approaches by +2.09 PSNR with only 561.8 KB for periodic V2X-data exchange, highlighting the benefits of incorporating roadside infrastructure into neural rendering for intelligent transportation systems. Our code and benchmark are publicly available under an open-source license.
|
|
15:00-16:15, Paper MoDT4.15 | Add to My Program |
CoopScenes: Multi-Scene Infrastructure and Vehicle Data for Advancing Collective Perception in Autonomous Driving |
|
Voßhans, Marcel | University of Applied Science Esslingen |
Baumann, Alexander | University of Applied Science Esslingen |
Drüppel, Matthias | Baden-Wuerttemberg Cooperative State University (DHBW) |
Ait Aider, Omar | Université Clermont Auvergne |
Mezouar, Youcef | Institut Pascal |
Dang, Thao | University of Applied Sciences, Esslingen |
Enzweiler, Markus | Esslingen University of Applied Sciences |
Keywords: Automotive Datasets, Cooperative Perception and Localization Techniques, Behavior Assessment Using Cooperative Data
Abstract: The increasing complexity of urban environments has underscored the potential of effective collective perception systems. To address these challenges, we present CoopScenes, a large-scale, multi-scene dataset that provides synchronized sensor data from both the ego-vehicle and the supporting infrastructure. The dataset provides 104 minutes of spatially and temporally synchronized data at 10 Hz, resulting in 62,000 frames, and achieves competitive synchronization with a mean deviation of only 2.3 ms. Additionally, the dataset includes a novel procedure for precise registration of point cloud data from the ego-vehicle and infrastructure sensors, automated annotation pipelines, and an open-source anonymization pipeline for faces and license plates. Covering nine diverse scenes with 100 maneuvers, the dataset features scenarios such as public transport hubs, city construction sites, and high-speed rural roads across three cities in the Stuttgart region, Germany. The full dataset amounts to 527 GB and is provided in the .4mse format, making it easily accessible through our comprehensive development kit. By providing precise, large-scale data, CoopScenes facilitates research in collective perception, real-time sensor registration, and cooperative intelligent systems for urban mobility, including machine learning-based approaches.
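The reported 2.3 ms mean synchronization deviation can be understood through nearest-timestamp frame matching between the two streams; a toy version of such matching (timestamps and the matching rule are hypothetical, not the dataset's actual pipeline) might look like:

```python
import bisect

def match_frames(ego_ts, infra_ts):
    """Pair each ego timestamp with the nearest infrastructure timestamp
    and report the mean absolute deviation (both lists sorted, in seconds)."""
    pairs = []
    for t in ego_ts:
        i = bisect.bisect_left(infra_ts, t)
        cands = [infra_ts[j] for j in (i - 1, i) if 0 <= j < len(infra_ts)]
        pairs.append((t, min(cands, key=lambda c: abs(c - t))))
    mean_dev = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return pairs, mean_dev

# A 10 Hz ego stream vs. a slightly offset infrastructure stream.
ego = [0.0, 0.1, 0.2, 0.3]
infra = [0.002, 0.101, 0.199, 0.303]
pairs, dev = match_frames(ego, infra)      # dev is the mean deviation in seconds
```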
|
|
15:00-16:15, Paper MoDT4.16 | Add to My Program |
Towards Intelligent Control Centers: Case-Based Reasoning for Waypoint Assistance |
|
Gontscharow, Martin | FZI Research Center for Information Technology; KIT Karlsruhe In |
Orf, Stefan | FZI Research Center for Information Technology |
Schotschneider, Albert | FZI Research Center of Information Technologies |
Fleck, Tobias | FZI Research Center for Information Technology |
Zöllner, J. Marius | FZI Research Center for Information Technology; KIT Karlsruhe In |
Keywords: Teleoperation Control Systems for Vehicles, Multi-Agent Coordination Strategies
Abstract: Autonomous-driving research has traditionally focused on refining on-board intelligence for individual vehicles. Recent studies highlight the potential of intelligent control centers—equipped with machine learning capabilities—to complement on-board systems by sharing insights across an entire fleet and assisting with corner cases in real time. In this paper, we advance the concept of intelligent control centers through a case-based reasoning approach for remote waypoint assistance. Our system captures and reuses operator interventions, converting human-generated solutions for unforeseen obstacles into transferable cases. The remote operator remains in the loop to validate every suggested waypoint, ensuring safety before execution. Preliminary field tests with two autonomous shuttles demonstrate the feasibility of retrieving previous waypoint interventions under realistic conditions. Our initial small-scale results indicate that (i) the prototype can achieve performance comparable to fully manual interventions and (ii) solutions devised for one vehicle can effectively be transferred to another. Together, these outcomes lay a solid foundation for more sophisticated, learning-based control-center architectures.
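Case-based waypoint retrieval as described above can be sketched as a nearest-neighbor lookup over stored operator interventions; the feature vector (obstacle x/y offset and lane width) and the tiny case base below are entirely hypothetical, not the paper's representation, and a real system would validate the retrieved waypoints with the remote operator before execution.

```python
import math

# A tiny case base for waypoint assistance: each past operator intervention is
# stored as (scene_features, waypoints). Features here are hypothetical
# (obstacle x/y offset, lane width); real systems would use richer descriptors.
CASES = [
    ((2.0, 1.0, 3.5), [(0.0, 0.0), (1.0, -1.2), (4.0, 0.0)]),
    ((8.0, -0.5, 3.0), [(0.0, 0.0), (3.0, 1.0), (9.0, 0.0)]),
]

def retrieve(query, cases):
    """Return the stored waypoint solution whose scene features are closest
    (Euclidean distance) to the current situation."""
    return min(cases, key=lambda case: math.dist(case[0], query))[1]

# A new scene that closely resembles the first recorded intervention.
plan = retrieve((2.2, 0.9, 3.4), CASES)
```

Because retrieval only compares scene features, a solution recorded on one vehicle transfers to another as long as both describe scenes in the same feature space.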
|
|
15:00-16:15, Paper MoDT4.17 | Add to My Program |
Automatic Cause Determination in Road Scene Understanding Using Qualitative Reasoning and Four-Valued Logic (I) |
|
Belmecheri, Nassim | SIMULA Research Laboratory |
Gotlieb, Arnaud | Simula Research Laboratory |
Lazaar, Nadjib | LISN, CNRS, Paris-Saclay University |
Spieker, Helge | Simula Research Laboratory |
Keywords: Behavior Assessment Using Cooperative Data, User-Centric Intelligent Vehicle Technologies, Representation Learning for Driving Scenarios
Abstract: Road scene understanding in automated driving (AD) aims to build a comprehensive analysis of video sequences taken on the road by embedded or fixed cameras (e.g., mounted on vertical road signals). One goal is to identify the relevant actors in the scene; another is to determine the causes that triggered a specific action of the ego car (i.e., stop, slow down, turn left, etc.). In a complex urban environment, these causes can be multiple, confusing, possibly contradictory to other causes, and not easily expressible using simplistic reasoning. Still, accurate automatic cause determination supports a) user acceptance, by providing appropriate explanations to the car passengers and road users, and b) increased road safety, by providing detailed road scene understanding to traffic participants. In this paper, we propose using spatiotemporal reasoning and Belnap's four-valued logic to formulate complex causes of an AD action in a road scene. We compute these causes by analysing a Qualitative eXplainable Graph (QXG), an abstract representation of the road scene that captures spatiotemporal relations between road entities. Starting from a QXG, our approach, called CaIdLogic, determines complex causes of a selected AD action occurring in a specific frame of a road scene. The usefulness of CaIdLogic is demonstrated on several scenes extracted from the well-known nuScenes dataset.
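Belnap's four-valued logic itself is standard and can be sketched compactly by encoding each value as a pair of independent evidence bits; the road-scene variables below are hypothetical examples, not taken from the paper.

```python
# Belnap's four truth values encoded as (supports_true, supports_false):
# T = told true only, F = told false only,
# B = "both" (contradictory evidence), N = "neither" (no evidence).
T, F, B, N = (True, False), (False, True), (True, True), (False, False)

def AND(a, b):
    # Conjunction: true support needs both; false support needs either.
    return (a[0] and b[0], a[1] or b[1])

def OR(a, b):
    # Disjunction is the dual of conjunction.
    return (a[0] or b[0], a[1] and b[1])

def NOT(a):
    # Negation swaps the evidence bits; B and N are fixed points.
    return (a[1], a[0])

# Two sensors disagree on "pedestrian is crossing" (B), and nothing is known
# about the signal state (N); the conjoined cause evaluates to F.
crossing = B
signal_red = N
```

This pair encoding is why the logic handles the "multiple, confusing, possibly contradictory" causes mentioned above: contradictory (B) and missing (N) evidence are first-class values rather than errors.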
|
| |