Ultra Fast Lane Detection v2: A Brief Overview

The Ultra Fast Lane Detection v2 model is a state-of-the-art solution [2] designed to address both the efficiency challenges and the difficulties posed by complex driving scenarios in lane detection. Traditional methods often rely on pixel-wise segmentation, which is computationally intensive and struggles under conditions such as severe occlusion or extreme lighting, where visual cues are minimal or absent.

To address these issues, Ultra Fast Lane Detection v2 introduces an innovative approach that represents lanes using sparse coordinates on a series of predefined hybrid anchors. This method combines both row anchors and column anchors to match the orientation of different lanes:

  • Row Anchors: Used for mostly vertical lanes (typically the ego lanes directly ahead of the vehicle); the model predicts the lane's horizontal position at each of a set of predefined rows.
  • Column Anchors: Used for mostly horizontal side lanes; the model predicts the lane's vertical position at each of a set of predefined columns.

Illustration of the hybrid anchor system. In the example, two of the lanes are represented using row anchors and one with column anchors. Image source: [2]

By aligning the anchor type with the lane's orientation, the model minimizes localization errors that occur when using a single anchor type for all lanes. This hybrid anchor system effectively reduces computational load by focusing only on key lane points, rather than processing every pixel.
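To make this concrete, here is a minimal sketch of what the sparse hybrid-anchor representation looks like. It is our own illustration rather than code from [2]; the anchor counts, image size, and lane shapes are arbitrary placeholder values.

```python
import numpy as np

# Sparse hybrid-anchor representation (illustrative values only).
IMG_H, IMG_W = 320, 800
row_anchors = np.linspace(160, 310, num=30)   # predefined rows (y positions)
col_anchors = np.linspace(0, 790, num=40)     # predefined columns (x positions)

# A mostly vertical ego lane: one x-coordinate per row anchor.
ego_lane_x = 400 + 0.5 * (row_anchors - 160)   # shape (30,)

# A mostly horizontal side lane: one y-coordinate per column anchor.
side_lane_y = 300 - 0.1 * col_anchors          # shape (40,)

# Each lane is now a few dozen numbers instead of a dense H x W mask.
print(ego_lane_x.shape, side_lane_y.shape, (IMG_H * IMG_W,))
```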

Moreover, the model formulates lane detection as an anchor-driven ordinal classification problem. Instead of regressing exact coordinates, it classifies lane positions into ordered categories along the anchors. This leverages the natural order of lane positions and allows the model to utilize global contextual features, enhancing its ability to detect lanes even in scenarios with minimal visual cues.
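A simplified sketch of that idea is shown below, assuming the image width is split into a fixed number of ordered cells: each row anchor gets a classification over the cells, and a continuous position is recovered as the expectation over the cell probabilities. The full model in [2] also predicts a "no lane on this anchor" class and uses additional loss terms, which are omitted here.

```python
import numpy as np

def expected_location(logits: np.ndarray, cell_width: float) -> float:
    """Convert per-cell classification scores for one row anchor into a
    continuous x-coordinate via the softmax expectation (soft-argmax)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(np.dot(probs, np.arange(len(logits)))) * cell_width

# Example: an 800-pixel-wide image divided into 100 ordered cells.
logits = np.random.randn(100)
x = expected_location(logits, cell_width=800 / 100)
print(f"predicted lane x-position: {x:.1f} px")
```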

The Challenge: Deploying on TI TDA4VM

The TI TDA4VM is a high-performance system-on-chip (SoC) designed for ADAS applications. It combines AI acceleration with automotive interfaces but has constraints in terms of memory footprint and computational resources compared to server-grade GPUs.

Key constraints include:

  • Limited memory bandwidth: Affects the ability to handle large models.
  • Compute limitations: Requires models to be optimized for the embedded AI accelerators.
  • Real-time processing demands: Necessitates low-latency inference.

These constraints mean that deploying the unoptimized Ultra Fast Lane Detection v2 model directly onto the TDA4VM is not feasible.

The Solution: Embedl's Hardware-Aware Pruning


To overcome these challenges, we turned to Embedl's Model Optimization SDK, which specializes in compressing and accelerating deep learning models for specific hardware targets.

Hardware-Aware Pruning reduces model size and complexity by selectively removing the least important weights and structures. The selection is guided both by the characteristics of the target hardware and by the loss function, so that latency improves while as much accuracy as possible is preserved.
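The sketch below illustrates the general idea of structured, hardware-aware channel pruning. It is not Embedl's SDK: the importance score, the toy latency model, and the latency budget are simplified placeholders used only to show how hardware constraints and importance interact.

```python
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d) -> torch.Tensor:
    # L1 norm of each output channel's weights: a common, simplified proxy
    # for how much removing the channel would hurt the loss.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def channels_for_budget(latency_per_channel_ms: float, budget_ms: float,
                        max_channels: int) -> int:
    # Toy hardware model: assume the layer's latency scales linearly with its
    # output channel count, and keep as many channels as the budget allows.
    return max(1, min(max_channels, int(budget_ms / latency_per_channel_ms)))

# Prune one convolution toward an assumed per-layer latency budget.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
n_keep = channels_for_budget(latency_per_channel_ms=0.01, budget_ms=0.64,
                             max_channels=conv.out_channels)
keep = torch.topk(channel_importance(conv), n_keep).indices.sort().values

pruned = nn.Conv2d(64, n_keep, kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep].clone()
pruned.bias.data = conv.bias.data[keep].clone()
# In a full pipeline, downstream layers are rebuilt to match the new channel
# count and the whole network is fine-tuned to recover accuracy.
```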

The pruning method was applied for a few different latency targets, and the resulting models were fine-tuned and measured on the device, yielding a latency-accuracy trade-off curve for this use case. Allowing a maximum accuracy drop of 1%, we obtain a model that runs 6x faster than the original on the device, corresponding to 76 FPS, comfortably within the requirements of a real-time system.
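Given such a set of measured (latency, accuracy) points, the final step reduces to picking the fastest variant within the allowed accuracy drop. The numbers below are placeholders chosen only to mirror the reported 6x speedup and 76 FPS, not the actual measurements:

```python
# Placeholder (latency_ms, accuracy_%) pairs for the original and pruned
# variants; the real values come from fine-tuning and on-device profiling.
candidates = [
    (80.0, 95.8),   # original model
    (40.0, 95.6),
    (20.0, 95.3),
    (13.2, 95.0),   # ~6x faster, ~76 FPS
    (9.0, 93.1),
]

baseline_accuracy = candidates[0][1]
max_drop = 1.0  # allow at most a 1% accuracy drop

feasible = [c for c in candidates if baseline_accuracy - c[1] <= max_drop]
latency, accuracy = min(feasible, key=lambda c: c[0])
print(f"Selected: {latency:.1f} ms ({1000 / latency:.0f} FPS), {accuracy:.1f}% accuracy")
```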

Results: Efficiency Without Compromise

By leveraging Embedl's hardware-aware pruning, we achieved:

  • Model Size Reduction: The optimized model is 94% smaller, fitting comfortably within the TDA4VM's memory constraints.
  • 6x Inference Speedup: Real-time processing is now achievable, meeting the low-latency requirements of ADAS applications.
  • Minimal Accuracy Loss: The accuracy drop was kept within 1%, ensuring the reliability of lane detection remains high.

Conclusion

Optimizing complex deep learning models for edge deployment is crucial for the advancement of autonomous driving technologies. By using Embedl's Model Optimization SDK, we successfully compressed the Ultra Fast Lane Detection v2 model to run efficiently on the TI TDA4VM hardware. This not only made deployment possible but also enhanced performance significantly, all while maintaining near-original accuracy.

Interested in optimizing your models for edge devices? Learn more about Embedl's solutions.

[1] Hang Xu et al., 2020. CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending. arXiv:2007.12147. Available at: https://arxiv.org/abs/2007.12147.

[2] Zequn Qin et al., 2022. Ultra Fast Deep Lane Detection with Hybrid Anchor Driven Ordinal Classification. arXiv:2206.07389. Available at: https://arxiv.org/abs/2206.07389.
