Case Study
57% Reduced Energy Consumption per Inference on NVIDIA Orin
Introduction
Staying competitive in the automotive industry requires deploying increasingly powerful and versatile deep learning models to deliver the latest ADAS and autonomous driving features. As models grow more demanding, the power and thermal constraints of the embedded environment tighten, often forcing a move to more expensive hardware to achieve real-time performance.
The Embedl Model Optimization SDK addresses this challenge with state-of-the-art, hardware-aware model optimization algorithms that enable efficient deployment within tight energy constraints, letting you get the most out of your existing hardware without increasing component costs. Reducing the energy demands of deep learning models also improves sustainability and thermal management, which translates directly into greater component longevity and reliability.
Figure: Energy profile of a single inference for models with and without optimization using Embedl's Model Optimization SDK. The compressed model not only has much lower latency; its peak and average power consumption are also reduced substantially. The shaded regions represent the range of power consumption over a series of measurements.
The energy savings that Embedl offers can be illustrated with a machine vision model on NVIDIA's Jetson AGX Orin. Deploying a Vision Transformer model on the Orin requires approximately 70 mJ of energy per inference, with an average power draw of 35.5 W and a peak power requirement of 40.6 W. Compressing the Vision Transformer with Embedl's Model Optimization SDK reduced latency by 45% while cutting peak power consumption by 23.8% and average power consumption by 23.6%. Energy per inference dropped to 30 mJ, a 57% reduction, with a drop in accuracy of less than 1%.
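These figures hang together because energy per inference is simply average power multiplied by latency. The short sketch below rederives the optimized energy figure from the baseline numbers quoted above; note that the latencies themselves are inferred from the quoted energy and power values rather than stated in the text.

```python
# Back-of-the-envelope check of the reported figures. All percentages and
# the baseline values are taken from the text above; the latencies are
# derived, not measured.

baseline_energy_mj = 70.0   # energy per inference, baseline (mJ)
baseline_avg_w = 35.5       # average power, baseline (W)

# Latency implied by energy = average power x latency (W x ms = mJ):
baseline_latency_ms = baseline_energy_mj / baseline_avg_w   # ~1.97 ms

# Apply the reported reductions: 45% latency, 23.6% average power.
optimized_latency_ms = baseline_latency_ms * (1 - 0.45)     # ~1.08 ms
optimized_avg_w = baseline_avg_w * (1 - 0.236)              # ~27.1 W

optimized_energy_mj = optimized_avg_w * optimized_latency_ms  # ~29.4 mJ
print(f"{optimized_energy_mj:.1f} mJ per inference")
# ~29.4 mJ, consistent with the reported 30 mJ and the 57% headline figure.
```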
Are you deploying a deep learning model within a tight energy budget? Embedl's Model Optimization SDK is a versatile tool that combines quantization, structured and unstructured pruning, and neural architecture search (NAS) to streamline a model's architecture for the best possible performance on your hardware. Its flexibility allows optimization against any quantifiable metric, including total energy consumption or peak power. Embedl's tools make it easy to profile a deep learning model's execution in detail, isolating the impact of individual layers so that maximum performance gains come at minimal cost to accuracy.
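For readers who want to reproduce this kind of measurement themselves, the sketch below shows one generic way to estimate energy per inference on a Jetson-class board: sample an onboard power sensor in a background thread while running inference, then integrate power over time. This is not the Embedl SDK's API; the sysfs sensor path, the sampling period, and the run_inference callable are all assumptions for illustration, and the actual sensor node varies by board and JetPack version.

```python
import time
import threading

# Hypothetical hwmon node reporting instantaneous power in microwatts.
# The real path differs by Jetson model and JetPack version (assumption).
POWER_SENSOR = "/sys/class/hwmon/hwmon2/power1_input"

def estimate_energy_per_inference(run_inference, n=100, period_s=0.001):
    """Estimate mean energy (J) and latency (s) per inference by polling
    the power sensor while running `run_inference` n times."""
    samples = []                      # (timestamp, watts) pairs
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            with open(POWER_SENSOR) as f:
                samples.append((time.perf_counter(), int(f.read()) / 1e6))
            time.sleep(period_s)

    thread = threading.Thread(target=sampler, daemon=True)
    start = time.perf_counter()
    thread.start()
    for _ in range(n):
        run_inference()               # e.g. a TensorRT execution-context call
    stop.set()
    thread.join()
    elapsed = time.perf_counter() - start

    # Trapezoidal integration of the sampled power trace gives total joules.
    joules = sum((t1 - t0) * (p0 + p1) / 2
                 for (t0, p0), (t1, p1) in zip(samples, samples[1:]))
    return joules / n, elapsed / n
```

In practice, a model-optimization workflow would collect many such traces, subtract idle power, and attribute energy to individual layers, which is the kind of per-layer profiling described above.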