
Optimize Performance

NVIDIA NIM and CUDA-X microservices provide an optimized runtime and easy-to-use building blocks to streamline generative AI development.

Deploy With Confidence

Protect company data and intellectual property with ongoing monitoring for security vulnerabilities and ownership of model customizations.

Run Anywhere

Standards-based, containerized microservices are certified to run in the cloud, in the data center, and on workstations.

Enterprise-Grade

Predictable production branches for API stability, management software, and NVIDIA Enterprise Support help keep projects on track.

NVIDIA® TensorRT™ is an ecosystem of APIs for high-performance deep learning inference. TensorRT includes an inference runtime and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes TensorRT, TensorRT-LLM, TensorRT Model Optimizer, and TensorRT Cloud.
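As a concrete illustration, the following sketch builds a serialized TensorRT engine from an ONNX model with the TensorRT Python API. The file names are placeholders and the flags assume a recent TensorRT release; this is a minimal sketch, not a complete deployment recipe.

```python
# Minimal sketch: build a TensorRT engine from an ONNX model (Python API).
# "model.onnx" / "model.engine" are placeholder paths, not files from this page.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit-batch network definition, populated by the ONNX parser.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

# Builder config: cap the workspace memory pool and allow FP16 kernels.
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
config.set_flag(trt.BuilderFlag.FP16)

# Serialize the optimized engine to disk for later deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```
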
NVIDIA TensorRT Benefits

Speed Up Inference by 36X
NVIDIA TensorRT-based applications perform up to 36X faster than CPU-only platforms during inference. TensorRT optimizes neural network models trained on all major frameworks, calibrates them for lower precision with high accuracy, and deploys them to hyperscale data centers, workstations, laptops, and edge devices.
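In practice, a model trained in a framework such as PyTorch is typically exported to ONNX first and then handed to the TensorRT builder shown above. The sketch below uses a placeholder torchvision model and input shape purely for illustration.

```python
# Hypothetical example: export a trained PyTorch model to ONNX for TensorRT to ingest.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # placeholder model
dummy_input = torch.randn(1, 3, 224, 224)                 # placeholder input shape

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
)
```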

Optimize Inference Performance
TensorRT, built on the CUDA® parallel programming model, optimizes inference using techniques such as quantization, layer and tensor fusion, and kernel tuning on all types of NVIDIA GPUs, from edge devices to PCs to data centers.
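Fusion and kernel selection happen automatically at engine-build time; what the application controls is the builder configuration. The sketch below shows a few common knobs, continuing the Python build example above (the optimization-level and timing-cache settings assume TensorRT 8.6 or newer).

```python
# Sketch of common builder-config knobs (continues the build example above).
config = builder.create_builder_config()

# Precision: allow FP16 kernels where they are faster and accurate enough.
config.set_flag(trt.BuilderFlag.FP16)

# Workspace: scratch memory the kernel auto-tuner may use at build time.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB

# Spend more build time searching for faster kernels (0 = fastest build, 5 = fastest engine).
config.builder_optimization_level = 3

# Reuse kernel-tuning results across builds via a timing cache.
cache = config.create_timing_cache(b"")
config.set_timing_cache(cache, False)  # False: reject a mismatched cache
```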

Accelerate Every Workload
TensorRT provides post-training quantization and quantization-aware training techniques for optimizing deep learning inference at FP8, INT8, and INT4 precision. Reduced-precision inference significantly reduces latency, which is required for many real-time services, as well as autonomous and embedded applications.
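As an illustration of post-training quantization, the sketch below wires an INT8 entropy calibrator into the engine build from the earlier example. The calibration data, batch size, and buffer handling are placeholder assumptions; a real calibration set should be representative of production inputs.

```python
# Sketch: INT8 post-training quantization with an entropy calibrator.
# Calibration data, shapes, and buffer handling here are illustrative assumptions.
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt


class ImageCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)          # iterable of np.float32 arrays, NCHW
        self.device_input = None

    def get_batch_size(self):
        return 8                              # must match the calibration batch shape

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                       # no more calibration data
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        return None                           # no cached scales; run full calibration

    def write_calibration_cache(self, cache):
        pass                                  # optionally persist scales to disk


# Hook the calibrator into the builder config from the earlier build example.
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = ImageCalibrator(
    [np.random.rand(8, 3, 224, 224).astype(np.float32) for _ in range(10)])
```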

Deploy, Run, and Scale With Triton
TensorRT-optimized models are deployed, run, and scaled with NVIDIA Triton™ inference-serving software that includes TensorRT as a backend. The advantages of using Triton include high throughput with dynamic batching, concurrent model execution, model ensembling, and streaming audio and video inputs.
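Once a TensorRT engine is served by Triton, clients send inference requests over HTTP or gRPC. The sketch below uses the tritonclient Python package against a hypothetical model named "my_trt_model"; the server URL, model name, and tensor names are assumptions for the example.

```python
# Sketch: query a TensorRT model served by Triton over HTTP.
# Server URL, model name, and tensor names ("input"/"output") are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One request; Triton's dynamic batcher can merge concurrent requests server-side.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_trt_model", inputs=[inp])
print(result.as_numpy("output").shape)
```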