
NVIDIA Introduces TensorRT-LLM To Accelerate LLM Inference on H100 GPUs

NVIDIA recently announced that it will release TensorRT-LLM in the coming weeks, open-source software that promises to accelerate and optimize LLM inference.

TensorRT-LLM encompasses a host of optimizations, pre- and post-processing steps, and multi-GPU/multi-node communication primitives, all designed to unlock unprecedented performance levels on NVIDIA GPUs. 

Notably, this software empowers developers to experiment with new LLMs, offering peak performance and customization capabilities without requiring expertise in C++ or NVIDIA CUDA.

Naveen Rao, Vice President of Engineering at Databricks, lauded TensorRT-LLM, describing it as “easy to use, feature-packed with streaming of tokens, in-flight batching, paged-attention, quantization, and more.” He emphasized that it delivers state-of-the-art performance for LLMs on NVIDIA GPUs, ultimately benefiting customers with cost savings.
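The quantization Rao mentions stores weights at reduced precision to cut memory and bandwidth. As a minimal illustration of the idea (a sketch of symmetric int8 quantization in NumPy, not TensorRT-LLM's actual implementation), one float scale factor maps 8-bit integers back to approximate full-precision values:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store weights in 8 bits
    plus a single float scale, roughly a 4x memory saving vs. float32."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

assert q.nbytes * 4 == w.nbytes            # int8 is a quarter of float32
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6  # error bounded by half a step
```

Real deployments refine this with per-channel scales and calibration, but the storage-versus-accuracy trade-off is the same.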

Performance benchmarks demonstrate the significant improvements brought by TensorRT-LLM on the latest NVIDIA Hopper architecture. For instance, the H100 alone is 4x faster than the A100. Adding TensorRT-LLM and its optimizations, including in-flight batching, results in an 8x total increase, delivering the highest throughput.

Furthermore, TensorRT-LLM demonstrated its ability to accelerate inference performance for Meta’s 70-billion-parameter Llama 2 model by a staggering 4.6x when compared to A100 GPUs.

Today’s LLMs are incredibly versatile, serving a multitude of tasks with widely varying output lengths, which makes static batching inefficient: short requests must wait for the longest request in their batch to finish. TensorRT-LLM addresses this with in-flight batching, an optimized scheduling technique that evicts completed sequences from the batch and immediately begins executing new requests in their place.

With the rapid innovation in the LLM ecosystem and the emergence of larger, more advanced models, the need for multi-GPU coordination and optimization has become paramount. TensorRT-LLM leverages tensor parallelism, a model parallelism technique, to efficiently scale LLM inference across multiple GPUs and servers. This automation eliminates the need for developers to manually split models and manage execution across GPUs.
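Tensor parallelism can be sketched in NumPy as a column-parallel matrix multiply (a conceptual sketch, not NVIDIA's implementation): each "GPU" holds one vertical shard of the weight matrix, computes its partial output independently, and the shards are concatenated, which in a real multi-GPU setup would be an all-gather:

```python
import numpy as np

def column_parallel_matmul(x, W, num_gpus):
    """Split the weight matrix column-wise across devices; each device
    computes only its shard, and the outputs are concatenated."""
    shards = np.array_split(W, num_gpus, axis=1)
    partial = [x @ shard for shard in shards]  # one independent matmul per device
    return np.concatenate(partial, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))    # a batch of 2 activations
W = rng.standard_normal((8, 12))   # a weight matrix split across 4 devices
assert np.allclose(column_parallel_matmul(x, W, num_gpus=4), x @ W)
```

The result is bit-for-bit equivalent to the single-device matmul; the point of the automation the article describes is that the framework, not the developer, chooses how to shard each layer and insert the communication.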

TensorRT-LLM also equips developers with a wealth of open-source NVIDIA AI kernels, including FlashAttention and masked multi-head attention, to optimize models as they evolve.
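For reference, the computation that masked multi-head attention kernels optimize can be written out plainly for a single head (a NumPy reference implementation of the concept, not NVIDIA's fused kernel): a causal mask prevents each token from attending to later positions before the softmax is applied.

```python
import numpy as np

def masked_attention(q, k, v):
    """Causal (masked) attention: token i may only attend to tokens <= i."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

T, d = 4, 8
rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = masked_attention(q, k, v)
assert np.allclose(out[0], v[0])  # the first token can only see itself
```

Optimized kernels such as FlashAttention compute the same result without ever materializing the full score matrix, which is what makes them fast and memory-efficient at long sequence lengths.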

To access TensorRT-LLM, developers can apply for early access through the NVIDIA Developer Program.

The post NVIDIA Introduces TensorRT-LLM To Accelerate LLM Inference on H100 GPUs appeared first on Analytics India Magazine.
