Apple is partnering with Nvidia in an effort to speed up artificial intelligence (AI) models. On Wednesday, the Cupertino-based tech giant announced that it is researching inference acceleration on Nvidia’s platform to see whether both the efficiency and latency of large language models (LLMs) can be improved simultaneously. The iPhone maker used a technique called Recurrent Drafter (ReDrafter), which was published in a research paper earlier this year. This technique was combined with the Nvidia TensorRT-LLM inference acceleration framework.
Apple uses Nvidia platform to improve AI performance
In a blog post, Apple researchers detailed the new collaboration with Nvidia for LLM performance and the results it achieved. The company highlighted that it is researching the problem of improving inference efficiency while maintaining latency in AI models.
Inference in machine learning refers to the process of making predictions, decisions, or conclusions based on a given set of data or inputs using a trained model. Simply put, this is the stage where a trained AI model processes new, previously unseen inputs and turns them into outputs.
Earlier this year, Apple published and open-sourced its ReDrafter technology, bringing a new approach to speculative decoding. Using recurrent neural network (RNN) draft models, it combines beam search (a mechanism where AI explores multiple possibilities for a solution) and dynamic tree attention (tree-structured data is processed using an attention mechanism). The researchers said it can generate up to 3.5 tokens per generation step.
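To make the idea of speculative decoding concrete, here is a minimal Python sketch. It is not ReDrafter itself: it leaves out the RNN draft head, beam search, and dynamic tree attention, and uses toy stand-in "models" instead. All names (`speculative_decode`, `toy_target`, `toy_draft`, `draft_len`) are hypothetical and chosen for illustration only. A cheap draft model proposes several tokens, the expensive target model checks them, and the matching prefix is kept, so one generation step can yield more than one token.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: the target model emits exactly one token per step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    draft_len: int = 3,
) -> Tuple[List[Token], int]:
    """Return (tokens, steps); each step may accept several tokens."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    steps = 0
    while len(tokens) < goal:
        steps += 1
        # 1. The cheap draft model proposes draft_len tokens.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))
        # 2. The target model verifies the proposals. A real system scores
        #    all positions in one batched forward pass; here we loop.
        accepted: List[Token] = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target adds one more token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], steps


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the draft model: agrees with the target except after a 4."""
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, steps = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12)
    print(out, "generated in", steps, "steps")
```

Because every accepted token is one the target model would have produced anyway, the output in this sketch is identical to plain greedy decoding; only the number of generation steps changes.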
While the company was able to improve performance efficiency to some extent by combining the two processes, Apple highlighted that there was no significant increase in speed. To solve this, the researchers integrated ReDrafter into the Nvidia TensorRT-LLM inference acceleration framework.
As part of the collaboration, Nvidia added new operators and exposed existing ones to improve the speculative decoding process. The post claims that when using the Nvidia platform with ReDrafter, the researchers saw a 2.7x speedup in tokens generated per second for greedy decoding (a decoding strategy that picks the single most probable token at each step).
Apple highlighted that this technology can be used to reduce the latency of AI processing while using fewer GPUs and consuming less power.