Compressing LLMs With Quantum-Inspired Software


Large language models are inefficient, period. That’s apparent at AWS re:Invent this week. Inference is a hot topic, and conversations center on how to make the most of LLMs, considering the cost of training and the energy consumption required.

Multiverse Computing, a company participating in the AWS Generative AI Accelerator, has developed ways to compress LLMs using quantum-inspired software. Based in San Sebastián, Spain, the company accelerates computing with quantum-inspired tensor networks, said founder and CEO Enrique Lizaso Olmos in an interview before AWS re:Invent.

Tensor networks are powerful mathematical structures using “methods that attempt to use a classical computer to simulate the behavior of a quantum computer, thus making the classical machine operate algorithms that benefit from the laws of quantum mechanics that benefit real quantum computers,” according to a post by Pedro L e S Lopes, on the topic of quantum-inspired computing and how it compares to quantum computing.
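
To make the idea concrete: a tensor network represents one large array of numbers as a set of smaller arrays that are multiplied together. The sketch below counts parameters for the simplest such case, a weight matrix split into two thin factors. The layer shape and rank are hypothetical values chosen for illustration, and this is not a description of Multiverse's actual algorithm, which the article does not detail.

```python
# Illustrative only: parameter counts for a dense weight matrix versus a
# two-factor (low-rank) factorization, the simplest tensor-network-like form.
# The layer shape and rank used below are hypothetical, not Multiverse's settings.


def dense_params(rows: int, cols: int) -> int:
    """Number of values stored by a full rows x cols weight matrix."""
    return rows * cols


def factored_params(rows: int, cols: int, rank: int) -> int:
    """Number of values stored by two factors: (rows x rank) and (rank x cols)."""
    return rank * (rows + cols)


def reduction(rows: int, cols: int, rank: int) -> float:
    """Fraction of parameters removed by the factorization."""
    return 1 - factored_params(rows, cols, rank) / dense_params(rows, cols)


if __name__ == "__main__":
    rows, cols, rank = 4096, 4096, 512  # assumed example shape and rank
    print("dense:   ", dense_params(rows, cols))
    print("factored:", factored_params(rows, cols, rank))
    print(f"reduction: {reduction(rows, cols, rank):.0%}")
```

The smaller the rank, the fewer parameters remain, but the factors can capture less of the original matrix, which is one place an accuracy tradeoff can come from.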

Consider the cost and energy needed to train models and perform inference. Multiverse compresses LLMs with techniques that, according to the company’s own published research, reduce the memory size of Llama-2-7B by 93% and the number of parameters by 70%, while speeding up training by 50% and inference by 25%. The accuracy drop is 2% to 3%.
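
As a rough back-of-the-envelope check of what those percentages could mean in practice, the sketch below applies them to an assumed baseline. The baseline (about 7 billion parameters stored as 16-bit floats) is an assumption for illustration and does not come from the article or from Multiverse.

```python
# Back-of-the-envelope arithmetic for the reported reductions.
# Assumptions (not from the article): about 7 billion parameters,
# each stored as a 16-bit float (2 bytes).

PARAMS = 7_000_000_000
BYTES_PER_PARAM = 2


def footprint_gb(params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def after_reduction(value: float, percent: float) -> float:
    """Value remaining after removing `percent` percent of it."""
    return value * (1 - percent / 100)


if __name__ == "__main__":
    baseline_gb = footprint_gb(PARAMS, BYTES_PER_PARAM)
    print(f"assumed baseline memory: {baseline_gb:.1f} GB")
    print(f"after 93% memory cut:    {after_reduction(baseline_gb, 93):.2f} GB")
    print(f"parameters after 70% cut: {after_reduction(PARAMS, 70) / 1e9:.1f} billion")
```

Under these assumptions, the memory saving is larger than the parameter saving alone would give. That would be consistent with the remaining parameters also being stored at lower precision, though the article does not say how the two figures relate.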

Multiverse, Lizaso said, works with many companies that have already tried LLMs but found them expensive to deploy. The problem: LLMs need to be more efficient. They scale in parameters, but accuracy only improves linearly. Costs rise as more computing is used. Buying GPUs is costly, and getting the same GPU capacity from a cloud services provider costs just as much or more.

Multiverse started working with Bosch, a German engineering and technology company that wanted help with an on-premises AI system to reduce defects, Lizaso said.

“So we applied our tensor networks,” Lizaso said. “We developed a completely new set of algorithms for machine learning. Well, it worked quite well. So we applied those same systems to finance and defense and so on. But at some point, and that was in 2023, we asked ourselves, can we just prepare a better system, a compressed system of large language models?”

What’s the Future of Compression?

When we come to the age of quantum computing, compression will be sped up, so almost anything will have some form of embedded intelligence, thanks to quantum computing’s ability to analyze vast amounts of data far beyond what is possible with classical computing methods. Unlike a classical computer, which processes information as binary 1s and 0s, a quantum computer uses a quantum mechanical property called superposition, explained Kimberly Mok in a previous post on The New Stack.

It’s a bit mind-boggling, but in essence, information gets processed as either or both 1s and 0s simultaneously.
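
A minimal sketch of that idea, using the standard textbook description of a single qubit as two amplitudes whose squared magnitudes give the chance of reading a 0 or a 1. The function name is hypothetical, and this simulates the arithmetic on a classical machine; it is not quantum hardware or any vendor's API.

```python
import math

# Illustrative model of one qubit: two amplitudes, one for 0 and one for 1.
# The squared magnitude of each amplitude is the probability of measuring
# that value. The function name is hypothetical.


def measurement_probabilities(amp0: complex, amp1: complex) -> tuple:
    """Return (P(0), P(1)) for a qubit with the given amplitudes."""
    norm = abs(amp0) ** 2 + abs(amp1) ** 2
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return abs(amp0) ** 2 / norm, abs(amp1) ** 2 / norm


if __name__ == "__main__":
    # A classical-like bit: definitely 0.
    print(measurement_probabilities(1, 0))
    # An equal superposition: 0 and 1 are equally likely when measured.
    amp = 1 / math.sqrt(2)
    print(measurement_probabilities(amp, amp))
```

Until it is measured, the qubit in the second case carries both amplitudes at once, which is the sense in which it is "both" 0 and 1.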

We’re not quite in quantum land. Progress toward trustworthy quantum computing is measured in qubits, and the number needed to achieve usefulness is upward of a million. We’re a long way from that point, by the way, but when we get there, we will see compression on a level unimaginable to us now.

Lizaso compared the brain of a fruit fly to the size of an LLM. A fruit fly has 140,000 neurons and 55 million synapses, the connections between neurons, according to a recent Nature article about a diagram of the fruit fly’s brain.

A fruit fly has intelligence. It can walk, fly, mate, fight. It’s autonomous. It does not need a network connection. An LLM does not do a heck of a lot, compared to any creature. But what does it take to create? Unprecedented electrical power, billions of dollars in training. And what can it do?

But what if the intelligence of a fruit fly could be embedded in a robot? Today’s LLMs will seem prehistoric once we can compress enough data to give robots the ability to fly. Someday, connected and unconnected devices will have superintelligence with the help of quantum computing. The fruit fly has nature on its side. We will never match the capabilities of sentient beings using classical computing, which means that what we do now is unsustainable.

Multiverse sells two products: CompactifAI and Singularity. Both provide capabilities to make LLMs more efficient. The company supports multiple models, including Mistral, BERT, and Zephyr. Access to the model itself is needed to compress it. According to Multiverse, OpenAI provides an API to access (query) the model, “therefore Multiverse Computing’s product is not able to compress it.”

Tradeoffs? There are a few. You need a lot of expertise, and the model may need retraining. Accuracy is still a question mark, but quantum-inspired computing may be an answer to a problem that we need to solve. There’s only so much electricity we can produce for LLMs that keep increasing in size.

