Microsoft Introduces Multimodal Kosmos-2.5

Microsoft has introduced Kosmos-2.5, a multimodal literate model designed for machine reading of text-intensive images. Building on its predecessors, Kosmos-1 and Kosmos-2, Kosmos-2.5 adds capabilities aimed squarely at image-text understanding.

Kosmos-2.5 has been pre-trained on large-scale datasets of text-intensive images. This training makes it proficient in two closely related transcription tasks:

Spatially-Aware Text Blocks: Kosmos-2.5 generates text blocks for the text it reads in an image and assigns each block its spatial coordinates within that image. Because every piece of text is tied to a location, the output preserves the layout of the page rather than returning a flat string.

Structured Markdown Text Output: Kosmos-2.5 also produces structured text output in markdown format. The text is not only extracted from the image but also presented with its structure and styling, such as headings and lists, intact.
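To make the first task more concrete, here is a minimal Python sketch of how a consumer might read spatially-aware output. The serialized line format, the `TextBlock` class, and the `parse_blocks` and `reading_order` helpers are all assumptions made for illustration; the article does not specify how Kosmos-2.5 encodes coordinates.

```python
import re
from dataclasses import dataclass

# Hypothetical serialization, assumed for this sketch only:
# one block per line, "<bbox><x_LEFT><y_TOP><x_RIGHT><y_BOTTOM></bbox>text".
BLOCK_PATTERN = re.compile(
    r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>(.*)"
)


@dataclass
class TextBlock:
    """One piece of recognized text plus its bounding box in the image."""
    left: int
    top: int
    right: int
    bottom: int
    text: str


def parse_blocks(raw: str) -> list[TextBlock]:
    """Turn serialized output into TextBlock objects, skipping malformed lines."""
    blocks = []
    for line in raw.splitlines():
        match = BLOCK_PATTERN.fullmatch(line.strip())
        if match is None:
            continue  # not a well-formed block line
        left, top, right, bottom = (int(g) for g in match.groups()[:4])
        blocks.append(TextBlock(left, top, right, bottom, match.group(5).strip()))
    return blocks


def reading_order(blocks: list[TextBlock]) -> list[TextBlock]:
    """Sort blocks top-to-bottom, then left-to-right."""
    return sorted(blocks, key=lambda b: (b.top, b.left))
```

Keeping the coordinates next to the text is what lets downstream code recover layout, for example by sorting blocks into reading order or by grouping blocks that share a column.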

Both tasks are handled by a single model through a shared Transformer architecture, task-specific prompts, and adaptable text representations. This makes the multimodal literate model a general-purpose tool for real-world applications involving text-rich images.
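One way to picture a shared model steered by task-specific prompts is the sketch below. It is not Microsoft's implementation: the prompt strings in `TASK_PROMPTS`, the `transcribe` function, and the `decoder` callable are hypothetical stand-ins, since the article does not give the real prompt tokens or API.

```python
from typing import Callable

# Hypothetical task tokens, assumed for illustration only.
TASK_PROMPTS = {
    "text_blocks": "<ocr>",  # spatially-aware text blocks
    "markdown": "<md>",      # structured markdown output
}


def transcribe(
    image_tokens: list[str],
    task: str,
    decoder: Callable[[list[str]], str],
) -> str:
    """Run the same decoder for every task; only the appended prompt differs."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    # The image representation is shared; the task prompt selects the output format.
    sequence = image_tokens + [TASK_PROMPTS[task]]
    return decoder(sequence)
```

The point of the design is that no task-specific network is needed: switching from layout transcription to markdown generation changes one token in the input, not the model.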

The model was evaluated on end-to-end document-level text recognition and image-to-markdown text generation. Kosmos-2.5 can also be readily adapted to other text-intensive image understanding tasks by using different prompts through supervised fine-tuning.

The introduction of Kosmos-2.5 marks a notable step toward the future scaling of multimodal large language models, and it positions Microsoft's work to influence how AI systems handle image-text understanding.

Kosmos-1 argued that "Language Is Not All You Need." It showcased the potential of integrating language, action, multimodal perception, and world modeling for the advancement of artificial general intelligence (AGI). Kosmos-2.5 is the next step.

The post Microsoft Introduces Multimodal Kosmos-2.5 appeared first on Analytics India Magazine.
