
Apple teaching an AI system to use apps; maybe for advanced Siri


An Apple research paper describes how the company has been developing Ferret-UI, a generative AI system specifically designed to be able to make sense of app screens.

The paper is somewhat vague about the potential applications of this – likely deliberately so – but the most exciting possibility would be to power a much more advanced Siri.

The challenges in going beyond ChatGPT

Large Language Models (LLMs) are what power systems like ChatGPT. The training material for these is text, mostly taken from websites.

MLLMs – or Multimodal Large Language Models – aim to extend an AI system's ability to make sense of non-textual information as well: images, video, and audio.

MLLMs aren’t currently very good at understanding the output of mobile apps. There are several reasons for this, starting with the mundane one that smartphone screen aspect ratios differ from those used by most training images.

More specifically, many of the elements they need to recognize, like icons and buttons, are very small.

Additionally, rather than comprehend information in one hit, as they would when interpreting a static image, they need to be able to interact with the app.

Apple’s Ferret-UI

These are the problems Apple researchers believe they have solved with the MLLM system they call Ferret-UI (the UI standing for user interface).

Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate “any resolution” on top of Ferret to magnify details and leverage enhanced visual features […]

We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, find text, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model’s reasoning ability, we further compile a dataset for advanced tasks, including detailed description, perception/interaction conversations, and function inference.
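To make the quoted description concrete, here is a rough sketch of what one instruction-following training sample with region annotations might look like. The field names and values are invented for illustration; Apple has not published its actual data format.

```python
# Hypothetical shape of one instruction-following sample for a task like
# "widget listing", with region annotations as the quoted passage describes.
# All field names and coordinates here are invented for illustration.
sample = {
    "image": "screenshot_001.png",
    "task": "widget_listing",
    "instruction": "List the interactive widgets on this screen.",
    "regions": [
        {"bbox": [24, 880, 336, 944], "label": "button: Play"},
        {"bbox": [360, 880, 672, 944], "label": "button: Download"},
    ],
    "answer": "The screen contains a Play button and a Download button.",
}

def region_labels(s):
    """Grounding: the answer can point back at annotated screen regions."""
    return [r["label"] for r in s["regions"]]

print(region_labels(sample))
```

"Referring" would be the reverse direction: an instruction points at one of those boxes and asks the model about it.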

The result, they say, is better than both GPT-4V and other existing UI-focused MLLMs.
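The "any resolution" idea from the quote can be pictured as tiling an elongated screenshot into sub-images so that small elements like icons get more effective resolution. The simple two-way split below is an assumption for illustration; the paper's exact scheme may differ.

```python
# Illustrative sketch of the "any resolution" idea: divide a screenshot
# into sub-images along its longer dimension so details can be encoded at
# higher resolution. The 2-way split is an assumption, not Apple's method.

def split_screen(width: int, height: int):
    """Return sub-image bounding boxes as (left, top, right, bottom)."""
    if height >= width:  # portrait: split into top and bottom halves
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]
    else:                # landscape: split into left and right halves
        mid = width // 2
        return [(0, 0, mid, height), (mid, 0, width, height)]

# An iPhone-style 1170x2532 screenshot yields two 1170x1266 tiles.
print(split_screen(1170, 2532))
```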

From UI development to a highly advanced Siri

The paper describes what they have achieved, rather than how it might be used. That is typical of many research papers, and there can be a couple of reasons for this.

First, the researchers themselves may not know how their work might end up being used. They are focused on solving a technical problem, not on the potential applications. It may take a product person to see potential ways to make use of it.

Second, especially where Apple is concerned, they may be instructed not to disclose the intended use, or to be deliberately vague about it.

But we could see three potential ways this ability might be used …

One, it could be a useful tool for evaluating the effectiveness of a UI. A developer could create a draft version of an app, then let Ferret-UI determine how easy or difficult it is to understand, and to use. This could be both quicker and cheaper than human usability testing.

Two, it could have accessibility applications. Rather than a simple screen reader reading everything on an iPhone screen to a blind person, for example, it could summarize what the screen shows and list the options available. The user could then tell iOS what they want to do, and let the system do it for them.

Apple provides an example of this, where Ferret-UI is presented with a screen containing podcast shows. The system’s output is: “The screen is for a podcast application where users can browse and play new and notable podcasts, with options to play, download, and search for specific podcasts.”

Three – and most exciting of all – it could be used to power a very advanced form of Siri, where a user could give Siri an instruction like “Check flights from JFK to Boston tomorrow, and book a seat on a flight that will get me there by 10am with a total fare below $200.” Siri would then interact with the airline app to carry out the task.
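That flight-booking scenario implies an observe-understand-act loop on top of the screen-understanding model. The sketch below shows the general shape of such a loop; every class and method here is hypothetical, since the paper describes the model itself, not a shipping agent API.

```python
# Hedged sketch of the kind of loop an advanced Siri might run: read the
# screen, ask the MLLM for the next UI action, perform it, repeat.
# All objects and methods below are invented stand-ins for illustration.

def run_task(goal, screen_model, device, max_steps=20):
    """Drive an app toward `goal` by repeatedly reading the screen."""
    for _ in range(max_steps):
        screen = device.capture_screen()
        # The MLLM interprets the screen and proposes the next UI action.
        step = screen_model.next_action(goal, screen)
        if step["action"] == "done":
            return step["result"]
        device.perform(step)  # tap, type text, scroll, ...
    raise TimeoutError("could not complete the task")

# Minimal scripted stand-ins so the loop can be exercised:
class FakeDevice:
    def capture_screen(self):
        return "screenshot"
    def perform(self, step):
        pass

class FakeModel:
    def __init__(self):
        self.steps = [
            {"action": "tap", "target": "Search flights"},
            {"action": "done", "result": "Booked JFK->BOS, $179"},
        ]
    def next_action(self, goal, screen):
        return self.steps.pop(0)

print(run_task("book a flight under $200", FakeModel(), FakeDevice()))
```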

Thanks, AK. 9to5Mac composite image from Solen Feyissa on Unsplash and Apple.
