Many of the biggest websites opted out of Apple Intelligence training


Generative AI systems are trained on content that crawlers scrape from the web. Apple allows publishers to opt out of this scraping, and a new report says that many of the biggest websites have specifically opted out of Apple Intelligence training.

This includes both Facebook and Instagram, as well as many high-profile news and media sites like The New York Times and The Atlantic.

Apple’s AI training

Large language models like ChatGPT are trained on vast amounts of source text, ranging from news stories to user comments.

In Apple’s case, the company has for years used its web crawler, Applebot, to gather data for Siri and Spotlight suggestions. More recently, it has also used content collected by Applebot to train Apple Intelligence.

The practice is controversial, as AIs are effectively using copyrighted material to generate their own versions of it. For more niche topics, where source material is scarce, they have even been found to regurgitate entire paragraphs with almost no changes made.

But Apple does this in an ethical way, allowing publishers to opt out, and screening out personal data (though it did get caught out by one third-party source).

Apple describes its approach as follows:

“We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control […]

“We apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet.”

Apple provides a separate Applebot-Extended user agent, which sites can disallow in their robots.txt file to opt out of AI training while still allowing search indexing. That means their pieces can still be included in Spotlight and Siri searches.
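To make the opt-out concrete, here is a minimal sketch of what such a robots.txt might look like, checked with Python's standard-library parser. The file contents, the example.com URL, and the `allowed` helper are illustrative assumptions rather than anything taken from a real site, and Python's generic parser only approximates how Apple itself interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training, keep everything else open.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str,
            url: str = "https://example.com/story") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The training opt-out applies only to Applebot-Extended.
training_allowed = allowed(ROBOTS_TXT, "Applebot-Extended")
# The regular crawler falls through to the wildcard rule.
search_allowed = allowed(ROBOTS_TXT, "Applebot")

print("AI training allowed:", training_allowed)
print("Search indexing allowed:", search_allowed)
```

With this file, the training agent is refused while the ordinary crawler is still permitted, which is the split the article describes.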

Many big web publishers opting out

Since opting out is done using a publicly accessible robots.txt file, it’s easy to see which sites have done this. Wired checked a number of the biggest news and social media sites.
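A check like this can be roughly sketched in a few lines of Python. The `sites_blocking` helper, the domain names, and the sample file contents below are all hypothetical, and nothing here reflects Wired's actual method. A real survey would first download each site's /robots.txt, while this sketch works on text that is already in hand.

```python
from urllib.robotparser import RobotFileParser


def sites_blocking(agent: str, robots_by_site: dict[str, str]) -> list[str]:
    """Return the sites whose robots.txt forbids `agent` from the site root."""
    blocked = []
    for site, text in robots_by_site.items():
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        if not parser.can_fetch(agent, f"https://{site}/"):
            blocked.append(site)
    return sorted(blocked)


# Hypothetical sample data standing in for downloaded robots.txt files.
SAMPLE = {
    "news.example": "User-agent: Applebot-Extended\nDisallow: /\n",
    "blog.example": "User-agent: *\nAllow: /\n",
    "social.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: Applebot-Extended\nDisallow: /\n"
    ),
}

blocking = sites_blocking("Applebot-Extended", SAMPLE)
print(blocking)
```

Only the two sample sites that name Applebot-Extended in a Disallow rule end up in the result.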

“WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are among the many organizations opting to exclude their data from Apple’s AI training […]”

In a separate analysis conducted this week, data journalist Ben Welsh found that just over a quarter of the news websites he surveyed (294 of 1,167 primarily English-language, US-based publications) are blocking Applebot-Extended.
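As a quick check of the "just over a quarter" figure, the share can be computed directly from the two numbers quoted above. The variable names are illustrative only.

```python
# Figures quoted from Ben Welsh's survey.
blocking_sites = 294
surveyed_sites = 1167

share = blocking_sites / surveyed_sites
percent = round(share * 100, 1)

print(f"{percent}% of surveyed sites block Applebot-Extended")
print("More than a quarter:", share > 0.25)
```

The result is about 25.2 percent, which is consistent with the article's wording.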

Applebot-Extended is a relatively new user agent, so it’s likely that more websites will also opt out once awareness increases.

Money is of course one factor

Apple is believed to have struck deals with some media companies, paying a fee in return for the right to use their content for training. It’s likely this is the motivation for at least some sites currently blocking Apple – holding out for a payment offer.

“A lot of the largest publishers in the world are clearly taking a strategic approach,” says Originality AI founder Jon Gillham. “I think in some cases, there’s a business strategy involved—like, withholding the data until a partnership agreement is in place.”

