Who’s blocking Apple’s AI crawler – Six Colors

Wired’s Kate Knibbs reports that many major publishers are blocking Applebot-Extended, the company’s crawler bot that helps train Apple Intelligence:

WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are among the many organizations opting to exclude their data from Apple’s AI training.

This isn’t a huge surprise: as the story later points out, many of the publishers automatically block crawler bots (including Apple’s) until and unless they strike a commercial deal with AI companies.
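
For the curious, this kind of opt-out is usually expressed in a site’s robots.txt file. Here is a minimal sketch, using Python’s standard urllib.robotparser, of how you might check whether a given set of rules excludes Applebot-Extended. The sample rules, the is_blocked helper, and the example.com URL are all made up for illustration; they are not any publisher’s actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, written for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if the given rules disallow `user_agent` from `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, "blocked:", is_blocked(SAMPLE_ROBOTS_TXT, agent))
```

With these sample rules, Applebot-Extended is shut out while the regular Applebot falls through to the catch-all entry and is still allowed. To check a live site instead, RobotFileParser also offers set_url() and read() to fetch the file over the network.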

Remaining unanswered is the question of just what was already used to train Apple’s LLM before people were aware of the ability to block it, and whether blocking the crawler now has any effect after the horse is out of the barn.

The Wired piece’s money quote, however, comes from a New York Times executive:

“As the law and The Times’ own terms of service make clear, scraping or using our content for commercial purposes is prohibited without our prior written permission,” says NYT director of external communications Charlie Stadtlander, noting that the Times will keep adding unauthorized bots to its block list as it finds them. “Importantly, copyright law still applies whether or not technical blocking measures are in place. Theft of copyrighted material is not something content owners need to opt out of.”

Bingo.

—Linked by Dan Moren
