Wired’s Kate Knibbs reports that many major publishers are blocking Applebot-Extended, Apple’s crawler bot that helps train Apple Intelligence:
WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are among the many organizations opting to exclude their data from Apple’s AI training.
This isn’t a huge surprise: as the story points out later on, many of the publishers automatically block crawler bots (including Apple’s) unless and until they strike a commercial deal with AI companies.
Still unanswered is the question of just what was already used to train Apple’s LLM before people were aware they could block the crawler, and whether blocking it now has any effect when the horse is already out of the barn.
The Wired piece’s money quote, however, comes from a New York Times executive:
“As the law and The Times’ own terms of service make clear, scraping or using our content for commercial purposes is prohibited without our prior written permission,” says NYT director of external communications Charlie Stadtlander, noting that the Times will keep adding unauthorized bots to its block list as it finds them. “Importantly, copyright law still applies whether or not technical blocking measures are in place. Theft of copyrighted material is not something content owners need to opt out of.”
Bingo.