- Facebook, the NYT, USA Today, and more have blocked Apple’s AI scraper.
- Exclusive deals may compartmentalize the web.
- Big publishers are digging their own graves here.
Some of the web’s biggest sites are blocking Apple’s AI data-scraping bots, and you won’t be surprised to learn that it’s all about the money.
News and social media sites, including the NYT, Vox Media, Facebook, and Condé Nast, have all opted out of allowing their sites to be harvested by Apple for the purpose of training its AI models. But the motivation doesn’t seem to have anything to do with ethics, the environmental problems of AI server farms, or anything other than cold, hard cash. In fact, it points to how AI might change how the commercial web works.
“By opting out, they’re essentially saying to Apple, ‘If you want our content for AI training, you’ll need to pay for it,'” Dev Nag, founder of AI company QueryPal, told Lifewire via email. “This trend could reshape the landscape of AI development, turning web content into a more explicitly priced commodity.”
Show Us the Money
When Apple announced its Apple Intelligence features at its WWDC keynote this summer, it told us that we could all opt out of having our websites’ data fed into its training machine, and it would abide by our wishes. The problem is that Apple had already scraped the internet and trained its AI models, making any opt-outs pretty pointless.
Blocking Applebot from your own website is easy: you just add a couple of lines to the robots.txt file on your server. But most of our data isn't on websites we own and control. It's in Dropbox, on social media sites like Instagram, or on internet forums like Reddit. Plus, if you opt out, does that mean Apple will retroactively remove your data from its already-trained models?
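For those who do run their own site, the opt-out is a short robots.txt entry. Apple's webmaster documentation names the AI-training crawler Applebot-Extended (distinct from the regular Applebot that powers Siri and Spotlight search), so a minimal block looks something like this:

```text
# Block Apple's AI-training crawler site-wide
User-agent: Applebot-Extended
Disallow: /

# The regular Applebot (Siri/Spotlight search) can still crawl
User-agent: Applebot
Allow: /
```

Note that this only expresses a preference; a robots.txt file can't technically enforce anything, which is exactly the problem the rest of this piece describes.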
Major web publishers pretty much immediately blocked Apple’s access to their sites’ content, which is no surprise. According to an article by Wired’s Kate Knibbs, data journalist Ben Walsh has found that roughly a quarter of 1,000+ news sites he surveyed have blocked the AI-scraping Applebot-Extended.
The twist is that some of these sites have negotiated deals with AI companies to allow their content to be used for training. Withholding their data is, in these cases, a bargaining tactic. Wired's own parent company, Condé Nast, for example, unblocked OpenAI's bot after signing a deal.
This is a very interesting twist for the web as a whole. Until now, most websites have allowed themselves to be indexed by search engines like Google and Bing for free in exchange for the traffic they bring. Over the years, several news publishers and governments have tried to get Google or Facebook to pay for linking to news sites, but this has never really worked out.
Now, though, Big Tech companies seem willing to pay to scrape.
Consequences
This is already having consequences across the web. For example, Google did a frankly cheap $60 million deal for exclusive rights to index Reddit, aka the most useful source of knowledge on the internet. Rival search engines like DuckDuckGo (based on Bing) still show old results but cannot index any new posts.
While search engines are expected to respect these do-not-index requests, AI companies are not only ignoring them but actively working around attempts to block their crawlers.
This will, of course, lead to a compartmentalized pay-to-play web. Big Tech's AI companies will divide up the publishers between themselves, and smaller players will either have their data stolen and regurgitated by AI bots with no links or credit, or disappear from the web entirely. Google doesn't even bother to index many of the smaller sites on the web anymore. And if only licensed scrapers can operate, other, more legitimate uses are also out the window.
“The real losers in this new world are up-to-date, intelligent, open-source foundation models. Unless the model is backed by a large institution (e.g., a large tech company or the government), researchers will not have access to live, up-to-date content to train models. The next breakthroughs in foundation model tech will continue to happen in the backrooms of companies,” Sid Rao, an AI and machine learning expert and the CEO of Positron Networks, told Lifewire via email.
The fight is on, and we’re already familiar with the likely result. Whenever you want to watch a TV show or movie, you first have to find out who’s streaming it, then sign up for that service, etc. Imagine that, only for the entire web.
And this is going to hurt the publishers, too. They might get a licensing fee to sell their crown jewels to feed the AI, but they will no longer be getting any traffic to their sites. The whole point of this AI business is for Apple, Google, OpenAI, and so on to take that content and repackage it as their own.
It’s going to be a rough ride, and we haven’t even gotten to the massive environmental disaster that is AI.