Perplexity Accused of Scraping Websites That Explicitly Blocked AI Scraping

AI startup Perplexity is reportedly crawling and scraping content from websites that have explicitly stated they do not wish to be scraped, according to recent research published by internet infrastructure provider Cloudflare.

Cloudflare’s research, released on Monday, says Perplexity was observed ignoring website blocks and concealing its scraping activity. The infrastructure giant accused Perplexity of obscuring its identity while attempting to scrape web pages, which Cloudflare’s researchers described as an effort to circumvent website preferences.

AI products, including those offered by Perplexity, depend on ingesting vast amounts of data from the internet. AI startups have historically scraped text, images, and videos without explicit permission to develop their services. In response, websites have turned to the robots.txt file, a web standard that tells search engines and AI companies which pages may be crawled and which should be avoided. Because compliance is voluntary on the crawler's side, these measures have had mixed results.
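To make the robots.txt mechanism concrete, here is a minimal sketch of the check a compliant crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The crawler names `ExampleAIBot` and `SomeSearchBot`, the `example.com` URLs, and the rules themselves are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is blocked site-wide,
# every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeSearchBot"):
        for path in ("/articles/1", "/private/notes"):
            url = "https://example.com" + path
            print(f"{agent:14} {path:15} allowed={is_allowed(ROBOTS_TXT, agent, url)}")
```

Nothing in this check is enforced by the website: the file only states a preference, and it is up to the crawler to run a check like this and honor the answer.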

According to Cloudflare, Perplexity appears to be deliberately bypassing these directives by altering its bots’ “user agent” (a string identifying a visitor’s browser, device, and software version) and by changing the autonomous system numbers (ASNs) its traffic comes from, which identify the large networks that make up the internet. Cloudflare stated that this behavior was observed across tens of thousands of domains and millions of requests daily, and that its researchers were able to identify the crawler using a combination of machine learning and network signals.

A spokesperson for Perplexity, Jesse Dwyer, dismissed Cloudflare’s findings as a “sales pitch,” telling TechCrunch that the screenshots provided in the blog post demonstrated that “no content was accessed.” Dwyer further claimed that the bot mentioned in Cloudflare’s post “isn’t even ours.”

Cloudflare noted that it first detected this activity following complaints from its customers, who reported that Perplexity was crawling their sites even after they had added rules to their robots.txt files and specifically blocked Perplexity’s known bots. Cloudflare subsequently conducted tests to verify these circumventions.

The company also stated that Perplexity uses a generic browser user agent, impersonating Google Chrome on macOS, when its declared crawler is blocked. Cloudflare has since delisted Perplexity’s bots from its verified list and implemented new blocking techniques.
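The sketch below illustrates why a block keyed only on the user agent string stops working once a crawler presents itself as an ordinary browser. It is a deliberately naive filter, not Cloudflare's detection method; the token `exampleaibot` and both header values are assumptions made up for the example, with the second one merely shaped like a Chrome-on-macOS user agent.

```python
# Hypothetical blocklist of declared crawler names (lowercase tokens).
BLOCKED_AGENT_TOKENS = ("exampleaibot",)


def blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# A crawler that identifies itself honestly (hypothetical string).
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"

# A generic browser-style string, illustrative only.
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(blocked_by_user_agent(DECLARED_UA))  # True: the declared name matches
    print(blocked_by_user_agent(GENERIC_UA))   # False: nothing identifies the crawler
```

Because the second request is indistinguishable from a normal visitor by header alone, a defender has to rely on other signals, which is consistent with Cloudflare's statement that it combined machine learning with network signals.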

This incident follows Cloudflare’s increasingly public stance against AI crawlers. Last month, Cloudflare announced a marketplace designed to allow website owners to charge AI scrapers for accessing their sites. Cloudflare CEO Matthew Prince has previously warned that AI is disrupting the internet’s business model, particularly for publishers. Last year, Cloudflare also launched a free tool to help prevent bots from scraping websites for AI training data.

Perplexity has faced similar allegations of unauthorized scraping and plagiarism in the past. Last year, news outlets like Wired reported accusations of Perplexity plagiarizing content. Weeks later, Perplexity CEO Aravind Srinivas was unable to define the company’s stance on plagiarism during a TechCrunch interview.

© 2025 Proaitools. All rights reserved.