
Perplexity Caught Using Underhanded Tactics to Evade Website Crawling Restrictions

Perplexity Accused of Using Stealth Tactics to Flout No-Crawl Edicts

For more than three decades, internet norms, chiefly the robots.txt convention, have governed how web crawlers interact with websites. However, AI search engine Perplexity has allegedly been using stealth bots and other tactics to evade website blocks, in violation of these established protocols.

According to a recent blog post by network security and optimization service Cloudflare, the company received complaints from customers who had blocked Perplexity's scraping bots in two ways: by adding disallow rules to their sites' robots.txt files and by configuring web application firewall rules that blocked Perplexity's declared crawlers. Despite these efforts, Perplexity continued to access the sites' content.
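As a rough illustration of the robots.txt opt-out described above, the following Python sketch shows how a compliant crawler might consult a site's rules before fetching a page. The robots.txt contents, the example.com URL, and the is_allowed helper are illustrative assumptions, and PerplexityBot is assumed here as the name of the declared crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out of one crawler
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed("PerplexityBot", page))  # crawler named in the Disallow rule
    print(is_allowed("SomeOtherBot", page))   # falls through to the wildcard rule
```

With these rules the first check is refused and the second is permitted. The check depends entirely on the crawler identifying itself honestly: a bot that presents a generic browser string matches only the wildcard rule, which is why robots.txt works as a convention rather than an enforcement mechanism.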

Cloudflare researchers investigated further and found that when known Perplexity crawlers were blocked by robots.txt files or firewall rules, Perplexity then crawled the sites with a stealth bot that used several tactics to mask its activity. This undeclared crawler used multiple IP addresses not listed in Perplexity’s official IP range and rotated through them in response to restrictive robots.txt policies and Cloudflare blocks.
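To make the distinction between declared and undeclared crawlers concrete, here is a minimal Python sketch of the kind of check a site operator could run on incoming requests: does the request claim to be the declared crawler, and does it come from the crawler's published IP range? The network below is a documentation-only placeholder (RFC 5737), not Perplexity's real range, and the PerplexityBot token and classify_request helper are assumptions made for illustration. This is not Cloudflare's detection method.

```python
from ipaddress import ip_address, ip_network

# Placeholder values for illustration only. 192.0.2.0/24 is reserved for
# documentation (RFC 5737) and is NOT a real published crawler range.
DECLARED_NETWORKS = [ip_network("192.0.2.0/24")]
DECLARED_UA_TOKEN = "PerplexityBot"  # assumed name of the declared crawler


def classify_request(user_agent: str, source_ip: str) -> str:
    """Classify a request by its claimed identity and its source address."""
    claims_identity = DECLARED_UA_TOKEN.lower() in user_agent.lower()
    addr = ip_address(source_ip)
    in_declared_range = any(addr in net for net in DECLARED_NETWORKS)

    if claims_identity and in_declared_range:
        return "declared"      # identifies itself and comes from the listed range
    if claims_identity:
        return "unverified"    # claims the identity but from an unlisted address
    return "unidentified"      # no claim at all: could be a person or a stealth bot


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7"))
```

A crawler that drops its declared name and moves to unlisted addresses ends up in the "unidentified" bucket alongside ordinary visitors, so a check like this cannot catch it on its own. That gap is why blocking by name and IP range was not enough in the cases Cloudflare describes.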

The researchers provided a diagram illustrating the flow of this technique, which was observed across tens of thousands of domains and millions of requests per day. This behavior is incompatible with internet norms and demonstrates a clear disregard for website directives and preferences.

A History of Disregard for Internet Norms

Perplexity’s actions are not an isolated incident. The company has faced allegations from several other publishers that it plagiarized their content. Forbes accused Perplexity of "cynical theft" after Perplexity published a post that was extremely similar to a proprietary Forbes article posted just a day earlier. Wired made similar claims, citing suspicious traffic patterns from IP addresses likely linked to Perplexity.

Perplexity has also been found to manipulate the ID string its crawling bots send, known as the user-agent string, to bypass website blocks. This behavior is a clear violation of internet norms and demonstrates a lack of respect for website content and directives.

Cloudflare Takes Action

In response to their findings, Cloudflare researchers took steps to prevent the undeclared crawlers from accessing sites that use its content-delivery service. The company has de-listed Perplexity as a verified bot and added heuristics to its managed rules to block this stealth crawling. The expectations for web crawlers are clear: they should be transparent about who they are, serve a specific purpose, perform a particular activity, and follow website directives and preferences.

Perplexity representatives did not respond to an email asking whether the allegations are true. However, Cloudflare’s actions demonstrate that there are consequences for violating internet norms and disregarding website content and directives.

Conclusion

The allegations against Perplexity highlight the need for greater transparency and accountability in AI search engines and web crawlers. While these technologies have revolutionized the way we access information online, they must also adhere to established protocols and respect website content and directives. Cloudflare’s actions serve as a reminder that there are consequences for violating internet norms, and companies like Perplexity must be held accountable for their behavior.