Communications of the ACM

ACM News

Major Websites Are Blocking AI Crawlers From Accessing Their Content

Any page you can access from a web browser can also be "scraped" by a crawler—which operates just like a browser, but stores the material in a database instead of displaying it to a user.

Credit: Annelise Capossela/Axios

Nearly 20% of the top 1000 websites in the world are blocking crawler bots that gather web data for AI services, according to new data from Originality.AI, an AI content detector.

Why it matters: In the absence of clear legal or regulatory rules governing AI's use of copyrighted material, websites big and small are taking matters into their own hands.

Driving the news: OpenAI introduced its GPTBot crawler early in August, declaring that the data gathered "may potentially be used to improve future models," promising that paywalled content would be excluded and instructing websites in how to bar the crawler.

Soon after, several high-profile news sites, including the New York Times, Reuters and CNN, began blocking GPTBot, and many more have since followed. (Axios is among them.)
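For readers curious about what "barring the crawler" looks like in practice, the sketch below assumes the standard robots.txt convention, in which a site lists the user-agent names it wants to turn away. The robots.txt content, the `is_allowed` helper, and the example.com URL are illustrative assumptions rather than any particular site's configuration, and the check only models how a well-behaved crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: bar GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Under these rules the check comes back negative for GPTBot and positive for the other agent. Note that robots.txt is a request, not an enforcement mechanism: it works only for crawlers that choose to honor it.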

From Axios
