OpenAI has launched GPTBot, a new web crawler that gathers data to help improve future artificial intelligence models beyond GPT-4, such as a potential GPT-5.
How GPTBot Works
The crawler identifies itself with the user agent token and full user-agent string below, and it scours the web for data that can improve the accuracy, capabilities, and safety of AI technology.
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
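Site owners who want to see whether GPTBot is already visiting can simply look for that user agent token in their server logs. The snippet below is a minimal Python sketch; the log path and combined log format are assumptions about a typical setup, not anything specified by OpenAI.

# Minimal sketch: count requests from GPTBot in a web server access log.
# The log path and log format are assumptions; adjust for your own server.

GPTBOT_TOKEN = "GPTBot"  # user agent token published by OpenAI
LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path

def count_gptbot_hits(log_path: str) -> int:
    hits = 0
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Combined log format puts the user agent in the final quoted field,
            # but a substring match on the token is enough for a rough count.
            if GPTBOT_TOKEN in line:
                hits += 1
    return hits

if __name__ == "__main__":
    print(f"GPTBot requests seen: {count_gptbot_hits(LOG_PATH)}")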
According to OpenAI, GPTBot filters out paywall-restricted sources, sources that violate OpenAI’s policies, and sources that gather personally identifiable information.
The data GPTBot gathers can potentially give AI models a significant boost. By allowing it to access your site, you contribute to that data pool and, in turn, to the broader AI ecosystem.
However, it’s not a one-size-fits-all scenario. OpenAI has given web admins the power to choose whether or not to grant GPTBot access to their websites.
Restricting GPTBot Access
If website owners wish to restrict GPTBot from their site, they can modify their robots.txt file.
By including the following, they can prevent GPTBot from accessing the entirety of their website.
User-agent: GPTBot
Disallow: /
In contrast, those who wish to grant partial access can customize the directories that GPTBot can access. To do this, add the following to the robots.txt file.
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
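Before relying on either configuration, it can help to confirm the rules behave as intended once robots.txt is published. The check below is an illustrative Python sketch using the standard library’s urllib.robotparser; the domain and directory names are placeholders matching the example above, not real sites.

# Illustrative check of robots.txt rules for GPTBot using Python's standard library.
# The domain is a placeholder; point it at your own site after publishing robots.txt.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder URL
parser.read()  # fetches and parses the live robots.txt

# With the partial-access example above, GPTBot should be allowed in /directory-1/
# and blocked from /directory-2/.
print(parser.can_fetch("GPTBot", "https://example.com/directory-1/page.html"))  # expected: True
print(parser.can_fetch("GPTBot", "https://example.com/directory-2/page.html"))  # expected: False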
On the technical side, GPTBot’s requests to websites originate from IP address ranges documented on OpenAI’s website, which gives web admins added transparency about the source of that traffic on their sites.
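Admins who prefer not to trust the user-agent header alone can compare a request’s source IP against those published ranges. The sketch below uses Python’s ipaddress module; the CIDR blocks shown are placeholders, and the authoritative list should be copied from OpenAI’s GPTBot documentation.

# Sketch: check whether a request's source IP falls inside published CIDR ranges.
# The ranges below are placeholders; use the current list from OpenAI's GPTBot page.
import ipaddress

PUBLISHED_RANGES = [
    "192.0.2.0/24",     # placeholder CIDR block, not a real GPTBot range
    "198.51.100.0/24",  # placeholder CIDR block, not a real GPTBot range
]

def is_from_published_range(client_ip: str) -> bool:
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES)

# A claimed GPTBot request from 192.0.2.15 passes this check only because
# 192.0.2.0/24 appears in the placeholder list above.
print(is_from_published_range("192.0.2.15"))  # True with the placeholder ranges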
Allowing or disallowing the GPTBot web crawler could significantly affect your site’s data privacy, security, and contribution to AI advancement.
Legal And Ethical Concerns
OpenAI’s latest news has sparked a debate on Hacker News around the ethics and legality of using scraped web data to train proprietary AI systems.
GPTBot identifies itself so web admins can block it via robots.txt, but some argue there’s no benefit to allowing it, unlike search engine crawlers that drive traffic. A significant concern is copyrighted content being used without attribution. ChatGPT does not currently cite sources.
There are also questions about how GPTBot handles licensed images, videos, music, and other media found on websites. If that media ends up in model training, it could constitute copyright infringement. Some experts also worry that crawled data could degrade models if AI-written content gets fed back into training.
Conversely, some believe OpenAI has the right to use public web data freely, likening it to a person learning from online content. However, others argue that OpenAI should share profits if it monetizes web data for commercial gain.
Overall, GPTBot has opened complex debates around ownership, fair use, and the incentives of web content creators. While following robots.txt is a good step, transparency is still lacking. The tech community wonders how their data will be used as AI products advance rapidly.
Featured image: Vitor Miranda/Shutterstock
Source: Search Engine Journal.