Generative AI has been a hot topic since ChatGPT’s debut in November 2022, but it has also raised concerns about plagiarism and about where training data comes from. Publishers, in particular, worry about AI tools scraping their content. Now Google is giving publishers a way to manage which of their data is accessible to generative AI and other tools. In a blog post, Danielle Romain, Google’s VP of Trust, wrote, “Web publishers have expressed the desire for more options and control regarding the use of their content in emerging generative AI applications.”
New controls for publishers
Romain announced that Google is introducing a new control, known as Google-Extended, which web publishers can use to decide whether their websites help improve Bard and the Vertex AI generative APIs, including the future model generations that power these products. “Through the use of Google-Extended to regulate content access on a website, administrators can decide whether they wish to support the continuous improvement of these AI models in terms of accuracy and capabilities,” she explained. In essence, the decision to provide data to the AI models behind Bard and Vertex AI rests with publishers. The control applies to Google’s models only and does not govern other companies’ systems such as ChatGPT.
Google-Extended is a standalone product token: a name that publishers can target in their robots.txt rules to control whether their websites help improve Bard and the Vertex AI generative APIs. Companies, Google included, use web crawlers to gather data that improves their tools. Google’s standard crawlers, for instance, build its search indices, perform product-specific crawls, and run analyses.
According to Google, making controls like Google-Extended available through robots.txt is an important step toward transparency and control, one it believes every AI model provider should take. Romain added that as AI applications and tools keep expanding, web publishers will find it increasingly difficult to manage the different uses of their content at scale.
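For publishers wondering what this looks like in practice, here is a minimal sketch of a robots.txt that opts a whole site out of Google-Extended while leaving other crawlers untouched, checked with the robots.txt parser in Python's standard library. The rules, the `example.com` URL, and the `is_allowed` helper are illustrative assumptions rather than anything from Google's announcement, and the check only shows how a standard parser reads the rules, not how Google processes them internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the Google-Extended token site-wide,
# and allow every other crawler (including Google's search crawler).
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Illustrative helper, not part of any Google tooling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With these rules the parser reports Google-Extended as blocked and Googlebot as allowed, which matches the intent described above: a site can stay in search results while declining to contribute to model improvement.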