Companies Are Not Happy AI Is Training on Their Data
In a quiet but significant move, a dozen major media companies, among them Disney, The New York Times, and CNN, have taken action to block ChatGPT, the generative AI chatbot developed by OpenAI, from accessing their websites. Their objective is clear: to prevent their valuable content from being scraped and subsequently used by ChatGPT to generate text on demand.
What is ChatGPT and Why the Blockade?
ChatGPT stands as a testament to the capabilities of generative AI, producing content ranging from essays to poetry from a simple prompt. It is powered by a large language model, the Generative Pre-trained Transformer (GPT), trained on billions of words gathered from across the internet. ChatGPT excels at responding to natural language queries and crafting text in a wide array of styles and domains.
However, the formidable abilities of ChatGPT also pose a challenge to the media industry, raising concerns about potential copyright infringement and misuse of proprietary content. OpenAI gathers training data using a web crawler, which combs the web for text to serve as training material for the chatbot. Notably, this crawler neither seeks permission from nor compensates content providers for the information it collects, and many argue that this kind of collection does not fall under the fair use doctrine.
To safeguard their intellectual property and interests, numerous media companies have opted to block OpenAI’s web crawler from accessing their websites. They have done so using a tool known as “robots.txt,” a file that gives web crawlers a set of directives specifying which pages or files may and may not be requested from a site. By disallowing the crawler’s user agent (OpenAI’s crawler identifies itself as GPTBot) in their robots.txt files, media companies can effectively shut down ChatGPT’s data scraping.
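As a concrete illustration, here is a minimal robots.txt that blocks OpenAI’s published crawler user agent, GPTBot, from an entire site (the root path `/` covers every page):

```
User-agent: GPTBot
Disallow: /
```

A site owner can check how such a file would be interpreted using Python’s standard library robots.txt parser; the example.com URLs below are placeholders, not a real publisher’s site:

```python
import urllib.robotparser

# Load and parse the site's robots.txt (placeholder URL).
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Returns False if robots.txt disallows GPTBot from fetching this page.
print(rp.can_fetch("GPTBot", "https://www.example.com/2023/some-article"))
```

Note that robots.txt is a voluntary convention: it instructs well-behaved crawlers but does not technically prevent access, which is part of why publishers also press for legal and regulatory answers.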
Implications of Blocking ChatGPT
The decision by media giants to block ChatGPT underscores the growing tension between the tech and media industries over the use and regulation of AI and of crawled data. Generative AI technologies have undeniably unlocked vast creative potential: anyone can now produce their own art at home, quickly and for free. Yet these tools have also given creators a means to disseminate false or misleading information, whether intentionally or inadvertently. The ability to identify and regulate AI-generated content is of paramount importance to ensure its responsible and beneficial application within society.
The blockade against ChatGPT also brings ethical and legal questions into the limelight, such as:
1. Rights and Responsibilities: What are the rights and responsibilities of both AI users and developers in terms of the use and potential misuse of copyrighted works?
2. Disclosure and Verification: What standards and practices must be established for the disclosure and verification of the source and authenticity of AI-generated content?
3. Impact on Information and Culture: What are the effects and influences of AI-generated content on the quality and diversity of information and culture?
These questions are far from straightforward, yet addressing them is imperative to pave the way for a fairer and more sustainable future for AI. As Danielle Coffey, president and CEO of the News/Media Alliance, aptly put it, “We need to have a conversation about this. We need to have a conversation about the human side of AI.”