OpenAI Sued For Data Theft In Class Action Lawsuit

ChatGPT is in trouble. OpenAI is being sued in the US for allegedly using content from the internet illegally to train its large language models (LLMs). The company has been called out for unauthorised data mining to expand its training datasets.

As reported by First Post, a class action lawsuit has been filed against OpenAI, the creator of ChatGPT, claiming that the company’s AI training methods violated the privacy and copyright of practically everyone who has ever shared content online. OpenAI gathered an enormous amount of data from various sources on the internet to train its advanced AI language models.

These datasets cover a wide range of material, including Wikipedia articles, popular books, social media posts, and even explicit content from niche genres. More importantly, OpenAI acquired all of this data without seeking permission from the content creators. Readers may be reminded of an earlier incident involving source code from Samsung’s semiconductor division, as well as other confidential data.

What The Trouble Entails

The class action lawsuit, filed in California, argues that OpenAI’s failure to adhere to proper protocols, including obtaining consent from content creators, amounts to outright data theft.

The lawsuit filing stated, “Instead of following established procedures for the acquisition and usage of personal information, the Defendants resorted to theft. They systematically scraped 300 billion words from the internet, including ‘books, articles, websites, and posts,’ which also included personal information obtained without consent.”

How OpenAI Nicks Your Ideas And Work

It is reasonable to argue that if you have been active online in recent decades, your digital contributions have likely been incorporated into OpenAI’s datasets. Consequently, any output that OpenAI’s language models generate for profit may contain fragments of your data obtained through silent scraping.

Ryan Clarkson, Managing Partner at the law firm suing OpenAI, explained to The Washington Post that “all of that information is being taken at scale” even though it was never originally intended to be used by a large language model.

Is The Lawsuit Really A Concern For OpenAI?

The outcome of the case in court remains uncertain. The internet’s infrastructure is complex, and the notion of a free and open web is often not entirely accurate. Online platforms have their own terms and agreements with users, and even when users contribute content to these platforms, control over that content typically rests with the platform rather than the users.

Katherine Gardner, an intellectual-property lawyer, noted that when users upload content to social media or any other site, they usually grant the platform a broad license to use their content in various ways. As a result, it would be challenging for ordinary users to claim entitlement to payment or compensation for the use of their data in training models.

While this is primarily an ethical question for OpenAI, it also casts doubt on the integrity and expertise of any organisation that relies on such tools. Using a chatbot to improve one’s work is not wrong. However, when the output turns out to be generated by mining data from other reliable sources without permission, it is hard to argue that no data theft took place, even if another platform did the deed on one’s behalf.

Christellee
July 11, 2023
