OpenAI is under constant scrutiny over its AI training data practices

OpenAI’s near-unlimited use of information from the Internet as training data for ChatGPT continues to pose a legal problem for the company as the number of lawsuits challenging the practice grows.

OpenAI reportedly uses publicly available data, including books and articles available online, to train ChatGPT. The owners of that material are now demanding payment for their work.

OpenAI CEO Sam Altman speaks during the Microsoft Build conference at the Seattle Convention Center Summit Building in Seattle, Washington, May 21, 2024.
(Photo: JASON REDMOND/AFP via Getty Images)

Creating the AI models that are taking the tech industry by storm requires training data. Prominent technology companies such as Microsoft, Google, Anthropic, Meta and OpenAI are rushing to locate fresh data sources. At one point, Meta even considered purchasing Simon & Schuster, one of the largest publishing houses in the world.

Part of the problem stems from growing accusations by publishers that these companies scrape protected material. The publishers want to be paid for their work. In responses to the U.S. Copyright Office, Meta and OpenAI argued that copyrighted content posted online is “publicly available” and that using it falls under fair use.

However, they will still have to present this defense in court, as multiple parties are suing OpenAI over its use of copyrighted content.

OpenAI vs CIR

The nonprofit media group Center for Investigative Reporting (CIR), which publishes Mother Jones and Reveal following a merger earlier this year, filed a complaint in federal court against Microsoft and OpenAI last week.

The lawsuit claims that OpenAI's products were built using intellectual property owned by CIR and other publishers around the world.

CIR attorneys accused Microsoft and OpenAI of using copyrighted Mother Jones material to train their GPT and Copilot AI models.

Previous lawsuits involving OpenAI

In April, OpenAI and Microsoft also faced legal action from several high-profile newspapers owned by Alden Global Capital, including the Chicago Tribune, Orlando Sentinel, New York Daily News and San Jose Mercury News.

According to the lawsuit, both technology companies intentionally infringed copyright by using the newspapers' content to train artificial intelligence models without attribution or permission.

The complaint cites exchanges with ChatGPT and Copilot, which show that when these AI models were queried, they reproduced extensive excerpts from specific publications.

According to the plaintiffs, this indicates that the material was included in the training datasets without permission from the publications concerned.

The newspapers also demonstrated how Copilot can instantly retrieve news articles from the Internet and reproduce them in full without citing sources. Additionally, they claim that these chatbots regularly attribute false or fabricated material to their publications.

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.