Sarah Silverman sues OpenAI and Meta over alleged copyright infringement in AI training

Photo credit: Tinseltown/Shutterstock
Sarah Silverman at HBO's Official 2019 Emmy After Party at the Pacific Design Center in West Hollywood, USA, on September 22, 2019.

Comedian Sarah Silverman is the lead plaintiff in a new lawsuit against ChatGPT maker OpenAI. The suit alleges that the company violated copyright law by training its AI models on Silverman's book "The Bedwetter."

The lawsuit could have significant implications for the music industry, where concerns have been growing for some time that AI models could be violating the law by training on copyrighted material.

In a complaint filed on Friday (July 7) with the US District Court for the Northern District of California, San Francisco division, lawyers for Silverman argued that "much of the material in OpenAI's training datasets… comes from copyrighted works – including books written by plaintiffs – that were copied by OpenAI without consent, without credit, and without compensation."

Silverman is one of three book authors suing OpenAI in this case. The others are Christopher Golden, in relation to his book “Ararat,” and Richard Kadrey, in relation to his book “Sandman Slim,” the first volume in a series of the same name.

"When ChatGPT was prompted to summarize books written by each of the plaintiffs, it generated very accurate summaries," the complaint states. According to the plaintiffs, this shows that "ChatGPT retains knowledge of particular works in the training dataset and is able to output similar textual content. At no point did ChatGPT reproduce any of the copyright management information plaintiffs included with their published works."

Current-generation AI models are trained on large datasets, which allow them to detect patterns, whether in words or in music, and imitate them convincingly.

You can read the full complaint against OpenAI here.

On a dedicated website, the lawyers also announce that they have "filed an initial class-action lawsuit against Meta" on behalf of Sarah Silverman, Chris Golden, and Richard Kadrey, "challenging LLaMA, a set of large language models trained in part on copyrighted books."

You can read that complaint in full here.

The lawsuit against OpenAI also argues that, because the language models behind ChatGPT incorporate the copyrighted works, the models themselves are infringing "derivative works."

“Because the OpenAI Language Models cannot function without the expressive information extracted from plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act,” the complaint states.

The lawsuit was filed by lawyers Joseph Saveri and Matthew Butterick, who a week earlier had filed a similar lawsuit against OpenAI in the same court. That lawsuit, brought on behalf of authors Mona Awad, who wrote "Bunny" and "13 Ways of Looking at a Fat Girl," and Paul Tremblay, author of "The Cabin at the End of the World," makes arguments similar to those in the Silverman suit.

You can read that complaint in full here.

With AI technology proliferating at breakneck speed since ChatGPT was unveiled last November, governments around the world have been scrambling to draw up laws to regulate its use.

Arguably the most advanced on this front is the European Union, which last month moved forward with legislation known as the AI Act. Among its provisions is a requirement that developers of "foundational models" (AI models, like ChatGPT, that can serve as the basis for more sophisticated AI functions) disclose whether they have used copyrighted materials to train their models.

New regulations in China have a similar requirement.

In the US, the Patent Office has opened a public consultation on how to address copyright violations related to AI, though the US Congress has not yet proposed legislation on the matter.

By contrast, Japan does not appear to be moving in the same direction. Comments earlier this year by the country’s minister for education, culture, sports, science and technology indicated that the Japanese government views the use of copyrighted materials for the training of AI as an acceptable practice, even when that material is hosted online illegally – so long as the AI doesn’t reproduce the material.

The legal actions filed by the authors against OpenAI and Meta will likely be closely monitored by record companies and other rights holders who object to their materials being used to train AI models.

Following the "fake Drake" track controversy earlier this year, Universal Music Group (UMG), which owns Drake's labels, made it clear that it views the track as a violation of copyright law.

“We [Universal] own all sounds captured on a sound recording,” stated Michael Nash, EVP and Chief Digital Officer at UMG.

“Specifically, soundalikes which serve to confuse the public as to the source or origin, or which constitute a commercial appropriation of likeness in the form of a distinctive voice, are all clearly illegal.”

With the music and other media industries raising the alarm about AI scraping copyrighted material without permission, some tech companies have recently taken pains to stress that their new models are compliant with copyright laws.

Facebook parent Meta last month unveiled a text-to-music generator called MusicGen, which the company says was trained on 20,000 hours of licensed music.

Music Business Worldwide
