Google says AI training is fair use and copyright should be policed on outputs, not inputs

Credit: mundissima/Shutterstock

The RIAA, music publishers, and independent artists are all fighting AI companies in court over the same question: whether training a model on copyrighted work without permission is fair use.

Google, which builds its own AI music tools, has a direct stake in the outcome.

Now, in a new policy paper outlining the company’s preferred approach to AI regulation, Google has argued that training AI models on publicly available web data should “remain protected” by fair use in the US.

The paper also says copyright concerns raised by generative AI are best addressed at the level of outputs, not inputs – whether a specific piece of content copies an existing work, rather than how a model was trained.

The 21-page document, titled A Pragmatic Approach to AI Governance in America, was published on Thursday (June 25) by Kent Walker, Google’s President of Global Affairs.

On copyright, Google’s paper states: “Using publicly available web data for training models is a transformative, non-expressive use – like an art student taking inspiration from walking through a gallery – that should remain protected under fair use in the U.S. and text-and-data-mining exceptions abroad.”

Google says responsible developers should still give website owners control over whether their content is used for model development, through machine-readable tags such as its own Google-Extended control.
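For readers unfamiliar with how such a machine-readable opt-out works in practice, here is a minimal sketch. It assumes, as Google's public crawler documentation describes, that Google-Extended is a token site owners address in a robots.txt file. The site, the URL and the rules below are hypothetical and are not taken from the paper; the check uses Python's standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not from Google's paper).
# It opts out of the Google-Extended token while leaving all other crawlers,
# including ordinary search crawling, unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page on the illustrative site.
page = "https://example.com/lyrics/song"

training_allowed = can_fetch("Google-Extended", page)
search_allowed = can_fetch("Googlebot", page)

print("Google-Extended allowed:", training_allowed)
print("Googlebot allowed:", search_allowed)
```

Under these assumed rules, the Google-Extended token is refused while the search crawler is still allowed, which is the kind of separate control over model development that the paper describes.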

At the same time, the paper says Google is “exploring new types of partnership and value-exchange models” with rights holders.

It adds that Google has paid for access to specialized, non-public content, including creative and educational material.

On its broader approach to regulating AI, Google says: “We believe in an approach that is fundamentally data-driven, focuses on evidence of real-world benefits and harms, and accepts a degree of uncertainty to avoid regulations that slow progress without addressing real challenges.”

“This approach would address outputs, not inputs, looking to prevent and mitigate specific harms rather than micromanaging the science behind these new tools,” the Google paper adds.

On enforcement, Google argues the focus “should again be on outputs – in this case, whether a specific image or piece of text actually copies an existing work, regardless of how it was created.”

It says technical filters should not “automate subjective decisions like whether something is ‘too similar’ to a prior work,” and that infringing material is best handled through standard notice-and-takedown systems.

Google also says it has supported proposals, such as the NO FAKES Act, to protect individual voices and likenesses by establishing “a balanced national standard against unauthorized digital replicas.”

Google has been making this case to US policymakers since the early days of the generative-AI boom.

In a 2023 filing with the US Copyright Office, the company called AI training a transformative fair use and said courts, not new legislation, should resolve the question.

Google’s defense of fair use for AI training lands as that same question is being contested across the music industry.

The RIAA, on behalf of Universal Music Group, Sony Music and Warner Music Group, sued AI music platforms Suno and Udio for “mass infringement” of copyright in mid-2024.

Udio has since moved from defending its training under a fair use argument to signing licensing deals with Universal, Warner, Merlin and Kobalt, though Sony Music’s case against the platform remains active.

Music publishers brought a separate case against AI firm Anthropic in 2023, alleging it trained its Claude chatbot on their copyrighted song lyrics.

They have since filed a second, larger suit covering more than 20,000 songs and seeking over $3 billion in statutory damages.

In late March, the RIAA, the National Music Publishers’ Association and other industry groups urged a federal court to reject the fair use defense raised by Anthropic in the original case, arguing that the copying was “inexcusable.”

The paper’s pitch on “value exchange” also echoes a run of AI licensing deals in music. In June, the NMPA agreed an industry-wide deal with Udio, which NMPA President and CEO David Israelite called the first of its kind with a major AI music company.

Beyond copyright, the paper’s central proposal is a “frontier AI regulatory organization,” or FARO – an independent, industry-funded body, overseen by a federal agency, that would set safety standards and verify audits for the most advanced AI models.

Google says such a body could be modeled on existing regulators such as the Financial Industry Regulatory Authority and the North American Electric Reliability Corporation.

Google frames the proposals as a “middle way” between over-regulation and no regulation, separating rules for frontier models from policies for AI that is more widely deployed.

On everyday uses, Google argues that “if something is illegal to do without AI, it’s illegal to do with AI,” and that existing laws can be adapted rather than rewritten.

Google is itself a defendant in a copyright case over AI training on music.

A group of independent artists sued the company in March, alleging it trained its Lyria 3 music-generation model on copyrighted recordings pulled from YouTube without permission.

Google has moved to dismiss the case, arguing the artists licensed their music when they agreed to YouTube’s terms of service.

Music Business Worldwide