Sony Group’s blueprint for AI music detection tech is promising. Here’s what it’s working on…


MBW Reacts is a series of analytical commentaries from Music Business Worldwide written in response to major recent entertainment events or news stories. Only MBW+ subscribers have unlimited access to these articles.


Last week, Nikkei Asia reported that researchers at Sony Group were working on technology to identify copyrighted music embedded in AI-generated tracks.

The story was widely picked up, with coverage framing the development as a kind of next-generation detection tool that could help songwriters claim compensation from AI developers.

But the underlying research by the team at Sony AI appears to go considerably further than that framing suggests.

In a blog post published in December, Sony AI highlighted three papers accepted at major academic conferences in 2025 for AI and audio research.

The research, according to the blog post, is focused on “musical integrity in the age of machine learning, exploring attribution, recognition, and protection,” and is “part of a growing body of work exploring how AI can unlearn what doesn’t belong to it, how connections between musical segments can be identified, and how effective current audio authentication methods are.”

As we noted last week, this work is part of Sony AI’s broader research, and the company has not announced any commercial rollout.

Sony AI, according to its about page, was established as a division of Japan-headquartered tech and entertainment giant Sony Group in April 2020 to “pursue groundbreaking research in AI and robotics to unleash human imagination and creativity with AI”. Sony AI has offices in North America, Europe, India, and Japan.

Here’s what Sony AI’s researchers are working on…

1. Attribution: ‘Unlearning’ can trace which songs shaped an AI model’s output, even when nothing sounds alike

Sony AI’s blog post introduces the first challenge as attribution, or “understanding which training data influenced what an AI system creates.”

As the blog puts it, “when an unlicensed generative model composes a new song from a text prompt, it doesn’t include any record of attribution. But Sony AI’s researchers believe it can still be determined.”

The paper, titled Large-Scale Training Data Attribution for Music Generative Models via Unlearning, was accepted at the NeurIPS 2025 Creative AI Track. It proposes a method for identifying which songs in an AI model’s training data most influenced a specific generated output. Rather than comparing generated tracks against a catalog of existing music, it works by selectively “forgetting” the generated track from the model, then measuring which training songs are most affected by that removal.
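To make the intuition concrete, here is a deliberately simplified toy sketch of unlearning-based attribution. It is not Sony AI’s implementation: the “model” is just the mean of its training feature vectors, “unlearning” is a single gradient-ascent step away from the generated track, and influence is measured as the change in each training song’s loss after the model forgets.

```python
# Toy sketch of unlearning-based attribution (hypothetical, not Sony AI's code).
# Songs are 2-D feature vectors; the "model" is the mean of its training set.

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def loss(song, model):
    # squared distance between a song's features and the model parameters
    return sum((s - m) ** 2 for s, m in zip(song, model))

def unlearn(model, generated, lr=0.5):
    # gradient ascent on the generated track's loss: push the model away from it
    return [m + lr * (m - g) for m, g in zip(model, generated)]

def attribution_scores(training_songs, generated):
    model = mean_vec(training_songs)
    forgotten = unlearn(model, generated)
    # influence = how much each training song's loss shifts when the
    # generated track is "forgotten"
    return [loss(s, forgotten) - loss(s, model) for s in training_songs]

# Three training "songs"; the generated track closely resembles song A.
song_a, song_b, song_c = [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]
generated = [0.9, 0.1]
scores = attribution_scores([song_a, song_b, song_c], generated)
print(scores.index(max(scores)))  # -> 0: song A is scored most influential
```

The sketch shows the core mechanism the paper describes: the training example most disrupted by forgetting the generated track is the one scored as its strongest influence, with no catalog comparison involved.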

To test the approach, the researchers ran it against alternative methods. The so-called “unlearning” method produced sharper results, with influence concentrated in a small number of training tracks, while similarity-based methods showed broader, less focused patterns. When used to identify a known training track, the system achieved perfect identification while the model’s overall quality remained unchanged.

The authors describe their work as the first to explore attribution on a text-to-music model trained on a large, diverse dataset. They frame it as a practical framework for applying unlearning-based attribution at scale.

Conclusion: By “unlearning” a generated track and observing ripple effects, this method can pinpoint which training songs influenced an AI’s output, even when the output doesn’t obviously resemble them. As Sony AI’s blog notes, “by showing what happens when models forget, Sony AI’s researchers hope to help recognise the works of the original artists.”

Read the full paper here


2. Recognition: Segment-level matching can catch the kind of borrowing AI actually does

Sony AI’s blog frames the second strand as recognition, or mapping “the relationships between works.”

As the blog explains: “Two songs may not be identical, but they might still share a melody, rhythm, or phrasing that links them across eras or items in a given catalogue.”

The paper, accepted at ICML 2025, introduces CLEWS [Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching]. The system detects when two recordings are different versions of the same piece. The key innovation is that it works with 20-second audio snippets rather than whole tracks. As the authors note, the segments that matter in real-world cases are much shorter than full song length.
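Why segment-level comparison matters can be shown with a toy sketch. This is hypothetical and much simpler than CLEWS, which learns segment embeddings via supervised contrastive training; here, raw segments are compared directly with cosine similarity to illustrate how scoring a pair of tracks by their best-matching segments catches partial borrowing that a whole-track comparison dilutes.

```python
import math

# Toy sketch of segment-level matching (hypothetical; not the CLEWS system).

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def segments(track, size):
    # split a track into non-overlapping fixed-size segments
    return [track[i:i + size] for i in range(0, len(track) - size + 1, size)]

def whole_track_sim(a, b):
    return cosine(a, b)

def best_segment_sim(a, b, size=4):
    # score the pair by its single best-matching segment pair
    return max(cosine(sa, sb)
               for sa in segments(a, size)
               for sb in segments(b, size))

# Track B copies only the second half of track A and is otherwise unrelated.
track_a = [1, 2, 3, 4, 5, 6, 7, 8]
track_b = [9, -3, 2, -7, 5, 6, 7, 8]
print(round(best_segment_sim(track_a, track_b), 2))  # -> 1.0: shared segment found
```

On this pair, the whole-track similarity is noticeably lower than the perfect segment-level match, mirroring the paper’s point that the musically meaningful overlaps in real-world cases are much shorter than full song length.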

On two public benchmarks, CLEWS outperformed all existing methods. While competing systems saw steep accuracy drops with shorter audio clips, CLEWS maintained high accuracy down to just 10 seconds. The paper lists plagiarism and near-duplicate detection among its applications.

Conclusion: CLEWS can identify shared musical material between recordings at the segment level, even in short clips. As Sony AI’s blog puts it, this kind of fine-grained detection “could support copyright protection and content monitoring systems, helping identify near-duplicates or unauthorised versions that might slip past traditional matching tools.”

Read the full paper here


3. Protection: Can audio watermarking survive AI compression?

Sony AI’s blog frames the third strand, protection, around a blunt question: “Can existing watermarking methods withstand real-world transformations?”

As the blog notes: “As audio compression becomes increasingly powered by neural networks… the very signals that watermarking systems rely on to prove authenticity are being erased.”

The paper, accepted at INTERSPEECH 2025, introduces RAW-Bench [Robust Audio Watermarking Benchmark], a framework that tests how well watermarking algorithms hold up against 20 real-world distortions including compression, background noise, reverb, and time stretching. The researchers tested four publicly available algorithms on a dataset spanning music, speech, and environmental sounds.

The key finding concerns neural audio codecs, the AI-powered compression tools used to shrink audio files. Against the Descript Audio Codec, every watermarking algorithm scored zero on full-message accuracy — meaning not a single watermark was fully recovered intact. Even after retraining two algorithms to resist these attacks, both still scored zero on this measure. Some algorithms managed partial bit recovery, but at levels too low to be practically useful.

The explanation is straightforward: watermarks hide information inside audio, while neural codecs strip out anything inaudible. Since codecs typically come last in the processing chain, they get the last word.
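This failure mode can be demonstrated with a toy sketch. It is hypothetical and far cruder than RAW-Bench, which evaluates real watermarking systems against real neural codecs: here the watermark lives in each sample’s least-significant bit, and the “codec” is a coarse quantizer that, like a neural codec, discards low-level detail it deems inaudible.

```python
import random

# Toy sketch of why lossy codecs destroy watermarks (hypothetical example).

def embed(samples, bits):
    # hide one message bit in each sample's least-significant bit
    return [(s & ~1) | b for s, b in zip(samples, bits)]

def extract(samples, n_bits):
    return [s & 1 for s in samples[:n_bits]]

def lossy_codec(samples, step=8):
    # quantize to multiples of `step`, erasing low-order detail
    return [round(s / step) * step for s in samples]

def full_message_accuracy(trials=100, msg_len=16):
    # fraction of trials in which the ENTIRE message is recovered intact
    random.seed(0)
    recovered = 0
    for _ in range(trials):
        samples = [random.randrange(0, 256) for _ in range(msg_len)]
        bits = [1] + [random.randrange(2) for _ in range(msg_len - 1)]
        decoded = extract(lossy_codec(embed(samples, bits)), msg_len)
        recovered += decoded == bits
    return recovered / trials

print(full_message_accuracy())  # -> 0.0: no watermark survives quantization intact
```

Because quantization forces every sample onto a multiple of the step size, the least-significant bits are always zeroed and full-message accuracy collapses to zero, the same headline outcome the benchmark reports against the Descript Audio Codec.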

Conclusion: Current audio watermarking cannot survive AI-powered compression. As Sony AI’s blog suggests, “future watermarking systems may need to collaborate with codecs rather than fight against them, embedding identity in ways that persist through transformation rather than being filtered out by it.”

Read the full paper here.


The bigger picture

Together, these three papers describe a layered technical framework: attribution traces influence at the model level, recognition maps relationships at the fragment level, and watermarking benchmarks reveal where current protections fall short.

Sony AI says that its researchers “are helping define how balancing innovation with responsibility can work in the future of generative music: with AI that remembers its sources, hears its connections, and safeguards its signal”.

Looking ahead, Sony AI’s research in this area does not appear to be slowing down.

In a separate blog post published in February, Sony’s AI research unit said it will have more than 10 papers accepted at ICLR 2026, spanning “generative modeling, diffusion, multimodal representation learning, and creator-focused AI systems.”

Among the topics listed is “AI-assisted music post-production.”

Music Business Worldwide
