New Spotify patent sheds more light on potential karaoke mode – including auto-tuned vocals

Credit: Chubo/Shutterstock

MBW learned last week that Spotify is building an in-app karaoke feature after a screenshot of an unreleased ‘Karaoke Mode’ was posted on Twitter by renowned reverse engineer Jane Manchun Wong.

Apart from her discovery suggesting that the vocal level in the feature will be adjustable, details about the yet-to-be-released (or even announced) feature were pretty slim.

But today, we can shed more light on what Spotify might be building.

According to a patent granted to Spotify in the US this month – and leaked to MBW by an insider – Daniel Ek’s company has been focusing at least some of its attention on a new feature that allows users to “overlay a music track with their own vocals”.

The patent, which you can read in full here, was filed in December 2015 and granted on September 8, 2020.

It states that, “in addition to playing back requested media content, users sometimes desire to sing along with the media being played”.

It adds: “Users may, for example, wish to overlay a music track with their own vocals by singing into a microphone as the music plays.”

The patent explains that typical karaoke systems “consist of a wired microphone physically plugged into a device that only plays locally stored content”.

“Consequently,” the patent adds, “typical systems substantially encumber the ability of users to select and control media content for playback, and to easily provide their vocals for overlaying with media content”.

Spotify suggests, therefore, that “there is a need for devices, systems, and methods for overlaying audio data for user vocals and media content received from distinct devices and systems”.

This sounds very much like an online karaoke feature, no?


The patent goes on to explain that the user vocals will be “captured using a microphone of a client device” which may then “be transmitted to a media presentation system, while corresponding media content, such as a music track, is transmitted to the media presentation system from a remote server distinct from the client device”.

The client device in this instance is a mobile phone, while the media presentation system, according to the patent, could be a television or external speakers.

The patent adds: “As the media presentation system plays the media content, the received user vocals are overlaid with the media content for playback as a composite data stream.

“Users are therefore able to more efficiently, effectively, and securely overlay and play back audio data.”
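To make the idea of a "composite data stream" concrete, here is a minimal sketch of overlaying vocals on a track. It is not taken from the patent, which does not specify a mixing method. The function name `overlay`, the `vocal_gain` parameter and the use of float samples between -1.0 and 1.0 are all assumptions made for illustration.

```python
def overlay(track, vocals, vocal_gain=1.0):
    """Mix vocal samples onto track samples (hypothetical helper).

    Both inputs are lists of float samples in [-1.0, 1.0] at the same
    sample rate. The shorter input is padded with silence.
    """
    length = max(len(track), len(vocals))
    mixed = []
    for i in range(length):
        t = track[i] if i < len(track) else 0.0
        v = vocals[i] if i < len(vocals) else 0.0
        sample = t + vocal_gain * v
        # Clamp to the valid range so loud passages do not overflow.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

The `vocal_gain` value here plays the role of an adjustable vocal level: lowering it makes the singer quieter relative to the backing track.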



Perhaps the most interesting part of Spotify’s patent is its reference to “auto-tuning the vocals”.

The patent states that “in some implementations, [Spotify’s] media presentation system auto-tunes the vocals using data received from the remote server indicating pitch, beat, and/or chords for the first media item (e.g., pitch data stored in metadata database).

“The media presentation system overlays the auto-tuned vocals with the first media item to generate the composite data stream.”
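The patent does not say how the auto-tuning itself would work. As a rough sketch of one common approach, the sung pitch can be snapped to the nearest note supplied as pitch data for the track. Everything below is an assumption for illustration: the function names, the use of MIDI note numbers for the track's pitch data, and the A4 = 440 Hz tuning reference.

```python
import math

A4_HZ = 440.0    # assumed tuning reference
A4_MIDI = 69     # MIDI note number for A4


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


def nearest_target_hz(sung_hz, target_notes):
    """Return the frequency of the target note closest to the sung pitch.

    target_notes is a hypothetical list of MIDI notes standing in for
    the pitch data a server might send for the track.
    """
    sung_note = hz_to_midi(sung_hz)
    closest = min(target_notes, key=lambda n: abs(n - sung_note))
    return midi_to_hz(closest)


def correction_ratio(sung_hz, target_notes):
    """Factor by which the vocal pitch would need to be shifted."""
    return nearest_target_hz(sung_hz, target_notes) / sung_hz
```

A singer who hits 450 Hz while the track calls for A4 would be pulled down to 440 Hz. Actually shifting the audio by that ratio needs a pitch-shifting step, which this sketch leaves out.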

It also adds that “in some implementations, the media presentation system modulates the vocals to produce a selected sound effect (e.g., a robot voice or vocoder effect) and overlays the modulated vocals with the first media item to generate the composite data stream”.
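The patent names the robot voice effect but not a technique for producing it. One classic way to get a metallic, robotic sound is ring modulation, where the vocal signal is multiplied by a sine-wave carrier. The sketch below assumes that approach; the function name, the default carrier frequency and the sample rate are illustrative choices, not details from the patent.

```python
import math


def ring_modulate(samples, carrier_hz=50.0, sample_rate=44100):
    """Multiply each vocal sample by a sine carrier (ring modulation).

    samples is a list of floats in [-1.0, 1.0]; the output stays in
    that range because the carrier never exceeds 1.0 in magnitude.
    """
    out = []
    for n, sample in enumerate(samples):
        carrier = math.sin(2 * math.pi * carrier_hz * n / sample_rate)
        out.append(sample * carrier)
    return out
```

The modulated vocals could then be mixed with the track in the same way as unprocessed vocals.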


As noted last week, Tencent Music Entertainment’s (TME’s) WeSing app serves as a prime example of what can be achieved in the app-based karaoke business, and Spotify’s entrance into this space would make a lot of sense.

TME claimed last year that WeSing users were generating 10 million recordings per day.

WeSing reportedly accounts for 77% of China’s online karaoke user base.

Music Business Worldwide