Meet Kyogu Lee, President of Supertone – the voice cloning AI company acquired by HYBE for $32m

MBW’s World Leaders is a regular series in which we turn the spotlight toward some of the most influential industry figures overseeing key international markets. In this feature, we speak to Kyogu Lee, President of HYBE-owned voice AI company Supertone. World Leaders is supported by PPL.

AI technology is a big priority for HYBE, the South Korean entertainment giant behind K-pop superstars BTS.

Evidence of that arrived in March when HYBE’s CEO Jiwon Park consented to have his own voice cloned to demonstrate the capabilities of the company’s proprietary AI on its Q1 investor call.

HYBE’s so-called ‘voice synthesis technology’ was developed by Supertone, the AI voice replication software startup in which HYBE acquired a majority stake in a $32 million deal in 2023.

Founded in Seoul in 2020, Supertone claims to be able to create “a hyper-realistic and expressive voice that [is not] distinguishable from real humans”.

Supertone’s purported ability to do just that makes the apparent strategy behind HYBE’s multi-million dollar investment in the technology a lot clearer, when viewed through the lens of comments shared by HYBE Chairman Bang Si-Hyuk in an interview with Billboard last year.

“I have long doubted that the entities that create and produce music will remain human,” said Bang Si-Hyuk.

“I don’t know how long human artists can be the only ones to satisfy human needs and human tastes. And that’s becoming a key factor for my operation and a strategy for HYBE.”

By acquiring Supertone, HYBE has also brought into the fold the startup’s co-founder and President Kyogu Lee, a widely respected AI expert with a PhD in Computer-Based Music Theory and Acoustics from Stanford University.

In addition to heading up Supertone at HYBE, he leads applied research at the Artificial Intelligence Institute at Seoul National University (AIIS) and is also in charge of the Music and Audio Research Group (MARG) at SNU.

Lee claims in our exclusive and in-depth interview below that Supertone stands out in the AI audio landscape today because it is “theoretically capable of creating an infinite number of new and original voices, as well as recreating existing voices”.

This is made possible by Supertone’s foundation model called NANSY – which stands for Neural Analysis & Synthesis – and which Lee explains “serves as the backbone of Supertone’s speech synthesis technologies”. The research paper for NANSY is available to read online.

“NANSY has the special ability to divide and re-assemble voice components – timbre, linguistics, pitch, and loudness – individually and independently, generating natural-sounding voices with unparalleled realism,” he adds.

Supertone’s AI vocal cloning tech first generated global media attention in January 2021 when it “resurrected” the voice of South Korean folk superstar Kim Kwang-seok to be played on Korean television show Competition of the Century: AI vs Human.

More recently, Supertone made headlines globally after recreating the voice of Kim Hyuk Gun, the vocalist of the popular Korean band The Cross, who was paralyzed in an accident. “We collected 20 years of his voice data since debut and used it to train an AI voice in his unique vocal style,” explains Lee.

HYBE also showcased the possibilities of what it can do with Supertone’s technology when it released a new single called Masquerade from HYBE artist MIDNATT (aka Lee Hyun) last year. It was claimed by HYBE at the time to be the “first-ever multilingual track produced in Korean, English, Japanese, Chinese, Spanish and Vietnamese”.

In an increasingly global (yet localized) world, and amid the worldwide explosion of genres from K-Pop and J-Pop to Afrobeats and Spanish-language music, the opportunities presented by this use case of Supertone’s tech alone will likely have piqued the interest of music industry leaders worldwide.

Using this tech, a superstar artist – think Taylor Swift, Billie Eilish or The Weeknd – could release a new single in multiple languages in their actual voice on the same day.

According to Lee: “Supertone’s multilingual pronunciation correction technology unlocks new avenues for artists to communicate with local fans in their native language, reaching out to the global market.”

He adds: “We hope this collaboration will establish a constructive precedent for AI technology supporting artists in overcoming language barriers to connect with global fans and broaden their musical spectrum.”

MBW has previously asked if the company’s newly acquired AI technology could ever be used to recreate the voices of superstar HYBE artists like BTS for projects that don’t require the group’s in-person participation, for example, while they’re serving in the military.

MBW readers who have been following our coverage of HYBE’s financial performance over the years will recall that its “Artist Indirect-Involvement” business line – revenue-generating projects that use an artist’s brand/likeness, without the actual artist needing to be involved – became the company’s primary revenue driver in 2020 in the absence of live shows during the pandemic.

In FY 2021, a year in which HYBE’s revenues surpassed $1 billion for the first time, the company’s biggest organic revenue driver was, once again, its “Artist Indirect” business, accounting for more than 60% of the company’s revenues.

This ‘Artist Indirect-Involvement’ business was only overtaken by the company’s ‘Artist Direct Involvement’ business line in Q1 2022.

We asked Supertone’s President the same question: Will its tech ever be used to recreate the voices of superstars like BTS?

He tells us that, “while Supertone is theoretically capable of creating an infinite number of new and original voices, as well as recreating existing voices, we are devoted to prioritizing the rights of all artists and creators, including those under HYBE”.

He adds: “Our focus with HYBE artists lies in facilitating seamless communication and interaction with global audiences, transcending all barriers, including language and geography.”

Lee notes that HYBE is currently working on AI-dubbing some of its artists’ voices into foreign languages for parts of their video content, for example TOMORROW X TOGETHER’s ACT: SWEET MIRAGE concert video, where the members’ comments were dubbed into Spanish using Supertone’s technology.

One of Supertone’s latest advancements is a real-time vocal changer called ‘Supertone Shift’ that lets users switch between voices from a library of ten predefined voices. Users can then customize their chosen voice by adjusting the pitch, reverb and other effects.

Apart from the obvious production-related uses for this tool, the real-time capabilities could make it equally useful in a live setting. Just picture it: An artist could sing live on stage via multiple different AI-assisted voices, all switched in real time.

Lee tells us that Supertone Shift has already hit 70,000 downloads and 30,000 monthly active users in just over two months since its beta launch.

“The demand for expressing alter-egos has surged,” adds Lee.

Beyond music, Lee says that he envisions Supertone Shift “as the ultimate creative tool for a diverse range of content creators, including VTubers, livestreamers, podcasters, and gamers, enhancing the versatility and quality of their outputs”.

HYBE’s investment in Supertone arrived ahead of the current explosion of AI tech in the music industry and the wave of challenges it has brought with it. There are concerns about the source and legality of the training data used by many of the prominent AI music generators on the market today.

Music industry leaders have also raised the alarm about music streaming services and social media platforms being flooded with AI-generated songs. Some songwriters and artists, meanwhile, are worried about the threat of AI tools to their livelihoods.

According to Lee, AI’s future contribution to the music industry “will lie in expanding the creativity and imagination of creators and artists” rather than replacing creators and creativity altogether.

“Music, devoid of a storyteller — the artist — lacks the essential connection between storyteller (artist), story (music), and listener (fan), which leads me to believe that AI-generated music created without artist input may not endure,” he says.

Meanwhile, for Supertone, Lee says that the HYBE subsidiary is focusing on evolving into a consumer-facing company this year, by offering what he calls “artistic intelligence” with its suite of AI tools for creators.

“By providing convenient services that are universally accessible and applicable across diverse content fields, we aim to reduce creative barriers for professionals and individuals alike,” says Lee.

Here, Supertone’s President and HYBE’s resident AI expert, Kyogu Lee, tells us more about his company’s tech, and his predictions for AI in the music business…

How is Supertone positioned in the global technology landscape today?

Voices created through Supertone’s technology can be used in various areas, including acting and singing due to its rich expressions, which has reached new heights through our recent technological advancement to generate them in real-time. Moreover, fully equipped with our R&D lab, Content Business Development department, and in-house studio, Supertone transcends the scopes of a technology provider; it serves as a gateway to elevated content, offering new possibilities for content partners spanning music, broadcasting, movies, games, and beyond. We strive to add value to the content industry by amplifying creators’ artistic expression to produce more engaging content, and by introducing innovative voices to create new forms of content.

Notable achievements include our contribution to the Netflix series MASK GIRL, released in August 2023, where Supertone’s multi-speaker voice morphing technology brought to life the main character Kim Mo-mi’s alternative persona as an online streamer by producing a unique third voice from fusing the voice tones of the two actresses who played the character.

Additionally, in the Disney+ 2022 hit series Big Bet, Supertone utilized its voice De-aging technology, the industry’s first attempt, to rejuvenate veteran actor Choi Min-sik’s voice for his character who was in his 30s. As we continue to collaborate with the creative industry, Supertone’s value is being appreciated across a wide range of content domains.

The company received a lot of media attention around its Singing Voice Synthesis (SVS) technology in early 2021 and particularly for recreating the voice of South Korean folk superstar Kim Kwang-seok. Tell us about the reaction to the technology from the music business at the time?

Kim Kwang-seok is a legendary singer cherished by Korean people with deep connections and affection, so we approached the project with utmost respect.

Although we were cautious given the unfamiliarity of voices created with SVS technology at that time, we had confidence in our ability to authentically resurrect his voice, leveraging Supertone’s forte in creating expressive voices that could deliver emotions and meanings through singing or speech.

Thankfully, the music industry and fans embraced the result with delight and gratitude. For the public, it provided a chance to observe new possibilities in the content realm, as AI reignited their nostalgia. Hearing Kim’s recreated voice, Kim Sang-wook, a prominent Korean scientist, responded, “I hope this serves as an opportunity to explore AI and contemplate its coexistence with humanity.” I’m grateful it succeeded in its goal of evoking memories and ultimately resonating with fans as intended.

HYBE first invested in Supertone in early 2021; tell us about how the partnership came about and what made HYBE an attractive business partner?

Supertone initially engaged with HYBE [formerly Big Hit Entertainment] in the first half of 2020. During this period, Supertone’s singing synthesis technology was gaining attention, and the late Kim Kwang-seok’s project sparked interest from the entertainment industry, including HYBE, marking the beginning of our interaction.

HYBE had long been at the forefront of pioneering and advancing technological innovation in the entertainment sector. They recognized the promising trajectory of Supertone’s technology, including the innovative singing synthesis technology, which we both trusted would be suitable for the music industry. Concurrently, Supertone was firmly convinced of boundless possibilities and synergies that would arise from combining our technology with HYBE’s global intellectual property (IP) and established production capabilities, which resulted in this partnership.

HYBE fully acquired Supertone in 2023 – how is Supertone positioned within HYBE today and how does Supertone’s technology complement HYBE’s business objectives?

Acquired by HYBE in January 2023, Supertone contributes to HYBE’s commitment to providing new avenues for content and fan experiences through solution businesses that leverage artists’ intellectual property (IP). We’re currently in the process of running pilot projects across HYBE’s various business areas, networks, and partnerships to advance Supertone’s technology and explore applications that can support and assist artists. Our technology can be utilized as a useful creative tool for some artists like MIDNATT who seek to experience new musical endeavors beyond technological limitations.

Additionally, it can enhance content immersion by integrating natural and expressive voice synthesis technology, as exemplified by Weverse Magazine’s ‘Read-Aloud’ feature.

We’re continuously discussing various business opportunities internally to innovate the possibilities of content creation.

Last year we reported on the release of HYBE artist MIDNATT’s single Masquerade, which was released in multiple languages using Supertone’s technology – are there plans for this element of your technology to be released commercially or to the public so that other artists can release tracks in multiple languages simultaneously?

The MIDNATT project marks the first occasion where Supertone collaborated with an already existing artist to deliver more immersive and accessible music to fans worldwide. Following the release of his track Masquerade, we monitored a significant amount of positive responses from fans in various languages.

Some expressed how hearing their beloved artists in their native tongue and instantly comprehending the lyrics moved them like never before.

It was immensely gratifying and rewarding that they understood the intention and sincerity behind [the project].

Supertone recently launched its AI ‘voice changer’ tool, Supertone Shift, which lets artists change their vocals in real-time. What are your ambitions for this tool?

Supertone’s extensive research into real-time AI voice conversion traces back to 2021, triggered by a conversation with an artist I met through a TV show. Despite being a beloved artist for a long time, he expressed regret over his voice’s inherent limitations in manifesting a wider range of expressions.

This made me realize that not only ordinary individuals like us but also those who captivate the public with beautiful voices desired to exert new vocal expressions.

Focusing on achieving real-time conversion of conversation-level voices, we showcased our initial project, then called NUVO, at the 2022 CES, where it won the Innovation Award. Later, we further refined the technology to a level suitable for live stages. This was demonstrated in 2023 when MIDNATT seamlessly transitioned between his vocal and a female vocal during a live performance. Achieving imperceptible latency prompted us to recognize the needs of real-time content creators, leading to the development of Supertone Shift.

Some think that AI technologies pose a significant risk to creators’ livelihoods. What would you say to those concerned about the future of AI in the music business?

We are fully aware of the controversy associated with AI technologies. Above all, what’s crucial is to ensure that an artist’s creative intentions are conveyed, and that AI technologies are used as a catalyst for human creativity. It is our firm belief that we can only change perceptions by showcasing exemplary cases of how technology can assist artists and creators by collaborating closely with them. Creating meaningful content based on technology cannot happen without inspiration and ideas that originate from creators.

Recently, Supertone recreated the voice of Kim Hyuk Gun, the vocalist of the Korean band The Cross. After performing The Cross’s music on a live stage together with the AI voice, Kim expressed his appreciation, saying that “thanks to the assistance of AI,” he was able to successfully deliver a live performance despite his challenging conditions.

As showcased in this example, Supertone is constantly searching for ways to assist artists in overcoming creative barriers caused by physical or technological limitations.

However, we are often amazed by the innovative ideas and unexpected applications proposed by the artists and creators we collaborate with. Ultimately, I believe technology evolves in a mutually beneficial manner through ongoing interaction and engagement with artists and creators.

What are your predictions for the evolution of AI-assisted music in months and years to come?

AI is being utilized throughout the entire process of creating, producing, distributing, and consuming music. Perhaps the most affected aspect of this is the creative process. However, I am personally skeptical if we can call this music produced solely from AI the “evolution” of music.

To explain the reason behind this, we need to talk about the essence of music, which I believe is “storytelling” — the fundamental purpose of all creative works and content.

Artists aspire to convey their intended story through the creative process, and various formats and genres of content have developed to maximize the effectiveness of their storytelling.

What are the biggest challenges in the AI-audio / AI music landscape today?

First and foremost, I believe establishing social consensus should be prioritized, one which will provide guidelines for identifying and addressing potential risks and issues caused by synthesized voices created without consent. This will mandate the AI industry to equip itself with the capability and readiness to respond to these issues.

Since its establishment, Supertone has adhered to the philosophy of developing products and conducting business in a manner that respects the intentions of creators. We also continue to enhance ethical guidelines and technological safeguards to prevent the abuse and misuse of AI technology. Supertone possesses watermark technology capable of detecting voices created by Supertone, and we have additionally initiated advanced research and development in this technology since April. In addition, we are actively cooperating to establish legal and institutional frameworks through continuous communication and interaction with relevant industries and policymakers. Throughout our endeavors, we will always prioritize the needs of creators and fans, striving to develop and apply relatable and coexistent technologies.

Supertone upholds the following three principles for responsible and ethical use of AI:

  • We do not monetize on a voice without the permission of its rightful owner, under any circumstances. We limit our non-commercial research to public or deceased figures and do not disclose it to the general public.
  • We minimize access to training and synthesized voice data, and possess marking technology that enables the detection of AI-generated audio.
  • We prioritize the rights of all artists and creators and seek harmonious coexistence with the creative industry.

What are your ambitions for Supertone’s positioning in the AI tech space in the coming years?

Supertone aspires to be the foremost choice by creators worldwide who seek solutions and services to produce voice content effectively and efficiently. We aim to imprint the equation “#1 Voice AI Tech Provider = Supertone” in the minds of all creators and potential customers globally.

If there was one thing you could change about the music business what would it be and why?

As technology advances to facilitate music production and distribution, overproduction and oversaturation emerge as significant challenges.

The democratization of music production, fueled by advancements in creation and production technologies, has empowered numerous non-professionals to create music effortlessly.

Moreover, the widespread accessibility of the internet and various platforms has enabled global distribution of music.

This inundates listeners with an overwhelming amount of music on an increasingly larger scale, making it difficult for them to discover and explore music that aligns with their preferences. Addressing this challenge will require the development of systems or methodologies capable of identifying and delivering hidden, high-quality music to listeners.

World Leaders is supported by PPL, a leading international neighbouring rights collector, with best-in-class operations that help performers and recording rightsholders around the world maximise their royalties. Founded in 1934, PPL collects money from across Africa, Asia, Australia, Europe, and North and South America. It has collected over £500 million internationally for its members since 2006.

Music Business Worldwide
