Siri Co-Founder Tom Gruber on the future of AI in the music business

Siri co-inventor Tom Gruber. LifeScore is a graduate of the Abbey Road RED Incubator.

It’s no secret that voice-activated smart devices are changing the music industry as we know it.

A recent survey from Adobe Digital Insights found that 70% of voice-assisted smart speaker owners were using them to play music.

And although Apple‘s Siri-assisted HomePod has lagged behind rivals like Google Home and the Alexa-assisted Amazon Echo, its voice technology was one of the first to reach a mainstream audience.

Started back in 2008, Siri was bought by Apple in 2010 and was first introduced on the iPhone 4S in October 2011, with Alexa and the Amazon Echo following three years later.

Siri Co-Founder Tom Gruber (pictured), an artificial intelligence researcher, designer, and entrepreneur, was the company’s Head of Design.

At Apple, Gruber worked on AI initiatives as the head of Siri’s Advanced Development Group, before leaving the company in 2018.

One of his more recent roles since leaving Apple is that of Co-Founder of LifeScore, an adaptive music platform devised by virtuoso composer and founder and CEO Philip Sheppard.

The startup recently joined Abbey Road’s Red tech incubator, which says that LifeScore takes ‘the highest quality musical cells, recorded at Abbey Road Studios, performed by expert session orchestra players, and processes them with an adaptive engine that creates constantly evolving and personalised versions of the music’.

Here, Gruber shares his views on the future of AI in the music business and how far the development of the technology has come since Siri was sold to Apple in 2010.


Why did you decide to get involved with LifeScore?

In my career I look for ways to augment people with technology that can help them live better lives. Music is something that enriches our lives, and technology has made it possible for everyone to experience the music of the entire world. This is mind-blowing, and it’s easy to forget how profoundly this has changed our relationship with music.

Technology has helped us navigate this bounty with machine-driven personalization, alongside human curation and collaborative filtering. I think the next big impact of technology will be to personalize the audio itself in real time, to adapt to what people need and want in their particular contexts, with evolving music.

You can think of this as a soundtrack for your life, one that supports it. Music for walking, running, driving, thinking, writing, cooking, relaxing, making love, sleeping. Recommendation engines are an important first step, but they are still in the realm of sequencing recorded songs. In my experience, the highest form of “functional music” — music created for given situations to achieve certain effects in the listener — is film score music.
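To make the idea concrete, here is a minimal sketch of how an adaptive engine of this general kind might choose what to play next. Everything in it (the cell tags, the context table, the selection rule) is invented for illustration, not a description of LifeScore’s actual system:

```python
import random
from dataclasses import dataclass

@dataclass
class MusicalCell:
    """A short, pre-recorded musical phrase tagged for adaptive sequencing."""
    name: str
    key: str            # harmonic key, used to keep transitions consonant
    tempo_bpm: int      # performance tempo of the recording
    intensity: float    # 0.0 (calm) .. 1.0 (energetic)

# Hypothetical listener activities mapped to target musical parameters.
CONTEXT_TARGETS = {
    "sleeping": {"tempo_bpm": 60,  "intensity": 0.1},
    "writing":  {"tempo_bpm": 80,  "intensity": 0.3},
    "walking":  {"tempo_bpm": 110, "intensity": 0.5},
    "running":  {"tempo_bpm": 160, "intensity": 0.9},
}

def next_cell(library, current, context):
    """Pick the cell that best fits the listener's context while staying in
    the current key, so the score evolves rather than jumps."""
    target = CONTEXT_TARGETS[context]
    candidates = [c for c in library if c.key == current.key and c is not current]
    if not candidates:
        return current
    def distance(cell):
        return (abs(cell.tempo_bpm - target["tempo_bpm"]) / 100.0
                + abs(cell.intensity - target["intensity"]))
    candidates.sort(key=distance)
    return random.choice(candidates[:3])  # small random pool keeps it evolving

library = [
    MusicalCell("strings_a", "C", 62, 0.15),
    MusicalCell("strings_b", "C", 108, 0.55),
    MusicalCell("brass_a",   "C", 156, 0.85),
]
cell = library[0]
for _ in range(4):
    cell = next_cell(library, cell, "running")
    print(cell.name)
```

A real engine would also respect bar boundaries and crossfade between cells, but the core idea is the same: composition becomes a context-driven search over human-performed material.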

So, when my friend Philip Sheppard, a master composer of film score music, told me that he was doing a technology play in this space, I was fascinated. We considered what it would take to have technology compose a film score for your life. He convinced me that he had a solution, and so I wanted to help make it happen at scale.


What are you and the LifeScore team looking to achieve over the coming years, and how do you hope Abbey Road Red will help you?

We are exploring the idea that a particular mix of human creativity and machine power will enable us to offer highly personal, adaptive music, with musicians and composers as the foundation.

We are betting that the raw materials — the units of music that can be composed into life scores — are essential to the quality of the experience. This means recording a library of musical raw material. Abbey Road is the best recording studio in the world, and it has a community of some of the best musicians and composers.

And the Abbey Road Red team are experts in this space of technology-enabled music creation and the business models and product iterations that could make it scale. So, they are fabulous partners. Together I think we can make this vision come to life in the near future.


How central do you think AI and audio/voice recognition technology will be to the lives of music consumers in the next ten years?

AI is revolutionizing processes that can learn from large data streams and make predictions, classifications, or recommendations from that data. In the very near future, I expect the recommendation engines to get much better, as the collective feedback of millions of people carrying around context-sensing computers is fed into the learning algorithms.
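As a toy illustration of the kind of signal such a feedback loop could learn from, the sketch below scores a track against a sensed listener context with a logistic model. The feature names and weights are invented for the example; a real recommender would learn them from large-scale feedback data:

```python
import math

def score_track(track, context, weights):
    """Toy contextual recommender: a logistic score of how well a track's
    attributes match the listener's sensed context. In practice the weights
    would be learned from millions of (context, play/skip) feedback events."""
    z = sum(weights[k] * track[k] * context[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> rough "probability of a play"

weights = {"energy": 2.0, "calm": 1.5, "familiarity": 0.8}   # hand-set for the demo
workout = {"energy": 0.9, "calm": 0.1, "familiarity": 0.5}   # sensed: user is exercising
track   = {"energy": 0.8, "calm": 0.2, "familiarity": 1.0}

print(f"fit for this context: {score_track(track, workout, weights):.2f}")
```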


AI will also play a central role in technology-enabled music creation, but we are still in early days for that wave. My personal take on it is that the biggest impact will come from companies that figure out how to fold human creativity into the machine intelligence, rather than trying to create totally autonomous music boxes.

As we have seen from the success of voice interfaces, machines are now capable of listening to and making sense of human-generated audio. In my experience, I’ve been amazed at how robust the new deep learning approaches are at finding the signal in the noise, outperforming traditional hand-engineered signal processing techniques in many cases.

Machines can hear a needle in a haystack, so to speak. For consumers, this means machines could get surprisingly good at identifying voices, identifying songs and musical characteristics, and offering feedback to someone learning to play an instrument. They can also process other information about the context, such as whether someone is exercising or trying to chill out. Combine the ability to hear with the ability to sense context and you have the ingredients for breakthroughs in personalized music.
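For a sense of what this “machine hearing” looks like at the lowest level, here is a hedged sketch using the open-source librosa library to pull basic descriptors (tempo, timbre, loudness) from a recording. A real song-identification or context-sensing system would layer learned models on top of features like these:

```python
import numpy as np
import librosa  # open-source audio analysis library, assumed installed

def describe_audio(path):
    """Extract a few simple descriptors of a recording. A production system
    would feed much richer features into a learned model; this just shows
    that machine listening starts with ordinary signal features."""
    y, sr = librosa.load(path, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)       # estimated BPM
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbre summary
    rms = librosa.feature.rms(y=y)                       # loudness proxy
    return {
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
        "timbre": mfcc.mean(axis=1).round(2).tolist(),
        "energy": float(rms.mean()),
    }

# e.g. describe_audio("some_recording.wav")
```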


What opportunities in AI/voice recognition tech do you think the industry is currently not taking advantage of?

I do not have years of experience in the music industry, yet! So, I may be missing some new developments, but I haven’t seen much done yet in the space of machine-generated singing. Not vocoder-style robot intonation effects, but something that does what a great vocalist does to impart emotion and meaning in the lyrics. There is technology from text-to-speech generation that could be exploited here.


Should the use of voice assistant technology be taken into consideration when choosing song/album titles, artist names, lyrics, etc.?

When someone invents a clever name that isn’t easy to pronounce, it would be nice for the pronunciation information to be included within the metadata associated with the content, so the voice assistant can get it right.

That way the assistants won’t mangle the clever name, rendering it incomprehensible over a voice interface. And yes, until that happens, to ensure they are discoverable and easily accessible, artists will have to consider how their song or album title or name will sit in the voice activation environment. If it’s really hard to pronounce or includes a series of letters and symbols, then people will struggle to request it and the assistants will struggle to interpret it. The artist’s music won’t get heard, and that’s the saddest result possible.
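As a hedged sketch of what such metadata could look like, borrowing the phoneme convention that speech-synthesis markup (SSML) already uses — the artist name and the schema itself are invented for the example:

```python
# Hypothetical metadata schema (invented for illustration): a pronunciation
# hint rides along with the track so a voice assistant can both say and
# recognise a stylised artist name.
track_metadata = {
    "title": "Example Song",
    "artist": "SNGR",                         # stylised spelling
    "artist_pronunciation": {
        "alphabet": "ipa",                    # as in SSML's <phoneme alphabet="ipa">
        "ph": "ˈsɪŋər",                       # how the name is actually said
        "spoken_aliases": ["singer", "s n g r"],  # forms a listener might say
    },
}
```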


As one of the co-founders of Siri, what are your views on how voice assistance has developed since the company was bought by Apple in 2010?

I am happy to see that the entire industry has made it a top priority to develop high quality voice and language interfaces. Since the launch of Siri, there has been a lot of money, talent, and technology applied to this goal.


The AI tech in particular went through a nice upswing with the application of deep neural nets in the mid-2010s. And all the layers of the value chain, from the form factor of devices that can listen and talk to you to the back-end services that can do what you asked for, are being adapted to the language UI.


Do you have any other thoughts or predictions about the impact that AI will have on the music business?

It’s early days, and I’m very excited about the positive impact AI can make in the music business. I don’t think AI will replace humans in the creation of music that people love: not for sentimental reasons, but because of technical and psychological realities.


However, AI will be quietly making our experience of music more contextually relevant; it will help us discover new music that we love; and it will likely make it possible for many more people to participate in the composition and performance of high quality music than ever before.
