MBW Views is a series of op/eds from eminent music industry people… with something to say. The following comes from Mark Douglas (pictured), CIO of UK-based music licensing company, PPL.
AI – who isn’t talking about it or using it? Unlike other recent breakthrough technologies that have been all hype and little delivery (I’m thinking Blockchains and NFTs), AI has already proven it is very much here to stay. And it is moving at a truly remarkable pace.
ChatGPT took a mere two months to reach 100 million users, already has one billion monthly site visits, and is producing output that many fail to recognise as being AI produced.
It is against that backdrop that I want to look under the covers of AI and question whether we aren’t getting a bit ahead of ourselves in our race to use it in every walk of life imaginable. I will do this by exploring how AI works and in particular the role and significance of training data.
This is particularly necessary for music where it is already abundantly clear that huge catalogs of commercial sound recordings have been used to train AI models, with no permission sought from the songwriters or performers, and with no flow of money to them.
It has taken until recent times for computing power to reach a point where a different kind of approach could be adopted. That approach is based on deep learning, a technique where computers ‘learn’ through the experience of being exposed to huge quantities of high-quality training data.
Whether it is music, art or text, the underpinnings of this approach are the same. It starts with every piece of training data being broken down into very small tokens. In the case of books or written works, these tokens could be the individual words, numbers, or punctuation marks.
In the music domain, these tokens would be individual notes or chords. These individual tokens are then stored, along with the probability of their presence and their likely relationship to other tokens.
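The token-and-probability idea described above can be illustrated with a deliberately tiny sketch. It assumes word-level tokens and simple "which token follows which" counts (a bigram table); the function names and the two-sentence corpus are hypothetical, and real deep-learning systems encode these relationships in neural network weights rather than in a lookup table like this one.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Split text into lowercase words/numbers and individual punctuation marks.
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def train_bigram_model(texts):
    # Count how often each token is followed by each other token.
    counts = defaultdict(Counter)
    for text in texts:
        tokens = tokenize(text)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    # Turn the raw counts into probabilities for each preceding token.
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {tok: n / total for tok, n in followers.items()}
    return model


# Hypothetical toy "training data".
corpus = ["the cat sat on the mat.", "the dog sat on the rug."]
model = train_bigram_model(corpus)

print(model["sat"])  # {'on': 1.0}
print(model["the"])  # each of cat, mat, dog, rug has probability 0.25
```

Even at this scale the table holds nothing but observed tokens and how often they sit next to each other, which is the point the paragraph above is making.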
It’s essentially a brute force way of learning, and it works very well when done at large scale. And this is where the developments in computing power kick in. Remember those stories of huge compute farms being built to do bitcoin mining? Computers packed full of high-performance graphics cards (GPUs) that were consuming more power than some small countries? It is this technology that has been the game changer for AI.
To process the staggering quantities of training material needed to make this deep learning approach work, AI models are being trained using data centres packed full of top-end GPUs such as Nvidia’s H100.
An AI start-up has recently purchased 22,000 of these (at great cost – they can easily sell for $50,000 each given their scarcity and power) and, when commissioned, their data centre will be capable of performing 1.5 quintillion mathematical calculations per second (1.5 billion billion).
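As a rough back-of-envelope check on those figures, the sketch below multiplies them out. It assumes every card costs the upper-end $50,000 quoted above, which the text does not state was the actual purchase price, and the variable names are purely illustrative.

```python
gpu_count = 22_000
price_per_gpu_usd = 50_000      # assumption: upper-end price quoted in the text
total_ops_per_second = 1.5e18   # 1.5 quintillion = 1.5 billion billion

# Implied total spend and implied share of the workload per GPU.
total_cost_usd = gpu_count * price_per_gpu_usd
ops_per_gpu = total_ops_per_second / gpu_count

print(f"Implied hardware cost: ${total_cost_usd:,}")
print(f"Implied throughput per GPU: {ops_per_gpu:.2e} calculations per second")
```

On those assumptions the hardware alone comes to $1.1 billion, with each card contributing roughly 68 trillion calculations per second.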
Returning to the way the models work: once an AI model has been trained and is prompted for an output, the prompts are used to navigate around this multidimensional model to find tokens that are likely to be present in the output.
There is no attempt to truly process the meaning of the prompts. They are simply used to home in on the collection of tokens that are relevant for the context the user has set. Tokens with a high probability of featuring in the requested output are then stitched together, with probabilities also used to get the sequencing right.
When the underlying model has been trained on millions or billions of high-quality inputs, this probabilistic approach generates high-quality, highly credible outputs. It’s why the outputs of most AI tools read so well. They have been trained on the works of the world’s best writers!
What is vitally important to understand is that the AI tools have no innate intelligence. They have no actual understanding of what is being asked or what is being output. They are literally retrieving components of what has been observed in the training data.
To create the illusion of true creation and originality, AI tools inject some randomisation into the selection of tokens they retrieve so that there is some variety in the output.
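One common way to introduce this kind of randomisation is a "temperature" setting applied when the next token is picked. The sketch below is a minimal illustration under stated assumptions: the probability table and the function name are hypothetical, and the text does not say which mechanism any particular tool uses.

```python
import random


def sample_next_token(probabilities, temperature=1.0, rng=random):
    # temperature == 0: always take the single most likely token (no variety).
    tokens = list(probabilities)
    if temperature <= 0:
        return max(tokens, key=lambda t: probabilities[t])

    # Raising each probability to the power 1/temperature sharpens the
    # distribution when temperature < 1 and flattens it when temperature > 1.
    weights = [probabilities[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical probabilities for the token that follows "the cat sat on the".
next_token_probs = {"mat": 0.6, "rug": 0.3, "moon": 0.1}

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for temperature in (0.0, 1.0, 5.0):
    picks = [sample_next_token(next_token_probs, temperature, rng) for _ in range(10)]
    print(temperature, picks)
```

At a temperature of zero the same token comes back every time. As the setting rises, less likely tokens such as "moon" are chosen more often, which adds variety at the cost of plausibility.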
The fact that the AI tools work in this way can easily be proved. Ask ChatGPT to multiply 123,654,789 by itself and it gets the answer wrong. This is because it doesn’t understand multiplication the way that humans do – it determines its ‘answer’ by looking through its vast model pulling back individual numerical digits based on the probability that they may feature in the answer. A bit like a parrot that repeats words and phrases it has heard before with no understanding of the meaning.
The correct answer is 15,290,506,842,634,521 – so it’s only out by about 23 trillion!
This ability to confidently serve up factually wrong answers has been given the interesting name of ‘hallucinating’. It sounds so much better than just calling it what it is – getting the answer wrong!
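The multiplication example above can be checked exactly with ordinary deterministic arithmetic. The short sketch below relies on Python's built-in arbitrary-precision integers, so no rounding is involved; the variable names are illustrative only.

```python
n = 123_654_789

# Python integers have arbitrary precision, so this product is exact.
square = n * n

print(f"{square:,}")  # 15,290,506,842,634,521
```

A conventional program computes the result rather than predicting its digits, which is why it produces the same correct answer every time.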
Why is all of this important? Because it really drives home that the value in AI output derives wholly from the data it has been trained on.
There is limited value in the underlying software routines that are used. Indeed, many of the common ones are open source and freely available to all.
For sure, each individual AI tool has a context specific user interface (UI), but that doesn’t change the fundamentals of how AI works. Interestingly, there is growing evidence that these UIs contain specific logic to prevent the AI tool revealing the fact that it can recreate the copyrighted inputs on which it has been trained.
ChatGPT will gladly repeat the opening sentence of The Shining by Stephen King when prompted. It starts to stutter when asked to go further. Is it because it cannot or because it has been programmed not to?
For me, all of this throws a spotlight on a few important fundamentals:
- When an AI model is trained on the works of creators, it isn’t just harvesting the notes and words. It’s going to the very heart of the training material – it is ingesting the vocabulary, the grammar, the ideas, and the emotion – it is distilling and reusing the very essence of the human creator whose work it has been trained on.
- There is much legalistic hiding behind whether this method of training amounts to copying (in the established legal definitions of copying). Whatever the legal analysis, the outputs from AI are a regurgitation of what it has been trained on with a pinch of randomisation thrown in. That these models have been trained on so much material that it becomes difficult to establish the lineage from content used as “training data” to output does not change the fact that the value in the output derives wholly from the training data.
- In most cases, including the largest technology companies in the world, transparency of what has been ingested, where it was sourced from, and what metadata was used to build the model, is worryingly low. Recent research by Stanford University, which set out to provide some objective measurement of transparency, saw the top 10 AI models scoring between 12% and 54%. In a world where companies are increasingly obliged to make annual declarations that the goods and services they procure are from sustainable sources and that they do not damage livelihoods, it cannot be right that AI tools of unknown lineage become the norm.
- AI will never truly create – AI will never deliver those moments that gave us punk or hip-hop. AI would never have started calling cigarettes ‘blems’ or decided that something good is ‘sick’ or ‘fire’. At best it will regurgitate the things it has been trained on in slightly new ways. For sure you can turn up the dial on the amount of randomisation that is applied to generate novel outputs, but that will create endless amounts of noise – it would be the infinite monkey theorem played out on huge compute farms, with vast quantities of output generated in the hope that occasionally there will be a nugget of gold. This would be a massive retrograde step for all creative industries and for humankind.
We desperately need our legislators to grasp these fundamentals. We need them to spend less time worrying about killer robots of the future, and really focus on the harms that have already taken place. We need them to rapidly develop codes of conduct that AI companies must abide by.
These codes must compel AI companies to gain permission to ingest training data and to pay appropriate value to the creators whose very essence fuels their large models. While this will be difficult to enforce on a global scale, they must create an environment where legitimate AI companies can show their compliance with a fair business model. A Fairtrade for AI, if you will.
The music industry is only too ready to embrace the power that can be delivered by AI, be that in the completion of song lyrics, the refinement of vocals or the development of cover art.
We have pioneered the adoption of so much technology over the decades, but when it comes to AI, we need to know we are using tools with legitimacy and which respect the labours of those that have fuelled them. As things stand, AI is the first manufacturing process in the history of mankind where nothing is being paid for the raw materials.
That cannot be right.
Music Business Worldwide