  • Ron Jaworski

How authors and publishers can leverage AI audio solutions to improve sales

Updated: Feb 10, 2022

In September, I had the amazing opportunity to speak at Digital Book World Global, where leaders and pioneers from across the global publishing industry gathered to riff on industry developments, best practices, and more. My chosen topic was leveraging AI audio solutions to improve sales in the publishing industry, namely in the audiobook segment.

I’ll go deeper into the subject with this post and explain where the opportunities lie and how the audiobook segment of the publishing industry can benefit from them. It’s a topic that deserves more attention, I feel, as the advancements in text-to-speech (TTS) technology warrant a serious look at its application to long-form audio content. The consumption of digital audio is stronger than ever, and TTS plays an important role in the audio and voice revolution.

Let’s first understand the audiobooks market (and its potential)

U.S. audiobook sales in 2019 totaled $1.2 billion, an increase of 16% from the previous year. On average, the number of audiobooks listened to per year increased to 8.1 in 2020, up from 6.8 in 2019. What’s perhaps most fascinating is that the increase isn’t coming at the expense of print books: more than half of audiobook listeners say they are making “new” time to listen to audiobooks rather than replacing other activities.

Hence, it’s safe to say the audiobook audience is reaching more and more for a set of headphones instead of old-fashioned print or e-readers. And speaking of audience, Deloitte Global’s survey shows that audiobook (and podcast) listeners tend to be younger, more educated, and employed, attributes that make them an attractive customer base.

Audiobook and podcast listeners skew toward being young, educated, and employed

This is also important because of their penchant for technology and the growing acceptance of voice technology in everyday life: more content is being consumed by ear now than ever before. Whether we are talking about smart speakers and other audio-oriented devices or content such as podcasts and audio articles, audio is where readers are these days.

All the signs point to audiobooks (and other audio content) outgrowing their niche status to become a substantial market in their own right.

The barriers to entry

However, things aren’t so rosy when it comes to production costs, which are generally the most important factor to consider here.

There are two standard ways of producing an audiobook, the first being to do it yourself in-house. I’ve done a little digging and found out that for an audiobook containing between 50k and 60k words (my benchmark example for this post), you need:

  1. At least 16 hours of recording studio time;

  2. At least two weeks of post-recording editing;

  3. $500 on average for every hour spent in a professional studio.

I’ll delve a little deeper into the math behind these numbers. The general recommendation for audiobook narration is between 150 and 160 words per minute, which is the range in which people comfortably hear and vocalize words. For a 60k-word book, that’s between 6.25 and roughly 6.7 hours of finished audio, so just under 7 hours, start to finish.

However, recording an audiobook means the number of studio hours will be greater than the hours of finished audio, as there are breaks, setups between chapters, re-recording of mistakes, discussions of potentially problematic areas of the book, and so on. Naturally, the longer the book, the higher the cost.

In the end, based on the three points above (16 studio hours at $500 each, plus two weeks of editing), it takes more than $8,000 and 100 hours to produce an audiobook.
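For readers who want to check the arithmetic, here is a minimal Python sketch of the in-house estimate. The function names are hypothetical, and the 40-hour editing week is my own assumption for illustration; the word counts, narration speeds, studio hours, and hourly rate are the figures quoted above.

```python
def narration_hours(word_count, words_per_minute):
    """Hours of finished audio for a book narrated at a steady pace."""
    return word_count / words_per_minute / 60


def studio_cost(studio_hours, hourly_rate=500):
    """Studio bill in dollars at the quoted average hourly rate."""
    return studio_hours * hourly_rate


def total_production_hours(studio_hours, editing_weeks, hours_per_week=40):
    """Recording time plus editing time.

    hours_per_week=40 is an assumption, not a figure from the post.
    """
    return studio_hours + editing_weeks * hours_per_week


if __name__ == "__main__":
    for wpm in (150, 160):
        hours = narration_hours(60_000, wpm)
        print(f"60,000 words at {wpm} wpm: {hours:.2f} hours of finished audio")
    print(f"Studio cost for 16 hours: ${studio_cost(16):,}")
    print(f"Production time: {total_production_hours(16, 2)} hours")
```

With these minimums the sketch gives $8,000 in studio fees and 96 hours of work, so any overrun in the studio or in editing pushes the totals past the $8,000 and 100-hour marks.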

Option #2 is to hire a dedicated production company. To produce a professionally narrated audiobook, you have to account for the length of your book, the quality of the narrator, and the service you use. This option comes with two pricing models:

  1. Pay 50% of the earnings for the next 7 years; or

  2. Pay $1500 + 20% of the earnings.

These are rough numbers that fluctuate depending on various use cases and factors (genre, speech rate, recording experience, etc.) but they do offer a good indicator of industry standards, which are far from ideal.
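To see when each pricing model is the cheaper one, the two can be compared with a short Python sketch. The function names are hypothetical, and the sketch assumes both percentages apply to the same earnings over the same period, which the rough numbers above do not spell out.

```python
def revenue_share_cost(earnings, share_pct=50):
    """Model 1: pay a percentage of the earnings and nothing up front."""
    return earnings * share_pct / 100


def flat_fee_plus_share_cost(earnings, flat_fee=1500, share_pct=20):
    """Model 2: pay a flat fee plus a smaller percentage of the earnings."""
    return flat_fee + earnings * share_pct / 100


def break_even_earnings(flat_fee=1500, high_share_pct=50, low_share_pct=20):
    """Earnings at which both models cost the same.

    Solves high% * E = flat_fee + low% * E for E.
    """
    return flat_fee * 100 / (high_share_pct - low_share_pct)


if __name__ == "__main__":
    print(f"Break-even earnings: ${break_even_earnings():,.0f}")
    for earnings in (2_000, 5_000, 10_000):
        share = revenue_share_cost(earnings)
        flat = flat_fee_plus_share_cost(earnings)
        print(f"Earnings ${earnings:,}: 50% share costs ${share:,.0f}, "
              f"$1,500 + 20% costs ${flat:,.0f}")
```

Under these assumptions the break-even point is $5,000 in earnings: below it the 50% share is cheaper, and above it the flat fee plus 20% is cheaper.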

The opportunity/solution that is AI

This is where audio AI solutions step in. By audio AI, I mean text-to-speech (TTS) technology, which converts all of the text into audio voiced by synthetic speech.

Now, this is the point where I usually bust out an example of such a computer-generated voice to reassure the audience how lifelike it sounds. It’s no secret that audiobook listeners place a high priority on the quality of the narration, so when I talk about TTS, I mean Amazon’s long-form speaking style.

This is a speaking style Amazon specially designed for long-form content, with the aim of creating a more natural and engaging experience. Powered by a deep-learning text-to-speech model, the long-form speaking style enables speech with more natural pauses when going from one paragraph to the next, or even from one line of dialog to another between different characters.

On the cost side of things, the results are as follows:

  1. Less than $300 per book;

  2. Less than an hour (sometimes even less than 45 minutes) of production time.

Or in a visual:

[Image: Trinity Audio player]

And that’s it – you have yourself an audiobook.

In addition, text-to-speech technology brings to the table:

  1. Scalability;

  2. New distribution options through integrations with leading streaming audio platforms;

  3. New monetization options such as audio ads, subscription models, and more.

The technology will only get better

The ongoing growth in audiobooks is part of a larger trend of growth in audio overall. There is a potentially massive role for AI in book publishing, where text-to-speech technology can significantly cut both production costs and the time it takes to create an audiobook.

But that’s not all. As the technology advances, there will be more options to fine-tune the listening experience, such as pairing text and audio so listeners can follow the text while listening, synchronized highlighting for people with disabilities, different voices and reading speeds for different parts, multi-language conversion, and so on.

In other words – a bunch of things that will make listening more pleasant and accessible. For publishers, the numbers and developments mentioned here are worth watching closely as both audiobooks and audio AI continue their upward climb. If you’re a book publisher and want to join the audio revolution – give us a shout and we’ll get the ball rollin’!

Let's connect via LinkedIn!
