Audiobook renderer

Sun, Nov 08, 2020 @ 18:51 263

A Ruby experiment that renders MP3s from markdown articles, using the Google Text-to-Speech API and WaveNet voices.

I am a Ruby beginner, and by that I mean starting from zero. It's not a language that had grabbed my attention until now. To learn the ins and outs of a new thing I generally need to engage with some kind of personal project, something that gets the blood pumping and leaves me with something to show for the effort at the end. Online video tutorials don't really do it for me.

To learn Ruby I decided to make a script that would open a markdown file, parse it into HTML paragraphs, then render each of those paragraphs as an MP3 audio file using the Google WaveNet TTS (text-to-speech) API. WaveNet is pretty amazing, and I am confident that over time it will just be the best synthetic voice API.
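As a rough illustration of that pipeline, here is a minimal sketch using only the Ruby standard library. It is not the script from this post: it skips the HTML step and simply treats blank-line-separated blocks as paragraphs, and the names `paragraphs_from_markdown`, `render_paragraphs`, the `synthesize` callable and the `0001.mp3` naming scheme are all assumptions made up for the example.

```ruby
require "fileutils"

# Split markdown into paragraphs: blocks separated by one or more blank lines.
# Leading heading markers are dropped and inner line breaks collapsed.
# (Assumption: good enough for simple articles; not a full markdown parser.)
def paragraphs_from_markdown(markdown)
  markdown
    .split(/\n\s*\n/)
    .map { |block| block.gsub(/\s+/, " ").strip.sub(/\A#+\s*/, "") }
    .reject(&:empty?)
end

# Render each paragraph to its own numbered MP3 file and return the paths.
# `synthesize` is a stand-in for the real TTS call: any callable that
# takes a String and returns the MP3 bytes as a String.
def render_paragraphs(markdown, out_dir, synthesize)
  FileUtils.mkdir_p(out_dir)
  paragraphs_from_markdown(markdown).each_with_index.map do |text, index|
    path = File.join(out_dir, format("%04d.mp3", index + 1))
    File.binwrite(path, synthesize.call(text))
    path
  end
end
```

Passing the synthesizer in as a lambda keeps the paid API call in one place; in a real run that lambda would wrap the Google Text-to-Speech client, while a fake one is enough to try the file handling offline.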

There is one snag with Google services: they cost money, which means rendering a short story every time the script runs could quickly become really expensive, unless there was a way to check the existing audio file against the proposed text paragraph and re-render the MP3 only if the text had changed. Nice, this added an extra level of depth for me to explore Ruby with.
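One way to do that check, sketched here under my own assumptions rather than as the exact approach the script takes, is to store a SHA-256 digest of the paragraph text in a small sidecar file next to each MP3 and compare it before calling the API. The helper names and the `.sha256` sidecar suffix are hypothetical, and `synthesize` is again any callable that turns text into MP3 bytes.

```ruby
require "digest"

# Path of the sidecar file that remembers which text produced an MP3.
# (Hypothetical convention: "<mp3 path>.sha256".)
def digest_path(mp3_path)
  "#{mp3_path}.sha256"
end

# True when the MP3 or its digest is missing, or the text has changed.
def needs_render?(mp3_path, text)
  return true unless File.exist?(mp3_path) && File.exist?(digest_path(mp3_path))
  File.read(digest_path(mp3_path)).strip != Digest::SHA256.hexdigest(text)
end

# Call the (paid) synthesizer only when the text changed, then record the digest.
# Returns true if a render happened, false if the existing MP3 was kept.
def render_if_changed(mp3_path, text, synthesize)
  return false unless needs_render?(mp3_path, text)
  File.binwrite(mp3_path, synthesize.call(text))
  File.write(digest_path(mp3_path), Digest::SHA256.hexdigest(text))
  true
end
```

With this in place, a second run over an unchanged article makes no API calls at all, and editing one paragraph re-renders only that paragraph's file.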

Stuff you need to do

  1. A Google account with the Text-to-Speech API enabled
  2. Ruby and a GCP helper that works
  3. An internet connection; it is, after all, SaaS

What you get when it's done

  1. A folder full of MP3s, one for each paragraph
  2. A Ruby script to generate audio from markdown