No voice? Text to Speech + Audio Hijack + Loopback For the Win!

I’m not sure you noticed on last week’s show, but the audio was just a little bit “different”. I started losing my voice on Friday, and by Sunday I could barely squeak out a sentence. I’ve gotten laryngitis periodically ever since I was a little girl. I’m not sure why it happens, but it seems to be more allergies than a sickness, because I felt fine.

Anyway, since I’ve been doing the show non-stop for over a decade, I knew I couldn’t NOT have the show. Luckily I had pre-recorded bits for the majority of the show, but how was I to make sure that the non-recorded bits were at least listenable? And remember that not only do I do the recorded show for the audio listeners, but there’s also the live show that’s broadcast on YouTube as a Hangout on Air and from there broadcast in audio and video to Alpha Geek Media, and the audio is piped to the NosillaCast app. It’s pretty complicated how all this works, and now I had no voice to make things just that much harder.

I started with the built-in Text to Speech capabilities of OS X. If you haven’t messed around with it, it’s really quite extraordinary. You can simply select a section of text in any application on your Mac, and in the menu of the application choose Edit, Speech, Start Speaking. In System Preferences you can also add a keystroke to invoke it, and you can add higher quality voices, one of which happens to be called Allison. That wasn’t actually the hard part, though. The hard part was figuring out how to get the Text to Speech version of Allison to be heard by all of the audiences.
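If you’d rather script Text to Speech than use the Edit menu, OS X also includes a `say` command you can run from Terminal. Here’s a minimal sketch in Python of how that might look. It assumes the Allison voice is already installed, and the helper names `build_say_command` and `speak` are made up for this illustration. This isn’t what I did on the show, just another way to get at the same feature.

```python
import subprocess
import sys


def build_say_command(text, voice="Allison"):
    """Return the argument list for the OS X `say` command.

    Assumption: the named voice is installed on this Mac.
    """
    return ["say", "-v", voice, text]


def speak(text, voice="Allison"):
    """Speak the text out loud. `say` only exists on a Mac."""
    if sys.platform != "darwin":
        raise RuntimeError("The say command is only available on OS X")
    subprocess.run(build_say_command(text, voice), check=True)


# Build the command without running it, so this is safe on any machine.
command = build_say_command("Welcome to the show")
print(command)
```

On a Mac, calling `speak("Welcome to the show")` should come out of whatever the Mac’s sound output is set to at the time.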

In addition to the built-in text to speech capabilities of OS X, I used Rogue Amoeba’s Audio Hijack to hijack some of the audio pieces, and a new product also from Rogue Amoeba called Loopback. I thought it would be fun to walk you through how all of these pieces worked together to make the show a success in the end.

But first, the problem or problems to be solved. When I do the live show, there are three ways people will consume what I’m creating, both live and recorded.

  • There’s the Google Hangout on Air which is audio and video
  • Some people listen on Alpha Geek Radio or the iOS NosillaCast app
  • And finally, the audio needs to be recorded into my recording software, Amadeus Pro.

So that sounds easy, right? I’ve been doing all of these things for quite a while now, but the tricky bit was getting Text to Speech Allison into the mix.

Loopback from Rogue Amoeba was designed to allow us to create virtual audio sources. Applications like Amadeus Pro or GarageBand or even a Hangout on Air require an audio input and under normal circumstances you would simply select your microphone. But Text to Speech Allison isn’t a microphone, she’s coming from the Mac’s system audio. The system audio can come out of the speakers, or headphones, OR it can be assigned to a virtual source in Loopback.

In Loopback I created a source named, quite imaginatively, System Audio. Then on the Mac, in the Sound Preferences I set the output from the Mac to go to this newly created System Audio. That means that anything that uses System Audio as an input will be able to hear any sounds generated by the Mac operating system. In fact, as we found out during the show, it also means everyone could hear my TextExpander snippets expanding until I turned them off!

Ok, so all system sounds will come out of the virtual System Audio. Over on Amadeus Pro, instead of selecting my mic as the input, I could select System Audio and when the other Allison talks, Amadeus Pro could hear her. Even though this works, it only fulfills one requirement. Amadeus Pro can hear her, but it can’t hear the real Allison. In addition, I can’t hear her, and neither can the live audience.

We need to create another virtual source in Loopback. Rogue Amoeba suggests one called, simply, Pass-Thru. This is very similar to the one I created for System Audio.

This is where Audio Hijack comes into play. Audio Hijack allows you to pipe multiple audio input sources together and send them to multiple audio outputs. In Audio Hijack, I use the little building bricks to create input sources for my mic and System Audio and I pipe the two of them into the output device we created called Pass-Thru. Now the Pass-Thru virtual device will include both my voice, and any system audio including fake Allison.
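If it helps to picture what “piping two inputs into one output” means, here’s a tiny Python sketch. To be clear, this is only an illustration and not how Audio Hijack is actually implemented: it assumes audio is a list of samples between -1.0 and 1.0, and the function name `mix` and the sample values are made up.

```python
def mix(*tracks):
    """Add tracks together sample by sample.

    Shorter tracks are treated as silent once they run out, and the
    result is clamped to the range -1.0 to 1.0 so it can't overshoot.
    """
    length = max((len(track) for track in tracks), default=0)
    mixed = []
    for i in range(length):
        total = sum(track[i] for track in tracks if i < len(track))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# Made-up sample values standing in for the two inputs.
mic = [0.25, 0.5, -0.25]
system_audio = [0.25, 0.75]

# Pass-Thru carries both inputs combined.
pass_thru = mix(mic, system_audio)
print(pass_thru)
```

Anything listening to the combined output gets both inputs at once, which is the whole point of the Pass-Thru device.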

Back on Amadeus Pro, I can change the input to Pass-Thru and Amadeus Pro can hear me AND fake Allison. Whew. We’re past halfway, guys!

Next up, let’s tackle the Hangout on Air. Inside it, I have to set up my Input source so that Steve (who is the only participant) and the viewers can hear everything. We want people watching to hear me and fake Allison, but I can’t just use Pass-Thru because they also want to hear any playback I do on Amadeus Pro. Back over in Loopback I created another virtual device called HOA Input. This time I did something more than making it a pass-through device. In Loopback you can define software sources to go into the new virtual device. I put Amadeus Pro and Audio Hijack as the inputs to the HOA Input device, so now the Hangout on Air viewers can hear me, fake Allison, and playback of Amadeus Pro.

Ok, two down, one to go. This one builds on the same idea. The audio-only listeners on Alpha Geek Radio and the NosillaCast app need to hear everything the Hangout on Air people hear, but they have one disadvantage: they can’t hear Steve. To be fair, he’s pretty quiet, but from time to time he chimes in and, without one more piece, only the Hangout on Air people would hear him. Back to Loopback, where I created one last virtual device, called Nicecast Input. Like the Hangout on Air input, I piped in Amadeus Pro and Audio Hijack, but also Chrome so they’d hear what I hear.
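Since this is a lot to hold in your head, here’s the whole routing written out as a small Python sketch that works out what each destination ends up hearing. The device names come straight from the setup above. Two things are my own simplifications for the sketch: splitting Amadeus Pro into “recording” and “playback”, and treating Chrome’s output as just Steve’s side of the Hangout. The function name `heard_by` is made up too.

```python
# Each destination lists what feeds it. Anything that appears only as a
# feed, and never as a destination, is an original sound source.
ROUTES = {
    "System Audio": ["Text to Speech Allison"],
    "Audio Hijack": ["Microphone", "System Audio"],
    "Pass-Thru": ["Audio Hijack"],
    "Amadeus Pro recording": ["Pass-Thru"],
    "HOA Input": ["Amadeus Pro playback", "Audio Hijack"],
    # Assumption: Chrome's output is modeled as only Steve's audio.
    "Chrome": ["Steve"],
    "Nicecast Input": ["Amadeus Pro playback", "Audio Hijack", "Chrome"],
}


def heard_by(destination, routes=ROUTES):
    """Return the set of original sound sources that reach a destination.

    The routes above contain no loops, so this simple recursion ends.
    """
    feeds = routes.get(destination)
    if feeds is None:
        # Nothing feeds it, so it is an original source.
        return {destination}
    heard = set()
    for feed in feeds:
        heard |= heard_by(feed, routes)
    return heard


for destination in ("Amadeus Pro recording", "HOA Input", "Nicecast Input"):
    print(destination, "hears:", ", ".join(sorted(heard_by(destination))))
```

Reading the output line by line is a quick check that the recording gets me and fake Allison, the Hangout input adds Amadeus Pro playback, and the Nicecast input adds Steve on top of that.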

Whew! I have to confess that even though I took screenshots during the live show, because I knew I’d never remember what I did, it still took me a couple of hours to reconstruct the whole thing! I even enlisted my good friend Pat to listen to me try and explain it because even I got tangled up. I made a real pretty graphic of the whole thing using OmniGraffle that I hope you’ll take a look at, if only to appreciate how hard this was to do.

I made the diagram because I know I’d NEVER figure this out again the NEXT time I have laryngitis!

3 thoughts on “No voice? Text to Speech + Audio Hijack + Loopback For the Win!”

  1. Doug Blunt - April 10, 2016

    Hi Allison, have you ever tried inputting a video file into Audio Hijack? My daughter does makeup videos and her air conditioning is so loud that she gets a buzzing sound. Just curious, do you know of any software that you can input video into and make the audio better? Thanks

  2. podfeet - April 10, 2016

    Hey Doug – yes this is a good use of Audio Hijack. She would start by having her mic as an input, and she could process it with an AU Dynamics Processor. That effect can take out low level noise, and also prevent peaking of her voice. Then she would pipe that to a virtual output source, like Soundflower. Then in her video recording app, she would use Soundflower as the input.

    The best approach is to eliminate the noise BEFORE it’s recorded. Taking it out after the fact will distort the “real” audio you want to preserve. There’s also a secret trick that allows you to hear EXACTLY what Audio Hijack will hear so she would know whether it was going to sound good or not.

    Audio Hijack is available as a trial so I could help you set it up if you want. Once you get it, it would be easier to do this over email I think?

  3. […] No voice? Text to Speech + Audio Hijack + Loopback For the Win! […]
