Jeff Schmidt Sound Design & Development

Dolby Atmos for Podcasts - Year One

July 2022 marks one full year of mixing in Dolby Atmos. Mostly podcasts (documentary and fiction), identity elements, and a bit of music. I figured it's a good time to take stock and share where I've been and how it's going.

I started the way most people start with Atmos these days - using the Dolby Atmos Production Suite (or any DAW with the Atmos Renderer built-in) and working by listening to the Binaural Headphone mix.

After a few weeks, I knew I wanted to mix on speakers. I couldn't trust what I was hearing in my headphones. Binaural spatialization has always been a bit funky for me. I have a non-average-shaped head, I guess.

Did someone order 14 speakers?

It took a minute for all the gear to get spec'd out and ordered. Supply chain issues delayed many things, but I had a 7.1.4 Atmos system fully operational by mid-October 2021. It was mind-blowing.

Many of the things I mixed in headphones sounded great, but other choices I made were clearly mistakes that one would ONLY make by mixing exclusively in headphones. (Hey, this size control is cool, let’s make sounds HUGE!!!)

The headphone experience is nice - but it's not the same as the speaker experience. And now I know it's not a substitute. If you can't have speakers, I strongly encourage you to find a studio that does and check your mixes there.

Alls I'm saying.

Within a few months, I decided to add “Wide” speakers to the rig, making the system 9.1.4.

HELLO LEARNING CURVE

I'm glossing over a very large technical learning curve I needed to engage with. I don't want to bore you with that, but there was a stretch of time (weeks) when I felt like the dumbest person on the planet. It was all new, and I was making ALL the mistakes. Most of this came down to routing all the gear and creating a mountain of complexity in my previously easy-to-use studio.

For the morbidly curious, I used an external Rendering computer because I wanted to keep my Avid HDX System running. The only way to do that at that time was to use two machines. And Dante audio over IP.

I bristle at being called an “Audio Engineer” because I don’t really have the depth of technical knowledge I’d expect someone with the title of “Engineer” to possess.

So when it came to Dante Networking, I was pretty far out of my depth. I got the system working, but devices would drop off the Dante Network randomly, often in the middle of working! Super frustrating.

Brains of the Atmos beast: 128 channels out of Avid HDX to the MTRX, over Dante audio-over-IP to a Mac Mini Atmos rendering computer, back to the MTRX, through Trinnov speaker calibration, and out to 9.1.4. Also included is an Apple TV 4K on Dante so I can listen to Atmos Music and watch TV/film in full Dolby Atmos. Good times.

I did overcome those hurdles, thanks in large part to my friend Andrew, who is a Certified Dante Network Engineer.

Andrew came out to my studio to diagnose the issues. In short order, he decided my situation called for assigning static IP addresses to all the Dante devices on my network. That’s not supposed to be necessary with a Dante audio network, but it’s the only solution that brought rock-solid stability to my Atmos rig.

That was in March 2022. To date, not a single Dante issue has cropped up.

Now I know how everything works and how to troubleshoot when things don't work as planned. This was my experience, your mileage may vary. When in doubt, keep it as simple as possible to get started and document everything!

NOT FADE AWAY….

One more thing I ultimately had to come to terms with in upgrading to Atmos: Mixing with physical faders and adopting the traditional VCA-based mixing workflow. I’ve always mixed in the box with a mouse. Always. I never wanted to fill up space on my desk with a fader bank. Never.

But Atmos doesn’t (currently) feature a “Master” Bus with Master Compression, EQ, or Limiting.

We have up to 128 channels of audio and panning metadata fed discretely into the Dolby Atmos Production software.

The DAPS, as it’s known, “records” those audio channels and data like a 128-channel tape recorder. That requires a bit more level riding to keep things in check.

After 8 or 9 months, I finally caved. I installed an Avid S1, an 8-fader control surface, and moved my 64-key MIDI keyboard controller off my desk.

Having a MIDI controller on my desk is vital, so I downgraded to a 25-key device and moved it off to the side, officially signaling the room is now “Atmos-focused.”

I realized pretty quickly that using faders is a new skill for me. Here I am with 20+ years doing professional audio work and only now learning the basic old-school skill of riding faders. Ha!

That said, the EUCON-enabled features of the S1 are pretty deep. I needed to start small by including a few basic functions in my workflow. Over time, I’ve added more depth and understanding through actual use. I can read about all the “features,” but if I’m not using them regularly, they won’t stick.

ELEPHANT, LITERALLY IN THE ROOM

Well, that “elephant” would “literally” be the 13-inch LFE channel, which weighs in at over 130 lbs.

Prop table, or collection plate?

But I suspect the larger elephant might be: how much did all this cost?

The short answer is a lot.

The short follow-up answer: it’s already paid for itself.

As I mentioned earlier, I started mixing with Atmos by using the Dolby Atmos Production Suite and a normal pair of headphones. Total cost: $299.

You could start mixing in Atmos today for free using the 90-Day Free Trial of the same piece of software.

Apple Logic Pro ($199) now features Dolby Atmos built-in.

Now, if you want a room full of matching speakers, that’s gonna cost.

7.1.4 is the minimum “home entertainment” setup. That’s 12 speakers. If you want a room full of speakers that also meet the Dolby Atmos SPL specs (I did), that might cost even more.
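As a quick aside on the naming: the “X.Y.Z” shorthand counts ear-level speakers, LFE/subwoofer channels, and height speakers, so the totals above work out like this (a trivial sketch, nothing Dolby-specific):

```python
# The "X.Y.Z" Atmos layout shorthand: X ear-level speakers,
# Y LFE/subwoofer channels, Z height speakers.
def speaker_count(layout: str) -> int:
    ear_level, lfe, height = (int(n) for n in layout.split("."))
    return ear_level + lfe + height

print(speaker_count("7.1.4"))  # 12 - the minimum "home entertainment" setup
print(speaker_count("9.1.4"))  # 14 - after adding the Wide pair
```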

The brand of speakers you can use is flexible. Obviously, the more exotic your taste in speakers, the higher the cost. Many people operating successful Atmos rooms spent half what my room cost. While others spent many times more. It’s scalable to your space, taste, and budget.

NOTHING PROPRIETARY

What comforted me when contemplating the upgrade expense was that my investment would not be locked into a closed or proprietary system.

My Atmos room is designed according to the Dolby Atmos Music spec, not the theatrical spec. The Music spec is a series of guidelines and best practices on how the equipment is best configured and operated.

Aside from the Dolby software I already mentioned, the Atmos Music spec uses standard pro-audio equipment. The theatrical spec is a different beast and largely unnecessary for my work.

If I ever decide to scale back to stereo, I can sell any or all of the existing speakers. It’s all standard pro-audio gear. I made sure to purchase gear I liked, had a history with, and knew tended to retain above-average resale value.

Since I designed my rig from the ground up, everything would be new. I decided to sell some of the stereo gear I’d no longer use (audio interfaces, speakers, etc.).

SuperCalibrationexpialidocious!

Eventually, I covered about 1/3 of the cost of the new Atmos system by selling gear from the stereo system.

It should also be noted that while I certainly had a personal interest in owning a properly operating Atmos rig, I'm running a business, and this upgrade had to make sense as a business decision.

I didn’t demand the system pay for itself in one year or even two years. Instead, I calculated a 3-5-year window (the basic tax depreciation schedule) and knew that would be plenty of time to recoup the expense. I was fortunate not to need to take on debt to build the system. The business had enough cash, and my CPA even suggested this might be a “good year for more expenses.” Turns out I had a whopper planned!

So by mid-August 2021, it was a go. The system was configured, payments made, gear arrived, was set up, calibrated, and set up and calibrated again. Mixing commenced!

LET ME HEAR… SPEAKERS!!

Once I was able to close the gap between what I thought I was doing while mixing in headphones and what I was actually hearing in speakers, I decided to ignore headphones for the first Mix passes.

My process was to get the mix rocking in the room over speakers and then bounce over to the Dolby Binaural headphone mix to listen.

I learned that the placements and movements that were clear as day to hear in the room became more of an approximation in headphones.

This was particularly true for anything in the height channels and things that moved from front to back or back to front.

There were also some issues with the dialogue getting stepped on by Music & FX in the Binaural fold-down.

The center channel is really powerful in the room and increases intelligibility considerably. Once every sound populating 12 speakers needs to crowd into the Left-Right headphone virtualization, you're back to mixing stereo with some added space.

Checking on headphones is essential to ensure the fold-down doesn't bury important details like dialogue.

Admittedly, moving from speakers to headphones was a bit discouraging at first. A lot of distinct placement of sounds around the room became hazier.

The situation improved a bit for me when Dolby introduced a personalized Head-Related Transfer Function (HRTF) creator for use in the Dolby Atmos Renderer.

This iPhone app takes a 360-degree scan of your head from the shoulders up and calculates how sound is changed by your physical attributes. It’s cool, and Apple has since introduced something similar in iOS 16.

The most important point here is that with some work and practice, I started producing what I believed were better-sounding headphone mixes than my standard stereo mixes.

EARLY ADOPTER FUN AND PAIN

Of course, it should be acknowledged that any complaining and nitpicking is the pain of being an early adopter.

Dolby Atmos has been around for about a decade, but podcasts have only been using it for a few years. I've been at it for just one year.

It’s also important to point out that NO public podcast platform (Spotify, Apple Podcasts, Audible, etc.) officially supports proper playback of Dolby Atmos audio. Yet. I don’t have any insider info, but I expect that to change.

No doubt it's been thrilling being on the bleeding edge of designing cinematic sound for podcasts using Atmos. It's also been challenging because only a few people can properly hear how amazing this stuff is sounding.

YEAH, BUT 5.1 MUSIC FAILED, QUAD FAILED, WAH WAH WAH…

Seeking information about spatial audio and Atmos online inevitably leads you to old-time audio engineers who seem to delight in telling stories of all the past failures to move beyond stereo (Quad failed, 5.1 Music failed, etc.). Ipso facto, they warn, Atmos for music and podcasts will share a similar fate!

I'm not convinced.

The headphone virtualization of Atmos is embedded in many of the devices people already own. It has orders of magnitude more consumer adoption than any previous surround format. And while it’s not “outstanding” yet, virtualization is steadily improving. To steal a line from Andrew Scheps, “today is the worst day Atmos over headphones will ever sound.” Which is an odd way of saying it’s getting better all the time.

Apple, Netflix, and others have a large investment in delivering spatial audio to as many non-tech people as simply as possible.

Hey gramps, Netflix just killed Stereo

Apple has introduced a custom binaural option into the iPhone that promises to improve the headphone experience for iPhone users.

Netflix recently announced all stereo tier subscribers would soon be getting Ambeo Spatial by default instead of stereo.

Think about that - Netflix is replacing the stereo feed with the spatial feed.

So no, I do not believe spatial audio will go the way of 5.1 music or Quad.

Let us now leave aside the rantings of cranky old audio engineers.

NEEDING TO CHANGE MY BRAIN, TOO

Even though I was excited about using Atmos for podcasts, I couldn’t help but have some serious skepticism.

I had difficulty imagining how working in Atmos would fit into certain projects' more hectic production schedules.

Would these projects give me another week or more after content lock to produce the Atmos mixes?

Turns out, no one was excited about that. Schedules were already crunched to make publishing dates, and adding an extra week into schedules seemed impossible.

DOLBY ATMOS: DESIGNED FOR MIXING

The Dolby Atmos workflow was not designed to be a dynamic production tool like I was considering. It has a baked-in assumption derived from its heritage as a tool for cinema.

The Dolby Atmos workflow assumes you have locked content with a known number of tracks and elements you need to mix. You then create your Atmos mixing set-up and routings based on that known quantity.

When I mix other people’s projects (say, music or a finished podcast episode), I can clearly see how many tracks I’ll need to account for and set up my Atmos session around that.

But when the design is in progress and constantly changing - I won’t know how many tracks I’ll end up needing because I’m still designing, and the producers are still changing!

I’ve often likened working on certain kinds of podcast projects to having the film editor, director, producers, and sound designers all on the mix-stage changing the film at the same time while the “final mix” is supposed to be happening.

It’s literally building a plane while in the air. That’s challenging enough in stereo - adding Atmos makes it more challenging.

One of the challenges is adding more Atmos objects to a mix/design in progress. It’s not as straightforward as adding more mono or stereo tracks. I created this video a few months back showing all the steps it takes just to add one extra stereo object to an existing session.

LOOKING TO HISTORY… OR SOMETHING LIKE THAT

It helped that I could look back across years of projects to get a strong idea about how many objects and beds I “might” need. I found it helpful to start with the end in mind. What do I need to deliver?

I knew the bare minimum delivery for documentary-style podcast work was the Final Stereo Mix and Stereo Stems for Dialog, Narration, Music, and Effects groups.

Additionally, there could be the Atmos assets when requested.

While many music mixers boast about never using the Bed channels, I considered that more of a “pose” than practical advice. Since stems were needed in my work, I realized I had to use bed channels for each stem group (DX, MX, FX, Nar) in order to collect the proper reverb and delay returns for each food group stem.

For documentary-style work, I settled on a 10-channel bed each for Music and FX, and a 5-channel bed each for Dialog and Narration, for a total of 30 bed channels. That enabled me to use a few surround reverbs and effects on the DX and Nar groups as I often do. The docu-series I’ve worked on often ended up needing international assets, so having the DX and Nar separated became necessary.
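A quick sanity check of that bed math (the group names mirror the stems described above; the object headroom figure is my own rough arithmetic against the 128-channel limit, not an official Dolby breakdown):

```python
# Bed allocation for documentary-style work:
# 10-channel beds for Music and FX, 5-channel beds for Dialog and Narration.
beds = {
    "MX": 10,   # Music bed
    "FX": 10,   # Effects bed
    "DX": 5,    # Dialog bed
    "NAR": 5,   # Narration bed
}

total_bed_channels = sum(beds.values())
print(total_bed_channels)  # 30

# Roughly speaking, whatever remains of the 128 renderer inputs
# is available for objects.
objects_available = 128 - total_bed_channels
print(objects_available)  # 98
```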

From there, I created a Pro Tools template with groups of mono and stereo objects for each food group. I also included some extras over what I typically used, to allow for additional objects should I need them. Having this built and routed in my template decreased the need to make new objects and routings in the middle of the design.

I discarded the separate narration stem for the fiction series. Instead, I opted for a single 10-channel bed for all dialog and narration. I'll reconsider if and when I need to start delivering M&E on the fiction series.

DID IT WORK?

I started with a docu-series where there were many editorial changes. It took a bit more time than doing a stereo production, but I hit all the deadlines, nothing was delayed, and the mixes were approved. It proved the workflow and process worked.

Since then, I’ve designed and mixed over 12 hours of Atmos content this way.

For context: that’s equivalent to about eighteen 40-minute albums. That accounts for the productions I designed and mixed at the same time. There have been another 8 hours of finished stereo content I mixed in Atmos.

Anyway - this way of working is now my default workflow. It’s all Atmos, all the time now at Jeff Schmidt Productions.

STARTING OFF WITH STEALTH ATMOS DELIVERIES

On the client side, I delivered the "baked-in binaural" stereo file, which can be listened to on any device with any set of headphones. This file is produced by the Dolby Renderer. It's a standard stereo WAV file, and it's the closest thing to the mix I created in the room that I've heard in headphones. And things were good!

Initially, I didn't tell people I was delivering the stereo virtualization of a Dolby Atmos mix. I presented it the same way I had always delivered stereo mixes. Some clients noticed things sounded more spacious. Another noticed some of the more aggressive pan moves I made. Interestingly, I did not get any notes about the mix sounding weird or different. No requests to “put it back” or make it "less" than what they were hearing.

These Atmos mixes did not generate new notes or new kinds of mix requests compared to previous stereo mixes. Hold on to that thought for a bit.

Dolby Atmos Renderer

Now, one could interpret that to mean Atmos isn't worth it if people couldn't hear the difference. If you're inclined to think that way, I will not try to convince you otherwise. I'm not on Dolby's payroll.

But that's not the way I saw it.

My goal was to prove (to myself, at first) that I could take a project from the DX edit into an Atmos session where I’d design, score, and spatially mix the project. And then seamlessly deliver a binaural render of that mix to the client that was as good or better than the stereo process I had always used. And to further work through the many editorial changes to the content while keeping the Atmos mix intact without needing extra days or weeks.

It was great that no one noticed anything other than that the mixes sounded good. Which they absolutely did, if I do say so myself. :)

Mission accomplished. This process also creates Atmos assets which are archived should that project seek an “Atmos Mix” in the future. It will already be done!

It’s only after clients became aware they were listening to ATMOS that things got…..

DANGER, WONKINESS AHEAD!

As to be expected, once clients became more aware I was doing these projects in Atmos, they naturally wanted to hear the mixes in Apple Spatial and smart speakers. This is where things started to get wonky.

Two new issues emerged. One was technical. The other, I think, is more psychological.

Let's talk about the technical issues first.

Apple Spatial doesn't sound like the mix I created in the room.

Apple Spatial doesn’t sound like Dolby's binaural headphone virtualization of the mix I created in the room.

Apple Spatial sounds completely different.

So it follows that the feedback generated by listening to the mix on Apple Spatial would be different.

Apple Spatial in Logic Pro monitored via Apple AirPods Max

Some people were confused by how much "reverb" suddenly appeared in the mix. Thank you, Apple.

It’s true, Apple Spatial applies a heavy dose of reverb reflections to the mix to aid spatialization and head-tracking. The amount is actually not "that" bad on music, but it is particularly unflattering to dialogue-based content like podcasts.

It's important to point out that the Apple Spatial effect used for Atmos TV & Films is DIFFERENT than the one you hear applied to Atmos MUSIC or the podcasts playing out of the files app on your phone.

Hopefully, that will change, and podcasts can be treated more like TV/Film.

But for now - we have to suck it up. Your podcast will have an unflattering amount of extra reverb added in Apple Spatial. Next!

EXPECTATIONS ABOUT ATMOS (real and imagined) INVADE THE MIX

And that gets us into the psychological issue. Once clients were aware "this is an Atmos mix," they definitely listened differently and started asking for different things.

I think their expectation shifted to an imagined experience about what Atmos is “supposed to be” rather than what it actually is.

I've heard music mixers say they play other Atmos mixes for clients BEFORE playing them their own mix. This was to get their ears used to the difference between stereo and Atmos. I'm not sure how to do that with my work just yet, but I definitely understand why it’s important.

NEW REQUESTS - BECAUSE - ATMOS!

It seemed that KNOWING a project was mixed in Atmos created a new set of expectations and requests.

It’s Atmos - it makes sounds “move,” right?

In came new requests to make sounds “move more,” make this “swirl,” and “make it wider.”

So much of the early Atmos feedback was to make things move more along the left-right plane.

It’s easy to perceive LEFT and RIGHT locations in headphones, but it’s not as easy to detect height, side, and rear locations.

It occurred to me that these were mostly 2D comments (Left-Right) applied to 3D mixes. It's not surprising because they listen on headphones, which is a 2D rendering of a 3D mix.

Suddenly, I was adding extra left-to-right panning effects just so something obvious would “wiggle” in space. I’d never done that in my stealth Atmos mixes or in the stereo mixes before that.

I would explain that in my room, I have six speakers on my left side in which to place or move a sound “Left.” In headphones, there is mostly only LEFT. And the requests often reflected that disconnect - as in “move this sound to the left” but never “move this sound to the left rear height.”

To make matters more frustrating, the many elements that were already moving from front to back or up and around the room were not noticed distinctly. That kind of movement just isn’t obvious ENOUGH on headphones. So it’s back to making things wiggle along the stereo plane - haha!

Again - this is still very early days for many.

Watch as Dobby makes this sound wiggle!!

While I always inform clients when I’ll produce their work in Dolby Atmos, some don’t know what that means or even care.

One recent client replied, “the Harry Potter character?” A joke, but a good one.

They received the baked-in-binaural mix and loved it without any requests for panning tricks or gimmicks, and the mix in the room is wonderful.

The best of both worlds.

And this gets back to the real point of this - I don’t want Atmos to be perceived as something “special” to put on a pedestal reserved for Marvel movies. It’s not a shortcut or a bag of gimmicks and tricks.

Atmos can be very dramatic, to be sure. But it also excels at increasing immersion and space and is often best when used subtly.

WE’LL GET THROUGH THIS GANGLY ADOLESCENT AWKWARDNESS. I HOPE.

As people become more accustomed to the sound of Atmos in general, I know we will get through these early growing pains.

Even sitting in a 7.1.4+ room for hours a day, it still took me time to “Close The Gap” between the Atmos headphone experience and the speaker experience. I remember what it was like only hearing my mixes in headphones. It compelled me to buy a 14-speaker Atmos rig!

I've started initiating conversations with clients about this in an attempt to push things ahead a bit faster. This post reflects some of what we’ve been discussing.

SPEAKERS ARE FOR CREATIVE CHOICES

For clients with access to proper 7.1.4+ environments, I've encouraged them to review productions in that environment first. That’s the best way to hear the production as intended.

I've further suggested they limit creative feedback to that 7.1.4+ speaker listening experience. Physical speakers are the best way to make true creative immersive panning and placement choices.

SOUNDBARS ARE FOR QC

Next, I suggest moving into what could be called the "Consumer Device" review phase. This includes Apple Spatial via Airpods/headphones and Atmos-enabled smart speakers/soundbars.

Since I don't think these devices are the best place to make creative choices, I suggest the feedback in this phase should be limited to QC and translatability issues.

Even though Atmos dynamically delivers an “appropriate” mix based on the device’s capability, not everything in a mix ends up where it should be at the consumer level.

Issues that pop up on consumer devices:

  • Dialogue and other essential elements that get stepped on or pop out too loud.

  • QC issues such as clicks & pops.

  • Some placements suddenly sound incorrectly panned or distracting in ways that weren’t an issue in the 7.1.4 listen.

The goal at this phase would be to check that the cool stuff we're doing in 7.1.4+ doesn't create problems on the consumer playback devices. As there are countless potential consumer playback devices, this review is always going to be limited to the exact setup you’re using. It’s still worth doing.

I've conceded that (for now, at least) many of the cool things we can do in 7.1.4+ won't be "as cool" on headphones or sound bars.

But they shouldn't lessen the experience on those devices, so these checks are essential. As the tech improves, these kinds of issues will decrease.

DID YOU SAY FLOUNDER? WHAT? SPEAK UP! OH - LOUDNESS!

To close this out on a technical note, I want to address the issue of Loudness.

Currently, there is no loudness spec for Atmos Podcast Mixes.

The Dolby loudness spec for ATMOS MUSIC is -18 LUFS integrated and -3 dB True Peak.

Dolby settled on this spec (in collaboration with Capitol Music Studios, I believe) as the maximum suggested loudness level of an Atmos Music mix. Platforms that support Atmos Music, like Tidal, Apple Music, and Amazon Music, have adopted that spec and have been known to reject mixes that exceed it.

The purpose of that maximum is to allow enough headroom for 128 channels to properly fold down to the various play-out options (like two-channel binaural) without clipping. A side benefit is that music normally mixed at -10 LUFS (or louder) in stereo suddenly had a ton of headroom and dynamics previously lost to the loudness wars.
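As a sketch of what a delivery check against that spec might look like (the function and the measured numbers are purely illustrative; real measurements would come from a loudness meter, not this code):

```python
# Atmos Music ceiling as described above: -18 LUFS integrated, -3 dB True Peak.
MUSIC_SPEC = {"max_integrated_lufs": -18.0, "max_true_peak_dbtp": -3.0}

def passes_spec(integrated_lufs: float, true_peak_dbtp: float, spec: dict) -> bool:
    """Both measurements must stay at or below the spec's ceilings."""
    return (integrated_lufs <= spec["max_integrated_lufs"]
            and true_peak_dbtp <= spec["max_true_peak_dbtp"])

print(passes_spec(-18.4, -3.2, MUSIC_SPEC))  # True - safely under both ceilings
print(passes_spec(-17.1, -2.5, MUSIC_SPEC))  # False - too hot; platforms may reject it
```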

ATMOS PODCAST LOUDNESS - THE QUEST FOR SPEC

I'm not convinced the Atmos Podcast spec should be as loud as the Atmos Music spec.

In my experience so far, -18 LUFS feels pretty loud for dialogue-based content coming out of 12+ speakers at 79-82 dB SPL.

I’ve been able to hit that level with some of the projects I’ve up-mixed from stereo assets where the “mastering” processing was baked into the stems, but man, it’s loud. I usually turn down my monitors to work on projects like that.

Getting my native Atmos designs to -18 without pushing the True Peak over -3 dB requires more limiting than sounds good to my ears. So my mixes have naturally come in about 2-4 LU lower.

To steal a phrase from someone whose name I can’t remember - with Atmos, “Space is the new loudness.”

PODCASTS ARE MORE LIKE TV & FILM THAN MUSIC

Rather than blindly adopting the Atmos Music loudness standard for podcasts, I suggest we look at the more dialogue-focused specs of TV/Film mixes for guidance.

TV/Film streamers (Netflix, Apple, Amazon, Disney, etc.) have loudness specs ranging between -22 and -27 LUFS integrated with a TP of less than -2 dB. They also have loudness range and dialogue gating specs, but let’s ignore those for now.

My gut and ears tell me that -20 to -22 LUFS integrated (±1 dB) with a -3 to -2 dB TP could be a solid loudness spec for Atmos Podcast mixes.

It's louder than most TV/Films because podcasts are more of a mobile format.

It's quieter than music, which matches how stereo podcast levels compare to stereo music mix levels.

It also retains the -2 to -3 dB TP level, which "should" prevent Atmos mixes from sounding clipped or distorted when folded down onto the various consumer devices.

I’m not suggesting everyone needs to adopt this exact spec. It’s just a suggestion.

If -18 LUFS / -3 dB TP works for you - go for it.

I’ll likely settle in at -22 to -20 LUFS integrated with a -2 to -3 dB TP; we’ll see!
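To pull the numbers from this section into one place (the podcast row is my proposal above, not an official spec):

```python
# Loudness targets discussed in this section, for side-by-side comparison.
specs = {
    "Atmos Music (Dolby)":   {"integrated_lufs": "-18",        "true_peak_db": "-3"},
    "TV/Film streamers":     {"integrated_lufs": "-27 to -22", "true_peak_db": "under -2"},
    "Podcast (my proposal)": {"integrated_lufs": "-22 to -20", "true_peak_db": "-3 to -2"},
}

for name, s in specs.items():
    print(f"{name}: {s['integrated_lufs']} LUFS integrated, TP {s['true_peak_db']} dB")
```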

It’s early days - we get to decide.

But let’s let our ears guide us on the podcast front the same way they did when creating the music spec.

So that’s that, for now!

I could probably fill another bunch of pages on Atmos for podcasts and will certainly have more thoughts as I move into my second year with Atmos.

As always - feel free to reach out with comments and suggestions - I enjoy hearing from people who read these rantings. 🙂