The age-old problem of the so-called loud commercial is rearing its ugly head again, and this time it really hurts. In truth, this problem never went away; there were simply too many other things going on and too few people actually watching DTV signals. The difference this time around is that the benefits of digital audio, such as its wide dynamic range, seem to be making the old problem worse.
Analog systems had a legislated maximum modulation, and audio processing was added to the signal chain to prevent exceeding legal limits. As processing became more advanced, it also became better at correcting inconsistent loudness within a given channel. Loudness shifts from channel to channel, however, were never really tamed, due to differences in processor capabilities or chosen settings.
The Dolby Digital (AC-3) system offers over 100dB of dynamic range, a dramatic improvement over the limits of the analog system it replaces. But is all of this dynamic range useful or necessary? Probably not. The real question is whether it is possible to deliver programming and preserve its original dynamic range while still protecting viewers from loudness problems.
This is where metadata comes into play. Designed to accompany the audio data, metadata is descriptive information that conveys parameters such as dialogue loudness (i.e. the famous dialnorm parameter), reversible dynamic range control (DRC), the number of channels in the coded bitstream, how to downmix channels when reproducing from fewer speakers than encoded channels (e.g. 5.1 audio through stereo speakers), and various other less useful parameters.
The two parameters that directly impact consistent audio levels are the dialnorm (dialogue level) and DRC (dynamic range control) values. Dialnorm simply controls a 1dB per step attenuator with a 30dB range that is present in all consumer decoders. It can be thought of as a broadcaster-controlled volume control: consumers set their volume control to a given level, and broadcasters use dialnorm to adjust the loudness of programs so they remain consistent, in effect re-adjusting the volume behind the scenes on a program-by-program basis. It allows programs mixed at different levels to be integrated into a single audio stream with corresponding volume corrections applied at the consumer side.
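The attenuator behavior described above can be illustrated with a minimal sketch. It assumes the AC-3 convention that dialnorm is carried as a value from -1 to -31 (in dB relative to full scale) and that a decoder attenuates so that dialogue lands at -31; the function names are hypothetical and are not taken from any decoder's actual code.

```python
# Minimal sketch of dialnorm-driven attenuation (hypothetical helper names).
# Assumption: dialnorm ranges from -1 to -31 and the decoder's reference
# dialogue level is -31, giving a 0 to 30 dB attenuator in 1 dB steps.

DECODER_REFERENCE_DB = -31


def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Return the attenuation, in dB, that a decoder applies for a dialnorm value."""
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    return dialnorm - DECODER_REFERENCE_DB  # -31 -> 0 dB, -1 -> 30 dB


def apply_dialnorm(samples: list[float], dialnorm: int) -> list[float]:
    """Scale linear PCM samples by the attenuation implied by dialnorm."""
    gain = 10 ** (-dialnorm_attenuation_db(dialnorm) / 20)
    return [s * gain for s in samples]


if __name__ == "__main__":
    for value in (-31, -27, -20, -1):
        print(f"dialnorm {value:>3}: attenuate by {dialnorm_attenuation_db(value)} dB")
```

A program flagged with a dialnorm of -31 passes through untouched, while a louder program flagged at -20 is turned down by 11 dB, so both reach the viewer with dialogue at about the same level.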
DRC is similar in that gain control values are calculated and generated upstream, but applied at the consumer side. This allows a consumer to tailor the dynamic range of programming to their listening environment. For example, if someone wanted to watch a movie late at night and have very controlled dynamics, they could choose to fully apply the values so as not to wake the children. If it were daytime and full dynamic range were desired, the DRC gain control values could be ignored and the original audio dynamic range would be reproduced.
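As a rough illustration of gain values that are generated upstream but applied at the viewer's discretion, the sketch below scales a transmitted DRC gain by a listener preference between 0 (ignore DRC, full dynamic range) and 1 (apply DRC fully). The `scale` parameter and the function names are assumptions made for illustration, not the actual decoder interface.

```python
# Minimal sketch of consumer-side DRC scaling (hypothetical names).
# Assumption: the encoder sends one gain value in dB per audio block, and the
# viewer chooses how much of it to apply, from 0.0 (none) to 1.0 (all).


def scaled_drc_gain_db(drc_gain_db: float, scale: float) -> float:
    """Return the portion of the transmitted DRC gain the viewer wants applied."""
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be between 0.0 and 1.0")
    return drc_gain_db * scale


def apply_drc(block: list[float], drc_gain_db: float, scale: float) -> list[float]:
    """Apply the scaled DRC gain to one block of linear PCM samples."""
    gain = 10 ** (scaled_drc_gain_db(drc_gain_db, scale) / 20)
    return [s * gain for s in block]


if __name__ == "__main__":
    loud_block = [0.5, -0.25]
    print(apply_drc(loud_block, -6.0, 1.0))  # late-night viewing: full compression
    print(apply_drc(loud_block, -6.0, 0.0))  # daytime viewing: original dynamics
```

Because the gain is carried alongside the audio instead of being baked into it, setting the scale to zero recovers the original mix exactly, which is what makes the compression reversible.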
However, both of these important values require correct settings in order to be effective. Audio must be measured to find the proper dialnorm value, and the limitations of the wideband DRC system in Dolby Digital (AC-3) must be observed, or else the system can work against the audio and make the problem worse. These values can also be intentionally mis-set in an effort to trick the system. Proper function of DRC relies on dialnorm being set correctly. Setting it correctly means measuring so-called "normal spoken dialogue," which does not include whispering, shouting, singing, or making strange wounded-animal-like sounds (or any other non-dialogue sounds). So, even if the normal dialogue levels of a program are matched, when the passages of normal dialogue are short and followed by long passages of very quiet dialogue, viewers will likely turn up the volume control, only to be surprised or annoyed when the normal dialogue returns, now too loud.
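To see why a mis-set dialnorm makes the problem worse, consider the arithmetic in the sketch below. It assumes the AC-3 convention of a -31 decoder reference level, and the measured and declared values are invented purely for illustration.

```python
# Minimal sketch of the loudness error caused by a mis-set dialnorm.
# Assumptions: the decoder attenuates by (31 + declared dialnorm) dB, and the
# example levels below are made up for illustration only.

DECODER_REFERENCE_DB = -31


def decoded_dialogue_level_db(measured_db: float, declared_dialnorm: int) -> float:
    """Dialogue level after the decoder applies the declared dialnorm."""
    attenuation_db = 31 + declared_dialnorm
    return measured_db - attenuation_db


def loudness_error_db(measured_db: float, declared_dialnorm: int) -> float:
    """Positive result: the program plays louder than the reference level."""
    decoded = decoded_dialogue_level_db(measured_db, declared_dialnorm)
    return decoded - DECODER_REFERENCE_DB


if __name__ == "__main__":
    # Dialogue actually measures -20, but the stream declares -27.
    print(loudness_error_db(-20, -27))
    # The same program with a correctly declared value.
    print(loudness_error_db(-20, -20))
```

The error is simply the difference between the measured dialogue level and the declared value, so declaring a program quieter than it really is makes it play louder than its neighbors.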
This is where DRC is supposed to kick in and narrow the gap between the very quiet and normal dialogue sections of a given program. Assuming that normal spoken dialogue is the reference, all other parts of the programming must be a comfortable distance away from this reference. Research on comfortable dynamic ranges for listening suggests figures nearer to one tenth of the more than 100dB available. Keeping programming within such a range can be done manually with artistic intent, hopefully tempered by the needs of the typical viewer, or it can be done automatically.
One automatic answer is to just compress the dynamic range of all programming, similar to what is done today in analog broadcasting; however, this will ruin the artistic intent of the original mix. Perhaps the better way is for mixers to realize that the large majority of television viewers are hearing their mix through the speakers built into the television set, and that if dynamic range is not appropriately controlled by some means, viewers will tune away and broadcasters will lose money as ratings and ad rates drop. Happily, there is new professional equipment that spans the gap between pure metadata and pure dynamic range control, striking a balance between what is set and what is actually required: the better the mix is, the less the processor needs to do.
The frightening and permanent end to creative control, though, will come if dynamic range compressors end up built into every television set. With professional gear, metadata can be analyzed and used, but some recently announced consumer-side processes afford no control to program producers. It is a good time for mixers to realize the power they have and how best to match it to audience needs.
Tim J. Carroll is the founder and president of Linear Acoustic Inc. (www.linearacoustic.com) based in Lancaster, PA.