Digital Audio Standards - Quick Guide

Digital audio technology moves fast and is constantly innovating; this introduction brings you up to speed in no time.

Digital audio signals are described by three parameters (channels, sample rate and bit depth), each of which affects audio quality. For best quality, match the encoder settings to the source.

For example, when compressing an audio CD, encode it as 2 channel, 16 bit, 44.1 kHz.
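
To match the encoder to the source, it helps to know what the source actually is. The following is a minimal sketch that reads the three parameters from an uncompressed WAV file using Python's standard wave module. The function name describe_wav is an illustrative assumption and not part of any encoder.

```python
import wave


def describe_wav(source):
    """Return (channels, bit depth, sample rate in Hz) of a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        sample_rate = wav.getframerate()
    return channels, bit_depth, sample_rate
```

For a track ripped from an audio CD to WAV (say a hypothetical file track01.wav), describe_wav("track01.wav") would be expected to return (2, 16, 44100).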

Channels

Audio CDs contain 2 channels of audio, that is, 2 independent audio signals. The idea is that your hi-fi has two speakers and the listener sits in the middle, facing them. Your two ears detect the differences between the speakers (created during CD mastering). This gives depth to the audio reproduction (stereo separation) and places the vocalist in the center, between the two speakers.

Movies benefit more than music from extra speakers. Effects sometimes need to appear from behind, and this is easier to do when there are actual speakers at the rear. DVDs have 5.1 sound: 5 speakers, plus the ".1", which is the low-frequency subwoofer.

Why is music not 5.1? Traditionally, at a concert, all the sound appears to come from the front and nothing from behind, whereas in a car chase scene in a film the police sirens would be behind you. That is not to say music cannot improve with more speakers: certain tracks might try to place the listener in the middle of the audio. Given the choice between 2 very good speakers and 5 average ones, I would choose the two good speakers for a better music experience.

Channel Count    Common Name
1                Mono
2                Stereo
4                Quadraphonic
6                5.1
8                7.1
10               9.1

Frequency (Sample Rate, or Samples Per Second)

Sound is made up of pressure waves. A single constant wave has its frequency measured in hertz (Hz), that is, oscillations per second. Humans can hear from a lowest frequency of a few tens of Hz up to just below 20,000 Hz (20 kHz).

When talking about digital audio, frequency has a different meaning. It is the rate at which sound samples are recorded (the sample rate).

Imagine you were told the temperature outside once a day and your friend was told the temperature four times a day. Who would have the more accurate picture? Your friend.

The higher the sample rate, the more accurate the representation, up to a point. Humans cannot hear above 20 kHz, so reproducing 50,000 Hz would be a waste of space. Nyquist's theorem states that to reproduce a signal of a given frequency, it must be sampled (recorded) at more than twice that frequency. For example, a sample rate of 44.1 kHz can reproduce signals up to just under 22.05 kHz.
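
The Nyquist limit can be sketched in a few lines. This is an illustration only: the function names are assumptions made for this example, and alias_frequency models an ideal sampler with no filtering, which real converters avoid by filtering out content above the limit before sampling.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency (Hz) that a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency (Hz) at which a pure tone appears after ideal sampling.

    Tones below the Nyquist limit come back unchanged; tones above it
    fold back ("alias") into the range 0 .. sample_rate / 2.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_limit(44100))           # 22050.0
print(alias_frequency(18000, 44100))  # 18000
print(alias_frequency(30000, 44100))  # 14100
```

In this model an 18 kHz tone survives sampling at 44.1 kHz, while a 30 kHz tone cannot be represented and would show up as a false 14.1 kHz tone if it were not filtered out first.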

It just so happens that audio CDs have a sample rate of 44.1 kHz. So why is DVD audio 96 kHz, or 192 kHz? Is it a marketing ploy? Yes & No.

Yes, it is a ploy in that more appears to be better. It has already been said that an audio CD can reproduce a sound that has a higher frequency than people can hear.

No, as it is easier (cheaper) to build audio equipment that plays back an 18 kHz signal without distortion when it is fed audio sampled at 192 kHz rather than at 44.1 kHz. High-end gear would not have much distortion anyway, so for it there is little point in 96 or 192 kHz audio. It's just the cheaper consumer gear that improves.

Bit Depth (and Amplitude)

Consider two audio sine waves, A and B, where B has twice the amplitude of A.

B is louder, but it is not twice as loud as A. Perceived audio loudness works on a logarithmic scale. The human ear works this way so that the quietest mouse can be heard whilst the loudest jet is tolerated (there are many orders of magnitude between the two).
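
The logarithmic scale normally used for audio levels is the decibel (dB). As a small sketch (the function name is an assumption for this example), an amplitude ratio converts to decibels like this:

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude ratio (e.g. 2 for "twice the amplitude") to dB."""
    return 20 * math.log10(ratio)


print(round(amplitude_ratio_to_db(2), 2))   # 6.02
print(round(amplitude_ratio_to_db(10), 2))  # 20.0
```

Doubling the amplitude adds only about 6 dB. A common rule of thumb (an approximation, not an exact law) is that a sound needs to be roughly 10 dB higher to be perceived as twice as loud.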

Bit depth is the resolution at which audio samples (recordings) are stored. As an analogy, picture the same image stored at three bit depths: 8 bit, 16 bit and 24 bit.


8 bit has the worst detail. It looks coarse and for audio it sounds coarse, but there is not too much difference between 16 bit and 24 bit as they are both reaching the limits of perception.
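
To put rough numbers on this, each extra bit doubles the number of levels a sample can take. The sketch below uses the simple approximation that dynamic range is 20 x log10(number of levels), which ignores dither and noise shaping; the function names are assumptions for this example.

```python
import math


def quantization_levels(bit_depth):
    """Number of distinct values one sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Approximate dynamic range in dB (simple 20*log10(levels) estimate)."""
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
# 8 256 48.2
# 16 65536 96.3
# 24 16777216 144.5
```

Each bit is worth about 6 dB under this estimate.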

Audio CDs are 16 bit, whereas DVDs are 24 bit. Again, is it a marketing ploy? Yes & No.

Yes, as most people cannot hear the difference between the two.

No, as 16 bit audio CDs have been spoiled by the loudness race. CDs produced now are volume compressed: the quiet parts are pushed up louder, so that the track sounds louder when played on radio or TV. A 1980s CD would sound quiet in comparison to one from 2000.

The downside is that 16 bit CDs are no longer effectively 16 bit. The full audible range is not being used. 24 bit helps, but in the long run, the same fate (loudness war) might happen to 24 bit DVD-audio discs.

Compression

In audio, compression can have two meanings: volume compression, where the volume levels are 'compressed' to make the overall piece louder, and audio compression, which is used to reduce the file size. We are discussing audio compression, of which there are two types:

  1. Lossy - the majority of compressed audio files are lossy. When encoding, audio quality is sacrificed to achieve higher rates of compression. How much quality is lost depends on the encoder and the settings used. Bit rate plays the biggest role in determining final quality: higher bit rate files have better quality than lower bit rate files. Bit rate is normally given in kbps (kilobits per second).

    Bit rate can be fixed at the same value throughout the file, which is known as Constant Bit Rate, or CBR. Alternatively, the bit rate can vary on demand: an audio track might have quiet parts that can use a lower bit rate, whilst complex parts use a higher one. When the bit rate is allowed to change, it is called Variable Bit Rate, or VBR. Finally, there is Average Bit Rate (ABR), which is basically VBR with constraints. These constraints give the whole file a set average bit rate, so that the final file size can be roughly estimated (with VBR it could be any size).

    Typically a lossy 3 minute audio track might be roughly 3 to 4 MB in size (at 160 kbps), which is around 10:1 compression, or about 10% of its uncompressed size. Common lossy formats are MP3, Ogg Vorbis, Windows Media Audio (WMA) and Advanced Audio Coding (AAC, typically stored in an .m4a container).
     
  2. Lossless - audio compressed with a lossless encoder can be uncompressed to give exactly the same data (bit for bit) as the source file. It is without loss!

    Lossless is slowly gaining ground on lossy. The main advantage is that once your CD collection is ripped to lossless, that's it! No more re-ripping, unlike lossy, where the need to re-rip might present itself if a newer encoder is released. Lossless can be converted to any other lossless format without loss. Lossless can also be converted to any lossy format, with the same quality as ripping directly from the audio CD.

    The main reason lossless is held back is that the final compression rates (sizes) are nowhere near as good as lossy. A typical 3 minute audio track might be around 30 MB uncompressed. Lossless could compress this down to 15 MB, which is around 2:1 compression, or 50% of its uncompressed size.
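
The sizes quoted in this section are rounded, and they can be checked with some simple arithmetic. The sketch below assumes CD-quality source audio (2 channel, 16 bit, 44.1 kHz), a constant 160 kbps lossy bit rate and 1 MB = 1,000,000 bytes; the function names are assumptions for this example, and real files also carry a little extra for headers and tags.

```python
def pcm_bit_rate(channels, bit_depth, sample_rate_hz):
    """Bit rate of uncompressed audio, in bits per second."""
    return channels * bit_depth * sample_rate_hz


def size_mb(bit_rate_bps, seconds):
    """File size in MB (1 MB = 1,000,000 bytes) for a constant bit rate."""
    return bit_rate_bps * seconds / 8 / 1_000_000


TRACK_SECONDS = 3 * 60
LOSSY_BPS = 160_000  # 160 kbps, constant bit rate

cd_rate = pcm_bit_rate(2, 16, 44100)  # 1,411,200 bits per second

print(round(size_mb(cd_rate, TRACK_SECONDS), 1))    # 31.8 (uncompressed)
print(round(size_mb(LOSSY_BPS, TRACK_SECONDS), 1))  # 3.6 (lossy)
print(round(cd_rate / LOSSY_BPS, 1))                # 8.8 (compression ratio)
```

A lossless file at around 2:1 would come out at roughly half the uncompressed figure, although the exact ratio depends on the music and the encoder.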