WAVLEVEL is a tool to adapt the volume of a set of WAV files (maybe from different CDs) so the listener hears no annoying change in loudness from song to song. This is useful when you compile a sampler with many different artists.
I.) THEORY
There are 4 different strategies:
- Use maximum dynamic range for each song (NORMalize) (call wavlevel -mn) Every song is scaled so that its data stretches from -32768 to +32767, so the maximum loudness of each song is (roughly) the same. This method produces good results if the ratio (peak_volume/average_volume) is roughly the same in all songs, but gives unsatisfying results if you combine a song that has only one loud part with one that is uniformly loud.
- Use average peak-to-peak as a norm value (AMPLitude) (call wavlevel -ma) Not only the loudest part is considered, but an average of all peak-to-peak values of the song is calculated. The song with maximum loudness will get the full available range, the other songs are scaled so that their average peak-to-peak values match.
- Use logarithmic peak-to-peak as a norm value (LogAMPLitude) (call wavlevel -ml) This works just like the peak-to-peak method, but the average is taken over log10(amplitude). This takes into account that the human ear has a logarithmic characteristic. Theoretically. ;-) Practically the muting/amplification effect is too weak.
- Use average energy in wave as a norm value (POWER) (call wavlevel -mp) The concept is the same as before, but every song will have the same average energy per wave. I approximate the energy as frequency*amplitude.
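To make the four norm values a bit more concrete, here is a minimal Python sketch. It is not WAVLEVEL's actual code. It assumes a song has already been split into wave cycles, each given as a hypothetical (peak_to_peak, length_in_samples) pair, and it assumes the frequency of a cycle is simply 1/length. The names peak_gain and norm_values are made up for illustration.

```python
import math

FULL_SCALE = 32767  # largest positive value of a 16-bit sample


def peak_gain(samples):
    """Gain that stretches the loudest sample to full scale (the -mn idea)."""
    peak = max(abs(s) for s in samples)
    return FULL_SCALE / peak


def norm_values(cycles):
    """Return (average pp, average log10(pp), average energy) for one song.

    cycles: list of (peak_to_peak, length_in_samples) pairs, pp > 0.
    Energy is approximated as frequency * amplitude, with the
    frequency ASSUMED to be 1 / length_in_samples.
    """
    n = len(cycles)
    avr_pp = sum(pp for pp, _ in cycles) / n
    avr_lpp = sum(math.log10(pp) for pp, _ in cycles) / n
    avr_en = sum(pp / length for pp, length in cycles) / n
    return avr_pp, avr_lpp, avr_en


# A song with only one loud part versus a uniformly loud song.
one_loud_part = [(1000, 50)] * 99 + [(60000, 50)]
uniformly_loud = [(60000, 50)] * 100

print(norm_values(one_loud_part))
print(norm_values(uniformly_loud))
```

Both toy songs have the same maximum peak-to-peak value, so scaling by the peak alone would treat them alike, while their average peak-to-peak values differ widely. That is the case where -mn disappoints and an average-based norm value helps.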
The method that has proven to deliver the best results is the amplitude average. Here is the output from
wavlevel -c -v -ma *.wav :
processing bob_marley.jamming.wav
analyzing volume statistics...
min. sample=-29691, max. sample=30579
avr. pp=5231, max. pp=43300 in 214404 cycles
avr. lpp=3.453030, max. lpp=4.636488
avr. en=343.923632, max. en=5587.500000
processing frank_sinatra.new_york_new_york.wav
analyzing volume statistics...
min. sample=-32669, max. sample=32676
avr. pp=8630, max. pp=59473 in 316207 cycles
avr. lpp=3.637568, max. lpp=4.774320
avr. en=465.781139, max. en=9445.000000
processing gary_moore.friday_on_my_mind.wav
analyzing volume statistics...
min. sample=-27956, max. sample=25091
avr. pp=7135, max. pp=39081 in 574770 cycles
avr. lpp=3.686575, max. lpp=4.591966
avr. en=479.042950, max. en=3672.625000
processing pink_floyd.high_hopes.wav
analyzing volume statistics...
min. sample=-32768, max. sample=31910
avr. pp=5769, max. pp=54135 in 636927 cycles
avr. lpp=3.272179, max. lpp=4.733478
avr. en=255.377495, max. en=6459.000000
processing the_police.msg_inabottle.wav
analyzing volume statistics...
min. sample=-32768, max. sample=32767
avr. pp=8027, max. pp=61773 in 495523 cycles
avr. lpp=3.728423, max. lpp=4.790799
avr. en=853.328151, max. en=13311.000000
processing the_specials.monkey_man.wav
analyzing volume statistics...
min. sample=-24045, max. sample=20398
avr. pp=4011, max. pp=31682 in 299900 cycles
avr. lpp=3.392512, max. lpp=4.500813
avr. en=314.747845, max. en=8877.000000
global maximum dynamics ratio is 11.522 at normvalue 5231.000 and value range span = 60270.000.
global normvalue calculated to 5687.704.
processing bob_marley.jamming.wav
offset is -444
amplification is 1.087307
raising avr energy 343.924 -> 373.951, avr pp. 5231 -> 5687.704
processing frank_sinatra.new_york_new_york.wav
offset is -3
amplification is 0.659062
raising avr energy 465.781 -> 306.979, avr pp. 8630 -> 5687.704
processing gary_moore.friday_on_my_mind.wav
offset is 1432
amplification is 0.797155
raising avr energy 479.043 -> 381.872, avr pp. 7135 -> 5687.704
processing pink_floyd.high_hopes.wav
offset is 429
amplification is 0.985908
raising avr energy 255.377 -> 251.779, avr pp. 5769 -> 5687.704
processing the_police.msg_inabottle.wav
offset is 0
amplification is 0.708572
raising avr energy 853.328 -> 604.644, avr pp. 8027 -> 5687.704
processing the_specials.monkey_man.wav
offset is 1823
amplification is 1.418026
raising avr energy 314.748 -> 446.321, avr pp. 4011 -> 5687.704
That's how you read the output:
min. sample : minimum value found in data
max. sample : maximum value found in data
avr. pp : average peak to peak value of song
max. pp : maximum peak to peak value of song
avr. lpp : average log10(pp) of song
max. lpp : maximum log10(pp) of song
avr. en : average wave energy of song (pp*f)
max. en : maximum wave energy of song
In this run the loudest song (Frank Sinatra) was muted by a factor of 0.659, and the quietest song (The Specials) was amplified by a factor of 1.418.
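The numbers in the output can be checked by hand. The Python sketch below is a reconstruction from the printed values, not WAVLEVEL's source. It assumes that the dynamics ratio is (max. sample - min. sample) / avr. pp, that the global normvalue is a target span of 2*32766 divided by the largest ratio, that the offset centres the waveform (truncated toward zero), and that the amplification is normvalue / avr. pp. The target span in particular is a guess that happens to reproduce 5687.704.

```python
FULL_RANGE = 2 * 32766  # ASSUMED target span; chosen because it reproduces 5687.704

# (file, min. sample, max. sample, avr. pp) copied from the output above
songs = [
    ("bob_marley.jamming.wav", -29691, 30579, 5231),
    ("frank_sinatra.new_york_new_york.wav", -32669, 32676, 8630),
    ("gary_moore.friday_on_my_mind.wav", -27956, 25091, 7135),
    ("pink_floyd.high_hopes.wav", -32768, 31910, 5769),
    ("the_police.msg_inabottle.wav", -32768, 32767, 8027),
    ("the_specials.monkey_man.wav", -24045, 20398, 4011),
]


def dynamics_ratio(lo, hi, avr_pp):
    """Value range span of a song divided by its average peak-to-peak."""
    return (hi - lo) / avr_pp


def offset(lo, hi):
    """Shift that centres the waveform around zero, truncated toward zero."""
    return int(-(lo + hi) / 2)


# The song with the largest ratio limits how far everything can be scaled.
max_ratio = max(dynamics_ratio(lo, hi, pp) for _, lo, hi, pp in songs)
normvalue = FULL_RANGE / max_ratio

offsets = [offset(lo, hi) for _, lo, hi, _ in songs]
gains = [normvalue / pp for _, _, _, pp in songs]

print(f"max ratio {max_ratio:.3f}, global normvalue {normvalue:.3f}")
for (name, _, _, _), off, g in zip(songs, offsets, gains):
    print(f"{name}: offset {off}, amplification {g:.6f}")
```

Under these assumptions the sketch matches the offsets and amplification factors printed by wavlevel for all six files. The average energy scales by the same factor, for example 343.924 * 1.087307 is about 373.95.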
II.) EXAMPLES
In practice, when you have a directory full of WAVs you want to compile to a CD, just call
wavlevel -d -v -ma *.wav
It pays to have the -v option (verbosity on) there because then you can write down the normvalue used. (In the above output it would be 5231.000.) This is useful when you want to add another file to the collection. You needn't process all the others again (this wouldn't be optimal for the sound quality anyway) but simply run:
wavlevel -d -v -ma -n 5231.000 my_new_song.wav
And voila, the new song sounds just as loud as the others! (Of course, you must not change norming methods between the two runs. Normvalues are different for each method.)
