
Image Processing Techniques for Audio - Part 2

Author: dhomas trenn
Published by: CU-Amiga magazine (UK)
Date: December 1997


Last month we introduced the unusual technique of using image processing software as a tool to manipulate audio data. This month, we will further apply the technique to create some interesting effects. We will also take a look at using the image processor to generate new sounds from scratch.

To hear the potential of this unique method of sound processing, listen to the song clips included in the SongClips/ sub-directory.

First, a quick recap of the processing method covered last month:

STEP 1: Preparing the audio file.

To load a sound file into image processing software we first need to convert it to a format common to both audio and image data: RAW. This can be done simply with a program like SOX:

SOX Piano.iff Piano.raw

STEP 2: Loading the audio file.

If the image processing software does not contain a RAW loader module, the SCULPT module can be used. For 8-bit audio files use SCULPT GREY. The loader will prompt you for the width and height to use for the image. Use dimensions that are large enough to accommodate the entire audio file: WIDTH * HEIGHT = RAW BYTES.

STEP 3: Processing the loaded data.

Perform whatever image processing functions you like on the loaded sound.

STEP 4: Saving the data.

If provided, save using a RAW saver module, otherwise use a SCULPT module.

STEP 5: Listening to the modified sound.

To hear the raw sound, use a program such as Play16 and indicate the frequency to play back the sound at:

Play16 FREQ=16780 Piano.raw
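
Step 2 asks for a width and height whose product equals the size of the raw file. Finding such a pair by hand can be tedious, so here is a minimal sketch in Python that lists the exact pairs for a given byte count. The function names are illustrative and are not part of SOX, Play16 or ImageFX.

```python
def dimension_pairs(n_bytes):
    """Return every (width, height) with width * height == n_bytes and width >= height."""
    pairs = []
    height = 1
    while height * height <= n_bytes:
        if n_bytes % height == 0:
            pairs.append((n_bytes // height, height))
        height += 1
    return pairs


def squarest_pair(n_bytes):
    """Pick the exact pair closest to a square, which is the easiest to view on screen."""
    return dimension_pairs(n_bytes)[-1]


# Byte counts implied by the width and height values used in this tutorial.
print(squarest_pair(300 * 289))  # (300, 289)
print(squarest_pair(256 * 256))  # (256, 256)
```

If the byte count is a prime number, the only exact pair is the full length by 1.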

Tutorial: Part 2

For the following tutorial, we will again use ImageFX 1.52 from the coverdisk of CU-Amiga - June 1995 to do the data manipulation. However, the methods described here can be easily applied to most image processing programs. For audio playback and audio format conversion we have included Play16 and SOX, respectively, in the SoundLab directory.

Also in the Sounds/ sub-directory we have included some sounds to experiment with. In the Sounds-Processed/ directory you will find some examples of these sounds after having been processed with ImageFX using the methods that are described here.

As with most of these processes, the effects of the data manipulation can be very different with different sounds. The solarize image effect is a good example of a process that can have very different results.

Load the Sounds/Piano.raw sound using a width of 300 and height of 289. From the toolbox, select COLOR and then SOLARIZE (Figure 1). This is a preset function so it does not have any options.


Figure 1

Then save the sound as Piano-Solarize.raw and listen to it with:

Play16 FREQ=16780 Piano-Solarize.raw

The effect is not drastic but makes the piano sound like it is swarming with noisy bees. Compare it to the original Piano.raw sound:

Play16 FREQ=16780 Sounds/Piano.raw
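
To reason about why solarize changes a sound, it can help to see what the operation does to individual sample bytes. The sketch below assumes the common definition of solarize, in which every value at or above a threshold is inverted. ImageFX's SOLARIZE preset may use a different threshold or curve, so the function name and the threshold of 128 are illustrative only.

```python
def solarize(data, threshold=128):
    """Invert every 8-bit value at or above the threshold; leave the rest unchanged."""
    return bytes(255 - v if v >= threshold else v for v in data)


# A rising ramp folds back on itself once it crosses the threshold.
ramp = bytes([0, 64, 127, 128, 192, 255])
print(list(solarize(ramp)))  # [0, 64, 127, 127, 63, 0]
```

Under this assumption, the upper half of the value range is folded downward, so the shape of the waveform changes in a way that depends heavily on the source material.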

Load the Sounds/Thunder.raw sound using a width of 250 and height of 281. Perform the same solarize function and save the result as Thunder-Solarize.raw. Listen and compare it to the original:

Play16 FREQ=16780 Thunder-Solarize.raw
Play16 FREQ=16780 Sounds/Thunder.raw

The effect here is very much like a low end audio filter. To hear a more obvious example of this filtering effect, perform the solarize function on the Sounds/Sine.raw sound. Load it using a width of 256 and height of 256. Perform the solarize and save it as Sine-Solarize.raw. The low end filtering is very obvious with this sound. Listen and compare with:

Play16 FREQ=9600 Sine-Solarize.raw
Play16 FREQ=9600 Sounds/Sine.raw

Image processing functions can have a very interesting effect on vocal samples. Listen to the Sounds/YMMind.raw sound to hear a normal voice speaking:

Play16 FREQ=9600 Sounds/YMMind.raw

Now, try the solarize function with the Sounds/YMMind.raw sound. Load it using a width of 200 and height of 141, solarize, save it as YMMind-Solarize.raw and listen with:

Play16 FREQ=9600 YMMind-Solarize.raw

This effect makes the voice sound very nasal and cheesy. Add a pocket protector and a pair of glasses with some white tape on the nose bracket and we have got a real nerd here!

It is easy to see that it is important to experiment with an effect on several different kinds of sounds. Otherwise, you might never know what potentially great surprises you have missed.

The rotate image effect can have some interesting and very drastic results.

Load the Sounds/YMMind.raw sound using a width of 200 and height of 141. From the toolbox, select ROTATE, set the angle to +28 degrees and select ANY ANGLE (Figure 2) to start the process.


Figure 2

Save it as YMMind-Rotate+28.raw and listen with:

Play16 FREQ=9600 YMMind-Rotate+28.raw

This is a very drastic change, but not a very useful one, unless you like raunchy sounds like this. Reload the original sound, or use ImageFX's undo function, and try the ROTATE effect with an angle of -90 degrees (Figure 2). Save it as YMMind-Rotate-90.raw and listen:

Play16 FREQ=9600 YMMind-Rotate-90.raw

This is still a drastic change from the original speaking voice, but the change in rotation angle gives a much nicer result for this sound. If you needed a good electricity sound then it could be a very useful one.
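
A 90 degree rotation is the easiest rotate setting to reason about, because it only reorders bytes. The sketch below treats the raw data as rows of width bytes and turns the grid a quarter turn clockwise. Whether ImageFX's -90 setting means clockwise or counter-clockwise is not confirmed here, and the function name is hypothetical. Any-angle rotations such as +28 also involve resampling, which this sketch does not attempt.

```python
def rotate_quarter_turn_clockwise(data, width, height):
    """Rotate a row-major width x height byte grid 90 degrees clockwise."""
    if len(data) != width * height:
        raise ValueError("width * height must equal the number of bytes")
    rows = [data[r * width:(r + 1) * width] for r in range(height)]
    out = bytearray()
    # Each original column, read bottom to top, becomes one output row.
    for c in range(width):
        for r in range(height - 1, -1, -1):
            out.append(rows[r][c])
    return bytes(out)


# A tiny 3 x 2 "image": rows are [1, 2, 3] and [4, 5, 6].
small = bytes([1, 2, 3, 4, 5, 6])
print(list(rotate_quarter_turn_clockwise(small, 3, 2)))  # [4, 1, 5, 2, 6, 3]
```

Bytes that were a full row apart in the original end up next to each other in the output, so the sound is reshuffled in time rather than filtered.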

The Laser-Copter

So far, we have only used the image processing software to perform effects on already existing sounds. We can also use it to generate new sounds from scratch.

To begin, we need to create a buffer to store the image that we are going to create. From the toolbox, select BUFFER and then CREATE BUFFER. Set the width to 1060, height to 226 and make it a GREYSCALE buffer (Figure 3). The width and height values given here determine the length of the sound file, so 1060 x 226 = 239560 bytes. At a playback rate of 16780, that will give us 14.28 seconds of sound.


Figure 3
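
The length calculation for this buffer can be repeated for any buffer size and playback rate. A minimal sketch, assuming 8-bit mono data (one byte per sample); the function name is illustrative:

```python
def duration_seconds(width, height, rate_hz):
    """Length of an 8-bit mono sound stored in a width x height greyscale buffer."""
    return (width * height) / rate_hz


print(round(duration_seconds(1060, 226, 16780), 2))  # 14.28
```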

Now that we have our buffer ready, let's create something to put in it. From the toolbox, select ALPHA and then CREATE. From the Create Alpha Channel requester select WAVE GENERATOR. Set the Wavelength to 30, Center X to 530, Center Y to 113, Angle to 0 and Dampening to 0 (Figure 4).


Figure 4

This function generates the wave in an alpha channel buffer, so when the function is complete we need to copy it to our image buffer so we can see it. To do this, from the toolbox select ALPHA and then COPY FROM ALPHA (Figure 5). Save the result as LaserCopter30.raw but do not listen to it just yet.


Figure 5

You can alter the pitch of the sound by changing the Wavelength value of the WAVE GENERATOR requester. Setting it to a higher value will generate a sound with a lower pitch, while a lower value will result in a higher pitched sound. Try the process again with a Wavelength of 5 and save it as LaserCopter5.raw. You can also change the pitch by using Play16's FREQ parameter, but this would change the length of the sound as well.
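
For readers without ImageFX, the idea of drawing a wave into a greyscale buffer can be approximated in a few lines of Python. This is only a sketch: the exact formula behind ImageFX's WAVE GENERATOR is not documented here, so the cosine ripple around a centre point, the function names and the omission of the Angle and Dampening settings are all assumptions.

```python
import math


def radial_wave(width, height, wavelength, center_x, center_y):
    """Fill a greyscale buffer with circular ripples around (center_x, center_y)."""
    out = bytearray(width * height)
    for y in range(height):
        for x in range(width):
            distance = math.hypot(x - center_x, y - center_y)
            # Map the cosine's -1..1 range onto unsigned bytes 0..255.
            level = 127.5 + 127.5 * math.cos(2 * math.pi * distance / wavelength)
            out[y * width + x] = int(round(level))
    return bytes(out)


def save_raw(path, data):
    """Write the buffer as headerless 8-bit data."""
    with open(path, "wb") as handle:
        handle.write(data)


# Same settings as the tutorial: 1060 x 226 buffer, centre at (530, 113).
wave30 = radial_wave(1060, 226, 30, 530, 113)
print(len(wave30))  # 239560
```

Under this assumed formula, the row that passes through the centre repeats every Wavelength bytes, so at a playback rate of 16780 a Wavelength of 30 gives roughly 16780 / 30, or about 559 cycles per second, on that row. A smaller Wavelength packs more cycles into each row, which matches the rise in pitch described above.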

Signed Vs. Unsigned Data

Another thing to understand is that image processing software works with its data as unsigned values. That means that no value is less than zero. This is generally the way that all image data is stored, regardless of the software being used. However, this is not necessarily the case for audio data. On platforms such as the Amiga and SGI, audio data is stored as signed, with positive and negative values, while on the Mac, PC and ST it is stored as unsigned values. It is important to keep this in mind when playing back sounds because they can sound very different if played back in the wrong format.
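
The difference between the two formats is easiest to see by reading the same byte both ways. The sketch below assumes 8-bit samples and the usual two's-complement convention for signed bytes; the function names are illustrative.

```python
def as_unsigned(byte_value):
    """Read a byte as an unsigned sample: 0..255, with the midpoint at 128."""
    return byte_value


def as_signed(byte_value):
    """Read the same byte as a two's-complement sample: -128..127, midpoint at 0."""
    return byte_value - 256 if byte_value >= 128 else byte_value


for raw in (0, 127, 128, 255):
    print(raw, as_unsigned(raw), as_signed(raw))
```

A gentle step from 127 to 128 in unsigned data turns into a jump from 127 to -128 when the same bytes are read as signed, which is why a sound played in the wrong format can be so harsh.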

The Laser-Copter sound that we just generated demonstrates this very well. The image processing software will create the image with all positive values. But, since the Amiga uses the signed data format for audio, Play16 will use that by default when playing back the sound. For this example, that assumption would be wrong. Try it:

Play16 FREQ=16780 LaserCopter30.raw


Figure 6

To hear the sound the proper way, it is necessary to tell Play16 that the data format in the raw file is unsigned and should be played back as such:

Play16 FREQ=16780 LaserCopter30.raw UNSIGNED


Figure 7

If you compare the audio waveforms in Figures 6 and 7 with the image in Figure 5 you can see that Figure 7 is the closer representation.

Instead of using the UNSIGNED option of Play16, you could first use SOX to convert the file:

SOX -u -b -r1000 LaserCopter30.raw -s LaserCopter30-2.raw

Here, -u indicates that the source file is in unsigned format, -b indicates that the source is in 8-bit byte format, and -s indicates that the destination file should be in signed format.

Note that SOX requires that a rate (-r) be given for the source file. For RAW data this value is not used, so it does not matter what value you use here. Refer to the SOX documentation for more details.

Having converted the format already, it is no longer necessary to specify the UNSIGNED option. So, this will now work:

Play16 FREQ=16780 LaserCopter30-2.raw
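
For 8-bit data, an unsigned-to-signed conversion of this kind comes down to simple arithmetic on each byte. The sketch below is not SOX's code, just the usual way of moving between the two 8-bit conventions; the function name is illustrative.

```python
def unsigned_to_signed_8bit(data):
    """Re-centre unsigned 8-bit samples (0..255) as signed bytes (-128..127)."""
    # Flipping the top bit is the same as subtracting 128 modulo 256.
    return bytes(v ^ 0x80 for v in data)


print(list(unsigned_to_signed_8bit(bytes([0, 128, 255]))))  # [128, 0, 127]
```

The printed values are the stored bytes: 128 is how -128 is stored in a signed byte, so the unsigned extremes 0 and 255 become -128 and 127, and the unsigned midpoint 128 becomes 0. Applying the same function twice returns the original data.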

The Big Finish

These unique processes can produce some very useful results. But like any great experimental technique, it takes time and patience to get to the really good results. The best things in life come from hard work, and this is no exception.

In many cases, it may be necessary to clean up the created sounds by using audio waveform editing software to filter unwanted noise or cut bad sections. Sometimes, even just a small section of a processed sound is enough to make a terrific percussion sound or some strange instrument.
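
If no waveform editor is at hand, cutting a short section out of a raw file can also be done with a few lines of code. A minimal sketch, assuming 8-bit mono data so that one byte equals one sample; the function name and the placeholder data are illustrative:

```python
def extract_section(data, rate_hz, start_s, end_s):
    """Return the bytes between two times in an 8-bit mono raw sound."""
    start = max(0, int(start_s * rate_hz))
    end = min(len(data), int(end_s * rate_hz))
    return data[start:end]


# Placeholder data: one second of constant samples at a 9600 playback rate.
sound = bytes([128]) * 9600
snippet = extract_section(sound, 9600, 0.25, 0.5)
print(len(snippet))  # 2400
```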

Use the results of this experimentation as a starting point. Further experiment with the ideas presented here and with the many other image processing effects available. Above all else, have fun! If you come up with anything particularly exciting please drop me an email.