

Image Processing Techniques for Audio - Part 1
Author: dhomas trenn
Published by: CU-Amiga magazine (UK)
Date: November 1997

Image processing software provides all sorts of interesting functions for manipulating images: morphing, rippling, convolutions, etc. What if we could apply some of these processes to audio files? It is a great idea, but if you try to get an image processor to load in a standard format audio file it is not surprising that it will fail. But that does not mean it is impossible...

From the computer's point of view, image data and audio data are no different from each other. It is all just numbers to a computer. As far as the computer is concerned, you could just as easily listen to an image file as look at one. Unfortunately, very little software, if any, lets you do this kind of thing directly. But with a little creative exploration it can be done.

Before we get started, it is important to understand some of the inner-workings of this process. There is one major difference between audio and image data. Audio files are single dimensional, in that they are played back one data item (sample) at a time in a continuous stream (width). Images, however, are two dimensional in that they have width and height and are displayed as such. In order to relate the two, we need to understand how audio and image processing software handles data differently.

For discussion purposes, we will use a small sampling of raw data (Figure 1). As audio data, this is processed as a single stream of data played back from left to right. In order to use this as image data though, it needs to have width and height. We want to be sure to use all the data, so it is necessary to choose our dimensions appropriately.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Figure 1

We will start with a simple example that simulates the single dimension of audio data to recreate a simple reverse waveform audio effect. To do so, we will use width = 16 and height = 1. The image processor's equivalent of reverse waveform is the flip function. A horizontal (left-right) flip of the data in Figure 1 will result in the data in Figure 2. This gives us the reversed waveform that we wanted. Note that a vertical flip would not have altered the data at all. Alternatively, we could have used width = 1 and height = 16 with a vertical flip, to achieve the same effect. Read on to find out why.

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Figure 2

That was interesting but not very exciting or new. We could do this much more easily with most audio waveform editing software. To better demonstrate the usefulness of using image processing in this way, we will start again, but with width = 4 and height = 4. When loading the raw data in Figure 1, the image processor will arrange the data in a table based on the dimensions you give it. Data will be read in row by row, from left to right, moving to the next row when the full width is reached. If there is insufficient data, the remaining cells will be filled with zeroes. If there is too much data, the extra will be ignored. In this case, we have just the right amount of data, arranged as in Figure 3.

 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15

Figure 3
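The loading rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ImageFX or any particular image processor is implemented; the function name to_grid is made up for this example.

```python
def to_grid(samples, width, height):
    """Arrange a flat sequence into rows, the way a raw image loader might.

    Hypothetical helper for illustration only.
    """
    needed = width * height
    data = list(samples)[:needed]          # too much data: the extra is ignored
    data += [0] * (needed - len(data))     # too little data: fill with zeroes
    # Cut the flat data into rows of `width` values each.
    return [data[r * width:(r + 1) * width] for r in range(height)]


samples = list(range(16))                  # the raw data from Figure 1
grid = to_grid(samples, 4, 4)
for row in grid:
    print(row)
```

With width = 4 and height = 4 the rows printed match Figure 3. Trying other dimensions with the same 16 values shows the zero fill and the truncation at work.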

Because the data is now arranged two dimensionally, a horizontal flip (a single dimension operation) will have a different effect, resulting in Figure 4. If we were to then reorganize or store this data one dimensionally, we would have data as in Figure 5. This will cause some interesting things to happen to the sound. Instead of reversing the whole waveform as before, it divides it into four separate parts with four values in each and reverses each part individually, all in one simple step. To do the same thing with an audio waveform editor would have required four reverse processes each with a different set range.

 3  2  1  0
 7  6  5  4
11 10  9  8
15 14 13 12

Figure 4

3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12

Figure 5
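To see why a horizontal flip of two dimensional data reverses the sound in separate chunks, the step from Figure 3 to Figure 5 can be sketched in Python. The helper names flip_horizontal and flatten are assumptions made for this example and do not come from any image processing package.

```python
def flip_horizontal(grid):
    """Mirror every row left to right (hypothetical helper)."""
    return [row[::-1] for row in grid]


def flatten(grid):
    """Store the rows one after another as a single stream again."""
    return [value for row in grid for value in row]


width = 4
samples = list(range(16))                                   # Figure 1
grid = [samples[i:i + width] for i in range(0, len(samples), width)]  # Figure 3
flipped = flatten(flip_horizontal(grid))                    # Figures 4 and 5
print(flipped)
```

Changing width changes the length of each reversed chunk, which is the parameter worth experimenting with on real audio.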

In order to do a proper waveform reverse with data organized in two dimensions, it is necessary to flip both horizontally and vertically. If we now flip the data from Figure 4 vertically (up-down), we get Figure 6. Storing this data would give a reverse waveform, again, as in Figure 2.

15 14 13 12
11 10  9  8
 7  6  5  4
 3  2  1  0

Figure 6
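The claim that a horizontal flip followed by a vertical flip gives a plain reverse can be checked with a short Python sketch. The function name reverse_via_flips is invented for this example, and the sketch assumes the data fills the grid exactly.

```python
def reverse_via_flips(samples, width):
    """Flip a row-by-row grid horizontally, then vertically, and flatten it.

    Hypothetical helper; assumes len(samples) is an exact multiple of width.
    """
    if len(samples) % width != 0:
        raise ValueError("length must be a multiple of width")
    rows = [samples[i:i + width] for i in range(0, len(samples), width)]
    rows = [row[::-1] for row in rows]     # horizontal flip (Figure 4)
    rows = rows[::-1]                      # vertical flip (Figure 6)
    return [value for row in rows for value in row]


samples = list(range(16))
for width in (1, 2, 4, 8, 16):
    # Whatever width is chosen, the result is the fully reversed waveform.
    assert reverse_via_flips(samples, width) == samples[::-1]
print(reverse_via_flips(samples, 4))
```

Note that this only holds when width times height matches the amount of data exactly. If the loader had to fill the last row with zeroes, those zeroes end up at the start of the reversed sound.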

Now that we have a better understanding of what happens internally, we will try working with some real data. For this tutorial, we will use ImageFX 1.52 from the coverdisk of CU-Amiga - June 1995 to do the data manipulation. However, the methods described here can be easily applied to most image processing programs. We will also be using two audio utilities: Play16 for audio playback and SOX for audio format conversion.

STEP 1: Preparing the audio file.

The problem is that it is very unlikely that you will find an AIFF, WAV or other audio file format loader or saver in an image processor. It is also unlikely that you will find an image data loader or saver in an audio processor. Fortunately, there is a storage format that is common between audio and image data formats and that is RAW.

We have included a sample IFF audio file for this tutorial: Piano.iff (Figure 7). It is an 8-bit sound with a sampling rate of 16,780 Hz. To begin, we need to convert the sound file to RAW format. SOX is smart enough to recognize what we want it to do, so from the CLI:

SOX Piano.iff Piano.raw

is sufficient. This will result in the creation of a new file called Piano.raw.

Figure 7

STEP 2: Loading the audio file.

Most image processors do not specifically contain a RAW module, and ImageFX is no exception. The SCULPT image format, however, is a RAW data format and ImageFX contains two SCULPT modules: SCULPT GREY and SCULPT RGB. The difference between the two is that SCULPT GREY is an 8-bit data format while SCULPT RGB is 24-bit. Because our sample audio file is 8-bit, we will be working in greyscale.

Figure 8

Load Piano.raw. When prompted for the format, select SCULPT GREY (Figure 8). Because the RAW image format does not contain the dimensions, ImageFX will prompt you for the width and height. It is important to use dimensions that are large enough to accommodate the entire audio file. Generally, we want dimensions such that WIDTH * HEIGHT = RAW BYTES. Piano.raw is 86,700 bytes so enter 300 for the width and 289 for the height (Figure 9).

Figure 9
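For other sound files, suitable dimensions can be found by listing the whole-number pairs that multiply to the file size. The following Python sketch is one way to do that; the function name candidate_dimensions is an assumption made for this example.

```python
import math


def candidate_dimensions(n_bytes):
    """Return (width, height) pairs with width * height == n_bytes.

    Hypothetical helper. Pairs are ordered from the flattest (n_bytes x 1)
    to the most nearly square, with width >= height.
    """
    pairs = []
    for height in range(1, math.isqrt(n_bytes) + 1):
        if n_bytes % height == 0:
            pairs.append((n_bytes // height, height))
    return pairs


dims = candidate_dimensions(86700)     # size of Piano.raw in bytes
print(dims[-1])                        # the most nearly square pair
```

For 86,700 bytes the most nearly square pair is 300 by 289, the dimensions used above. If a file size has no convenient pair, a slightly larger height can be used instead, since missing data is filled with zeroes on loading.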

We have instructed the software to load this audio data as an 8-bit greyscale image, so the data will be represented as shades of grey. When loaded as an image, the Piano sound looks very different (Figure 10).

Figure 10

STEP 3: Processing the loaded data.

To perform a reverse process on the sound we need to use the horizontal and vertical flip transformations, as explained earlier. Select TRANSFORM (Figure 11) and then FLIP HORIZONTAL. Then TRANSFORM, again, and FLIP VERTICAL.

Figure 11

STEP 4: Saving the data.

In order to play back the new reversed sound we need to save it as a RAW data file. Select SAVE and then SCULPT for the Save Format. ImageFX knows that this is a greyscale image so it will use the appropriate SCULPT GREY 8-bit format. Select SAVE AS and name the file Piano-Backward.raw.

STEP 5: Listening to the modified sound.

We could convert the new RAW file to IFF format using SOX before playing it back, but it is not necessary. However, because the sound is in RAW format we have to tell the audio player what sampling rate to play it back at.

Play16 FREQ=16780 Piano-Backward.raw.grey

will do what we need. We have used the original sampling rate of Piano.iff for reference here, but you can try whatever rate you want. Note that ImageFX automatically appends a suffix ".grey" to the name you give it.
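A RAW file carries no header, which is why the player has to be told the sampling rate. On a system without Play16, one alternative is to wrap the raw bytes in a WAV header using Python's standard wave module, as sketched below. The function name raw8_to_wav_bytes is made up for this example, and the sketch assumes the raw data is mono, 8-bit and signed (as IFF 8SVX samples are); if your converter wrote unsigned bytes, pass signed=False.

```python
import io
import wave


def raw8_to_wav_bytes(raw, rate, signed=True):
    """Wrap headerless 8-bit mono samples in a WAV container.

    Hypothetical helper. ASSUMPTION: the input is signed 8-bit unless
    signed=False is given. WAV stores 8-bit samples as unsigned, so
    signed input is shifted by 128 (equivalent to toggling the top bit).
    """
    if signed:
        raw = bytes(b ^ 0x80 for b in raw)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(1)        # 1 byte per sample = 8-bit
        wav.setframerate(rate)     # the rate a raw file cannot store itself
        wav.writeframes(raw)
    return buf.getvalue()


# Tiny stand-in for the contents of a raw file, so the sketch runs as-is.
demo_raw = bytes([0, 64, 127, 128, 192, 255])
wav_bytes = raw8_to_wav_bytes(demo_raw, 16780)
print(len(wav_bytes), "bytes of WAV data")
```

To use it on a real file, read the raw file in binary mode, pass its contents and the desired rate to the function, and write the result to a file ending in .wav.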

STEP 6: The weird and potentially wonderful.

Earlier we talked about the strange effect that happens if we only flip in one direction. To hear it, do one more FLIP VERTICAL transformation and save the result as Piano-Horizontal.raw. Because we have already flipped vertically once, this flips the image back again, leaving in effect just a horizontal flip. To hear this weird thing:

Play16 FREQ=16780 Piano-Horizontal.raw.grey

We have only touched on the basic procedure here. Next time, we will delve a little deeper into this process. In the meantime, explore this technique further and if you come up with anything particularly exciting please drop me an email. These ideas are presented only as a guide to possibilities. Hopefully they will be used as a starting point to something new and wonderful.