AUDIO HEADERS AND ROUTINES FOR ANALYSING THEM

R.E.E.Robinson
Speech, Hearing, and Language Research Centre
Macquarie University

INTRODUCTION


There are many different audio file formats. Some are historical, some are proprietary, and some are our own. They tend to grow or change unpredictably, and new ones keep appearing. We can use tools that are provided, updated, or withdrawn at other people's discretion. Alternatively, we can use our own tools, which can be changed to suit our needs and are not dictated by other people's requirements. We have the source code, and a competent programmer can use or change it as required. Sometimes a script may be an alternative; whether it is suitable depends on the execution speed and versatility required.

To make these routines easy to use, they are modular, perform only one function each, are self-descriptive, have built-in debugging, and take a variety of inputs. They are updated from time to time as needs dictate.

These routines grew from a need to play audio files in 1990 on the Sun workstation under the SunOS 4.1 operating system, and were extended to Solaris and other Sun architectures. Our first Sun had a primitive audio device that performed poorly and did nothing like the maker claimed; indeed, the maker didn't know what their audio device did. The routines were then extended to bypass the restrictive licensing systems imposed by software companies.

The existing routines read WAVE, ESPS, SSFF, AU, and TEXT file headers and write ESPS and WAVE file headers. There is an additional routine that determines the file type. The routines have self-explanatory names and are:

read_au_file_header
read_esps_file_header
read_ssff_file_header
read_text_file_header
read_wav_file_header
read_file_type
write_esps_sd_file_header
write_wav_file_header

More routines may be added as required.

HOW TO USE THE HEADERS

The routines can be used in a program that you write. They are written in the C programming language and have been compiled to objects ready for linking. They are located in the /home/accounts/ray/headers directory. Simply call them by name and declare the parameters. They require some information, and they return other information. The source code has comments that explain each routine's arguments, the internal workings, and the modification history.

If you wish to change these routines, add a comment at the top to declare your change, the author, and the date. Use the compile script in the same directory. It has the same name as the routine with a c prepended. For example, if you change the read_file_type.c routine, then use the cread_file_type script to compile it. It is a good idea to rename a copy of the old file, or to work in your own directory, just in case an unrecoverable error is made.

When called, a read routine opens the file, analyses the header, closes the file, and returns the information. No attempt was made to keep the file open, as returning file pointers is less flexible and narrows the programmer's choice. Reopening the file later incurs no real speed penalty with modern computers and operating systems, as files are usually cached and a further disk access and delay does not occur.

The routines are called in this way:

read_wav_file_header (arg1, arg2, arg3, arg4, arg5);

where the first argument is the file name, the second is a buffer, and the following arguments are header dependent. The last argument is the debug flag. The returned values are not scaled or converted; they are simply extracted from the header and returned as they are. The programmer may change them or not, as required.

The routines allow files to be opened in the normal way, or to be piped to the main program. This complicates matters a little. When using a pipe, the file is read character by character from Stdin. When the read_file_type routine is called, it processes the first 32 characters of the file to determine the type. It is difficult to rewind Stdin, so these characters are stored in a buffer and returned by the routine. This buffer can then be passed to the other read header routines for processing. In this way the information is preserved. Argument two is the buffer, and it is only used for pipes.
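
The buffering of the first 32 characters of a pipe can be sketched as below. This is an illustration only and is not taken from the source of read_file_type; the function name peek_header and the constant HEADER_PEEK are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HEADER_PEEK 32  /* bytes examined to determine the file type */

/* Read up to HEADER_PEEK bytes from an already open stream (for example
 * stdin) into buf, zero filling any unused tail. Returns the number of
 * bytes actually read. The caller keeps buf and hands it to the next
 * header routine, because a pipe cannot be rewound. */
size_t peek_header(FILE *fp, unsigned char buf[HEADER_PEEK])
{
    size_t n;

    assert(fp != NULL && buf != NULL);
    memset(buf, 0, HEADER_PEEK);
    n = fread(buf, 1, HEADER_PEEK, fp);
    return n;
}
```

After the call, the stream is positioned at byte 33, so a later header routine must start its analysis from the saved buffer and only then continue reading from the stream.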

The debug flag is normally set to zero and the routine performs silently, unless there is an error. If the debug flag is set to one, then the analysis is reported step by step and the information can be examined for errors. Each line of information also names the routine from which the debugging information comes. The debug flag can be controlled from the command line, provided the calling routine passes it on. The routines will report information like:

read_wav_file_header: opened file speech8.wav
read_wav_file_header: read 12 bytes from riff chunk
read_wav_file_header: found RIFF id ok
read_wav_file_header: found WAVE id ok
read_wav_file_header: riff length is 26878
read_wav_file_header: found (fmt ) chunk of length 16
read_wav_file_header: format length is 16
read_wav_file_header: data type is 1
read_wav_file_header: number of channels is 1
read_wav_file_header: sample rate is 8000
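
A calling program could produce the same style of report and take its debug flag from the command line roughly as follows. This is a sketch under assumptions: the helper names debug_report and debug_from_args and the -d option are hypothetical and do not come from the routines themselves.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Print one debugging line prefixed with the routine name when debug is
 * non-zero. Returns the number of characters written, or 0 when silent. */
int debug_report(int debug, const char *routine, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(routine != NULL && fmt != NULL);
    if (!debug)
        return 0;
    n = printf("%s: ", routine);
    va_start(ap, fmt);
    n += vprintf(fmt, ap);
    va_end(ap);
    n += printf("\n");
    return n;
}

/* Return 1 if the (hypothetical) option "-d" appears among the command
 * line arguments, otherwise 0. */
int debug_from_args(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "-d") == 0)
            return 1;
    return 0;
}
```

The value returned by debug_from_args would then be passed as the last argument of each header routine.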

EXAMPLES

The source code of p_codec.c shows an application that uses several of these reading routines in the one program. This is located in /home/accounts/ray/codec. The source code of esps2wav.c and wav2esps.c shows two applications that use the writing routines for conversion between audio formats. They are located in /home/accounts/ray/wav.

DETERMINING FILE TYPE

There is a routine that determines what type of file we are dealing with. It can return several different file types. Call the routine this way:

read_file_type (filename, &type, buf, debug);

These are declared in the calling program as:
char *filename;
unsigned char buf[32];
int type, debug;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.sd or a pipe name like Stdin.
The second argument called type is initially empty and the type of file is returned. It will return a number where: 3 = au, 4 = wave, 7 = ESPS, 12 = text, 13 = ssff, or 0 = unknown. Future file types will be: 1 = headerless, 2 = AFsp, 5 = AIFF-C, 6 = NIST SPHERE, 8 = IRCAM, 9 = SPPACK, 10 = INRS-Telecom, and 11 = AIFF.
The third argument called buf is initially empty and is returned empty for a normal file call. It is returned filled with the first 32 characters of the file upon return from a Stdin filename call.
The fourth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_file_type: opened file bong.au
read_file_type: read 32 bytes
read_file_type: found AU id

Put debugging code in the calling program, which can be activated from the command line, for example:
if (debug==1) { if(type==11) printf("AIFF audio file type\n"); }
as this will complement the debugging code in the routine, and will aid debugging during program writing and error diagnosis in the future.
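
As an illustration of how a type number might be derived from the 32 byte buffer and then reported, here is a minimal sketch. It is not the code of read_file_type. The function names classify_header and type_name are hypothetical, the ".snd" and "RIFF"/"WAVE" signatures are the well known ones for AU and WAVE files, and the "SSFF" signature is an assumption about how SSFF headers begin. ESPS and text detection are left out.

```c
#include <assert.h>
#include <string.h>

/* Type numbers as listed in this document. */
#define TYPE_UNKNOWN 0
#define TYPE_AU      3
#define TYPE_WAVE    4
#define TYPE_SSFF    13

/* Classify a file from its first 32 bytes. Only three signatures are
 * checked here; the ESPS and text tests are omitted. */
int classify_header(const unsigned char buf[32])
{
    assert(buf != NULL);
    if (memcmp(buf, ".snd", 4) == 0)
        return TYPE_AU;
    if (memcmp(buf, "RIFF", 4) == 0 && memcmp(buf + 8, "WAVE", 4) == 0)
        return TYPE_WAVE;
    if (memcmp(buf, "SSFF", 4) == 0)   /* assumed SSFF signature */
        return TYPE_SSFF;
    return TYPE_UNKNOWN;
}

/* Translate a type number into a printable name for debugging output. */
const char *type_name(int type)
{
    switch (type) {
    case TYPE_AU:   return "au";
    case TYPE_WAVE: return "wave";
    case 7:         return "ESPS";
    case 12:        return "text";
    case TYPE_SSFF: return "ssff";
    default:        return "unknown";
    }
}
```

A calling program with debugging enabled could then print type_name(type) after the call to read_file_type.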

SUN FILES (AU)

There is a routine to read the header of a Sun type audio file. These normally have an extension of .au but sometimes have other extensions such as .snd. This routine will return several different parameters extracted from the header. It is a simple header. Call the routine this way:

read_au_file_header (filename, buf, &offset, &frequency, &channels, &data_length, &coding, debug);

These are declared in the calling program as:
char *filename;
unsigned char buf[32];
int offset, frequency, channels, data_length, coding, debug;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.au or a pipe name like Stdin.
The second argument called buf is empty for a normal file call and this routine ignores it. If this routine is called after determining the file type from a pipe or Stdin filename, it then contains the first 32 chars of the file, and the buffer contents will be the first part of the header analysed.
In the third argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
In the fourth argument called frequency is returned the sample frequency of the data in the audio file.
In the fifth argument called channels is returned the number of data channels in the file, which would normally be 1, or maybe 2 for dual channel files.
In the sixth argument called data_length is returned the size of the data portion of the file. This is optional and so it may sometimes contain zero.
In the seventh argument called coding is returned the type of data in the file. The type is a number where: MULAW_8 = 1, LINEAR_8 = 2, LINEAR_16 = 3, LINEAR_24 = 4, LINEAR_32 = 5, AFLOAT = 6, ADOUBLE = 7.
The eighth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_au_file_header: opened file bong.au
read_au_file_header: format = au file
read_au_file_header: offset = 48
read_au_file_header: data length = 12446
read_au_file_header: coding = 1
read_au_file_header: frequency = 8000
read_au_file_header: channels = 1

The standard structure of au file headers is located in /home/apps/s32cdsp/include/audio.h
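
For readers without access to audio.h, the sketch below decodes the fixed part of an AU header from a buffer. It is not the source of read_au_file_header; the function names be32 and parse_au_header are hypothetical. It relies on the usual AU layout of six big-endian 32 bit words: the ".snd" magic, data offset, data size, encoding, sample rate, and channel count.

```c
#include <assert.h>
#include <string.h>

/* Assemble a big-endian 32 bit value from four bytes. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Decode the six fixed words of an AU header held in hdr (at least 24
 * bytes). Returns 0 on success or -1 if the ".snd" magic is missing. */
int parse_au_header(const unsigned char *hdr, int *offset, int *data_length,
                    int *coding, int *frequency, int *channels)
{
    assert(hdr != NULL);
    if (memcmp(hdr, ".snd", 4) != 0)
        return -1;
    *offset      = (int)be32(hdr + 4);
    *data_length = (int)be32(hdr + 8);
    *coding      = (int)be32(hdr + 12);
    *frequency   = (int)be32(hdr + 16);
    *channels    = (int)be32(hdr + 20);
    return 0;
}
```

Because the words are assembled byte by byte, the sketch gives the same result on big-endian and little-endian machines.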

ESPS FILES (SD)

READING AN ESPS FILE HEADER

There is a routine to read the header of an ESPS/WAVES type file. These normally have an extension of .sd but sometimes have other extensions such as .d. The ESPS header is very complicated and may contain many headers, as an additional header is added each time an ESPS routine processes the file. This routine will return several different parameters extracted from the header. This routine does not require a license. Call the routine this way:

read_esps_file_header (filename, buf, &offset, &Dfreq, &Dstart, &record_size, &leng, &mach, &nd, &nf, &nl, &ns, &nc, columns, debug);

These are declared in the calling program as:
char *filename, columns[6000];
unsigned char buf[32];
int offset, record_size, leng, mach, debug;
double Dfreq, Dstart, nd, nf, nl, ns, nc;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.sd or a pipe name like Stdin.
The second argument called buf is empty for a normal file call and this routine ignores it. If this routine is called after determining the file type from a pipe or Stdin filename, it then contains the first 32 chars of the file, and the buffer contents will be the first part of the header analysed.
In the third argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
In the fourth argument called Dfreq is returned the sample frequency.
In the fifth argument called Dstart is returned the start time of the file, which is useful if the file was created as a slice of another file, as it can be related to the original file. It is usually zero.
In the sixth argument called record_size is returned the size in bytes of the data that was sampled at each tick of the sample clock. For example, a single channel 8 bit sampled file would have a record size of 1, whereas a 10 channel 16 bit sampled file would have a record size of 20.
In the seventh argument called leng is returned the number of data records.
In the eighth argument called mach is returned a number that indicates what hardware was used to originally record this file. The number can be: MASSCOMP_CODE = 1, SUN3_CODE = 2, CONVEX_CODE = 3, SUN4_CODE = 4, HP300_CODE = 5, SUN386i_CODE = 6, DS3100_CODE = 7, MACII_CODE = 8, SG_CODE = 9, HP800_CODE = 10, VAX_CODE = 11, DG_AVIION_CODE = 12, APOLLO_68K_CODE = 13, APOLLO_10000_CODE = 14, HP400_CODE = 15, CRAY_CODE = 16, SONY_RISC_CODE = 17, SONY_68K_CODE = 18, STARDENT_3000_CODE = 19, IBM_RS6000_CODE = 20, HP700_CODE = 21, DEC_ALPHA_CODE = 22, SOLARIS_86_CODE = 23, LINUX_CODE = 24, or UNKNOWN_CODE = 99.
In the ninth argument called nd is returned the number of items of type double in each data record.
In the tenth argument called nf is returned the number of items of type float in each data record.
In the eleventh argument called nl is returned the number of items of type long in each data record.
In the twelfth argument called ns is returned the number of items of type short in each data record.
In the thirteenth argument called nc is returned the number of items of type char in each data record.
In the fourteenth argument called columns is returned any text that is found in the header.
The fifteenth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_esps_file_header: opened file speech8.sd
read_esps_file_header: reading the file preamble
read_esps_file_header: read 32 bytes from the preamble
read_esps_file_header: data offset 3333 bytes
read_esps_file_header: record size 2 bytes
read_esps_file_header: machine code is 4
read_esps_file_header: read 3301 bytes from the header
read_esps_file_header: 0 doubles in record
read_esps_file_header: 0 floats in record
read_esps_file_header: 0 longs in record
read_esps_file_header: 1 shorts in record
read_esps_file_header: 0 chars in record
read_esps_file_header: 13421 data records in file
read_esps_file_header: FT_FEA file type
read_esps_file_header: frequency=8000.000000
read_esps_file_header: start time=0.000000
read_esps_file_header: TYPTXT comment added by parker:  There's usually a valve.
read_esps_file_header: COMMENT sdtofea - speech.sd
read_esps_file_header: CWD sparc1:/sun4_home2/production/products/esps.sun4/demo
read_esps_file_header: SOURCE <stdin>

The standard structure of ESPS file headers is located in /usr/esps/include/esps/header.h
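
The returned values can be cross checked against each other, as in the sketch below. It treats nd, nf, nl, ns, and nc as counts of items per record, which is how the debugging output above reports them. The function names are hypothetical, and the item sizes of 8, 4, 4, 2, and 1 bytes are assumptions about the on-disk layout rather than values taken from header.h.

```c
#include <assert.h>

/* Bytes occupied by one record, given how many items of each type it
 * holds. The item sizes (8, 4, 4, 2, 1) are assumptions about the
 * on-disk layout, not values taken from header.h. */
long esps_record_bytes(double nd, double nf, double nl, double ns, double nc)
{
    assert(nd >= 0 && nf >= 0 && nl >= 0 && ns >= 0 && nc >= 0);
    return (long)(nd * 8 + nf * 4 + nl * 4 + ns * 2 + nc * 1);
}

/* Expected file size: header length plus all data records. */
long esps_expected_file_bytes(int offset, int record_size, int leng)
{
    assert(offset >= 0 && record_size >= 0 && leng >= 0);
    return (long)offset + (long)record_size * (long)leng;
}
```

With the values from the debugging output above (one short per record, offset 3333, 13421 records), the record size comes to 2 bytes and the expected file size to 30175 bytes. A mismatch with record_size or with the real file size would suggest a damaged or unusual header.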

WRITING AN ESPS FILE HEADER

There is a routine to write the header of an ESPS type sd file. The ESPS header is very complicated with many fields of different sizes and types. This routine will fill those fields from the supplied parameters and return the header size. This routine does not require a license. Call the routine this way:

write_esps_sd_file_header (outname, &offset, &size, &frequency, &nrec, debug);

These are declared in the calling program as:
char outname[20];
int offset, size, frequency, debug;
long nrec;

where:
The first argument called outname is the name of the file to be written. It can contain a valid file name like speech.sd.
In the second argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
The third argument called size is the size of the data record. It assumes that the data will always be shorts and thus size will be 2 for a single channel file or 4 for a dual channel file.
The fourth argument called frequency is the sample frequency.
The fifth argument called nrec is the number of data records in the file.
The sixth argument called debug is used to report information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

write_esps_sd_file_header: filename=AH.sd
write_esps_sd_file_header: offset=44
write_esps_sd_file_header: size=2
write_esps_sd_file_header: frequency=22050
write_esps_sd_file_header: number of records=34714
write_esps_sd_file_header: opened file AH.sd ok
write_esps_sd_file_header: filling the ESPS FEA file header preamble
write_esps_sd_file_header: filling the ESPS FEA file dummy header fields
write_esps_sd_file_header: filling the ESPS FEA file header common
write_esps_sd_file_header: Wrote ESPS file preamble of 32 bytes
write_esps_sd_file_header: Wrote ESPS file header of 441 bytes
write_esps_sd_file_header: offset=473

The standard structure of ESPS file headers is located in /usr/esps/include/esps/header.h

The calling program uses this routine to create a header with all the appropriate fields correctly filled, and no actual data in the file. The calling program will then reopen the file and append the audio data.
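
The reopen and append step in the calling program might look like the sketch below. The function name append_samples is hypothetical. It assumes the samples are 16 bit shorts and writes them in the byte order of the host machine, so any byte swapping needed for the target format is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Reopen an existing file in binary append mode and add n 16 bit samples
 * after whatever is already there (the header). Samples are written in
 * the byte order of the host machine. Returns the number of samples
 * written, or -1 if the file cannot be opened or closed cleanly. */
long append_samples(const char *name, const short *samples, size_t n)
{
    FILE *fp;
    size_t written;

    assert(name != NULL && samples != NULL);
    fp = fopen(name, "ab");
    if (fp == NULL)
        return -1;
    written = fwrite(samples, sizeof(short), n, fp);
    if (fclose(fp) != 0)
        return -1;
    return (long)written;
}
```

A caller would compare the return value with the number of records it passed to the header routine and treat any difference as a write error.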

WAVE FILES (WAV)

READING A WAV FILE HEADER

There is a routine to read the header of a WAV type file. These normally have an extension of .wav but some NIST-SPHERE files also use this extension. The original WAV files had 3 parts called chunks, the Wave Chunk, Format Chunk, and Data Chunk. Microsoft has added a 4th chunk called the Fact Chunk. The WAV header was originally fairly simple, but many software writers have added undocumented variable size additions and sometimes extended the existing standard chunks. This routine will handle all WAV files of old and new types and also skip any non-standard chunks. It will return several different parameters extracted from the header. Call the routine this way:

read_wav_file_header (filename, buf, &offset, &dtype, &channels, &frequency, &drate, &nbytes, &nbits, &dlength, debug);

These are declared in the calling program as:
char *filename;
unsigned char buf[32];
int offset, dtype, channels, frequency, drate, nbytes, nbits, dlength, debug;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.wav or a pipe name like Stdin.
The second argument called buf is empty for a normal file call and this routine ignores it. If this routine is called after determining the file type from a pipe or Stdin filename, it then contains the first 32 chars of the file, and the buffer contents will be the first part of the header analysed.
In the third argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
In the fourth argument called dtype is returned the type of data in the file. The data type can be: 1 = PCM, 0x0101 = MU_LAW, 0x0102 = A_LAW, 0x0103 = ADPCM.
In the fifth argument called channels is returned the number of data channels in the file, which would normally be 1, or maybe 2 for dual channel files.
In the sixth argument called frequency is returned the sample frequency.
In the seventh argument called drate is returned the data rate of the file in bytes per second.
In the eighth argument called nbytes is returned the number of bytes sampled at the clock rate.
In the ninth argument called nbits is returned the size of the data sample.
In the tenth argument called dlength is returned the size of the data in the file.
The eleventh argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_wav_file_header: opened file newjwing.wav
read_wav_file_header: read 12 bytes from riff chunk
read_wav_file_header: found RIFF id ok
read_wav_file_header: found WAVE id ok
read_wav_file_header: riff length is 237606
read_wav_file_header: found (fmt ) chunk of length 16
read_wav_file_header: format length is 16
read_wav_file_header: data type is 1
read_wav_file_header: number of channels is 2
read_wav_file_header: sample rate is 22050
read_wav_file_header: data rate is 44100 bytes per second
read_wav_file_header: bytes per sample is 2
read_wav_file_header: bits per sample is 8
read_wav_file_header: found (data) chunk of length 237570
read_wav_file_header: offset is 44
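
The idea of walking the chunks and skipping the ones that are not needed can be sketched as follows. This is not the source of read_wav_file_header: it works on a file image already held in memory, the function names are hypothetical, and it relies on the standard RIFF layout (4 byte chunk id, 4 byte little-endian length, body padded to an even number of bytes).

```c
#include <assert.h>
#include <string.h>

/* Little-endian readers for the 16 and 32 bit fields of a RIFF file. */
static unsigned int le16(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

static unsigned long le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Walk the chunks of a WAV image held in memory. Fills the format fields
 * from the "fmt " chunk, skips chunks it does not recognise, and stops at
 * the "data" chunk. Returns 0 on success, -1 on a malformed image. */
int parse_wav_image(const unsigned char *img, size_t len, int *offset,
                    int *dtype, int *channels, int *frequency, int *drate,
                    int *nbytes, int *nbits, int *dlength)
{
    size_t pos = 12;   /* first chunk follows "RIFF", length, "WAVE" */
    int have_fmt = 0;

    assert(img != NULL);
    if (len < 12 || memcmp(img, "RIFF", 4) != 0 ||
        memcmp(img + 8, "WAVE", 4) != 0)
        return -1;
    while (pos + 8 <= len) {
        const unsigned char *ck = img + pos;
        unsigned long cklen = le32(ck + 4);

        if (memcmp(ck, "data", 4) == 0) {
            if (!have_fmt)
                return -1;
            *offset  = (int)(pos + 8);   /* first data byte */
            *dlength = (int)cklen;
            return 0;
        }
        if (cklen > len - pos - 8)
            return -1;                   /* chunk runs past the image */
        if (memcmp(ck, "fmt ", 4) == 0) {
            if (cklen < 16)
                return -1;
            *dtype     = (int)le16(ck + 8);
            *channels  = (int)le16(ck + 10);
            *frequency = (int)le32(ck + 12);
            *drate     = (int)le32(ck + 16);
            *nbytes    = (int)le16(ck + 20);
            *nbits     = (int)le16(ck + 22);
            have_fmt = 1;
        }
        /* any other chunk is skipped; bodies are padded to even length */
        pos += 8 + cklen + (cklen & 1);
    }
    return -1;
}
```

With only the three original chunks and a 16 byte format chunk, the offset comes out as 44, as in the debugging output above. Each extra chunk before the data chunk moves the offset further into the file.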

WRITING A WAV FILE HEADER

There is a routine to write the header of a wav type file. It writes the WAV file with 3 parts called chunks, the Wave Chunk, Format Chunk, and Data Chunk. Call the routine this way:

write_wav_file_header (outname, &woffset, &dtype, &channels, &frequency, &drate, &nbytes, &nbits, &dlength, debug);
 

These are declared in the calling program as:
char outname[20];
int woffset, dtype, channels, frequency, drate, nbytes, nbits, debug;
long dlength;

where:
The first argument called outname is the name of the file to be written. It can contain a valid file name like speech.wav.
In the second argument called woffset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
The third argument called dtype is the type of coding used in the file. The coding can be: 1 = PCM, 0x0101 = MU_LAW, 0x0102 = A_LAW, 0x0103 = ADPCM.
The fourth argument called channels is the number of data channels in the file, which would normally be 1, or maybe 2 for dual channel files.
The fifth argument called frequency is the sample frequency.
The sixth argument called drate is the data rate of the file in bytes per second.
The seventh argument called nbytes is the number of bytes sampled at the clock rate.
The eighth argument called nbits is the size of the data sample.
The ninth argument called dlength is the size of the data in the file.
The tenth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

write_wav_file_header: filename is speech21.wav
write_wav_file_header: offset is 44
write_wav_file_header: format length is 16
write_wav_file_header: data type is 1
write_wav_file_header: number of channels is 1
write_wav_file_header: sample rate is 20000
write_wav_file_header: data rate is 40000 bytes per second
write_wav_file_header: bytes per sample is 2
write_wav_file_header: bits per sample is 16
write_wav_file_header: data length is 384000
write_wav_file_header: opened file speech21.wav ok
write_wav_file_header: filling the riff chunk
write_wav_file_header: Wrote wav file riff chunk of 12 bytes
write_wav_file_header: filling the fmt chunk
write_wav_file_header: Wrote wav file fmt chunk of 24 bytes
write_wav_file_header: filling the data chunk
write_wav_file_header: Wrote wav file data chunk of 8 bytes

The calling program uses this routine to create a header with all the appropriate fields correctly filled, and no actual data in the file. The calling program will then reopen the file and append the audio data.
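
To show how the header fields relate to one another, the sketch below builds a 44 byte PCM header in memory. It is not the source of write_wav_file_header, and the function names are hypothetical. It assumes PCM data with a whole number of bytes per sample, and derives nbytes as channels times nbits / 8 and drate as frequency times nbytes.

```c
#include <assert.h>
#include <string.h>

/* Store 16 and 32 bit values in little-endian order. */
static void put_le16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
}

static void put_le32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Fill hdr (44 bytes) with the RIFF, fmt and data chunk headers for PCM
 * data. nbytes and drate are derived from the other fields. Returns the
 * header length, which is also the offset of the first data byte. */
int build_wav_header(unsigned char hdr[44], int channels, int frequency,
                     int nbits, long dlength)
{
    int nbytes = channels * (nbits / 8);     /* bytes per sample frame */
    long drate = (long)frequency * nbytes;   /* bytes per second */

    assert(hdr != NULL && channels > 0 && nbits > 0 && nbits % 8 == 0);
    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, (unsigned long)(36 + dlength));
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16UL);                /* format chunk length */
    put_le16(hdr + 20, 1U);                  /* data type 1 = PCM */
    put_le16(hdr + 22, (unsigned int)channels);
    put_le32(hdr + 24, (unsigned long)frequency);
    put_le32(hdr + 28, (unsigned long)drate);
    put_le16(hdr + 32, (unsigned int)nbytes);
    put_le16(hdr + 34, (unsigned int)nbits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, (unsigned long)dlength);
    return 44;
}
```

For the example in the debugging output above (1 channel, 20000 Hz, 16 bits), this gives 2 bytes per sample and a data rate of 40000 bytes per second.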

MACQUARIE UNIVERSITY FILES (SSFF)

There is a routine to read the header of a Macquarie University type audio file. These normally have an extension of .SSFF_sd but may also have other extensions. This routine will return several different parameters extracted from the header. It is a simple text based header using key strings. Call the routine this way:

read_ssff_file_header (filename, buf, &offset, &freq, &stime, machine, columns, debug);

These are declared in the calling program as:
char *filename, columns[6000], machine[20];
unsigned char buf[32];
int offset, debug;
float freq, stime;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.SSFF_sd or a pipe name like Stdin.
The second argument called buf is empty for a normal file call and this routine ignores it. If this routine is called after determining the file type from a pipe or Stdin filename, it then contains the first 32 chars of the file, and the buffer contents will be the first part of the header analysed.
In the third argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
In the fourth argument called freq is returned the sample frequency of the data in the audio file.
In the fifth argument called stime is returned the start time of the file, which is useful if the file was created as a slice of another file, as it can be related to the original file. It is usually zero.
In the sixth argument called machine is returned the type of hardware that the file was recorded on. This will determine if any byte swapping is required.
In the seventh argument called columns is returned any other text that is found in the header.
The eighth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_ssff_file_header: opened file msajc001.SSFF_sd
read_ssff_file_header: format = ssff file
read_ssff_file_header: machine = SPARC
read_ssff_file_header: start = 0.000000
read_ssff_file_header: frequency = 20000.000000
read_ssff_file_header: offset = 177
read_ssff_file_header: strings = Comment CHAR Created by CONV
Column samples SHORT 1
max_value DOUBLE 13975.000000

This routine may need to be extended to extract the data type and number of channels.
The standard structure of SSFF file headers is located in /home/publish/src/mu-plus/SSFF/ssff_formats.h
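
A key string header of this kind can be scanned line by line, as in the sketch below. It is not the source of read_ssff_file_header. The function name parse_ssff_text is hypothetical, and the key names Machine, Record_Freq, and Start_Time, together with a line of dashes ending the header, are assumptions about the SSFF layout that should be checked against ssff_formats.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a text header held in hdr (NUL terminated) line by line. Known
 * keys fill machine, freq and stime; the header is taken to end after
 * the first line that starts with "-----". Returns the offset of the
 * first byte after that line, or -1 if no such line is found.
 * The key names and the terminator line are assumptions. */
int parse_ssff_text(const char *hdr, char machine[20], float *freq,
                    float *stime)
{
    const char *line = hdr;
    char buf[128];

    assert(hdr != NULL && machine != NULL && freq != NULL && stime != NULL);
    machine[0] = '\0';
    *freq = 0.0f;
    *stime = 0.0f;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t n;

        if (eol == NULL)
            return -1;              /* text ended before the terminator */
        if (strncmp(line, "-----", 5) == 0)
            return (int)(eol - hdr) + 1;
        n = (size_t)(eol - line);   /* copy one line so sscanf stays in it */
        if (n >= sizeof buf)
            n = sizeof buf - 1;
        memcpy(buf, line, n);
        buf[n] = '\0';
        if (strncmp(buf, "Machine ", 8) == 0) {
            if (sscanf(buf + 8, "%19s", machine) != 1)
                machine[0] = '\0';
        } else if (strncmp(buf, "Record_Freq ", 12) == 0) {
            if (sscanf(buf + 12, "%f", freq) != 1)
                *freq = 0.0f;
        } else if (strncmp(buf, "Start_Time ", 11) == 0) {
            if (sscanf(buf + 11, "%f", stime) != 1)
                *stime = 0.0f;
        }
        line = eol + 1;
    }
    return -1;
}
```

Lines with other keys, such as the Column and Comment lines in the debugging output above, are simply passed over here; a fuller version would collect them into the columns argument.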

TEXT FILES (TA)

There is a routine to read the header of a text type audio file. These normally have an extension of .ta but may also have other extensions. This routine will return several different parameters extracted from the header. It is a simple text based header using key strings. Call the routine this way:

read_text_file_header (filename, &offset, &freq, debug);

These are declared in the calling program as:
char *filename;
int offset, debug;
float freq;

where:
The first argument called filename is the name of the file to be examined. It can contain a valid file name like speech.ta.
In the second argument called offset is returned the number of bytes to skip to the first data byte in the file. It is actually the header length.
In the third argument called freq is returned the sample frequency of the data in the audio file.
The fourth argument called debug is used to report analysis information. Set it to zero for normal silent operation. Setting it to one, to turn on debugging messages, may result in an output like:

read_text_file_header: opened file bab.ta
read_text_file_header: format = audio text file
read_text_file_header: offset = 918
read_text_file_header: frequency = 19980.000000

This routine may need to be extended to handle Stdin by adding a buf argument.

CONCLUDING REMARKS

These are useful routines for programmers who handle several different types of audio file headers. They have been written to meet published formats, have been tested on many different audio files, and have succeeded in all tests. As formats change, new files may cause these routines to stop working correctly. If so, the routines will be corrected. Please report any problems.