Tutorial 6: SoundFont structure

The SoundFont structure

The overall structure of a SoundFont is described above. The Preset (also often referred to as an “instrument”, a “program”, or a “patch”) is the feature that is visible to the outside. Presets are combined into Banks. Each Bank can hold 128 Presets, numbered either 0 to 127 or 1 to 128 (SynthFont uses the range 0-127). There can be 128 melodic Banks (numbered 0-127 or 1-128), so a SoundFont file can hold up to 128 × 128 = 16,384 melodic Presets, far more than most files ever need. Very few SoundFonts have more than a few Banks in use. Usually these are “variation banks”, i.e. there may be a slightly different Acoustic Piano in Bank 1 Preset 0 (1:0) than in 0:0. Banks 0 to 127 are called the Melodic banks, while bank 128 is reserved for Percussion presets. The General MIDI standard defines MIDI channel 9 (on the scale 0 to 15) as the Percussion channel, and hence all MIDI Programs in this channel will automatically call for a preset in bank 128.
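The bank and preset numbering can be sketched in a few lines of Python. This is only an illustration of the rules described here, not SynthFont's own code: the function name resolve_preset and the constant names are made up for the example, and 0-based numbering is assumed throughout.

```python
PRESETS_PER_BANK = 128    # presets 0..127 in each bank
MELODIC_BANKS = 128       # melodic banks 0..127
PERCUSSION_BANK = 128     # reserved for percussion presets
PERCUSSION_CHANNEL = 9    # on the scale 0..15

def resolve_preset(channel, bank, program):
    """Return the (bank, preset) pair a MIDI program on this channel calls for."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be in 0..15")
    if not 0 <= program < PRESETS_PER_BANK:
        raise ValueError("program must be in 0..127")
    if channel == PERCUSSION_CHANNEL:
        # Programs on the percussion channel always map to bank 128.
        return (PERCUSSION_BANK, program)
    if not 0 <= bank < MELODIC_BANKS:
        raise ValueError("melodic bank must be in 0..127")
    return (bank, program)

# Upper limit on melodic presets in one file.
max_melodic_presets = MELODIC_BANKS * PRESETS_PER_BANK
```

With these assumptions, resolve_preset(0, 1, 0) gives the variation piano at 1:0, while any program on channel 9 ends up in bank 128 regardless of the bank requested.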

A Preset may have one or more Layers. In the simplest case there is only one Layer, which then has to cover the whole key range and also the whole velocity range. The sound data used for a Layer is called an Instrument. The Instrument structure usually contains all the information needed to create the sound. Hence, in the simplest case, the Preset has only one Layer, referring to one single Instrument. NOTE: Instruments are pooled, meaning that several Presets may share the same Instrument. This is cost effective but also a risk. When you edit an Instrument, be sure to check what happens with ALL Presets making use of this Instrument.

A Layer may contain parameters (usually called “generators”) that can alter the sound of an Instrument. So, although two Presets may share one single Instrument, they may sound different because of the data in their Layers.
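The relationship between Presets, Layers and pooled Instruments can be pictured with a small data model. The sketch below is a simplification for illustration only: the class names follow the terms used in this tutorial, the generator name "filter_cutoff_cents" is a made-up placeholder, and the helper presets_using is hypothetical. It shows the check recommended above, listing every Preset that makes use of a given Instrument before you edit it.

```python
from dataclasses import dataclass, field

@dataclass
class Instrument:
    name: str

@dataclass
class Layer:
    instrument: Instrument                 # reference into the shared pool
    key_range: tuple = (0, 127)            # simplest case: whole key range
    vel_range: tuple = (0, 127)            # and whole velocity range
    generators: dict = field(default_factory=dict)

@dataclass
class Preset:
    name: str
    bank: int
    number: int
    layers: list = field(default_factory=list)

def presets_using(instrument, presets):
    """Names of all presets with at least one layer referring to this instrument."""
    return [p.name for p in presets
            if any(layer.instrument is instrument for layer in p.layers)]

piano = Instrument("Acoustic Piano")
presets = [
    Preset("Piano", 0, 0, [Layer(piano)]),
    # Same Instrument, but the Layer's generator data changes the sound.
    Preset("Piano (dark)", 1, 0,
           [Layer(piano, generators={"filter_cutoff_cents": -1200})]),
]
affected = presets_using(piano, presets)
```

Here both Presets point at the very same Instrument object, so an edit to it would be heard in both, while the generator stored in the second Layer only affects "Piano (dark)".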

Layers are often used to separate key and velocity ranges, but this is not a rule. Many Presets are “artificially” created by layering several Instrument sounds on top of each other. Another very common use is to add stereo to the sound with Layers. The wave samples used in a SoundFont file are always monophonic. A stereophonic wave sample is hence split up into two monophonic wave samples (labeled e.g. “Horn (L)” and “Horn (R)”), and an Instrument may be created for each of these horn sounds. In the Layers, these two horn Instruments are then placed on top of each other, but panned hard left and right. Artificial stereo may be created in the same way by using one Instrument in two Layers that are slightly detuned up and down (and panned, of course).
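The two stereo techniques can be sketched as follows. This is an illustration under stated assumptions, not the layout of a real SoundFont file: Layers are plain dictionaries, pan is expressed on an invented scale from -1.0 (hard left) to 1.0 (hard right), detuning is given in cents, and the default of 5 cents is an arbitrary example value.

```python
def stereo_layers(left_instrument, right_instrument):
    """True stereo: two mono Instruments layered and panned hard left and right."""
    return [
        {"instrument": left_instrument, "pan": -1.0},
        {"instrument": right_instrument, "pan": 1.0},
    ]

def artificial_stereo_layers(instrument, detune_cents=5):
    """Artificial stereo: one Instrument used twice, detuned down and up, and panned."""
    return [
        {"instrument": instrument, "pan": -1.0, "fine_tune_cents": -detune_cents},
        {"instrument": instrument, "pan": 1.0, "fine_tune_cents": detune_cents},
    ]

horn = stereo_layers("Horn (L)", "Horn (R)")
wide_organ = artificial_stereo_layers("Organ")
```

In both cases the Preset ends up with two Layers covering the same key and velocity ranges, so every note triggers two voices.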

Instruments, in turn, are structured in Splits. Usually, a split is used for a certain key and/or velocity range. A split always uses one single wave sample for the sound and has generators that can alter the sound. Instruments often need to make use of several splits and samples for a particular sound. For a real world instrument, the pitch range is so large that it cannot be recreated in a synthesizer just by manipulating one single sample. Hence, many samples are recorded and each is assigned a root note. Typically one sample can cover a key range equal to root note ± 5 semitones. Velocity splits are also common. Typically the same sample is used, but with different frequency filter and envelope parameters.
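How a split is chosen and how far its sample has to be transposed can be sketched like this. The Split class and the two helper functions are hypothetical names for this example, the sample names and root keys are invented, and the pitch calculation assumes ordinary equal temperament, where one semitone corresponds to a frequency ratio of 2^(1/12).

```python
from dataclasses import dataclass

@dataclass
class Split:
    sample_name: str
    root_key: int                 # MIDI key at which the sample plays unchanged
    key_range: tuple              # (lowest key, highest key), inclusive
    vel_range: tuple = (0, 127)   # (lowest velocity, highest velocity), inclusive

def find_splits(splits, key, velocity):
    """Return every split whose key and velocity ranges contain the note."""
    return [s for s in splits
            if s.key_range[0] <= key <= s.key_range[1]
            and s.vel_range[0] <= velocity <= s.vel_range[1]]

def playback_ratio(split, key):
    """Speed factor needed to shift the sample from its root key to the played key."""
    return 2.0 ** ((key - split.root_key) / 12.0)

# Two samples, each covering its root note plus and minus 5 semitones.
splits = [
    Split("horn_root49", 49, (44, 54)),
    Split("horn_root60", 60, (55, 65)),
]
hit = find_splits(splits, 62, 100)
ratio = playback_ratio(hit[0], 62)   # two semitones above the root
```

Keeping each sample within about 5 semitones of its root keeps the ratio between roughly 0.75 and 1.33, which is why several samples are needed to cover the full range of a real instrument.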

One sample may be used by many splits and Instruments. Again, this is cost effective but potentially a risk. If you replace one sample with another then it may have unexpected side-effects.
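Before replacing a sample it helps to know where it is used. A minimal sketch of such a lookup is shown below; it assumes the Instruments are available as a simple mapping from Instrument name to the sample names of its splits, which is an invented representation for this example, as are the function name sample_usage and the sample names.

```python
from collections import defaultdict

def sample_usage(instruments):
    """Map each sample name to the sorted list of Instruments that use it.

    instruments: dict of Instrument name -> list of sample names, one per split.
    """
    usage = defaultdict(set)
    for instrument_name, split_samples in instruments.items():
        for sample_name in split_samples:
            usage[sample_name].add(instrument_name)
    return {sample: sorted(names) for sample, names in usage.items()}

instruments = {
    "Horn": ["horn_low", "horn_high"],
    "Brass Section": ["horn_low", "trumpet"],
}
usage = sample_usage(instruments)
```

In this example usage["horn_low"] lists both "Brass Section" and "Horn", so replacing that sample would change both Instruments and every Preset built on them.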

All in all, it is important to understand that there are MANY parameters (usually called “generators”) involved in creating a Preset sound, and that rendering it may be a very CPU-intensive task.

SynthFont[1] tutorial, part 6