Music Synthesis

microfarad
N00b

Posts: 25

Music Synthesis Jul 24, 2012 2:13:05 GMT -5

Post by microfarad on Jul 24, 2012 2:13:05 GMT -5

This thread will be about how to synthesize music with programs. It shouldn't be language specific and users should avoid relying on code whenever possible so that people who can't even program can still get something out of this.

Synthesis of simple sine wave:
Programs that synthesize sine waves should avoid the simple approach of sin(2*pi*frequency*time), because if you change the frequency while generating the note, the phase jumps at that point and the amplitude jumps with it, which sounds like an annoying pop. If you steadily change the frequency over time the side effects are even worse and more confusing, with wild pitch changes resulting. Programs generating sine waves should instead maintain a domain variable (the phase) which is incremented by 2*pi*frequency/framerate every frame. The amplitude of the synth at any given moment can be found by taking sin(domain). In practice, the precision of a floating point number decreases as its value grows. If your domain variable simply continues to grow over time, the quality of the tone will get worse and worse. To avoid this, you need to lower the domain variable every once in a while without causing a pop. Because sine repeats every 2*pi, subtracting 2*pi whenever the domain exceeds 2*pi does exactly that.
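For anyone who does want to see it in code, here is a rough Python sketch of the domain variable idea. The function name sine_samples and the default framerate of 44100 are my own choices for illustration, not anything standard.

```python
import math

def sine_samples(frequencies, framerate=44100):
    """Return one sample per entry in `frequencies` (Hz), keeping the phase continuous."""
    two_pi = 2.0 * math.pi
    domain = 0.0
    out = []
    for frequency in frequencies:
        out.append(math.sin(domain))
        # advance the phase by this frame's frequency, then wrap it so it never grows large
        domain = (domain + two_pi * frequency / framerate) % two_pi
    return out

# one second of 440 Hz followed by one second of 660 Hz
samples = sine_samples([440.0] * 44100 + [660.0] * 44100)
```

Because only the step size changes when the frequency changes, consecutive samples never differ by more than one phase step, so there is no jump at the 440 to 660 Hz boundary.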

Useful waves:
Saw, ramp, triangle, and square waves can also be synthesized. The formulas for these periodic functions are relatively easy to figure out. Many of the tips that apply to sine wave synthesis apply to these waves.
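As a sketch of what those formulas can look like, here they are in Python, written as functions of a phase that runs from 0 to 1 over one cycle. Calling the rising version "saw" and the falling version "ramp" is my own assumption (people use the names both ways), and the oscillator helper is just for illustration.

```python
def saw(phase):
    """Rises from -1 to 1 over one cycle, then snaps back."""
    return 2.0 * phase - 1.0

def ramp(phase):
    """Falls from 1 to -1 over one cycle."""
    return 1.0 - 2.0 * phase

def square(phase):
    """High for the first half of the cycle, low for the second."""
    return 1.0 if phase < 0.5 else -1.0

def triangle(phase):
    """Rises from -1 to 1 in the first half, falls back in the second."""
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase

def oscillator(shape, frequency, framerate, count):
    """Generate `count` samples of `shape`, tracking phase the same way as the sine domain."""
    phase = 0.0
    out = []
    for _ in range(count):
        out.append(shape(phase))
        phase = (phase + frequency / framerate) % 1.0
    return out
```

The phase here plays the same role as the domain variable for sine waves, just measured in cycles instead of radians.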

Additive synthesis:
Add waves together, see what happens. Add harmonics to change the voice of an instrument.
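A minimal Python sketch of adding harmonics, assuming a fixed frequency (so the simple sin(2*pi*frequency*time) form is safe here). The function name and the harmonic_levels parameter are made up for this example.

```python
import math

def additive(frequency, harmonic_levels, framerate, count):
    """Sum sine harmonics; harmonic_levels[0] is the level of the fundamental."""
    total = sum(abs(level) for level in harmonic_levels)
    out = []
    for i in range(count):
        t = i / framerate
        sample = sum(
            level * math.sin(2 * math.pi * frequency * (n + 1) * t)
            for n, level in enumerate(harmonic_levels)
        )
        out.append(sample / total)  # scale so the result stays within -1..1
    return out

# a fundamental with progressively quieter 2nd and 3rd harmonics
voice = additive(220.0, [1.0, 0.5, 0.25], 44100, 1000)
```

Changing the list of levels changes the voice without changing the pitch.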

AM synthesis:
Multiply one wave by another. Can be interesting!

FM synthesis:
Modulate the frequency of one wave slightly (or a lot) with another wave. If the frequency of the second wave is low this sounds like vibrato. If it is high, this can make new and interesting noises.
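Here is one way this might look in Python, using a separate domain variable for each wave so that the changing frequency doesn't cause pops. The parameter names (carrier_hz, modulator_hz, depth_hz) are my own labels for this sketch.

```python
import math

def fm_samples(carrier_hz, modulator_hz, depth_hz, framerate, count):
    """Sine carrier whose frequency swings by +/- depth_hz at a rate of modulator_hz."""
    two_pi = 2.0 * math.pi
    carrier_domain = 0.0
    modulator_domain = 0.0
    out = []
    for _ in range(count):
        out.append(math.sin(carrier_domain))
        # the second wave nudges the first wave's frequency up and down
        frequency = carrier_hz + depth_hz * math.sin(modulator_domain)
        carrier_domain = (carrier_domain + two_pi * frequency / framerate) % two_pi
        modulator_domain = (modulator_domain + two_pi * modulator_hz / framerate) % two_pi
    return out

vibrato = fm_samples(440.0, 5.0, 8.0, 44100, 44100)      # slow modulator: vibrato
clang = fm_samples(440.0, 310.0, 400.0, 44100, 44100)    # fast modulator: new timbre
```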

Bells, Dings, Xylophones, Bips, etc:
Similar to sine wave synthesis. Use a cosine wave instead. At the beginning of the sound, initialize the domain to 0. Initialize a volume variable to 1. Every frame, multiply the volume by a number just slightly less than one (like .9999) and store the result back into volume, causing exponential decay in the volume over time. The amplitude of the sound at any time can be found with cos(domain)*volume. To choose a half life for the decay (in seconds, assuming your framerate is measured in fps), the number you multiply the volume by should be (1/2)^(1/(framerate*halflife)), so that after framerate*halflife frames the volume has been cut exactly in half.
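A small Python sketch of the decaying cosine, for those who want one. The per-frame multiplier that halves the volume after halflife seconds is 0.5 ** (1 / (framerate * halflife)); the function name bell is just mine.

```python
import math

def bell(frequency, halflife, framerate, count):
    """Cosine tone whose volume halves every `halflife` seconds."""
    decay = 0.5 ** (1.0 / (framerate * halflife))
    two_pi = 2.0 * math.pi
    domain = 0.0
    volume = 1.0
    out = []
    for _ in range(count):
        out.append(math.cos(domain) * volume)
        domain = (domain + two_pi * frequency / framerate) % two_pi
        volume *= decay
    return out

ding = bell(880.0, 0.25, 44100, 44100)  # one second, volume halves every quarter second
```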

Filters:
Wikipedia has good articles on high and low pass filters. Low pass filters try to remove frequencies higher than the cutoff frequency from a signal while high pass filters do the opposite. Band pass filters try to eliminate all frequencies but those in a certain range, and can be made by running the output of a high pass filter through a low pass filter or vice versa. Running a low pass filter and a high pass filter side by side on the same input (with the low pass cutoff below the high pass cutoff) and adding their outputs will render a band reject filter, which eliminates frequencies in a certain range. By modulating the cutoff frequency of a low pass filter with a very low frequency sine wave and running a deep bass wave through it you can get started making your own dubstep!
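As a rough illustration, here are the simple one-pole (RC style) low and high pass filters in Python, along the lines of the discrete-time versions in the Wikipedia articles. Starting the high pass filter from silence is my own simplification, and the function names are just for this sketch.

```python
import math

def low_pass(samples, cutoff_hz, framerate):
    """One-pole low pass: each output moves a fraction alpha toward the input."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / framerate
    alpha = dt / (rc + dt)
    out = []
    previous = 0.0
    for x in samples:
        previous += alpha * (x - previous)
        out.append(previous)
    return out

def high_pass(samples, cutoff_hz, framerate):
    """One-pole high pass: passes changes in the input, lets steady levels die away."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / framerate
    alpha = rc / (rc + dt)
    out = []
    previous_out = 0.0
    previous_in = 0.0
    for x in samples:
        previous_out = alpha * (previous_out + x - previous_in)
        previous_in = x
        out.append(previous_out)
    return out

def band_pass(samples, low_hz, high_hz, framerate):
    """High pass at low_hz followed by low pass at high_hz."""
    return low_pass(high_pass(samples, low_hz, framerate), high_hz, framerate)
```

These filters roll off gently, so expect the cutoff to be soft rather than a brick wall.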

Goertzel algorithm:
The Goertzel Algorithm is a way to find the amplitude of a given frequency in an audio sample. This can be used for many things, most of which are not at all related to music. Still, it's good to know about. Read more on Wikipedia.
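A minimal Python sketch of the Goertzel algorithm as I understand it. The scaling by 2/N at the end is my own addition so that a pure sine of amplitude 1 at the target frequency reports roughly 1; the function name is made up.

```python
import math

def goertzel_amplitude(samples, target_hz, framerate):
    """Estimate the amplitude of `target_hz` in `samples`."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / framerate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    # squared magnitude of the frequency component
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / len(samples)

tone = [math.sin(2.0 * math.pi * 100.0 * i / 1000.0) for i in range(1000)]
present = goertzel_amplitude(tone, 100.0, 1000.0)  # close to 1
absent = goertzel_amplitude(tone, 200.0, 1000.0)   # close to 0
```

It works best when the chunk of audio holds a whole number of cycles of the target frequency.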

Envelope filter:
An envelope filter (also called an envelope follower) tries to outline the peaks of an audio signal to give a general idea of its volume throughout the signal.
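One simple way to do this, sketched in Python: take the absolute value of each sample and smooth it with a one-pole low pass filter. The cutoff of around 20 Hz in the example is only a guess at a reasonable starting point, and the names are mine.

```python
import math

def envelope(samples, cutoff_hz, framerate):
    """Track the volume of `samples` by low pass filtering their absolute values."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / framerate
    alpha = dt / (rc + dt)
    level = 0.0
    out = []
    for x in samples:
        level += alpha * (abs(x) - level)
        out.append(level)
    return out

# a 440 Hz tone for a tenth of a second, then silence
burst = [math.sin(2.0 * math.pi * 440.0 * i / 44100.0) for i in range(4410)] + [0.0] * 4410
outline = envelope(burst, 20.0, 44100)
```

A lower cutoff gives a smoother outline that reacts more slowly; a higher cutoff reacts faster but lets more ripple through.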

Vocoder:
A vocoder can be used to make voices sound strange or robotic. Kraftwerk made extensive use of vocoders. ELO's Mr. Blue Sky also has vocals that have been run through a vocoder. Vocoders are also used extensively in sci-fi movies. The basic idea is to find the envelopes of several (usually evenly spaced) frequency bands in an audio track (the voice), split a carrier wave into the same bands with band pass filters, and use each envelope to control the volume of the matching carrier band before adding the bands back together. The Goertzel Algorithm or a Fast Fourier Transform will directly give these envelopes, but band pass filters and envelope filters can be used together to achieve the same end. In fact, because of the frequency response of some of the other algorithms, band pass filters are usually preferred.

White noise:
Simple random samples. Completely random samples. Very annoying.

Brown noise:
Every frame an amplitude variable (initialized to 0) is incremented or decremented by a small, random quantity. This variable will give you your brown noise. In practice, audio programs need to keep their values in a small range, like -1 to 1. In this case, the amplitude can simply be multiplied by a number slightly less than one every frame, pulling it in towards 0 as it jitters up and down, producing the static. Just to be sure, the amplitude should be tested every frame to ensure it still lies in an acceptable range. If not, it should be moved in. Numbers very close to one will give brown noise with more low frequency content. Numbers that are much less than one (depending on your framerate, maybe something like .99) will begin to filter out the low frequency content. An easy way to make ocean noises is to multiply the brown noise by a very low frequency sine wave (like 0.1 Hz or less). When the sine wave is at 1 it sounds like a wave is crashing in. As it retreats to 0 the surf is running along the beach. As it falls below 0 the water is retreating, pulling in the sand and making that hissing noise.
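A short Python sketch of the brown noise recipe. The step size, the default multiplier of 0.999 (called leak here) and the seed parameter are all values I picked for illustration.

```python
import random

def brown_noise(count, step=0.02, leak=0.999, seed=None):
    """Random walk pulled gently back toward 0 and clamped to the range -1..1."""
    rng = random.Random(seed)
    amplitude = 0.0
    out = []
    for _ in range(count):
        # shrink slightly toward 0, then jitter up or down by a small random amount
        amplitude = amplitude * leak + rng.uniform(-step, step)
        # just to be sure, move the value back in if it strays out of range
        amplitude = max(-1.0, min(1.0, amplitude))
        out.append(amplitude)
    return out

static = brown_noise(44100, seed=1)
```

Try a leak of 0.99 to hear the low frequency content thin out.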

Making sound in practice:
Sounds can be generated in .NET languages using DirectX (or so I hear, I've never done it myself). In Python, the wave module can be used to write .wav files, as this language is too slow to write audio samples fast enough to be played directly through the speakers. Either SDL or PortAudio can be used with C (or C++?).
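For the Python route, here is a rough sketch using the standard wave and struct modules to turn a list of samples between -1 and 1 into 16-bit mono .wav data. The function name wav_bytes and the file name in the note below are hypothetical.

```python
import io
import struct
import wave

def wav_bytes(samples, framerate=44100):
    """Pack floats in -1..1 as 16-bit mono PCM and return the .wav file contents."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(framerate)
        wav.writeframes(frames)
    return buffer.getvalue()

data = wav_bytes([0.0, 0.5, -0.5, 1.0])
```

To hear the result, write the bytes to disk, for example with open("tone.wav", "wb") and its write method, then open the file in any audio player.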

Soon I will talk about the Fourier Transform since the Wikipedia article is far too confusing for most people.

Last Edit: Aug 3, 2012 17:35:42 GMT -5 by microfarad

Draxor

Posts: 403

Music Synthesis Jul 24, 2012 13:23:57 GMT -5

Post by Draxor on Jul 24, 2012 13:23:57 GMT -5

Making specific tone fluxes?

microfarad
N00b

Posts: 25

Music Synthesis Jul 24, 2012 18:43:47 GMT -5

Post by microfarad on Jul 24, 2012 18:43:47 GMT -5

What do you mean Drax?

The Discrete Fourier Transform (DFT) is a way to take an audio sample and analyze its frequency content. Here's the simple explanation:
You put in a list of numbers representing the amplitudes of a sound signal sampled at regular intervals.
You get out a list of numbers representing the amplitudes of different frequencies in that signal.
The first number in the output (k = 0) is the DC component, which is proportional to the average value of the signal.
The remaining values represent frequencies of k*framerate/N, where k is the index of the value counting from 0 and N is the number of input values. For ordinary (real valued) audio, only the values up to k = N/2 carry new information; the rest mirror them.

Now that was the simple answer. The complicated answer is that the DFT doesn't really give you the amplitudes, it gives you a vector in the complex plane which describes both the phase and amplitude of the wave. In fact, the DFT usually accepts inputs of complex numbers, but I have simplified it for you.

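Here are the formulas. Given that your input x has N elements (n = 0 to N-1), the first formula gives you an array of N numbers, indexed by k, representing the real part of the output; the second formula similarly gives the imaginary components; the third formula demonstrates how to calculate amplitude; the fourth demonstrates how to calculate phase.
real[k] = sum over n of x[n]*cos(2*pi*k*n/N)
imag[k] = -(sum over n of x[n]*sin(2*pi*k*n/N))
amplitude[k] = sqrt(real[k]^2 + imag[k]^2)
phase[k] = atan2(imag[k], real[k])
For 0 < k < N/2, divide amplitude[k] by N/2 to get the amplitude of the corresponding sine wave in the units of your input.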
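Since the four quantities are easy to mix up, here is a rough Python sketch that computes them straight from the standard DFT definition (with the usual minus sign on the imaginary part). It is slow and only meant to show the idea; the function name dft is my own.

```python
import math

def dft(samples):
    """Return (real, imag, amplitude, phase) lists, one entry per output index k."""
    total = len(samples)
    real, imag = [], []
    for k in range(total):
        real.append(sum(
            x * math.cos(2.0 * math.pi * k * n / total) for n, x in enumerate(samples)
        ))
        imag.append(-sum(
            x * math.sin(2.0 * math.pi * k * n / total) for n, x in enumerate(samples)
        ))
    amplitude = [math.hypot(re, im) for re, im in zip(real, imag)]
    phase = [math.atan2(im, re) for re, im in zip(real, imag)]
    return real, imag, amplitude, phase

# one full cycle of a cosine sampled four times
real, imag, amplitude, phase = dft([1.0, 0.0, -1.0, 0.0])
```

For that input, all of the energy lands at k = 1 and its mirror image k = 3, and nothing at k = 0 because the signal averages to zero.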

The DFT as written takes on the order of N^2 operations. The Fast Fourier Transform (FFT) computes exactly the same result much faster (on the order of N*log(N) operations) by repeatedly splitting the transform into smaller transforms and combining their results. The FFT is often used in audio software, applied to short chunks of a song one after another, to show the user a pretty graph of the frequency content of their song over time. Audacity is one piece of software that offers this functionality. The FFT has tons of other uses in music, such as vocoders.
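To give a feel for how the speedup works, here is a sketch of the classic recursive radix-2 FFT in Python. It assumes the input length is a power of two, and it is one common way to do it rather than the only one.

```python
import cmath

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of the even-indexed samples
    odd = fft(samples[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

spectrum = fft([1.0, 0.0, -1.0, 0.0])
amplitudes = [abs(value) for value in spectrum]
```

Each output is a complex number; its absolute value is the amplitude and cmath.phase gives the phase.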

Last Edit: Jul 24, 2012 20:02:19 GMT -5 by microfarad

nmagane
Banned

Posts: 459

Music Synthesis Jul 24, 2012 19:21:21 GMT -5

Post by nmagane on Jul 24, 2012 19:21:21 GMT -5

Synthesis, that's a tongue twister

Monokr0me
Administrator

Posts: 204

Music Synthesis Jul 28, 2012 22:24:22 GMT -5

Post by Monokr0me on Jul 28, 2012 22:24:22 GMT -5

Do you know anything about analyzing audio, up to and including speech recognition?

microfarad
N00b

Posts: 25

Music Synthesis Jul 31, 2012 3:08:54 GMT -5

Post by microfarad on Jul 31, 2012 3:08:54 GMT -5

A bit, but speech recognition is pretty complicated. You'll want to find some way to analyze frequency content: band pass filters with envelope followers, or a Fourier transform. From that, try to distinguish the formants of the different vowels. Certainly look up the word "diphone". Beyond that I am of no help.

microfarad
N00b

Posts: 25

Music Synthesis Aug 3, 2012 18:06:18 GMT -5

Post by microfarad on Aug 3, 2012 18:06:18 GMT -5

"Robot" voice sound effects:

I have recently been working on some robotic sounding voice effects. I wrote some programs that simply change voices on the fly using a microphone and headphones, so I can't currently process any audio samples for you to hear.

There are two very easy ways to make robotic voices.
A comb filter can be used to make some nice robotic voices; this was the kind of filter used with C-3PO in Star Wars. Be warned, however: there are two different kinds of comb filters and they produce slightly different effects. A feed forward comb filter takes an incoming signal and sends it through a delay, then mixes the delayed and un-delayed versions of the signal together (usually not at equal volume). A feed backward comb filter also uses a delay, but the source for the delay unit is not the incoming signal but the output of the mixing stage (which is also the output of the filter). A feed backward comb filter with a very long delay is no longer really a filter and instead makes a repeating echo effect. At short delays, however, the feed backward comb filter makes excellent robot noises when a voice is run through it. Wikipedia has some good diagrams on the difference between the two filters. I am not sure which one C-3PO uses.
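The difference between the two comb filters is easiest to see side by side, so here is a minimal Python sketch of both. The delay is given in frames (delay in seconds times framerate), and the function and parameter names are my own.

```python
def feedforward_comb(samples, delay_frames, gain):
    """Mix the input with a delayed copy of the INPUT."""
    out = []
    for i, x in enumerate(samples):
        delayed = samples[i - delay_frames] if i >= delay_frames else 0.0
        out.append(x + gain * delayed)
    return out

def feedback_comb(samples, delay_frames, gain):
    """Mix the input with a delayed copy of the OUTPUT (keep gain below 1)."""
    out = []
    for i, x in enumerate(samples):
        delayed = out[i - delay_frames] if i >= delay_frames else 0.0
        out.append(x + gain * delayed)
    return out

click = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
once = feedforward_comb(click, 2, 0.5)   # one echo
ringing = feedback_comb(click, 2, 0.5)   # echoes that keep halving
```

Feeding a single click through both shows it: the feed forward version repeats it once, while the feed backward version keeps repeating it at a shrinking volume.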

Amplitude modulation (AM) can be used to make Doctor Who styled voice effects (Daleks and Cybermen). Just multiply your signal by a pure sine wave (the carrier wave). Daleks use carrier wave frequencies ranging from the low twenties to something over 60 Hz, but the modern Daleks are usually about 25.64 Hz according to most sources. Cybermen are harder to find frequency information for, but I believe they run a carrier wave at about 150 Hz, and that sounds about right when I try it. I've also heard that a triangle wave makes a good carrier wave for this, but I haven't tried it. An analog implementation of this kind of multiplication for voice effects and music is usually a "ring modulator". Plenty of people have built Dalek voice changing circuits, which are quite nice because you don't get the latency of a computer with one of those.
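In code the multiplication is about as short as effects get. A Python sketch, with the function name ring_modulate and the parameter names being my own:

```python
import math

def ring_modulate(samples, carrier_hz, framerate):
    """Multiply each input sample by a sine carrier wave."""
    two_pi = 2.0 * math.pi
    step = two_pi * carrier_hz / framerate
    domain = 0.0
    out = []
    for x in samples:
        out.append(x * math.sin(domain))
        domain = (domain + step) % two_pi
    return out

# a constant input makes the carrier itself visible in the output
shaped = ring_modulate([1.0, 1.0, 1.0, 1.0], 250.0, 1000.0)
```

For a voice, pass in the recorded samples and try carrier frequencies in the ranges mentioned above.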

Vocoders are heavily used for sci-fi robots such as the Cylons from the old Battlestar Galactica series and the Trade Federation battle droids from Star Wars. I've recently managed to implement a good vocoder to make these effects using the band pass filters described here: www.musicdsp.org/files/Audio-EQ-Cookbook.txt
In addition to those band pass filters, I ran the absolute value of each band's waveform through the simple low pass filter described on Wikipedia to achieve my envelope follower.
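This is not my actual program, but a stripped down Python sketch of the same structure: the cookbook's band pass biquad (the constant 0 dB peak gain version), an absolute value plus one-pole low pass envelope follower, and a loop over bands. The Q of 5, the smoothing amount and the band centers are placeholder values to experiment with.

```python
import math

def band_pass(samples, center_hz, q, framerate):
    """Biquad band pass with unit gain at center_hz (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * center_hz / framerate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(samples, smoothing=0.01):
    """Absolute value followed by a simple one-pole low pass filter."""
    level = 0.0
    out = []
    for x in samples:
        level += smoothing * (abs(x) - level)
        out.append(level)
    return out

def vocode(voice, carrier, band_centers_hz, framerate, q=5.0):
    """Shape each band of the carrier with the envelope of the same band of the voice."""
    length = min(len(voice), len(carrier))
    out = [0.0] * length
    for center in band_centers_hz:
        env = envelope(band_pass(voice, center, q, framerate))
        band = band_pass(carrier, center, q, framerate)
        for i in range(length):
            out[i] += env[i] * band[i]
    return out
```

A harmonically rich carrier such as a saw wave tends to work best, since every band then has something in it for the voice to shape.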

Combining effects can make great sounds too! I wrote a program which uses a feed backward comb filter as the input for AM synthesis (as described above) to get a very nice sound. The AM synthesis used a 150 Hz sine wave and the comb filter had a delay of 20 ms with the portion being fed back at 70% of the strength of the incoming signal.

Monokr0me
Administrator

Posts: 204

Music Synthesis Aug 3, 2012 19:25:43 GMT -5

Post by Monokr0me on Aug 3, 2012 19:25:43 GMT -5

Nice. What about text-to-speech?

microfarad
N00b

Posts: 25

Music Synthesis Aug 4, 2012 0:11:55 GMT -5

Post by microfarad on Aug 4, 2012 0:11:55 GMT -5

Text to speech is much easier, I'm thinking of trying it some time. If you have a very limited set of words being used I recommend domain specific synthesis. Otherwise use Formant synthesis.

Monokr0me
Administrator

Posts: 204

Music Synthesis Aug 4, 2012 0:56:44 GMT -5

Post by Monokr0me on Aug 4, 2012 0:56:44 GMT -5

I figured text to speech would just be comparing an array of regexp's with an array of recorded/generated syllables, is this a fairly accurate generalisation?

microfarad
N00b

Posts: 25

Music Synthesis Aug 4, 2012 14:41:39 GMT -5

Post by microfarad on Aug 4, 2012 14:41:39 GMT -5

No. Concatenating single phones usually leads to bad audio glitches. You could use diphone synthesis but that requires a database of thousands of sounds. Formant synthesis is completely artificial and you don't need to have a huge database.
