# Diving into the DFT

Time to look at the Discrete Fourier Transform or DFT. It’s a bit daunting to try and figure out what’s going on in there - I found it confusing at first, and looked to various online resources for help. I found Jack Schaedler’s tutorial to be very helpful for giving an intuitive picture of the DFT. David Dorran’s YouTube tutorial is also very good. Julius O. Smith’s book lays out the math in great detail, and the Coursera Course he does with Xavier Serra covers both the theory and programming aspects of the DFT. So…

The equation for the DFT is:

\[X[k] = {\sum_{n=0}^{N-1}}x[n]e^{-j2{\pi}kn/N} \quad k = 0, 1, 2, …, N-1\]

I translate this into English by saying “the kth bin of the DFT is the sum of the product of each signal sample x[n] and \(e^{-j2{\pi}kn/N}\) from n = 0 to n = N-1.” I’m not sure how useful it is to read, but I find it useful to force myself to put equations like this into words.

Another way to think of it is that since \[e^{-j\theta} =cos(\theta ) - jsin(\theta)\] The DFT equation can be rewritten as \[X[k] = {\sum_{n=0}^{N-1}}x[n]cos(\frac{2{\pi}kn}{N}) - j{\sum_{n=0}^{N-1}}x[n]sin(\frac{2{\pi}kn}{N}) \quad k = 0, 1, 2, …, N-1\]

Which basically says, among other tings, that the number of DFT frequency bins or “Basis Frequencies” is related to the number of samples in the signal as \[\frac{n}{N} \quad or \quad \frac{0}{N}, \frac{1}{N}, \frac{2}{N},…,\frac{N-1}{N},\] and that the kth bin, X[k], has both real and imaginary components.

Looking at a few examples helps to show what’s going on. Starting with k=0, the 0Hz component, we get a flat line - DC.

For k=1, we get a low frequency component

For k=5, we get a higher frequency component

For k=15 the frequency component is higher still. The apparent modulation of the data is an artifact of plotting a small number of points.

k = 16, or N/2, corresponds to the highest frequency component. Note that the imaginary component here is 0j

At k=17, the plot looks the same as k=15

k = N-1 looks very similar to k = 1, except that the phase of the sin component is inverted.

Back to the DFT… I’m going to use a signal with frequency 5 cycles per time unit and sample rate 32 samples per time unit for x[n]. I say “per time unit” because although 1 second is what you see most often, any unit of time could be chosen. It looks a bit ragged, but that’s just because of the low number of samples per cycle:

The DFT Calculation looks like this:

Here’s another way of looking at it, showing the dot product of the signal x[n] with each basis frequency. As in the plot, the only basis frequency that is correlated with the signal is at k = 5.

That worked nicely for an integer signal frequency. If you use a float, like f = 5.4, the DFT spreads out and doesn’t give such a precise result (the blue trace in the plot). It also doesn’t make much sense that frequencies on either side of 5 would be correlated and then anticorellated with the basis sinusoid at 5. Things make a lot more sense if you look at the magnitude by taking the absolute value of each X[k] (green trace).

Using a real input signal generates two peaks, each only half as high

Recall that the transformation kernel generates frequencies in the pattern low - high - low, with the high being at k = N/2 or 16 in this case. The peak at 27 can be thought of as having a frequency of -5 relative to k=N-1 Shifting the indices around centers the plot around 0 and shows the peaks as frequency components at +/- 5. The idea of negative frequency is pretty weird. About the best explanation I have found is that it is like a spinning wheel in that you can think of it spinning clockwise at x rpm or counterclockwise at -x rpm, which is . There is also the explanation that \(e^{-j{\omega}} = cos({\omega}) - jsin({\omega})\), and that the imaginary component is orthogonal to the real component and cancels out in the dot product when the signal is complex, but not when it is real.

The same plot as above, with a non-integer frequency of 5.5

Being an infinite series of odd harmonics in decreasing proportion (1, 1/3, 1/5, 1/7 etc.), a square wave is a little more interesting than a sine, as the DFT shows. The function below generates one for a given frequency and number of harmonics.

A cleaner square wave can be generated with a number sequence [1,1,1,…-1,-1,-1]. The DFT for the first ten harmonics is identical.

A sawtooth wave has both even and odd harmonics in descending proportion of (1, 1/2, 1/3, 1/4, etc.)

A triangle wave:

Finally, an actual audio file. I used a 1 second snippet taken from freesound.org, and the DFT took over five minutes to calculate, which is no doubt why the FFT is used. I used a 5000 sample slice to speed things up.

There is plenty more to look at here - more in a subsequent post.

comments powered by Disqus