Top: FFT on an FPGA

Previous: Derivation

Implementation of Decimation-in-Time Fast Fourier Transform

The purpose of this section is to give a basic overview of how the FFT was implemented. Hopefully it will make the source code easier to follow.

The source consists of three files, dit.v, butterfly.v and twiddlefactors_N.v where N is the length of the fourier transform. dit.v contains a module dit which implements most of the algorithm. butterfly.v contains a module butterfly which takes inputs A, B, and T and outputs A+TB and A-TB, to take care of equations 3 and 4. Finally twiddlefactors_N.v contains previously calculated values of \(T_{n}\).

The implementation uses 5 buffers large enough to hold N complex numbers. Two of these are input buffers, two are working buffers, and one is an output buffer. The module takes inputs x_in and nd_in, where x_in is wide enough to hold a complex number and nd_in indicates whether new data is on x_in. The outputs are x_out and nd_out which act in a similar fashion along with overflow which is set to 1 if the module cannot keep up with the input sample stream.

When data arrives in x_in (nd_in == 1), it is copied into the active input buffer. When that input buffer is full, and the previous FFT is finished, the contents on the input buffer are used and stage 1 of the FFT is generated and written to a working buffer. The data then moves back and forth between the two working buffers as sequential stages of the FFT are processed. Finally the last stage of the FFT is written onto the output buffer and then streamed out in x_out. The second input buffer is used to collect new input data while the first input buffer is being read during the calculation of the first FFT stage. In this way, the input buffers swap roles back and forth.

The real number-crunching is the calculation of each stage from the previous one using equations 3 and 4 which were,

\begin{equation*} \begin{aligned} P_{s,kS+j} & = P_{s-1,2kS+j} + T_{kS} P_{s-1,2kS+S+j} \\ P_{s,kS+j+N/2} & = P_{s-1,2kS+j} - T_{kS} P_{s-1,2kS+S+j} \end{aligned} \end{equation*}

To do this we step \(i\) from 0 to \(N/2\) and let \(i=kS+j\). On each step we calculate the input addresses (\(2kS+j\), and \(2kS+S+j\)), output addresses(\(kS+j\), and \(kS+j+N/2\)), and twiddle factor address (\(2kS\)). Calculation of the addresses is helped by realising that the integer \(i\) is \(log_2(N)\) bits long, and that smallest \(log_2(S)\) bits give \(j\) while the remainder give \(kS\). Calculating \(2kS+j\) is then done by shifting the high \(log_2(N/S)\) bits up one and leaving the lower bits alone. Similarly calculating \(2kS\) to get the address of the twiddle factors is similar but we set the lowers bits to 0. \(kS+j+N/2\) is found by setting the high bit of \(i\) to 1. \(2kS+S+j\) is calculated by adding \(S\) to \(2kS+j\). Once we have these addresses we can assign the inputs and outputs of the butterfly module appropriately to update the outputs.

Verification using MyHDL

My first attempt at verification was done using verilog, however using a hardware-definition language (HDL) like verilog for verification was painful to me, so I looked around and discoved the python package MyHDL.

MyHDL allows one to interface standard python code with a HDL like verilog so that the standard python unittesting framework can be used to test verilog code via the opensource iverilog simulator.

Next: Modifying the FPGA image of the USRP B100.