Modifying the FPGA of the USRP B100

Before attempting to modify the FPGA for the USRP B100, we should make sure that we can compile the existing FPGA code into a working image. To create the image we need the free ISE WebPACK software from Xilinx. On Linux, that installs xtclsh, which is called by the UHD build scripts, so make sure that it is in your $PATH.
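A quick way to confirm that xtclsh is visible before starting a long build is to look it up on $PATH. The following is a minimal sketch; the helper name find_tool is made up for illustration and is not part of UHD or ISE.

```python
import shutil


def find_tool(name="xtclsh"):
    """Return the full path of `name` if it is on $PATH, otherwise None.

    find_tool is a hypothetical helper for this walkthrough.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("xtclsh not found; add the ISE bin directory to $PATH")
    else:
        print("xtclsh found at", path)
```

If this reports that xtclsh is missing, the make step described next will most likely fail as well.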

Then go to uhd/fpga/usrp2/top/B100/ and run make -f Makefile.B100 bin to generate a binary image at build-B100/B100.bin. You then need to manually copy the image to a location on your computer where UHD will find it. On my computer that is /usr/local/share/uhd/images/usrp_b100_fpga.bin. If you don't know where that is, the following GNU Radio script will display where UHD is loading the FPGA image from. It is also a simple way of testing the image.

grab_data.py:

from gnuradio import gr, uhd

# Grabs 100 samples from the USRP.
tb = gr.top_block()
stream_args = uhd.stream_args(cpu_format='fc32', channels=range(1))
src = uhd.usrp_source(device_addr='', stream_args=stream_args)
src.set_samp_rate(2000000)
src.set_center_freq(5000000, 0)
src.set_gain(0, 0)
head = gr.head(gr.sizeof_gr_complex, 100)
snk = gr.vector_sink_c()
tb.connect(src, head, snk)
tb.run()
data = snk.data()
print(data)

Assuming that all went well, we're ready to make some modifications to the FPGA Verilog source code. First we make copies of Makefile.B100 -> Makefile.B100_b and u1plus_core.v -> u1plus_core_b.v.

In Makefile.B100_b change BUILD_DIR := build-B100 to BUILD_DIR := build-B100_b, and change the reference to u1plus_core.v so that it points to u1plus_core_b.v.

Now running make -f Makefile.B100_b bin will generate an image in build-B100_b/B100.bin, but it should still be the same image since we haven't changed anything.
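The two Makefile substitutions can also be scripted, which helps if you want to keep several variants side by side. This is only a sketch: make_variant_makefile is a hypothetical helper, and the TOP_SRCS line in the sample text is invented for illustration. The real Makefile.B100 may reference u1plus_core.v differently.

```python
import re


def make_variant_makefile(text, suffix="_b"):
    """Return Makefile text adjusted for a copied build (hypothetical helper).

    Appends `suffix` to the build-B100 directory name and to the
    u1plus_core.v file name. Text that is already converted is left alone.
    """
    # Only touch a BUILD_DIR line that ends exactly in build-B100.
    text = re.sub(
        r"^(BUILD_DIR\s*:=\s*build-B100)[ \t]*$",
        lambda m: m.group(1) + suffix,
        text,
        flags=re.MULTILINE,
    )
    # Rename the core source file wherever it is referenced.
    text = re.sub(r"\bu1plus_core\.v\b", "u1plus_core" + suffix + ".v", text)
    return text


# Invented sample contents, not the real Makefile.B100.
sample = "BUILD_DIR := build-B100\nTOP_SRCS = B100.v u1plus_core.v\n"
converted = make_variant_makefile(sample)
print(converted)
```

To use it on the real file, read Makefile.B100, pass the contents through the function, and write the result to Makefile.B100_b. It is worth diffing the two files afterwards to confirm that only the intended lines changed.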

The default image for the USRP B100 FPGA is pretty full. The limiting factor is that it uses 99% of the available slices. To make some room we need to remove some of the existing content. Most of the FPGA is taken up by the processing for two receive streams. If we know we will only need to receive one signal, then we can remove one of these signal processing chains with no ill effects and free up a lot of room.

In u1plus_core_b.v we replace

// DSP RX 1

wire [31:0]   sample_rx1;
wire          strobe_rx1, clear_rx1;
wire [35:0]   vita_rx_data1;
wire          vita_rx_src_rdy1, vita_rx_dst_rdy1;

ddc_chain #(.BASE(SR_RX_DSP1), .DSPNO(1)) ddc_chain1
  (.clk(wb_clk),.rst(wb_rst), .clr(clear_rx1),
   .set_stb(set_stb),.set_addr(set_addr),.set_data(set_data),
   .set_stb_user(set_stb_user), .set_addr_user(set_addr_user), .set_data_user(set_data_user),
   .rx_fe_i(rx_fe_i),.rx_fe_q(rx_fe_q),
   .sample(sample_rx1), .run(run_rx1), .strobe(strobe_rx1),
   .debug() );

vita_rx_chain #(.BASE(SR_RX_CTRL1), .UNIT(1), .FIFOSIZE(10), .PROT_ENG_FLAGS(0), .DSP_NUMBER(1)) vita_rx_chain1
  (.clk(wb_clk),.reset(wb_rst),
   .set_stb(set_stb),.set_addr(set_addr),.set_data(set_data),
   .set_stb_user(set_stb_user), .set_addr_user(set_addr_user), .set_data_user(set_data_user),
   .vita_time(vita_time), .overrun(rx_overrun_dsp1),
   .sample(sample_rx1), .run(run_rx1), .strobe(strobe_rx1), .clear_o(clear_rx1),
   .rx_data_o(vita_rx_data1), .rx_dst_rdy_i(vita_rx_dst_rdy1), .rx_src_rdy_o(vita_rx_src_rdy1),
   .debug() );

with

// DSP RX 1

wire [35:0]   vita_rx_data1;
wire          vita_rx_src_rdy1, vita_rx_dst_rdy1;
reg [35:0]    vita_rx_data1_r;
reg           vita_rx_src_rdy1_r, vita_rx_dst_rdy1_r;
assign vita_rx_data1 = vita_rx_data1_r;
assign vita_rx_src_rdy1 = vita_rx_src_rdy1_r;
assign vita_rx_dst_rdy1 = vita_rx_dst_rdy1_r;
initial
  begin
     vita_rx_data1_r <= 36'd0;
     vita_rx_src_rdy1_r <= 1'b0;
     vita_rx_dst_rdy1_r <= 1'b0;
  end

This sets what would have been the outputs of the receive chain to 0 instead.

Now when we run make -f Makefile.B100_b bin we see in stdout that the fraction of used slices has decreased from 99% to 54%, so we have enough room to add some code of our own.

We edit u1plus_core_b.v to insert our FFT module into the first receive stream between ddc_chain0 and vita_rx_chain0.

We replace:

vita_rx_chain #(.BASE(SR_RX_CTRL0), .UNIT(0), .FIFOSIZE(10), .PROT_ENG_FLAGS(0), .DSP_NUMBER(0)) vita_rx_chain0
  (.clk(wb_clk),.reset(wb_rst),
   .set_stb(set_stb),.set_addr(set_addr),.set_data(set_data),
   .set_stb_user(set_stb_user), .set_addr_user(set_addr_user), .set_data_user(set_data_user),
   .vita_time(vita_time), .overrun(rx_overrun_dsp0),
   .sample(sample_rx0), .run(run_rx0), .strobe(strobe_rx0), .clear_o(clear_rx0),
   .rx_data_o(vita_rx_data0), .rx_dst_rdy_i(vita_rx_dst_rdy0), .rx_src_rdy_o(vita_rx_src_rdy0),
   .debug() );

with

wire [31:0] sample_ab;
wire        strobe_ab;
wire        overflow;

// The FFT module itself.
// Here we are doing an FFT of length 16.
dit #(16, 4, 16, 16, 0) dit_0
  (.clk(wb_clk), .rst_n(~wb_rst),
   .in_x(sample_rx0), .in_nd(strobe_rx0),
   .out_x(sample_ab), .out_nd(strobe_ab),
   .overflow(overflow)
   );

// vita_rx_chain0 now takes sample_ab and strobe_ab as inputs.
vita_rx_chain #(.BASE(SR_RX_CTRL0), .UNIT(0), .FIFOSIZE(10), .PROT_ENG_FLAGS(0), .DSP_NUMBER(0)) vita_rx_chain0
  (.clk(wb_clk),.reset(wb_rst),
   .set_stb(set_stb),.set_addr(set_addr),.set_data(set_data),
   .set_stb_user(set_stb_user), .set_addr_user(set_addr_user), .set_data_user(set_data_user),
   .vita_time(vita_time), .overrun(rx_overrun_dsp0),
   .sample(sample_ab), .run(run_rx0), .strobe(strobe_ab), .clear_o(clear_rx0),
   .rx_data_o(vita_rx_data0), .rx_dst_rdy_i(vita_rx_dst_rdy0), .rx_src_rdy_o(vita_rx_src_rdy0),
   .debug() );

Since we are now using our dit module we need to include the necessary files. The files dit.v, butterfly.v and twiddlefactors_16.v are placed in uhd/fpga/usrp2/sdr_lib/, and uhd/fpga/usrp2/sdr_lib/Makefile.srcs is edited to include them. I found it necessary to delete the build-B100_b directory before running make -f Makefile.B100_b bin again. With the FFT added, the fraction of used slices increased from 54% to 63%. Increasing the size of the FFT from 16 to 128 increased the slices used from 63% to 84%.

Now, once the USRP is running with the new FPGA image, the output should already have been passed through an FFT. This can be checked by applying an FFT of length 16 to the output captured with the original FPGA image and comparing the result to the output captured with the modified FPGA image.
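The software side of that check can be sketched in plain Python as below. The function names are made up for illustration. The sketch also assumes things the text above does not state: that both captures see the same input signal, that the captured samples start on a 16-sample block boundary, and that the FPGA module emits its bins in natural order with no extra scaling. If any of these does not hold, the comparison needs adjusting (for example by rescaling or reordering bins) before the difference means anything.

```python
import cmath


def dft(block):
    """Direct DFT of one block (O(n^2), fine for n = 16)."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(block))
        for k in range(n)
    ]


def block_dft(samples, n=16):
    """DFT of each complete, non-overlapping block of n samples.

    Trailing samples that do not fill a whole block are dropped.
    """
    out = []
    for start in range(0, len(samples) - n + 1, n):
        out.extend(dft(samples[start:start + n]))
    return out


def max_abs_diff(a, b):
    """Largest magnitude difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Stand-in data: an impulse at the start of a block has a flat spectrum.
    impulse = [1 + 0j] + [0j] * 15
    print(block_dft(impulse))
```

With real captures, pass the samples from the original image through block_dft and compare the result with the samples from the modified image using max_abs_diff. A small difference is expected in any case, because the FPGA works in fixed point.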