One big advantage that DSP radio design has over analog radio design is the considerable degree to which the design flow can be automated, i.e. carried out by computer algorithms. We illustrate this by considering the design of a digital down-converter (DDC) intended for an RF-sampling receiver based on the IEEE 802.22 standard. The receiver operates on channels 21 - 35 in the UHF band (512 - 602 MHz) and must therefore support a bandwidth of 90 MHz. See the white paper Cascading Analog and Digital Noise Figures for more details on this receiver. The most essential specifications for the DDC are given in the table below.
The synthesizer SFDR (Spurious Free Dynamic Range) is defined as the ratio of the power in the desired spectral component to the power in the worst-case spur. The stopband attenuation refers to the decimation filters in the DDC. For information about how to define the noise figure of a DSP system, see the white paper Noise Figure Calculations in DSP Systems.
Figure 4 below shows a breakdown of the signal path in the DDC. It consists of a CORDIC rotator block serving as a complex synthesizer/mixer, a CIC decimation stage with decimation factor D = 3, an FIR decimation stage with D = 3, and two FIR decimation stages with D = 2. The total decimation factor is 3 × 3 × 2 × 2 = 36. The CORDIC rotator block, in combination with the phase accumulator, acts as both synthesizer and mixer, down-converting the desired channel to baseband by phase-rotating the input samples. The design flow for the DDC is outlined below.
It is clear that steps 1, 3 and 5 are best performed manually. In fact, this is where the designer can leverage his or her experience in arranging the order of the decimation factors, choosing the filter type and datapath architecture that best fit the timing requirements for each stage, and so on. However, steps 2 and 6, which represent the bulk of the work, are almost entirely carried out by computer algorithms. Consider for instance setting the parameters for the CORDIC rotator in step 2 based on the required SFDR. When a CORDIC rotator is used as a synthesizer, the SFDR is determined by two parameters: the bitwidth of the input phase (the rotation angle) and the number of CORDIC iterations. The design algorithm for the CORDIC block operates by evaluating the SFDR for specific combinations of these two parameters, starting with small parameter values (which yield low SFDR) and gradually working its way upwards until the SFDR requirement is met. The same approach is used in the design of the FIR decimation filters: specific combinations of two parameters, the number of taps and the coefficient bitwidth, are tested by generating thousands of frequency responses and evaluating the stopband attenuation of each response. The two parameters are gradually increased until a frequency response with the required stopband attenuation is found.
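The FIR parameter search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the candidate filters are generated, so a Blackman-windowed sinc design is swapped in here, and the band edges (`cutoff`, `stop_edge`), the parameter ranges, the traversal order (taps in the outer loop, coefficient bitwidth in the inner loop) and all function names are hypothetical. The stopband is sampled on a coarse frequency grid, so the reported attenuation is an estimate.

```python
import math

def design_lowpass(num_taps, cutoff):
    """Blackman-windowed sinc lowpass; cutoff in cycles/sample (assumed method)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * cutoff if t == 0 else math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4.0 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    return taps

def quantize(taps, bits):
    """Round each coefficient to a signed fixed-point grid with `bits` bits."""
    scale = 2 ** (bits - 1)
    return [round(c * scale) / scale for c in taps]

def magnitude(taps, freq):
    """|H(f)| of the FIR filter at `freq` cycles/sample."""
    re = sum(c * math.cos(2.0 * math.pi * freq * n) for n, c in enumerate(taps))
    im = sum(c * math.sin(2.0 * math.pi * freq * n) for n, c in enumerate(taps))
    return math.hypot(re, im)

def stopband_attenuation_db(taps, stop_edge, points=64):
    """Worst-case stopband gain relative to DC, sampled on a frequency grid."""
    dc = magnitude(taps, 0.0)
    if dc == 0.0:
        return float("-inf")
    worst = max(magnitude(taps, stop_edge + (0.5 - stop_edge) * k / (points - 1))
                for k in range(points))
    return 20.0 * math.log10(dc / max(worst, 1e-12))

def search_fir(target_db, cutoff=0.2, stop_edge=0.3,
               tap_range=range(7, 64, 4), bit_range=range(8, 19)):
    """Start with small parameter values and increase them until the target is met."""
    for num_taps in tap_range:
        ideal = design_lowpass(num_taps, cutoff)
        for bits in bit_range:
            att = stopband_attenuation_db(quantize(ideal, bits), stop_edge)
            if att >= target_db:
                return num_taps, bits, att
    return None  # no combination in the given ranges meets the target

print(search_fir(60.0))
```

A production tool would presumably test many more candidate responses per parameter combination, as the text indicates; the sketch evaluates one candidate per combination to keep the structure of the search visible.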
The procedure for setting the vector of inter-block bitwidths in step 6 is straightforward. First, the designer specifies a set of feasible vectors. This is done by constraining the bitwidth in each inter-block node in Figure 4 to a small range of values. With some experience, it becomes a simple task to define the bitwidth ranges so that the optimum vector is contained within the resulting set. Next, a design tool is used to search the set for the vector that meets the noise figure requirement with minimum hardware cost. This part is completely automated. Because the hardware cost of a DSP block grows monotonically with its input bitwidth, the search algorithm is able to eliminate most of the remaining vectors once an initial solution (a vector that meets the noise figure requirement) has been found.
The results of the DDC design are presented below. Starting with step 2, the automated design of the CORDIC rotator and the decimation filters produced the following parameter values (to save space, coefficient values are not reported):
Based on these parameter settings, and taking into account the speed requirements, the following architectural decisions were made in step 3. For the CORDIC rotator, a fully pipelined architecture is used, with one pipeline stage for each iteration. The CIC filter is implemented in recursive form (rather than feed-forward/FIR form), but the integrators are pipelined to meet timing at the relatively high input sample rate. A "constant multiplier" architecture is used in all three FIR decimation stages. This is a type of architecture where the multiplications in the filter taps are carried out using dedicated shift-and-add logic (rather than a generic multiplier in a multiply-and-accumulate loop).
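To make the CORDIC rotator and phase accumulator more concrete, here is a minimal floating-point sketch. It is not the design described above: hardware would use fixed-point shifts and adds, and the function names, the `freq_word` parameter and the default bitwidths are assumptions made for illustration. Each pass through the loop in `cordic_rotate` corresponds to one pipeline stage in a fully pipelined implementation.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians, with angle in [-pi, pi]."""
    # A coarse pre-rotation by pi brings the residual angle into [-pi/2, pi/2],
    # which lies inside the CORDIC convergence range (about +/-1.74 rad).
    if angle > math.pi / 2:
        x, y, angle = -x, -y, angle - math.pi
    elif angle < -math.pi / 2:
        x, y, angle = -x, -y, angle + math.pi
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if angle >= 0.0 else -1.0       # rotation direction for this stage
        shift = 2.0 ** -i                       # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)           # subtract the elementary angle
        gain *= math.sqrt(1.0 + shift * shift)  # accumulated CORDIC gain
    return x / gain, y / gain

def downconvert(samples, freq_word, phase_bits=16, iterations=16):
    """Mix real samples to baseband by rotating each one by the negated phase."""
    modulus = 1 << phase_bits
    acc = 0                                      # phase accumulator
    out = []
    for s in samples:
        phase = 2.0 * math.pi * acc / modulus    # in [0, 2*pi)
        if phase >= math.pi:
            phase -= 2.0 * math.pi               # wrap to [-pi, pi)
        out.append(cordic_rotate(s, 0.0, -phase, iterations))
        acc = (acc + freq_word) % modulus
    return out
```

In this sketch the synthesizer frequency is `freq_word / 2**phase_bits` cycles per sample, so a tone at that frequency is shifted to DC.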
The hardware cost function chosen in step 5 was tailored for an FPGA target where each logic cell consists of a 16-bit (4-input) LUT and a single flip-flop. Examples of FPGA families that employ this structure are Xilinx Virtex-4 and Altera Cyclone IV. The cost function is given by C = logicSize + 0.25*regSize, where logicSize is the total amount of combinational logic in the DDC and regSize is the total amount of register storage. This is based on the assumption that only about 25% of the register storage will be implemented in logic cells that are not used for any other purpose. Since the combinational logic is measured in full-adders and one full-adder is implemented by one logic cell (in this particular type of FPGA), the chosen cost function roughly corresponds to the number of logic cells required to implement the DDC. Note that we could have made the cost function much more complicated. For instance, we could have included constraints on the amount of single-port RAM or dual-port RAM (however, this design uses very little RAM).
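As a small worked example, the cost function can be written as a one-line Python helper. The per-block sizes below are made-up numbers used only to show the arithmetic; they are not taken from the actual design, and the function and variable names are hypothetical.

```python
def hardware_cost(logic_size, reg_size, reg_weight=0.25):
    """C = logicSize + 0.25 * regSize, roughly the number of logic cells."""
    return logic_size + reg_weight * reg_size

# Hypothetical per-block sizes as (full-adders, register bits), for illustration only.
blocks = {
    "cordic": (1200, 900),
    "cic": (300, 400),
    "fir": (2500, 1600),
}
total_logic = sum(logic for logic, _ in blocks.values())
total_regs = sum(regs for _, regs in blocks.values())
print(hardware_cost(total_logic, total_regs))
```

Keeping the register weight as a parameter makes it easy to retarget the estimate to a device where a different share of the registers ends up in otherwise unused logic cells.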
In step 6, the search algorithm was first run with the range for w1, w2, w3 and w4 set to 13 - 17 and the range for w5 set to 14 - 18. These bitwidth ranges define a set of 5^5 = 3125 feasible bitwidth vectors. The optimum vector within this set was found to be (15,15,15,13,16) with a noise figure of 11.93 dB and a hardware cost of 8206. Because the w4 value (13) in this vector touches the lower end of the range for that bitwidth, the algorithm was run again, this time with the w4 range set to 11 - 13. However, this search produced the same result, thus verifying that (15,15,15,13,16) is indeed the optimum bitwidth vector. Both searches took a few seconds to run on a typical Wintel desktop. Notice how the algorithm ended up choosing a smaller value for w4 than for the other bitwidths. This is because w4 is the input bitwidth of the third FIR decimation stage, the block that (with 45 taps and a coefficient bitwidth of 15) is the "hardware hog" of the system. Since the hardware size of a DSP block is roughly proportional to its input bitwidth, reducing w4 was in this case the most effective strategy for minimizing the hardware cost of the DDC.
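The bitwidth search can be sketched as follows. This is a toy illustration, not the actual design tool: the cost model (`WEIGHTS`), the noise model (`INPUT_NOISE`, quantization noise of 2^(-2w)/12 per node) and the 0.1 dB limit are all hypothetical stand-ins, so the result will not reproduce the vector reported above. What the sketch does show is how a monotone cost lets the search discard candidates without evaluating their noise figure once a first solution is known.

```python
import math
from itertools import product

WEIGHTS = (1.0, 2.0, 3.0, 8.0, 4.0)   # hypothetical cost per bit at each node
INPUT_NOISE = 2.0 ** -24               # hypothetical input noise power

def toy_cost(vec):
    """Toy hardware cost: grows monotonically with every bitwidth."""
    return sum(w * b for w, b in zip(WEIGHTS, vec))

def toy_noise_figure_db(vec):
    """Toy noise figure: each node adds quantization noise 2^(-2b)/12."""
    added = sum(2.0 ** (-2 * b) / 12.0 for b in vec)
    return 10.0 * math.log10(1.0 + added / INPUT_NOISE)

def search(ranges, cost, noise_figure_db, nf_limit_db):
    """Return (best vector, its cost, number of noise figure evaluations)."""
    best, best_cost, evaluated = None, float("inf"), 0
    for vec in product(*ranges):
        # Monotone cost: a vector that is >= the best solution in every
        # component cannot be cheaper, so it is skipped outright.
        if best is not None and all(b >= s for b, s in zip(vec, best)):
            continue
        c = cost(vec)
        if c >= best_cost:
            continue                   # cannot improve on the current best
        evaluated += 1
        if noise_figure_db(vec) <= nf_limit_db:
            best, best_cost = vec, c
    return best, best_cost, evaluated

# Same ranges as in the text: w1..w4 in 13..17, w5 in 14..18 (5^5 = 3125 vectors).
ranges = [range(13, 18)] * 4 + [range(14, 19)]
best, best_cost, evaluated = search(ranges, toy_cost, toy_noise_figure_db, 0.1)
print(best, best_cost, evaluated)
```

If the optimum touches the edge of a range, as w4 did in the run described above, the range can simply be shifted and the search repeated.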