Digital Down-Converter Design

One major advantage of DSP radio design over analog radio design is the degree to which the design flow can be automated, i.e. carried out by computer algorithms. We illustrate this by considering the design of a digital down-converter (DDC) intended for an RF-sampling receiver based on the IEEE 802.22 standard. The receiver operates on channels 21 - 35 in the UHF band (512 - 602 MHz) and must therefore support a bandwidth of 90 MHz. See the white paper Cascading Analog and Digital Noise Figures for more details on this receiver. The most essential specifications for the DDC are given in the table below.
 
 Parameter  Value
 Input Sample Rate  246.816 MHz
 Decimation Factor  36
 Input Bitwidth  14 bits
 Synthesizer SFDR  80 dB
 Stopband Attenuation  80 dB
 Noise Figure  12 dB
 
The synthesizer SFDR (Spurious Free Dynamic Range) is defined as the ratio of the power in the desired spectral component to the power in the worst-case spur. The stopband attenuation refers to the decimation filters in the DDC. For information about how to define the noise figure of a DSP system, see the white paper Noise Figure Calculations in DSP Systems.
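The SFDR definition above translates directly into a few lines of code. The sketch below assumes the spectrum has already been reduced to a carrier power and a list of spur powers in linear units. The function name and the example values are illustrative and are not taken from the design software.

```python
import math
from typing import Sequence

def sfdr_db(carrier_power: float, spur_powers: Sequence[float]) -> float:
    """SFDR in dB: power of the desired component over the worst-case spur."""
    worst_spur = max(spur_powers)  # only the strongest spur matters
    return 10.0 * math.log10(carrier_power / worst_spur)

# Hypothetical spectrum: unit-power carrier and three spurs (linear power units).
example_sfdr = sfdr_db(1.0, [1e-9, 1e-8, 3e-10])
```

With these made-up values the worst spur sits 80 dB below the carrier, which would just meet the synthesizer requirement in the table.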

Figure 4 below shows a breakdown of the signal path in the DDC. It consists of a CORDIC rotator block serving as a complex synthesizer/mixer, a CIC decimation stage with decimation factor D = 3, an FIR decimation stage with D = 3 and two FIR decimation stages with D = 2. The total decimation factor is 3*3*2*2 = 36. The CORDIC rotator block, in combination with the phase accumulator, acts as both synthesizer and mixer, down-converting the desired channel to baseband by phase-rotating the input samples.
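To make the synthesizer/mixer idea concrete, here is a minimal floating-point sketch of a phase accumulator driving a CORDIC rotator. It is a behavioral model only: real hardware would use fixed-point shifts and adds, and the function names, the quadrant pre-rotation and the gain compensation shown here are assumptions rather than details of the actual design.

```python
import math

def cordic_rotate(x, y, angle, n_iter):
    """Rotate (x, y) by `angle` radians (|angle| <= pi/2) with n_iter micro-rotations."""
    z = angle
    gain = 1.0
    for i in range(n_iter):
        d = 1.0 if z >= 0.0 else -1.0        # steer the residual angle toward zero
        step = 2.0 ** -i                      # a right shift by i bits in hardware
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)  # each micro-rotation stretches the vector
    return x / gain, y / gain                 # compensate the accumulated CORDIC gain

def ddc_mix(samples, freq_word, phase_bits=16, n_iter=14):
    """Mix real samples to baseband: multiply by exp(-j*2*pi*acc/2**phase_bits)."""
    modulus = 1 << phase_bits
    acc = 0
    out = []
    for s in samples:
        angle = -2.0 * math.pi * acc / modulus   # negative phase: down-conversion
        if angle < -math.pi:
            angle += 2.0 * math.pi               # fold into [-pi, pi)
        x, y = float(s), 0.0
        if angle > math.pi / 2:                  # pre-rotate by pi so the CORDIC converges
            x, y, angle = -x, -y, angle - math.pi
        elif angle < -math.pi / 2:
            x, y, angle = -x, -y, angle + math.pi
        out.append(complex(*cordic_rotate(x, y, angle, n_iter)))
        acc = (acc + freq_word) % modulus        # phase accumulator wraps modulo 2**phase_bits
    return out

# Hypothetical test tone at fs/16, mixed down with a matching tuning word.
tone = [math.cos(2.0 * math.pi * n / 16) for n in range(64)]
baseband = ddc_mix(tone, freq_word=1 << 12)
dc = sum(baseband) / len(baseband)
```

In this example the tone lands at DC with amplitude close to 0.5, and the image at twice the tone frequency averages out over the 64 samples. In the DDC that image is removed by the decimation filters that follow the rotator.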

 
 
The design flow for the DDC is outlined below.
  1. Break the system into individual DSP blocks and specify the basic structure of each block. This step has already been carried out in Figure 4.
  2. Specify filter parameters (number of taps, coefficient values, etc.) and synthesizer parameters that meet the stopband attenuation and SFDR requirements. This step covers every parameter in the DDC that is not a signal bitwidth. The objective is to find the smallest parameter values that satisfy the stopband and SFDR requirements, since this will have a big impact on the hardware cost of the DDC.
  3. Choose the datapath architecture for each block. The design software normally supports at least two architectural choices for each DSP structure, one designed for speed and one designed for low hardware cost. It is important that the choice of architecture takes into consideration the system parameters calculated in the previous step. The architectural information is used for hardware size estimation.
  4. Express the gain, noise figure and hardware sizes in each block as a function of its input and output bitwidths. This step is already implemented in the design software, so there is usually very little for the designer to do here. Note that the gain and noise figure are not dependent on the datapath architecture chosen in step 3, but the hardware sizes are. Examples of included hardware sizes are the amount of combinational logic (measured as an equivalent number of full-adders) and the amount of register storage (measured in bits).
  5. Define a hardware cost function to drive the setting of the inter-block bitwidths in the next step. In the Wireless Modems section, we described a very simple hardware cost function that often yields decent results. However, in DSP radio applications like the present example, we use a considerably more sophisticated approach that takes into account the actual hardware sizes in each block. By carefully choosing the weight factor for each hardware category, the designer can tailor the cost function towards a specific target, such as ASIC, FPGA or a specific FPGA family. For example, the weight factor applied to the amount of register storage in an FPGA design would normally be much smaller than in an ASIC design, since most registers in an FPGA design are implemented using flip-flops that are located in logic cells that have already been allocated for arithmetic operations. The cost of such registers is zero, since allocating these flip-flops does not require any additional logic cells. The hardware cost function can also be used to implement hardware resource constraints, by returning a cost of +infinity when the total amount of a certain hardware category exceeds the amount that is available for the design.
  6. Set the inter-block bitwidths (w1,w2,w3,w4,w5) so that the hardware cost function is minimized, with the required noise figure acting as a constraint. Note that due to step 4, for each choice of vector  (w1,w2,w3,w4,w5)  we can calculate the gain, noise figure and hardware sizes of each individual DSP block, and therefore also the total noise figure and hardware cost of the whole DDC.
  7. Compute all internal bitwidths in the DSP blocks. This is done automatically by the design software, since all bitwidths inside a DSP block are functions of its input and output bitwidths, which have already been determined in step 6.
It is clear that steps 1, 3 and 5 are best performed manually. In fact, this is where the designer can leverage his or her experience in arranging the order of the decimation factors, choosing the filter type and datapath architecture that best fit the timing requirements for each stage, and so on. However, steps 2 and 6, which represent the bulk of the work, are almost entirely carried out by computer algorithms. Consider for instance setting the parameters for the CORDIC rotator in step 2 based on the required SFDR. When a CORDIC rotator is used as a synthesizer, the SFDR is determined by two parameters: the bitwidth of the input phase (the rotation angle) and the number of CORDIC iterations. The design algorithm for the CORDIC block operates by evaluating the SFDR for specific combinations of these two parameters, starting with small parameter values (which yield low SFDR) and gradually working its way upwards until the SFDR requirement is met. The same approach is used in the design of the FIR decimation filters: specific combinations of two parameters, the number of taps and the coefficient bitwidth, are tested by generating thousands of frequency responses and evaluating the stopband attenuation of each response. The two parameters are gradually increased until a frequency response with the required stopband attenuation is found.
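The "start small and work upwards" search described above can be sketched as follows. The visiting order (increasing sum of the two parameters) is an assumption, since the text does not say how the design algorithm orders its candidates. The metric `placeholder_sfdr_db` is a made-up stand-in with roughly 6 dB per bit of the limiting parameter. It is not the SFDR evaluation used by the design software and is there only so that the example runs.

```python
from itertools import product

def smallest_params(metric, first_range, second_range, required):
    """Return the first (a, b), visited in order of increasing a + b, with metric >= required."""
    candidates = sorted(product(first_range, second_range),
                        key=lambda p: (p[0] + p[1], p))
    for a, b in candidates:
        if metric(a, b) >= required:
            return a, b
    return None  # requirement cannot be met inside the given ranges

def placeholder_sfdr_db(phase_bits, n_iter):
    """Hypothetical metric, NOT a real SFDR model: about 6 dB per limiting bit."""
    return 6.02 * min(phase_bits - 2, n_iter)

# Search phase bitwidths and iteration counts from 8 to 24 for an 80 dB target.
params = smallest_params(placeholder_sfdr_db, range(8, 25), range(8, 25), 80.0)
```

The same loop applies to the FIR stages if `metric` is replaced by a function that designs a filter for a given number of taps and coefficient bitwidth and returns its stopband attenuation.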

The procedure for setting the vector of inter-block bitwidths in step 6 is straightforward. First, the designer specifies a set of feasible vectors. This is done by constraining the bitwidth in each inter-block node in Figure 4 to a small range of values. With some experience, it becomes a simple task to define the bitwidth ranges so that the optimum vector is contained within the resulting set. Next, a design tool is used to search the set for the vector that meets the noise figure requirement with minimum hardware cost. This part is completely automated. Because the hardware cost of a DSP block grows monotonically with its input bitwidth, the search algorithm is able to eliminate most of the remaining vectors once an initial solution (vector that meets the noise figure requirement) has been found.
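One way to exploit the monotonic cost is a depth-first search with a lower bound, sketched below. The cheapest completion of a partial vector sets all remaining bitwidths to their minimum values, so once that bound reaches the cost of the best solution found so far, the whole branch can be discarded. The search strategy itself and the two toy models (`toy_cost`, `toy_nf_db`, with their weights and constants) are assumptions for illustration and do not come from the design tool.

```python
import math

def search_bitwidths(ranges, cost, noise_figure_db, nf_max_db):
    """Find the cheapest vector meeting the noise figure limit; cost must be monotone."""
    lows = [r[0] for r in ranges]          # ranges are assumed sorted in ascending order
    best_cost, best_vec, visited = math.inf, None, 0

    def recurse(prefix):
        nonlocal best_cost, best_vec, visited
        k = len(prefix)
        # Cheapest completion of this prefix: remaining nodes at minimum width.
        bound = cost(tuple(prefix) + tuple(lows[k:]))
        if bound >= best_cost:
            return                          # no completion can beat the incumbent
        if k == len(ranges):
            visited += 1                    # a full vector that survived pruning
            if noise_figure_db(tuple(prefix)) <= nf_max_db:
                best_cost, best_vec = bound, tuple(prefix)
            return
        for w in ranges[k]:
            recurse(prefix + [w])

    recurse([])
    return best_vec, best_cost, visited

WEIGHTS = (60, 40, 50, 200, 20)             # hypothetical cost per bit at each node

def toy_cost(w):
    """Hypothetical hardware cost: linear and monotone in every bitwidth."""
    return sum(a * b for a, b in zip(WEIGHTS, w))

def toy_nf_db(w):
    """Hypothetical noise figure: each node adds noise that falls 6.02 dB per bit."""
    return 10.0 * math.log10(10.0 + sum(4.0 ** (14 - b) for b in w))

RANGES = [range(13, 18)] * 4 + [range(14, 19)]
best_vec, best_cost, visited = search_bitwidths(RANGES, toy_cost, toy_nf_db, 12.0)
```

The counter `visited` reports how many complete vectors were actually evaluated, which gives a feel for how much of the feasible set the bound eliminates.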

The results of the DDC design are presented below. Starting with step 2, the automated design of the CORDIC rotator and the decimation filters produced the following parameter values (to save space, coefficient values are not reported):
 
 Block  Parameter  Value
 CORDIC  Phase bitwidth  16
 CORDIC  Number of iterations  14
 CIC  Cascading Factor  3
 FIR 1  Number of Taps  13
 FIR 1  Coefficient Bitwidth  11
 FIR 2  Number of Taps  9
 FIR 2  Coefficient Bitwidth  13
 FIR 3  Number of Taps  45
 FIR 3  Coefficient Bitwidth  15
 
Based on these parameter settings, and taking into account the speed requirements, the following architectural decisions were made in step 3. For the CORDIC rotator, a fully pipelined architecture is used, with one pipeline stage for each iteration. The CIC filter is implemented in recursive form (rather than feed-forward/FIR form), but the integrators are pipelined to meet timing at the relatively high input sample rate. A "constant multiplier" architecture is used in all three FIR decimation stages. This is a type of architecture where the multiplications in the filter taps are carried out using dedicated shift-and-add logic (rather than a generic multiplier in a multiply-and-accumulate loop).
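The shift-and-add idea behind a constant multiplier can be illustrated with a canonical signed-digit (CSD) decomposition of the coefficient. CSD is one common way to derive such logic. Whether the design software uses CSD or some other recoding is not stated here, so treat this as an assumption. The function names are illustrative.

```python
def csd_terms(c):
    """Canonical signed-digit form of an integer c >= 0 as a list of (shift, sign) pairs."""
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)       # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((shift, sign))
            c -= sign                # leaves a value divisible by 4
        c >>= 1
        shift += 1
    return terms

def multiply_shift_add(x, terms):
    """Multiply x by the constant encoded in `terms` using only shifts and adds/subtracts."""
    acc = 0
    for shift, sign in terms:
        acc += sign * (x << shift)
    return acc

# Example: 7*x becomes (x << 3) - x, one subtraction instead of two additions.
seven = csd_terms(7)
product_check = multiply_shift_add(-5, seven)
```

Each nonzero digit costs one adder or subtractor, so the hardware size of a tap depends on the coefficient value as well as on the coefficient bitwidth.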
 
The hardware cost function chosen in step 5 was tailored for an FPGA target where each logic cell consists of a 16-bit (4-input) LUT and a single flip-flop. Examples of FPGA families that employ this structure are Xilinx Virtex-4 and Altera Cyclone IV. The cost function is given by C = logicSize + 0.25*regSize, where logicSize is the total amount of combinational logic in the DDC and regSize is the total amount of register storage. This is based on the assumption that only about 25% of the register storage will be implemented in logic cells that are not used for any other purpose. Since the combinational logic is measured in full-adders and one full-adder is implemented by one logic cell (in this particular type of FPGA), the chosen cost function roughly corresponds to the number of logic cells required to implement the DDC. Note that we could have made the cost function much more complicated. For instance, we could have included constraints on the amount of single-port RAM or dual-port RAM (however, this design uses very little RAM).
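A minimal sketch of this cost function is shown below, including the "+infinity" mechanism for resource constraints mentioned in step 5. The function name, the optional limit parameters and the example sizes are hypothetical. Only the formula C = logicSize + 0.25*regSize comes from the text.

```python
import math

def hardware_cost(logic_size, reg_size, reg_weight=0.25,
                  max_logic=math.inf, max_regs=math.inf):
    """C = logicSize + reg_weight * regSize, or +infinity if a resource limit is exceeded."""
    if logic_size > max_logic or reg_size > max_regs:
        return math.inf              # the search will never select this design
    return logic_size + reg_weight * reg_size

# Hypothetical sizes: 1000 full-adders of logic and 400 bits of register storage.
cost_unconstrained = hardware_cost(1000, 400)
cost_over_budget = hardware_cost(1000, 400, max_logic=900)
```

Changing `reg_weight` is how the same function could be retargeted, for example toward an ASIC where register storage is comparatively more expensive.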
 
In step 6, the search algorithm was first run with the range for w1, w2, w3 and w4 set to 13 - 17 and the range for w5 set to 14 - 18. These bitwidth ranges define a set of 5^5 = 3125 feasible bitwidth vectors. The optimum vector within this set was found to be (15,15,15,13,16) with a noise figure of 11.93 dB and a hardware cost of 8206. Because the w4 value (13) in this vector touches the lower end of the range for that bitwidth, the algorithm was run again, this time with the w4 range set to 11 - 13. However, this search produced the same result, thus verifying that (15,15,15,13,16) is indeed the optimum bitwidth vector. Both searches took a few seconds to run on a typical Wintel desktop. Notice how the algorithm ended up choosing a smaller value for w4 than for the other bitwidths. This is because w4 is the input bitwidth of the third FIR decimation stage, the block that (with 45 taps and a coefficient bitwidth of 15) is the "hardware hog" of the system. Since the hardware sizes of a DSP block are roughly proportional to the input bitwidth, reducing w4 was in this case the most effective strategy for minimizing the hardware cost of the DDC.
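The size of the feasible set and the reason for the second run can both be checked with a few lines of code. The ranges and the optimum vector below are the ones quoted above. The helper `lower_bound_hits` is a hypothetical name for the check that flags a bitwidth sitting at the bottom of its range, which suggests the true optimum might lie outside the set.

```python
from itertools import product

RANGES = {
    "w1": range(13, 18),
    "w2": range(13, 18),
    "w3": range(13, 18),
    "w4": range(13, 18),
    "w5": range(14, 19),
}

# Number of feasible bitwidth vectors: five values per node, five nodes.
n_vectors = sum(1 for _ in product(*RANGES.values()))

def lower_bound_hits(optimum, ranges):
    """Names of nodes whose optimum bitwidth equals the lowest value in its range."""
    return [name for (name, r), w in zip(ranges.items(), optimum) if w == r[0]]

hits = lower_bound_hits((15, 15, 15, 13, 16), RANGES)
```

Here `hits` contains only w4, which is why the search was repeated with a lower range for that node.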