

# **Journal of Engineering and Development**

www.jead.org Vol.19, No. 06, November 2015 ISSN 1813-7822

# DESIGN AND FPGA IMPLEMENTATION OF WIRELESS BASEBAND MODEM FOR WIMAX SYSTEM BASED ON SDR

Dr. Sabah Nassir Hussein<sup>1</sup>,\* MSc. Student. Raghad Saad Majeed<sup>2</sup>

Assist Prof., Computer Techniques Engineering Department, Middle Technical University, Baghdad, Iraq.
MSc. Student., Computer Techniques Engineering Department, Middle Technical University, Baghdad, Iraq.

(Received:1/11/2015; Accepted:26/5/2015)

**Abstract:** This paper presents the design and implementation of a wireless modem with 16-QAM modulation and convolutional and differential coding, based on SDR and built with the MATLAB system generator model, for WiMAX applications. The key difficulty is that the modulation schemes specified for WiMAX by the IEEE 802.16 standard need ample hardware if all the modulation schemes and code rates have to be implemented on an FPGA. The performance of the implemented modem has been checked using the error rate as an indicator under different channel noise conditions. The resource estimator results show that the design uses a modest number of slices and other FPGA resources, which is encouraging for the use of this design in a variety of other wireless communication applications.

Keywords: WiMAX, SDR, FPGA, system generator, 16-QAM.

# Design and Implementation of a Wireless Transmission and Reception System for the WiMAX System Using SDR Technology

Abstract: This research presents a design for building the data modulation and recovery circuits in wireless networks (IEEE 802.16) on the transmit and receive sides, using the SDR software design approach and the (system generator) software. The performance of the design has been verified using the error rate as an indicator, for different types of channel noise. The results obtained with the (resource estimator) also showed that the design used a small number of slices and other FPGA resources, which is encouraging for the use of this design.

# 1. Introduction

At present, Field Programmable Gate Arrays (FPGAs) play an increasingly effective role in designing, simulating, and implementing new communication systems, and are considered the best choice for implementing most of the hardware sections [1]. They reduce the complexity of building modern communication devices, especially if the system requires changes in the design and style of parts of the system (as in adaptive communication systems). An FPGA chip also makes it possible to reuse the same hardware for different designs. The Xilinx ISE design suite provides a powerful set of blocks in MATLAB Simulink, called system generator blocks, for DSP design. Using the system generator blocks, the design task becomes easier than before, and moving from Simulink design blocks to system generator (SG) blocks is smooth [2]. Since both designs live in the same Simulink environment, the performance difference between them can be checked directly.

Software Defined Radio (SDR) can use the same technique to implement the radio components and to make the communication system more flexible and user friendly. SDR technology relies on radio communication components that have typically been implemented in hardware being implemented instead by means of software on an embedded system [3]. One of the widely used modulation techniques in SDR communication is 16-QAM (Quadrature Amplitude Modulation) [4], because of its high efficiency in power and its data rate, which is increased by a factor of four. Transmission efficiency and reliability can be improved by using convolutional and differential coding [5]. The convolutional code is a powerful forward error correction code that uses the input data to create a continuous stream of bits protected from errors [6]. With digital modulation, a differential encoder is usually used to prevent data inversion [4]. The system was built in system generator instead of VHDL code because it offers more flexibility, visibility, and control, and makes it easier to design, simulate, and update variables and functions in the system.

<sup>\*</sup>Corresponding Author raghad eng89@yahoo.com

# 2. The 16-QAM System

In quadrature amplitude modulation, the modulating signal (data) changes both the amplitude and the phase of the carrier signal and produces a complex modulated signal that contains a real and an imaginary component. The output of the QAM modulator therefore contains two orthogonal carrier waves, which form the constellation diagram. In 16-QAM, the constellation diagram contains 16 different symbols, each with a different combination of real and imaginary components, so selecting one symbol out of 16 requires a 4-bit data input to the modulator [3]. To reduce bit errors between adjacent symbols, the input data are Gray coded so that neighbouring constellation points differ in only one bit, as shown in Figure (1).

The 4 Gray-coded bits that represent one point in the constellation diagram can be regarded as two 2-bit words, one representing the level on the I-axis and the other the level on the Q-axis, as shown in Table (1) and represented in Eq. (2) [5].



Figure 1. The 16 QAM constellation diagram with Gray code input mapping [5]

(4)

| $D_0 D_1$ | I-axis | $D_2 D_3$ | Q-axis |
|-----------|--------|-----------|--------|
| 00        | -3     | 00        | -3     |
| 01        | -1     | 01        | -1     |
| 11        | +1     | 11        | +1     |
| 10        | +3     | 10        | +3     |

Table (1): The Gray coded constellation mapping for 16 QAM
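The mapping of Table (1) can be sketched as a small lookup in Python. This is only an illustration, not the system generator design: the names `GRAY_LEVEL` and `map_16qam` are hypothetical, and the bit order within each 4-bit word ($D_0 D_1$ first, then $D_2 D_3$) is assumed from the table header.

```python
# Gray-coded level lookup taken from Table (1): two bits select one axis level.
GRAY_LEVEL = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def map_16qam(bits):
    """Map a flat bit list (length a multiple of 4) to complex 16-QAM symbols.

    Bits D0 D1 select the I-axis level and D2 D3 the Q-axis level
    (assumed ordering, following the column order of Table (1)).
    """
    if len(bits) % 4 != 0:
        raise ValueError("bit count must be a multiple of 4")
    symbols = []
    for k in range(0, len(bits), 4):
        d0, d1, d2, d3 = bits[k:k + 4]
        symbols.append(complex(GRAY_LEVEL[(d0, d1)], GRAY_LEVEL[(d2, d3)]))
    return symbols


# 0010 -> I = -3, Q = +3 ; 1101 -> I = +1, Q = -1
print(map_16qam([0, 0, 1, 0, 1, 1, 0, 1]))  # [(-3+3j), (1-1j)]
```

Because neighbouring levels on each axis differ in a single bit (00, 01, 11, 10), a decision error to an adjacent level corrupts only one of the four bits.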

The received complex coded sequence is

$$y = x + n \tag{1}$$

where $x$ is the complex data sequence, with symbols taken from the alphabet

$$\alpha_{16QAM} = \left\{ \pm 1 \pm 1j,\ \pm 1 \pm 3j,\ \pm 3 \pm 3j,\ \pm 3 \pm 1j \right\} \tag{2}$$

and $n$ is the Additive White Gaussian Noise (AWGN), which follows the probability density function

$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2} \left(\frac{n-\mu}{\sigma}\right)^2} \tag{3}$$

with mean value $\mu = 0$ and variance $\sigma^2$.
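As a minimal sketch of Eq. (1), the channel can be imitated in Python by adding zero-mean Gaussian samples to each symbol. The function name `awgn` is hypothetical, and applying the standard deviation $\sigma$ separately to the real and imaginary parts is an assumption, since the text does not say whether $\sigma^2$ is the variance per dimension.

```python
import random


def awgn(symbols, sigma, rng=None):
    """Return y = x + n for each complex symbol x.

    n has independent zero-mean Gaussian real and imaginary parts, each with
    standard deviation sigma (assumed per-dimension variance sigma**2).
    """
    rng = rng or random.Random()
    return [x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for x in symbols]


# With sigma = 0 the channel is transparent; a larger sigma spreads the
# received points around the ideal constellation positions.
noisy = awgn([complex(-3, 3), complex(1, -1)], sigma=0.2, rng=random.Random(1))
print(noisy)
```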

In demodulation, the **Maximum a Posteriori Probability** (MAP) method has been used for soft bit detection of 16-QAM. This method maximizes the probability that bit $b_m$ was transmitted given that $y$ was received:

$$P(b_m \mid y) = \frac{P(y \mid b_m)\,P(b_m)}{P(y)} \tag{5}$$

where $P(b_m \mid y)$ is the probability that bit $b_m$ was transmitted given the received sample $y$, $P(y \mid b_m)$ is the probability of receiving $y$ given that $b_m$ was transmitted, $P(b_m)$ is the probability of transmitting $b_m$, and $P(y)$ is the probability of receiving $y$.

A detailed description of soft bit detection is given in [5].

With $y_r$ denoting the real component of the received sample, the soft bit for bit $b_0$ is

$$sb(b_0) = \begin{cases} 2(y_r + 1), & y_r < -2 \\ y_r, & -2 \le y_r \le 2 \\ 2(y_r - 1), & y_r > 2 \end{cases} \tag{6}$$

and the soft bit for bit $b_1$ is

$$sb(b_1) = \begin{cases} y_r + 2, & y_r \le 0 \\ -y_r + 2, & y_r > 0 \end{cases} \tag{7}$$

The soft bits for the remaining bits ($b_2$, $b_3$) are identical to those for $b_0$ and $b_1$, respectively, except that the decisions are based on the imaginary component $y_i$ of the received sample [5].
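Equations (6) and (7) translate directly into two short functions. The sketch below is illustrative only: the function names are hypothetical, and the rule that a positive soft bit is decided as 1 is an assumption chosen to agree with Table (1), where the first bit of each pair is 1 for positive levels and the second bit is 1 for the inner levels.

```python
def soft_b0(y):
    """Soft bit of Eq. (6); y is the real part (b0) or imaginary part (b2)."""
    if y < -2:
        return 2 * (y + 1)
    if y > 2:
        return 2 * (y - 1)
    return y


def soft_b1(y):
    """Soft bit of Eq. (7), equal to 2 - |y|; used for b1 and b3."""
    return y + 2 if y <= 0 else -y + 2


def demap_16qam(sample):
    """Return (soft bits, hard bits) in the order b0, b1, b2, b3."""
    soft = [soft_b0(sample.real), soft_b1(sample.real),
            soft_b0(sample.imag), soft_b1(sample.imag)]
    hard = [1 if s > 0 else 0 for s in soft]  # assumed sign convention
    return soft, hard


# A sample near the constellation point 1 - 3j (bits 11 00 in Table (1)).
print(demap_16qam(complex(0.5, -2.5)))  # ([0.5, 1.5, -3.0, -0.5], [1, 1, 0, 0])
```

The magnitude of each soft bit indicates how far the sample lies from the decision boundary, which is the reliability information a soft-input decoder can exploit.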

#### 2.2.1. The Convolutional Codes

In a convolutional code, an encoder processes a sequence of input data bits and generates a sequence of output bits that depend on the current data bit and on previous input bits. The number of input bits that influence each output bit (the current bit plus the stored previous bits) is called the constraint length of the code. Convolutional codes are usually specified in terms of their constraint length and their rate. Convolutional codes of different sizes are widely used in GSM, 802.11, and 802.16 networks. One popular convolutional code is shown in Figure (2), with code rate r=1/2 and constraint length K=3. In Figure (2), each input data bit produces two output bits that are XOR sums of the input bit and the internal memory state, as specified by the generator polynomials in Eq. (8) [6]. The first and second generator sequences, respectively, are

$$G_1 = [110] \quad \text{and} \quad G_2 = [111] \tag{8}$$

Figure 2. The convolutional code with the generator sequences G1 and G2 [6]
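A rate-1/2, K=3 encoder with the generators of Eq. (8) can be sketched in a few lines of Python. The sketch assumes that the leftmost tap of each generator acts on the current input bit and that the encoder starts in the all-zero state; neither detail is stated in the text, and the name `conv_encode` is hypothetical.

```python
G1 = (1, 1, 0)  # taps on (current bit, previous bit, bit before that) - assumed order
G2 = (1, 1, 1)


def conv_encode(bits, generators=(G1, G2)):
    """Rate-1/2, K=3 convolutional encoder starting from the all-zero state."""
    state = [0, 0]  # two memory elements: [previous bit, bit before that]
    out = []
    for b in bits:
        window = [b] + state
        for g in generators:
            # Each output bit is the XOR (modulo-2 sum) of the tapped positions.
            out.append(sum(t & w for t, w in zip(g, window)) % 2)
        state = [b, state[0]]  # shift the register by one position
    return out


print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Each input bit yields two output bits, so the coded stream is twice as long as the message, which is what the rate r=1/2 expresses.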

#### 2.2.3. The Decoding of the Convolutional Codes

Optimal decoding of convolutional codes follows the maximum-likelihood or the maximum a posteriori principle, which involves a search through the trellis for the most probable sequence. Depending on whether the detector following the demodulator performs hard or soft decisions, the corresponding metric in the trellis search may be either a Hamming metric or a Euclidean metric, respectively [6].

In general, when a binary convolutional code with constraint length $K$ is decoded by means of the Viterbi algorithm, there are $2^{K-1}$ states. Hence, there are $2^{K-1}$ surviving paths at each stage and $2^{K-1}$ metrics, one for each surviving path. Furthermore, a binary convolutional code in which $k$ bits at a time are shifted into an encoder that consists of $K$ ($k$-bit) shift-register stages generates a trellis that has $2^{k(K-1)}$ states. Consequently, decoding such a code by means of the Viterbi algorithm requires keeping track of $2^{k(K-1)}$ surviving paths and $2^{k(K-1)}$ metrics. At each stage of the trellis, there are $2^k$ paths that merge at each node. Since each path that converges at a common node requires the computation of a metric, there are $2^k$ metrics computed for each node. Of the $2^k$ paths that merge at each node, only one survives, and this is the most probable (minimum-distance) path. Thus, the number of computations executed at each decoding stage increases exponentially with $k$ and $K$, as shown in Figure (3). This exponential increase in computational burden limits the use of the Viterbi algorithm to relatively small values of $K$ and $k$ [6].



Figure 3. The state sequence and corresponding path in the Viterbi algorithm [6]
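For the K=3, k=1 code of Eq. (8) the trellis has $2^{K-1} = 4$ states, which keeps a software sketch short. The following Python example uses hard decisions with the Hamming metric; it is a simplified illustration and not the Xilinx Viterbi core. The tap order of the generators, the all-zero start state, and all function names are assumptions.

```python
G = ((1, 1, 0), (1, 1, 1))  # Eq. (8); taps on (current, previous, one before)


def branch(state, bit):
    """One encoder step: return (next state, pair of coded bits)."""
    window = (bit,) + state
    pair = tuple(sum(t & w for t, w in zip(g, window)) % 2 for g in G)
    return (bit, state[0]), pair


def encode(bits):
    state, out = (0, 0), []
    for b in bits:
        state, pair = branch(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with the Hamming metric."""
    states = [(a, b) for a in (0, 1) for b in (0, 1)]  # 2**(K-1) = 4 states
    inf = float("inf")
    metric = {s: (0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for k in range(0, len(received), 2):
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue  # state not reachable yet
            for bit in (0, 1):  # 2**k = 2 branches leave each state
                nxt, pair = branch(s, bit)
                m = (metric[s] + (pair[0] != received[k])
                     + (pair[1] != received[k + 1]))
                if m < new_metric[nxt]:  # keep only the survivor per state
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [bit]
        metric, paths = new_metric, new_paths
    return paths[min(states, key=metric.get)]


message = [1, 0, 1, 1, 0, 0]
coded = encode(message)
coded[2] ^= 1  # flip one coded bit to imitate a channel error
print(viterbi_decode(coded) == message)  # True
```

At every stage the decoder stores exactly one survivor and one metric per state, which is the bookkeeping described in the paragraph above.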

#### 2.2.4. The Differential Coding

Demodulating 16-QAM requires a local oscillator that is synchronous with the remote one, which is achieved by a carrier recovery circuit [4]. If the carrier is recovered wrongly, the received data are inverted [4]. Suppose that $x_i$ is a bit intended for transmission and $y_i$ is the bit actually transmitted (differentially encoded). If

$$y_i = y_{i-1} \oplus x_i \tag{9}$$

is transmitted, then on the decoding side

$$x_i = y_{i-1} \oplus y_i \tag{10}$$

can be reconstructed, where $\oplus$ is modulo-2 addition. Now $x_i$ depends only on the difference between $y_i$ and $y_{i-1}$ and not on their values, so the decoded data are correct whether or not the data stream is inverted [4].
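The effect of Eqs. (9) and (10) can be checked with a short Python sketch. The function names and the initial reference bit `y_prev = 0` are assumptions made for illustration; the text does not specify the initial state of the delay element.

```python
def diff_encode(bits, y_prev=0):
    """Eq. (9): y_i = y_{i-1} XOR x_i, with an assumed initial reference."""
    out = []
    for x in bits:
        y_prev ^= x
        out.append(y_prev)
    return out


def diff_decode(bits, y_prev=0):
    """Eq. (10): x_i = y_{i-1} XOR y_i."""
    out = []
    for y in bits:
        out.append(y_prev ^ y)
        y_prev = y
    return out


data = [1, 0, 1, 1, 0]
sent = diff_encode(data)            # [1, 1, 0, 1, 1]
inverted = [1 - b for b in sent]    # the whole stream received inverted

print(diff_decode(sent) == data)              # True
print(diff_decode(inverted)[1:] == data[1:])  # True
```

In this sketch only the first decoded bit can be affected by an inversion, because it is compared against the fixed initial reference rather than against a received bit; every later bit is recovered from the difference of two received bits.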

# 3. The 16-QAM SDR Block Diagram

WiMAX is a form of wireless Ethernet, and therefore the whole standard is based on the OSI reference model. The lowest layer of the model, and its hardware part, is the physical layer, which specifies the frequency band, the modulation scheme, synchronization between transmitter and receiver, error-correction techniques, multiplexing, and the data rate. The layer is also characterized by Orthogonal Frequency Division Multiplexing (OFDM), Time Division Duplexing (TDD), Frequency Division Duplexing (FDD), Quadrature Amplitude Modulation (QAM), and adaptive antenna systems, as shown in Figure (4). This work focuses on the design and implementation of the Baseband WiMAX Modem that forms part of Figure (4), shown as the solid-line blocks of the receiver and transmitter parts.



Figure 4. The block diagram of transmitter and receiver in the WiMAX system

The blocks of the Baseband WiMAX Modem, which contains coding and modulating sections, are shown in Figure (5).



Figure 5. The block diagram of the Baseband WiMAX Modem

The design of the blocks of Figure (5) is explained below based on the Xilinx system generator, which works under the MATLAB Simulink environment for FPGA design. Previous experience with Xilinx FPGAs or Hardware Description Languages (HDLs) is not needed when using system generator. All of the downstream FPGA implementation steps, including synthesis and place and route, are executed automatically to generate an FPGA programming file.

## A. The main blocks of the transmitter section

The transmitter has six main elements:

#### 1. Random Binary Signal Generation

The Random Integer block in MATLAB Simulink is used to generate the binary signal stream, with the M-ary number set to 2.

#### 2. Convolutional Encoder

The convolutional encoder Xilinx IP core of system generator has been used, with a native rate of 1/2, a constraint length of 3, and generator polynomials G1=110 and G2=111. Figure (6) shows the convolutional encoder system [9].



Figure 6. The convolutional encoder Xilinx IP core

#### 3. Parallel to Serial

The parallel-to-serial conversion is done using the parallel-to-serial converter available in system generator, whose block layout is shown in Figure (7). In this block, the two parallel output bits of the convolutional encoder are converted to a serial bit stream.



Figure 7. The parallel to serial converter

#### 4. Differential Encoder

This encoder is built from one delay element and a logical exclusive-OR (XOR) component, as shown in Figure (8).



Figure 8. The differential encoder circuit

#### 5. Serial to Parallel Converter

The serial-to-parallel converter converts the serial input bits (Din) to 4-bit parallel data words of the form d(3:0) using shift registers, as shown in Figure (9).



Figure 9. The serial to parallel converter block.

#### 6. 16-QAM Mapping

Each group of four parallel bits generated by the serial-to-parallel section is mapped using the 16-QAM constellation. The "In-phase" and "Quadrature" signals are generated from the first two and the last two bits, respectively, according to Table (1). The four level values ($\pm 1$ and $\pm 3$) are stored in a ROM block. The block diagram of the 16-QAM mapping is shown in Figure (10) [7].



Figure 10. The 16-QAM Mapping

## B. The main blocks of the receiver section

The receiver has five main elements:

#### 1. 16-QAM De-Mapping

De-mapping is performed by assigning the location of the received I-Q signal to the nearest point in the I-Q constellation using the soft bit algorithm discussed in Section 2. Figure (11) shows the 16-QAM de-mapping block set. The circuit has four soft bit decision blocks, one for each bit, to detect the received bits. Figures (12) and (13) show the soft bit decision circuits for $b_0$ and $b_1$, respectively. The soft bit decisions for $b_2$ and $b_3$ are built in the same manner as those for $b_0$ and $b_1$, respectively [8].



Figure 12. The soft bit decision of 16 QAM de-mapping for $b_0$



Figure 13. The soft bit decision of 16 QAM de-mapping for $b_1$

#### 2. Parallel to Serial Conversion

The four parallel output bits of the 16-QAM de-mapping are converted to a serial bit stream using a parallel-to-serial converter, as shown in Figure (14).



Figure 14. The parallel 4 bits to serial converter

#### 3. Differential Decoder

Figure (15) shows the differential decoder circuit, which uses a one-bit delay block and one exclusive-OR block [4].



Figure 15. The differential decoder circuit

#### 4. Serial to Parallel

The serial-to-parallel conversion converts the serial data to parallel two-bit streams d(1:0). Slice blocks are used to select one bit each: the upper slice selects d(0) and the lower slice selects d(1), as shown in Figure (16).



Figure 16. The serial to parallel converter

#### 5. Viterbi Decoder

The Viterbi decoder Xilinx IP core, version 7, is used to recover the information bits, as shown in Figure (17). The Viterbi decoder uses the same parameter settings as the convolutional encoder [9].



Figure 17. The Viterbi decoder circuit

# 4. The Simulation Results and Hardware Test

The implementation has been verified using the MATLAB package. Figures (18) and (19) show the simulation results of the transmitter and receiver sides, respectively. The output of the transmitter passes through an AWGN channel.



Figure 18: Time waveform of transmitter side

Figure 19. Time waveform of receiver side, with time (sec) on the horizontal axis: (a) QAM demapping, (b) output of parallel to serial, (c) output of differential decoder, (d) output of serial to parallel, (e) the first bit $d(0)$, (f) the second bit $d(1)$, (g) received stream bits (output of Viterbi decoder)



Figure 20. Constellation diagram of 16 QAM at transmitter side



Figure 21. Constellation diagram of 16 QAM at receiver side
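The error-rate check used as the performance indicator can be imitated in software with a loopback of the whole chain. The Python sketch below is a simplified stand-in for the Simulink model and not the authors' design: it uses hard decisions taken from the signs of the soft bits instead of passing soft values to the decoder, and the generator tap order, zero initial states, bit ordering of the converters, per-dimension noise standard deviation `sigma`, and all names are assumptions.

```python
import random

G = ((1, 1, 0), (1, 1, 1))                               # Eq. (8)
LEVEL = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}   # Table (1)


def branch(state, bit):
    window = (bit,) + state
    pair = tuple(sum(t & w for t, w in zip(g, window)) % 2 for g in G)
    return (bit, state[0]), pair


def conv_encode(bits):
    state, out = (0, 0), []
    for b in bits:
        state, pair = branch(state, b)
        out.extend(pair)
    return out


def viterbi(coded):
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    inf = float("inf")
    metric = {s: (0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for k in range(0, len(coded), 2):
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue
            for bit in (0, 1):
                nxt, pair = branch(s, bit)
                m = metric[s] + (pair[0] != coded[k]) + (pair[1] != coded[k + 1])
                if m < new_metric[nxt]:
                    new_metric[nxt], new_paths[nxt] = m, paths[s] + [bit]
        metric, paths = new_metric, new_paths
    return paths[min(states, key=metric.get)]


def diff_encode(bits):
    prev, out = 0, []
    for x in bits:
        prev ^= x
        out.append(prev)
    return out


def diff_decode(bits):
    prev, out = 0, []
    for y in bits:
        out.append(prev ^ y)
        prev = y
    return out


def qam_map(bits):
    return [complex(LEVEL[tuple(bits[k:k + 2])], LEVEL[tuple(bits[k + 2:k + 4])])
            for k in range(0, len(bits), 4)]


def qam_demap(samples):
    bits = []
    for s in samples:
        for y in (s.real, s.imag):
            bits.append(1 if y > 0 else 0)            # sign of Eq. (6)
            bits.append(1 if 2 - abs(y) > 0 else 0)   # sign of Eq. (7)
    return bits


def bit_error_rate(message, sigma, seed=0):
    """Run transmitter, AWGN channel, and receiver; message length must be even."""
    rng = random.Random(seed)
    tx = qam_map(diff_encode(conv_encode(message)))
    rx = [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in tx]
    decoded = viterbi(diff_decode(qam_demap(rx)))
    return sum(a != b for a, b in zip(message, decoded)) / len(message)


source = random.Random(1)
message = [source.randint(0, 1) for _ in range(200)]
print(bit_error_rate(message, sigma=0.0))  # 0.0 on a noise-free channel
```

Calling `bit_error_rate` with increasing `sigma` gives a rough picture of how the error rate grows with channel noise; the numbers it produces belong to this simplified model only and are not the results reported in this paper.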

Table (2) shows resource utilization and operating frequency.

**Device Utilization Summary**

| Logic Utilization                              | Used | Available | Utilization |
|------------------------------------------------|------|-----------|-------------|
| Number of Slice Flip Flops                     | 269  | 11,776    | 2%          |
| Number of 4 input LUTs                         | 231  | 11,776    | 1%          |
| Number of occupied Slices                      | 282  | 5,888     | 4%          |
| Number of Slices containing only related logic | 282  | 282       | 100%        |
| Number of Slices containing unrelated logic    | 0    | 282       | 0%          |
| Total Number of 4 input LUTs                   | 270  | 11,776    | 2%          |
| Number used as logic                           | 197  |           |             |
| Number used as a route-thru                    | 39   |           |             |
| Number used as Shift registers                 | 34   |           |             |
| Number of bonded IOBs                          | 22   | 372       | 5%          |
| Number of BUFGMUXs                             | 1    | 24        | 4%          |
| Number of MULT18X18SIOs                        | 4    | 20        | 20%         |
| Number of RAMB16BWEs                           | 3    | 20        | 15%         |
| Average Fanout of Non-Clock Nets               | 2.44 |           |             |

Table (2) Resource utilization and operating frequency
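The utilization column of Table (2) can be cross-checked from the used and available counts. The short Python sketch below assumes that the tool truncates the percentage to an integer, which is inferred only from the fact that truncation reproduces every row of the table; the helper name `utilization_percent` is hypothetical.

```python
# (used, available, reported percent) copied from Table (2)
ROWS = {
    "Slice Flip Flops": (269, 11776, 2),
    "4 input LUTs": (231, 11776, 1),
    "Occupied Slices": (282, 5888, 4),
    "Total 4 input LUTs": (270, 11776, 2),
    "Bonded IOBs": (22, 372, 5),
    "BUFGMUXs": (1, 24, 4),
    "MULT18X18SIOs": (4, 20, 20),
    "RAMB16BWEs": (3, 20, 15),
}


def utilization_percent(used, available):
    """Integer percentage, truncated (assumed rounding rule)."""
    return used * 100 // available


for name, (used, available, reported) in ROWS.items():
    print(name, utilization_percent(used, available), reported)

# The total LUT count is the sum of its three sub-entries: 197 + 39 + 34.
print(197 + 39 + 34)  # 270
```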

Figures (20) and (21) show the constellation diagrams of 16-QAM at the transmitter and receiver sides, respectively.

The hardware test for the SDR system is depicted in Figure (22). The output of the Viterbi decoder is displayed on the oscilloscope via the RS-232 DTE port, pin 3 (F16), as shown in Figure (23). The plot displayed on the laptop screen represents the source binary bit stream, while the plot displayed on the oscilloscope represents the output of the system. Comparing the two plots shows that they are identical.



Figure 22: Hardware test for SDR system



Figure 23: RS232 DTE serial port interfaced with FPGA

# 5. Conclusions

A Baseband WiMAX Modem has been implemented using system generator. Software Defined Radio (SDR) has the flexibility to modify the characteristics of a transmitting and receiving radio device without physically modifying the hardware. System generator has been used to generate VHDL code for the implemented modem, and it gives flexibility in communication system design. The hardware has been implemented on the Xilinx Spartan-3AN FPGA using VHDL. Comparison of the proposed work with a conventional LUT-based method and with a recent work shows significant improvement in resource utilization and operating frequency, as shown in Table (2). The impact of coding has also been checked using different channel noise levels before and after applying the convolutional and differential coding, and the results are good, as shown in the scatter plots of Figures (20) and (21).

# 6. References

- 1. Murali Krishna, Ramesh, (2014), "Efficient Implementation of Address Generator for WiMAX Deinterleaver on Xilinx FPGA", International Journal of Application or Innovation in Engineering & Management (IJAIEM), Issue 5, Vol. 3, pp. 451-455, May.
- 2. Sanket Prakash Joshi, (2012), "Integrating FPGA with Multicore SDR Development Platform to Design Wireless Communication System", MSc. Thesis, California State University, Northridge, May.
- 3. Raghunandan Swain, Ajit Kumer Panada, (2012), "Design of 16-QAM Transmitter and Receiver: Review of Method of Implementation in FPGA", International Journal of Engineering and Science ISSN, Vol. 1, Issue 9, pp. 23-27, November.
- 4. Robert F. H. Fischer, Lutz H. J. Lamp, and Stefano Calabro, (2000), "Differential Encoding Strategies for Transmission over Fading Channels", AEU International Journal of Electronics and Communications, Issue 1, Vol. 54, pp. 59-67, Germany.
- 5. Filippo Tosato and Paola Bisaglia, (2002), "Simplified Soft-Output Demapper for Binary Interleaved COFDM with Application to HIPERLAN/2", IEEE International Conference on Communication (ICC), Vol. 2, pp. 664-668.
- 6. John G. Proakis and Masoud Salehi, (2008), "Digital Communications", McGraw-Hill, Fifth Edition, New York.
- 7. Khaled Sobaihi, Akram Hammoudeh, David Scammel, (2010), "FPGA Implementation of OFDM Transceiver for a 60GHz Wireless Mobile Radio system", International Conference on Reconfigurable Computing on source IEEE Xplore.
- 8. Yusep Rosmansyah, (2003), "Soft-Demodulation of QPSK and 16-QAM for Turbo Coded WCDMA Mobile Communication Systems", PhD Thesis, Surrey Guildford University, United Kingdom, July.
- 9. System Generator for DSP Reference Guide, UG638 (V 12.2), July 23, 2010. (www.xilinx.com).