Abstract: A filter whose impulse response has finite duration, known as a finite impulse response (FIR) filter, is an important component of an efficient digital signal processing (DSP) system, and FIR filters are widely used across DSP applications. Many applications in digital communication, particularly speech processing involving adaptive noise cancellation, seismic signal processing, and other signal synthesis operations, require high-order FIR filters, which has motivated the development of power-reduction techniques. This paper therefore constructs an FIR filter that is efficient in terms of power, using a new implementation approach for its adder and multiplier. The multiplier is a modified Wallace multiplier, which reduces the number of partial products. The adder is a carry-skip (carry-bypass) adder, which avoids unnecessary addition activity and thereby lowers switching power dissipation. The paper presents an implementation and performance analysis of the multiplier and adder, comparing them with other adders and multipliers in order to minimize the power consumed during multiplication and addition. The performance of the FIR filter is evaluated in terms of speed and power, with synthesis carried out in the Xilinx FPGA environment. The results show that the proposed FIR filter has lower power consumption and better performance than a conventional FIR filter.

Keywords: Finite Impulse Response Filter (FIR), Multiplier and Accumulator (MAC), Modified Wallace multiplier, Carry skip adder

I. INTRODUCTION
Finite impulse response (FIR) filters are used in most DSP applications. Many applications employ FIR filters with high sampling rates, but in some applications filter circuits that achieve the required sampling rate at low power are desired. Parallel processing techniques are applied to FIR filters either to increase the effective speed or to reduce the power consumption of the original digital filter. For many years, parallel processing has been applied to FIR filters by replicating the hardware units of the original filter.

The choice of multiplier circuit also affects the resulting power consumption. If the multiplier is chosen wisely, so that it needs fewer calculations, both speed and power can be optimized. Multipliers play an important part in DSP systems, but they consume considerable power, occupy a large area, and have long latency. Low-power multiplier design is therefore an important part of power-aware VLSI system design. The main focus of this work is the optimization of the multiplier design to produce more power-efficient solutions at the lower design levels. Specifically, emphasis is laid on optimizing the internal algorithm and architecture of the multiplier and on controlling active multiplier resources. The objective is reduced power; with new architectures or algorithms it is possible to optimize power, speed, and area.

The rest of this paper is arranged as follows. Section II deals with the basic operation of the FIR filter. Section III deals with clocked regenerative comparators. Section IV describes the working of the multiplier and accumulator (MAC) unit. Section V presents the carry-skip adder and Section VI the modified Wallace multiplier. Section VII presents the implementation, the software design tools, and the results, and Section VIII concludes the paper.

Ruan, A.W., Liao, Y.B., Li, P., and Li, J.X. (2009) [4] proposed an ALU-based universal FIR filter suitable for FPGA implementation, in which different FIR filters are realized by software stored in ROM running on the same hardware architecture. The proposed ALU design implements the FIR filter functions through controlled execution of accumulators and shift registers. Nekoei, F., Kavian, Y.S., and Strobel, O. (2010) [5] presented the realization of digital FIR filters on field-programmable gate array devices. Two common architectures, the direct and transposed forms, were used to implement FIR filters on a Xilinx SPARTAN2-XC2S50-5I-tq144 FPGA in the Verilog hardware description language. Jiang, X. and Bao, Y. (2010) proposed configurable structures for an efficient FPGA-based digital FIR filter design, using the MATLAB FDA Tool to determine the filter coefficients and designing a 16th-order constant-coefficient FIR filter in VHDL with Quartus II.

II. FIR FILTER
An FIR filter is used extensively in several application fields. Its implementation includes a memory with an address-generation unit that accesses the memory circularly, called the modulo unit. A simple FIR filter, shown in Figure 1, is described by a conventional sum-of-products operation.

![Figure 1: Block Diagram of FIR Filter](image)

**Difference Equation:**

\[
Y(n) = \sum_{k=0}^{M-1} w_k x(n-k)
\]

where \(w_k\) are the filter coefficients, \(x(n)\) is the input sequence, and \(M\) is the number of coefficients.

Digital FIR filters are commonly implemented on programmable DSPs with a MAC unit. FPGA technology can provide multiple MACs to achieve the throughput required by high-bandwidth signal processing applications. An FIR filter with constant coefficients is a linear time-invariant (LTI) filter, so its output is always a linear function of its input. If \(x[n]\) is the input time series to an FIR filter of length \(N\), then the output series is:

\[
Y(n) = x[n] * f[n] = \sum_{k=0}^{N-1} f[k] \, x[n-k]
\]

The expression above shows that the output is the convolution sum of the input series with the impulse response of the filter, where \(f[0] \neq 0\) through \(f[N-1] \neq 0\) are the filter's \(N\) coefficients.
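The convolution sum can be checked numerically with a short sketch. The Python function below is only an illustration of the equation, not the hardware design of this paper; the name `fir_filter` and the sample coefficient values are assumptions chosen for the example, and the input is taken to be zero before the first sample.

```python
def fir_filter(x, f):
    """Direct-form FIR filter: y[n] = sum over k of f[k] * x[n - k].

    x is the input series and f holds the N filter coefficients.
    Samples before the start of x are assumed to be zero.
    """
    n_taps = len(f)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps):
            if n - k >= 0:          # skip terms that fall before the first sample
                acc += f[k] * x[n - k]
        y.append(acc)
    return y


# Hypothetical 3-tap filter driven by a unit impulse.
coeffs = [3, -1, 2]
impulse = [1, 0, 0, 0, 0]
print(fir_filter(impulse, coeffs))  # [3, -1, 2, 0, 0]
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is the sense in which the impulse response is finite.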

III. CLOCKED REGENERATIVE COMPARATORS

Clocked comparators with regenerative feedback and fast decision capability are used in many applications, such as high-speed ADCs. Several recent studies have analysed such circuits in terms of noise, offset, and errors. This work presents a comprehensive comparative analysis of two structures: the conventional dynamic comparator and the dynamic double-tail comparator.

IV. MAC UNIT

A MAC unit, short for multiply-and-accumulate unit, plays a vital role in DSP operations such as convolution, filtering, and inner products. Widely used DSP methods such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) are essentially accomplished by repeated multiplication and addition. The execution rate of a DCT or DWT therefore depends on how efficiently these repeated multiplications and additions are carried out, and improving them raises the overall execution rate.

A MAC unit operates independently, which reduces the load on the central processing unit and so speeds up filtering and other DSP operations, such as the FFT and those found in optical communication. The MAC unit consists of a multiplier and an accumulator that sums the successive products. Its inputs are retrieved from memory and fed to the multiplier block. The design here is a 16-bit MAC unit with a 32-bit carry-skip adder and a register.

**OPERATION OF MAC UNIT:**

The MAC unit plays a key role in DSP, multimedia information processing, and other applications, for example FFT processing of multimedia data. It comprises a multiplier, an adder, and an accumulator register that holds the running sum of the successive products. This paper uses a modified Wallace multiplier that operates on 16-bit data. The digital signal processor supplies the MAC unit with 16-bit inputs retrieved from memory. As soon as the inputs are applied, the multiplier computes the product of the two 16-bit operands and produces a 32-bit result, which is fed to the carry-skip adder for the addition.

The MAC operation in mathematical representation is given by the following equation:

\[
F = \sum P_i Q_i
\]

The carry-skip (carry-bypass) adder produces a 33-bit result, of which 32 bits are the sum and 1 bit is the carry. This result is loaded into the accumulator register, which is a parallel-in parallel-out (PIPO) register, so all of its output bits are available simultaneously. The accumulator output is fed back as an input to the carry-skip adder. The basic architecture of the MAC unit is shown in Figure 2.
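The data widths described above (16-bit operands, a 32-bit product, and a 33-bit adder result split into a 32-bit sum and a carry) can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the Verilog design of the paper: the class name `MacUnit`, the method `step`, and the choice of unsigned operands are all hypothetical.

```python
MASK16 = (1 << 16) - 1   # largest unsigned 16-bit operand
MASK32 = (1 << 32) - 1   # width of the accumulator register


class MacUnit:
    """Behavioural model of a 16-bit multiply-accumulate unit (unsigned, assumed)."""

    def __init__(self):
        self.acc = 0     # 32-bit accumulator register
        self.carry = 0   # carry-out bit of the most recent addition

    def step(self, p, q):
        """Multiply p by q, add the product to the accumulator, return (acc, carry)."""
        if not (0 <= p <= MASK16 and 0 <= q <= MASK16):
            raise ValueError("operands must be unsigned 16-bit values")
        product = p * q                  # always fits in 32 bits
        total = self.acc + product       # adder output, up to 33 bits wide
        self.acc = total & MASK32        # lower 32 bits are the sum
        self.carry = total >> 32         # bit 32 is the carry
        return self.acc, self.carry


mac = MacUnit()
for p, q in [(3, 4), (5, 6), (7, 8)]:
    mac.step(p, q)
print(mac.acc)  # 98, which is 3*4 + 5*6 + 7*8
```

Accumulating two maximal products (`0xFFFF * 0xFFFF` twice) overflows the 32-bit sum, and the model then reports a carry of 1, which is the case the extra adder output bit exists for.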

![Figure 2: Basic Architecture of MAC unit](image)

V. CARRY SKIP ADDER

Let \(A\) and \(B\) be the inputs to the adder block, and let \(P_i = A_i \oplus B_i\) be the propagate signal of bit \(i\). Suppose \(A\) and \(B\) are such that all propagate signals are equal to 1. In a plain ripple structure, the carry from one block then reaches the following block only after passing through every individual adder cell. The block diagram in Figure 3 shows the signals of the carry-skip adder and how they propagate.

![Figure 3: Block Diagram of Carry Skip Adder](image)

When the propagate signals are not all equal to 1, the carry propagates through the individual cells in the usual way. When \(P_0, P_1, \ldots, P_{31}\) are all equal to 1, the carry does not ripple through all the blocks but goes directly to the output through a bypass route, which increases the speed of the adder. An \(N\)-bit carry-skip adder is constructed by cascading \(N/M\) stages of equal length, where \(M\) is the number of bits in each stage. For a 32-bit bypass adder with \(N = 32\) and \(M = 2\), the number of stages is 16. The block diagram is shown in Figure 4.

![Figure 4: Block diagram of multi staged CSA](image)
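The bypass behaviour can be sketched at bit level in software. The model below is a minimal illustration rather than the paper's Verilog design: the function name `carry_skip_add`, its parameters, and the `skipped` counter are hypothetical, and the defaults follow the 32-bit, 2-bits-per-stage configuration discussed here.

```python
def carry_skip_add(a, b, carry_in=0, n_bits=32, block_bits=2):
    """Bit-level model of an n_bits carry-skip adder split into equal stages.

    Returns (sum, carry_out, skipped), where skipped counts the stages whose
    propagate signals were all 1, so the stage carry-in took the bypass route.
    """
    if n_bits % block_bits != 0:
        raise ValueError("n_bits must be a multiple of block_bits")
    mask = (1 << n_bits) - 1
    a &= mask
    b &= mask
    total = 0
    carry = carry_in & 1
    skipped = 0
    for start in range(0, n_bits, block_bits):
        stage_carry_in = carry
        all_propagate = True
        for i in range(start, start + block_bits):
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p = ai ^ bi                    # propagate signal of bit i
            g = ai & bi                    # generate signal of bit i
            total |= (p ^ carry) << i      # sum bit
            carry = g | (p & carry)        # ripple carry to the next cell
            all_propagate = all_propagate and p == 1
        if all_propagate:
            # Bypass multiplexer: the stage carry-out is simply its carry-in.
            carry = stage_carry_in
            skipped += 1
    return total, carry, skipped


print(carry_skip_add(0xFFFFFFFF, 1))  # (0, 1, 15)
```

In the printed example only the lowest stage generates a carry; the other 15 stages have all propagate signals equal to 1, so the carry reaches the output through their bypass paths. The bypass never changes the numerical result, only the path the carry takes, which is why the model's sum always equals ordinary addition.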

For an \(N\)-bit adder with \(M\) bits per stage, the inputs are thus divided into \(N/M\) stages of equal length. The overall worst-case propagation delay of the adder is:

\[
t_p = t_{\text{setup}} + M \cdot t_{\text{carry}} + (N/M - 1) \cdot t_{\text{bypass}} + (M - 1) \cdot t_{\text{carry}} + t_{\text{sum}}
\]

where \(t_{\text{setup}}\) is the time needed to form the propagate and generate signals, \(t_{\text{carry}}\) is the propagation delay through a single bit, \(t_{\text{bypass}}\) is the delay through the bypass path of one stage, and \(t_{\text{sum}}\) is the time needed to produce the sum in the final stage. The first \(t_{\text{carry}}\) term is the ripple through the first stage and the second is the ripple through the last stage.
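As a worked example, the delay expression can be evaluated for several stage widths. The sketch below uses the common textbook form of the expression, in which the final stage contributes \((M - 1) \cdot t_{\text{carry}}\); the function name `carry_skip_delay` and the unit delay values are illustrative assumptions, not measurements from this work.

```python
def carry_skip_delay(n_bits, block_bits, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an n_bits carry-skip adder with block_bits bits per stage."""
    if n_bits % block_bits != 0:
        raise ValueError("n_bits must be a multiple of block_bits")
    stages = n_bits // block_bits
    return (t_setup
            + block_bits * t_carry          # ripple through the first stage
            + (stages - 1) * t_bypass       # bypass of the remaining stages
            + (block_bits - 1) * t_carry    # ripple inside the last stage
            + t_sum)


# Hypothetical unit delays: every component delay set to 1.
for m in (2, 4, 8):
    print(m, carry_skip_delay(32, m, 1, 1, 1, 1))
# 2 20
# 4 16
# 8 20
```

With these assumed unit delays, the 16-stage configuration (\(M = 2\)) and the 4-stage configuration (\(M = 8\)) give the same delay, and an intermediate stage width is faster. The best choice of \(M\) in practice depends on the actual ratio of \(t_{\text{carry}}\) to \(t_{\text{bypass}}\) in the target technology.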

VI. MODIFIED WALLACE MULTIPLIER

A modified Wallace multiplier is an efficient hardware implementation of a digital circuit that multiplies two integers. Conventional Wallace multipliers use many full adders and half adders in their reduction phase. Half adders do not reduce the number of partial-product bits, so minimizing the half-adder count lowers the complexity of the multiplier. The modified reduction therefore keeps the delay the same as that of the conventional Wallace reduction while cutting the number of half adders, at the cost of a slight increase in the number of full adders. The reduced-complexity Wallace reduction has three phases. In the first phase, the \(N \times N\) partial-product matrix is formed and rearranged into the shape of a pyramid before it is passed to the next phase. In the second phase, the rows of the matrix are divided into non-overlapping groups of three, as shown in Figure 5. Within a group, a column with three bits is fed to a full adder, while single bits and pairs of bits are passed unchanged to the next stage. With \(r_j\) the number of rows in stage \(j\), the number of rows in the next stage is given by

\[
r_{j+1} = 2 \left\lfloor \frac{r_j}{3} \right\rfloor + (r_j \bmod 3) \tag{5}
\]

If \(r_j \bmod 3 = 0\), then

\[
r_{j+1} = \frac{2 r_j}{3} \tag{6}
\]

A half adder is used only when the number of rows actually formed in a stage of phase II does not match the value calculated from the formula above. The result of phase II is two rows high and is passed on to phase III. In phase III, the output of phase II is fed to a final carry-propagating adder to produce the final product.
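The row-count recurrence of phase II can be traced with a few lines of code. This is only a sketch of the recurrence, read as twice the integer part of \(r_j / 3\) plus the remainder \(r_j \bmod 3\); it does not model the adders themselves, and the function name `modified_wallace_rows` is hypothetical.

```python
def modified_wallace_rows(n_rows):
    """Row count of each reduction stage until only two rows remain.

    Every complete group of three rows is reduced to two rows, and the
    leftover (r mod 3) rows are passed on unchanged.
    """
    rows = [n_rows]
    while rows[-1] > 2:
        r = rows[-1]
        rows.append(2 * (r // 3) + r % 3)
    return rows


print(modified_wallace_rows(10))  # [10, 7, 5, 4, 3, 2]
print(modified_wallace_rows(16))  # [16, 11, 8, 6, 4, 3, 2]
```

Under this reading, a 10-bit by 10-bit multiplier needs five reduction stages before the two-row result is handed to the final adder, and the 16-bit operands used in the MAC unit need six.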

![Figure 5: Modified Wallace 10-bit by 10-bit reduction](image)

VII. RESULTS

The FIR filter is synthesized and simulated using the Xilinx ISE tool. The design is written in Verilog HDL, and the modified Wallace multiplier and carry-skip adder are used to reduce the power and delay, and so increase the speed, of the FIR filter. Table 1 lists the dynamic power consumption of the proposed and conventional FIR filters.

<table>
<thead>
<tr>
<th>S. No.</th>
<th>FIR Filter</th>
<th>Dynamic Power (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Proposed FIR Filter</td>
<td>15</td>
</tr>
<tr>
<td>2</td>
<td>Conventional FIR filter</td>
<td>29</td>
</tr>
</tbody>
</table>

Table 1: Dynamic power comparison of the proposed and conventional FIR filters

The simulation output of the 16-bit conventional FIR filter is shown in Fig. 6, and the corresponding zoomed output waveforms are given in Fig. 7, where the change at the output for each input can be observed. The proposed filter, built from the modified Wallace multiplier and the carry-skip adder, is shown in Fig. 8, and its output waveforms for the corresponding input are presented in Fig. 9. The performance of the proposed model can be inferred from these waveforms.

**Figure 6: Conventional FIR Filter**

**Figure 7: 16-bit Conventional FIR Filter waveforms**

**Figure 8: Proposed FIR Filter**

**Figure 9: 16-bit Proposed FIR Filter waveforms**

VIII. CONCLUSION

Low power consumption is one of the most important criteria for a high-performance DSP system. A high-performance system can be achieved by reducing the dynamic power, which in turn reduces the total power dissipation. This paper implements a better-performing FIR filter using a low-power adder and multiplier, and analyzes the dynamic power of two FIR filters. Among the adder and multiplier circuits compared, the carry-skip adder and the modified Wallace multiplier consume the least power, so the proposed FIR filter was implemented with them and analyzed. The power dissipation of the adders and multipliers was derived from an analysis of the different adder and multiplier circuits. The power results show that the proposed FIR filter consumes less dynamic power than the conventional FIR filter (15 mW against 29 mW, a reduction of about 48%). On the basis of these results, the proposed FIR filter is well suited to low-power DSP systems.

REFERENCES

[12] Jagadeesh, P. Ravi, S. Mallikarjun, Kriti, Harsh, Circuits, Power and Computing Technologies [ICCPT], 2013 International Conference on.