The Power Consumption In The Interconnection Links


02 Nov 2017

Disclaimer:
This essay has been written and submitted by students and is not an example of our work. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of EssayCompany.

Introduction

Nowadays, advances in semiconductor technology allow designers to integrate a large number of cores, such as DSP processors, memory blocks and interface cards, on a single System-on-Chip (SoC) [2]. As the number of cores on a chip increases, communication between the cores becomes a bottleneck and limits performance. Network on Chip (NoC) is a paradigm proposed to overcome the drawbacks of the SoC communication architecture, and it allows a large number of cores to be integrated seamlessly on a single chip. In a NoC, routers connect the cores, and the routers communicate with each other through interconnection links [1]. There are two main issues in NoC design: (i) the area and power consumption of the NoC [3] and (ii) the reliability of the NoC interconnection links [4-5]. The number of routers and interconnection links in the NoC plays an important role in its area and power consumption, and this number is determined by the choice of topology. Standard topologies such as Mesh, Torus, Star and Ring connect a single core to a router. These standard topologies therefore require more routers and interconnection links and hence consume more area and power.

A NoC designed for a specific application is called an Application Specific NoC (ASNoC). For an ASNoC, some knowledge of the traffic pattern is available before the NoC is designed, and standard topologies would result in poor performance and increased area and power consumption. Hence, it is desirable to design a custom topology based on the traffic pattern of the application. A custom topology consumes less area and power because it requires fewer routers and interconnection links [3]. As geometries and layout dimensions shrink, on-chip interconnect wires are placed closer together, which increases the crosstalk between adjacent wires.
Crosstalk can cause timing violations, which lead to logical errors. Crosstalk avoidance codes (CACs) are used to reduce the effect of crosstalk [6]. In addition to crosstalk, on-chip interconnection wires are exposed to other noise sources such as supply voltage fluctuation, electromagnetic interference (EMI), radiation and temperature variation. These noise sources affect the reliability of the on-chip interconnection links [7]. In this paper, we first propose an algorithm based on genetic algorithm optimization that generates a custom topology for an ASNoC to minimize the area and power consumption of the NoC. Secondly, we propose a novel low-complexity error correction code that corrects errors on the on-chip interconnection links to improve their reliability. Together, the proposed algorithm and the error correction code produce an error resilient custom topology, which is generated in two steps: first, the custom topology is generated using the proposed algorithm, and second, the error correction code is placed in the routers.

The rest of the paper is organized as follows. Section 2 presents related work on custom topologies and on error correction coding for on-chip interconnection links. Section 3 proposes the generation of the custom topology. Section 4 presents the design of the error correction code. Section 5 proposes the generation of the error resilient custom topology. Results are discussed in Section 6, and Section 7 concludes the paper.

Related work

In [10-11], the authors generate custom topologies for application specific NoCs using only 4-port routers, so only two or three cores can be connected to a single router. This limitation forces them to use more routers. To overcome this drawback, we propose a custom topology generation algorithm based on a genetic algorithm that uses routers with a variable number of ports. The number of ports of the router used in the proposed work can be varied up to seven, so up to six cores can be connected to a single router (leaving one port for the router-to-router interconnection). This reduces the number of routers needed for a given application. Increasing the number of ports of a router from 4 to 7 increases its power consumption. However, even though each router consumes more power, the total power consumption is reduced because the proposed custom topology uses fewer routers than other custom topologies. Table 1 compares the total router power consumption of the custom topology generated by the proposed algorithm for the MPEG 4 decoder with that of the work proposed in [10-12].

Table 1 Router power consumption in the custom topology for MPEG 4 decoder. Each entry gives the number of routers of that type, with the power of one router in mW in parentheses.

Design                      7 port router   4 port router   2 port router   Total router power (mW)
The work proposed in [10]   ----            5 (0.5715)      ----            2.8579
The work proposed in [11]   ----            5 (0.5715)      ----            2.8579
The work proposed in [20]   2 (0.9994)      ----            3 (0.2869)      2.8596
Our proposed algorithm      2 (0.9994)      ----            ----            1.9989
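The totals in Table 1 are the number of routers of each type multiplied by the power of one router. The short Python sketch below recomputes them from the table entries; the dictionary layout and function name are illustrative only. The recomputed totals (2.8575, 2.8595 and 1.9988 mW) differ from the table only in the fourth decimal place, presumably because the per-router values are rounded.

```python
# Table 1 (MPEG 4 decoder): router type -> (number of routers, power of one router in mW)
DESIGNS = {
    "work in [10]": {"4-port": (5, 0.5715)},
    "work in [11]": {"4-port": (5, 0.5715)},
    "work in [20]": {"7-port": (2, 0.9994), "2-port": (3, 0.2869)},
    "proposed": {"7-port": (2, 0.9994)},
}

def total_router_power(design):
    """Sum of (number of routers x power per router) over all router types, in mW."""
    return sum(count * power_mw for count, power_mw in design.values())

for name, design in DESIGNS.items():
    print(f"{name:13s} {total_router_power(design):.4f} mW")
```

Two 7-port routers cost about 1.999 mW against about 2.858 mW for five 4-port routers, a reduction of roughly 30% even though each 7-port router consumes more power than a 4-port one.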

Generation of Custom Topology

The custom topology consumes less area and power consumption as the custom topology requires less number of resources like routers and interconnection links. In NoC, power is consumed mainly due to two components: (i) Interconnection links (ii) router [8].

Power consumption in the interconnection links

The dynamic power consumed on the interconnection wires between two neighboring routers is given by (1).

$$P_{dyn} = \alpha \, C \, f \, V_{dd}^{2} \tag{1}$$

where C is the capacitance of the interconnection wires, α is the average switching activity on the wires, f is the clock frequency and Vdd is the supply voltage [8]. The values of C, Vdd and f depend on the process technology, whereas the switching activity α depends on the data transmitted over the wires. Hence, power reduction is achieved by minimizing the switching activity on the wires [8].
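A minimal numeric sketch of equation (1). The wire capacitance of 1 pF and the 1.8 V supply are hypothetical values chosen only for illustration (they are not given in the text); the 300 MHz clock is the router frequency assumed later in this section. With C, f and Vdd fixed by the technology, halving the switching activity α halves the dynamic power.

```python
def dynamic_power(alpha, capacitance_f, frequency_hz, vdd_v):
    """Equation (1): P_dyn = alpha * C * f * Vdd^2, returned in watts."""
    return alpha * capacitance_f * frequency_hz * vdd_v ** 2

# HYPOTHETICAL wire parameters: C = 1 pF, f = 300 MHz, Vdd = 1.8 V
for alpha in (0.5, 0.25):
    p = dynamic_power(alpha, 1e-12, 300e6, 1.8)
    print(f"alpha = {alpha}: {p * 1e3:.3f} mW")
```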

Design of the router

The router is the other main component that consumes power in a NoC. In a NoC router, power is consumed in three components: (i) internal nodes such as the crossbar switch and the arbiter, (ii) buffers, and (iii) interconnect wires inside the switch. The design of the five-port router was proposed in our previous paper [9]. The router consists of four parts: (i) input ports, (ii) an arbitration unit, (iii) a crossbar switch, and (iv) output ports. Figure 1 shows the basic five-port router, which has five input ports called East, West, North, South and Local and five output ports called East_output, West_output, North_output, South_output and Local_output.

Each input port consists of a buffer and a header decoder unit. When flits arrive at an input port, they are stored temporarily in the buffer, and the header decoder decodes the head flit to find the destination address. The packet size is 128 bits (16 flits of 8 bits each). The input buffer is 8 flits deep, and wormhole switching is used to reduce the buffer size and latency. Based on the destination address, the input port sends a request to the corresponding output port. Each output port has an arbiter unit: when more than one input port requests the same output port, the arbiter gives access to only one input port using a round robin algorithm. The input port that is granted access then sends its flits to the output port through the crossbar switch. The basic five-port router is designed in Verilog HDL and implemented in TSMC 0.18 µm technology using Cadence tools. We modified the router architecture to obtain routers with 2, 3, 4, 5, 6, 7 and 12 ports.

Figure 1 Five port router
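The round robin arbitration performed at each output port can be sketched behaviourally in Python as below. This is an illustrative model, not the authors' Verilog design; the class name and the mapping of port indices (0 = East, 1 = West, 2 = North, 3 = South, 4 = Local) are assumptions. After each grant, the search for the next grant starts from the port following the last winner, so no requesting input port is starved.

```python
class RoundRobinArbiter:
    """Behavioural round robin arbiter for one output port (illustrative sketch)."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # start so that input 0 is checked first

    def grant(self, requests):
        """requests: one boolean per input port; returns the granted index or None."""
        for offset in range(1, self.num_inputs + 1):
            idx = (self.last + offset) % self.num_inputs
            if requests[idx]:
                self.last = idx  # the winner gets lowest priority next time
                return idx
        return None

# East (0), North (2) and Local (4) all request the same output port
arbiter = RoundRobinArbiter(5)
requests = [True, False, True, False, True]
grants = [arbiter.grant(requests) for _ in range(4)]
```

With the three persistent requests above, the grants rotate East, North, Local and then back to East.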

The number of ports in each router is decided by the topology. The area and power consumption of all the routers are measured in TSMC 0.18 µm technology using the Cadence RTL Encounter tool, and Table 2 lists the results. As the number of ports in a router increases, the crossbar grows because it must handle more interconnects, so the area and power consumed by the router increase.

Table 2 Area and power consumption of routers with different numbers of ports

Number of ports   2       3       4       5       6       7       12
Area (µm²)        14622   21707   28805   35660   43484   51555   63765
Power (µW)        287     420     572     696     751     999     1219
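Dividing the entries of Table 2 by the number of ports gives a rough per-port cost. The Python sketch below does this; the dictionary and function names are illustrative. For two to seven ports the power per port stays between roughly 125 and 144 µW, so under this reading a router's power grows roughly in proportion to its port count, which is one way to see why merging cores onto fewer, larger routers need not raise the total.

```python
# Table 2: number of ports -> (area in um^2, power in uW), TSMC 0.18 um
ROUTERS = {
    2: (14622, 287),
    3: (21707, 420),
    4: (28805, 572),
    5: (35660, 696),
    6: (43484, 751),
    7: (51555, 999),
    12: (63765, 1219),
}

def per_port_cost(table):
    """Area and power of each router divided by its number of ports."""
    return {ports: (area / ports, power / ports) for ports, (area, power) in table.items()}

for ports, (area, power) in per_port_cost(ROUTERS).items():
    print(f"{ports:2d} ports: {area:7.1f} um^2 per port, {power:5.1f} uW per port")
```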

Proposed custom topology generation algorithm

The aim of the proposed custom topology generation algorithm is to reduce the area, power consumption and hop count of the ASNoC. The average power consumed to transmit one bit of data from core Ci to core Cj is given by [13]:

$$P_{bit}^{C_i,C_j} = n_{hops} \cdot P_{R_{bit}} + (n_{hops} - 1) \cdot P_{C_{bit}} \tag{2}$$

where

$P_{R_{bit}}$ is the router power consumption,

$P_{C_{bit}}$ is the power consumption of the communication link, and

$n_{hops}$ is the number of routers the bit passes through from the source core to the destination core.
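Equation (2) can be evaluated directly. In the Python sketch below, the per-bit router and link costs (1.0 and 0.5) are hypothetical normalized values, used only to show how the cost grows with the hop count: a bit between two cores on the same router crosses one router and no router-to-router link, while a bit that crosses three routers also uses two links.

```python
def bit_power(n_hops, p_router_bit, p_link_bit):
    """Equation (2): cost of moving one bit through n_hops routers and (n_hops - 1) links."""
    if n_hops < 1:
        raise ValueError("a bit passes through at least one router")
    return n_hops * p_router_bit + (n_hops - 1) * p_link_bit

# HYPOTHETICAL normalized per-bit costs: router = 1.0, link = 0.5
same_cluster = bit_power(1, 1.0, 0.5)   # both cores attached to one router
three_hops = bit_power(3, 1.0, 0.5)     # cores three routers apart
print(same_cluster, three_hops)
```

This is why the clustering below places heavily communicating cores on the same router.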

In the proposed algorithm, clusters are formed by grouping cores that exchange a large amount of traffic. Placing such cores in the same cluster shortens the communication distance between them, which reduces the number of hops the data bits take from the source core to the destination core and, in turn, reduces the power consumption given by equation (2). One router is used per cluster, and the number of ports of the router is decided by the number of cores in the cluster. The bandwidth BWr of the router is assumed to be 2400 Mb/s (the router operates at 300 MHz with an 8-bit port width, and 300 MHz × 8 bit = 2400 Mb/s). Data traveling from one input port to one output port is counted as one hop. In [14], the authors show that router performance can be improved by using multi-local-port routers instead of connecting a single core to each router. However, as the number of ports of a single router increases, the router becomes larger and its power consumption increases considerably. As a compromise between router size and number of ports, we select seven ports per router. Leaving one port for the router-to-router interconnection, the remaining six ports are used to connect cores. Hence, the cluster size is fixed at 6.

Proposed algorithm

The proposed custom topology generation algorithm has three stages: (i) cluster formation, (ii) cluster optimization using a genetic algorithm and router traffic graph (RTG) generation, and (iii) generation of the custom topology.

Proposed Algorithm:

Input : Communication Task Graph (CTG), G(V,E)

Output : Custom topology.

Phase I (Formation of Clusters)

Step 1: Arrange the traffic between the cores (edge weights) in ascending order.

Fix the number of clusters p = ⌈m/6⌉, where m is the number of cores (the cluster size is 6).

Step 2: Assign Vmax = maximum traffic value.

Vmin = minimum traffic value.

Step 3: (i) Find the upper-limit traffic value, denoted here U[i], for each cluster:

for i = 1; i ≤ p; i++

(ii) Assign the cores that have traffic values in the range (U[i-1]+1 : U[i]) to clusters p down to 2, as shown below:

for i = p; i ≥ 2; i--

cluster[i] = (U[i-1]+1 : U[i])

(iii) Assign the cores that have traffic values in the range (Vmin : U[1]) to cluster 1, as shown below. Cores that are already assigned to a cluster are skipped.

cluster[1] = (Vmin : U[1])

Phase II (Cluster optimization using genetic algorithm & router traffic graph (RTG) generation)

The router traffic graph (RTG) is a fully connected graph with p vertices, where p is the number of clusters. The edge weights (traffic) are set to zero.

Step 4: The initial clusters generate uneven traffic between the clusters. Hence, the clusters are optimized using a Genetic Algorithm (GA) so that the following two conditions are met:

The sum of the traffic between all the cores must be less than the bandwidth constraint BWr of the router.

The difference in traffic between the clusters is minimized to avoid congestion in any one router.

The RTG is generated by choosing routers with different numbers of ports based on the number of cores in each cluster. The number of routers is decided by the number of clusters. Finally, the routers are interconnected to build the RTG.

Phase III (Final Topology Generation)

Step 5: The final custom topology is built by connecting each core in a cluster to an input port of that cluster's router in the RTG.
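The cluster formation of Phase I can be sketched in Python as follows. The expressions for the number of clusters and for the upper-limit traffic values are not fully preserved in this text, so the sketch assumes p = ⌈m/6⌉ (six cores per cluster) and that the upper limits (the list `upper` in the code) split the range [Vmin, Vmax] into p equal-width intervals; the function and variable names are hypothetical. Clusters p down to 2 receive the cores on the higher-traffic edges first, and cluster 1 receives the remaining cores, skipping cores that are already assigned.

```python
import math

CLUSTER_SIZE = 6  # six core ports on a seven-port router

def form_clusters(edges, num_cores):
    """Sketch of Phase I (cluster formation).

    edges: list of (src_core, dst_core, traffic_mbps) taken from the CTG.
    ASSUMPTION: the upper limits split [Vmin, Vmax] into p equal-width ranges;
    the paper's own expression for them is not reproduced here.
    """
    p = math.ceil(num_cores / CLUSTER_SIZE)                # number of clusters (assumed)
    ordered = sorted(edges, key=lambda edge: edge[2])      # Step 1: ascending traffic
    v_min, v_max = ordered[0][2], ordered[-1][2]           # Step 2
    width = (v_max - v_min) / p
    upper = [v_min + (i + 1) * width for i in range(p)]    # Step 3(i): upper limits, 0-based
    upper[-1] = v_max                                      # avoid floating-point shortfall

    clusters = [set() for _ in range(p)]
    assigned = set()

    def assign(index, low, high):
        """Add the unassigned end cores of edges with low < traffic <= high to one cluster."""
        for src, dst, traffic in ordered:
            if low < traffic <= high:
                for core in (src, dst):
                    if core not in assigned:
                        clusters[index].add(core)
                        assigned.add(core)

    for i in range(p - 1, 0, -1):                          # Step 3(ii): cluster p down to 2
        assign(i, upper[i - 1], upper[i])
    assign(0, -math.inf, upper[0])                         # Step 3(iii): cluster 1, from Vmin
    return clusters

edges = [(1, 2, 100), (2, 3, 10), (3, 4, 20), (5, 6, 90), (6, 7, 5), (7, 8, 15)]
clusters = form_clusters(edges, num_cores=8)
```

In this toy graph the two heavy edges (90 and 100 Mb/s) pull cores 1, 2, 5 and 6 into the second cluster. Phase I does not limit a cluster to six cores; that check is left to the GA optimization of Phase II.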

GA based optimization:

A genetic algorithm is applied in the second phase of the proposed algorithm to optimize the clusters. A genetic algorithm requires a representation of the population to which the genetic operators are applied [15]. For the proposed topology generation algorithm, the population size k is set as:

$$k = m \times p \tag{3}$$

where m is the number of cores in the application and p is the number of clusters.

The population size k is divided equally between the clusters. The population of each cluster is encoded as a chromosome string of 1s and 0s, called a cluster array. In a cluster array, a 1 at a position indicates that the corresponding core is present in that cluster, and a 0 indicates that it is absent. Crossover is applied between the cluster arrays as shown in Figure 2.

Figure 2 Application of Crossover

If the number of 1s in each cluster is less than or equal to 6, the fitness function is calculated and crossover continues until the two conditions in Step 4 of the algorithm are met. If a cluster contains more than six 1s, crossover is performed without calculating the fitness function until the two conditions in Step 4 are met.
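The pieces of the GA step can be sketched in Python as below: the population size of equation (3), crossover between two cluster arrays, the six-core check, and a cost for the two Step 4 conditions. Several details are assumptions, not the paper's method: single-point crossover stands in for the operator of Figure 2, a router is taken to carry every CTG edge with an end core in its cluster, and "difference of traffic" is read as the spread between the most and least loaded routers. Cores are indexed from 0 in the cluster arrays, and all names are hypothetical.

```python
CLUSTER_SIZE = 6        # six core ports on a seven-port router
BW_R = 300 * 8          # router bandwidth in Mb/s: 300 MHz x 8-bit port

def population_size(m, p):
    """Equation (3): k = m x p."""
    return m * p

def crossover(array_a, array_b, point):
    """ASSUMED single-point crossover between two cluster arrays of 0/1 values."""
    return (array_a[:point] + array_b[point:],
            array_b[:point] + array_a[point:])

def fits_router(cluster_array):
    """Fitness is evaluated only when the cluster holds at most six cores."""
    return sum(cluster_array) <= CLUSTER_SIZE

def router_loads(edges, cluster_arrays):
    """ASSUMPTION: an edge loads every router whose cluster contains one of its end cores."""
    loads = []
    for array in cluster_arrays:
        members = {core for core, bit in enumerate(array) if bit}
        loads.append(sum(t for s, d, t in edges if s in members or d in members))
    return loads

def step4_cost(edges, cluster_arrays):
    """Return (meets_bandwidth, imbalance) for the two Step 4 conditions."""
    loads = router_loads(edges, cluster_arrays)
    return all(load < BW_R for load in loads), max(loads) - min(loads)

# MPEG 4 decoder size: 12 cores in 2 clusters
a = [1] * 6 + [0] * 6
b = [0] * 6 + [1] * 6
child_a, child_b = crossover(a, b, 3)
```

When the two parent arrays are complementary (every core in exactly one cluster), crossing them at the same point keeps the children complementary, so no core is lost or duplicated. Here `child_a` ends up with nine cores and would be crossed again without a fitness evaluation.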

With the proposed custom topology generation algorithm, data bits travel to their destination in fewer hops. This reduces the overall energy consumption as well as the number of routers used, which in turn reduces the area and power consumption of the overall topology. The proposed algorithm is tested on the MPEG 4 decoder and PIP (Picture-in-Picture) applications. The benchmark core graphs [17] for the MPEG 4 decoder and PIP applications are shown in Figure 3, and Figure 4 shows the custom topologies generated by the proposed algorithm for the two applications.


Fig. 3. CTG for (a) MPEG 4 video decoder (b) PIP. Edge values are given in Mb/s.


Fig. 4. Custom topology generated by the proposed algorithm for (a) MPEG 4 decoder (b) PIP.

Design of error correction coding

In this section, we propose the design of an encoder and decoder that improve the reliability of on-chip interconnection links. As technology scales down and geometries shrink, on-chip interconnection wires are placed closer together. This increases the crosstalk between adjacent wires, which causes timing violations and logical errors. CACs are used to reduce the effect of crosstalk, and error correction codes are used to correct the errors on on-chip interconnection wires. Hence, incorporating such codes in the NoC router increases the reliability of the on-chip interconnects. In this paper, we propose the crosstalk avoidance enhanced double error correction (CAEDEC) code, which corrects all double errors and detects burst errors of three. The CAEDEC code uses a novel Triplicated Add Parity (TAP) code as its CAC. The TAP scheme increases the error correction capability of the CAEDEC code compared to other crosstalk avoiding error correction codes. The proposed CAEDEC code corrects all one-bit and two-bit error patterns, and it also corrects three-bit error patterns except for the four error patterns listed in Table 1.

Design of CAEDEC encoder

The proposed CAEDEC code uses the standard triplication error correction scheme [16] and adds a parity bit calculated from the triplicated bits (TAP) to enhance the error correction capability. Figure 5 shows the 32-bit encoder for the proposed CAEDEC code. As shown in Figure 5, the message bits are triplicated, and a parity bit calculated from the triplicated message bits is added before transmission. Because the three copies of each message bit are placed on adjacent wires, no wire can have both of its neighbors switching in the opposite direction. This avoids the worst-case transition of the bit pattern 101 → 010 and vice versa. The added parity bit enhances the error correction capability.

Figure 5. 32 bit CAEDEC encoder
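The TAP encoding and its crosstalk property can be checked with a short Python sketch. The wire layout (b0 b0 b0 b1 b1 b1 ... followed by a single parity wire) is an assumption about Figure 5, and the function names are illustrative. Because any three adjacent data wires include two copies of the same bit, they can never switch 101 → 010 or 010 → 101. Note also that the parity of the triplicated bits equals the parity of the message, since b ⊕ b ⊕ b = b.

```python
def parity(bits):
    """Even parity (XOR) of a list of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p

def caedec_encode(msg_bits):
    """Triplicate each bit onto three adjacent wires, then append one parity wire (ASSUMED layout)."""
    wires = [b for b in msg_bits for _ in range(3)]
    return wires + [parity(wires)]  # parity of the triplicated bits equals parity of the message

def has_worst_case_transition(before, after):
    """True if any three adjacent wires switch 101 -> 010 or 010 -> 101."""
    for i in range(len(before) - 2):
        old, new = before[i:i + 3], after[i:i + 3]
        if old in ([1, 0, 1], [0, 1, 0]) and new == [1 - x for x in old]:
            return True
    return False

print(has_worst_case_transition([1, 0, 1], [0, 1, 0]))                               # True
print(has_worst_case_transition(caedec_encode([1, 0, 1]), caedec_encode([0, 1, 0])))  # False
```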

Design of CAEDEC decoder

The decoder for the proposed CAEDEC code is shown in Figure 6(a). The proposed decoder has low complexity because it uses only simple XOR gates. The group separator shown in Figure 6(b) divides the received data bits into three groups: first_group, second_group and third_group. The parities of the three groups are calculated and denoted p_1, p_2 and p_3 respectively, and the parity added at the encoder is separated and denoted p_encoder. Table 1 shows the possible distributions of one-bit, two-bit and three-bit errors among the three groups. If a burst error of three occurs on the interconnection link, it appears as a single-bit error in each of the three groups, as shown in column 16 of Table 1. When both errors of a two-bit error fall in the same group, as shown in columns 7, 8 and 9 of Table 1, the group parities do not reveal the errors (an even number of errors cannot be detected by a parity check). In this case, the group containing the errors differs from the error-free groups. Hence, the received 32-bit data of the three groups are compared to select an error-free group.

Table 1 Possible distribution of one bit, two bit and three bit errors among three groups.

Column No   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Group A     1  0  0  0  1  1  2  0  0  0  0  1  0  0  3  1  2  2  1
Group B     0  1  0  1  0  1  0  2  0  1  2  0  3  0  0  1  1  0  2
Group C     0  0  1  1  1  0  0  0  2  2  1  2  0  3  0  1  0  1  0

Columns 1 to 3 are one bit errors and columns 4 to 9 are two bit errors; all of these are corrected. Columns 10 to 19 are three bit errors: columns 10 to 15 are corrected, and columns 16 to 19 are the uncorrectable error patterns.

Figure 6. (a) 32bit CAEDEC Decoder (b) group separator

4.2.1 Decoding Algorithm

The decoding procedure of the proposed CAEDEC code is explained below with the help of Table 1 and the flow diagram in Figure 7:

The received bits are grouped into three as first_group (A), second_group (B) and third_group (C).

The parities of the three groups are computed separately and are indicated as p_1, p_2, p_3 and the sent parity is indicated as p_encoder.

A change in the calculated parity of a group reveals a single-bit error in that group.

When a two-bit error occurs within one group, the computed parity does not reveal it (an even number of errors cannot be detected by a parity check). Two-bit errors are therefore identified by comparing the received data of the groups, because a group that contains errors does not match the error-free groups.

As shown in Table 1, for the columns 1, 4, 12 and 15 p_1 ≠ p_2 & p_2 = p_3. For this condition, to select the error free group, p_1 is compared with the sent parity p_encoder (Is p_1 = p_encoder?). If p_1 = p_encoder, then first_group is selected as error free output. Otherwise second_group is selected as error free output.

For columns 3, 6, 11 and 14 in Table 1, p_1 = p_2 & p_2 ≠ p_3. To select the error free group, p_1 is compared with the sent parity p_encoder (Is p_1 = p_encoder?). If p_1 = p_encoder, then first_group is selected as the error free output. Otherwise, third_group is selected as the error free output (in column 6, groups A and B each contain one error).

In Table 1 for the column 2, 5, 10 and 13 p_1 ≠ p_2 & p_2 ≠ p_3. For this condition, to choose the error free group, p_1 is compared with the sent parity p_encoder (Is p_1 = p_encoder?). If p_1 = p_encoder, then first_group is selected as error free output. Otherwise second_group is selected as error free output.

For columns 7, 8, 9 and 16, p_1 = p_2 & p_2 = p_3. For this condition there are two possibilities: (i) all three groups have a single-bit error, as shown in column 16, or (ii) a two-bit error has occurred in one of the three groups, as shown in columns 7, 8 and 9. To detect the burst error of three, p_1 is compared with the sent parity p_encoder (denoted p_0 in Figure 7). If p_1 ≠ p_encoder, a burst error of three has occurred. Otherwise, a two-bit error may have occurred in one of the three groups, and the error free output is selected by comparing the received data, as shown in Figure 6(a) and Figure 7.


Figure 7. Flow diagram of CAEDEC Decoding Scheme
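The decision procedure above can be modeled behaviorally in Python. The wire layout (three adjacent copies of each bit, then one parity wire) and the group separator taking every third wire are assumptions about Figures 5 and 6, and the names are illustrative. In the p_1 = p_2 ≠ p_3 case, the sketch selects third_group when p_1 ≠ p_encoder, because in column 6 groups A and B each hold one error. Errors on the parity wire itself are outside the cases of Table 1 and are not treated specially here.

```python
def parity(bits):
    """Even parity (XOR) of a list of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p

def caedec_encode(msg_bits):
    """ASSUMED layout: b0 b0 b0 b1 b1 b1 ... followed by one parity wire."""
    wires = [b for b in msg_bits for _ in range(3)]
    return wires + [parity(wires)]

def caedec_decode(wires):
    """Return (message_bits, status); status is "ok" or "burst3" (the Table 1 cases)."""
    data, p_enc = wires[:-1], wires[-1]
    a, b, c = data[0::3], data[1::3], data[2::3]      # group separator: groups A, B, C
    p1, p2, p3 = parity(a), parity(b), parity(c)
    if p1 != p2 and p2 == p3:                         # columns 1, 4, 12, 15
        return (a if p1 == p_enc else b), "ok"
    if p1 == p2 and p2 != p3:                         # columns 3, 6, 11, 14
        return (a if p1 == p_enc else c), "ok"
    if p1 != p2 and p2 != p3:                         # columns 2, 5, 10, 13
        return (a if p1 == p_enc else b), "ok"
    # p1 == p2 == p3: columns 7, 8, 9 and 16
    if p1 != p_enc:
        return None, "burst3"                         # one error in every group
    if a == b:
        return a, "ok"                                # any double error is in C
    if b == c:
        return b, "ok"                                # double error is in A
    return a, "ok"                                    # double error is in B

msg = [(0xA5A5F00F >> i) & 1 for i in range(32)]
rx = caedec_encode(msg)
rx[5] ^= 1   # wire 5 belongs to group C
rx[40] ^= 1  # wire 40 belongs to group B
print(caedec_decode(rx)[0] == msg)  # True
```

Looping over every single and double error on the 96 data wires, and over every burst of three adjacent data wires, is a quick way to confirm that the model corrects the former and flags the latter.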

Generation of error resilient custom topology

In this section, the generation of the error resilient custom topology is proposed. First, the custom topology is generated by applying the proposed topology generation algorithm. Then, the proposed CAEDEC error control codec is placed in the routers to increase the reliability of the on-chip interconnection links. A NoC has two types of interconnection links [8]: (i) global interconnection links, which connect two routers, and (ii) semi-global links, which connect a core to a router. The global links are mainly affected by crosstalk and other noise sources. Hence, the CAEDEC encoder is placed in the output port of the router through which data travel over a global interconnection link, and the CAEDEC decoder is placed in the input port of the router through which data are received from a global interconnection link. The error resilient custom topologies for the two benchmark applications, MPEG 4 decoder and PIP, are shown in Figure 8.

Figure 8 error resilient custom topology

Results and Discussions

In this section, the results for the custom topology, error correction code CAEDEC and the error resilient custom topology are presented.

Custom topology

We applied the proposed custom topology generation algorithm to two benchmark video applications: the MPEG 4 decoder (12 cores) and Picture-in-Picture (PIP, 8 cores). For comparison, we also generated standard topologies (Mesh, Star, Ring and Binary tree) for the two applications, using routers with two to a maximum of twelve ports depending on the topology: in the Mesh topology a single core is connected to each router, in the Star topology all the cores are connected to a single router, and in the Binary tree topology two to four cores are connected to each router. We implemented the custom topologies generated by the proposed algorithm for the MPEG 4 and PIP applications in TSMC 0.18 µm technology using the TSMC library and synthesized them with the Cadence RTL Compiler tool. A synthetic traffic generator that resembles the traffic of each application was applied. The area, power consumption, average hop count and number of global links of the standard topologies and of the custom topology generated by the proposed low power topology generation algorithm are shown in Fig. 9 for both applications. For the MPEG 4 decoder, the topology generated by the proposed algorithm achieves a 1.68× improvement in area and a 1.69× improvement in power consumption compared to the hybrid topology generated by Hytham Elmiligi [18]. It also achieves a 1.74× improvement in area, a 1.61× improvement in power consumption and a 2.6× improvement in hop count compared to the Mesh topology, which has the lowest area and power consumption among the standard topologies.

For the PIP benchmark application, the proposed algorithm achieves a 1.48× improvement in area, a 1.24× improvement in power consumption and a 4.5× improvement in hop count compared to the Mesh topology.

Figure 9. Comparison of standard topologies and custom topology: (a) area, (b) power, (c) number of global links, (d) average hop count

Error correction code CAEDEC

The proposed CAEDEC error correction code corrects all one-bit and two-bit errors and all three-bit errors except for the four error patterns given in Table 1. We implemented the proposed encoder and decoder in Verilog HDL and synthesized them with the Cadence RTL Compiler tool in TSMC 0.18 µm technology. Table 2 compares the area and power consumption of the encoder and decoder of the proposed CAEDEC code with those of other error correction codes. Table 2 also compares the error correction capability of the codes, the number of interconnection wires needed and the crosstalk avoidance property. The proposed CAEDEC code has lower area and power consumption than the other error correction codes except DAP and SCG, which have lower error correction capability than the proposed code.

Table 2. Silicon area and power consumption for different error correction schemes in TSMC 0.18 µm

Codec scheme            Error correction capability                         Crosstalk avoidance   Area (µm²)   Power (µW)
DAP                     Single error                                        Duplication           341          16.22
Hamming                 Single error                                        No                    1515         49.30
Hamming + Interleaving  Single error and burst error of two                 No                    1810         33.66
CADEC                   All one bit and two bit error patterns              Duplication           685          26.77
SCG coding              Single error                                        Triplication          288          12.86
Proposed CAEDEC code    All one bit and two bit error patterns and the      Triplication          649          18.23
                        majority of the three bit error patterns

Area and power are given for the codec (encoder and decoder).

Residual flit error rate of the proposed code:

The residual flit error rate is the rate of flits left uncorrected at the decoder. The proposed CAEDEC code corrects more errors than DAP, SCG, Hamming, Hamming + Interleaving and CADEC, so it has a lower residual flit error rate than these codes. Figure 10 shows the probability of residual flit error for different error control schemes as a function of noise voltage deviation. The residual flit error rate increases with the noise voltage deviation, and the proposed code has a lower residual flit error rate than the other error correction codes.

Figure 10. Probability of residual flit error rate against noise voltage deviation for different error control codes

Error resilient custom topology

To generate the error resilient custom topology, we have placed the proposed CAEDEC encoder in the output port of the router through which data bits travel over the global interconnection links. The decoder is placed in the input port of the router through which data bits are received over global interconnection links.



