Highly Efficient Routing Mechanism For Errorprone Systems

Published Date: 02 Nov 2017

For the last few years, microprocessor manufacturers have followed the trend of producing CPUs with multiple cores on the same die. This trend is now found in a wide variety of machines, such as desktop PCs, servers, mobile devices and embedded systems. As a result, Networks-on-Chip (NoCs) have been introduced to handle communication between a large number of processing elements (PEs), offering better scalability and flexibility (Dally & Towles, 2004; Duato, Yalamanchili, & Ni, 2003). This new microprocessor architecture is pushing transistor scaling into the nanoscale regime, which has led to a higher number of faulty transistors per die. A higher number of transistor failures also translates into more faulty on-chip components (Aisopos, DeOrio, Li-Shiuan Peh, & Bertacco, 2011). From the network-on-chip perspective, these faulty components can be either links between routers or components inside a router. They lead to lower network performance, lower throughput and higher congestion and, even worse, with a routing algorithm that is not fault-tolerant the whole system may fail.

Designing a routing algorithm under these constraints can be very challenging. The necessary conditions that a routing algorithm must satisfy are deadlock and livelock freedom (for deadlock, this means no cyclic dependencies inside the given routing paths) and adaptiveness in the presence of new faults that can change the network topology. The algorithm must also be distributed, to avoid a single point of failure and the global exchange of network state information between nodes. Preferably, it should also be able to balance network traffic, find shortest paths between source and destination nodes and handle a reasonable number of faults (Vitkovskiy, Soteriou, & Nicopoulos, 2012).

Research Scope and Objectives

The scope of this thesis is to design a routing algorithm that satisfies most of these necessary conditions, with emphasis on fault tolerance, performance, reliability and low area overhead. Load balancing techniques, a decentralized algorithm architecture and network partitioning are also taken into consideration. The proposed algorithm builds on current knowledge, combining the advantages of different kinds of routing algorithms to achieve its goals. The algorithm targets mesh-topology NoCs, since they are gaining popularity over other network topologies. By targeting a specific network topology, further customizations can be made to optimize the algorithm's performance and reduce the overall area overhead.

Thesis Outline

Since this thesis targets routing algorithms for networks-on-chip, chapter 2 discusses aspects and issues concerning NoCs, such as network topologies, switching techniques, router architecture and routing algorithms, with reference to their pros and cons. Chapter 3 introduces HERMES, a new fault-tolerant routing algorithm, in two variations: HERMES-XY and HERMES-01TURN. Chapter 4 discusses the base simulator environment and setup, and also demonstrates the additional custom-developed simulator plug-ins. The analytical presentation of results can be found in chapter 5. Conclusions and future work are presented in chapter 6.

Background on Network On Chip (NoC)

In a multi-core system, the NoC is the communication layer that provides connectivity between the system cores. Its main components are network interfaces, routers and links. The combination of network topology, switching technique and routing algorithm is also an essential design decision that affects the overall system stability and performance.

Network Topologies

The layout of the wiring that connects the routers inside the network is called the network topology. Many NoC topologies have been proposed so far. Some examples are mesh-based, tree-based, ring-based and irregular topologies (Cormen, 2001; Duato, Yalamanchili, & Ni, 2003; Patterson & Hennessy, 1993).

Meshes

In a mesh topology each router has four bidirectional links, East, West, North and South, each consisting of two unidirectional physical ports connected to a neighbor node. One additional link, the Local port, connects the router directly to its core. Routers at the edge of the network have fewer links, according to their position. The orthogonal shape of the mesh topology gives equal link lengths between routers, which equalizes the propagation delay in all routing directions. This topology provides better application scalability and is widely used. A simple example is presented in figure 1.

Figure 1: Example of 4-by-4 mesh with basic NoC components

Routing algorithms based on the mesh topology can be kept simple due to the mesh's orthogonal shape, reducing complexity and area cost. The area and complexity saved can be used to increase the algorithm's reliability and fault tolerance. Based on these properties, the mesh is also the preferred topology for the HERMES implementation.
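The port structure of a mesh router described above can be sketched in a few lines. This is only an illustration: the function name mesh_neighbors and the coordinate convention (x grows towards East, y grows towards North) are assumptions made for the example and are not taken from the thesis.

```python
from typing import Dict, Tuple

# Assumed convention: x grows towards East, y grows towards North.
OFFSETS = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}


def mesh_neighbors(x: int, y: int, width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return the neighbor reached through each existing port of router (x, y)."""
    neighbors = {}
    for port, (dx, dy) in OFFSETS.items():
        nx, ny = x + dx, y + dy
        # Edge and corner routers simply have no link in that direction.
        if 0 <= nx < width and 0 <= ny < height:
            neighbors[port] = (nx, ny)
    return neighbors
```

In a 4-by-4 mesh, a corner router such as (0, 0) ends up with two inter-router links, an edge router with three and an inner router with four, in addition to the Local port that is not modeled here.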

Tori

The torus is also a mesh-based topology. The only difference from a regular mesh is at the edge switches, which are connected by wrap-around links to the switches on the opposite side.

Figure 2: Example of 2D-Torus

This topology avoids the central hotspot area that the mesh topology produces, but the wrap-around links are significantly longer than the other links, increasing packet transmission delay. This constraint can be overcome by folding the torus: the wrap-around links become shorter, reducing the propagation delay so that it fits within a single cycle.
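The effect of the wrap-around links on path length can be illustrated by comparing minimal hop counts. The sketch below is a simple model under the assumption that nodes are addressed by (x, y) coordinates; the function names are hypothetical.

```python
from typing import Tuple

Node = Tuple[int, int]


def mesh_hops(src: Node, dst: Node) -> int:
    """Minimal hop count in a mesh: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src: Node, dst: Node, width: int, height: int) -> int:
    """Minimal hop count in a 2D torus: each dimension may use the wrap-around link."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)
```

For opposite corners of a 4-by-4 network, the mesh needs 6 hops while the torus needs 2, because one wrap-around link per dimension replaces three regular links.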

Tree-based Topologies

In a tree-based topology all PEs are placed at the leaves and routers are placed at the vertices. At each level a router has one parent router and two child routers, except the root router, which has no parent. This kind of topology is preferable for packet routing algorithms, keeping their complexity low by using the direct routing method.

Figure 3: Tree-based Topologies

Quad-tree and fat-tree are two other tree-based topology variations. The fat-tree topology uses more links at the center of the topology to avoid the throughput bottleneck that constrains the other two variations.
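To illustrate why routing in a binary tree stays simple, the following sketch computes the path between two routers by climbing to their common ancestor and descending again. The heap-style numbering (root is 1, children of router n are 2n and 2n+1) is an assumption chosen for the example and is not prescribed by the text.

```python
from typing import List


def tree_path(src: int, dst: int) -> List[int]:
    """Path between two routers of a binary tree numbered heap-style (assumed)."""
    up, down = [src], [dst]
    a, b = src, dst
    while a != b:
        # The larger index is never closer to the root, so move it one level up.
        if a > b:
            a //= 2
            up.append(a)
        else:
            b //= 2
            down.append(b)
    # 'down' was collected from dst upwards; drop the shared ancestor and reverse it.
    return up + down[-2::-1]
```

Every path between routers in different halves of the tree crosses the root, which is the throughput bottleneck that the fat-tree addresses with additional links.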

Switching Techniques

Switching techniques determine when and how a packet is routed in the network. Switching defines how data is transported, while routing determines the path the data takes through the network. There are three main classes of packet switching: store-and-forward (SAF), virtual cut-through (VCT) and wormhole switching. Circuit switching is also described in this section.

Store and Forward

In store-and-forward switching (Duato, Yalamanchili, & Ni, 2003) the entire packet must be delivered to the next-hop switch before it can be forwarded onward. This technique demands large switch buffers, able to hold the largest possible packet in the network.

Figure 4: Store-and-forward data flow diagram

An example of store-and-forward packet flow is presented in figure 4. The next hop starts sending the packet only after the whole message has arrived. SAF eliminates dependencies between source and destination paths, but the packet latency is higher than with the other techniques. The latency of a packet depends on the distance between source and destination: the longer the distance, the greater the latency.
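The dependence of store-and-forward latency on distance can be made concrete with a simplified zero-load model, in which every hop pays the router delay plus the full serialization time of the packet. The parameter names and the default router delay of one cycle are assumptions for illustration only.

```python
import math


def saf_latency(hops: int, packet_bits: int, link_bits_per_cycle: int,
                router_delay: int = 1) -> int:
    """Zero-load store-and-forward latency in cycles (simplified model).

    Contention is ignored; router_delay is a hypothetical per-hop routing cost.
    """
    serialization = math.ceil(packet_bits / link_bits_per_cycle)
    # The whole packet is received at every hop before it is sent onward.
    return hops * (router_delay + serialization)
```

With a 128-bit packet on 32-bit links, three hops cost 15 cycles and six hops cost 30 cycles in this model, so latency grows in proportion to the distance.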

Circuit Switching

Circuit switching is a connection-oriented protocol for data transfer. Initially the head flit is forwarded through the network and allocates resources on each switch towards the destination. The destination switch replies to the source switch, and data can then be forwarded to the destination in a pipelined manner. When the data transmission ends, a control message follows to release all resources on the path between the two switches. Figure 5 shows an example of the steps of a circuit-switched data exchange. At the beginning the head flit requests a connection (R). The destination accepts the connection (A) and the data flow follows (D). The connection closes with a termination message (T).

Figure 5: Circuit switching data flow diagram

This technique is advantageous when packets are long and infrequent, because the set-up time overhead is then amortized. On the other hand, establishing a dedicated connection between source and destination and reserving resources can block other messages, causing unnecessary delays to other packets.
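The blocking effect of reserved circuits can be sketched with a small reservation table. This is a deliberately simplified model: the class CircuitNetwork and its methods are hypothetical, and the request is treated as all-or-nothing, whereas a real head flit reserves links hop by hop.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]


class CircuitNetwork:
    """Toy model of link reservation in circuit switching (hypothetical names)."""

    def __init__(self) -> None:
        self.reserved: Dict[Link, str] = {}  # directed link -> owning circuit id

    def request(self, circuit_id: str, path: List[int]) -> bool:
        """Request phase (R): reserve every link on the path, or fail."""
        links = list(zip(path, path[1:]))
        if any(link in self.reserved for link in links):
            return False  # another circuit holds a needed link
        for link in links:
            self.reserved[link] = circuit_id
        return True  # stands in for the acknowledgement (A); data (D) may follow

    def terminate(self, circuit_id: str) -> None:
        """Termination phase (T): release all links held by the circuit."""
        self.reserved = {l: c for l, c in self.reserved.items() if c != circuit_id}
```

A second circuit that needs a link already held by the first is refused until the first one terminates, which is the unnecessary delay mentioned above.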

Virtual Cut-Through Switching

The head flit of a packet contains the addressing information. As soon as the head flit arrives, a virtual cut-through switch can begin forwarding the packet to the next hop without waiting for the whole packet to arrive. The buffer size is still the same as in store-and-forward, since the availability of resources varies with traffic and a packet may be blocked, in which case it must be stored entirely in one switch.

Figure 6: Virtual cut-through switching data flow

Figure 6 (a) shows a typical virtual cut-through switching data flow where no packet is blocked. In figure 6 (b) the second packet is blocked for 3 clock cycles before it continues towards its destination.

Wormhole Switching

Initially a packet is split into small, equal-sized groups of bits, the flits (flow control units), creating the head, body and tail flits of the packet. As in virtual cut-through, flits are forwarded as soon as the routing information inside the head flit arrives and resources at the next-hop switch are available. After the head flit is forwarded, the rest of the flits follow the same path as the head flit without requesting resources from the next-hop switch. This pipelined behavior reduces buffer size, since there is no need to store the whole packet in a single switch, and packet latency is also low compared to store-and-forward switching. Considering the high cost of buffers and the low packet latency, wormhole switching is employed in many high-performance NoC architectures, including HERMES.

Figure 7: Wormhole switching head flit forwarding

On the other hand, wormhole switching is susceptible to deadlocks, since a packet can hold as many resources as it has flits while requesting more in order to move forward (Duato, 1994; Duato, 1997). This issue must be addressed at the architecture level by adding virtual channels to increase the number of queues on each port. In that way, if one packet is blocked, another packet from a separate queue can advance towards its destination. An example of a head flit moving towards its destination is displayed in figure 7.
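The splitting of a packet into head, body and tail flits, and the pipelined latency that results, can be sketched as follows. The function names, the zero padding of the last flit and the one-cycle router delay are assumptions for illustration; the latency formula is a simplified zero-load model that ignores contention.

```python
from typing import List, Tuple


def to_flits(payload: bytes, flit_bytes: int) -> List[Tuple[str, bytes]]:
    """Split a packet into equal-sized flits labeled head, body or tail."""
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            kind = "head"  # carries the routing information
        elif i == len(chunks) - 1:
            kind = "tail"  # releases the path behind the packet
        else:
            kind = "body"
        # Assumed: the last flit is zero-padded to keep all flits the same size.
        flits.append((kind, chunk.ljust(flit_bytes, b"\x00")))
    return flits


def wormhole_latency(hops: int, num_flits: int, router_delay: int = 1) -> int:
    """Zero-load latency in cycles: the head pays the per-hop delay once,
    and the remaining flits follow in a pipeline."""
    return hops * router_delay + num_flits
```

A 10-byte payload with 4-byte flits yields one head, one body and one tail flit. Over three hops this model gives 6 cycles, whereas a technique that receives the whole packet at every hop would pay the serialization time three times.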

Table 1 shows a brief summary of switching techniques.

Table 1: Summary of switching techniques

Switching technique    Communication entity    Buffer size    Path reservation    Resources utilization
Store-and-forward      Packet                  Large          -                   Good
Circuit switching      Flit                    Small          Yes                 Poor
Virtual cut-through    Packet                  Large          -                   Good
Wormhole               Flit                    Small          Yes                 Moderate

Routing Algorithms

A routing algorithm determines the path a packet takes to reach its destination. It must be fast and reliable without excessive complexity and area cost. In general, the choice of routing algorithm depends on the network topology. This section presents the basic concepts and background of routing algorithms.

Deterministic Routing Algorithms

In deterministic routing algorithms the computed path between a source and a destination is always the same, as these algorithms do not take network conditions into account. Deterministic algorithms are profitable and progressive, meaning that the header always moves forward, reserving a new channel that brings the packet closer to its destination. For example, the XY routing algorithm (Duato, Yalamanchili, & Ni, 2003) is deterministic, since for every source-destination pair there is a single path that never changes. This kind of routing algorithm is not suitable for error-prone networks, since it is unable to handle network faults.
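XY routing is compact enough to sketch directly: a packet first corrects its X offset and only then its Y offset. The port names follow the mesh description earlier in the chapter; the coordinate convention (x grows towards East, y grows towards North) and the function names are assumptions for the example.

```python
from typing import List, Tuple

Node = Tuple[int, int]
STEP = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}


def xy_next_port(cur: Node, dst: Node) -> str:
    """Output port chosen by XY routing at router cur for destination dst."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    # The X offset is zero, so the packet may now move along Y.
    if dy > cy:
        return "North"
    if dy < cy:
        return "South"
    return "Local"


def xy_path(src: Node, dst: Node) -> List[Node]:
    """Full sequence of routers visited from src to dst."""
    path, cur = [src], src
    while cur != dst:
        mx, my = STEP[xy_next_port(cur, dst)]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
    return path
```

Because the decision depends only on the current and destination coordinates, the path for a given pair is always identical, and a single faulty link on that path makes the destination unreachable.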

Adaptive Routing Algorithms

A routing algorithm is called adaptive if the routing path between source and destination changes according to the network state. In an adaptive routing algorithm the routing path is decided at each hop. The routing decision may take into consideration the link congestion of neighbor nodes, the state (faulty or not) of the neighbors' links or, even worse, a faulty local link that cannot be used. This adaptiveness makes the node implementation more complex, but that is the trade-off for achieving load balancing and fault tolerance. RCA (Gratz, Grot, & Keckler, 2008) is one example of an adaptive routing algorithm, offering load balancing based on global network traffic state.

Minimal Adaptive Routing

Minimal adaptive algorithms find the shortest available path to the destination, applying a set of restrictions to avoid deadlocks in the presence of faults. Because only minimal paths are used, livelock is avoided. In combination with wormhole switching, performance also increases due to lower resource consumption (Duato, Yalamanchili, & Ni, 2003).
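A per-hop decision of a minimal adaptive router can be sketched as follows: only ports that reduce the distance to the destination are considered, faulty ports are excluded and the least congested remaining port is chosen. The data structures (a set of faulty (router, port) pairs and a congestion dictionary) are hypothetical, and the sketch does not include the turn or channel restrictions that a real algorithm needs for deadlock freedom.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]
PortId = Tuple[Node, str]


def productive_ports(cur: Node, dst: Node) -> List[str]:
    """Ports that bring the packet closer to dst (x grows East, y grows North)."""
    ports = []
    if dst[0] > cur[0]:
        ports.append("East")
    if dst[0] < cur[0]:
        ports.append("West")
    if dst[1] > cur[1]:
        ports.append("North")
    if dst[1] < cur[1]:
        ports.append("South")
    return ports


def choose_port(cur: Node, dst: Node, faulty: Set[PortId],
                congestion: Dict[PortId, int]) -> Optional[str]:
    """Pick the least congested healthy productive port, or None if all are faulty."""
    if cur == dst:
        return "Local"
    candidates = [p for p in productive_ports(cur, dst) if (cur, p) not in faulty]
    if not candidates:
        return None  # a minimal algorithm has no legal move here
    return min(candidates, key=lambda p: congestion.get((cur, p), 0))
```

The None case shows the limit of purely minimal routing: when every productive port is faulty, the packet can only make progress if some non-minimal detour is allowed.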

Deadlock, Livelock and Starvation

The main challenge in designing a routing algorithm for NoCs is avoiding deadlock and livelock. Deadlock can occur because the number of resources in the network is finite. During routing, packets hold resources while requesting others on the way to their destination. The combination of more than one packet path can lead to cyclic dependencies between the available resources (figure 8).

Figure 8: Deadlock due to cycle dependencies
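Whether a set of resource dependencies can deadlock is a cycle-detection problem on a directed graph. The sketch below uses a depth-first search over a dictionary that maps each held channel to the channels it waits for; the channel names are invented for the example.

```python
from typing import Dict, Hashable, Iterable


def has_cycle(deps: Dict[Hashable, Iterable[Hashable]]) -> bool:
    """Return True if the dependency graph contains a cycle (possible deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[Hashable, int] = {}

    def visit(node: Hashable) -> bool:
        color[node] = GRAY
        for nxt in deps.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: nxt is already on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in deps)
```

Four packets that each hold one channel and wait for the next one around a square form the cycle c0, c1, c2, c3, c0. Removing any one dependency breaks the cycle, which is what routing restrictions aim to achieve.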

Another cause that may lead to deadlock is the presence of faults. Faults occur during chip fabrication or during the chip's lifetime. Routing algorithms must be able to handle these faults and extend the lifetime of the chip by routing around them.

Livelock can occur only if non-minimal paths are allowed in the routing decision. A packet may never reach its destination because the resources around it are occupied by other packets, leading it to travel around its destination, consuming network resources without making any progress. An example of a routing algorithm that can lead to livelock is the Hot Potato routing algorithm. Instead of blocking a packet when the desired channel is not available, it forwards the packet to any available channel. Those alternative channels may misroute the packet around its destination, producing livelock.

Starvation happens when a packet requests a resource that is always assigned to other packets. As a result, the packet stops permanently in traffic, unable to obtain the resource it needs. This happens when there is no scheme for assigning resources fairly in case of conflicts. A general classification of the situations that may prevent packet delivery is presented in figure 9 (Duato, Yalamanchili, & Ni, 2003).

Figure 9: Situations that may prevent packet delivery
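One common resource assignment scheme that prevents starvation is round-robin arbitration, in which the requester served most recently gets the lowest priority in the next round. The class below is a minimal sketch of that idea; it is offered as an example of such a scheme and is not taken from the thesis.

```python
from typing import Optional, Set


class RoundRobinArbiter:
    """Grant one of n requesters per round, rotating priority after each grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so that requester 0 has top priority initially

    def grant(self, requests: Set[int]) -> Optional[int]:
        # Scan requesters starting just after the one granted last.
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None  # nobody is requesting
```

When three inputs request the same output port continuously, each one is served every third round, so no packet waits indefinitely.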

Turn Model

Routing algorithms based on the Turn Model avoid cyclic dependencies by restricting specific turns. Restricting turns in the clockwise and counter-clockwise turn cycles can be used to design a deadlock-free routing algorithm. An example algorithm is proposed in (Vitkovskiy, Soteriou, & Nicopoulos, 2012).

Other examples are X-First, West-First, Negative-First and North-Last. Each of them provides deadlock freedom by restricting specific turns. A brief example of turn restrictions is presented in figure 10. Solid lines are the allowed turns and dashed lines are the restricted ones.

Figure 10: Turn restrictions for deadlock free algorithms
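As an illustration of a turn-restricted algorithm, the minimal form of West-First routing can be sketched as below: a packet that has to travel West does so first, and afterwards it never turns West again, so it may choose freely among the remaining productive directions. The coordinate convention (x grows towards East, y grows towards North) and the function name are assumptions.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def west_first_ports(cur: Node, dst: Node) -> List[str]:
    """Output ports permitted by minimal West-First routing at router cur."""
    (cx, cy), (dx, dy) = cur, dst
    if cur == dst:
        return ["Local"]
    if dx < cx:
        # All westward hops are taken first, so no later turn into West is needed.
        return ["West"]
    ports = []
    if dx > cx:
        ports.append("East")
    if dy > cy:
        ports.append("North")
    if dy < cy:
        ports.append("South")
    return ports
```

For a destination to the north-east the function returns two candidate ports, which leaves room for an adaptive choice, while any destination to the west is routed deterministically until the X offset is no longer negative.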

The restricted and allowed turns in the clockwise and counter-clockwise turn cycles must be selected correctly to guarantee the deadlock freedom of the algorithm. An incorrect choice of restrictions is shown in figure 11. Restricting one turn in each cycle does not by itself eliminate deadlock: the combination of the remaining turns of the two cycles can still produce a cyclic dependency, leading to a deadlock-prone algorithm.

Figure 11: Turn restrictions that produce deadlock
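Whether a given set of prohibited turns removes all cyclic dependencies can be checked mechanically. The sketch below builds the channel dependency graph of a small mesh, with one vertex per directed link and one edge per permitted turn or straight move, and tests it for cycles with Kahn's topological sort. The function name, the mesh size and the encoding of a turn as a pair (travel direction before, travel direction after) are assumptions; 180-degree turns are assumed to be forbidden.

```python
from collections import deque
from itertools import product
from typing import Set, Tuple

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
REVERSE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def turn_set_is_deadlock_free(prohibited: Set[Tuple[str, str]], size: int = 3) -> bool:
    """True if the channel dependency graph of a size-by-size mesh is acyclic."""
    def inside(x: int, y: int) -> bool:
        return 0 <= x < size and 0 <= y < size

    # A channel (x, y, d) is the link leaving router (x, y) in direction d.
    channels = [(x, y, d) for x, y in product(range(size), repeat=2)
                for d in STEP if inside(x + STEP[d][0], y + STEP[d][1])]
    succ = {c: [] for c in channels}
    indegree = {c: 0 for c in channels}
    for (x, y, d) in channels:
        nx, ny = x + STEP[d][0], y + STEP[d][1]
        for d2 in STEP:
            if d2 == REVERSE[d] or (d, d2) in prohibited:
                continue  # U-turns and prohibited turns create no dependency
            nxt = (nx, ny, d2)
            if nxt in succ:
                succ[(x, y, d)].append(nxt)
                indegree[nxt] += 1

    # Kahn's algorithm: every channel can be removed only if there is no cycle.
    queue = deque(c for c in channels if indegree[c] == 0)
    removed = 0
    while queue:
        c = queue.popleft()
        removed += 1
        for n in succ[c]:
            indegree[n] -= 1
            if indegree[n] == 0:
                queue.append(n)
    return removed == len(channels)
```

With this encoding, West-First corresponds to prohibiting the two turns into West, ("N", "W") and ("S", "W"), and passes the check. Prohibiting ("N", "E") and ("E", "N") also removes one turn from each cycle, yet the check fails, because three of the remaining turns of one cycle can stand in for the prohibited turn of the other.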

Virtual Channels

Virtual channels provide additional queues inside the input channel buffers by dividing them into separate virtual buffer queues. Each queue is allocated to a certain packet and stores only flits of that packet. All queues share the same physical channel, which carries their flits alternately. If one packet is blocked, another can use the channel. It is like adding lanes to a street. On the other hand, if no virtual channels exist, a message reserves the single buffer until it can be forwarded. If the desired output port is not available, the packet remains blocked. Since no other input buffer queue is available, other packets cannot be forwarded through the remaining output ports. Virtual channels can be used to avoid deadlocks and to improve network throughput and message latency. Figure 12 shows graphically how virtual channels partition the link input buffer into two virtual channel buffers. If the desired output port of VC0 is blocked, the message in VC1 can still be forwarded.

Figure 12: Channel Buffer partition in virtual channels

Using virtual channels increases design complexity and area overhead, since additional hardware is required for virtual channel multiplexing and arbitration. The number of virtual channels that a designer can use depends on a trade-off between the design goals and the additional area overhead on the network. Figure 13 demonstrates the connection between two routers with virtual channels.

Figure 13: Connection between two routers with 2 VCs at each port.
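The benefit of splitting an input buffer into virtual channels can be shown with a small model of one input port. Each virtual channel is a queue of flits, represented here as hypothetical (packet id, requested output port) pairs, and only the flit at the front of a queue can be forwarded.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple

Flit = Tuple[str, str]  # (packet id, requested output port), assumed format


def forwardable(vc_queues: List[Deque[Flit]],
                blocked_outputs: Set[str]) -> Optional[Tuple[int, Flit]]:
    """Return (VC index, flit) for the first VC whose front flit can advance."""
    for index, queue in enumerate(vc_queues):
        if queue and queue[0][1] not in blocked_outputs:
            return index, queue[0]
    return None  # every non-empty queue is blocked at its front
```

With a single queue holding packet A (wanting the blocked East port) in front of packet B (wanting North), nothing can move. With two virtual channels holding A and B separately, B advances through the North port while A waits.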

HERMES

In this chapter the new routing algorithm is described in detail. The proposed algorithm is based on ARIADNE (Aisopos, DeOrio, Li-Shiuan Peh, & Bertacco, 2011), a general fault-tolerant algorithm for irregular topologies that can handle a reasonable number of faults with low area overhead.

Basic Concept

Designing a general fault-tolerant algorithm requires additional complexity and area overhead to handle that generality. Some performance issues also arise, since the algorithm has to guarantee deadlock freedom across this whole variety of topologies. ARIADNE's main idea is to use the up*/down* algorithm (Flich, Malumbres, Lopez, & Duato, 2000; Puente, Gregorio, Vallejo, & Beivide, 2004; Schroeder et al., 1991) to discover new paths in the presence of faults and update the nodes' routing tables. All routing decisions are then made based on the updated routing tables. The main drawback of this algorithm is that, after the first fault is detected, up*/down* restricts a number of turns to guarantee deadlock freedom on the remaining valid links, which in some cases leads to non-minimal paths. Since after the first fault detection the routing decision is based only on the routing table, network performance is affected dramatically.
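The up*/down* rule can be sketched in its generic textbook form: a breadth-first search from the root assigns a level to every node, each link direction is tagged as up (towards the root) or down, and a legal route never takes an up link after a down link. This is not ARIADNE's or HERMES's implementation; the function names and the tie-break on node identifiers are assumptions.

```python
from collections import deque
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], root: int) -> Dict[int, int]:
    """Distance of every reachable node from the root over healthy links."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u: int, v: int, level: Dict[int, int]) -> bool:
    """True if the hop u -> v goes towards the root (ties broken by node id)."""
    return (level[v], v) < (level[u], u)


def is_legal(path: List[int], level: Dict[int, int]) -> bool:
    """A legal up*/down* route uses zero or more up hops, then only down hops."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False  # down followed by up is the prohibited turn
        else:
            gone_down = True
    return True
```

In a 2-by-2 mesh with node 0 as root, the route 1, 0, 2 is legal, whereas the route 1, 3, 2 of the same length is rejected because it goes down and then up. On larger networks the same restriction can force routes that are longer than the shortest path.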

The main idea of HERMES is to remove ARIADNE's generality and target mesh-like topologies, since they are preferred in practice. By removing that generality and spending the freed chip area on virtual channels, these performance issues can be mitigated. By targeting a specific topology in combination with virtual channels, the new algorithm can combine two routing strategies in the presence of faults, improving the network's overall performance. The proposed algorithm is also able to detect network partitioning in cases where the placement of faults forms sub-networks.
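What it means for faults to split the network into sub-networks can be illustrated with a connected-components search over the healthy links. This is a centralized sketch for illustration only: the text does not describe how HERMES performs the detection in a distributed way, and the function name and link representation are assumptions.

```python
from collections import deque
from typing import Iterable, List, Tuple

Link = Tuple[int, int]


def sub_networks(nodes: List[int], links: Iterable[Link],
                 faulty: Iterable[Link]) -> List[List[int]]:
    """Group nodes into the sub-networks left after removing faulty links."""
    dead = {frozenset(link) for link in faulty}  # links treated as bidirectional
    adj = {n: set() for n in nodes}
    for a, b in links:
        if frozenset((a, b)) not in dead:
            adj[a].add(b)
            adj[b].add(a)

    seen, parts = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from 'start'.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        parts.append(sorted(component))
    return parts
```

For four routers connected in a line, a fault on the middle link leaves two sub-networks of two routers each, and packets between them can no longer be delivered by any routing algorithm.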

Reconfiguration Algorithm

In this section the steps of the reconfiguration algorithm are presented in detail. The algorithm's main idea is based on the up*/down* marking algorithm. The algorithm begins when a new fault is detected. The first node that detects the fault becomes the root node and broadcasts a 1-bit flag (if the link to the next hop is healthy) or a 2-bit flag (if the link to the next hop is faulty) through the 2-bit overlay network to inform its neighbors of the new fault. Upon receiving a flag, each node performs the following steps (figure 14):

Figure 14: HERMES Reconfiguration algorithm steps

Flag reception

Sub-Network Detector

Recovering State

Tagging Link Directions

Update Routing Tables

Flag Forward

Deadlock freedom

Duato Dependency Graphs

Deadlock free explanation based on Virtual Channels

Hardware Modifications

Overlay network

Atomic Broadcasting

Sub-Network Detection Mechanism

Walkthrough Examples

Normal Example

Sub-Network Example

Applications and Simulations Presentation

Simulator

POPNET and Linux OS environment

GUI Plug-in (extension). POPNet customizations

Support Scripts and Tools

Automation Result Processing Scripts

Results Presentation

Network Traffic Models

Uniform Traffic

Transpose Traffic

NetTraces - PARSEC (Applications Dependency)

Classic Plots

Graphs and comments

NetTraces

Graphs and comments

Saturation Graph

Graphs and comments

Zero Load Graph

Graphs and comments

Dynamic Faults

Graphs and comments

Conclusions and Future Work

UP*/DOWN* Load Balancing - Path Selection Improvements

Round-Robin

Random Path Selection

Self learning capabilities for minimal paths establishment

Minimize misrouting hops

REFERENCES
