A Routing Algorithm In Interconnected Networks


02 Nov 2017


The rapid scaling of silicon technology has made massive transistor integration densities possible. As the feature size constantly decreases and integration density increases, interconnections have become a dominating factor in determining the overall quality of a chip. Due to its limited scalability, the system bus cannot meet the requirements of current System-on-Chip (SoC) implementations, since it can support only a limited number of functional units. Long global wires also cause many design problems, such as routing congestion, noise coupling, and difficult timing closure. Network-on-Chip (NoC) architectures have been proposed as an alternative way to solve the above problems by using a packet-based communication network. The processing elements (PEs) communicate with each other by exchanging messages over the network, and these messages go through buffers in each router.

However, a single link failure can prevent or pause the communication procedure, which can render the entire Chip Multiprocessor (CMP) useless. In this dissertation, a routing algorithm capable of handling large numbers of link failures that can occur either at manufacture-time or at run-time (dynamically) is presented. As opposed to marking the entire CMP as faulty whenever some links fail, the proposed algorithm routes packets around the faulty link until a new healthy link towards the destination is available. This enables the continued transfer of information, in a degraded mode, while maintaining network connectivity.

The proposed algorithm uses distributed graph tables, held by each PE in the NoC, to determine the best choice in routing a packet. Every PE has a graph table for each of its healthy links. Each of those tables has as many entries as there are links at the PE it leads to. Every entry holds the status of a link, as well as the link's source and destination. Given this information, routing decisions are made by combining, for each link available from the current PE, either the coordinates of the packet's destination or the packet's distance to the destination with the number of faulty links at the next PE. Because each graph table holds the link status of the following node, the router knows in advance whether a candidate next PE has no healthy links available, and can thus avoid routing the packet into a deadlock.
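The per-link graph tables described above can be sketched as a small data structure. This is only an illustrative sketch; the names (`GraphTable`, `LinkEntry`, `faulty_links`, `has_healthy_exit`) are hypothetical and the thesis's actual implementation may lay the data out differently.

```python
# Illustrative sketch of the per-link graph tables described above.
# All names are hypothetical, not the thesis's actual data layout.
from dataclasses import dataclass

@dataclass
class LinkEntry:
    src: int        # PE at the source end of the link
    dst: int        # PE at the destination end of the link
    healthy: bool   # current status of the link

class GraphTable:
    """One table per healthy outgoing link of a PE; it holds one
    entry per link of the neighbouring PE that the link leads to."""
    def __init__(self, entries):
        self.entries = list(entries)

    def faulty_links(self):
        # Number of faulty links at the next PE; used to rank candidates.
        return sum(1 for e in self.entries if not e.healthy)

    def has_healthy_exit(self):
        # If the next PE has no healthy outgoing link, choosing it would
        # dead-end the packet -- the foreknowledge used to avoid deadlock.
        return any(e.healthy for e in self.entries)

# Example: neighbouring PE 5 has three links, one of them faulty.
table = GraphTable([LinkEntry(5, 1, True),
                    LinkEntry(5, 6, False),
                    LinkEntry(5, 9, True)])
```

A router would prefer next PEs with fewer faulty links and skip any candidate for which `has_healthy_exit()` is false.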

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ABBREVIATIONS

KEYWORDS

Introduction

System-on-Chip

According to Moore's law (Moore, 1965), the transistor count of a chip grows exponentially, doubling roughly every eighteen months. Thus a single chip will be able to integrate more than four billion transistors by the end of the decade. Due to this huge capacity, a billion-transistor chip can take on very complex functionalities with interconnected, microprocessor-sized computing resources. These resources can be programmable, like CPUs, or passive, like memories.

However, exploiting the capacity offered by this technology has proven to be challenging. As technology advances further, SoCs have become too costly due to their poor scalability, low and unpredictable performance, as well as high power consumption. As a result, platforms containing multiple processing elements (PEs) began to emerge. The most important part of such multiprocessor SoCs was the communication infrastructure.

Rising problems

Although such multiprocessor SoCs had increased performance, problems started to arise. Buses could not provide the needed level of interconnect efficiency once the system exceeded a certain number of components, and as a result performance degraded significantly. Point-to-point connections could connect even fewer elements efficiently. Thus, communication became the Achilles' heel of SoC design, and chip design today is communication-centric rather than computation-centric.

Networks-on-Chip

In 2002, Benini and De Micheli proposed the Network-on-Chip (NoC) paradigm in order to eliminate the bus and point-to-point limitations of connection-based communication infrastructures. The idea of the NoC approach is to use a packet-switched technique for communication between functional processing elements. The ideas of packet-switched communication and switch-based networks in general were borrowed from communication networks, which have demonstrated sustainable scalability, reliability, and improving performance over the years. Moreover, NoCs make it possible to share wiring resources between a number of communication flows, which provides better wire utilization. In addition, using NoCs instead of buses allows multiple concurrent communications, since NoCs have a higher bandwidth than buses.

Challenges ahead

NoC architectures come with important features such as network topologies and routing algorithms. However, buffer size and area overhead should also be considered in order to improve network performance. Increasing buffer size increases area overhead. Furthermore, buffers consume the most power of all elements placed in on-chip routers, and thus their usage must be limited as much as possible. On-chip routing algorithms and switching techniques should therefore be chosen so as to limit buffer usage.

Networks-on-Chip Metrics

As Duato, Yalamanchili and Ni (2002) stated, the main performance metrics in evaluating NoC designs and routing algorithms are throughput and average packet delay. Especially for the performance evaluation of routing algorithms, load distributions are to be considered.

Motivation

Background on Networks on Chip (NoCs)

As with all types of networks, on-chip networks share the same characteristics in topology, switching, routing, as well as flow control. In addition, NoCs have to provide high and predictable performance with small area overhead and low power consumption. In the following chapter, the basic network topologies, switching techniques, routing algorithms, as well as network traffic models, are described.

Network Topologies

Network topologies refer to the physical structure of the network graph, i.e., the way network nodes (either routers or switches) are physically connected. Topologies also define the connectivity between nodes, which has a fundamental impact on network performance. Network topologies come in two types: general topologies and customized ones. General topologies were created to be re-usable and scalable, as opposed to customized topologies, which were created to aim for performance and resource optimization. Both general and customized topologies have been used in NoCs. On the one hand, in general topologies power consumption scales predictably with the size of the topology. Meshes and tori are the most popular, because of their use of symmetric-length wires. On the other hand, customized topologies have been created for specific applications that require flexibility. Examples of topologies between general and customized also exist, such as the Octagon. Moreover, as Ogras et al. stated, general topologies can be customized by adding application-specific long-range links to improve performance in exchange for a small area penalty.

Meshes

Tori

The torus architecture is defined as a regular mesh, with the exception that edge switches are not only connected to their two neighboring switches but also to the two opposing edge switches, forming a donut-shaped network. As shown in figure 3, the torus architecture uses a number of switches equal to the number of IP blocks, and every switch has five ports. However, the long wrap-around channels significantly delay packet transmission, making repeaters necessary.

FIGURE 3. 2D TORUS

A solution to this problem is given by a different distribution of the torus nodes, as shown in Figure 4. This is called the folded torus. Folding can be done by shifting all nodes on even rows to the right and the nodes in even positions of each row down. Then the neighboring nodes in the newly obtained rows and columns are connected, and edge nodes are connected pair-wise. The new wrap-around links are shorter, and the link propagation delays can fit in a single clock cycle.

FIGURE 4. 2D FOLDED TORUS
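The effect of folding on link lengths can be illustrated on a single ring dimension. The sketch below assumes the standard interleaved folded arrangement (first half of the ring on even slots, second half on odd slots in reverse); the function names are illustrative.

```python
def folded_position(i, n):
    """Interleaved position of ring node i (0..n-1) after folding:
    the first half of the ring goes to even slots, the second half
    fills the odd slots in reverse, so former wrap-around neighbours
    end up physically adjacent."""
    return 2 * i if i < (n + 1) // 2 else 2 * (n - 1 - i) + 1

def max_link_length(n):
    # Longest physical distance between ring neighbours after folding.
    return max(abs(folded_position(i, n) - folded_position((i + 1) % n, n))
               for i in range(n))

# In a plain ring the wrap-around link spans n-1 slots; after folding,
# every link spans at most 2 slots, short enough for one clock cycle.
```

The same interleaving applied independently to rows and columns yields the 2D folded torus of Figure 4.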

Octagon

The Octagon architecture is defined so that any pair of nodes has a path of at most two hops between them. As shown in figure 5, the basic model consists of eight IP blocks and 12 bidirectional links, where nodes are placed in a ring with a central interconnection point in the center. Each node is connected to its two neighboring nodes. Every node consists of an IP block and a switch and has three connection ports. The implementation of the Octagon topology, though, requires the development of a good interconnection scheduler.

FIGURE 5. OCTAGON

Switching Techniques

Switching determines how a message traverses its route. This can be achieved either by circuit switching or by packet switching. On the one hand, circuit switching is a connection-oriented method, meaning a connection has to be established before sending data. This is done by reserving an end-to-end path from the source to the destination before transmitting the data. On the other hand, packet switching segments the message into a sequence of packets. Each packet consists of a header, which carries the routing and sequencing information; the payload, which carries the actual data to be transmitted; and the tail, which contains an error-checking code. In addition, packet switching can either be connection-less, where packets are routed individually in the network to achieve best effort, or connection-oriented, where resources are reserved by the header and all subsequent packets follow the same path. Connection-less message delivery is subject to the dynamic contention scenarios in the network, as opposed to connection-oriented delivery, where there is a degree of commitment on message delivery bounds. The best-known packet switching techniques are store-and-forward, virtual cut-through and wormhole switching.
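The header/payload/tail split described above can be sketched as a toy packet layout. Field names are illustrative, and the simple additive checksum stands in for whatever real error-checking code (e.g. a CRC) an implementation would carry in the tail.

```python
# A toy packet layout matching the header / payload / tail split above.
# Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Packet:
    dst: tuple          # header: routing information (destination coords)
    seq: int            # header: sequencing information
    payload: bytes      # payload: the actual data
    checksum: int = 0   # tail: error-checking code

    def __post_init__(self):
        # Simple additive checksum standing in for a real CRC.
        self.checksum = sum(self.payload) & 0xFF

pkt = Packet(dst=(2, 3), seq=7, payload=b"\x01\x02\x03")
```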

Store and Forward

In the store-and-forward technique, a network node has to receive the entire packet before forwarding it to the next node. The latency L for transmitting F flits is shown in Equation 2.1 below. A flit is the minimum unit of information that can be transferred across a link and either accepted or ejected. In Equation 2.1, B is the link bandwidth, RD is the routing delay per node hop, and HP is the number of hops between the source and destination nodes, i.e., the number of node-to-node communication actions.

Eq. (2.1): L = HP × (RD + F/B)   (latency for transmitting F flits with store-and-forward)

Virtual Cut-Through Switching

Virtual cut-through switching works like store-and-forward. The main difference is that a node does not wait for the entire packet to be received before forwarding it. "In fact, the message does not even have to be buffered at the output and can cut through to the input of the next router before the complete packet has been received at the current router" (Duato, Yalamanchili, & Ni, 2002). This way, the latency for transmitting packets is reduced to

Eq. (2.2): L = HP × RD + F/B   (latency for transmitting F flits with virtual cut-through)
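The two latency expressions can be compared numerically. The sketch below assumes the standard formulations from Duato et al.: store-and-forward pays the serialization delay F/B at every hop, virtual cut-through pays it only once.

```python
def latency_saf(F, B, RD, HP):
    """Store-and-forward (Eq. 2.1): the whole packet of F flits is
    received at every hop before being forwarded, so the serialization
    delay F/B is paid at each of the HP hops."""
    return HP * (RD + F / B)

def latency_vct(F, B, RD, HP):
    """Virtual cut-through (Eq. 2.2): flits are forwarded as soon as the
    header is routed, so the serialization delay F/B is paid only once."""
    return HP * RD + F / B

# With F=32 flits, B=1 flit/cycle, RD=2 cycles, HP=4 hops:
# SAF: 4 * (2 + 32) = 136 cycles; VCT: 4 * 2 + 32 = 40 cycles.
```

The gap widens with the hop count HP, which is why cut-through techniques dominate in larger networks.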

Wormhole Switching

In wormhole switching, packets are also pipelined through the network. However, the buffer requirements inside the routers are reduced compared to virtual cut-through switching. The message is split into flits, where the flit is the unit of message flow control. Input and output buffers are only large enough to hold a few flits. The message is pipelined through the network at the flit level but typically cannot be buffered within a single router, due to its size. Consequently, at any given time a message may occupy buffers in more than one router. In figure 6, the transmission of flits through the routers R1, R2 and R3 is shown. However, if the required output channel is busy, the message is blocked. As shown in figure 7, at router R3 message A requires the output channel that is being used by message B. Thus, message A blocks. Because buffers at each router are small, the blocked message keeps occupying buffers in multiple routers. Although this complicates the issue of deadlock freedom, the small buffer requirements and the message pipelining have enabled the creation of routers that are small, fast and compact.

Fig. 6: An example of a wormhole-switched message.

Fig. 7: An example of a blocked wormhole-switched message.

Deadlock, Livelock and Starvation

Nodes in interconnected networks send and receive packets through the network interface. In switch-based networks, packets traverse several switches before reaching their destination. However, some packets may not be able to reach their destinations, even if a fault-free path exists between the packet's source and destination. Even when a routing algorithm can use those paths, there are several situations in which packet delivery may be prevented. Since a packet whose header has not yet arrived at the destination will keep requesting buffers while holding the buffers it has already reserved, a deadlock may arise.

Deadlocks occur when packets cannot move towards their destination because the buffers they requested are full. "All packets involved in a deadlocked configuration are blocked forever" (Duato, Yalamanchili, & Ni, 2002). Another interesting situation arises when some packets are unable to reach their destination even though they are not deadlocked. This happens when a packet keeps traveling around its destination without reaching it, because the required channels are occupied by other packets. This situation is called livelock and can occur when packets are allowed to follow non-minimal paths. Furthermore, a packet can stop permanently if traffic is intense and the requested resources are always granted to other packets. This situation is called starvation.

In the following subsections, three ways to remove deadlocks are presented: deadlock prevention, deadlock avoidance, and deadlock recovery. Moreover, ways to remove livelocks and starvation will be discussed. In figure 8, a classification of the situations mentioned above, as well as the techniques to eliminate them, is shown.

Fig. 8: A classification of the situations that may prevent packet delivery

Deadlock Prevention

Deadlock prevention techniques request resources in such a way that no deadlock can arise. A simple technique for achieving this is to send a message that sets up the whole path for the packet. Once a path is established, the packet flits are forwarded into the network. No deadlock can arise because all resources have previously been reserved. However, if the message cannot advance in the network, it is allowed to backtrack and release some previously reserved resources. If the "connection" cannot be established due to the lack of a path from source to destination, the message returns to the source and releases all previously reserved resources (Duato, Yalamanchili, & Ni, 2002).

Deadlock Avoidance

As mentioned in section 2.3, deadlocks arise because of the limitation of network resources. Duato et al. (2002) stated that deadlocks are avoided if critical resources are organized in stages, where a packet is first routed at a switch in the first stage, then at the second stage, and so on, until the destination is reached. For example, a packet is routed from left to right, or from top to bottom. Since there is no recirculation, once a packet reserves an output channel in the first stage it will not request another output channel in the first stage. Likewise, once a packet reserves an output channel at a given stage, it will not request an output channel from any previous stage. Thus, all dependencies go from channels in one stage to channels in later stages. As a result, there is no cyclic dependency between channels, and deadlocks are avoided. This thesis uses this concept to avoid deadlock creation in the network, by forcing a similar packet flow for all packets.
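The staged ordering above works because the channel-dependency graph becomes a DAG. A minimal sketch (channel names are illustrative) can check this with a depth-first search for back edges:

```python
# Staged channel ordering turns the channel-dependency graph into a DAG.
# This toy check detects cyclic dependencies with a DFS; a back edge to
# a channel still on the DFS stack means a potential deadlock cycle.
def has_cycle(deps):
    """deps: dict mapping each channel to the channels it may wait on."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}
    def dfs(c):
        color[c] = GRAY
        for nxt in deps.get(c, ()):
            if color.get(nxt, WHITE) == GRAY:
                return True           # back edge: cyclic dependency
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[c] = BLACK
        return False
    return any(color[c] == WHITE and dfs(c) for c in deps)

# Channels grouped in stages, with dependencies only stage i -> stage i+1:
staged = {"s0a": ["s1a"], "s0b": ["s1a"], "s1a": ["s2a"], "s2a": []}
# Adding a back-dependency from stage 2 to stage 0 creates a cycle:
cyclic = {"s0a": ["s1a"], "s1a": ["s2a"], "s2a": ["s0a"]}
```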

Deadlock recovery

Deadlock recovery techniques consist of mechanisms to detect and resolve a deadlock situation. Thus, there is no restriction on routing decisions, and deadlocks are allowed to form. When a deadlock is detected, one or more packets have to release the resources they have allocated and allow other packets to use them, thereby breaking the deadlock.

However, deadlock recovery techniques are only useful if deadlocks appear rarely, because of the overhead that is produced by deadlock detection, which considerably degrades network performance. Moreover, deadlock recovery techniques should help network recover from deadlocks faster than deadlocks occur (Duato, Yalamanchili, & Ni, 2002).

Livelock Avoidance

The easiest way to avoid livelock is to limit packet misrouting. Minimal routing algorithms prevent misrouting, and the number of channels a packet reserves is upper bounded by the network size. However, misrouting prevention is futile when packets must be routed around faulty components. Thus, this thesis achieves livelock freedom by preventing only some nonminimal paths. This is based on Gaughan and Yalamanchili (1995), who showed that if there is a maximum number of faults that does not disconnect the network, limited misrouting is enough to reach all destinations.

Routing Algorithms

A routing algorithm is a set of rules that each router has to follow in order to send a packet towards its destination. There are many interesting approaches to routing a packet, but two of the most common consist of either looking up a routing table or executing a routing algorithm based on state. Routing algorithms can be divided into two categories based on the decisions they make: deterministic and adaptive. An algorithm is called deterministic if it supplies the same path for all identical pairs of source and destination. In contrast, an algorithm is called adaptive if information about the network traffic and/or channel status is used in order to avoid faulty or congested regions of the network.

Adaptive algorithms can be divided into progressive and backtracking. Progressive adaptive algorithms move the header flit forward, reserving a new channel at each routing action. Backtracking algorithms also move the header flit forward, but additionally allow it to backtrack and release previously reserved channels. Fault-tolerant routing algorithms mainly use backtracking; the algorithm proposed in this thesis also uses backtracking for fault tolerance.

For fault-tolerant algorithm implementations, the properties that need to be taken into consideration are connectivity, deadlock and livelock freedom, as well as fault tolerance. Connectivity is the ability to route packets from any source node to any destination node. Fault tolerance is the ability to route packets through alternative paths when faulty components are encountered. Deadlock and livelock freedom ensure that packets will neither block nor wander across the network forever.

Deterministic Routing Algorithms

Deterministic routing algorithms connect source and destination nodes by establishing a path between them, and the same path is given to identical pairs of nodes. Thus deterministic routing algorithms need to be simple and fast.

Deterministic routing became well known when wormhole switching was invented. Although wormhole routers are fast and compact, pipelining would not work efficiently if one of its stages were slower than the others. To achieve this speed and steadiness, wormhole routers implemented the routing algorithm in hardware. As a result, sufficiently simple and fast algorithms had to be chosen. Deterministic routing algorithms were widely adopted because of their ability to make decisions faster than adaptive routing algorithms, and because their low complexity led to minimal hardware requirements for implementation.

The best-known deterministic routing algorithms are the simplest ones, and that does not come as a surprise. Since many topologies, such as meshes and tori, can be split into orthogonal dimensions, the distance between the current and destination nodes is easy to compute as the sum of the offsets in all dimensions. Progressive routing algorithms reduce the offset in one dimension at each routing action. The most popular progressive algorithm is called XY. As shown in figure 9, XY first calculates the offset in both dimensions (X and Y) by subtracting the current node's coordinates from the destination's. XY then sends the packet left or right along the X axis until Xoffset becomes zero, and continues by sending the packet along the Y axis until Yoffset also becomes zero. When both Xoffset and Yoffset are zero, the packet has reached its destination. This algorithm assumes that the packet header carries the absolute address of the destination node.

Fig. 9: The XY routing algorithm for 2-D meshes.
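One routing step of the XY algorithm described above can be sketched as follows. The port names (EAST/WEST/NORTH/SOUTH) and the coordinate convention are illustrative assumptions, since mesh port naming varies between implementations.

```python
def xy_route(cur, dst):
    """One hop of XY routing on a 2-D mesh: reduce the X offset first,
    then the Y offset. cur and dst are (x, y) node coordinates.
    Returns the output port to take, or None when the packet arrived.
    Port names are illustrative; conventions differ per implementation."""
    xoff = dst[0] - cur[0]
    yoff = dst[1] - cur[1]
    if xoff > 0:
        return "EAST"
    if xoff < 0:
        return "WEST"
    if yoff > 0:
        return "NORTH"
    if yoff < 0:
        return "SOUTH"
    return None  # both offsets are zero: destination reached

# Routing from (0, 0) to (2, 1) takes EAST, EAST, then NORTH.
```

Because every hop strictly reduces one offset and X is always exhausted before Y, the resulting channel dependencies are acyclic, which is why XY is deadlock-free on meshes.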

Adaptive Routing Algorithms

Unlike deterministic routing algorithms, adaptive routing algorithms do not use a single path for the same pair of nodes; instead, the current network conditions are taken into consideration in order to choose the appropriate path. Thus, routing becomes more flexible and unnecessary time delays are reduced. In addition, fault tolerance can be provided, since the packet can take more than one path each time.

As Siegel (1990) stated, adaptive routing algorithms consist of two main stages: the routing stage and the selection stage. In the routing stage, the algorithm finds all the available output channels according to its set of rules. The selection stage then becomes active, and the most appropriate output channel among those previously found is selected. As a result, the packet can always follow an alternative path instead of waiting for a busy output channel.

Moreover, the backtracking technique is possible in this type of algorithm. Backtracking enables the header to go back and release previously reserved channels, increasing the chances of finding an appropriate path.
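The two-stage structure described by Siegel can be sketched as a pair of functions: a routing stage that proposes admissible output channels, and a selection stage that picks one. All names, the minimal-path rule, and the least-loaded selection policy are illustrative assumptions, not the thesis's algorithm.

```python
# Sketch of the two-stage adaptive routing described above: a routing
# function proposes admissible output ports, a selection function picks
# one. Names and policies are illustrative.
def routing_stage(cur, dst, healthy):
    """Return every admissible output port (minimal paths over healthy
    links only); cur and dst are (x, y) coordinates."""
    candidates = []
    if dst[0] > cur[0]: candidates.append("EAST")
    if dst[0] < cur[0]: candidates.append("WEST")
    if dst[1] > cur[1]: candidates.append("NORTH")
    if dst[1] < cur[1]: candidates.append("SOUTH")
    return [p for p in candidates if healthy.get(p, False)]

def selection_stage(candidates, load):
    """Pick the least-loaded candidate channel, if any remain."""
    return min(candidates, key=lambda p: load[p], default=None)

healthy = {"EAST": True, "NORTH": True, "WEST": True, "SOUTH": True}
load = {"EAST": 3, "NORTH": 1, "WEST": 0, "SOUTH": 2}
port = selection_stage(routing_stage((0, 0), (2, 2), healthy), load)
```

Here both EAST and NORTH are admissible, and the selection stage picks NORTH because it is less loaded; a deterministic algorithm would have had no such choice.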

Virtual Channels

Network Traffic Models

Uniform Traffic

Transpose Traffic

NetTraces – PARSEC (Applications Dependency)

ANALYSIS OF THE PROPOSED ROUTING ALGORITHM

In this chapter, the standard router architecture and the simulator model used for extracting the results of the proposed algorithm are described. Furthermore, the proposed algorithm is analyzed based on the theoretical background of the graphs used, the set of rules it implements for choosing the best available path, and how livelocks and deadlocks are avoided.

Standard Router Architecture

As mentioned in the previous chapter, the performance of an NoC depends strongly on the performance of the routers it consists of. Thus, optimizing the router component significantly increases the overall network throughput. In order to understand when a router needs optimization, though, the router delay has to be measured.

Chien (1998) proposed a router delay model, in which the router is defined by five critical functions. As shown in figure 10, the five functions were address decoding (AD), crossbar arbitration (CB), routing arbitration (RA), crossbar traversal, and virtual-channel (VC) allocation. In this model, the routing latency was defined as the total delay of these five functions.

Figure 10: Standard router architecture proposed by Chien

Chien, however, assumed that the crossbar should provide each virtual channel a separate port, which made the crossbar design much more complicated as the number of virtual channels increased. Moreover, Chien also assumed that the entire critical path fits inside a single clock cycle, and did not consider pipelining. Duato and Lopez (1994) proposed an extended version of Chien's model in which pipelining was taken into consideration. The proposed model's pipeline consisted of a routing stage, a switching stage and a channel stage.

Simulator Model

Proposed algorithm basic concept

- adaptive algorithm

- based on distributed graph theory

- choosing the best choice by sorting choices based on ….

- header holding paths passed in order to avoid livelocks

- knowing a step ahead, avoids deadlocks

Graph Theory Implementation

Performance Evaluation Of The Proposed Algorithm

Conclusion And Future Work

Appendices


