Prominent Trends In Information Service Technologies


In the deep-submicron era, interconnect wires and their associated driver and receiver circuits account for a large fraction of the energy consumption of an integrated circuit. Low-swing signaling is used for information transfers on long wires: power is saved because the transmitted signals use reduced voltage swings. This technique requires special transmitter and receiver circuits to generate and sense the low-swing voltages on these buses.

1.5.2 Logic Gate Level

Logic synthesis is the process by which a behavioral or RTL design is transformed into a logic-gate-level netlist using a predefined technology library (Devadas et al., 1994). The simplest approach to low-power design is to target a library whose components are themselves designed for low power. However, other techniques are needed, since a low-power library alone leaves optimization opportunities unexploited. Power minimization requires optimal exploitation of the slack on performance constraints. Reducing capacitances yields savings in power consumption, but resizing does not always imply downsizing: power can also be reduced by enlarging heavily loaded gates to increase their output slew rates. Fast transitions minimize the short-circuit power of the gates in the fan-out of the gate that has been sized up, although that gate's own capacitance increases. In most cases, resizing is a complex optimization problem involving a trade-off between output switching power and internal short-circuit power across several gates at once, with the performance constraint set by the critical-path timing.

As discussed earlier, spurious switching is due to the existence of unequal path delays to outputs. Path-equalization techniques help reduce these glitches, and consequently the power dissipated by them. Other logic-level power-minimization techniques involve local transformations, including refactoring, remapping, phase assignment, and pin swapping. These are applied on gate netlists and focus on nets with large switched capacitance. Most of them replace a gate, or a small group of gates around the target net, in an effort to reduce capacitance and switching activity. As with resizing, local transformations must carefully balance short-circuit and output power consumption.

1.5.3 Circuit Level

Only local optimizations are possible at this level; more specifically, the low-power techniques are applied for simple primitive components that assume some specific input and output characteristics, such as input rise and fall times and output load capacitance. The techniques at this level are library cell design, transistor sizing, and circuit design style.

For path equalization, an alternative to lowering the supply voltage is gate resizing: gates along the non-critical (fast) paths are downsized to reduce power. Low-power library cell design lies at the heart of the circuit-level techniques for reducing power consumption. From the power point of view, the most critical cells in a design library are the timing elements, flip-flops and latches, for two reasons. First, timing elements are used extensively in almost all digital systems, wherever storage and pipelining are needed. Second, it is well known that a significant fraction of system power is dissipated in the clock distribution network that drives all timing elements. Consequently, the design of low-power sequential primitives is of great importance; flip-flop design for low power focuses on minimizing the clock load and reducing the internal power dissipated when the clock toggles (Pedram and Rabaey, 2002). Another extensively used element in VLSI design is the adder cell, the basic building block of all arithmetic and DSP functions. The literature describes many adder cells with different characteristics, such as transistor count, speed, power consumption, and driving capability. The choice of a library element depends not only on its power characteristics but also on the other characteristics of the component and on design specifications such as area and speed. It also depends on the context in which the element is used, which determines the inputs driving it and the output loads it must drive.

While building a library, it is useful to design each component in several sizes covering a wide range of gate loads, so that a component can be chosen that optimally drives a given load. Transistor sizing at the circuit level complements design techniques at the gate level: logic-level sizing can help the synthesis tool select the component with the optimum size for low power consumption. Another sizing application is to drive a large load effectively from a source with low driving capability.

1.5.4 Wire Sizing, Shaping, and Spacing

The wire width and space can be varied to satisfy different design criteria. By explicitly characterizing the relationship between the interconnect impedance and wire geometries, tradeoffs among the delay, bandwidth, and power of the global interconnect can be made.

Figure 1.11: Shaping Interconnect to Minimize Delay

It is known that the optimal shape of an RC interconnect that minimizes the Elmore delay is an exponential taper, as shown in Figure 1.11. Wire tapering increases the wire width near the driver and decreases it near the load. Since the near-end resistance sees more downstream capacitance than the far-end resistance, assigning less resistance to the near end than to the far end reduces the total RC delay. Exponential shaping, however, is more difficult to implement than uniformly sized wires.

1.5.5 Repeater Insertion

The delay of an RC interconnect is proportional to RCl², i.e., to the square of the wire length l. By splitting the interconnect into k segments with repeaters, the interconnect delay term is reduced to 0.377·RCl²/k. The repeaters, however, introduce additional gate delay, so an optimal number and size of repeaters can be determined to achieve minimum overall delay. As signals propagate along the interconnect, sharper transition edges are regenerated by the repeaters, increasing the bandwidth of the interconnect. Dividing the interconnect into segments also reduces coupling between neighboring lines, since each coupled section is shorter. Inserting repeaters in long interconnects, however, carries an area and power penalty, so an efficient repeater-insertion methodology requires a trade-off among the different design criteria.
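The trade-off described above can be sketched numerically. The following Python model uses the 0.377·RC(l/k)² per-segment delay from the text plus a fixed per-repeater gate delay; all parameter values (per-unit-length R and C, wire length, repeater delay) are illustrative assumptions, not data from the text.

```python
# Sketch of the repeater-insertion trade-off: segment RC delay shrinks
# quadratically with segment length, while each repeater adds gate delay.

def interconnect_delay(R, C, length, k, t_repeater):
    """Total delay of a wire split into k repeater-driven segments:
    k * (0.377 * R * C * (l/k)^2) for the wire, plus k repeater delays."""
    segment_delay = 0.377 * R * C * (length / k) ** 2
    return k * segment_delay + k * t_repeater

# Illustrative (assumed) values: R in ohm/mm, C in F/mm, length in mm.
R, C, length = 50.0, 0.2e-12, 10.0
t_rep = 5e-12  # assumed repeater gate delay in seconds

delays = {k: interconnect_delay(R, C, length, k, t_rep) for k in (1, 2, 4, 8)}
best_k = min(delays, key=delays.get)
```

With these assumed numbers the unrepeated wire (k = 1) is the slowest, and delay keeps falling through k = 8; with a larger repeater delay the optimum would occur at a smaller k, which is exactly the trade-off the text describes.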

Figure 1.12: Staggering Repeaters to Reduce the Worst Case Delay and Crosstalk Noise

As shown in Figure 1.12, the repeaters in adjacent wires are interleaved. By placing a repeater in the middle of two repeaters in adjacent wires, a potential worst-case capacitive coupling persists for only half the wire length; over the other half, the coupling is best-case. As Figure 1.12 also shows, for two simultaneously switching adjacent wires the direction of current is the same for half the wire length and opposite for the other half, so the inductive coupling due to currents flowing in opposite directions in neighboring wires can be cancelled. In [45], the optimum position of staggered repeaters is determined for RC interconnects to achieve the minimum worst-case delay.

A first option for reducing RC delay is to use better interconnect materials where available and appropriate. For very long wires, however, the wire delay can still be substantially larger than the gate delay. The propagation delay can then be reduced by introducing intermediate buffers, generally known as repeaters, in the interconnect line. Breaking a long interconnect line into n shorter lines reduces the propagation delay of each line quadratically. The total wire delay is given by Equation 1.26.

… (1.26)

As long as the gate delay is small, the total wire delay is reduced substantially. This gain comes at the cost of increased chip area and the extra power consumed by the repeaters.

1.5.6 Shielding Technique

Shielding techniques are widely used in ICs to reduce capacitive and inductive coupling. By inserting a shield line between signal lines, the effective capacitance of the interconnect becomes almost fixed and no longer depends on signal switching activity. The shield line is connected to the power or ground grid. With shielding, the normalized peak crosstalk noise can be reduced to less than 5% of Vdd for RC interconnects up to 2 mm long. Inductive coupling can also be reduced by inserting a shield line, though less effectively than capacitive coupling because of the long-range nature of magnetic coupling. The shield line provides a nearby current return path, which reduces the self- and mutual inductance of the signal lines. Because of the importance of the on-chip clock signal, the clock distribution network in a high-speed circuit is generally shielded on both sides in the same layer. Additional parallel shielding in the N-2 layer has been reported in [46] to further suppress inductive coupling from the lower layers. The primary drawback of shielding is its metal-resource overhead.

1.5.7 Encoders

Bus encoding is a widely used technique for reducing dynamic switching power and the effects of crosstalk (signal noise and delay) during data transmission on buses. Low-power encoding techniques transform the data transmitted on buses so that the self- and coupling-switching activity is reduced. Crosstalk-aware encoding techniques likewise modify the switching patterns of a group of wires so that the crosstalk coupling effect is reduced. These techniques are invaluable for reducing power consumption, improving transmission reliability, and increasing system performance. For any encoding scheme, the encoder and decoder functions are inverses of one another. Bus encoding schemes can be classified according to several criteria: the type of code used (algebraic, permutation, or probability-based), the degree of adaptability (static or dynamically adaptable encoding), the targeted capacitance for switching reduction (self, coupling, or both), the amount of extra information needed for coding (redundant or irredundant coding), and the implementation method (hardware, software, or a combination of the two). Encoding techniques are often aimed at power reduction, signal-delay reduction, reliability improvement, or a combination of these. Certain optimizations, such as crosstalk reduction, have multiple associated benefits: power reduction, signal-delay reduction, and noise reduction. Crosstalk, delay, and power consumption in interconnects all depend on the transitions on the lines; the encoder's purpose is to avoid transitions as far as possible, since fewer transitions mean less crosstalk, less delay, and lower power consumption.

1.6 PROBLEM DEFINITION

Generally, solutions to the interconnect performance-optimization problem are based on buffer insertion, repeater insertion, wire sizing, and bus encoding. Cascaded buffer insertion is widely adopted for minimizing the delay of a point-to-point connection modelled as a capacitive load. Capacitive coupling can be reduced by increasing wire-to-wire spacing and by inserting shield lines; the shield lines also provide a controlled current return path and make inductance calculation easier. Further, capacitive crosstalk effects can be minimized by using a staggered configuration of buffers on adjacent lines. Other techniques that reduce the effects of crosstalk are low-swing differential signaling and wire swizzling. Most delay- and crosstalk-mitigation techniques involve buffer insertion; inserting repeaters in long interconnects, however, introduces an area and power penalty.

Bus encoding is also widely used to reduce dynamic switching power, the effects of crosstalk (signal noise, delay), and transmission delay on buses. Low-power encoding techniques transform the data transmitted on buses so that the self-switching activity is reduced; crosstalk-aware encoding techniques modify the switching patterns of a group of wires so that the crosstalk coupling effect is reduced. These techniques are invaluable for reducing power consumption, improving transmission reliability, and increasing system performance, and they fit into a typical Electronic System Level (ESL) design flow. One of the most common uses of encoding has been to reduce the power consumption of bus wires: Low Power Coding (LPC) techniques modify the switching activity on the wires by transforming the transmitted data word in some manner, reducing self-switching and coupling power. It is important that the savings obtained from reducing bus switching activity are not offset by the power dissipated by the encoding/decoding logic that must be added. Overall, bus encoding is a simple and efficient method for handling interconnect issues such as power dissipation, crosstalk, and delay.

Further, many encoders have been proposed to address the interconnect issues described in the previous sections. VLSI chip performance increases with interconnect performance in DSM technology, in line with Moore's Law, and VLSI technology continues to scale. An encoder suited to one technology node may not suit another, or may not deliver the same results there. Three parameters, power dissipation, crosstalk, and delay, are used for the performance evaluation of interconnects; an encoder that gives better results on some of these parameters may not produce better results on the others.

The encoders in this work are therefore designed for specific technologies, with the aim of improving all three parameters as far as possible. Their performance is evaluated, based on result analysis, in recent VLSI technologies such as 90nm, 45nm, and below.

1.7 WORK CARRIED OUT

We have made a thorough analysis of the encoders proposed by various researchers in the literature. We have implemented and analyzed these encoders and their decoders for the reduction of power, crosstalk, and delay on RC interconnects in various current VLSI technologies, and we have also designed encoders for RLC interconnects. We have compared the performance and results of our encoders with existing encoders proposed by renowned researchers. An encoder's performance is judged by its own power consumption, its delay-reduction capability, and its worst-case crosstalk-reduction capability.

The proposed encoders I and II are based on identifying the crosstalk type of the incoming data: depending on the identified type, the data are sent over the interconnect either inverted or unmodified. The encoder compares the original input data with the previous bus state. Either the original data or its inverse may cause worst-case crosstalk with the previous bus state: in the former case the encoder outputs the inverse of the original data, and in the latter case it outputs the original data. Each time, therefore, the encoder output is the word that causes the least coupling with the previous bus state. The proposed encoders are implemented in VHDL and a SPICE simulator for RC interconnects and evaluated on random data streams with switching frequencies in the MHz range. The transitions between bus states determine the switching and coupling activity; reducing transitions reduces both, and it is shown that this reduction lowers dynamic power dissipation and crosstalk, improving overall VLSI chip performance.
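The invert/no-invert decision rule described above can be sketched behaviorally. This is a simplified bus-invert-style model of the selection logic only; the thesis's encoders I and II are implemented in VHDL/SPICE, and the worst-case-coupling metric used here (opposite transitions on adjacent wires) is an assumption for illustration.

```python
# Simplified model of the decision rule: transmit whichever of {data,
# inverted data} causes fewer worst-case couplings with the previous
# bus state, plus a one-bit invert flag for the decoder.

def worst_case_couplings(prev, cand):
    """Count adjacent wire pairs whose transitions from prev to cand are
    opposite (0->1 next to 1->0), the worst-case capacitive crosstalk pattern."""
    count = 0
    for i in range(len(prev) - 1):
        t_a = cand[i] - prev[i]          # +1, 0, or -1 per wire
        t_b = cand[i + 1] - prev[i + 1]
        if t_a * t_b == -1:              # opposite transitions on neighbors
            count += 1
    return count

def encode(prev_bus, data):
    """Return (encoded word, invert flag)."""
    inv = [1 - b for b in data]
    if worst_case_couplings(prev_bus, data) > worst_case_couplings(prev_bus, inv):
        return inv, 1
    return data, 0

def decode(word, flag):
    return [1 - b for b in word] if flag else word
```

For example, with previous bus state 0101 and incoming word 1010, every adjacent pair would switch in opposite directions, so the model transmits the inverse 0101 (no transitions at all) with the flag set.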

The next proposed encoding method, encoder III, transforms the bus signals to reduce or eliminate worst-case crosstalk by reducing seven bit-lines to four. The encoder deals with coupling transitions within a group of seven bits. First, the numbers of 1's and 0's in the incoming data are counted by a counter module. If there are more 1's than 0's, a comparator module sets the output line high ('1'); otherwise it sets the output line low ('0'). There are two best cases: all lines in the one state or all lines in the zero state. Bit positions are flipped to 0 or 1 depending on the majority value, with at most three flips possible in the worst case. The single controller output line is compared with the initial seven-bit input to find the flipped bit positions; the number of flipped bits can be 0 (best case, when all inputs are 0 or all are 1), 1, 2, or 3 (worst case). After identifying the flipped positions, each position number is stored in a three-bit register. Three registers are used so that both the best and the worst cases are covered. The register contents are sent in three successive clock cycles, one register per cycle; since at most three flips are possible, at most three clock cycles are required to decode the seven-bit line at the decoder side. The proposed encoder III is implemented in VHDL and the TSPICE simulator in the 45nm, 65nm, and 90nm technologies, and the circuit is tested on random signals of 1-400 MHz.
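The majority-vote step above can be modeled as follows. This is a simplified behavioral sketch of the counting and flip-position logic only (function names are illustrative); the thesis implementation is in VHDL/TSPICE and includes the clocked register transfer not modeled here.

```python
# Model of encoder III's reduction: a 7-bit word becomes its majority bit
# plus the positions (at most three) that disagree with that bit.

def encode_majority(bits7):
    """Return (majority bit, list of flipped-bit positions).
    For 7 bits the minority count is at most 3, so three 3-bit position
    registers (one sent per clock cycle) always suffice."""
    assert len(bits7) == 7
    ones = sum(bits7)
    majority = 1 if ones > 7 - ones else 0
    flipped = [i for i, b in enumerate(bits7) if b != majority]
    assert len(flipped) <= 3
    return majority, flipped

def decode_majority(majority, flipped):
    """Rebuild the 7-bit word from the majority bit and flip positions."""
    bits = [majority] * 7
    for i in flipped:
        bits[i] = 1 - majority
    return bits
```

This also shows why three registers cover the worst case: with seven bits, the minority can never exceed three positions.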

Another proposed encoder, IV, operates on three-bit incoming data. The data are first fed to a coupling-transition identifier circuit, which identifies coupling transitions in the incoming data. The two-bit output of the identifier and its inverted form are fed to two two-bit adders, whose outputs are fed to a one-bit comparator. The comparator compares the two adder outputs and generates a control signal. The output of the three-bit multiplexer depends on the control-signal value: if it is '1', the multiplexer outputs the coupling-transition identifier's output, with the third bit being the XOR of the third data bit and the control signal; if it is '0', the multiplexer outputs the original data. The encoder is implemented in VHDL and the TSPICE simulator in the 16nm, 22nm, 32nm, 45nm, 65nm, and 90nm technologies, and the circuit is tested on random signals of 1-500 MHz.

It is found that the proposed encoders I, II, III, and IV are well suited to current technologies, as shown by the results obtained on most of the performance parameters.

The previous chapter discussed the implementation of interconnects and their various issues, along with techniques for handling them at various levels. Most of these issues depend on the switching activity of the data on the bus, so they can be handled by reducing transitions or switching activity; the bus encoding method is a popular way of doing so. This chapter discusses bus encoders proposed by various researchers for handling interconnect issues. The chapter is organized as follows: Section 2.1 gives an introduction including a historical perspective; Section 2.2 describes methods for reducing power, crosstalk, and delay; Section 2.3 describes techniques for power reduction; Section 2.4 describes encoding techniques for reducing capacitive crosstalk delay; Section 2.5 describes encoding techniques for reducing power and capacitive crosstalk effects; Section 2.6 describes techniques for reducing inductive and capacitive crosstalk effects; and Section 2.7 summarizes the chapter.

2.1 HISTORICAL PERSPECTIVE

The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale systems design. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising rapidly and consistently; the computational power these applications require is the driving force behind the field's fast development. Figure 2.1 gives an overview of the prominent trends in information service technologies over the next few decades. Current leading-edge technologies already provide end-users with considerable processing power and portability, and this trend is expected to continue, with very important implications for VLSI and systems design. One important characteristic of information services is their increasing need for very high processing power and bandwidth. Another is that information services tend to become more and more personalized: devices must be more intelligent to answer individual demands, and at the same time portable, to allow more flexibility and mobility.

As more and more complex functions are required in various data-processing and telecommunication devices, the need to integrate these functions in a small system or package also grows. The level of integration, as measured by the number of logic gates in a monolithic chip, has been rising steadily for almost three decades, mainly due to rapid progress in processing and interconnect technology; the logic complexity per chip has been increasing exponentially. The monolithic integration of a large number of functions on a single chip usually provides:

Less area/volume, i.e. compactness

Less power consumption

Less testing requirements at system level

Higher reliability, mainly due to improved on-chip interconnects

Higher speed, due to significantly reduced interconnection length

Significant cost savings

Figure 2.1: Prominent Trends in Information Service Technologies


Figure 2.2: Evolution of Integration Density and Minimum Feature Size, As Seen in the Early 1980s

Therefore, the current trend of integration will continue in the foreseeable future, driven by advances in device manufacturing technology and especially by the steady reduction of the minimum feature size, i.e., the minimum length of a transistor and of interconnects on chip. Figure 2.2 shows the history and forecast of chip complexity and minimum feature size over time, as seen in the early 1980s. At that time, a minimum feature size of 0.3 microns was expected around the year 2000; the actual development of the technology, however, far exceeded these expectations, with 0.25 microns readily achievable by 1995. As a direct result, integration density has also exceeded previous expectations: the first 64-Mbit DRAM and the Intel Pentium microprocessor chip, containing more than 3 million transistors, were already available by 1994, pushing the envelope of integration density. When comparing the integration density of integrated circuits, a clear distinction must be made between memory chips and logic chips. Figure 2.3 shows the level of integration over time for memory and logic chips, starting in 1970. In terms of transistor count, logic chips contain significantly fewer transistors in any given year, mainly because complex interconnects consume a large share of chip area; memory circuits are highly regular, so more cells can be integrated with much less area devoted to interconnect.

Figure 2.3: Level of Integration over Time, for Memory Chips and Logic Chips

2.2 METHODS OF MINIMIZATION OF POWER, DELAY AND CROSSTALK

The most common practices to reduce crosstalk, propagation delay, and power consumption are:

Insert repeaters

Insert shielding between adjacent wires

Introduce intentional delay among coupled signal transmissions

Use bus encoding methods

The use of tight geometry in most systems can reduce crosstalk significantly, although it cannot eliminate it entirely. Some preventive design measures that minimize crosstalk are:

Use maximum allowable spacing between signal lines.

Minimize spacing between signal and ground lines.

Isolate clocks and other critical signals from other lines (larger line spacing) or isolate with ground traces.

In backplane or wire-wrap applications, use twisted pair for sensitive applications such as clocks and asynchronous set or clear functions. When using ribbon or flat cable, make every other line a ground line.

Terminate signal lines into their characteristic impedance.

Bus encoding is a widely used technique to reduce dynamic switching power and the effects of crosstalk (signal noise, delay) during data transmission on buses. Low-power encoding techniques aim to transform the data being transmitted on buses so that the self-switching and coupling-switching activity is reduced. Crosstalk-aware encoding techniques also modify the switching patterns of a group of wires so that the crosstalk coupling effect is reduced. These techniques are invaluable for reducing power consumption, improving transmission reliability, and increasing system performance. Encoding techniques fit into a typical electronic system level (ESL) design flow. Since data transformation through encoding requires additional hardware logic at component interfaces, the implementation spans multiple levels in the design flow, starting from the communication architecture (CA) exploration level, where the effectiveness of encoding techniques can first be evaluated.

For any encoding scheme, the encoder and decoder functions are the inverse of one another. Bus encoding schemes can be classified according to several criteria, such as the type of code used (algebraic, permutation, or probability based), the degree of encoding adaptability (static or dynamically adaptable encoding), the targeted capacitance for switching reduction (self, coupling, or both), the amount of extra information needed for coding (redundant or irredundant coding), and the method of encoding implementation (hardware, software, or a combination of the two). Encoding techniques are often aimed at power reduction, signal transmission delay reduction and reliability improvement, or a combination of these. Certain optimizations such as crosstalk reduction can have multiple benefits associated with them, such as power reduction, signal delay reduction, and noise reduction. One of the most common uses of encoding techniques has been to reduce the power consumption of bus wires. Power is dissipated on bus wires due to the charging and discharging of capacitances associated with a wire. There are two major sources of power consumption on a wire: self-switching and coupling switching (crosstalk). Power dissipated every time there is a bit flip (i.e., a transition from 0 to 1, or 1 to 0) on a wire due to its ground capacitance is referred to as self-switching power. In contrast, coupling power is due to the coupling capacitance of the wire created by switching in adjacent wires. It is possible that coupling power is dissipated even when there are no bit-flips on the wire itself. Low Power Coding (LPC) techniques attempt to modify the switching activity on the wires to reduce self-switching and coupling power, by transforming the transmitted data word in some manner. The process of transforming transmitted data at the sender end is referred to as encoding. When the transmitted data reaches the receiver, a decoding step is performed, to retrieve the actual data. 
It is important that the savings obtained from reducing bus switching activity are not offset by the power dissipated by the encoding/decoding logic that needs to be added.
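The two power components described above can be made concrete with a small counting model. This is an illustrative sketch (function names are assumptions): self-switching is counted as bit flips per wire, and coupling switching as adjacent pairs whose relative voltage, the XOR of neighboring bits, changes between bus states.

```python
# Count the two transition types that drive bus power: self-switching
# (a wire's own bit flips) and coupling switching (changes in the
# relative state of adjacent wires).

def self_transitions(prev, curr):
    """Number of wires whose bit flips between consecutive bus states."""
    return sum(1 for a, b in zip(prev, curr) if a != b)

def coupling_transitions(prev, curr):
    """Number of adjacent wire pairs whose relative voltage changes,
    i.e. the XOR of neighboring bits differs between the two states."""
    count = 0
    for i in range(len(prev) - 1):
        if (prev[i] ^ prev[i + 1]) != (curr[i] ^ curr[i + 1]):
            count += 1
    return count
```

Note that the model reflects the text's point that coupling power can appear without any flip on the victim wire itself: a single flip on one wire changes its relative state with both neighbors.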

2.3 TECHNIQUES FOR POWER REDUCTION

Encoding techniques are used to reduce the power consumption of bus wires. Power is dissipated on the bus wires due to the charging and discharging of wire capacitance, and the two major sources of this consumption are self-switching and coupling. Power dissipated on a wire due to its ground capacitance, every time there is a bit-flip (a transition from 0 to 1 or 1 to 0), is referred to as self-switching power; coupling power is due to the coupling capacitance to adjacent wires. Low Power Coding (LPC) techniques modify the switching activity on the wires by transforming the transmitted data word in some manner; this is called encoding. A decoding step is performed when the transmitted data reaches the receiver, to retrieve the actual data. It is important that the savings from reduced switching activity are not offset by the power dissipated by the added encoding/decoding logic.

2.3.1 Schemes for Reducing Self-Switching Power

Self-switching consists of bit-flips from 0 to 1 and 1 to 0 on a wire; the resulting charging and discharging of capacitance causes power dissipation that can be significant for on-chip and off-chip buses. Some methods for reducing it are as follows:

2.3.1.1 Address buses

Encoding schemes for address buses exploit the regularity and sequentiality of address data. Their effectiveness depends on the transmitted data exhibiting spatial and temporal locality. One of the earliest encodings for address buses is the Gray code, a permutation code that guarantees a single transition for every pair of consecutive addresses.

Let the binary source word be B = <bn-1, bn-2, …, b1, b0> and the code word, the transformed Gray sequence, be G = <gn-1, gn-2, …, g1, g0>. The encoding function from binary to Gray code is given as:

gn-1 = bn-1

gi = bi+1 ⊕ bi (i = n-2, …, 0)

The corresponding decoding function from gray to binary is given as:

bn-1 = gn-1

bi = bi+1 ⊕ gi (i = n-2, …, 0)

Let B be the binary number <1, 1, 0, 1>, so b3, b2, b1, b0 = 1, 1, 0, 1. The Gray code representation is then <b3, b3 xor b2, b2 xor b1, b1 xor b0> = <1, 0, 1, 1>. Gray code has fewer transitions than the original binary representation for consecutive addresses [9] transmitted over the bus. This code is optimal only among irredundant codes, i.e., codes that do not add extra (redundant) bits to the original data word.
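The transform above can be written compactly as a sketch in Python; the bitwise definitions gi = bi+1 ⊕ bi and bi = bi+1 ⊕ gi are equivalent to the integer forms below.

```python
# Binary <-> Gray conversion as defined above.

def binary_to_gray(b):
    """g[n-1] = b[n-1]; g[i] = b[i+1] XOR b[i], i.e. b ^ (b >> 1)."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """b[n-1] = g[n-1]; b[i] = b[i+1] XOR g[i], unrolled as a prefix XOR."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b
```

For the worked example, binary_to_gray(0b1101) gives 0b1011, matching <1, 0, 1, 1>, and consecutive integers always map to Gray codes differing in exactly one bit, which is the single-transition property the text relies on.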

If redundancy is allowed, the T0 code provides greater power savings. It adds an extra line (INC) indicating that a pair of consecutive addresses is being transmitted over the instruction bus. When INC is high, the current value on the bus is frozen to avoid unnecessary switching activity, and the receiver takes responsibility for computing the new address: the decoder computes the current address as the previous address plus a stride value, the stride being the difference between consecutive addresses in sequential mode. When two addresses are not consecutive, the INC line remains low and the bus operates normally. The T0 code achieves a 60% reduction in switching activity on the address bus. Its disadvantage is that the extra signal changes the standard bus width and chip pin-out.
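The T0 behavior described above can be modeled with a short behavioral sketch; the stride value and the addresses are illustrative assumptions, not data from the text.

```python
# Behavioral model of the T0 code: freeze the bus and raise INC while
# addresses arrive at a fixed stride; the receiver regenerates them.

STRIDE = 1  # assumed stride between consecutive addresses

def t0_encode(addresses):
    """Yield (bus value, INC) pairs for a stream of addresses."""
    out, prev, bus = [], None, None
    for a in addresses:
        if prev is not None and a == prev + STRIDE:
            out.append((bus, 1))     # sequential: freeze bus, signal INC
        else:
            bus = a
            out.append((bus, 0))     # non-sequential: drive the new value
        prev = a
    return out

def t0_decode(pairs):
    """Recover the address stream from (bus, INC) pairs."""
    addrs, prev = [], None
    for bus, inc in pairs:
        addr = prev + STRIDE if inc else bus
        addrs.append(addr)
        prev = addr
    return addrs
```

For the sequential run 8, 9, 10 the bus stays frozen at 8 with INC high, so no bus lines switch at all, which is exactly the saving the scheme targets.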

Aghagiri et al. proposed the T0-C encoding scheme, which eliminates the additional signal line. They observed that transmitting a new address is by itself sufficient to indicate to the receiver that the address-increment mode is no longer in effect. This approach would fail only when a branch targets an address whose value equals the value already on the bus, so that case is encoded specially. For an address b(t) at the source at time t, the value placed on the bus B(t) is given as:

if (b(t) == b(t-1) + S)
    B(t) = B(t-1);        /* sequential: freeze the bus value */
else if (B(t-1) != b(t))
    B(t) = b(t);          /* ordinary jump: send the raw address */
else
    B(t) = b(t-1) + S;    /* address equals the frozen bus value: swap to avoid ambiguity */
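The same rules, together with the matching decoder that the text leaves implicit, can be sketched in Python (names and framing are illustrative; the decoder mirrors the encoder's three cases):

```python
S = 1  # stride

class T0C:
    def __init__(self, start):
        self.b_prev = start      # last source address
        self.B_prev = start      # last bus value (at t = 0 the raw address)

    def encode(self, b):
        if b == self.b_prev + S:
            B = self.B_prev              # sequential: freeze the bus
        elif b != self.B_prev:
            B = b                        # ordinary jump: raw address
        else:
            B = self.b_prev + S          # swap case: avoid ambiguity
        self.b_prev, self.B_prev = b, B
        return B

    def decode(self, B):
        if B == self.B_prev:
            b = self.b_prev + S          # unchanged bus: increment
        elif B == self.b_prev + S:
            b = self.B_prev              # swap case inverted
        else:
            b = B
        self.b_prev, self.B_prev = b, B
        return b

enc, dec = T0C(0), T0C(0)
# The branch to 0 hits the frozen bus value and exercises the swap case.
for a in [1, 2, 3, 0, 10, 11]:
    assert dec.decode(enc.encode(a)) == a
```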

The T0-C code decreases switching activity on the bus by an additional 14% over the T0 code. Fornaciari et al. introduced the T0-XOR, OFFSET, OFFSET-XOR, and T0-OFFSET codes, which also improve on T0 by further reducing signal transitions when address transitions are non-uniform. T0-XOR combines the T0 code with an XOR operation: the T0 code is used for in-sequence values, while the XOR of the previous and current values is sent for out-of-sequence values. The T0-XOR code can be expressed as:

B(t) = [b(t) ⊕ (b(t-1) + S)] ⊕ B(t-1)   t > 0

B(t) = b(t)   t = 0

B(t) is the encoded value transmitted on the bus lines at time t, b(t) is the original value at time t, and S is the stride. The de-correlating XOR function is what makes the redundant INC line of the T0 code unnecessary. Parandeh-Afshar et al. presented fast encoding and decoding architectures for the T0-XOR code. In the offset code, the difference between the current value b(t) and the previous value b(t-1) is transmitted on the bus. The offset-XOR code combines the offset code with the XOR operation and can be expressed as:

B(t) = (b(t) - b(t-1)) ⊕ B(t-1)   t > 0

B(t) = b(t)   t = 0

The T0-offset code combines the T0 code with an offset operation: the T0 code is used for in-sequence values, while the difference between the previous and current values is sent for out-of-sequence values.
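As a sketch, the offset-XOR code can be exercised in Python with arithmetic taken modulo the bus width (the 16-bit width here is an assumption for illustration):

```python
MASK = 0xFFFF  # hypothetical 16-bit bus; subtraction wraps modulo 2^16

def oxor_encode(b_t, b_prev, B_prev):
    """offset-XOR: B(t) = (b(t) - b(t-1)) XOR B(t-1)."""
    return ((b_t - b_prev) & MASK) ^ B_prev

def oxor_decode(B_t, B_prev, b_prev):
    """Invert the encoding: b(t) = ((B(t) XOR B(t-1)) + b(t-1)) mod 2^16."""
    return ((B_t ^ B_prev) + b_prev) & MASK

b_prev, B_prev = 0x1000, 0x1000   # t = 0: the bus carries the raw address
for b_t in [0x1001, 0x1002, 0x2000, 0x2001]:
    B_t = oxor_encode(b_t, b_prev, B_prev)
    assert oxor_decode(B_t, B_prev, b_prev) == b_t
    b_prev, B_prev = b_t, B_t
```

For a sequential run with stride 1 the transmitted offset is always 1, so only the low-order bus line toggles regardless of the absolute address values.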

Aghagiri et al. proposed the ALBORZ code, which introduces a further encoding of the offset used in the offset-XOR code [12] and reduces the switching activity on the bus still further. The encoder uses a codebook [15] that stores recently used offsets. If the offset is present in the codebook, a limited-weight code (LWC) [16, 17] is extracted from it; an N-LWC is defined as a codeword containing N 1's. Each codebook entry has two fields: the offset and its LWC. The LWCs are chosen to have the fewest possible number of 1's, and the codebook can be implemented in a fixed or adaptive manner. The LWC is XORed with the previous bus value and the result is sent on the bus. An additional bit is added to the bus to distinguish the case in which an offset is being sent; an optimization is proposed to remove this extra bit and make the code irredundant.

Komatsu and Fujita proposed two irredundant codebook-based address bus encoding schemes. The first, called T0-CAC, extends the T0-C code with an adaptive codebook: T0-C is used for sequential addresses, while the adaptive codebook is used when a jump or branch operation places a non-sequential address on the bus. The second, called OXAC (offset-XOR with adaptive codebook), extends the offset-XOR code [12] with the adaptive codebook, as shown in Figure 2.4.

Figure 2.4: Block Diagram of OXAC Encoder [12]

Benini et al. proposed the Beach code. This scheme exploits temporal correlations between the patterns transmitted on the address bus other than arithmetic sequentiality. Bus lines are grouped into clusters according to their correlations, lines in the same cluster being considered highly correlated. An encoding function is then automatically generated for each cluster, transforming the bits of the original cluster into a new bit configuration. The algorithm that finds the encoding function aims to minimize switching activity: in the encoded stream, the average number of bus-line transitions between two successive patterns is minimized. The Hamming distance between two codes is the number of bit positions in which they differ; codewords separated by a small Hamming distance are therefore assigned to data words that are likely to be sent on the bus in consecutive clock cycles. Since computation of the encoding function depends strongly on the selected execution trace, the code performs best in special-purpose systems, where the same piece of code is repeatedly fetched and executed. Ascia et al. improved on this technique with a heuristic method based on genetic algorithms, which search for an encoder that minimizes switching activity. Like the Beach scheme, this approach assumes that the trace of patterns transmitted on the bus is available in advance; the trace, represented as a truth table, is the input from which the genetic algorithm finds an encoder minimizing switching activity on the address bus. For reasonably sized systems, the runtime of such an approach may be large.

Musoll et al. proposed the working-zone encoding (WZE) scheme, which exploits the locality of reference exhibited by most software programs. The approach partitions the address space into working zones and stores the starting address of each zone in a set of registers. For example, the address space of an application that accesses three vectors (arrays) can be partitioned into three working zones, each word being uniquely identified by its working zone and an offset within that zone. At any time, an application typically favors a few working zones of its address space. A bit denotes a hit or a miss of the working zone: on a miss the full address is transmitted on the bus, while on a hit the bus carries only the offset, which is one-hot coded. When a data access moves to a different working zone, the receiver is notified with a special codeword, and offset transmission resumes only after the receiver has switched to the new reference address. A replacement policy is applied when the number of registers is smaller than the number of zones. While the scheme is more flexible than the Gray and T0 schemes, it relies heavily on assumptions about the access pattern: it loses its effectiveness when the data accesses are not array-based or the number of working zones is too large. Like the T0 code, the scheme also requires an additional wire, here used to communicate a working-zone change.
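A greatly simplified single-zone sketch of the idea in Python (real WZE keeps several zone registers plus a replacement policy; all names and sizes here are illustrative assumptions):

```python
ZONE_SIZE = 8   # offsets 0..7, one-hot coded on 8 wires

class WZECodec:
    """Single working-zone codec: miss sends the full address and opens
    a new zone; hit sends only the one-hot offset within the zone."""

    def __init__(self):
        self.base = None           # current working-zone base address

    def encode(self, addr):
        if self.base is not None and 0 <= addr - self.base < ZONE_SIZE:
            return True, 1 << (addr - self.base)   # hit: one-hot offset
        self.base = addr                           # miss: new working zone
        return False, addr                         # full address on the bus

    def decode(self, hit, payload):
        if hit:
            return self.base + payload.bit_length() - 1
        self.base = payload
        return payload

enc, dec = WZECodec(), WZECodec()
# Two bursts in different regions: 0x400-range, then 0x900-range.
for a in [0x400, 0x401, 0x403, 0x900, 0x902]:
    assert dec.decode(*enc.encode(a)) == a
```

Within a burst only a single one-hot line toggles per access; the full-width address lines switch only on a zone miss.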

Cheng and Pedram proposed the Pyramid code to reduce switching activity on the dynamic RAM (DRAM) address bus. A DRAM is often laid out as a two-dimensional array, so two addresses, a row and a column, are needed to identify a cell; these are usually multiplexed, the row address being sent first, followed by the column address. The approaches above are designed for non-multiplexed buses and cannot be applied effectively to a DRAM address bus, where switching activity occurs in a time-multiplexed manner. The Pyramid code minimizes switching activity on the DRAM address bus by formulating the problem as an Eulerian-cycle problem [24] on a complete graph. Like the Gray code, it is an irredundant coding scheme. The code remains quite effective at reducing the power dissipation of the multiplexed bus even when the sequentiality of the addresses is interrupted every four addresses. When applied to a large address space, however, the Pyramid technique can incur significant delay.

Mamidipaka et al. proposed an adaptive self-organizing list code to reduce switching activity on instruction and data address buses. It uses a list to create a one-to-one mapping between addresses and codes; the list is reorganized every clock cycle so that the most frequently used addresses map to the codes with the fewest 1's. Ramprasad et al. proposed a scheme for multiplexed address buses that combines this approach with the INC-XOR approach. INC-XOR XORs the current address with the sum of the previous address and the stride and sends the result over the bus, so no transition occurs on the bus when consecutive addresses grow by the stride. The size of the list has an impact on performance: the large hardware overhead of maintaining a long list, together with the complex encoding/decoding logic the scheme requires, makes it quite costly to implement in practice.
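The INC-XOR operation itself is a one-liner; a Python sketch (the stride value and names are illustrative):

```python
S = 1  # stride, an assumption for this sketch

def inc_xor_encode(addr, prev_addr):
    """INC-XOR: XOR the current address with (previous address + stride)."""
    return addr ^ (prev_addr + S)

def inc_xor_decode(bus, prev_addr):
    """The XOR cancels: bus XOR (prev + S) recovers the address."""
    return bus ^ (prev_addr + S)

# Sequential addresses encode to an all-zero word, so the bus value
# never changes and no line toggles during the run.
prev = 100
for addr in [101, 102, 103]:
    bus = inc_xor_encode(addr, prev)
    assert bus == 0
    assert inc_xor_decode(bus, prev) == addr
    prev = addr
```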

Givargis and Eppstein proposed combining Unit Distance Redundant Codes (UDRC) with an address-reference caching mechanism to reduce switching activity on the address bus. In a UDRC, any pair of encoded symbols is at a Hamming distance of at most one, so any arbitrary value can be encoded by a codeword at Hamming distance at most one from the previous codeword, and the construction uses an optimal number of bits for a given set of symbols. Address-reference caching exploits the fact that address references are likely to be made up of an interleaved set of short sequential address bursts. By storing recently used address values, reference caching isolates these streams and limits the communication to a UDRC-encoded message that identifies a particular reference, at the cost of at most a single bit transition. In experiments on fourteen applications from the PowerStone benchmark, switching activity was reduced by 60% on average, with a best case of 86% and a worst case of 36%. The maximum performance penalty, i.e., the critical-path delay, is 16 gates for the encoder and 14 gates for the decoder.

Compression techniques can also provide significant energy savings. In a bus compression scheme, the data to be sent is first compressed and, if compression succeeds, transmitted in a single cycle on a much narrower bus; otherwise it is transmitted over multiple cycles. Early work proposed address compression schemes for the processor-memory interface, using a small compression cache at the sender end and a base register file at the receiver end. In the Dynamic Base Register Caching (DBRC) approach [31, 32], the original address is split into higher-order and lower-order components, and the higher-order part is stored in a compressor, which is a cache of base registers; the processor side is the sender and the memory side is the receiver. On a cache hit, the index and entry number of the Base Register Cache (BRC) are transmitted on the bus in the same cycle, together with the uncompressed lower-order part of the original address. On a miss, the sender's BRC places a reserved bit pattern on the bus in the first cycle, followed by the missed address in subsequent cycles; the memory side, which consists of register files, is loaded with the missed address. The bus extender (BE) scheme is similar to DBRC but behaves differently on a cache miss: instead of transmitting a reserved bit pattern, it starts sending the entire address immediately, a separate control signal line indicating a hit or miss in the sender cache. Basu et al. studied the benefits of these schemes for reducing switching activity on off-chip buses, and Liu et al. studied the energy benefits of using these compression schemes for on-chip address compression.

Srinivasa R. Sridhara and Naresh R. Shanbhag propose a code construction that results in practical codec circuits, with the number of wires within 35% of the fundamental bounds; they also derive fundamental limits on the number of wires required for crosstalk avoidance codes (CACs). The code construction for joint crosstalk avoidance and error correction is shown in Figure 2.5.

Figure 2.5: Code Construction for Joint Crosstalk Avoidance and Error Correction

They demonstrate the speedup and energy savings achievable by the proposed codes in a standard 0.13-µm CMOS technology. All the joint codes presented in the paper trade off delay and power dissipation in the codec, a trade-off that becomes increasingly favorable as the gap between gate delay and interconnect delay grows and as bus lengths increase. Applied to a 10 mm 32-bit bus with low-swing signaling, one of the proposed codes provides a 2.14× speedup and 27.5% energy savings at the cost of a 2.1× area overhead, without any loss of reliability. The authors derive bounds on the number of wires required for joint error correction and crosstalk avoidance, describe a code construction suitable for practical encoding and decoding, and derive several novel codes from it.

2.3.1.2 Data buses

Encoding schemes for data buses apply only in restricted cases, since the data to be transmitted is generally highly uncorrelated. One of the most widely used encoding schemes for reducing bit toggling on data buses is the Bus-Invert (BI) code of Stan and Burleson. In this approach, the encoder computes the Hamming distance between the current value and the previously transmitted value and negates every bit of the current value when that distance is greater than half the bus width; an additional bit is required to signal the inversion. The BI code can be considered an example of an LWC, or a starvation code, and is a simple and effective technique for minimizing bus switching activity.
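A minimal Python sketch of the BI rule, assuming an 8-bit bus (the width and names are illustrative):

```python
BUS_WIDTH = 8
MASK = (1 << BUS_WIDTH) - 1

def popcount(x):
    return bin(x).count("1")

def bi_encode(value, prev_bus):
    """Bus-Invert: negate the word when more than half the lines would
    toggle. Returns (bus_value, invert_bit)."""
    if popcount(value ^ prev_bus) > BUS_WIDTH // 2:
        return (~value) & MASK, 1
    return value, 0

def bi_decode(bus_value, invert_bit):
    return ((~bus_value) & MASK) if invert_bit else bus_value

prev = 0x00
for v in [0xFF, 0x0F, 0xF0, 0xAA]:
    bus, inv = bi_encode(v, prev)
    # Data lines toggle at most half the width (plus possibly the invert line).
    assert popcount(bus ^ prev) <= BUS_WIDTH // 2
    assert bi_decode(bus, inv) == v
    prev = bus
```

Sending 0xFF after 0x00 would toggle all eight lines; BI instead keeps the bus at 0x00 and raises the invert bit, so only one line switches.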

Shin et al. extended BI to Partial Bus-Invert (PBI) encoding, which can also be used for data buses. PBI is effective when bits of the source words in the data stream exhibit strong temporal correlations; the main idea is to group such bits together. Two groups x and y are formed by analyzing the transition probabilities of the lines, and BI encoding is applied to only one of the groups, reducing both switching and encoder implementation overhead. Yoo and Choi further extended this work with an interleaving PBI code in which the bit widths and positions of the two parts x and y are changed dynamically; a Field-Programmable Gate Array (FPGA) is proposed to change the x and y bit groupings at run time. A heuristic offline algorithm divides a given input sequence into several subsequences, taking into account the run-time and power overhead of reconfiguring the encoder circuit. Adding an extra redundant bit can achieve the same effect as exchanging the positions of x and y. Hong et al. proposed the next extension, Decomposed Bus-Invert (DBI) encoding, which decomposes the bus lines into an arbitrary number of groups so that each partitioned group is considered independently. Chang et al. proposed a look-ahead bus-invert code that extends the general idea of Stan and Burleson, using a look-ahead table to make better encoding decisions.

Stan and Burleson also proposed transition signaling, in which a 1 is represented by a transition on the bus and a 0 by the absence of a transition. Transition signaling produces the bus signal b(t) from the source word s(t) using the XOR operation:

b(t) = s(t) ⨁ b(t-1)

The decoder produces the output by doing the operation:

s’(t) = b(t) ⨁ b(t-1)

Switching activity on the bus is reduced because the number of transitions equals the number of 1's in the input, and the technique proves far more effective when combined with other techniques such as LWC. Assuming source words of K bits, an M-LWC consists of a set of codewords each containing at most M 1's, as shown in Table 2.1; the fewer the 1's, the less the switching activity. There is a trade-off between the weight M and the code length N: the smaller the weight M (lower power dissipation), the longer the code (higher redundancy). A further reduction of the average Hamming distance between consecutive transmitted values is thus obtained by combining transition signaling with a redundant bus, as proposed in the LWC approach.
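Transition signaling and its decoder can be sketched in Python directly from the two equations above (the list framing is an illustrative assumption):

```python
def ts_encode(source_words):
    """Transition signaling: a 1 in s(t) toggles the line, a 0 leaves it.
    b(t) = s(t) XOR b(t-1), with b(-1) = 0."""
    bus, b_prev = [], 0
    for s in source_words:
        b_prev ^= s
        bus.append(b_prev)
    return bus

def ts_decode(bus_words):
    """s'(t) = b(t) XOR b(t-1)."""
    out, b_prev = [], 0
    for b in bus_words:
        out.append(b ^ b_prev)
        b_prev = b
    return out

words = [0b1010, 0b0001, 0b0000, 0b1111]
bus = ts_encode(words)
assert ts_decode(bus) == words

# Transitions on the bus equal the number of 1's in the source words.
transitions = sum(bin(a ^ b).count("1") for a, b in zip([0] + bus, bus))
assert transitions == sum(bin(w).count("1") for w in words)
```

This is why pairing transition signaling with a limited-weight code pays off: bounding the number of 1's per codeword directly bounds the number of bus transitions per cycle.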


