Adaptive Codebook Scheme Encoder Implementation

Published Date: 02 Nov 2017

Figure 2.6: General Encoder Decoder

The hybrid encoding scheme proposed by Ramprasad et al. combines several techniques in a two-step framework, under the assumption that the transmission medium is noiseless, as shown in Figure 2.6. In the first step, the data is passed through a de-correlating source-coding function f1. In the second step, a variant of an encoding function f2 (one that minimizes transitions instead of the average number of bits at the output) is employed to reduce switching activity. In this framework, f1 can be xor or a difference-based mapping (dbm, related to the offset code), both of which provide de-correlation. f2 can be bus-invert (inv), probability-based mapping (pbm), in which code words with fewer 1's are mapped to source words with a higher probability of occurrence, or value-based mapping (vbm), in which code words with fewer 1's are assigned to smaller values on the assumption that smaller values occur with higher probability. The xor-pbm scheme can reduce switching activity on data buses by 30-40%, assuming that the probability distribution of the data is available a priori. The xor-pbm scheme reduces transition activity by about 5% less than dbm-pbm because dbm skews the input probability distribution more than xor. These schemes were shown to outperform the BI and BI-with-compression schemes. Benini et al. proposed a similar hybrid scheme that combines the BI and T0 codes to create the T0_BI code and the dual_T0_BI code (for time-multiplexed buses), which improve on the individual BI and T0 approaches in terms of switching activity reduction.

The adaptive codebook based encoding technique, proposed by Komatsu et al., is an extension of the static codebook based approach. The transmitted word has two parts: the index of a pattern in the codebook, and the XOR of the source word with that pattern. First, the encoder chooses the pattern that minimizes the Hamming distance between the source word and the patterns in the codebook. Second, the index bits are transmitted using transition signaling (XOR of the current value and the previously transmitted value) along with the code word (XOR of the source word with the selected pattern). The codebook is regularly updated with new source words, and both the encoder and the decoder know its contents. Although this approach is an improvement over the BI code, it carries the overhead of additional bus lines for sending the code word, as well as codebook tables at the transmitter and the receiver.

Figure 2.7: Adaptive Codebook Scheme Encoder Implementation

Figure 2.7 shows a block diagram of the encoder, while Figure 2.8 shows the corresponding decoder. Both the encoder and the decoder hold the codebook, which is updated dynamically with the most recent source word.
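The encode and decode steps described above can be sketched as follows. This is a minimal illustration rather than the circuit of Figure 2.7: the class name AdaptiveCodebook, the codebook size of 4, the all-zero initial contents and the round-robin replacement of the oldest slot are all assumptions made for the example, and the transition signaling of the index bits is left out.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


class AdaptiveCodebook:
    """Toy codebook shared by encoder and decoder (hypothetical helper)."""

    def __init__(self, size=4):
        self.patterns = [0] * size   # both ends start from the same contents
        self.next_slot = 0           # round-robin replacement (an assumption)

    def update(self, word):
        # Store the most recent source word, overwriting the oldest slot.
        self.patterns[self.next_slot] = word
        self.next_slot = (self.next_slot + 1) % len(self.patterns)


def encode(book, word):
    # Pick the pattern closest to the source word in Hamming distance.
    index = min(range(len(book.patterns)),
                key=lambda i: hamming(word, book.patterns[i]))
    residue = word ^ book.patterns[index]   # few 1s when the match is good
    book.update(word)
    return index, residue


def decode(book, index, residue):
    word = residue ^ book.patterns[index]
    book.update(word)                        # keep both codebooks identical
    return word
```

Because the decoder applies the same update rule to the recovered word, the two codebooks stay identical without any table contents being sent over the bus in this sketch.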

Figure 2.8: Decoder Implementation [45]

Frequent Value Encoding (FVE) was proposed by Basu et al. and Yang et al. This approach uses content addressable memory (CAM) based caches at the sender's and the receiver's ends to store frequently transmitted data values, and the two caches are kept identical. Entire data values are cached. On a hit, the index of the matching table entry is sent as a one-hot code, while on a miss the value is sent in its un-coded form. When the decoder receives a word with more than one wire high, it concludes that the value is not coded. However, the decoder could mistake an un-coded value that happens to have a single high bit and all other bits zero for a coded index. To prevent this, the encoder sends an extra control bit that tells the decoder whether the value must be decoded. The table can be filled statically, or updated on a miss using a least recently used (LRU) or similar replacement policy.
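A minimal software sketch of the hit/miss behaviour described above is given below. The names FrequentValueTable, fve_encode and fve_decode are hypothetical, the table size is arbitrary, and the LRU bookkeeping with an OrderedDict is only one possible way to keep the two tables identical, not the CAM hardware itself.

```python
from collections import OrderedDict


class FrequentValueTable:
    """Small LRU table kept identical at sender and receiver (toy model)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.slots = [None] * entries     # slot i corresponds to one-hot wire i
        self.lru = OrderedDict()          # value -> slot, oldest entry first

    def lookup(self, value):
        slot = self.lru.get(value)
        if slot is not None:
            self.lru.move_to_end(value)   # mark as most recently used
        return slot

    def insert(self, value):
        if len(self.lru) < self.entries:
            slot = len(self.lru)          # table not full yet: next free slot
        else:
            _, slot = self.lru.popitem(last=False)   # evict the LRU entry
        self.slots[slot] = value
        self.lru[value] = slot


def fve_encode(table, value):
    """Return (control_bit, bus_word); control_bit = 1 means 'coded'."""
    slot = table.lookup(value)
    if slot is not None:
        return 1, 1 << slot               # hit: send the one-hot index
    table.insert(value)
    return 0, value                       # miss: send the raw value


def fve_decode(table, control_bit, bus_word):
    if control_bit:
        slot = bus_word.bit_length() - 1  # position of the single 1
        value = table.slots[slot]
        table.lookup(value)               # mirror the sender's LRU update
        return value
    table.insert(bus_word)
    return bus_word
```

The control bit is what lets the decoder tell a raw value such as 1 apart from the one-hot index of slot 0.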

The Tunable Bus Encoding (TUBE) scheme was proposed by Sridhara and Shanbagh as an improvement over FVE. It encodes the data by exploiting repetition in contiguous and non-contiguous bit positions: chunks of bit positions (contiguous or non-contiguous) of the data value are selected, and tables at the encoder and decoder store the corresponding source data bits. Each table segment entry has a code field and a data field. The code field contains the M-hot code, a value representation that has only one high value, and the data field stores the selected bit positions of the incoming data value. It is assumed that the complete probability distribution of all pairs of consecutive values to be transmitted on the bus is available. A table is created which consists of three columns: the current word, the previously transmitted word, and the code. When a bit pattern first occurs, the encoder transmits the value un-encoded over the bus and stores its bit patterns in its segments, so that a later full or partial occurrence of the pattern can be detected. On receiving the un-encoded value, the decoder stores the same bit patterns in its own segments, so the table contents of the encoder and the decoder are identical at the end of every bus cycle. TUBE sends the M-hot code for subsequent occurrences of a repeating bit pattern. As in the FVE scheme, a correlator and a decorrelator are added at the two ends of the bus to reduce the correlation between successive values. A 3-bit time stamp is associated with the table entries, and the LRU replacement policy is used for replacing entries. An additional control signal indicates encoded values on the bus. The scheme improves on FVE by up to 21% on average for the Media Bench, Net Bench and SPEC2000 benchmarks.

The probabilistic encoding scheme proposed by Benini et al. is based on a detailed statistical characterization of the target data stream and aims to minimize the transitions on the bus. This kind of scheme is feasible only for very small buses. For large buses, a table-based scheme is applied by partitioning the bus into small clusters. Complexity and accuracy then trade off against each other: small clusters lose accuracy because of the partial loss of the spatial correlation between bits, whereas larger clusters lead to more complex encoder and decoder designs and a longer construction time for the code table. Alternatively, an approximate algorithm can be used for wider buses, in which only the M most probable pairs of consecutive words are considered, for a small, manageable value of M. The approximate algorithm was reported to perform at least as well as the xor-pbm approach and showed significant improvement. This is important because of the limited practical applicability of xor-pbm and dbm-pbm. An adaptive version of these encodings was also introduced that does not require any prior knowledge of the input stream statistics. It builds the codes dynamically, based on statistical information observed in windows of the data stream, and it works bitwise rather than word-wise for a low-cost implementation.

The Limited Intra-Word Transition (LIWT) code was proposed by Salero et al. to reduce switching on LCD buses. The scheme improved upon previous work in two ways. First, it exploits the inter-pixel correlation in images by transmitting pixel differences instead of the actual RGB values of the pixels. Second, unlike the earlier work, it encodes a larger set of pixel differences by mapping code words with increasingly large intra-word transition counts to pixel-difference values with increasingly small occurrence probability. This allows energy reduction even when smaller color depths (bits per pixel) or less redundancy (extra bits) are used, and the scheme can be applied to different standard LCD protocols. Power consumption on LCD buses can also be decreased by the software-only encoding approach proposed by Cheng et al., which uses Gray code (for addresses) and BI code (for irregular data). The hardware implementation cost of these codes is thus avoided by modifying software device drivers. Later, Bahari et al. proposed an interface bus encoding technique in which transition activity on the bus is reduced by exploiting the correlation between two consecutive frames.

Sector-based coding was proposed by Aghagiri et al. It is somewhat similar to the WZE scheme, but it is more effective for data addresses. Sector heads are used as identifiers to partition the source word space into a number of sectors. These sectors can correspond to the address spaces for the code, heap and stack segments of one or more application programs. The source words are dynamically mapped to the appropriate sector. The sector heads can be determined a priori, or can be dynamically updated based on the source word that was last encountered.

2.3.1.3 For serial buses

Serial buses consume less power than parallel buses [59-61] because fewer buffers are required, despite the need for a serializer and de-serializer. Hatta et al. proposed reducing the number of bus lines of the conventional parallel-line bus [60] by multiplexing each group of M bits onto a single line. Figure 2.9 shows the bus architecture that transforms an N-bit conventional parallel-line bus into an N/M-line bus. A parallel-link bus has 5.5 times the power consumption and 17 times the area of a serial-link bus.

The SILENT technique was proposed by Lee et al. to minimize transmission energy on a serial wire. Each word is encoded as the XOR of successive data words:

B(t) = b(t) ⊕ b(t-1)

where b(t) is the data word at time t and B(t) is the encoded word. Because successive data words are correlated, the frequency of 0's on the wire increases. The original data word from the sender can be recovered at the receiver end by XORing the encoded word with the previously decoded word after de-serialization.
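The encoding rule B(t) = b(t) ⊕ b(t-1) and its inverse can be illustrated with a short sketch. The function names are hypothetical, and two details are assumptions made only for this example: the word before the first one is taken to be 0, and words are shifted onto the wire least significant bit first.

```python
def silent_encode(words):
    """B(t) = b(t) XOR b(t-1), with b(-1) taken as 0 (an assumption)."""
    prev, out = 0, []
    for b in words:
        out.append(b ^ prev)
        prev = b
    return out


def silent_decode(coded):
    """Recover b(t) = B(t) XOR b(t-1) after de-serialization."""
    prev, out = 0, []
    for B in coded:
        prev = B ^ prev       # previously decoded word feeds the next step
        out.append(prev)
    return out


def serial_transitions(words, width):
    """Count wire toggles when words are shifted out LSB first."""
    bits = [(w >> i) & 1 for w in words for i in range(width)]
    return sum(a != b for a, b in zip(bits, bits[1:]))


data = [0x50, 0x51, 0x52, 0x53]          # correlated, slowly changing words
coded = silent_encode(data)
print(serial_transitions(data, 8), serial_transitions(coded, 8))
```

For slowly changing data such as this, most encoded words are mostly zeros, so the serialized stream toggles less often than the raw one.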

Saneei et al. proposed another encoding technique based on a lookup table (LUT), which encodes the input data before the parallel-to-serial conversion and decodes it after the serial-to-parallel conversion. The input symbols are replaced with code words that have some redundant bits compared to the original symbols, so that data with a higher probability of occurrence is mapped to codes with fewer transitions.

Figure 2.9: Circuit Structure of Serialized Bus

Redundant bits are added to the codes, so that n-bit data is mapped to a k-bit code (k > n) using a 2^n × k LUT, which increases the transition reduction. To save energy during data transfer on the serial bus, data with a higher probability of occurrence is mapped to codes with fewer transitions. The new data is mapped to a code word that starts with 0 if the previous code on the bus ends with zero, so transitions caused by the first bit of each code word can be eliminated. Another technique, serial line encoding (SLE) by Keu et al., requires two additional bits during code mapping. These bits are appended to the packet to reduce transitions.
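The idea of mapping likelier symbols to codes with fewer transitions can be sketched as a table construction. This is an illustrative sketch only: build_lut and the example probabilities are hypothetical, ties between codes are broken lexicographically for determinism, and the rule that matches the first bit of a code word to the last bit of the previous one is not modelled.

```python
from itertools import product


def transitions(code):
    """Number of adjacent bit changes inside a serialized code word."""
    return sum(a != b for a, b in zip(code, code[1:]))


def build_lut(probabilities, k):
    """Map likelier n-bit symbols to k-bit codes with fewer transitions."""
    codes = sorted(("".join(bits) for bits in product("01", repeat=k)),
                   key=lambda c: (transitions(c), c))
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    return dict(zip(symbols, codes))


# Hypothetical symbol statistics for n = 2, encoded with k = 3 bits.
probs = {"00": 0.50, "01": 0.30, "10": 0.15, "11": 0.05}
lut = build_lut(probs, 3)
print(lut)
```

The decoder would hold the inverse table, which exists because every symbol receives a distinct code word.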

2.3.2 Schemes for Reducing Coupling Power

In deep submicron technology, inter-wire capacitance and crosstalk are major causes of power dissipation. For technologies below 0.25 μm, the coupling capacitance depends not only on structural and other characteristics such as wire length, wire spacing and wire width, but also on the data transitions. The crosstalk effect on a line depends on the transitions of its neighboring wires. In Figure 2.10, r is the series resistance per unit length of the lines, CL is the capacitance to ground per unit length and CI is the inter-wire capacitance per unit length between adjacent lines. Table 2.2 shows the different transitions in a group of three wires, and the crosstalk type is defined by these transitions. Let (Li-1, Li, Li+1) be the three lines. There are four types of crosstalk (Type-1, Type-2, Type-3 and Type-4). Type-1 crosstalk happens if Li-1 or Li+1 changes state; let the coupling capacitance for this type of crosstalk be C. Type-2 crosstalk happens if Li makes the opposite state transition to one of its neighbor wires; the coupling capacitance for Type-2 is 2C. Type-3 crosstalk happens if Li makes the opposite state transition to one of its neighbors and there is no change in the other; the coupling capacitance for Type-3 is 3C. Type-4 crosstalk happens if all three wires move to the opposite state with respect to each other and to the previous state. Type-2, Type-3 and Type-4 are the worst cases, since the effective capacitance increases from one type to the next. With the advent of deep submicron (DSM) technologies, inter-wire or crosstalk capacitance has been shown (Figure 1.14) to lead to significant power dissipation. In addition to its dependence on technology and on structural factors such as wire spacing, wire width, wire length, wire material, coupling method and driver strength, the effective coupling capacitance increases or decreases depending on the relative switching activity between adjacent bus wires. In fact, for a wire on the bus, simultaneous transitions to opposite values on its two adjacent bus lines dissipate about four times as much energy as when coupling effects are not considered.

Figure 2.10: Coupled Transmission Line Model in DSM [65]

The coupling capacitance of a wire can be classified into four types, 1C, 2C, 3C and 4C, in multiples of the coupling capacitance C between two wires, as per Table 2.2. The crosstalk effect on a single wire depends on the signal transitions of its neighboring wires. Table 2.2 lists the different classes of crosstalk and their corresponding bit patterns.

Table 2.2 Different Bit Patterns in the Group of Three Wires

Columns: Crosstalk type; Time (Tt-1); Bit Pattern
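The dependence of the effective coupling capacitance on neighboring transitions can be made concrete with a small sketch. It uses a commonly used simplified model, which is an assumption here and is not taken from Table 2.2: each wire's transition is +1 (rising), -1 (falling) or 0 (steady), and the factor seen by the middle wire is the sum of the absolute transition differences to its two neighbors, in units of C. The function name coupling_class is hypothetical.

```python
def coupling_class(prev, curr):
    """Effective coupling factor k (in units of C) seen by the middle wire.

    prev and curr are 3-character bit strings for (L[i-1], L[i], L[i+1])
    before and after the transition.
    """
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]  # -1, 0 or +1
    left, mid, right = delta
    return abs(mid - left) + abs(mid - right)


print(coupling_class("010", "101"))   # all three wires switch oppositely
print(coupling_class("000", "111"))   # all three wires switch together
```

Under this model the worst case of 4C arises when both neighbors switch against the middle wire, and no coupling capacitance is charged when all three wires switch in the same direction.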

2.3.2.1 For address buses

Macchiarulo et al. proposed a scheme known as permutation-based encoding (PB), which reduces crosstalk on buses without any encoding/decoding circuitry. The scheme permutes the address bus lines at the layout level in order to reduce coupling at the physical design stage, as shown in Figure 2.11. The PB approach saves about 26% of the average energy on address buses. The savings can be increased by combining the scheme with encoding techniques that target the energy dissipated by switching of the self capacitance, since the PB approach is orthogonal to such techniques. When combined with Gray coding, it yields average energy savings of 46%.
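The effect of permuting bus lines can be illustrated on a short trace. The sketch below is not the authors' layout algorithm: it simply scores a line ordering by the number of opposite-direction transitions on physically adjacent lines and finds the best ordering by exhaustive search, which is only practical for very narrow buses. The function names and the trace are hypothetical.

```python
from itertools import permutations


def opposite_adjacent(trace, order):
    """Count opposite-direction transitions on physically adjacent lines.

    order[p] is the logical bit routed on physical track p.
    """
    total = 0
    for prev, curr in zip(trace, trace[1:]):
        d = [((curr >> b) & 1) - ((prev >> b) & 1) for b in order]
        total += sum(a * b == -1 for a, b in zip(d, d[1:]))
    return total


def best_permutation(trace, width):
    """Exhaustive search over all line orderings (toy-sized buses only)."""
    return min(permutations(range(width)),
               key=lambda order: opposite_adjacent(trace, order))


trace = [0b0101, 0b1010, 0b0101, 0b1010]
order = best_permutation(trace, 4)
print(order, opposite_adjacent(trace, order))
```

On this trace the identity ordering places every rising line next to a falling one, while grouping the lines that switch together leaves a single opposing boundary per transition.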

A very similar approach to PB, which reorders bus lines in order to minimize opposite-phase transitions on adjacent bus lines for crosstalk power reduction, was proposed by Shin and Saburai [67].

Figure 2.11: Permutation Based Coding

Another scheme, the adaptive address bus coding scheme proposed by Henkel and Lekatsas [68], utilizes a similar concept to minimize self and crosstalk capacitance switching on address buses by means of permutation-based routing.

Figure 2.12: Window Remapping

The encoding scheme first identifies windows, which are collections of bus lines, and then remaps them so that windows with high transition activity and windows with little or no transition activity are adjacent to each other, as shown in Figure 2.12. The remapping step shields high-transition lines from each other, which reduces coupling between them, while transition activity and coupling within a window remain unaffected. After the window remapping step, the self and coupling capacitance switching within a window is reduced using a BI scheme [26], which inverts the bits in a window only if doing so reduces the power due to both self transitions and coupling.

Liu et al. presented address bus encoding schemes with enhanced compression techniques, which use an LRU policy and XOR encoding of transmitted words to reduce self energy, and bit rearrangement and idle bit insertion to reduce crosstalk coupling energy. Figure 2.13 (a) shows a dynamic address compression scheme, while Figure 2.13 (b) shows the bus expander (BE) compression scheme. In the BE compression scheme, the sender stores the higher-order portion (tag or T field) of recently transmitted addresses in a compression cache. Only the higher-order portion of the address is cached because, owing to the highly sequential nature of addresses, the lower-order portions change much more frequently than the higher-order portions. The index (I field) is used to search the compression cache. On a hit in the compression cache, the sender transmits a control bit (C field), the index bits (I field), the way bits (W field) and the lower-order bits (U field) of the original address. On a miss, the entire address is transmitted along with the control bit over multiple cycles. The receiver uses the control bit to determine whether a compressed address has been received, and uses the I and W fields to select a tag T from its local copy of the compression cache. The BE compression scheme is combined with bit rearrangement and idle bit insertion to reduce coupling power. XOR operations are also performed on the tag T (in the case of a miss) and on the U and I fields in order to reduce self and coupling power, as shown in Figure 2.13. The W field, which is the way number of the tag that hit in a particular line of the compression cache, is also encoded using an LRU scheme. The extra cycle penalty due to compression is found to be less than 1%. The scheme was shown to yield about 14.7% energy reduction on average for compressed address transmission on a narrow bus, compared to uncompressed address transmission on a bus of the original width.

Figure 2.13: Dynamic Address Compression (A) Schematic of Dynamic Address Compression (B) Address Compression in Bus Expander

2.3.2.2 For data buses

A large amount of work has been done to reduce the coupling power for data buses. First, a coupling-aware variant of the BI scheme (CBI) was proposed to account for the power consumption due to coupling capacitances between wires on the bus. The CBI technique inverts the data to be transmitted if the coupling effect of the inverted data is less than that of the original data, in which case the inverted data is sent on the bus lines. In Figure 2.14 the encoder consists of three components: a predictor, a CBI encoder and a decorrelator. The predictor is a function of the past k input values and is used by the CBI encoder, which decides whether or not to invert the bits. The value of k is chosen as 1 to keep the hardware cost low. The CBI scheme, shown in Figure 2.14, reduces power consumption by about 30% with a one-cycle redundancy. Another approach is the transition pattern coding (TPC) scheme proposed by Sotiriadis and Chandrakasan [71], which also tries to minimize inter-wire coupling transitions. An additional bus line is added for transmission, and a transition matrix is created to select code word patterns that reduce the effective coupling capacitance. Because the technique is practical only for smaller buses, the authors propose splitting a larger bus into smaller groups. The bus partitioning used in TPC was studied by Xie et al., who used a genetic algorithm-based partitioning approach to obtain more energy savings than a random partitioning approach.

Zhang et al. proposed another coupling-aware extension of the BI scheme, the odd/even bus-invert code (OE-BI). This technique is based on the fact that coupling capacitors are charged and discharged by the activity on neighboring lines, of which one has an odd index and the other an even index. The coupling activity can be reduced by controlling the odd and even bus lines with two additional lines, the odd invert and the even invert. There are four possible cases: no bus lines are inverted (00), only odd lines are inverted (10), only even lines are inverted (01), or all lines are inverted (11). Even so, the toggling sequences 01→10 and 10→01 can still occur, and they dissipate four times more energy than other coupling events. A targeted two-phase transfer method (TPTM) is therefore employed to remove these toggling events, at the cost of extra delay. Experimental results show that OE-BI can reduce coupling transitions by 36%, compared to only 17% for the original BI.
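The four-way choice made by an OE-BI encoder can be sketched as below. This is an illustration under stated assumptions rather than the published circuit: bit 0 is treated as an even line, the cost is the sum of absolute transition differences between adjacent data lines, coupling to the two invert lines themselves is ignored, and TPTM is not modelled. All function names are hypothetical.

```python
def coupling_cost(prev, curr, width):
    """Sum of |d[j] - d[j+1]| over adjacent lines, d = per-line transition."""
    d = [((curr >> j) & 1) - ((prev >> j) & 1) for j in range(width)]
    return sum(abs(a - b) for a, b in zip(d, d[1:]))


def masks(width):
    even = sum(1 << j for j in range(0, width, 2))
    odd = sum(1 << j for j in range(1, width, 2))
    return odd, even


def oebi_encode(prev_bus, data, width):
    """Return (odd_invert, even_invert, bus_word) with the lowest cost."""
    odd_mask, even_mask = masks(width)
    best = None
    for odd_inv in (0, 1):
        for even_inv in (0, 1):
            word = data ^ (odd_mask * odd_inv) ^ (even_mask * even_inv)
            cost = coupling_cost(prev_bus, word, width)
            if best is None or cost < best[0]:
                best = (cost, odd_inv, even_inv, word)
    return best[1:]


def oebi_decode(odd_inv, even_inv, word, width):
    """Undo the inversion indicated by the two invert lines."""
    odd_mask, even_mask = masks(width)
    return word ^ (odd_mask * odd_inv) ^ (even_mask * even_inv)


print(oebi_encode(0b0101, 0b1010, 4))
```

In the example, sending 1010 after 0101 un-encoded would switch every pair of neighbors in opposite directions, whereas inverting both the odd and the even lines leaves the data lines unchanged.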

An adaptive dictionary-based encoding scheme (ADES), given by Lv et al., takes inter-wire coupling into account. It identifies recurring patterns in the source data and stores them in a dictionary, so that these patterns can be represented by fewer bits. The data word is divided into three parts: the non-compressed part, the index part and the user word. Whenever there is a hit, the index and the non-compressed parts are transmitted. The dictionary is automatically updated on a miss and remains unchanged on a hit.

An encoding scheme that extends the work on bus wire reordering was given by Wong and Tsui [74]. Its aim is a further reduction of coupling power, and it is applicable to data buses, which have less correlation than address buses. The optimal flipping and reordering pattern is found by a two-phase algorithm: phase 1 finds the optimum set of bit lines to be inverted, and phase 2 rearranges the order of the bit lines so that cross-coupling switching is minimized. In the static variant of the encoding scheme, the whole two-phase algorithm and the encoding are performed at compile time. In the dynamic variant, only the reordering phase is performed, to reduce the overhead of storing the flipping pattern. For both variants, the decoding information is stored in the program header and loaded into an LUT before execution. The dynamic scheme obtains a 15% improvement over [62]. Deng and Wong [75] proposed a method that combines BI coding with bus wire reordering [65] to optimize the energy consumption caused by self and inter-wire capacitance.

Petrov and Orailoglu [76] proposed an application-specific encoding scheme for minimizing power consumption on buses that carry program code between processors and memories. The scheme focuses on heavily executed program segments, which are tractable because of their limited size and are largely independent of each other. Figure 2.14 shows the proposed encoding methodology.

Figure 2.14: Application Specific Encoding Methodology

After the compilation phase, the application binary code is analyzed, with particular emphasis on the major application loops (i.e. hot-spots). A cost-efficient, energy-aware encoding is identified and applied to the bit streams formed by each bit position of the particular instruction sequence, with the objective of reducing the net effect of both single and coupled bit transition events. The transformed binary code is stored in the instruction memory, while the application-specific functional transformations identified by the post-compile coding algorithm are used to configure the decode circuitry on the processor end, which restores the original bit sequence. All the algorithmic support required for identifying the optimal transformations and obtaining the power-optimized instruction stream is thus performed off-line, while compiling and linking the program code. The decoding circuitry on the processor front end is the only hardware support required. Because there is no encoding hardware on the critical memory side and no bus modification, the approach has an advantage in performance impact and hardware area overhead over other low-power bus encoding techniques. The hardware support is programmable and accessible via software, which enables optimal per-application encoding for power reduction in a completely reprogrammable way, with no need for costly design iterations.

Muroyama et al. proposed a compression scheme based on variable length coding. It extends the variable length coding approach previously proposed by the same authors for reducing self-capacitance switching power by additionally considering coupling power. The N-bit data is compressed by variable length encoding, with code lengths ranging from a minimum of 1 to a maximum of M (M > N), and transmitted over the M-bit bus. During code assignment, probabilistic information about the source data is exploited so as to assign shorter codes to more frequently occurring values.

An adaptive low-power encoding algorithm based on weighted code mapping (WCM) was proposed by Brahnbhatt et al. WCM transforms the n-bit original data into an m-bit low-energy code, where m = n + a and a is the number of redundant bits. The code mapping is determined by the probability distribution of the source data stream. To improve energy savings by adaptively changing the code mapping for different data probability characteristics, a window-based adaptive encoding algorithm is proposed. For further energy reduction, the WCM algorithm is combined with the delayed bus algorithm [81], which delays all the 0-to-1 transitions on a bus by a certain amount of time in order to avoid opposite transitions on adjacent bus wires.

A partitioned hybrid encoding (PHE) was proposed by Jayaprakash and Mahapatra [82], in which the bus is partitioned optimally and the most energy-efficient scheme is applied independently to each partition. The hybrid technique considers BI, OE-BI and no encoding as the possible options. The partitioning and the choice of scheme for each partition are solved using dynamic programming. The technique requires data traffic statistics for tuning: traffic from a set of SPEC2000 benchmarks is used to determine the energy savings for all combinations of valid partitions and encoding schemes, and the configuration that gives the maximum self and coupling energy reduction on average is selected.

T. Venkata Kapplyan, Madhu Mutyam and P. Vijaya Sankara Rao [83] proposed a scheme that exploits both dynamic voltage scaling (DVS) and variable cycle transmission (VCT) mechanisms to minimize on-chip interconnect energy consumption. Data is transmitted using a variable clock period, as proposed in VCT techniques, and, based on the delay savings achieved through VCT over regular intervals, the voltage and frequency are scaled to obtain significant energy savings. Using the proposed technique on a 5 mm interconnect wire, the authors achieved energy savings of 30% and 45% over the base case on the address bus and data bus, respectively. The technique also reduces the energy-delay product by 34% and 52% for the address bus and data bus, respectively. Energy savings are almost uniform across different benchmarks in the address bus case, ranging from 27.53% to 31.66% with an average value of 30%. The variation is small because the data transition pattern behavior is almost uniform across all benchmarks. Energy savings in the data bus case are also almost uniform across different benchmarks, except for the Alu and Galgel benchmarks. The authors validated the approach on the L1 cache address/data buses of a microprocessor using the SPEC CPU2000 benchmark suite. Because the delay savings provided by the VCT mechanism are exploited when voltage scaling is applied, significant energy savings as well as delay savings are obtained without impacting throughput.

2.4 ENCODING TECHNIQUES FOR REDUCTION OF CAPACITIVE CROSSTALK DELAY

With increasing chip complexity, faster clock speeds and shrinking device sizes, wire delay is becoming increasingly significant [84, 85]. The delay in long buses depends strongly on the coupling capacitance between the lines. The crosstalk effect takes place when the signals on neighboring lines transition in opposite directions simultaneously. Capacitive crosstalk occurs when the cross-coupling capacitance is comparable to or higher than the loading capacitance. For such a transition, the delay may be twice or more that of a single wire transition. This delay penalty is known as capacitive crosstalk delay, and it depends on the transition activity of the adjacent signals. Type-4 and Type-3 have the worst delay characteristics compared to Type-2 and Type-1. Some techniques reduce capacitive crosstalk through selective skewing of bus data signals [86], transistor sizing [87], and repeater insertion and sizing [88]. Encoding is one of the most effective techniques for reducing capacitive crosstalk delay. The techniques for reducing inter-wire coupling capacitance described in the previous section do not specifically eliminate any type of worst-case crosstalk. As a result, they lower power but do not necessarily reduce the delay penalty. Many earlier works [89-94] proposed delay models for buses. These models are applicable to a single bus line or a pair of bus lines. Sotiriadis et al. proposed a crosstalk-aware delay model. This model extends the Elmore delay model [96] to account for a distributed line with distributed coupling components. The impact of crosstalk on delay is data dependent: a wire experiences a different delay for different combinations of transition directions on the wire and its adjacent wires. In this model, the delay of each bus line is determined by the signals on the individual lines. The corresponding encoding scheme eliminates the type-3 and type-4 transitions.

The self-shielding code (SSC) was proposed by Victor and Keutzer [5]. It reduces wire delay by eliminating type-4, type-3 and part of type-2 crosstalk. Data of original width n is encoded into a code of width m, where m < 2n. The approach builds on the idea of introducing shield wires [97, 98] to minimize crosstalk. With shielding, eliminating crosstalk requires an additional shield wire for every bus wire: a data bit of 0 is encoded as 00 on the wires and a 1 is encoded as 10. This prevents adjacent wires from transitioning in opposite directions by forcing every other wire to a steady value. The purpose of the encoding scheme is likewise to avoid transitions in opposite directions on adjacent lines.
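The shielding baseline described above (0 sent as 00 and 1 sent as 10) can be written down directly. The sketch below is only that baseline, not the self-shielding code itself, which needs fewer than 2n wires. The function names are hypothetical, and placing each data bit on the higher wire of its pair is an arbitrary choice.

```python
def shield_encode(word, n):
    """Interleave a grounded shield wire with every data wire (n -> 2n)."""
    out = 0
    for j in range(n):
        out |= ((word >> j) & 1) << (2 * j + 1)   # data on odd positions
    return out                                     # even positions stay 0


def shield_decode(bus, n):
    """Pick the data bits back out of the odd wire positions."""
    return sum(((bus >> (2 * j + 1)) & 1) << j for j in range(n))


def has_opposite_neighbours(prev, curr, width):
    """True if any two adjacent wires switch in opposite directions."""
    d = [((curr >> j) & 1) - ((prev >> j) & 1) for j in range(width)]
    return any(a * b == -1 for a, b in zip(d, d[1:]))


print(has_opposite_neighbours(0b01, 0b10, 2))                  # unshielded pair
print(has_opposite_neighbours(shield_encode(0b01, 2),
                              shield_encode(0b10, 2), 4))      # shielded
```

Since every second wire never switches, no two adjacent wires can move in opposite directions, at the price of doubling the bus width.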

The forbidden pattern coding was proposed by Duan and Tirumala [6]. It reduces crosstalk delay by eliminating certain high-energy patterns in the transmitted data, at the cost of increasing the bus width. For an n-bit bus b1, b2, b3, ..., bn-1, bn, a forbidden pattern is defined as:

bi = v

bi+1 = v'

bi+2 = v

where 1 ≤ i ≤ n-2, v ∈ {0, 1} and v' denotes the complement of v, so the forbidden patterns are the three-bit sequences 010 and 101. Type-4 and type-3 crosstalk delay can be eliminated if forbidden patterns are eliminated from the bus. This is done by adding m redundant bits to the n-bit bus and using only code words of length n + m that do not contain a forbidden pattern. Large buses are partitioned into groups for ease of implementation. A complement bit is required to decode the transmitted word correctly. A less aggressive variant of the technique eliminates only type-4 crosstalk. The work was extended in [99], where similar techniques eliminate type-2 and type-1 crosstalk as well and further speed up signal propagation on buses. The bus can be sped up by a factor of 6 using the type-2 crosstalk canceling technique, with an overhead of 200%.
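A forbidden pattern check is easy to express in code. The sketch below assumes that the forbidden patterns are the three-bit sequences 010 and 101 on adjacent wires, and enumerates the code words of a given width that avoid them. The function names are hypothetical, and the actual mapping from data words to code words is not shown.

```python
def has_forbidden_pattern(word, width):
    """True if any three adjacent bits read 010 or 101."""
    bits = [(word >> j) & 1 for j in range(width)]
    return any(a == c != b for a, b, c in zip(bits, bits[1:], bits[2:]))


def fp_free_codewords(width):
    """All code words of the given width without a forbidden pattern."""
    return [w for w in range(1 << width)
            if not has_forbidden_pattern(w, width)]


codes = fp_free_codewords(4)
print([format(c, "04b") for c in codes])
```

The number of usable code words grows more slowly than 2^width, which is why redundant wires are needed to carry n data bits.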

Crosstalk aware interconnect (DYN) was proposed by Li et al. [100]. It uses a faster clock and dynamically controls the number of cycles used for each transmission, based on the delay that the transmission requires. This is decided by a crosstalk analyzer. A crosstalk analyzer circuit is incorporated at the sender side of the bus to support the variable cycle transmission mechanism, as shown in Figure 2.15 for a 32-bit bus. It compares the previously sent data with the data about to be sent and uses pattern recognition to determine the crosstalk types. OR trees combine the per-wire pattern signals into the signals G4, G5 and G6, which denote the class type. The scheme uses multiple short clock cycles in place of one long clock cycle, without any encoding. The number of cycles used to transmit data is changed dynamically by controlling Ready_out. Once Ready_out is asserted, the next data to be transmitted is passed to the X-analyzer and the process repeats, but the scheme does not provide any energy benefit. DYN has lower complexity and overhead than SSC [5], the double spacing scheme (DBS) [101] and the shielding method (SHD) [101]. It gives better results and has less area overhead than the SSC, DBS and SHD schemes. The performance of DYN can be increased further by using BI coding.

Figure 2.15: DYN Encoder Implementation [100]

Sainarayanan et al. proposed a scheme to minimize delay and energy consumption in interconnects. To reduce crosstalk, the authors used an encoding technique that encodes 8-bit data into 9-bit data. The encoded word is stored in a 9-bit register and XORed with the forthcoming data bits to reduce worst-case crosstalk. For delay minimization, the authors combined wire tapering with encoding. The results show that the encoding scheme reduces delay, and that encoding combined with wire shaping reduces delay further for different interconnect dimensions. The delay can also be minimized by inserting repeaters.

The aim of the encoders is to reduce the higher crosstalk classes discussed in the previous chapter. The authors proposed encoder and decoder codes for reducing worst-case crosstalk in combinations of three bits. Inserting the optimum number of repeaters at the optimum spacing along the wire significantly reduces the delay. The optimum length and the optimum number of repeaters can be found from Eqn. (2.1) and Eqn. (2.2), respectively.

… (2.1)

… (2.2)

2.5 ENCODING TECHNIQUES FOR REDUCTION OF POWER AND CAPACITIVE CROSSTALK EFFECT

The low energy set scheme (LESS) was proposed by Baek et al. In this scheme, the authors use an XOR-XNOR (XON type) or XNOR-XOR (XNO type) operation to transmit data. In the XON technique, the bus is divided into groups of 4 bits; the XOR operation is performed on the most significant 2 bits of the current data with the previous data, and the XNOR operation is performed on the least significant 2 bits with the previous data. The choice between XON and XNO depends on an encoding rule defined by the behavior of the bit sequence, so as to minimize energy-delay and self-switching on the bus. The LESS technique gives better results in power, energy and delay than the BI encoding scheme.
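The XON operation on one 4-bit group can be sketched as follows. This is a minimal illustration that assumes "previous data" means the previous un-encoded 4-bit group, which the decoder also tracks; the rule that selects between XON and XNO is not modeled, and the names xon_encode and xon_decode are hypothetical.

```python
MASK4 = 0b1111  # one 4-bit group
LOW2 = 0b0011   # the two least significant bits of the group


def xon_encode(curr, prev):
    """XOR the upper 2 bits and XNOR the lower 2 bits with the previous group."""
    # XOR all four bits, then invert the low two so their XOR becomes XNOR.
    return ((curr ^ prev) ^ LOW2) & MASK4


def xon_decode(code, prev):
    """Invert the XON operation given the same previous group."""
    return ((code ^ LOW2) ^ prev) & MASK4


prev, curr = 0b1010, 0b0110
code = xon_encode(curr, prev)
print(format(code, "04b"), format(xon_decode(code, prev), "04b"))
```

XNOR is XOR followed by inversion, so the whole group can be handled with a single XOR against the previous group and a fixed mask.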

The encoding scheme EN_shield-Ip was proposed by Lyuh and Kim [103]. It extends SSC [5] to enable simultaneous power minimization and crosstalk delay reduction. The scheme takes a probability graph as input. Assuming a code length of m bits (m > n), it creates a codeword graph that ensures there is no type-4 crosstalk. Finding the most suitable code for each input data word is an NP-complete problem, so a greedy heuristic is used to assign codes that minimize the transition cost. This scheme consumes less power than the original data or schemes such as BI and the shielding scheme [101], while eliminating worst-case crosstalk delay.

The no adjacent transition (NAT) coding scheme was proposed by Subrahmanya et al. [104]; it reduces power consumption and eradicates worst-case crosstalk. NAT coding combines transition signaling [16] with LWC schemes [16, 17]. As a variant of the m-limited LWC technique, it limits the number of 1's in every transmitted code word to a constant 'm', and all binary code words with adjacent 1's are avoided.

Figure 2.16: (n, b, t) NAT Encoder Decoder Schemes [104]

Table 2.3: (6,4,2) NAT codes [104]

Input   NAT code   Input   NAT code
0000    000000     0100    001000
0001    000001     0101    010000
0010    000010     0110    100000
0011    000100     0111    000101
1000    001001     1100    010100
1001    001010     1101    100001
1010    010001     1110    100010
1011    010010     1111    100100

Combined with transition signaling, this approach ensures that the number of transitions per transmission is bounded by a constant. Figure 2.16 shows a block diagram of the proposed (n, b, t) NAT code, an example of which is given in Table 2.3; b is the number of bits in the input to be encoded, n is the number of bits in the encoded output, and t is the maximum number of 1's allowed in the output. Unlike the LWC scheme, this scheme removes worst-case crosstalk completely, but it does not reduce power as much as the LWC scheme.
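The two constraints on NAT code words (at most t ones, no two adjacent ones) are easy to enumerate. The sketch below lists all candidate words for n = 6 and t = 2 and checks that the sixteen codes of Table 2.3 are among them; the function name nat_codewords is an illustrative choice, and the assignment of inputs to codes is taken from the table rather than derived.

```python
def nat_codewords(n, t):
    """All n-bit words with at most t ones and no two adjacent ones."""
    words = []
    for value in range(1 << n):
        few_ones = bin(value).count("1") <= t
        no_adjacent_ones = (value & (value >> 1)) == 0
        if few_ones and no_adjacent_ones:
            words.append(format(value, "0{}b".format(n)))
    return words


# The sixteen codes listed in Table 2.3.
TABLE_2_3_CODES = [
    "000000", "000001", "000010", "000100",
    "001001", "001010", "010001", "010010",
    "001000", "010000", "100000", "000101",
    "010100", "100001", "100010", "100100",
]

candidates = nat_codewords(6, 2)
print(len(candidates))
print(sorted(set(candidates) - set(TABLE_2_3_CODES)))
```

For n = 6 and t = 2 the constraints leave just enough words to cover all sixteen 4-bit inputs, with one candidate left unused.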

Sridhar et al. proposed overlapping coding, a type of partial coding technique that divides the bus into sub-channels, with two adjacent sub-channels overlapping at their boundary. If 'm' and 'n' are the numbers of code bits and data bits in a sub-channel, then the n data bits are mapped to the central m-2 bits of the code words, and the boundary bits of the data words form the boundary bits of the code words. The technique eliminates crosstalk delay by using forbidden-pattern overlapping codes, so that the overlap does not cause crosstalk delay in the boundary bits. The FTOC technique [5] is used to ensure that a mapping from data words to code words with unchanged boundary bits exists. Because it does not use any shielding wires, it takes less area.

An encoding scheme was proposed by Khan et al. [105]. In this scheme, the incoming data is encoded so that type-3 and type-4 crosstalk is eliminated and crosstalk noise is reduced. Energy consumption is expressed using the lumped model of the SoC bus shown in Figure 2.17. The authors considered three lines of this lumped model and derived the energy function given below.

Figure 2.17: Lumped Model of On-Chip Bus

Type-1 crosstalk occurs when either line-1 or line-3 changes state; the coupling capacitance in this case is CI. Type-2 crosstalk occurs when the center wire is in the opposite state to one of its adjacent wires, or when the center wire changes state; the coupling capacitance is 2*CI. If the center wire switches in the opposite direction to one of its neighbors while the other neighbor is quiet, the situation is type-3 crosstalk and the coupling capacitance is 3*CI. In type-4 crosstalk, all the wires change state, each in the opposite direction to its neighbors, and the coupling capacitance is 4*CI. Type-4, type-3 and type-2 are the worst cases of crosstalk and must be avoided or minimized. The energy expression for the three-bit bus is given in Eqn. (2.3).

… (2.3a)

… (2.3b)

… (2.3c)

… (2.3d)

Here the state symbols denote the final and the initial states of the three wires, and E1, E2 and E3 are the energies of the three wires. The energy saving can be calculated as in Eqn. (2.4).

Energy saving = … (2.4)

Here the two quantities are the net switching activities of the un-coded and the coded data, respectively. The block diagrams of the encoding and decoding methods are shown in Figures 2.18 and 2.19, respectively.

Figure 2.18: Block Diagram of Khan Method

Figure 2.18 (a): N-4 count circuit diagram

Figure 2.18(b): N-2 count circuit diagram.

Figure 2.19: Decoder Block Diagram for Encoded 4 Bits

The encoding scheme is based on an intrinsic property of 4-bit sequences. A 4-bit bus can carry sixteen 4-bit sequences (4-tuples). If any one 4-tuple is modulo-2 summed with each of the functions Z1 (0101) and Z2 (1010) and the results are compared with the remaining 4-tuples, one of the two XORed words will have no type-4 switching with respect to the remaining fifteen 4-tuples. This encoding scheme is implemented in 0.18-μm CMOS technology.
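The property stated above can be checked by brute force. The sketch assumes that type-4 switching means three neighboring wires all switching, with the middle wire moving against both neighbors (010 to 101 or 101 to 010); the helper names has_type4 and pick_mask are hypothetical and not part of the cited scheme.

```python
Z1, Z2 = 0b0101, 0b1010


def has_type4(prev, curr, width=4):
    """True if some wire switches against both of its switching neighbors."""
    for i in range(width - 2):
        p = (prev >> i) & 0b111
        c = (curr >> i) & 0b111
        # An alternating triple that flips completely is a type-4 event.
        if p in (0b010, 0b101) and c == p ^ 0b111:
            return True
    return False


def pick_mask(prev_bus, data):
    """Return Z1 or Z2 such that data XOR mask has no type-4 switching."""
    for z in (Z1, Z2):
        if not has_type4(prev_bus, data ^ z):
            return z
    return None


always_possible = all(
    pick_mask(p, d) is not None for p in range(16) for d in range(16)
)
print(always_possible)
```

The two candidates data XOR Z1 and data XOR Z2 are bitwise complements of each other, so they cannot both be the full complement of the same alternating triple on the bus.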

Let An = {an-1, an-2, …, a1, a0} be the data word. At any time instant 't' the data word can be defined as … . The data transmitted on the bus is represented as Ek = {C(An), I}, where C(An) is the coded data and 'I' denotes the extra bit added. Let the inputs of the algorithm at time instant 'k' be Ak+1 and Ek, and let Ek+1 be its output, computed as follows:

Calculate the number of coupling transitions in Ak+1 by affixing zero as the extra bit, and denote it as M.

Evaluate …

Calculate the number of coupling transitions in … by postfixing '1', and denote it as …

Else …

The variable cycle transmission technique [100] was improved by Mutyam et al. [106]. The authors aimed to minimize crosstalk delay by applying temporal redundancy (VCTR). This scheme reduces the cost of the crosstalk analyzer hardware. When crosstalk is detected, the data is encoded into two words, each of which is transmitted with a smaller delay, so the overall delay is lower. VCTR also achieves lower delay and energy consumption than the original variable cycle transmission technique.

Wang et al. proposed a 6b9b encoding scheme to reduce crosstalk and active power consumption in single-ended interconnects. The scheme was developed to avoid neighboring signals toggling at the same time.

Srinivas et al. proposed an encoding scheme that uses memory-less codes for crosstalk avoidance, error correction and energy saving. The number of wires is 35% of the fundamental bound. The scheme uses low-swing signaling on a 10 mm 32-bit bus in 0.13-µm CMOS technology.

Rossi et al. examined Hamming error control codes and found that no power saving is possible among the different Hamming codes. A novel technique, Dual Rail, was then proposed; it provides energy reduction with a proper bus layout.

Nigussie et al. proposed Level Encoded two-phase Dual Rail (LEDR) encoding, which provides delay-insensitive, high-performance long on-chip communication. The authors present a method for analyzing crosstalk and signal propagation delay, and they observe that the differentially switching dual-rail link is faster than the two-phase dual-rail link. At high rise and fall times and for longer wires, inductive crosstalk also plays an important role in addition to capacitive crosstalk. There are two types of handshake protocol in interconnect communication: four-phase and two-phase handshaking. The block diagram of the conversion from single-rail to dual-rail encoding is shown in Figure 2.20.

Figure 2.20: Single Rail to Dual Rail and Back to Single Rail Conversion

A throughput of 1 Gbps per dual-rail wire pair is achieved at a length of 5 mm. Power consumption is less than 800 µW at a wire length of 11 mm. The delay due to worst-case switching crosstalk is reduced threefold compared to the bundled-data encoded mode. This encoding scheme was simulated in 130 nm CMOS technology.
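LEDR is commonly described as a two-wire code in which one rail carries the data value and the other rail is chosen so that the phase (the XOR of the two rails) alternates from symbol to symbol, which makes exactly one rail switch per symbol. The sketch below illustrates that behavior under the assumption that the rails idle at (0, 0); the rail names and function names are illustrative and not taken from the cited work.

```python
def ledr_encode(bits):
    """Return one (data_rail, parity_rail) pair per input bit."""
    rails = []
    phase = 1  # idle rails (0, 0) have phase 0, so the first symbol uses phase 1
    for bit in bits:
        rails.append((bit, bit ^ phase))
        phase ^= 1  # the phase alternates with every symbol
    return rails


def ledr_decode(rails):
    """The data rail carries the bit value directly."""
    return [data for data, _parity in rails]


def toggles_per_symbol(rails, idle=(0, 0)):
    """Number of rails that change state when each symbol is sent."""
    counts = []
    prev = idle
    for pair in rails:
        counts.append((pair[0] != prev[0]) + (pair[1] != prev[1]))
        prev = pair
    return counts


message = [1, 1, 0, 1, 0, 0]
encoded = ledr_encode(message)
print(encoded)
print(toggles_per_symbol(encoded))
```

Since every symbol produces exactly one transition, the receiver can detect a new symbol from the phase change alone, without a separate clock or a return-to-zero phase.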

A spatio-temporal bus encoding scheme was proposed by Avinash et al. to minimize the crosstalk effect. The scheme eliminates crosstalk classes 4, 5 and 6 among the interconnect wires, thereby reducing delay and energy consumption. It has a built-in error detection capability without any performance overhead. The authors focused on the L1 cache address/data bus of a microprocessor in 90-nm and 65-nm technologies.

A new encoding scheme, phase encoding, was proposed by D'Alessandro et al. Self-timed communication is based on phase modulation of a reference signal. The reference signal is sent on a number of transmission lines, and the data is recovered by observing the sequence of events on the lines. This increases the number of lines. To mitigate this problem, the authors proposed new encoding algorithms that generate symbol-dependent matrices, which are used to control the phase of the transmission lines. Type-2 and Type-3 crosstalk are eliminated in this scheme.

Courtay et al. proposed a convolutional encoder for crosstalk reduction (CECR). It is useful for reducing delay, power and noise on on-chip buses. Results show that the reduction in power consumption reaches up to 12% for a 10 mm bus in 65 nm technology, and more if buses are longer. It also speeds up data propagation by 20% and reduces the overall worst-case noise transitions by 51%.

Ge Chen, Steven Duvall and Saeid Nooshabadi [107] developed a mathematical model for a memoryless encoding scheme in which the encoding and decoding circuits are implemented in very simple combinational logic. They also proposed a novel partitioning method for reducing the transition energy, a significant part of which is dissipated by the coupling capacitance between adjacent wires in on-chip buses. Specifically, for an 8-bit bus in 65 nm CMOS technology, they present an 11-wire solution that reduces energy dissipation by 22%. The proposed scheme achieves similar energy efficiency without increasing the complexity of the encoding and decoding circuitry when the bus is extended to 16, 32 and 64 bits. The energy model used in the proposed partitioning-based encoding technique is a bus model. For on-chip parallel buses, energy is dissipated in charging and discharging parasitic capacitances. Traditional energy models consider parasitic capacitances between the wires and other metal layers. Low-swing signaling schemes reduce power consumption by not requiring capacitors to be fully charged or discharged at each signal transition; however, these techniques suffer from a reduced signal-to-noise ratio as the supply voltage scales in DSM technologies. In contrast, coding techniques reduce power consumption by reducing the effect of the coupling capacitance between adjacent wires through the addition of redundant bits. In this encoding scheme, the encoder maps m bits into m + a bits, which are later decoded back into the original m bits. The m-bit and (m + a)-bit words are binary combinations of 0 and 1, with 2^m and 2^(m+a) possible patterns, respectively.

In this scheme, energy saving is defined as in Eqn. (2.8):

(2.8)

They derive a coding scheme to reduce bus energy consumption in DSM technologies. Furthermore, the encoder and decoder are based on combinational logic circuitry, which results in low power overhead.

A. Courtay, E. Boutillon and J. Laurent [108] proposed a new data-coding technique called Convolutional Encoder for Crosstalk Reduction (CECR). It addresses the issue that interconnects are now considered the bottleneck in the design of systems-on-chip (SoC), since they introduce delay and power consumption. The technique reduces delay, power consumption (including the extra power consumed by the codec) and noise on on-chip buses. Its concept is to reduce the switching activity to a minimum for the data transmitted on the encoded wires. Crosstalk is the effect of the coupling capacitance between a victim wire and its neighboring wires, and it depends on their transitions and states. Results presented in the paper show that the transition classification differs when power consumption is considered. Hence, a key point for power optimization is to encode data such that falling transitions occur with the lowest crosstalk capacitance and thus consume as little energy as possible. Secondly, analysis of the activity profile of data stimuli files shows that applying performance optimization techniques to the least significant bits has a better impact in terms of power. Lastly, the energy saving is evaluated for the total system.

The convolutional encoder for crosstalk reduction is a recent power optimization technique that aims to lower the switching activity on the power-consuming wires of on-chip data buses as much as possible. The problem analyzed is how to encode a sequence of m-bit words so that two consecutive values minimize both the Hamming distance between them and the maximum delay. The architecture of the bus encoding/decoding scheme is shown in Figure 2.21.

Figure 2.21: Architecture of Bus Encoder/Decoder

Here Sk is a sequence of m bits, and the Hamming distance is calculated between the two consecutive symbols Sk and Sk+1 at the output. The technique has good efficiency for different technologies and bus lengths. The power consumption reduction can reach up to 12% for a 10 mm bus in 65 nm technology, and more if buses are longer. It also accelerates data transmission by 20% and reduces the overall worst-case noise transitions by 51%.

Ying Zhang, Huawei Li, Xiaowei Li and Yu Hu [109] present a new bus encoding method based on codeword selection for tolerating crosstalk-induced effects, which avoids crosstalk and provides error correction as well. The method selects a subset of a crosstalk avoidance code (CAC) that has single-error-correction capability. It avoids the crosstalk induced by late signal transitions on the checking bits in previous methods, and it never requires extra wires for checking bits. Experiments show that the method reduces wire overhead by 6% compared to former methods, and that it can also improve bus performance and reduce power dissipation. The main algorithm is the codeword selection algorithm, which proceeds as follows:

1. Take the m-bit input sequence for CAC generation.

2. If the m-bit matrix is not a new one, increase the number of bits by 1. Otherwise, go to step 3.

3. Perform code selection.

4. If there are enough codes for selection, output the codebook. Otherwise, go to the next step.

5. Perform a matrix transformation and repeat from step 2.

Unlike a Hamming code, which has separate checking bits, the proposed code has no checking bits, so error correction still depends on a uniform check matrix. For example, suppose an error occurs on the last bit of the codeword 00111 during transmission, resulting in 00110 at the receiver. If this word is checked according to H·X^T = 0^T, the columns selected are (h3, h4), and their sum is (0, 0, 1)^T. The sum equals the last column vector of the matrix H, which indicates the error position. Therefore, if there is an error on any single bit of this codeword, the sum computed according to H·X^T = 0^T indicates the error position. The method avoids the crosstalk induced by late signal transitions on checking bits in the joint code methods, and experimental results show that it reduces wire overhead by 6% compared to joint code methods. H is an m×n matrix, where n is the number of bits in a codeword and m is the dimension of the column vectors in the matrix.
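The worked example above can be reproduced with a small syndrome calculation. The check matrix used by the authors is not given in the text, so the columns below are a hypothetical choice made only to satisfy the stated facts: 00111 is a codeword, h3 + h4 = (0, 0, 1)^T, and that sum equals the last column.

```python
# Hypothetical columns h1..h5 of a 3x5 check matrix H (an assumption, see above).
H_COLUMNS = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (0, 0, 1)]


def syndrome(word):
    """Modulo-2 sum of the columns of H selected by the 1 bits of the word."""
    total = (0, 0, 0)
    for bit, column in zip(word, H_COLUMNS):
        if bit:
            total = tuple(a ^ b for a, b in zip(total, column))
    return total


def error_position(word):
    """1-based position of a single-bit error, or None if none is indicated."""
    s = syndrome(word)
    if s == (0, 0, 0) or s not in H_COLUMNS:
        return None
    return H_COLUMNS.index(s) + 1


sent = (0, 0, 1, 1, 1)
received = (0, 0, 1, 1, 0)  # the last bit is flipped in transit
print(syndrome(sent), syndrome(received), error_position(received))
```

Because all columns are distinct and non-zero, a single flipped bit always produces the column of the flipped position as its syndrome, which is what allows the position to be read off directly.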

N. Satyanarayana, A. Vinaya Babu and Madhu Mutyam [110] propose three delay minimization techniques, namely data packing (D-Pack), data permutation (D-Perm), and data replication with shielding and two-phase transmission (RESTP). They show that for a 5-mm 32-bit on-chip bus in 90 nm CMOS technology, the D-Pack, D-Perm and RESTP techniques achieve more than 25%, 32% and 51% delay savings, respectively, on both address and data buses. For a 32-bit bus, the D-Pack, D-Perm and RESTP techniques require 38, 34 and 48 wires, respectively. Focusing on L1 cache address/data buses, they evaluate the effectiveness of these techniques using the SPEC2000 CPU benchmarks. In on-chip buses, the 16 most significant bits (MSBs) of the 32-bit data to be transmitted are often the same as those of the data currently on the bus. In these CPU benchmarks, for instance, this happens nearly 99% and 40% of the time on the address and data buses, respectively. This can be exploited: instead of transmitting the same data again, new data is transmitted, so that the time required to transmit the entire data is reduced. The MSB and LSB halves are interleaved so that an MSB is placed between every pair of LSBs. Building on this observation, the authors propose the delay minimization techniques D-Pack, D-Perm and data replication. With a few redundant bus lines, they show that their techniques achieve large delay savings over the un-coded bus. By exploiting similarity in the MSBs, n/2 bits of on-chip data are transmitted on an n-bit bus. The authors consider two n/2-bit registers, SUH and SLH.

Lingamneni Avinash, M. Kirthi Krishna and M.B. Srinivas [111] proposed a novel spatio-temporal bus encoding scheme that minimizes crosstalk in interconnects and simultaneously addresses the requirement for error detection. The propagation delay of on-chip interconnects caused by the resistance, self-capacitance and inter-wire capacitance of bus lines is becoming more prominent than the delay caused by the gates. A variety of sources of interference, such as alpha particles, power grid fluctuations and electromagnetic interference, are also becoming more prominent, increasing the need for active error detection. The proposed scheme eliminates crosstalk classes 4, 5 and 6 among the interconnect wires, thereby reducing delay and energy consumption. It also has built-in error detection without any performance overhead. The algorithm of the method is as follows:

For an n bit bus, divide the total number of bits to be sent on the bus into two equal groups of n/2 bits.

Divide the data of two groups in blocks of four.

Apply 4-bit bus encoding technique to avoid crosstalk of type 4, 5 and 6.

The encoded data Tx from the first group is sent in the first temporal cycle, while the encoded data Tx+1 from the second group is sent in the second temporal cycle.

The effectiveness of the proposed technique is evaluated by focusing on the L1 cache address/data bus of a microprocessor, using the SPEC2000 CINT benchmark suite for 90 nm and 65 nm technologies. The proposed technique achieves an efficiency of 11% in energy consumption and a reduction of about 33% to 63% in delay compared to the base case for data transmission, and it is shown to perform better than the existing bus encoding schemes in the literature.
