Testable Design of Array-Based FFT Processor

Published Date: 02 Nov 2017

Modern nanoscale technologies, and SRAM-based field programmable gate arrays (FPGAs) in particular, are prone to various types of failures that affect the reliability and functionality of the circuits implemented on them. This article analyses the fault tolerance and reliability of a proposed parallel, adaptive self-healing system based on partial dynamic reconfiguration (PDR) and on-line built-in self-test (BIST). A self-test and repair process for FPGA-based systems is discussed. The proposed fault-tolerant system, based on a 2D array of processing elements (PEs) with on-line checkers, is tested with a fast Fourier transform (FFT) application. Furthermore, to enhance the availability and reliability of the FFT system, a duplex architecture with on-line checkers based on PDR is designed. Reliability and availability parameters of the system are calculated using a Markov reliability model. The results show improvements in test length, test time, and reliability, together with a low hardware overhead due to design for testability (DFT).

Keywords: on-line BIST; parallel delay test; fault tolerance; partial dynamic reconfiguration; FFT; self healing; availability; duplex

Introduction

Field programmable gate arrays (FPGAs) are highly suitable for mission-critical, remote, and long-duration automated applications. This is mainly because complex systems-on-chip (SoCs) can be integrated on an FPGA, combining the speed of hardware, the flexibility of software, and unlimited reconfiguration. Furthermore, the partial dynamic reconfiguration (PDR) feature of modern FPGAs supports run-time adaptive functionality in these systems (Amudha and Venkataramani 2009; Nedjah and Mourelle 2005; Deliparaschos, Doyamis, and Tzafestas 2008). Hence, reconfigurable systems can serve as a platform for the design of next-generation fault-tolerant systems. SRAM-based FPGAs are useful for implementing in-orbit design changes in remote aircraft/spacecraft missions. However, single event upsets (SEUs) can cause transient or permanent bit flips in SRAM cells (configuration bits and user bits), which in turn can change the function of the logic implemented in the FPGA. An SEU in user bits causes a transient error that affects user-defined logic and flip-flops. An SEU in a configuration SRAM cell leads to a permanent error that persists until the original configuration bitstream is re-downloaded into the FPGA. Permanent errors are classified into routing errors, look-up table (LUT) bit flips, and control/clocking bit flips (Asadi and Tahoori 2005). FPGA designs fabricated in deep-submicron and smaller technology nodes are subject to large process variations. These variations cause large spreads in circuit delay and power, making such designs more susceptible to permanent faults and delay faults. Hence, at-speed delay testing is crucial to ascertain system performance at the operating frequency. Therefore, many recent research projects focus on the design of robust and dependable fault-tolerant systems on FPGAs.
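The difference between user-bit and configuration-bit upsets can be seen in a toy model. The Python sketch below treats a 2-input LUT as nothing but its four configuration bits (an assumption that ignores routing and control bits); the helper names `lut_eval`, `flip_config_bit`, and `truth_table` are hypothetical. A single flipped configuration bit changes the implemented function on every subsequent evaluation, and only reloading the original bits restores it.

```python
def lut_eval(config_bits, inputs):
    """Return the LUT output; inputs[0] is the least significant address bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[address]


def flip_config_bit(config_bits, index):
    """Return a copy of the configuration with one bit inverted (a simulated SEU)."""
    upset = list(config_bits)
    upset[index] ^= 1
    return upset


def truth_table(config_bits):
    """Evaluate a 2-input LUT for (a, b) = (0, 0), (1, 0), (0, 1), (1, 1)."""
    return [lut_eval(config_bits, (a, b)) for b in (0, 1) for a in (0, 1)]


AND2 = [0, 0, 0, 1]                 # only address 3 (a = 1, b = 1) stores a 1
upset = flip_config_bit(AND2, 1)    # SEU on the cell addressed by a = 1, b = 0

print(truth_table(AND2))    # [0, 0, 0, 1]  -> a AND b
print(truth_table(upset))   # [0, 1, 0, 1]  -> now simply a, on every later evaluation

reloaded = list(AND2)       # re-downloading the original bits restores the function
print(truth_table(reloaded))  # [0, 0, 0, 1]
```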

Previous research shows that a triple modular redundancy (TMR) structure combined with scrubbing can recover from SEUs and accumulated errors in systems implemented on FPGAs. However, TMR introduces performance constraints and roughly triples the hardware. Moreover, it cannot protect the system when failures occur in different redundant copies at the same time. Scrubbing cannot detect SEUs in memories and registers, because their contents change with time. Another disadvantage of scrubbing is its non-optimal error detection latency (Bolchini, Miele, and Santambrogio 2007; Heiner, Sellers, Wirthlin, and Kalb 2009). Many C-testable and built-in self-test (BIST) approaches for array-based systems that deal with combinational fault testing have been discussed (Yamashita et al. 1998; Antola and Sami 1991). Pseudorandom patterns are used for testing iterative systems (Al-Arian, Landis, and Nienhaus 1991), but this scheme is not recommended for complex very large scale integration (VLSI) systems because of its poor controllability and non-optimal fault coverage. C-testable and M-testable techniques for combinational fault detection in iterative logic array (ILA) based fast Fourier transform (FFT) designs, under a single-module fault assumption, are proposed by Antola and Sami (1991) and Feng, Muzio, and Lombardi (1993). In these works, however, fault location requires repeated application of the test patterns. Various testing strategies for ILA structures such as FFT processors are discussed in the literature (Li, Lu, Hwang, and Wu 2000; Jain, Al-Arian, Landis, and Nienhaus 1991; Antola and Sami 1991; Feng, Muzio, and Lombardi 1993). In all of these works, testing is carried out off-line and delay fault detection is not considered. Furthermore, although optimal fault coverage is achieved, efficient fault recovery methodologies are rarely discussed.
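The masking behaviour of TMR, and its limit, can be illustrated with a bitwise 2-out-of-3 majority voter. This is a minimal Python sketch assuming integer output words and a voter that is itself fault-free; the word values are arbitrary.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over integer output words."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011

# One faulty copy: the voter masks the error.
one_copy = majority(correct ^ 0b0100, correct, correct)

# Two copies wrong in different bit positions: a bitwise voter still masks both.
different_bits = majority(correct ^ 0b0100, correct ^ 0b0001, correct)

# Two copies wrong in the same bit at the same time: the wrong value wins the vote.
same_bit = majority(correct ^ 0b0100, correct ^ 0b0100, correct)

print(one_copy == correct, different_bits == correct, same_bit == correct)
# True True False
```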
Many researchers have suggested reconfiguration mechanisms for fault tolerance in processor arrays, based on either time redundancy or hardware redundancy (Sánchez and Tyrrell 1998). In time redundancy, the tasks performed by faulty cells are distributed among their spare neighbours; however, this technique degrades performance (Sharma, Demara, and Sarvi 2007; Koopman 2003). Fitness-based/population-based evolutionary approaches to fault tolerance create alternative configurations both for anticipated faults and, at run time, for observed faults (Salvador et al. 2011). These methods provide good resource coverage and passive run-time operation, but at the cost of a long fault recovery time. Further drawbacks of these approaches are increased computational demands, added hardware complexity, and system performance degradation.

None of the previously discussed methods deals with on-line detection and recovery of delay and stuck-at faults together with SEU effects. Furthermore, parallel on-line detection and recovery of multiple module faults is not considered. Therefore, the present study aims at parallel detection, location, and recovery of multiple module faults in array-based systems at run time. The parameters analysed are reliability, test time, test power, fault recovery time, hardware overhead, and fault coverage. Multiple module failures are considered, with a single fault occurring in any one module at a time.

Self-test and repair process for the novel fault tolerant system

The main steps of the proposed fault-tolerant (FT) system shown in Figure 1 are: (i) application of test patterns to the array-based FFT system for parallel detection and location of faults; and (ii) built-in self-replacement followed by reconfiguration using PDR for error recovery. The FT system, based on a 2D array of processing elements (PEs), is tested with an FFT application, as shown in Figure 2. The PE array is designed with on-line checkers for on-line fault recovery. An accumulator-based built-in pseudo-exhaustive two-pattern generator is used to detect faults in the PE array (Voyiatzis, Gizopoulos, and Paschalis 2010), following a test-per-clock BIST strategy for pseudo-exhaustive delay testing. [Figure 1 near here] [Table 1 near here]

Pseudo-exhaustive two-pattern testing provides 100% fault coverage and does not require fault simulation. In a pseudo-exhaustive delay test, a VLSI system is partitioned into smaller segments and each segment is tested exhaustively. This works because each output of the circuit under test (CUT) depends on only a subset of the primary inputs. For an n-input CUT with m outputs and cone size k, pseudo-exhaustive testing applies an exhaustive test to each of the m output cones (Dasgupta, Chattopadhyay, Chaudhuri, and Sengupta 2001). For a CUT with n = 7, exhaustive two-pattern testing requires 16,384 test patterns, whereas (7, 2) pseudo-exhaustive two-pattern testing requires only 13. The shorter test vector length reduces test time and test power while maintaining optimal fault coverage. The test length depends on the cone size, regardless of the size of the VLSI network. The reduction in test length achieved by the pseudo-exhaustive approach is given in Table 1. In the present work, the improved primary input fan-out (I-PIFAN) partitioning algorithm is used to insert appropriate design-for-testability (DFT) cells into the PE. The DFT cells are required to apply the pseudo-exhaustive two-patterns generated by the BIST circuitry. Depending on the application, a circuit or PE is partitioned for different cone sizes and fan-out values (Shaer, Al-Arian, and Landis 2000; Kumar, Bhaskar, Chattopadhyay, and Mandal 2009; Shaer and Dib 2002). The I-PIFAN algorithm bases its partitioning on the primary input cones and the fan-out of the given circuit, and the partitioning is chosen to minimize the hardware overhead due to DFT. [Figure 2 near here] [Table 2 near here]
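The pattern counts quoted above can be reproduced for a single k-input cone. The Python sketch below is not the accumulator-based generator used in this work; it is a generic construction, assumed here for illustration, that walks an Eulerian circuit of the complete directed graph on the 2^k patterns so that every ordered pair {T1, T2} with T1 ≠ T2 appears as two consecutive vectors. Its length, 2^k(2^k − 1) + 1, gives 13 for k = 2 and 57 for k = 3. Building one sequence that serves every k-subset of n > k inputs at that length is not shown. Exhaustive two-pattern counts are taken as 2^(2n), which matches the 16,384 quoted for n = 7.

```python
def two_pattern_sequence(k):
    """Return a list of k-bit patterns (as ints) in which every ordered pair
    (T1, T2) with T1 != T2 occurs as two consecutive vectors.

    Uses Hierholzer's algorithm to find an Eulerian circuit of the complete
    directed graph on the 2**k patterns (no self-loops), so the sequence has
    2**k * (2**k - 1) + 1 vectors.
    """
    n = 2 ** k
    unused = {v: [u for u in range(n) if u != v] for v in range(n)}
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        if unused[v]:
            stack.append(unused[v].pop())   # follow an unused edge v -> u
        else:
            circuit.append(stack.pop())     # dead end: add vertex to the circuit
    circuit.reverse()
    return circuit


def covers_all_transitions(seq, k):
    """Check that every ordered pair of distinct k-bit patterns appears consecutively."""
    n = 2 ** k
    seen = set(zip(seq, seq[1:]))
    return all((a, b) in seen for a in range(n) for b in range(n) if a != b)


def exhaustive_pairs(n):
    """All ordered vector pairs for an n-input CUT, counted as 2**(2n) as in the text."""
    return 2 ** (2 * n)


for k in (2, 3):
    seq = two_pattern_sequence(k)
    print(k, len(seq), covers_all_transitions(seq, k))   # 2 13 True, then 3 57 True

print(exhaustive_pairs(7), exhaustive_pairs(16))          # 16384 4294967296
```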

In FPGAs, PDR lets the designer update or reconfigure only a specific part of the internal structure at run time. The FPGA thus remains operational without compromising the integrity of the applications running on its other parts. A partially reconfigurable module (PRM) is a part of the design that can be implemented in a partially reconfigurable region (PRR) (Cancare, Bhandari, Bartolini, Carminati, and Santambrogio 2011; Upegui and Sanchez 2005). Using modular PDR, the configuration bitstreams of the BIST circuitry and the FFT modules are loaded according to the run-time needs of the application, as shown in Figure 2, which enables on-line testing. Because the BIST architecture forms the crux of the design, the BIST circuitry is also provided with an on-line checker. A duplex architecture with checkers and PDR capability (see Figure 1) is designed to increase the availability and reliability of the FFT system. The architecture consists of three PRMs. A microprocessor or IP core could be used to read the partial bitfile from flash memory and send the data to the internal configuration access port (ICAP); in this work, however, a dedicated partial reconfiguration controller (PRC) is designed to load the bitfiles in order to reduce resource utilization. The on-line checkers detect errors in the PRMs and isolate them, and the error signals from all three PRMs are inputs to the PRC. If an error is detected in the FFT module, hardware reconfiguration is triggered. If the error signal from the FFT module persists for more than one clock cycle (indicating that hardware reconfiguration has failed), it is sent to the PRC, followed by the identification number of the failing PRM. The PRC then loads the corresponding bitstream from flash memory for error recovery. Since the PRC is an important part of the FT design, it also requires a fault-tolerant implementation; this is not addressed in the present work.
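The escalation policy of the PRC described above can be sketched as a small cycle-level model. This Python version is hypothetical and is not the PRC's actual logic: the PRM identifiers `FFT1`, `FFT2`, and `BIST` are invented labels for the three PRMs, and the model assumes that a PRM without spare modules (here the BIST) goes straight to bitstream reloading.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPolicy:
    """Cycle-level sketch of the PRC escalation policy (hypothetical model, not the paper's RTL)."""
    has_spares: set                       # PRMs whose errors can first be fixed by spare switching
    persist_cycles: int = 1               # errors lasting longer than this escalate to PDR
    counts: dict = field(default_factory=dict)

    def step(self, error_flags):
        """Consume one clock cycle of checker outputs {prm_id: bool}; return the actions taken."""
        actions = []
        for prm, err in sorted(error_flags.items()):
            if not err:
                self.counts[prm] = 0
                continue
            self.counts[prm] = self.counts.get(prm, 0) + 1
            if prm in self.has_spares and self.counts[prm] <= self.persist_cycles:
                # Phase 1: fast repair by switching multiplexers to a spare module.
                actions.append(("switch_to_spare", prm))
            else:
                # Phase 2: the error persisted (or no spares exist), so reload the PRM's
                # partial bitstream from flash; restart the count for the next episode.
                actions.append(("reload_bitstream", prm))
                self.counts[prm] = 0
        return actions


prc = RecoveryPolicy(has_spares={"FFT1", "FFT2"})
print(prc.step({"BIST": False, "FFT1": True, "FFT2": False}))  # [('switch_to_spare', 'FFT1')]
print(prc.step({"BIST": False, "FFT1": True, "FFT2": False}))  # [('reload_bitstream', 'FFT1')]
print(prc.step({"BIST": True, "FFT1": False, "FFT2": False}))  # [('reload_bitstream', 'BIST')]
```

Keeping the policy as a pure function of per-cycle error flags makes it easy to check against the two-phase behaviour before committing to hardware.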

Testable design of array based FFT processor

The most popular delay fault models are the path delay fault model and the transition delay fault model. Both require a two-pattern test: a pair of vectors {T1, T2}, where T1 initializes the target node or path and T2 launches the appropriate transition and propagates it to an observable point (Voyiatzis, Paschalis, Nikolos, and Halatsis 1999; Voyiatzis, Haniotakis, and Halatsis 2006). As the generated patterns in Table 2 show, test patterns are generated for different cone sizes, which varies the degree of test parallelism. In the current approach, the same BIST circuitry can test different modules with different input cone capacities. Using modular PDR, different modules (here, the designed BIST and FFT modules) can be loaded into the PRM at run time according to the application, which makes this an adaptive fault-tolerant design. The BIST circuitry is loaded into the PRM only when the application demands it. This enables on-line testing and reduces the area and power consumed by the BIST, as well as the test cost. [Figure 3 near here]
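Why a transition fault needs a vector pair rather than a single vector can be checked exhaustively on one gate. This Python sketch assumes a 2-input AND gate whose output has a gross slow-to-rise fault, so a rising edge always misses the capture clock; the function names are illustrative.

```python
from itertools import product


def and_gate(v):
    """Fault-free 2-input AND gate."""
    return v[0] & v[1]


def captured_slow_to_rise(prev_out, new_out):
    """Value captured at the rated clock edge when the output is slow to rise:
    a 0 -> 1 transition arrives too late, so the stale 0 is captured."""
    return 0 if (prev_out, new_out) == (0, 1) else new_out


def detects(t1, t2):
    """True if applying T1 then T2 exposes the slow-to-rise fault at the output."""
    good = and_gate(t2)
    faulty = captured_slow_to_rise(and_gate(t1), and_gate(t2))
    return good != faulty


vectors = list(product((0, 1), repeat=2))
pairs = [(t1, t2) for t1 in vectors for t2 in vectors if detects(t1, t2)]

print(pairs)                                  # every detecting pair ends with T2 = (1, 1)
print(any(detects(v, v) for v in vectors))    # False: a single repeated vector never detects it
```

T1 must set the output to 0 (initialization) and T2 must drive it to 1 (launch), exactly the roles described for {T1, T2} above.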

The proposed fault-tolerant FFT architecture is based on the radix-2 operation. An N-point FFT built from two-point butterfly modules consists of log₂ N stages, each containing N/2 two-point butterfly modules. As shown in Figure 2, a 2D array of regularly connected multiply-subtract-add (MSA) modules is formed for the FFT calculation (Lu, Shih, and Huang 2005). All bit-level cells have similar complexity and are provided with similar bypass capability for hardware reconfiguration. During testing, identical two-patterns are applied to all MSA blocks. Because the multipliers then receive identical inputs, they produce identical outputs, and the subtractor output is constantly zero. As a result, a stuck-at-zero fault at the subtractor output cannot be detected. Therefore, as a DFT modification, XOR gates are used to convert the adder and subtractor blocks into subtract/add blocks, which ensures optimal fault coverage. In addition, on-line checkers in the form of XOR gates (comparators) are added to the FFT array to compare test responses. Redundant modules are provided at each stage for fault recovery. [Figure 4 near here] [Figure 5 near here]
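The XOR-based conversion into add/subtract blocks, and why it matters when all MSAs receive identical patterns, can be illustrated at bit level. This Python sketch assumes an 8-bit ripple-carry unit and models only the final add/subtract step, with both operands equal to the same multiplier product; it is not the MSA netlist. In subtract mode no output bit ever becomes 1, so a stuck-at-0 fault is invisible; in add mode every bit except the least significant one (p + p is always even) toggles.

```python
WIDTH = 8                     # illustrative word length, not the paper's
MASK = (1 << WIDTH) - 1


def add_sub(a, b, subtract, width=WIDTH):
    """Ripple-carry adder with XOR gates on the B operand.

    `subtract` (0 or 1) drives every XOR gate and the carry-in, so the unit
    computes a + b when subtract = 0 and a + ~b + 1 = a - b (mod 2**width)
    when subtract = 1.
    """
    carry, result = subtract, 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract            # XOR gate: conditional inversion of B
        result |= (ai ^ bi ^ carry) << i          # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))   # full-adder carry out
    return result


def bits_ever_one(subtract):
    """Output bits that ever become 1 when both operands are the same product p."""
    seen = set()
    for p in range(1 << WIDTH):
        r = add_sub(p, p, subtract)
        seen |= {i for i in range(WIDTH) if (r >> i) & 1}
    return seen


print(sorted(bits_ever_one(1)))   # []: p - p = 0, so stuck-at-0 is never observable
print(sorted(bits_ever_one(0)))   # [1, 2, 3, 4, 5, 6, 7]: p + p toggles all bits but the LSB
```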

Since identical pseudo-exhaustive two-patterns are applied to all MSA blocks, the outputs of all fault-free MSAs are equal. Hence, instead of traditional test response compaction, a fault is detected and located at the instant an MSA produces a result that differs from those of the other MSAs. On-line checkers in the form of XOR gates compare the test responses. This method therefore enables parallel fault detection and location, and it reduces the hardware complexity associated with test response compaction. Once the faulty modules are detected, the necessary bypass/selection signals for the multiplexers and switches in the selection logic are generated to replace each faulty module with a redundant module, as shown in Figure 3. [Table 3 near here]
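Comparison-based detection and location can be sketched as follows. The Python code assumes that, for each pattern, most MSAs are fault-free, so the most frequent response can act as the reference that the XOR checkers compare against; the exact checker wiring of Figure 2 is not reproduced, and the response values are made up.

```python
from collections import Counter


def locate_faulty(responses):
    """Return indices of MSAs whose response to a common test pattern differs
    from the most frequent response (assumed to be the fault-free value)."""
    reference, _ = Counter(responses).most_common(1)[0]
    # XOR comparator: a non-zero result means the response differs from the reference.
    return [i for i, r in enumerate(responses) if r ^ reference]


def locate_over_test(rows):
    """Accumulate located modules over a whole two-pattern test (one row per pattern)."""
    faulty = set()
    for row in rows:
        faulty.update(locate_faulty(row))
    return sorted(faulty)


print(locate_faulty([0x5A, 0x5A, 0x58, 0x5A]))                       # [2]
print(locate_over_test([[3, 3, 3, 3], [7, 7, 7, 6], [1, 0, 1, 1]]))  # [1, 3]
```

Because every module is checked against the same reference in the same cycle, several faulty modules can be located at once, without compacting responses into a signature first.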

Results and discussion

The FT architecture is modelled in the Verilog hardware description language (HDL) using Xilinx 12.1i, and the fault-tolerant circuits are simulated using ModelSim. The simulation results are displayed in Figure 4, Figure 5 and Figure 6, and the function of the PRC is verified. Analysis of the simulation results shows that the designed system detects, locates, and recovers from faults in parallel and automatically. Table 3 summarizes the synthesis results for the system when realized on a Virtex-7 FPGA (device XC7V285TL, speed grade -1L, package FFG1157). Table 4 shows that the static power, hardware, and critical path delay overheads incurred by the DFT cells in the proposed architecture are negligible. Dynamic power and area analysis of the entire test strategy is done with the Cadence Encounter tool (0.18 µm technology). The total power (static, dynamic, and leakage) of the system during the entire test strategy is given in Table 5. [Figure 6 near here] [Figure 7 near here] [Figure 8 near here]

Hardware reconfiguration for fault recovery is based on a simple response checking algorithm that generates the control signals needed for built-in self-replacement of the faulty module. Including a redundant MSA row at each stage of the FFT processor (see Figure 3) provides module-level fault tolerance, with a hardware overhead of approximately 1/2N. An alternative is to provide a redundant MSA module in each butterfly, but this results in less efficient resource utilization (Li, Lu, Hwang, and Wu 2000). The BIST test patterns can also detect errors in the on-line checkers (comparators): if a checker fails, an error signal is likewise generated. The approach can also identify faults in the switches, multiplexers, and routing resources. The fault recovery time of hardware reconfiguration is negligible because the switching time is minimal. An error that persists even after hardware reconfiguration can be due to an SEU or to faults in the FPGA fabric. The simulation results in Figure 7 show that multiple module faults (due to SEUs, stuck-at faults, and delay faults) can be detected and located at the same time instant, and that error recovery by hardware reconfiguration is performed simultaneously. Figure 8 displays the pseudo-exhaustive two-patterns generated by the BIST circuitry. [Table 4 near here] [Table 5 near here] [Table 6 near here]
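One common way to derive the bypass/selection bits for a row with a single spare is a shift-style mapping, in which every slot at or to the right of the faulty module takes its right-hand neighbour. The Python sketch below assumes that scheme, with the spare at the end of the row; it is an illustration, not the switch arrangement of Figure 3, and it tolerates one faulty module per row.

```python
def select_signals(num_slots, faulty=None):
    """Mux select bits for one stage with `num_slots` working MSAs plus one spare
    (physical index num_slots). sel[i] = 0: slot i uses module i;
    sel[i] = 1: slot i shifts to module i + 1, bypassing the faulty module."""
    if faulty is None:
        return [0] * num_slots
    if not 0 <= faulty <= num_slots:
        raise ValueError("faulty index out of range")
    return [1 if i >= faulty else 0 for i in range(num_slots)]


def physical_modules(sel):
    """Physical module used by each logical slot for a given set of select bits."""
    return [i + s for i, s in enumerate(sel)]


sel = select_signals(4, faulty=1)
print(sel)                    # [0, 1, 1, 1]
print(physical_modules(sel))  # [0, 2, 3, 4]  (module 1 is bypassed, spare 4 is used)
```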

An advantage of the testable FFT design is that optimal fault coverage can be achieved by applying 57 test patterns if the cone size of the partitioned circuit is 3, or 13 patterns if the cone size is 2. For the experiments, the word length is assumed to be three. The test length depends on the cone size and is independent of the FFT network size. For processors with a larger word length (for example, w = 16), (16, 3) pseudo-exhaustive two-patterns are required, so the test vector length is only 57, compared with 2^32 exhaustive two-patterns. Table 6 compares the present approach with previous works. The reliability of the module-level fault-tolerant FFT array (excluding on-line checkers) in the PRM is calculated using the binomial distribution:

$$R_{\mathrm{MSA}} = \left(e^{-2w\lambda t}\right)^{w+2} \qquad (1)$$

where λ is the failure rate, w is the word length, and t is time.

(2)

(3)
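Equation (1) can be evaluated directly. The Python sketch below uses the three-bit word length of the experiments; the failure rate `LAMBDA` is a hypothetical placeholder, not a value from this work.

```python
import math


def r_msa(failure_rate, w, t):
    """Eq. (1): R = (exp(-2 * w * lambda * t)) ** (w + 2)."""
    return math.exp(-2 * w * failure_rate * t) ** (w + 2)


LAMBDA = 1e-6        # hypothetical per-hour failure rate, for illustration only
for hours in (0, 1_000, 10_000, 100_000):
    print(hours, round(r_msa(LAMBDA, w=3, t=hours), 4))
```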

From the plots in Figure 9 and Figure 10, it is observed that the FFT array provides better reliability. The reliability of the FFT array can be increased further with additional redundant modules, at the expense of hardware overhead. [Figure 9 near here] [Figure 10 near here]

The Markov reliability model of the proposed architecture is shown in Figure 11. Directed edges between states are labelled with the failure rate (λ) and the repair rate (μ). The initial state represents the system when all functional units (FUs) are fault-free. When a fault occurs in one FU, the system moves to the first fault state with failure rate 2λ, and the repair process is initiated. If a fault occurs in another FU, the system moves from the first to the second fault state, which represents faults in two FUs; again, the repair process is initiated. The last state represents the non-operational system. [Figure 11 near here] [Figure 12 near here] [Table 7 near here] [Table 8 near here]

$$A_s = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

Availability (A_s) and mean time between failures (MTBF) are calculated for the Virtex-5 XC5VLX50T from the Markov model of the system. The repair time of the proposed system is shorter (that is, its repair rate is higher) than that of the standard duplex system with on-line checkers (duplex2ch), because fault recovery is divided into two phases. The first recovery phase uses multiplexers and switches for hardware reconfiguration, and its recovery time is negligible. If this phase fails, recovery is done by loading the corresponding bitfiles of the faulty PRM. In this second phase, the repair rate depends on the time needed to reconfigure the FPGA, which in turn depends on the Virtex-5 XC5VLX50T clock frequency and the configuration memory size needed for each architecture. The repair rate parameter is set to 0.05 h⁻¹ (Straka and Kotasek 2009). Because of the two-phase recovery process, this study assumes a 30% increase in this repair rate (assuming two redundant MSA modules in each stage). The failure rate of the Virtex-5 XC5VLX50T FPGA is set to 1.23 × 10⁻³ h⁻¹ for space applications and 1.85 × 10⁻⁴ h⁻¹ for surface applications (Device Reliability Report 2009). The reliability parameters of our architecture are compared with those of standard pre-existing architectures (Straka and Kotasek 2009). The improvement of the reliability parameters of the proposed system over duplex2ch and triple modular redundancy with three on-line checkers (tmr3ch) can be observed from the results in Table 7 and Table 8. The improvement in reliability R(t) of the proposed architecture over the pre-existing architectures with respect to the mean time between failures (MTBF), 1/μ, is also shown in Figure 12, Figure 13 and Figure 14. [Figure 13 near here] [Figure 14 near here]
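The availability calculation can be illustrated with a simplified Markov chain. The Python sketch below assumes a repairable duplex with two transient states (both units good; one unit failed and under repair) and an absorbing failure state, which is smaller than the chain of Figure 11. It takes MTTF as a proxy for MTBF and MTTR = 1/μ, and plugs in the failure and repair rates quoted above; the printed values illustrate the method only and are not the results of Tables 7 and 8.

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_time_to_failure(q_transient):
    """Mean time to absorption from each transient state: solve (-Q_T) T = 1."""
    neg_q = [[-x for x in row] for row in q_transient]
    return solve(neg_q, [1.0] * len(q_transient))


def duplex_mttf(lam, mu):
    """Simplified repairable duplex: state 0 = both units good, state 1 = one failed
    and under repair; a second failure in state 1 leads to system failure."""
    q = [[-2 * lam, 2 * lam],
         [mu, -(lam + mu)]]
    return mean_time_to_failure(q)[0]


def availability(mtbf, mttr):
    """A_s = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


for label, lam in (("space", 1.23e-3), ("surface", 1.85e-4)):
    for mu in (0.05, 0.05 * 1.3):             # baseline and +30 % repair rate
        mtbf = duplex_mttf(lam, mu)           # MTTF used as an MTBF proxy (assumption)
        print(label, round(mu, 3), round(mtbf), round(availability(mtbf, 1 / mu), 5))
```

For this two-state chain the solver agrees with the closed form MTTF = (3λ + μ)/(2λ²), which is a convenient check before extending the generator matrix to the larger chain of Figure 11.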

Conclusions

This work explores the advantages of combining on-line BIST and PDR in the design of FPGA-based FT systems. The study concludes that the proposed design can improve the reliability and availability of the FT system while incurring less performance overhead. The review of previous research indicates that the system is well suited to FPGA implementations of real-time, adaptive, and mission-critical fault-tolerant systems. Future work will analyse the percentage reduction in area and power due to the PDR approach and develop a fault-tolerant implementation of the PRC.
