Fault Diagnosis In Mobile Robots

Published Date: 02 Nov 2017

A survey of fault diagnosis techniques in mobile robots

Abstract: Fault diagnosis is becoming increasingly important for mobile robots as they are deployed autonomously in remote or hazardous environments. A lot of research has been carried out in this area over the past couple of decades. NASA has been focusing on research involving self-diagnosing and self-repairing systems, which would be essential in self-sustained, long-duration human operations. Mobile robots are also increasingly used as first-response agents in hazardous situations. This paper surveys the existing solutions for fault diagnosis in mobile robots, highlights their drawbacks, and lists the open research questions in the field.

Index Terms: fault diagnosis, mobile robots

Introduction

Mobile robots are increasingly used in areas such as transportation, planetary exploration, search and rescue, mine mapping, demining, and nuclear waste cleanup. Recently, mobile robots were used at the site of the Fukushima Daiichi nuclear disaster, where conditions were too hazardous for human personnel. Mobile robots are also used in search and rescue missions after natural disasters such as earthquakes, and they have important applications in planetary exploration. In all of these applications, the main concern is the reliability of the robots. Fault detection and identification (FDI) is therefore a very important problem in the development of reliable, robust mobile robots.

In recent years, there has been intensive research in the area of fault diagnosis. This survey presents a number of fault diagnosis techniques for detecting faults on board operating robots, where we define a fault as a deviation from expected behavior. In one study (Carlson and Murphy 2003), the reliability of seven mobile robots from three different manufacturers was tracked over a period of two years, and the average mean time between failures was found to be 8 hours. This result suggests that faults in mobile robots are quite frequent. One of the main reasons is that components degrade over time; another is that the operators of a robot rarely have complete knowledge of the environment in which it operates and, hence, may not have accounted for certain situations.

As mentioned above, mobile robots are increasingly being used in critical domains where an unaddressed fault could cause severe problems. It is therefore essential for robots to monitor their behavior so that faults can be addressed before they result in catastrophic failure. This monitoring also needs to be efficient, as there is limited computational power available on robots. It is equally important for these robots to detect faults in a timely manner, since failure to do so may have expensive consequences. If faults go undetected, autonomous robots in real-world environments may behave in an unpredictable or dangerous manner. A robot malfunction may also initiate larger accidents, especially in critical nuclear or hazardous waste operations.

Technically, a fault is an unexpected change in system function which hampers or disturbs normal operation, causing unacceptable deterioration in performance (Isermann and Ballé 1997). A fault-tolerant system is capable of continued operation, possibly at a reduced performance, in the event of faults in some of its parts. Fault tolerance is extremely important in robots used for planetary exploration such as the Mars Rover. If the robot can be designed such that it can operate even when a fault occurs, then the mission life can be extended well beyond its designed life.

This survey studies fault detection in autonomous robots. Fault diagnosis and prognosis methods can be broadly classified into three general categories: data-driven methods, knowledge-based methods, and model-based methods (the analytical approach). We cover each of these methods in the next section. In the section on open research problems, we then discuss the open questions in this field and list some of the recent developments in this area.

Background literature

Fault detection is based on monitoring the execution of the robot, as illustrated in the paper by Pettersson (2005). Execution monitoring is needed in robotics to handle problems caused by uncertainties, both in the robot itself and in the environment. There are four main sources of uncertainty in robotics:

Missing information

Unreliable resources

Stochastic phenomena

Inherently vague concepts

Information is missing because the world is not totally observable; for example, the robot does not know what is behind a door. Examples of unreliable resources are a broken driving shaft or a map of an environment that is out of date. Stochastic phenomena show up as variance in the value of a physical quantity; for example, whenever we measure distance with a sonar sensor, the reading contains a stochastic noise component. Inherently vague concepts come up when the world state or human knowledge is modeled. For example, "doors are most likely open" and "large red door" are inherently vague concepts.

Sources of uncertainty are also present at several levels of abstraction in a robotic system. Ideally, we would eliminate the uncertainty in the system. However, that is not always feasible, as it might require expensive sensors that increase the cost of the system. The alternatives are to reason about the uncertainty or to tolerate it.

According to Gertler (1991), a monitoring system can be divided into the following three functionalities:

Fault Detection

Fault Isolation

Fault Identification

While fault detection is an absolute must in any practical system, and isolation is almost equally important, fault identification may not always justify the extra computational effort (Gertler 1998). Therefore, most practical systems contain only the fault detection and isolation stages.

According to Chiang et al. (2001), fault detection methods can be classified into one or more of three approaches: analytical, data-driven, and knowledge-based. An overview of the existing fault detection systems applied to robotics is given in the following sections.

Analytical methods

The analytical approach is model-based, since mathematical models are used. These models are usually physical models of the robot system. The approach relies on the concept of analytical redundancy: two analytically generated quantities, obtained from different sets of variables, are compared. The resulting difference, called the residual, indicates the presence of a fault in the system. Fig. 1 illustrates the conceptual structure of such an analytical approach.

Figure 1: Analytical approach to fault detection

As we can see, there are two main stages in this approach: first the residuals are generated, and then the decision-making block analyzes whether a fault is present in the system. The residual signal is given as $r(s) = H_u u(s) + H_y y(s)$, where $u(s)$ and $y(s)$ are the system input and output, and $H_u$ and $H_y$ must be chosen such that $r(s) = 0$ when no fault occurs and $r(s) \neq 0$ when a fault occurs.

After this, the residual is examined for the likelihood of faults, and a decision rule is applied to determine whether a fault has occurred. This decision process can be based on, for example, a simple threshold test on the instantaneous values or moving averages of the residual, or it may involve methods from statistical decision theory. In the literature, there are three different approaches to residual generation: parameter estimation, parity relations, and observers.
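As a minimal sketch of the decision stage, the following Python snippet applies a threshold test to a moving average of the residual magnitude. The function name `detect_fault`, the window length, the threshold, and the residual values are all illustrative assumptions and are not taken from the surveyed work.

```python
from collections import deque

def detect_fault(residuals, window=5, threshold=0.5):
    """Return the index of the first sample at which the moving average
    of |residual| exceeds the threshold, or None if it never does."""
    recent = deque(maxlen=window)  # holds the last `window` magnitudes
    for k, r in enumerate(residuals):
        recent.append(abs(r))
        # Decide only once the window is full, so one spike is not enough.
        if len(recent) == window and sum(recent) / window > threshold:
            return k
    return None

# Hypothetical residual sequences: small noise, then a persistent offset.
healthy = [0.05, -0.02, 0.04, -0.03, 0.01, 0.02]
faulty = healthy + [0.9, 1.1, 1.0, 0.95, 1.05]

print(detect_fault(healthy))
print(detect_fault(faulty))
```

Averaging over a window trades reaction speed for robustness: a longer window suppresses isolated noise spikes but delays the alarm after a real fault.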

Parameter estimation

In this approach, a reference model is created by first identifying the system's physical parameters in a fault-free situation. Such physical parameters could be friction and mass-velocity resistance. The residuals are calculated as the difference between the reference model parameters and the estimated model parameters. This approach has been used for fault detection in robot arms (Isermann 1990), where the residual is calculated from filtered torque signals. The main advantage of this method is that no acceleration measurement is required.
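The idea can be sketched with a deliberately simple, assumed model in which the motor torque is proportional to the velocity through a single friction coefficient. The model, the reference value `F_REF`, and the data are hypothetical; a real system would identify several parameters from filtered signals.

```python
def estimate_friction(velocities, torques):
    """Least-squares estimate of f in the assumed model torque = f * velocity."""
    num = sum(w * t for w, t in zip(velocities, torques))
    den = sum(w * w for w in velocities)
    return num / den

def parameter_residual(f_reference, velocities, torques):
    """Residual = currently estimated parameter minus fault-free reference."""
    return estimate_friction(velocities, torques) - f_reference

F_REF = 0.20                              # identified in a fault-free run (assumed)
velocities = [1.0, 2.0, 3.0, 4.0]
nominal_torques = [0.2, 0.4, 0.6, 0.8]    # consistent with F_REF
worn_torques = [0.35, 0.7, 1.05, 1.4]     # increased friction, e.g. a worn bearing

print(parameter_residual(F_REF, velocities, nominal_torques))
print(parameter_residual(F_REF, velocities, worn_torques))
```

A residual near zero indicates that the estimated parameter still matches the reference model; a clear offset points to a change in the physical parameter itself.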

Parity relations

In this method, mathematical equations referred to as parity equations are compared. The system's model parameters need to be known a priori. The residual is then generated by a consistency check between the reference model equation and the system equation generated from measurements. If the equations are not consistent, then a fault is indicated.
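As a hedged illustration of a consistency check, consider a differential-drive robot whose yaw rate can be obtained both from a gyro and from the wheel speeds. The kinematic relation is standard, but the track width, the tolerance, and the function names below are assumptions made for this sketch.

```python
def parity_residual(v_left, v_right, gyro_yaw_rate, track_width=0.4):
    """Residual of the parity equation
    gyro_yaw_rate - (v_right - v_left) / track_width,
    which is zero when the gyro and the encoders agree."""
    return gyro_yaw_rate - (v_right - v_left) / track_width

def is_consistent(residual, tolerance=0.05):
    """True when the residual is small enough to be explained by noise."""
    return abs(residual) <= tolerance

# Encoders imply a yaw rate of (0.7 - 0.5) / 0.4 = 0.5 rad/s.
ok = parity_residual(0.5, 0.7, gyro_yaw_rate=0.5)
stuck_gyro = parity_residual(0.5, 0.7, gyro_yaw_rate=0.0)

print(is_consistent(ok), is_consistent(stuck_gyro))
```

A single parity equation only shows that the gyro and the encoders disagree; telling which of them is faulty needs additional relations.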

Observers

Most fault detection systems used in robotics use this method. Again, the system's model parameters must be known a priori. The basic idea is to estimate the system outputs from the available inputs and outputs of the system. The difference between the measured output and the estimated output is then used as a vector of residuals. One very commonly used state observer is the Kalman filter, which is also widely used for sensor fusion. It is an optimal estimator for linear systems when both the sensor model and the system model are corrupted by Gaussian noise, which is why Kalman filters are used in fault detection systems. A single Kalman filter is often sufficient for sensor fusion or fault detection. To isolate faults, a bank of Kalman filters (Kalman filters in parallel) must be used. For example, in the papers by Sukhatme et al. (1998 and 2000), a bank of three Kalman filters was used to detect one of the following faults: a noisy gyro, a broken gyro (stuck at a fixed value), and a broken wheel encoder. In the later paper, the performance of this approach was improved by sending the residuals from the bank of Kalman filters to a neural network that performs the decision making.

Shakey, the first autonomous mobile robot, also had a system for fault detection, which was used to identify faulty plans. Shakey managed to accomplish tasks such as pushing boxes from one room to another in a physical, but highly structured, environment. PLANEX was the system onboard Shakey that executed and monitored the plans produced by the STRIPS planner (Fikes 1971). The plan given from STRIPS was stored in a tabular form called the triangle table, as shown in Figure 2.

Figure 2: An example of the triangle table

In this table, the rows correspond to the actions of the plan. The preconditions are given in the cells to the left of the action, and the expected outcome of each action is given in the cell below the action. Fault detection is realized by comparing the measured state to the expected outcome given from the triangle table. An advantage of the triangle table is that not all world states have to be modeled since the plan can be generalized. For example, the action gothru(d1, A, B) is a general model for crossing any door between two rooms. The main drawback of this method is the lack of robustness, since full observability of the robot's state is assumed.
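To make the observer idea from this section concrete, the sketch below runs a scalar Kalman filter and uses the normalized squared innovation (measured output minus predicted output) as the residual. The random-walk state model, the noise variances `q` and `r`, the gate of 9.0 (roughly a three-sigma test for a scalar Gaussian innovation), and the data are assumptions for illustration; this is not the filter bank used in the cited papers.

```python
def kalman_residuals(measurements, q=1e-4, r=0.01, x0=0.0, p0=1.0):
    """Run a scalar Kalman filter (random-walk state model) and return
    the normalized squared innovation for every measurement."""
    x, p = x0, p0
    scores = []
    for z in measurements:
        # Predict: the state is assumed constant apart from process noise q.
        p_pred = p + q
        # Innovation (residual) and its variance.
        nu = z - x
        s = p_pred + r
        scores.append(nu * nu / s)
        # Update the estimate with the Kalman gain.
        k = p_pred / s
        x = x + k * nu
        p = (1.0 - k) * p_pred
    return scores

def first_fault(scores, gate=9.0):
    """Index of the first score above the gate, or None."""
    for k, d in enumerate(scores):
        if d > gate:
            return k
    return None

# Hypothetical sensor readings: steady near 1.0, then a jump to 2.0.
healthy = [1.0, 1.02, 0.98, 1.01, 0.99, 1.0]
faulty = healthy + [2.0, 2.0]

print(first_fault(kalman_residuals(healthy)))
print(first_fault(kalman_residuals(faulty)))
```

A bank of such filters, each built on a different fault hypothesis, would extend this from detection to isolation: the filter whose residual stays small identifies the most plausible hypothesis.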

Data-driven approaches

In data-driven approaches, we do not have a mathematical model of the system. Instead, the information used for fault detection and isolation is taken directly from the input data, and the decision making is often based on statistical methods. An autonomous robotic system, such as a swarm, is a large-scale system that produces a large amount of data. The strength of data-driven approaches is their ability to take this data about the system state and convert it into an approximation of the system parameters. By computing statistical measures, the monitoring system can be improved significantly in large-scale systems. The main drawback of this approach is that its performance is highly dependent on the amount and quality of the input data.

The application of statistical theory to execution monitoring relies on the assumption that the characteristics of the data variations are relatively unchanged unless a fault occurs in the system. It implies that the properties of the data variations, such as the mean and variance, are repeatable for the same operating conditions, although the actual values of the data may not be very predictable.

The data-driven approach can be divided into univariate statistical monitoring, where only one variable is measured at a time and multivariate statistical monitoring, where several different variables are measured and combined.

In model-based approaches, the residual signal is often compared to a given limit value. If the measurement exceeds a threshold limit, a fault is indicated. This method is called limit value checking and is the most frequently used method for decision making. However, setting tight threshold limits results in a high false alarm rate and a low missed detection rate, whereas limits that are spread apart give a low false alarm rate and a high missed detection rate. To manage this trade-off, a univariate statistical approach is used to determine the thresholds for the observation variables. The Shewhart chart (Montgomery et al. 1985) (Figure 3) is used to visualize the limits while keeping the number of false alarms and missed detections low.

Figure 3: An example of a Shewhart chart showing the measure of a certain observation variable x(t). A fault is detected when the measure exceeds the upper limit or falls below the lower limit.
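The limits of such a chart are commonly placed a fixed number of standard deviations around the mean of fault-free training data. The sketch below assumes the usual three-sigma choice; the function names and the data are hypothetical.

```python
from statistics import mean, stdev

def shewhart_limits(training, k=3.0):
    """Return (lower, upper) control limits: mean -/+ k standard deviations."""
    mu = mean(training)
    sigma = stdev(training)
    return mu - k * sigma, mu + k * sigma

def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]

# Fault-free readings of one observation variable (assumed values).
training = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
limits = shewhart_limits(training)

samples = [10.1, 9.5, 11.0, 9.0, 10.4]
print(limits)
print(out_of_control(samples, limits))
```

Lowering `k` tightens the limits and raises the false alarm rate; raising it does the opposite, which is the trade-off described above.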

This method was used in developing a robot control architecture called the Thinking Cap (Parsons et al. 2000), which was used to generate plans for an autonomous robot to move through an environment. The control architecture was based on fuzzy logic. A fuzzy composition of several variables is used to detect faulty plans (again based on whether it crosses a threshold). If a fault is detected, planning is repeated.

An example of multivariate statistics is given in the paper by Kawabata et al. (2002), in which the motor current of a mobile robot is monitored. The limit is derived from univariate analysis while the robot works under normal conditions. The paper shows that the fault detection unit is not very reliable when only this one variable is measured, because the motor current increases when the robot moves up a slope even though there is no fault. To overcome this issue, the pitch angle measured by a gyro is also monitored. The motor current limit is then tuned on the basis of the pitch reading, which makes the system more robust.
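One simple way to picture a pitch-dependent limit is shown below. The linear-in-sine form and the constants `flat_limit` and `slope_gain` are assumptions chosen for illustration; they are not the tuning rule from the cited paper.

```python
import math

def current_limit(pitch_rad, flat_limit=2.0, slope_gain=3.0):
    """Upper motor-current limit (A) that grows with the climb angle."""
    # Only uphill pitch raises the limit; downhill keeps the flat-ground limit.
    return flat_limit + slope_gain * max(0.0, math.sin(pitch_rad))

def motor_fault(current, pitch_rad):
    """True when the measured current exceeds the pitch-adjusted limit."""
    return current > current_limit(pitch_rad)

slope = math.radians(15)
print(motor_fault(2.5, 0.0))    # 2.5 A on flat ground
print(motor_fault(2.5, slope))  # the same current while climbing
print(motor_fault(3.5, slope))  # a much higher current while climbing
```

The same current reading is thus judged differently depending on the second variable, which is what removes the false alarm on slopes.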

In the paper by Dearden et al. (2004), a fault detection system based mostly on Kalman filters and observer-based detection is presented. The detection system is intended for a planetary rover. The system is described as a discrete-time hybrid automaton. The main difference here is that, instead of defining the different states in the automaton from first principles, the transition functions between states are derived from statistical measures of several observed variables. Fault isolation is achieved using particle filters. A particle filter is a sequential Monte Carlo algorithm that approximates the belief state using a set of samples (particles) and keeps the distribution updated as new observations are made over time. The faults addressed include actuator faults, such as broken motors or gears; faults due to environmental interactions, such as a wheel stuck against a rock; and sensor faults, such as broken encoders.
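The following toy particle filter illustrates the mechanism on a single discrete fault mode. It is a sketch under strong assumptions: each particle carries only a mode ("ok" or "stuck") rather than a full hybrid state, and the transition probability `P_FAIL`, the Gaussian sensor model, and the observations are invented for the example.

```python
import math
import random

P_FAIL = 0.01  # assumed per-step probability of the transition ok -> stuck

def likelihood(obs, mode, cmd, sigma=0.2):
    """Unnormalized Gaussian likelihood of the measured wheel speed."""
    expected = cmd if mode == "ok" else 0.0
    return math.exp(-0.5 * ((obs - expected) / sigma) ** 2)

def step(particles, obs, cmd, rng):
    # 1. Propagate each particle through the fault-transition model.
    moved = ["stuck" if m == "stuck" or rng.random() < P_FAIL else "ok"
             for m in particles]
    # 2. Weight each hypothesis by how well it explains the observation.
    weights = [likelihood(obs, m, cmd) for m in moved]
    # 3. Resample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

def belief_stuck(particles):
    """Fraction of particles that currently hold the 'stuck' hypothesis."""
    return particles.count("stuck") / len(particles)

rng = random.Random(0)
particles = ["ok"] * 500
# Commanded speed is 1.0; the wheel stops responding after three steps.
for obs in [1.0, 0.95, 1.05, 0.0, 0.05, 0.0]:
    particles = step(particles, obs, cmd=1.0, rng=rng)
print(belief_stuck(particles))
```

Because faults are rare in the transition model, only a few particles explore the fault mode at each step; a known practical issue with this kind of filter is that very unlikely faults may receive no particles at all unless the sample set is large.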

Knowledge-based approach

In this method, the problem-solving behavior of human experts is simulated. The approach can combine the previous two, bringing analytical and data-driven methods together in a hybrid fault detection unit. Knowledge-based approaches can be divided into three categories: causal analysis, expert systems, and artificial neural networks.

3.1) Causal analysis

Causal analysis methods are based on causal models of fault-symptom relationships and are primarily used for fault isolation. Fault isolation using causal models usually involves a signed directed graph (SDG). An SDG is a map that shows the relationships between process variables. The nodes can represent process variables, sensors, system faults, or component faults. When an SDG is used for fault isolation, upper and lower threshold limits for each variable must first be defined. The limit values may be found using the univariate approach discussed previously.

When a measured variable is normal, its node takes the value 0. When the variable is higher than the upper limit or lower than the lower limit, the node takes the value + or −, respectively. Assuming that a single fault affects only a single root node and that the fault does not change other causal relations in the SDG, the causal linkages connect the fault origin to the observed symptoms of the fault, which is what makes fault isolation possible.
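A small sketch of SDG-based isolation under the single-fault assumption is given below. The graph, its node names, and the edge signs are hypothetical; the code propagates a positive deviation from each candidate fault along the signed edges and keeps the faults whose predicted pattern matches the observed node values.

```python
def sign(value, lower, upper):
    """Qualitative node value: +1 above the upper limit, -1 below the lower, else 0."""
    return 1 if value > upper else -1 if value < lower else 0

def propagate(edges, root):
    """Predict node signs when `root` deviates positively (+1)."""
    predicted = {root: 1}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child, edge_sign in edges.get(node, {}).items():
            if child not in predicted:  # first path wins in this sketch
                predicted[child] = predicted[node] * edge_sign
                frontier.append(child)
    return predicted

def isolate(edges, faults, observed):
    """Return the faults whose predicted signs match every observed node."""
    return [f for f in faults
            if all(propagate(edges, f).get(n, 0) == s
                   for n, s in observed.items())]

# Hypothetical SDG: edge sign +1 means "same direction", -1 means "opposite".
edges = {
    "motor_friction": {"current": 1, "wheel_speed": -1},
    "encoder_fault": {"wheel_speed": -1},
    "battery_low": {"voltage": -1},
    "voltage": {"wheel_speed": 1},
}
faults = ["motor_friction", "encoder_fault", "battery_low"]

observed = {"voltage": sign(10.1, 11.0, 13.0),   # below the lower limit
            "current": sign(1.0, 0.5, 2.0),      # normal
            "wheel_speed": sign(0.2, 0.8, 1.2)}  # below the lower limit
print(isolate(edges, faults, observed))
```

When several faults predict the same pattern, the function returns all of them, which reflects the fact that isolation is only as sharp as the set of measured nodes allows.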

In the work by Kaminka et al. (2002), the fault detection system Overseer monitors multi-agent systems, that is, several robots in parallel. In Overseer, the agents are modeled as a signed directed graph. This graph is a map showing the relationship between the beliefs and the status of the different agents, and it also reflects the behavior of each agent as well as the full team hierarchy. Each node in the graph corresponds to the known states of all the agents. According to the authors, state information can be obtained through:

• Team members' communication (report-based monitoring);

• Observation (overhearing) of team members.

As there are many problems involved with communication between agents, the authors focused on observation of team members. The key idea is the use of models of the social relationships among agents, rather than goal-oriented models of the tasks. For example, a driver may not see a road sign that says to turn, and therefore keeps going straight ahead. But if the driver is in a convoy where everybody shares the same destination, the driver can infer the existence of the road sign by observing what the other vehicles do.

Another novel approach is presented in the paper by Hamilton et al. (2001), which describes a heterogeneous knowledge-based approach named Recovery. Different types of knowledge bases are linked together using partitioned semantic networks. The knowledge is classified into five types: design knowledge, sensor knowledge, historical knowledge, mission knowledge, and fault knowledge. The main advantage of this approach is that a priori fault isolation knowledge gathered from human experts can be combined with available sensor knowledge. Recovery has been tested on a real autonomous underwater vehicle (AUV); in the experiments, a broken lift thruster was isolated.

3.2) Expert Systems

Expert systems are used in the area of fault isolation, where they imitate the reasoning of human experts. The knowledge in an expert system is often formulated in terms of IF-THEN rules of the form

IF condition THEN conclusion,

which can be derived from first principles or from a structural description of the system. Expert systems based on logic whose terms are either true or false are relatively sensitive to uncertainties. One way to overcome this problem is to use fuzzy logic.

Fuzzy logic provides an approximate, but still effective, means of describing complex systems by using graded statements instead of strictly true or false ones. The monitoring system within the behavior-based Saphira architecture (Lamine et al. 2000) is designed as an expert system implemented in linear temporal logic (LTL). In this approach, a set of temporal fuzzy rules is used for fault detection and isolation.
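A single fuzzy IF-THEN rule can be sketched as follows. The rule ("IF current is high AND speed is low THEN the wheel is blocked"), the membership breakpoints, and the use of the minimum as fuzzy AND are illustrative assumptions; the sketch is static and does not model the temporal operators of the LTL-based system mentioned above.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def wheel_blocked_degree(current, speed):
    """Degree in [0, 1] to which the rule's conclusion holds."""
    current_is_high = ramp_up(current, 1.0, 3.0)
    speed_is_low = 1.0 - ramp_up(speed, 0.0, 0.5)
    # Fuzzy AND is taken as the minimum of the two memberships.
    return min(current_is_high, speed_is_low)

print(wheel_blocked_degree(3.0, 0.0))   # high current, no motion
print(wheel_blocked_degree(2.0, 0.25))  # borderline case
print(wheel_blocked_degree(0.5, 0.0))   # low current, robot simply idle
```

Unlike a crisp rule, the conclusion degrades gradually as the measurements move away from the clear-cut case, which makes the rule less sensitive to small measurement uncertainties.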

3.3) Artificial neural networks

Artificial neural networks (ANNs) were inspired by the study of the human brain, which is made up of billions of interconnected neurons. The main reason for using this approach is to represent systems that are too difficult to model mathematically. The main drawback is that performance is highly dependent on the amount and quality of the input data. Several authors have applied ANNs to fault detection and isolation.

As mentioned, the main drawback of using learning approaches for execution monitoring is that the training data must provide adequate coverage of all the fault situations. Collecting such data on a real robot is often not feasible, because some of the fault situations may be catastrophic to the robot or its environment. An interesting technique to tackle this drawback is "learning from simulation" (Pettersson et al. 2005), which lets the fault detection system learn the fault classes safely in simulation and then uses the system to monitor a real robot.
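The workflow can be illustrated with a toy example: a single artificial neuron is trained with the perceptron rule on samples from a stand-in simulator and is then applied to readings that play the role of real-robot data. The simulator, the two features (motor current and speed error), and all numbers are hypothetical, and the single neuron is far simpler than the networks used in the literature.

```python
def simulate(faulty):
    """Hypothetical simulator: returns (motor current, speed error) samples."""
    currents = (2.3, 2.5, 2.7) if faulty else (0.8, 1.0, 1.2)
    errors = (0.7, 0.9) if faulty else (0.0, 0.1)
    return [(c, e) for c in currents for e in errors]

def train_perceptron(samples, labels, max_epochs=1000):
    """Train a single neuron with the perceptron rule; labels are +1 or -1."""
    w = [0.0, 0.0, 0.0]  # bias, current weight, speed-error weight
    for _ in range(max_epochs):
        mistakes = 0
        for (c, e), y in zip(samples, labels):
            x = (1.0, c, e)
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every simulated sample is classified correctly
            break
    return w

def predict_fault(w, current, error):
    """True when the trained neuron classifies the reading as faulty."""
    return w[0] + w[1] * current + w[2] * error > 0

# Learn the fault class safely in simulation...
sim_faulty, sim_healthy = simulate(True), simulate(False)
weights = train_perceptron(sim_faulty + sim_healthy,
                           [1] * len(sim_faulty) + [-1] * len(sim_healthy))

# ...then monitor readings that stand in for the real robot.
print(predict_fault(weights, 2.5, 0.8))
print(predict_fault(weights, 1.0, 0.05))
```

The approach works only to the extent that the simulator reproduces the fault signatures of the real robot, which is the main assumption behind learning from simulation.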

In the next section we will discuss some of the open research problems that are still present in the area of Fault Diagnosis in Mobile Robots and some of the current research being conducted in this area.

Open Research Problems

Fault diagnosis in mobile robots is still a very active area of research. As mentioned earlier, mobile robots are making their way into our day-to-day lives. They are being used in a variety of applications, from first response agents in hazardous environments to being used in planetary exploration.

When we deploy mobile robots in such remote environments, it is imperative that they have some form of fault detection on board so that they can continue working without needing repair. Repair would require human personnel, which would defeat the purpose of using robots in the first place. In this paper, we discussed several approaches to the problem of fault detection. We saw that fault detection monitoring can be categorized into one of three classes: analytical, data-driven, and knowledge-based.

Analytical approaches rely on the concept of analytical redundancy. This is the most frequently used approach for fault detection in robotics, including both robot manipulators and mobile robots. As we saw earlier, analytical monitoring can be divided into parameter estimation, parity relations, and observers, where the last is the most common. Analytical approaches are preferable when the monitored system is well understood and the amount of uncertainty is limited. Many parts of a robotic system are designed with the help of basic models, and this knowledge might as well be used when designing the monitoring system. For example, the analytical approach can be used to detect faults in a wheel motor controller, such as a broken driving shaft or a broken wheel encoder. The main problem with this approach is the assumption that a precise mathematical model of the system is available. Deriving a precise mathematical model of a complex non-linear system such as a mobile robot is not a trivial task. Halder and Sarkar (2007) address this by considering a non-linear analytical redundancy scheme based on the parity relation method. Another hard case is a behavior-based controller, where the combination of different reactive behaviors may produce an emergent behavior that is very difficult to model.

In contrast, data-driven approaches do not rely on analytical models. Instead, the information used for monitoring is derived directly from input data. The decision making is often based on statistical methods. However, the assumption here is that the characteristics of the data variations are relatively unchanged unless a fault occurs. This may not hold in an actual robot. The main strengths of the data-driven approaches are their robustness to uncertainty, and their ability to predict the value of a system variable on the basis of the system output without fully understanding the system behavior. An example of a fault that could be detected and isolated by the data-driven approach is a low battery charge level that causes increased noise in the sonar readings. The main drawback using this approach is that the performance is highly dependent on the amount and quality of the input data. If sufficient data is not available, then this will result in sub-optimal fault detection and isolation.

The knowledge-based approaches, as mentioned, can combine the previous two methods. Their main strength is probably this ability to combine analytical and data-driven approaches in a hybrid monitoring system, so they can be used to model complex autonomous robotic systems.

Another issue in this field is that it is difficult to compare the performance of different approaches. If the same quantifiable criteria were used when evaluating the performance of monitoring systems, comparison between different ideas would be easier. Gertler (1998) suggests that the following criteria be measured when evaluating execution monitoring: reaction speed, that is, the ability to detect faults with reasonably small delays; robustness, that is, the capability to operate in the presence of noise, disturbances, and modeling errors with few false alarms; and isolation performance, that is, the ability to distinguish between faults.
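Two of these criteria are easy to quantify from a logged run, as the sketch below suggests. The function name, the log format (one boolean alarm flag per time step), and the example log are assumptions; isolation performance would additionally require a comparison of the reported and the true fault class, which is not shown.

```python
def evaluate(alarms, fault_onset):
    """Return (false alarm rate, detection delay) for one logged run.

    alarms      -- list of booleans, one per time step
    fault_onset -- index of the step at which the fault was injected
    """
    pre = alarms[:fault_onset]
    post = alarms[fault_onset:]
    # Robustness: fraction of fault-free steps that raised an alarm.
    false_alarm_rate = sum(pre) / len(pre) if pre else 0.0
    # Reaction speed: steps between fault onset and the first alarm.
    delay = post.index(True) if True in post else None
    return false_alarm_rate, delay

# Hypothetical log: one false alarm, fault injected at step 4, detected at step 6.
log = [False, True, False, False, False, False, True, True]
print(evaluate(log, fault_onset=4))
```

Reporting the same pair of numbers for different monitoring systems on the same fault scenarios would give the kind of common basis for comparison that the text calls for.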

Today, a lot of research is also being done on diagnosis in multi-robot teams. There is a lot of interest in this area, as deploying multi-robot teams is the next step for autonomous mobile robots. Teams of robots could perform a wide range of tasks. The ability to self-diagnose and self-repair is becoming necessary in a range of man-made systems for which in-situ repair or diagnosis by a human operator is not feasible within an appropriate time frame (Kutzer et al. 2008). Cooperative multi-robot teams address critical problems that are beyond the capabilities of current, standalone robotic systems.

In the paper by Bererton et al. (2001), a multi-robot team capable of self-diagnosis and self-repair is proposed. The key application, according to the authors, would be to set up a robot colony on another planet to help prepare a station for human habitation, as shown in Figure 4.

Figure 4: Possible tasks for a self-sufficient team of robots.

In a well-designed multi-robot team, a failure in one member will not result in total system failure. For example, the distribution of a mission-critical sensor package over multiple independent robotic vehicles enables sampling of at least part of a desired dataset despite, in the worst possible case, the complete loss of one or more of the robotic delivery platforms. Moreover, the source of malfunction can be diagnosed by other members of the team, and possibly repaired in situ using spare parts carried by the robots. In the future, it is conceivable to take this concept even further by producing robots capable of complex repair procedures utilizing in situ resources. In both scenarios, the overall robustness and reliability of the system are greatly improved when multi-robot teams are used. In comparison, critical failures in a standalone robotic system can have a dire effect on its operational capabilities. In a team-diagnosis and team-repair scheme using a combination of spare parts and in situ resource utilization (ISRU), robotic teams can, in principle, achieve greatly extended mission lives.

When applied with a high level of autonomy, repair strategies that require communication with a human operator can be minimized. This is especially applicable to the robotic exploration of space. These missions carry both a high cost and a high risk of failure. By sending multiple robots with capabilities including diagnosis and repair, the risk of failure following successful deployment in space or on planetary surfaces is drastically reduced. Furthermore, the resulting extended mission life and hardware sustainability can make missions like these far more effective.

Conclusions

Mobile robot fault diagnosis is a rapidly expanding area of research. Observer-based monitoring is by far the most common approach for fault detection within robotics. When several sequential states are modeled, causal analysis, often realized by a directed graph or a state machine, is the most common approach. A data-driven approach is usually tried only when it is very hard to develop an adequate model of the system. One reason why analytical monitoring is the most common approach could be that the underlying concepts are better understood, which results in a far more robust model. An analytical model created from first principles is also easier to maintain than a black-box approach in which the information is hidden: if the system is behaving erratically, it is far easier to debug an analytical model than a data-driven one. However, for most systems it is difficult to come up with a mathematical model without making certain assumptions or linearizing a non-linear system. An autonomous mobile robot is a highly non-linear system, and using a linear mathematical model could cause errors in the fault detection. Although most monitoring systems applied to robotics are based on analytical redundancy, the other approaches have some complementary advantages. Therefore, data-driven approaches together with knowledge-based approaches need to be studied further. The current focus is on developing fault detection systems for multi-robot teams. The key idea in multi-robot teams is distributed diagnosis: rather than having one central diagnosing system, there are multiple diagnosing agents, which makes the system more robust.
