View Invariant Action Recognition Computer Science Essay

02 Nov 2017

Abstract

A view invariant movement recognition method based on an adaptive neuro-fuzzy inference system (ANFIS) is proposed. ANFIS is an intelligent system that combines a fuzzy inference system with neural networks. Human body posture prototypes are identified by a self organizing map, and a fuzzy inference system is proposed for action classification. The method maps a set of input data onto a set of desired outputs and matches data between the training and testing phases. A Bayesian framework recognizes the various kinds of activities and produces the recognition results. The proposed method is able to discover different instances of the same action performed by different people from different viewpoints.

Keywords-ANFIS, Bayesian frameworks, human action recognition, multilayer perceptrons, self organizing map, view invariance.

Introduction

Recognition of human actions from video sequences has many applications, including video surveillance and monitoring [2], human-computer interaction [3], content based video retrieval [3], and analysis of sports events. The term action refers to a single period of a human motion pattern over a period of time. An action is distinct from an activity: an activity is a continuous event composed of small atomic actions. For example, the activity jogging consists of actions such as walk, run, and jump. Recognizing human actions is a very challenging problem because actions can look very different depending on the context: the same action performed in different garb, the same action performed by different people from different viewpoints, or the same action performed by different people in varying ways [1].

The representation of human actions is used to match the similarity of all human body poses by a self organizing map (SOM), a type of neural network. In the training phase, the SOM is used to produce action independent posture prototypes. In the testing phase, the adaptive neuro-fuzzy inference system is used to classify each testing sequence into the most probable action type according to the model built in the training phase, using fuzzy rules and membership functions [5]. A fuzzy inference system is proposed for action classification; it can be trained to develop fuzzy rules and determine membership functions for the input and output variables of the system. A Bayesian framework recognizes unknown actions and produces combined recognition results with high classification accuracy [6].

RELATED WORK

S. Ali and M. Shah proposed kinematic features for action recognition, representing complex human actions in videos. Kinematic features are not view invariant, because the same action looks different when viewed from different angles; occlusion also degrades recognition performance [7]. H. J. Seo and P. Milanfar proposed regression kernel analysis, which captures the data even in the presence of misrepresented actions and errors in the data. It also finds similar actions and needs no prior knowledge about them [4]. M. Ahmad and S. Lee proposed a hidden Markov model method for view-independent human action recognition using body silhouette features, optical flow features, and a combined feature. Based on these features, the action can be recognized from an arbitrary view rather than a specific view [8].

N. Gkalelis et al. proposed fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA). This method allows view independent movement recognition without calibrated cameras, and different movements are represented and classified. LDA reduces the dimensionality of the multiview movement video features; the method is efficient because the low dimensional features achieve good recognition rates. However, it finds only linear combinations of features within classes of objects or events [9]. F. Lv and R. Nevatia proposed the pyramid match kernel algorithm, which improves the matching score between two similar feature sets. It achieves comparable results at lower computational cost and has been applied to object recognition, but single view action classification needs a large number of parameters to resolve classification ambiguity [10].

S. Yu et al. proposed appearance based gait recognition, which is valuable for robust gait recognition systems. This method is not suitable for recognizing human actions from the side view or from various viewing angles [11]. D. Weinland et al. proposed principal component analysis (PCA), which is commonly used to reduce higher dimensional features to lower dimensional features. It is useful for view invariant recognition of a larger class of primitive actions, but it performs neither linear separation nor linear regression of classes and does not handle similar human actions well [8].

MATERIALS AND METHODS

A. Experimental Setup

Human action recognition is the automated detection of ongoing events from video data; action recognition finds the video segments containing such actions, and each video segment exhibits the properties of the action. The video sequences are collected from the Weizmann dataset [12], converted into frames, and stored in the database. The dataset contains actions such as bend, walk, run, jump, and wave two hands.

The proposed method consists of identification of posture prototypes, testing of data with the ANFIS method, action classification, and action recognition. The block diagram of the proposed method is shown in Fig. 1.

As the diagram shows, the SOM is used to train the data in the training phase. After the SOM classification, the fuzzy inference system is used to test the data. Finally, the Bayesian framework recognizes the actions.

Fig 1. Overview of the proposed method

B. Methods

Preprocessing phase

In action recognition, elementary action video sequences are converted into video frames. Moving object segmentation techniques [13, 14] are used to create binary images. Background subtraction is a widely used approach for detecting the moving object: after background subtraction, the person's body is extracted, producing binary posture frames of the same size. The continuous movement of the bend posture is shown in Fig 2.

Fig 2. Posture images of the bend action
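As a minimal sketch of the preprocessing step (assuming grayscale frames and a hypothetical threshold value, neither of which is specified in the paper), background subtraction can be written as simple thresholded frame differencing:

```python
import numpy as np

def extract_posture(frame, background, threshold=30):
    """Binary posture image by background subtraction.

    frame, background: 2-D grayscale arrays of the same size.
    Pixels whose absolute difference from the background exceeds
    the threshold are marked as foreground (the person's body).
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy example: a static background and a frame with a bright body region.
background = np.full((6, 6), 10, dtype=np.uint8)
frame = background.copy()
frame[2:5, 2:4] = 200          # the moving "person" occupies a 3x2 region
binary = extract_posture(frame, background)
```

In practice the binary frames would then be cropped and rescaled to a common size before being used as posture vectors.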

Identification of posture prototypes

In the training phase, the posture vectors of the action videos are clustered into a fixed number of classes using the self organizing map (SOM) algorithm [15]. The SOM is a special class of neural network based on competitive, unsupervised learning. It is used to identify and group different portions of images with similar features. The output neurons in the network compete among themselves, and the neuron that wins the competition is called the best matching neuron.

The training process for constructing the SOM is based on the following procedures.

1) Initialization: Weights are initialized randomly.

2) Sampling: Draw a sample X and present it to the network.

3) Similarity matching: The winning neuron N is the neuron whose weight vector is closest to the input vector; it is the best matching neuron.

N = argmin_j ||X − W_j||

4) Updating: Adjust the weights of the winner and its neighbours according to the neighbourhood function.

ΔW_j = η · h_ij · (X − W_j)

where h_ij is the neighbourhood function and η is the learning rate parameter.

The algorithm is trained for up to 100 iterations, which allows it to generalize to data it was not trained on. This procedure is applied multiple times.
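The four steps above can be sketched as follows (a minimal one-dimensional SOM in NumPy; the grid size, learning rate, and neighbourhood width are illustrative assumptions, not values from the paper):

```python
import numpy as np

def train_som(data, n_units=4, n_iter=100, eta0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self organizing map following steps 1-4."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_units, data.shape[1]))              # 1) initialization
    for t in range(n_iter):
        lr = eta0 * np.exp(-t / n_iter)                   # decaying learning rate
        sg = sigma0 * np.exp(-t / n_iter)                 # shrinking neighbourhood
        for x in data:                                    # 2) sampling
            n = int(np.argmin(np.linalg.norm(x - W, axis=1)))  # 3) N = argmin_j ||X - W_j||
            d = np.abs(np.arange(n_units) - n)            # grid distance to the winner
            h = np.exp(-(d ** 2) / (2 * sg ** 2))         # neighbourhood function h_ij
            W += lr * h[:, None] * (x - W)                # 4) dW = lr * h * (X - W)
    return W

def best_match(x, W):
    """Index of the best matching neuron for posture vector x."""
    return int(np.argmin(np.linalg.norm(x - W, axis=1)))

# Two toy posture clusters; after training they map to different neurons.
data = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
W = train_som(data)
```

The trained weight vectors play the role of the action independent posture prototypes used in the testing phase.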

Testing and classification of data with ANFIS method

In this phase, the user gives an input posture image, and the corresponding output image is tested. The input data is normalized and then checked with the ANFIS method, which uses a Sugeno type fuzzy inference system for the training routine and automatically identifies the fuzzy rules and membership function parameters [5].

FIS classifier

The fuzzy inference system uses the hybrid approach of the adaptive neuro-fuzzy inference system to identify the parameters of a Sugeno type fuzzy inference system. In action classification, the FIS classifier is trained for up to 100 epochs. Once trained, the FIS classifies each testing sequence and produces the most probable action type according to the model built in the training phase; the correctly classified sequences give the final action recognition accuracy.
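The paper does not give the rule base, but a zero-order Sugeno classifier of the kind ANFIS tunes can be sketched as follows (the membership function parameters and the meaning of the features are purely hypothetical):

```python
import numpy as np

def gauss_mf(x, c, s):
    """Gaussian membership function with centre c and width s."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def sugeno_classify(features, rules):
    """Zero-order Sugeno inference.

    Each rule is (list of (centre, width) pairs, consequent class).
    A rule's firing strength is the product of its membership degrees;
    normalized strengths are accumulated per class and the highest
    scoring class is returned.
    """
    strengths = []
    for mfs, label in rules:
        w = float(np.prod([gauss_mf(x, c, s) for x, (c, s) in zip(features, mfs)]))
        strengths.append((w, label))
    total = sum(w for w, _ in strengths) or 1.0
    scores = {}
    for w, label in strengths:
        scores[label] = scores.get(label, 0.0) + w / total
    return max(scores, key=scores.get)

# Hypothetical rules over a 2-D posture feature (e.g. aspect ratio, fill ratio).
rules = [
    ([(0.4, 0.1), (0.6, 0.1)], "bend"),
    ([(0.8, 0.1), (0.3, 0.1)], "walk"),
]
```

ANFIS would learn the centres and widths from the training data; here they are fixed by hand only to make the sketch self-contained.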

Action Recognition

During recognition, the input frames of the video sequence are segmented using background subtraction and the features are extracted. Each input frame is compared with the postures stored in the database. If a similar posture is available, its class label is assigned to the current frame; if not, the frame is added to the initially empty database.
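The match-or-insert logic above can be sketched as a nearest-neighbour lookup (the distance threshold is an assumed parameter, not one given in the paper):

```python
import numpy as np

def match_posture(posture, database, labels, threshold=0.5):
    """Return the label of the closest stored posture, or add the
    posture to the (initially empty) database when nothing is close."""
    if database:
        dists = [float(np.linalg.norm(posture - p)) for p in database]
        i = int(np.argmin(dists))
        if dists[i] <= threshold:
            return labels[i]
    database.append(posture)
    labels.append(None)            # new posture, not yet labelled
    return None

# Usage: two known postures; a near query is labelled, a far one is stored.
database = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
labels = ["bend", "walk"]
```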

In the Bayesian framework, the human actions are fed to the FIS classifier, which recognizes the action with the highest cumulative probability according to the Bayesian decision [6], producing combined recognition results with high classification accuracy. The best recognition result is presented in the confusion matrix shown in Table 1. The method recognizes actions such as bend, walk, and run; a recognition rate of 86.66% was obtained for the Bayesian approach.
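A sketch of the cumulative-probability decision, assuming the FIS outputs a per-frame probability distribution over the actions (the numbers below are illustrative):

```python
import numpy as np

def bayesian_recognize(frame_probs, actions):
    """Pick the action with the highest cumulative probability.

    frame_probs: (n_frames, n_actions) array; each row is the FIS
    classifier's probability distribution for one frame. Summing log
    probabilities combines the per-frame evidence (a naive Bayes style
    combination), and the argmax gives the Bayesian decision.
    """
    log_cum = np.log(np.asarray(frame_probs) + 1e-12).sum(axis=0)
    return actions[int(np.argmax(log_cum))]

actions = ["bend", "walk", "run"]
frame_probs = [[0.2, 0.6, 0.2],   # most frames favour "walk"
               [0.1, 0.7, 0.2],
               [0.3, 0.4, 0.3]]
```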

RESULT AND ANALYSIS

The results and discussion of human action recognition are based on the Bayesian framework. The recognition process is divided into two phases: in the training phase, the SOM is trained and matches the similarity of all human actions; in the testing phase, the FIS classifies each testing sequence and produces the most probable action type.

In action recognition, the video sequences are collected from the Weizmann dataset; here, 20 videos from the dataset are used. Each video shows one human performing one action. The video sequences are converted into frames and stored in the database, which contains the actions bend, walk, and run.

The input image is taken from the database as shown in Fig 3(a). The grayscale image is converted into a binary image using an edge detection method, which detects a wide range of edges in the image; the binary image is shown in Fig 3(b). The binary image is segmented to represent the action clearly: segmentation makes the actions easier to analyze and extracts the foreground from the background model. The segmented image is shown in Fig 3(c). The input image is then matched with the postures stored in the database. If a similar posture image is available, its class is assigned as the label of the current image; if not, the image is added to the initially empty database. Finally, the method matches the similarity of the action and recognizes actions such as bend, walk, and run.


Fig 3. Action recognition: (a) input image, (b) binary image, (c) segmented image, (d) matched image

Analysis

The Bayesian approach is used to recognize the actions, and the result is presented in Table 1 as a confusion matrix [16]. The matrix contains information about actual and predicted classes: rows represent the actual classes and columns the predicted classes. The diagonal elements count correct classifications, and the off-diagonal elements count errors.

TABLE I

Confusion matrix for three actions

        Bend   Walk   Run
Bend      19      1     0
Walk       2     16     2
Run        1      2    17

The overall correct classification rate is 86.66% for the Bayesian approach. An action with distinctive body poses, such as bend, is almost perfectly classified; actions with similar body poses, such as walk and run, are harder to classify correctly.
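The overall rate follows directly from Table I: correct classifications on the diagonal divided by the total number of sequences.

```python
import numpy as np

# Confusion matrix from Table I (rows = actual class, columns = predicted).
cm = np.array([[19,  1,  0],
               [ 2, 16,  2],
               [ 1,  2, 17]])

# 52 of 60 sequences lie on the diagonal: 52/60 = 0.8666... (86.66%).
accuracy = np.trace(cm) / cm.sum()
```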

Performance Metrics

Performance metrics compare the strengths and weaknesses of different classifiers by computing precision, recall, and the F1 metric [16]. The performance metrics and accuracy results are described below.

Accuracy

Accuracy is the number of correct classifications divided by the total number of classifications.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The overall accuracy of the human action recognition is 86.66%, as shown in Table II.

Precision

Precision is the fraction of cases predicted as positive that actually belong to the positive class.

Precision = TP / (TP+FP)

Recall

Recall is the fraction of positive cases that are correctly identified. It is also called sensitivity and equals the true positive rate.

Recall = TP / (TP+FN)

TABLE II

Performance Metrics

Metric         Bend     Walk     Run
Precision      0.8636   0.8421   0.8947
Recall         0.9500   0.8000   0.8500
F1             0.9047   0.8205   0.8718
Specificity    0.9250   0.9250   0.9500

F1 metric

The F1 metric, or F measure, is the harmonic mean of precision and recall.

F1 metric = 2(Recall*Precision) / (Recall + Precision)

Specificity

Specificity is the fraction of negative cases classified correctly. It equals the true negative rate.

Specificity = TN / (TN+FP)

The performance metrics for human action recognition are shown in Table II.
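All four metrics in Table II can be recomputed per class from the confusion matrix in Table I:

```python
import numpy as np

# Table I (rows = actual, columns = predicted); classes: bend, walk, run.
cm = np.array([[19,  1,  0],
               [ 2, 16,  2],
               [ 1,  2, 17]], dtype=float)

tp = np.diag(cm)                 # correct classifications per class
fp = cm.sum(axis=0) - tp         # predicted as the class but actually another
fn = cm.sum(axis=1) - tp         # actually the class but predicted as another
tn = cm.sum() - tp - fp - fn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
```

Each array holds the bend, walk, and run values in order, matching the columns of Table II.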

V. CONCLUSION AND FUTURE WORK

A view invariant action recognition method based on an adaptive neuro-fuzzy inference system is proposed to solve the generic action recognition problem. ANFIS is a very useful tool for training on the images and offers a quick and straightforward way of input selection. The SOM is constructed by processing and training on the dataset, and the input query is tested with ANFIS. The FIS classifier classifies the given actions: it measures the similarity between images and produces the classification results. The Bayesian approach recognizes human actions from single video samples, and the method also recognizes continuous human actions.

In future work, this method can be extended to detect interactions between persons and to detect abnormal human actions.


