Unsupervised Neural Networks


Artificial Neural Networks are a programming paradigm that seeks to emulate the micro-structure of the brain. They are used extensively in artificial intelligence problems, from simple pattern-recognition tasks to advanced symbolic manipulation. Neural networks have proved very promising in many applications because of their ability to "learn" from data, their nonparametric nature and their ability to generalize. They are applied in finance, marketing, manufacturing, operations, information systems, and other domains.

4.1.1 Background

To simulate the functioning of the brain on a computer, the differences between the human brain and a computer are resolved by focusing attention on the following properties (Denker, 1986).

The human brain consists of approximately 10^11 (100 billion) neurons, which are interconnected via a dense network of connections. Every neuron is considered a simple processing element that operates very slowly to perform tasks (McCulloch & Pitts, 1943). Every connection has its own weight. The human brain is very robust and fault tolerant: neurons die every day, but the brain continues to function.

A computer has one or a few very complicated processors consisting of on the order of 10^9 to 10^10 transistors for memory and logic functions. Each transistor can be regarded as a very simple computing (switching) element that switches very quickly (about 10 ns). Computers are designed in a very hierarchical way and are not fault tolerant: a fault in one or a few transistors is sufficient to make the machine useless.

A computer is superior at calculating and processing data, but to perform complex tasks it must be efficiently programmed. Human beings are less efficient at fast computation, yet they are capable of tasks like association, evaluation and pattern recognition because they learn how to do these tasks. Computers perform poorly at such tasks.

Many researchers have worked to resolve the differences between the human brain and the computer, leading to the development of a new type of network called connectionist models, neuromorphic systems, or neural networks (Lippmann, 1987a). They found that the connections between the processing elements define the behavior of both computer and brain. The connections between the neurons hold the information, and a change in the interconnections causes a change in the stored information. Machines implemented along these lines are very good at doing "human" tasks.

4.2 Biological neuron

Figure 4.1 Biological Neuron

A neural network consists of artificial neurons. A simple, highly idealized (biological) neuron is shown in Fig. 4.1. It consists of a cell body (B), dendrites (D) and an axon (A).

A neuron has a roughly spherical cell body called the soma (Figure 4.1). The activation signals generated in the soma are transmitted to other neurons at different locations through an extension of the cell body called the axon, or nerve fibre. The axon can be about 1 m long. The connection between two neurons is called a synapse. A synapse is either stimulatory or inhibitory: stimulatory means that an incoming signal raises the activity level of the neuron, while inhibitory means that the incoming signal lowers it. The bushy, tree-like extensions around the cell body are the dendrites, which are responsible for receiving the incoming signals generated by other nerve cells (Noakes, 1992). A neuron collects all input signals; if the total input exceeds a certain threshold level, the neuron fires, i.e. it generates an output signal. The threshold level governs the frequency at which the neuron fires.

When real neurons fire, they transmit chemicals (neurotransmitters) to the next group of neurons up the processing chain described above. These neurotransmitters form the input to the next neuron, and constitute the messages neurons send to each other.

These messages can assume one of three different forms.

1. Excitation - Excitatory neurotransmitters increase the likelihood that the next neuron in the chain will fire.

2. Inhibition - Inhibitory neurotransmitters decrease the likelihood that the next neuron will fire.

3. Potentiation - Potentiating neurotransmitters adjust the sensitivity of the next neurons in the chain to excitation or inhibition (this is the learning mechanism).

4.3 Perceptron

A McCulloch-Pitts neuron is a mathematical model of a biological neuron: it takes a weighted sum of its inputs and calculates an output. Rosenblatt (1958-1962), an American psychologist, defined a perceptron (an extended model of the McCulloch-Pitts neuron) to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using a linear function of the inputs.

Minsky and Papert (1969) described the perceptron as a stochastic gradient-descent algorithm that attempts to linearly separate a set of n-dimensional training data. The perceptron has a single output whose value determines to which of two classes each input pattern belongs. Such a perceptron can be represented by a single node that applies a step function to the net weighted sum of its inputs. The input pattern is considered to belong to one class or the other depending on whether the node output is 0 or 1.

The perceptron takes a vector of real-valued inputs (x1, ..., xn) weighted with (w1, ..., wn) and calculates its output as a thresholded linear combination of these inputs, shown in Eq. (4.1):

O(x1, x2, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0, and -1 otherwise (4.1)

where w0 denotes a threshold weight and x0 is always 1; the perceptron outputs 1 if the weighted sum is greater than 0, and -1 otherwise.

A learning procedure called the "perceptron training algorithm" can be used to obtain mechanically the weights of a perceptron that separates two classes, whenever that is possible. The algorithm can be allowed to run until all samples are correctly classified; termination is assured if the learning rate η is sufficiently small and the samples are linearly separable.

Perceptron training rule:

wi ← wi + η (t − o) xi (4.2)

where t is the target output, o is the perceptron output, and η is the learning rate, which lies between 0.0 and 1.0.

4.3.1 Perceptron learning algorithm using delta rule

i) Initialize the weights and threshold to small random numbers.

ii) Repeat until every training sample is classified correctly:

a) Present the pattern (x1, x2, ..., xn) and evaluate the output of the neuron.

b) Update the weights according to the perceptron learning rule shown in Eq. (4.2).

Minsky and Papert (1969) concluded that the above theorem guarantees the classification of linearly separable data, but most problems do not provide such data. One example of such a pattern classification problem is the XOR problem: XOR takes two binary inputs, outputs 1 if exactly one of the inputs is high, and outputs 0 otherwise. So there are four patterns and two possible outputs (0 or 1). The use of multilayer perceptrons solved the XOR problem.
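As an illustration, the training rule of Eq. (4.2) and the step activation of Eq. (4.1) can be sketched in a few lines of Python. This is a minimal sketch, not the text's own implementation; the function name, learning rate and the bipolar AND data set are assumptions chosen for the example.

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, max_epochs=100):
    """Train a perceptron with the rule w_i <- w_i + eta*(t - o)*x_i (Eq. 4.2)."""
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1 for the threshold weight w0
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, t):
            o = 1 if w @ x > 0 else -1          # step activation (Eq. 4.1)
            if o != target:
                w += eta * (target - o) * x     # update only on misclassification
                errors += 1
        if errors == 0:                          # all samples classified correctly
            break
    return w

# AND function with bipolar targets: linearly separable, so training terminates
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
w = perceptron_train(X, t)
preds = [1 if w @ np.r_[1, x] > 0 else -1 for x in X]
print(preds)   # → [-1, -1, -1, 1]
```

Because AND is linearly separable the error count reaches zero after a few epochs; running the same loop with XOR targets would never terminate with zero errors, which is exactly the limitation discussed above.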

4.4 Feed forward neural network (Multilayer Perceptron)

MLPs are the most common networks in the supervised learning family. A feedforward neural network consists of nodes partitioned into layers numbered 0 to L, where the layer number indicates the distance of a node from the input nodes. The lowermost layer is the input layer, numbered layer 0, and the topmost layer is the output layer, numbered layer L; the hidden layers are numbered 1 to (L-1). Hidden nodes do not directly receive inputs from, nor send outputs to, the external environment. Input layer nodes merely transmit input values to the hidden layer nodes and do not perform any computation. The number of input nodes equals the dimensionality of the input patterns, and the number of nodes in the output layer is dictated by the problem under consideration. The number of nodes in the hidden layer is up to the discretion of the network designer and generally depends on problem complexity.

The equation for a unit's output is given by Eq. (4.3):

Output = 1 / (1 + e^(−sum)) (4.3)

The feed-forward process involves presenting an input pattern to the input layer neurons, which pass the input values on to the first hidden layer. Each of the hidden layer nodes computes a weighted sum of its inputs, passes the sum through its activation function and presents the result to the output layer. Backpropagation is the most popular algorithm used to train multilayer perceptrons. Neuron outputs feed forward to subsequent layers. These networks are good for solving static pattern recognition, classification and generalization problems.
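The feed-forward pass just described, with the sigmoid of Eq. (4.3) as the activation, might be sketched as follows. This is an illustrative sketch only: the 2-2-1 layer sizes and the random weights are assumptions, not values from the text.

```python
import numpy as np

def sigmoid(s):
    """Logistic activation from Eq. (4.3): output = 1 / (1 + e^(-sum))."""
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, weights, biases):
    """One feed-forward pass: each layer computes a weighted sum of its
    inputs and passes the sum through the sigmoid activation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)      # weighted sum, then activation, layer by layer
    return a

# Hypothetical 2-2-1 network (weights chosen at random purely for illustration)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(1, 2))]
biases = [rng.normal(size=2), rng.normal(size=1)]
y = forward(np.array([0.5, -0.3]), weights, biases)
print(y.shape)   # a single output, squashed into (0, 1) by the sigmoid
```

Training such a network (adjusting `weights` and `biases` by backpropagation) is what consumes the epochs discussed later in Section 4.5.2.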

4.5 Unsupervised neural networks

In unsupervised learning, there is no teacher signal. We are given a training set {xi; i = 1, 2, ..., m} of unlabeled vectors in R^n. The objective is to categorize the data or discover features or regularities in it. The xi's must be mapped into a lower-dimensional set of patterns such that any topological relations existing among the xi's are preserved among the new set of patterns. The success of unsupervised learning depends on the optimized weights returned by the learning algorithm.

Neural networks for unsupervised learning are used to discover the internal structure of the data without making use of information about the class of an example. The best-known neural networks used for clustering are the self-organizing map (SOM) (Kohonen 1984, 1995, 1997, 2001) and the adaptive resonance theory (ART) models (Carpenter and Grossberg).

4.5.1 Kohonen self-organizing map

Feature maps constitute basic building blocks in the information-processing infrastructure of the nervous system. Feature maps preserve neighborhood relations in the input data and represent regions of high signal density on correspondingly large parts of the topological structure. This is a distinct feature of the human brain that motivated the development of the class of self-organizing neural networks. Basically, there are two different models of self-organizing neural networks, proposed by Willshaw and Von Der Malsburg (1976) and Kohonen (1982) respectively.

The Willshaw-Von Der Malsburg model is used where the input and output dimensions are the same, whereas the Kohonen model is capable of generating mappings from high-dimensional signal spaces to lower-dimensional topological structures. These mappings are performed adaptively in a topologically ordered fashion.

The mappings make topological neighborhood relationships geometrically explicit in a low-dimensional feature map. Besides aiding the understanding and modeling of computational maps in the brain, SOMs are applied in many areas (Ritter et al., 1992). The applications include:

(i) Subsystems for engineering applications, e.g. cluster analysis (Su et al., 1997),

(ii) Motor control (Martinetz et al., 1990),

(iii) Speech recognition (Kohonen, 1988),

(iv) Vector quantization (Luttrell, 1989),

(v) Adaptive equalization (Kohonen et al., 1992),

(vi) Combinatorial optimization (Favata and Walker, 1991; Kohonen et al., 1996).

Figure: Kohonen model (Haykin, 1999, p. 445)

The principal goal of self-organizing feature maps is to transform patterns of arbitrary dimensionality into the responses of one- or two-dimensional (2-D) arrays of neurons, and to perform this transformation adaptively in a topologically ordered fashion. The essential constituents of feature maps (Kohonen, 1982) are as follows:

• An array of neurons that compute simple output functions of incoming inputs of arbitrary dimensionality;

• A mechanism for selecting the neuron with the largest output;

• An adaptive mechanism that updates the weights of the selected neuron and its neighbors.

The SOM is an unsupervised neural network that approximates an unlimited number of input data by a finite set of nodes arranged in a grid, where neighboring nodes correspond to more similar input data. The model is produced by a learning algorithm that automatically orders the inputs on a two-dimensional grid according to their mutual similarity.

A simple Kohonen net architecture consists of two layers: an input layer and a Kohonen (output) layer. These two layers are fully connected; each input layer neuron has a feed-forward connection to each output layer neuron.

The inputs of the KSOM are n-tuples x = (x1, x2, ..., xn), and the outputs are m cluster units arranged in a one- or two-dimensional array. The weight vector for a cluster unit serves as an exemplar of the input patterns associated with that cluster. During the self-organization process, the cluster unit whose weight vector matches the input pattern most closely (by minimum squared Euclidean distance) is chosen as the winner. The winning unit and its neighboring units update their weights. In general, the weight vectors of the neighboring units are not close to the input pattern.

4.5.1.1 SOM learning algorithm (adapted from Fausett)

Step 0. Initialize the weights wij. Set the topological neighborhood parameters; as clustering progresses, the radius of the neighborhood decreases. Set the learning rate parameter α; it should be a slowly decreasing function of time.

Step 1. While the stopping condition is false, do Steps 2-8.

Step 2. For each input vector x, do Steps 3-5.

Step 3. For each j, compute:

D(j) = Σi (wij − xi)² (4.4)

Step 4. Find the index J such that D(J) is a minimum.

Step 5. For all units j within a specified neighborhood of J, and for all i:

wij(new) = wij(old) + α [xi − wij(old)] (4.5)

Step 6. Update the learning rate α using Eq. (4.6), e.g.

α(t + 1) = 0.5 α(t) (4.6)

Step 7. Reduce the radius of the topological neighborhood at specified times.

Step 8. Test the stopping condition.
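Steps 0-8 above can be sketched as a small one-dimensional SOM in Python. This is an illustrative sketch under stated assumptions: the two-cluster data set, the grid size, the halving schedule for α (following Eq. (4.6)) and the shrinking radius are all choices made for the example, not prescribed by the text.

```python
import numpy as np

def train_som(X, m, epochs=50, alpha=0.5, radius=1, seed=0):
    """1-D Kohonen SOM following Steps 0-8: squared-Euclidean winner search
    (Eq. 4.4), neighborhood weight update (Eq. 4.5), and a geometrically
    decreasing learning rate (Eq. 4.6)."""
    rng = np.random.default_rng(seed)
    W = rng.random((m, X.shape[1]))                # Step 0: random initial weights
    for _ in range(epochs):                        # Step 1: loop until stopping condition
        for x in X:                                # Step 2: present each input vector
            D = ((W - x) ** 2).sum(axis=1)         # Step 3: D(j) = sum_i (w_ij - x_i)^2
            J = int(np.argmin(D))                  # Step 4: winning unit J
            lo, hi = max(0, J - radius), min(m, J + radius + 1)
            W[lo:hi] += alpha * (x - W[lo:hi])     # Step 5: update winner and neighbors
        alpha *= 0.5                               # Step 6: decrease learning rate
        radius = max(0, radius - 1)                # Step 7: shrink the neighborhood
    return W

# Two well-separated hypothetical clusters should map onto different units
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
W = train_som(X, m=4)
winners = [int(np.argmin(((W - x) ** 2).sum(axis=1))) for x in X]
print(winners)
```

After training, the two inputs near the origin share one winning unit and the two inputs near (1, 1) share another, which is the topology-preserving clustering behavior described above.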

The Kohonen SOM has been applied to computer-generated music (Kohonen, 1989b). It was also applied to the solution of the well-known traveling salesman problem (Angeniol et al., 1988).

4.5.2 Adaptive resonance theory

Carpenter and Grossberg developed the different ART architectures as the result of 20 years of fundamental research in different fields of science. They introduced ART 1 (Carpenter and Grossberg, 1986), a neural network for binary input patterns, and went on to develop different ART 2 architectures (Carpenter and Grossberg, 1987a; 1987b), which can be used for both analog and binary input patterns. Later they introduced ART 3 (Carpenter and Grossberg, 1990), hierarchical ART 2 networks in which they even incorporate chemical (pre)synaptic properties.

Input patterns can be presented in any order. Each time a pattern is presented, an appropriate cluster unit is chosen and that cluster's weights are adjusted to let the cluster unit learn the pattern. The motivations behind designing these nets are:

i) To allow the user to control the degree of similarity of patterns placed on the same cluster.

ii) The designed nets must be both stable and plastic.

A backpropagation network (feedforward neural network), as discussed in Section 4.4, is powerful enough to simulate any continuous function given a sufficient number of hidden neurons. But training a backpropagation network is quite time consuming: it takes thousands of epochs for the network to reach equilibrium, and it is not guaranteed to find the global minimum. Once a backpropagation network is trained, the number of hidden neurons and the weights are fixed; the network cannot learn from new patterns unless it is re-trained from scratch. Thus backpropagation networks lack plasticity. If the number of hidden neurons is kept constant, the plasticity problem can be addressed by re-training the network on the new patterns with an online learning rule, but this causes the network to forget old knowledge rapidly, so the backpropagation algorithm is not stable. To solve this problem, the ART nets were proposed by Carpenter and Grossberg.

Stability is achieved by reducing learning rates, while plasticity is the ability of the network to respond to a new pattern equally well at any stage of learning. The network combines bottom-up (input-output) competitive learning with top-down (output-input) learning to resolve the stability-plasticity dilemma.

4.5.2.1 ART architecture

The fundamental architecture of ART consists of three groups of neurons

1. Input processing neurons (F1 layer).

2. Clustering units (F2 layer).

3. Control mechanism.

Figure 4.2 ART architecture: the F1(a) layer (input units) feeds the F1(b) layer (interface units), which is connected to the F2 layer (clustering units) by bottom-up weights bij and top-down weights tji; reset and control units supervise learning.

The F1 layer consists of two portions: an input portion and an interface portion. In the case of ART2, the input portion performs processing based on the inputs it receives. F1(a) and F1(b) denote the input and interface portions of layer F1. The F1(b) layer combines the input from the F1(a) and F2 layers in order to compare the similarity of the input signal with the weight vector of the cluster unit that has been selected as the unit for learning.

Two sets of weighted interconnections exist for controlling the degree of similarity between the units in the interface portion and the cluster layer. The bottom-up weights (bij) are used for the connections from the F1(b) layer to the F2 layer, and the top-down weights (tji) for the connections from the F2 layer to F1(b). The cluster layer is a competitive layer: the cluster unit with the largest net input becomes the victim that learns the input pattern, and the activations of all other F2 units are set to zero. The interface units combine the data from the input and cluster layer units. The cluster unit is allowed to learn the input pattern based on the similarity between the top-down weight vector and the input vector; the reset mechanism takes this decision based on the signals it receives from the interface and input portions of the F1 layer. When a cluster unit is not allowed to learn, it is inhibited and a new cluster unit is selected as the victim.

Operation of ART network

In an ART network, learning begins with the presentation of an input pattern. Initially the activations of all units in the net are set to zero, and all units in the F2 layer are inactive. On presentation of a pattern, the input signals are sent continuously until the learning trial is completed. The vigilance parameter controls the degree of similarity of the patterns assigned to the same cluster unit, and the reset mechanism controls the state of each node in the F2 layer.

Each node can exist in one of three states:

i) Active: the unit is ON. The activation is equal to d; for ART1, d = 1, and for ART2, d lies between 0 and 1.

ii) Inactive: the unit is OFF. The activation is zero, but the unit may be available to participate in competition.

iii) Inhibited: similar to inactive, but the unit is prevented from participating in further competition during the presentation of the current input vector.

4.5.2.2 Types of learning in ART

Learning can be performed in two ways: fast learning and slow learning.

Fast learning: weight changes occur rapidly relative to the length of time a pattern is presented in a particular trial, and in each trial the weights reach equilibrium.

Slow learning: weight updates take place slowly relative to the time taken for a trial, and the weights do not reach equilibrium in each trial. More patterns have to be presented for slow learning than for fast learning, but a minimal number of calculations occurs in each learning trial. In fast learning, the network is stabilized when each pattern chooses its correct cluster unit.

The ART1 network receives binary patterns, so the weights associated with each cluster unit stabilize in the fast learning mode. In the ART2 network, the weights produced by fast learning continue to change each time a pattern is presented, and the net is found to be stable only after a few presentations of each training pattern. In the slow learning process, the weight changes do not reach equilibrium during any particular learning trial, and more trials are required before the net stabilizes. Fast learning suits ART1, whereas slow learning is better for ART2.

4.5.2.3 ART algorithm

The ART algorithm discovers clusters in a set of pattern vectors. The basic steps in the algorithm (for ART and its variations) are given below.

Step 0: Initialize the necessary parameters.

Step 1: Perform Steps 2-9 while the stopping condition is false.

Step 2: Perform Steps 3-8 for each input vector.

Step 3: Process the F1 layer.

Step 4: Perform Steps 5-7 while the reset condition is true.

Step 5: Find the victim unit to learn the current input pattern. The victim unit is the F2 unit with the largest net input.

Step 6: F1(b) units combine their inputs from F1(a) and F2.

Step 7: Test for the reset condition. If reset is true, the current victim unit is rejected; go to Step 4. If reset is false, the current victim unit is accepted for learning; go to Step 8.

Step 8: Update the weights.

Step 9: Test for the stopping condition.

4.5.3 Adaptive resonance theory 1(ART1)

The ART1 network is designed for binary patterns. The basic architecture is made up of two kinds of units:

i) Computational units

ii) Supplemental units

Figure 4.3 Supplemental units of ART1: gain control units G1 and G2 and reset unit R exchange excitatory (+) and inhibitory (−) signals with the F1(a) layer (input portion), the F1(b) layer (interface portion) and the F2 layer (cluster units), which are linked by bottom-up weights bij and top-down weights tji.

The computational units are similar to the fundamental architecture shown in Fig. 4.2. One drawback of the computational units is that they must respond differently at different stages of the process, and no biological neuron supports such deciding of what to do when. The other drawback is that the operation of the reset mechanism is not well defined for implementation in a neural system.

These drawbacks are rectified by the introduction of two supplemental gain control units G1 and G2, along with a reset control unit R, to provide efficient neural control of the learning process, as shown in Fig. 4.3. These three units receive signals from, and send signals to, all of the units in the input layer and cluster layer.

4.5.3.1 Training process of ART1

Initially, a binary input vector s is presented to the F1(a) layer, and the signals are sent to the corresponding X units, i.e. the F1(b) layer. Each F1(b) unit sends its activation to the F2 layer over the weighted interconnection paths, and each F2 unit then calculates its net input. The unit with the largest net input is selected as the winner and is given activation 1; the activations of all other units are 0. The winning unit is identified by its index J, and only this winner is allowed to learn the current input pattern. A signal is then sent from the F2 layer back to the F1(b) layer over the top-down weights. The X units in the interface portion F1(b) remain on only if they receive a nonzero signal from both the F1(a) and F2 layer units. The norm of the vector x gives the number of components in which the top-down weight vector for the winning F2 unit and the input vector s are both 1; this is called the match. The match ratio is the norm of x divided by the norm of s.

Match ratio: ||x|| / ||s|| (4.7)

Reset condition: if

||x|| / ||s|| ≥ ρ (4.8)

where ρ is the vigilance parameter, then both the top-down and bottom-up weights are adjusted. If

||x|| / ||s|| < ρ (4.9)

then the current winning cluster unit is inhibited and the activations of the F1 units are reset to zero.

This process is repeated until a satisfactory match is found or until all the units are inhibited. At the end of each presentation, all cluster units are returned to the inactive state so that they can participate in the next competition.
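The search-and-learn cycle above can be sketched as a compact fast-learning ART1 clustering routine. This is a hedged sketch, not the text's own implementation: the initializations tji = 1 and bij = 1/(1+n), the fast-learning updates tJ = x and bJ = L·x/(L−1+||x||) with L = 2 follow common ART1 presentations, and the pattern set and vigilance value are illustrative assumptions.

```python
import numpy as np

def art1(patterns, rho=0.7, max_clusters=10, L=2):
    """Fast-learning ART1 sketch: binary input s, top-down weights t (init 1),
    bottom-up weights b (init 1/(1+n)); the match ratio ||x||/||s|| (Eq. 4.7)
    is tested against the vigilance rho (Eqs. 4.8-4.9)."""
    n = len(patterns[0])
    b = np.full((max_clusters, n), 1.0 / (1.0 + n))  # bottom-up weights bij
    t = np.ones((max_clusters, n))                   # top-down weights tji
    assignment = []
    for s in map(np.asarray, patterns):
        # F2 units ranked by net input; trying them in order emulates
        # inhibiting a rejected victim and re-running the competition
        for J in np.argsort(-(b @ s), kind="stable"):
            x = s * t[J]                             # F1(b): input AND top-down signal
            if x.sum() / s.sum() >= rho:             # Eq. (4.8): accept and learn
                t[J] = x                             # fast-learning weight updates
                b[J] = L * x / (L - 1.0 + x.sum())
                assignment.append(int(J))
                break
        else:                                        # Eq. (4.9) held for every unit
            assignment.append(-1)                    # all units inhibited
    return assignment

patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
print(art1(patterns))   # → [0, 1, 2]
```

With ρ = 0.7, the second pattern first selects cluster 0 but fails the vigilance test (match ratio 2/3), so it is reset onto a fresh unit: exactly the "repeat until a satisfactory match is found" behavior described above.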

4.5.4 Adaptive resonance theory 2 (ART2)

ART2 was developed for continuous-valued input vectors (Carpenter and Grossberg, 1987b). The ART2 network is designed to self-organize recognition categories for analog as well as binary input sequences. The main difference between the ART1 and ART2 networks is the input layer: ART2 has higher network complexity because much processing is needed in the F1 layer.

To achieve stability for analog inputs, a three-layer feedback system is added in the input layer. The layers are:

Bottom layer: The input patterns are presented through this layer.

Middle layer: Top and bottom patterns are combined together to form a matched pattern which is then fed back to top and bottom input layers.

Top layer: The inputs coming from the output layer (F2 layer) are read in.

The architecture of ART2 is shown in Figure 4.4. The F1 layer consists of six types of units, W, X, U, V, P and Q, and there are n units of each type. The supplemental part of the connection between W and X is shown in Figure 4.5.

Figure 4.4 Architecture of the ART2 network: the input pattern si enters the F1 layer units Wi, Xi, Ui, Vi, Pi and Qi (with internal signals a·ui, b·f(qi), f(xi) and c·pi); the Pi units connect to the cluster units Y1, ..., Yj, ..., Ym of the F2 layer through bottom-up weights bij and top-down weights tji, and a reset unit Ri monitors the match.

The supplemental unit N between the W and X units receives signals from all W units, computes the norm of the vector w, and sends this signal to each of the X units as an inhibitory signal. Each X unit (X1, X2, ..., Xn) also receives an excitatory signal from the corresponding W unit. Supplemental units performing the same operation exist between U and V and between P and Q. Each X unit and Q unit is connected to a V unit. The connections between the Pi units of the F1 layer and the Yj units of the F2 layer are the weighted interconnections, which multiply the signals transmitted over those paths. The winning F2 unit's activation is d (0 < d < 1). Normalization to approximately unit length is performed between W and X, between V and U, and between P and Q.

Figure 4.5 Supplemental part of the connection between W and X: unit N receives signals from all W units (W1, ..., Wi, ..., Wn), computes the norm ||w||, and sends an inhibitory signal to each X unit (X1, ..., Xi, ..., Xn).

The operations performed in the F2 layer are the same for ART1 and ART2: the units in the F2 layer follow a winner-take-all policy to learn each input pattern. The testing of the reset condition, however, differs between the ART1 and ART2 networks.

The main advantages of ART2 (Kuo et al., 2004) are:

1. Rapid learning and adaptation in a non-stable environment,

2. Stability and plasticity,

3. Unsupervised learning of preference behavior that the target does not know initially, and

4. Deciding the number of clusters exactly and automatically.

4.5.5 Adaptive resonance theory 3 (ART3)

Carpenter and Grossberg (1990) proposed ART3, a hierarchical ART2 network. It is a model that implements parallel search of compressed or distributed pattern recognition codes in a neural network hierarchy. The computational properties of the chemical synapse (transmitter accumulation, release, inactivation and modulation) are emulated in the search process. The search process works well in either fast or slow learning mode and discovers appropriate representations of a non-stationary input environment.


