Problems Of Automatic Image Annotation Computer Science Essay


Shweta Kackade, Prof. S. A. Takale, VPCOE Baramati, Pune University.

Abstract

This work addresses the problem of automatic image annotation (AIA) for the purpose of image retrieval in an Annotation Based Image Retrieval (ABIR) system. Specifically, it studies different models of image representation in the AIA area and combines them into a model that captures both the global information and the details of objects. The proposed approach is composed of four main stages. With the exponential growth of Web 2.0 applications, tags have been broadly used to describe images on the web. These tags usually contain user-generated noise, which makes the image retrieval task quite difficult. Content-based image retrieval can provide fruitful results, so it is employed to improve the retrieval results; however, it is challenging to associate image contents and tags semantically. To attack this problem, an integrated framework is proposed. First, a unified graph is built that fuses the visual-feature-based and tag-based image similarity graphs with the image-tag bipartite graph. Then a novel random walk model is proposed, which uses transition probabilities between image content and tags. The presented framework naturally integrates the pseudo relevance feedback process and can be directly applied to applications such as content-based image retrieval, text-based image retrieval, and image annotation.

Key terms

Automatic Image Annotation, Content-based image retrieval, image annotation, Random walk, text-based image retrieval

1. Introduction

Automatic image annotation (AIA) has been studied extensively for several years. As defined by Wikipedia: "Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of a text description or keywords to a digital image." Image annotation is used in image retrieval systems to organize and locate images of interest in a database. It can also be viewed as a type of multi-class image classification with a very large number of classes. AIA is therefore a challenging task and an open problem in multi-class object recognition. This paper focuses on the image annotation task as a means of helping the image retrieval task.

Figure 1: Taj Mahal image extracted from the Flickr website

Fig. 1 shows an example image extracted from a social networking site together with its user-generated tags. The photo was taken in Agra, India, and is described by users with the tags Taj, The Taj Mahal, Palace, and Farewell, which are all semantically relevant to this image. Current content-based image retrieval methods find it difficult to produce image search results with annotations/tags; hence, tag data is an ideal source for improving many tasks in image retrieval. Unfortunately, tags predictably contain noise injected during the manual labelling process. As shown in Fig. 1, the last tag associated with this photo is Farewell. To the owner who submitted the photo this tag is obviously not noisy, since the photo was probably taken while the owner was enjoying a farewell. In terms of the image retrieval task, however, the tag Farewell is most probably noise.

Need for Automatic Image Annotation:

With the huge growth of digital pictures there is a need for an efficient image management system that supports the end user in quickly searching, browsing and tagging images. The accuracy of current CBIR systems is still inadequate for real-world applications. Image retrieval based on text is sometimes called Annotation Based Image Retrieval (ABIR). However, ABIR systems also have some shortcomings:

1. Manual image annotation is time consuming and costly.

2. Human annotation is subjective, and images are sometimes difficult to describe.

Challenges in Image Annotation:

AIA is extremely useful in many fields such as image analysis, machine learning, media understanding and information retrieval. In image analysis, AIA is usually performed by computing feature vectors and training words/tags/keywords with machine learning techniques; annotating a new image is possible only after this learning process. The underlying task of scene understanding and object recognition for semantic prediction is challenging. This is well illustrated by face detection: accuracy there is only around 85%, even though the face is one of the easiest objects to detect. Other problems in AIA include recognizing many objects at the same time in a large image database, and the famous problem of bridging the semantic gap between an image and its relevant tags.

More on AIA:

An application of an AIA system can be imagined as a machine that tells a story or produces a text description. Consider a machine given a number of related images associated with text descriptions. Using contextual information such as personal information, time and location, together with machine learning techniques, the goal is to generate a short story with respect to the image content.

2. Related Work

Image annotation has been carried out with popular statistical models such as the co-occurrence model, the translation model and relevance models. Basically, given a set of keywords and an image, the probabilities of keywords given the image (or specific regions of the image) are computed, and words are assigned to images according to the highest probabilities. One major disadvantage of these methods is the computational cost of parameter estimation, i.e. the learning process. Mori et al. [2] used a co-occurrence model which predicts the co-occurrence of words with image regions using a regular grid: images were divided into rectangular tiles of the same size using grid-based segmentation, low-level features (color and texture) were extracted for each tile, the tiles were clustered with a traditional clustering algorithm, and each cluster was associated with a set of labels, inherited from the original images, using probability calculations. Jeon et al. [3] improved the performance of the previous approach using a joint probability calculation. Jeon's performance was further improved by the generative language model of Duygulu et al. [4],[5], also called the Cross Media Relevance Model (CMRM). Instead of dividing images into tiles, this model divides images into blobs; it also considers keyword-based image similarity for annotation, retrieval and ranked retrieval. The CMRM method shows better experimental results than [1] and [2]. The use of a visual dictionary with visual similarity assessment was carried out by C. Djeraba [6]. More work has been carried out to bridge the semantic gap that exists between image content and its keywords [4], [5], [7], [8], [9]; this line of work focuses on content-based image retrieval and image annotation for better performance. A distance semantic map is constructed using distance metric learning for measuring textual space similarity [10]. Semantic concepts such as semantic distance estimation, concept clustering and image annotation are handled with visual correlation [11],[12].

Aiming at the above challenges, we develop a user-friendly application in which a user can easily and quickly retrieve the images he wants. The overall framework is as follows: we first build an image similarity graph based on visual similarity and user-contributed tags; then a novel random walk model is proposed for finding the transition probabilities between image-image, image-tag, tag-image and tag-tag pairs; finally, relevance feedback is used for refining the result scores. In summary, the project consists of feature extraction of images, similarity matching using hybrid graph construction, a random walk model on the hybrid graph, and finally relevance feedback. We conduct experiments on the Corel image database with user-contributed tags. The system interacts with users through a user-friendly GUI, which is self-contained and includes a Help function for new users. The software interface uses the NetBeans 7.0 IDE, with Java as the front-end client and MS Access as the back end. The project does not require any external hardware apart from a mouse and keyboard to provide inputs in the form of images; it may also require a scanner to scan external images.


3. Programmer’s design

Figure 2: System Architecture

3.1. Problem Statement

With the use of social networking sites such as Facebook and Flickr, users share huge amounts of digital images with relevant tags. These tag clouds are extremely useful for the image retrieval and image annotation tasks. Visual features of images and user-contributed tags/keywords are two sides of the same coin; hence, both can be used for efficient image retrieval and image annotation. The main challenges are how to improve content-based image retrieval and how to correlate tags with images.

3.2. Solving approach

Aiming at the above problem, a novel framework is proposed for content-based image retrieval which also supports text-based image retrieval and image annotation. Figure 2 shows the system architecture and its processes. Two datasets, an image dataset and a tag dataset, are collected from a social networking site; the tag dataset reflects the co-occurrence frequencies of tags with the images in the image dataset. An image similarity graph is constructed based on the visual similarity and the tag similarity of images, where the weight on each edge reflects the similarity between two images. For visual similarity assessment four types of global features are used, while for tag similarity the co-occurrence frequencies are used. Once the image similarity graph is built, an image-tag graph is constructed using the co-occurrence frequencies of tags with images. We then find relations between tags using a tag graph, for better image annotation. Once all graphs are constructed, a random walk model, which uses spectral clustering, is applied on the graph. Finally, pseudo relevance feedback is applied on the results to improve the scores.

The graphs mentioned above can be visualized as shown in the figures below. The first graph is the Image-Image graph, in which images are represented by blue nodes. The Image-Tag and Tag-Image relations are represented by a bipartite graph, where image nodes are blue and tag nodes are red. The Tag-Tag graph contains only red (tag) nodes.

Figure 3: Image to Image Graph

Figure 4: Image-Tag and Tag-Image Bipartite Graph


Figure 5: Tag Graph

3.3. Proposed System

In this project several processes are used to retrieve an image. The main processes are listed below:

1. Global Feature Extraction

2. Graph Construction

3. Random Walk Model

4. Pseudo relevance feedback

Global Feature Extraction

We extract only four types of global features, since they are more efficient to compute and store than local image features [14].

A. Grid Color Moment (GCM):

A color histogram is usually computed for an image; however, it cannot capture the local relationships between colors and therefore cannot always distinguish between objects. This motivates the grid color moment. The image is first partitioned into 9 grid cells, and the color mean, color variance and color skewness are extracted for each cell. With three color channels per cell, this yields an 81-dimensional color feature vector (9 cells x 3 channels x 3 moments).
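A minimal Java sketch of this step is given below, assuming a 3x3 grid over the RGB channels; since the exact moment formulas are not spelled out here, the standard definitions of mean, variance and skewness are used.

```java
import java.awt.image.BufferedImage;

/** Sketch: 3x3 grid colour moments (mean, variance, skewness per RGB channel),
 *  giving 9 cells x 3 channels x 3 moments = 81 dimensions. */
public class GridColorMoment {

    public static double[] extract(BufferedImage img) {
        double[] feat = new double[81];
        int gw = img.getWidth() / 3, gh = img.getHeight() / 3;   // cell size (remainder pixels ignored)
        int idx = 0;
        for (int gy = 0; gy < 3; gy++) {
            for (int gx = 0; gx < 3; gx++) {
                for (int c = 0; c < 3; c++) {                    // c = 0 red, 1 green, 2 blue
                    int n = gw * gh;
                    double mean = 0, var = 0, m3 = 0;
                    for (int y = gy * gh; y < (gy + 1) * gh; y++)
                        for (int x = gx * gw; x < (gx + 1) * gw; x++)
                            mean += channel(img.getRGB(x, y), c);
                    mean /= n;
                    for (int y = gy * gh; y < (gy + 1) * gh; y++)
                        for (int x = gx * gw; x < (gx + 1) * gw; x++) {
                            double d = channel(img.getRGB(x, y), c) - mean;
                            var += d * d;
                            m3 += d * d * d;
                        }
                    var /= n;
                    m3 /= n;
                    double std = Math.sqrt(var);
                    feat[idx++] = mean;
                    feat[idx++] = var;
                    feat[idx++] = (std > 1e-9) ? m3 / (std * std * std) : 0.0;  // skewness
                }
            }
        }
        return feat;
    }

    private static int channel(int rgb, int c) {
        return (rgb >> (16 - 8 * c)) & 0xFF;
    }
}
```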

B. Local Binary Pattern (LBP):

The local binary pattern [15] is a texture feature that is invariant to monotonic changes in gray scale. Here we compute a 59-dimensional LBP histogram.
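For illustration, a basic 3x3 LBP operator is sketched below. This version accumulates the full 256-bin code histogram; the 59-dimensional descriptor used here additionally applies the uniform-pattern mapping of [15], which is omitted from the sketch.

```java
/** Sketch: basic 3x3 LBP histogram over a greyscale image (256 bins).
 *  The uniform-pattern mapping that reduces this to 59 bins is not shown. */
public class LocalBinaryPattern {

    // 8-neighbour offsets, clockwise from the top-left neighbour
    private static final int[] DX = {-1, 0, 1, 1, 1, 0, -1, -1};
    private static final int[] DY = {-1, -1, -1, 0, 1, 1, 1, 0};

    public static double[] histogram(int[][] gray) {
        double[] hist = new double[256];
        for (int y = 1; y < gray.length - 1; y++) {
            for (int x = 1; x < gray[0].length - 1; x++) {
                int code = 0;
                for (int k = 0; k < 8; k++) {
                    // set bit k if the neighbour is at least as bright as the centre pixel
                    if (gray[y + DY[k]][x + DX[k]] >= gray[y][x]) code |= (1 << k);
                }
                hist[code]++;
            }
        }
        return hist;
    }
}
```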

C. Gabor Wavelet Texture:

This is also a texture feature, computed by filtering the image with Gabor wavelets at 5 scales and 8 orientations, which yields 40 sub-images. Three moments are then calculated for each sub-image [16][17], giving a 120-dimensional feature vector (40 x 3).

D. Edge:

Each image is first converted to gray scale, and a Canny edge detector is then applied to obtain a 37-dimensional edge histogram feature vector [18]. Combining these four feature vectors gives a 297-dimensional feature vector for each image (81 + 59 + 120 + 37). We employ zero-mean, unit-variance normalisation of the feature vectors.
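A short sketch of this normalisation step is shown below; it applies a per-dimension z-score computed over the whole dataset of concatenated vectors, which is one straightforward reading of the zero-mean, unit-variance technique mentioned above.

```java
/** Sketch: per-dimension zero-mean / unit-variance normalisation of the
 *  concatenated 297-dim vectors (81 GCM + 59 LBP + 120 Gabor + 37 edge). */
public class FeatureNormalizer {

    public static void zScoreInPlace(double[][] feats) {
        int n = feats.length, d = feats[0].length;
        for (int j = 0; j < d; j++) {
            double mean = 0, var = 0;
            for (double[] f : feats) mean += f[j];
            mean /= n;
            for (double[] f : feats) var += (f[j] - mean) * (f[j] - mean);
            double std = Math.sqrt(var / n);
            for (double[] f : feats) {
                f[j] = (std > 0) ? (f[j] - mean) / std : 0.0;   // guard against constant dimensions
            }
        }
    }
}
```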

Graph Construction: Once the high-dimensional feature vectors are calculated, we proceed with the construction of the different graphs: the image similarity graph, the image-tag graph, the tag-image graph and the tag-tag graph. To construct these graphs we use the K-nn graph technique. In the image similarity graph, nodes represent image feature vectors while edge weights correspond to visual similarity and tag-based similarity. In the Image-Tag and Tag-Image graphs there are two types of nodes, images and tags, and the edges represent the co-occurrence frequencies of tags with images. In the Tag-Tag graph the nodes are tags only, and the edges represent the co-occurrence of two tags on visually similar images.
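As a rough sketch of the image similarity graph, the code below connects each image to its K most similar neighbours, using the cosine similarity of the normalised feature vectors as the edge weight; fusing this visual weight with the tag-based similarity weight, as described above, is left out for brevity.

```java
import java.util.Arrays;

/** Sketch: weighted K-nn image similarity graph over normalised feature vectors. */
public class ImageGraphBuilder {

    /** Returns an n x n weight matrix W with W[i][j] != 0 only when j is one of
     *  the K nearest neighbours of i by cosine similarity. */
    public static double[][] build(double[][] feats, int k) {
        int n = feats.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double[] sims = new double[n];
            for (int j = 0; j < n; j++) {
                sims[j] = (i == j) ? Double.NEGATIVE_INFINITY : cosine(feats[i], feats[j]); // no self-edge
            }
            Integer[] order = new Integer[n];
            for (int j = 0; j < n; j++) order[j] = j;
            // keep the K highest-similarity edges for node i
            Arrays.sort(order, (a, b) -> Double.compare(sims[b], sims[a]));
            for (int r = 0; r < Math.min(k, n - 1); r++) {
                int j = order[r];
                w[i][j] = sims[j];
            }
        }
        return w;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int t = 0; t < a.length; t++) { dot += a[t] * b[t]; na += a[t] * a[t]; nb += b[t] * b[t]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }
}
```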

Random Walk Model:

Many web applications have studied the random walk model for transition probability calculations. We define our random walk model as follows. Let G(V, E) denote a directed hybrid graph, where V is the vertex set, D ⊂ V is the set of image nodes and T ⊂ V is the set of tag nodes. E = E'' ∪ E+ ∪ E* is the edge set, which consists of three types of edges: if edge E_ij is in E+, then i ∈ D and j ∈ D; if E_ij is in E*, then i ∈ D and j ∈ T, or i ∈ T and j ∈ D; and if E_ij is in E'', then i ∈ T and j ∈ T. The transition probabilities over these edges are defined as


$$R_{ij} = \frac{f_{ij}}{f_i + f_j - f_{ij}} \qquad (1)$$

$$P_{t_{p+1}\mid t_p}(j \mid i) =
\begin{cases}
\dfrac{A_{ij}}{\sum_{p \in T} A_{ip}}, & i \in D,\; j \in T \\[1.5ex]
\dfrac{A_{ij}}{\sum_{p \in D} A_{ip}}, & i \in T,\; j \in D \\[1.5ex]
\dfrac{\mathrm{Sim}(V_i, V_j)}{\sum_{p \in D} \mathrm{Sim}(V_i, V_p)}, & i \in D,\; j \in D
\end{cases} \qquad (2)$$

where $A_{ij}$ is the number of times tag node $V_j$ has been assigned to image node $V_i$; $\sum_{p \in T} A_{ip}$ is the total number of times image node $V_i$ has been tagged; $\sum_{p \in D} A_{ip}$ is the total number of times tag node $V_i$ has been assigned to any image; $f_i$ and $f_j$ are the occurrence counts of tags $i$ and $j$; and $f_{ij}$ is their co-occurrence count.
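To make Eqs. (1) and (2) concrete, the following minimal sketch computes the image-to-tag, tag-to-image and image-to-image transition probabilities from a tag-assignment count matrix A (with A[i][t] the number of times tag t was assigned to image i) and a visual similarity matrix; all names here are illustrative.

```java
/** Sketch: transition probabilities on the hybrid graph (Eqs. 1 and 2). */
public class TransitionModel {

    /** P(tag j | image i): A[i][j] normalised over all tags assigned to image i. */
    public static double imageToTag(int[][] A, int i, int j) {
        double total = 0;
        for (int t = 0; t < A[i].length; t++) total += A[i][t];
        return total > 0 ? A[i][j] / total : 0.0;
    }

    /** P(image j | tag i): A[j][i] normalised over all images carrying tag i. */
    public static double tagToImage(int[][] A, int i, int j) {
        double total = 0;
        for (int d = 0; d < A.length; d++) total += A[d][i];
        return total > 0 ? A[j][i] / total : 0.0;
    }

    /** P(image j | image i): Sim(V_i, V_j) normalised over all images. */
    public static double imageToImage(double[][] sim, int i, int j) {
        double total = 0;
        for (double s : sim[i]) total += s;
        return total > 0 ? sim[i][j] / total : 0.0;
    }

    /** Eq. (1): tag-tag co-occurrence similarity R_ij = f_ij / (f_i + f_j - f_ij). */
    public static double tagSimilarity(int fi, int fj, int fij) {
        int denom = fi + fj - fij;
        return denom > 0 ? (double) fij / denom : 0.0;
    }
}
```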

3.4. Mathematical Model

Let the system be represented by S.

1. Input: two sets I and T, where
I = {i1, i2, i3, ... | i ∈ all images in the image dataset}
T = {t1, t2, t3, ... | t ∈ all tags in the tag dataset}

2. Output: a set R of similar images, where
R = {r1, r2, r3, ... | r ∈ annotated similar images}

3. Processes: the set P of processes in the system, where
P = {Feature_p ∪ SimMatch_p ∪ Graph_p ∪ RW_p ∪ Relevance_p}

We distinguish the processes as Feature Extraction (Feature_p), Similarity Matching (SimMatch_p), Graph Construction (Graph_p), Random Walk (RW_p), and Pseudo Relevance Feedback (Relevance_p).

As mentioned above, the Feature Extraction process extracts four types of global features. Therefore,

Feature_p = {GCM, LBP, Gabor, CannyEdge}, where
GCM = Grid Color Moment
LBP = Local Binary Pattern
Gabor = Gabor Wavelet
CannyEdge = Canny Edge Detector.

After feature extraction, each image is represented by a feature vector Fv. The next process, Similarity Matching, is defined as follows: it matches images by applying the cosine function to their feature vectors, so its input is the set of feature vectors Fv_i:

SimMatch_p = {Fv_1, Fv_2, Fv_3, ..., Fv_n}

$$\mathrm{CosineSim}(Fv_i, Fv_j) = \frac{Fv_i \cdot Fv_j}{\lVert Fv_i \rVert\, \lVert Fv_j \rVert} \qquad (3)$$

The Graph Construction process consists of the construction of three different types of graphs:

1. Image-Image similarity graph

2. Image-Tag graph

3. Tag-Tag graph

Therefore Graph_p can be defined as

Graph_p = {Image-Image Graph ∪ Image-Tag Graph ∪ Tag-Tag Graph}

Once Graph Construction is over, the next process is the Random Walk model, which computes the transition probabilities for all graphs built in the Graph Construction process. Finally, Pseudo Relevance Feedback is applied to the output of the Random Walk model to refine the results. These processes are defined as follows:

RW_p = {Trans-Image-Image ∪ Trans-Image-Tag ∪ Trans-Tag-Tag}

where Trans-Image-Image, Trans-Image-Tag and Trans-Tag-Tag are the transition probabilities defined in Eqs. (1) and (2).

Relevance_p = {TextScore ∪ ImageScore ∪ PRFScore}

where
TextScore = score based on tag similarity
ImageScore = score based on image similarity
PRFScore = score based on pseudo relevance feedback.

Thus, the system S is represented by

S = {Feature_p ∪ SimMatch_p ∪ Graph_p ∪ RW_p ∪ Relevance_p}


3.5. Dynamic Programming and Serialization

There are many ways to construct the graphs, e.g. the ε-nn graph, the exp-weighted graph and the K-nn graph. We choose the K-nn graph [13] since it is easy to understand and performs well in practice. Some advantages of the K-nn approach are that it avoids assumptions about the underlying data distribution (it is non-parametric) and that it does not use the training data points for generalization, i.e. there is no explicit training phase, so in that respect it is a fast algorithm. It makes decisions based on the entire training data set (at best, a subset of it), so although the training phase is minimal, the testing phase is costly in terms of both time and memory: more time may be needed because, in the worst case, all data points take part in the decision, and more memory is needed because all training data must be stored.

Algorithm: K-nn

Input: query image.

Output: K nearest neighbours of the query image.

Step 1: Read the query image and set K = 20.

Step 2: Compare the query image against the training image dataset; the comparison is based on feature vector matching using cosine similarity.

Step 3: Return the 20 images closest to the query image.
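A direct rendering of these steps might look as follows; the similarity measure is the cosine of Eq. (3), and the small main() uses toy 3-dimensional vectors purely to show the call (K = 20 in the actual system).

```java
import java.util.Arrays;

/** Sketch: return the indices of the K training images most similar to the
 *  query image, using cosine similarity over the feature vectors (Eq. 3). */
public class KnnSearch {

    public static int[] nearest(double[] query, double[][] dataset, int k) {
        int n = dataset.length;
        double[] sims = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            sims[i] = cosine(query, dataset[i]);
            order[i] = i;
        }
        // sort candidate indices by descending similarity and keep the first k
        Arrays.sort(order, (a, b) -> Double.compare(sims[b], sims[a]));
        int[] top = new int[Math.min(k, n)];
        for (int r = 0; r < top.length; r++) top[r] = order[r];
        return top;
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int t = 0; t < a.length; t++) { dot += a[t] * b[t]; na += a[t] * a[t]; nb += b[t] * b[t]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    public static void main(String[] args) {
        double[][] dataset = { {1, 0, 0}, {0.9, 0.1, 0}, {0, 1, 0} };
        double[] query = {1, 0.05, 0};
        // with K = 2 this prints the two most similar images: [0, 1]
        System.out.println(Arrays.toString(nearest(query, dataset, 2)));
    }
}
```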

3.6. Data independence and Data Flow architecture

Feature extraction for the whole image dataset is done in advance to avoid computational overhead; when a query image arrives, a feature vector is calculated only for the query image. A search for the nearest cluster is then initiated using the K-nn algorithm. Once the cluster is found, repeated random walks are performed over the image-image and image-tag graphs to find the top-ranked images and tags; the random walk is repeated to refine the result. The level-one data flow diagram of the proposed architecture is shown in Figure 6.

Figure 6: Data flow Architecture
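The exact update rule of the repeated random walk is not spelled out above, so the sketch below uses a common random-walk-with-restart formulation, s ← α Pᵀ s + (1 − α) q, purely as an illustration of how scores can be propagated and refined over the hybrid graph; here P is a row-stochastic transition matrix built from the probabilities defined earlier and q is the query (restart) distribution.

```java
/** Sketch: generic random-walk-with-restart score propagation over a
 *  row-stochastic transition matrix P (illustrative formulation only). */
public class RandomWalkScorer {

    /** q: restart distribution over nodes; alpha: probability of continuing the walk. */
    public static double[] walk(double[][] P, double[] q, double alpha, int iters) {
        int n = q.length;
        double[] s = q.clone();
        for (int it = 0; it < iters; it++) {
            double[] next = new double[n];
            // next[j] = alpha * sum_i s[i] * P[i][j] + (1 - alpha) * q[j]
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    next[j] += alpha * s[i] * P[i][j];
            for (int j = 0; j < n; j++) next[j] += (1 - alpha) * q[j];
            s = next;
        }
        return s;   // higher score = more relevant image or tag node
    }
}
```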

3.7. Turing Machine

The state transition diagram is shown in Figure 7. Initially the system is idle. The user enters a query for image searching. In the system, images are classified using the K-nn graph and the random walk model, which uses spectral clustering. If the query image is new to the system, it is preprocessed according to the system flow and the database is updated. The image is then downloaded along with its suggested tags or keywords.

4. Results and Discussion

Clustering of images is a well-known problem in image retrieval systems. In this paper we use spectral clustering, by means of graph construction and the random walk model; it performs better than traditional algorithms such as K-means or single-linkage clustering. We perform the experimental analysis on the Corel image dataset, which consists of visually similar images and therefore demonstrates the effectiveness of content-based image retrieval.

Figure 7: State Transition Diagram

For the purpose of image annotation we manually train the images and contribute user-defined tags; we also add some noise to show the robustness of the image annotation system. Figure 8 shows the retrieval results for given query images: in each row the first image is the query image while the others are the retrieved results. Figure 9 shows the image annotation results.

Figure 8: Results for Content Based Image Retrieval

5. Conclusion

A novel framework based on a Markov random walk is thus proposed, which tries to bridge the semantic gap between visual features and their associated textual tags. In future work we will try to incorporate an indexing scheme for fast image retrieval and to use additional features, such as the metadata available on social networking sites (time, geo-location and personal information).

Figure 9: Image Annotation Results


