Text String Detection From Natural Scenes

02 Nov 2017


Mr. Rampurkar Vyankatesh Vijaykumar, VPCOE Baramati.

Mrs. Gyankamal J. Chhajed, Asst. Prof., VPCOE Baramati.

Abstract

In this paper we present different methods for finding strings of characters in natural scene images: extraction of character string regions from scenery images based on siblings and thickness of characters; an efficient binarization and enhancement technique followed by a suitable connected component analysis procedure; text string detection from natural scenes by structure-based partition and grouping; and a robust algorithm for text detection in images. It is assumed that characters have closed contours and that a character string consists of characters which, in most cases, lie on a straight line. Therefore, by extracting closed contours and searching their neighbors, character string regions can be extracted. Image binarization successfully processes natural scene images having shadows, non-uniform illumination, low contrast, and large signal-dependent noise. Connected component analysis is used to define the final binary images that mainly consist of text regions. One technique chooses candidate text characters from connected components by gradient and color features. Color-based partition performs better than gradient-based partition, but it takes more time to detect text strings on each color layer. Text line grouping is able to extract text strings with arbitrary orientations. The combination of color-based partition and adjacent character grouping gives the best performance.

Key Terms

Connected component analysis, adjacent character grouping, image partition, text line grouping, text string detection, color reduction technique, edge detection.

1. Introduction

Text information in natural scene images serves as an important clue for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text in a complex background with multiple colors is a challenging task. Many objects in our living environment have characters written on them, and we humans get much information from these texts. It is expected that robots will act in our living environment and support us in the future. If robots can read the text on objects such as packages and signs, they can extract information from it and use that information in their actions and in supporting us. Owing to the progress of OCR, computers have become able to read text in images. However, images contain many non-character textures, and these make it difficult for OCR to read the text. To cope with this problem, we need to extract character string regions from images. Indexing images or videos requires information about their content, and this content is often strongly related to the textual information appearing in them, which can be divided into two groups. Text appearing accidentally in an image, which usually does not represent anything important related to the content of the image, is referred to as scene text. Text produced separately from the image, which is in general a very good key to understanding the image, is called artificial text. In contrast to scene text, artificial text is not only an important source of information but also a significant entity for indexing and retrieval purposes. Natural scene images contain text information which often needs to be automatically recognized and processed. Localization of the text and simplification of the background are the main objectives of automatic text detection approaches. However, text localization in complex images is an intricate process due to the often bad quality of images and the different backgrounds, fonts, colors, and sizes of the texts appearing in them. In order to be successfully recognizable by an OCR system, an image containing text must fulfill certain requirements, such as a monochrome text and background with high background-to-text contrast. This paper strives toward methodologies that aid automatic detection, segmentation, and recognition of visual text entities in complex natural scene images.

The algorithms for text extraction from images can be broadly classified into three types: gradient feature based, color segmentation based, and texture analysis based. Gradient feature based algorithms rely on the idea that pixels with high gradient are candidate character pixels, since edges exist between a character and the background. In this paper, Chucai Yi and YingLi Tian propose a new framework to extract text strings with multiple sizes and colors, and arbitrary orientations, from scene images with complex and cluttered backgrounds [3]. The proposed framework consists of two main steps: a) image partition to find text character candidates based on gradient feature and color uniformity, for which Chucai Yi and YingLi Tian propose two methods to partition scene images into binary maps of non-overlapped connected components, a gradient-based method and a color-based method; and b) character candidate grouping to detect text strings based on the joint structural features of the text characters in each text string, such as character sizes, distances between two neighboring characters, and character alignment, for which they propose two methods of structural analysis of text strings, the adjacent character grouping method and the text line grouping method.

Figure 1: Examples of text in natural scene images [3]

2. Related Work

One of the characteristics of common characters in real images is that most of them produce a closed contour when an edge extraction process is applied. Tomohiro Nishino [1] therefore takes the approach of detecting closed contours in images. Moreover, it is assumed that a character string consists of characters which lie on a straight line. From these assumptions, character strings should be found in regions where closed contours are arranged regularly. Assuming that the characters included in a character string are aligned horizontally, string regions can be extracted by detecting horizontally aligned closed contours. Tomohiro Nishino explains how to detect horizontally aligned closed contours as follows. First, the circumscribed rectangle of a closed contour is calculated. Next, the rectangle is slid to the right by as many pixels as its own width. If the slid rectangle includes the center of the circumscribed rectangle of another closed contour, these two closed contours are assumed to be aligned horizontally and to be included in the same character string. Closed contours which are isolated are assumed not to be characters. The circumscribed rectangles of each character string are assumed to be string regions. By this process, string regions of horizontally aligned closed contours are extracted. Both closed and unclosed contours which lie to the left or right of character string regions are then extracted, and the thickness of each is calculated. These contours are added to the character string region if their thickness is similar to that of the characters in that region.
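The slid-rectangle test described above can be sketched as follows. This is a minimal illustration, not code from [1]; the (x, y, w, h) box representation and the grouping helper are our own assumptions:

```python
def center(box):
    # Center point of a circumscribed rectangle given as (x, y, w, h).
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def horizontally_aligned(a, b):
    # Slide rectangle `a` to the right by its own width; the two contours
    # are treated as horizontally aligned if the slid rectangle contains
    # the center of rectangle `b`.
    x, y, w, h = a
    cx, cy = center(b)
    return x + w <= cx <= x + 2 * w and y <= cy <= y + h

def grouped_indices(boxes):
    # Indices of rectangles with at least one horizontal sibling;
    # isolated closed contours are assumed not to be characters.
    grouped = set()
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i != j and horizontally_aligned(a, b):
                grouped.update((i, j))
    return grouped
```

Two adjacent glyph boxes pass the test, while a distant box stays isolated and is discarded.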

Basilios Gatos [2] produces a gray level image and an inverted gray level image, then calculates the two corresponding binary images using an adaptive binarization and image enhancement technique. Subsequently, the proposed technique applies a decision function that indicates which of the two binary images contains the text information.

An efficient algorithm which can automatically detect, localize, and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented by Julinda Gllavata, Ralph Ewerth, and Bernd Freisleben [4]. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties.
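A horizontal projection profile of the kind used in this localization step can be sketched as follows. The function name and thresholding scheme are illustrative assumptions, not details from [4]:

```python
import numpy as np

def text_row_ranges(edge_map, min_edges):
    # Horizontal projection profile: count edge pixels in each row of a
    # binary edge map, then return (start, end) index ranges of
    # consecutive rows whose count reaches `min_edges`; such bands are
    # candidate text regions.
    profile = edge_map.sum(axis=1)
    ranges, start = [], None
    for y, count in enumerate(profile):
        if count >= min_edges and start is None:
            start = y
        elif count < min_edges and start is not None:
            ranges.append((start, y))
            start = None
    if start is not None:
        ranges.append((start, len(profile)))
    return ranges
```

Taking the same profile along columns within each band then bounds the candidate region horizontally.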

Figure 2: The flowchart of the framework [3]

3. Programmer's Design

Fig. 2 depicts the flowchart of the framework. The proposed framework consists of two main steps, given here.

Step 1) Image partition to find text character candidates based on gradient feature and color uniformity. In this step, Chucai Yi and YingLi Tian propose two methods to partition scene images into binary maps of non-overlapped connected components: a gradient-based method and a color-based method. Post-processing is then performed to remove the connected components which are not text characters, based on size, aspect ratio, and the number of inner holes.
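The post-processing in Step 1 can be sketched as a per-component filter on the three cues named above. The concrete threshold values below are illustrative guesses, not values from [3]:

```python
def keep_component(width, height, inner_holes,
                   min_size=8, max_aspect=5.0, max_holes=4):
    # Heuristic filter for a candidate character component, testing the
    # three cues named in Step 1: size, aspect ratio, and inner holes.
    # All threshold values here are illustrative assumptions.
    if width < min_size and height < min_size:
        return False                        # too small to be a character
    aspect = max(width, height) / max(min(width, height), 1)
    if aspect > max_aspect:
        return False                        # too elongated, e.g. a line
    return inner_holes <= max_holes         # glyphs have few inner holes
```

Components failing any test are dropped before the grouping stage.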

Step 2) Character candidate grouping to detect text strings based on the joint structural features of the text characters in each text string, such as character sizes, distances between two neighboring characters, and character alignment. In this step, Chucai Yi and YingLi Tian propose two methods of structural analysis of text strings: the adjacent character grouping method and the text line grouping method.

4. Image Partition

To extract text information from a complex background, image partition is first performed to group together the pixels that belong to the same text character, obtaining a binary map of candidate character components. Based on the local gradient features and uniform colors of text characters, we design a gradient-based partition algorithm and a color-based partition algorithm, respectively.

Figure 3: Comparison of the results of four morphological operators with the result of the gradient-based partition [3]

Figure 4: Some examples of color-based partition, where the left column contains the original images and the other columns contain the corresponding dominant color layers [3]

5. Connected Components Grouping

The image partition creates a set S of connected components from an input image, including both text characters and unwanted noise. Observing that text information appears as one or more text strings in most natural scene images, we perform heuristic grouping and structural analysis of text strings to distinguish the connected components representing text characters from those representing noise. Assuming that a text string has at least three characters in alignment, we develop two methods to locate regions containing text strings: adjacent character grouping and text line grouping. In both algorithms, a connected component C is described by four metrics: height, width, area, and centroid. In addition, we use D to represent the distance between the centroids of two neighboring characters.

6. Adjacent Character Grouping

Text strings in natural scene images usually appear in alignment, so each text character in a text string must possess character siblings at adjacent positions. The structural features among sibling characters can be used to determine whether connected components belong to text characters or to unexpected noise. Here, five constraints are defined to decide whether two connected components are siblings of each other.

1) Considering capital and lowercase characters, the height ratio must fall between 1/T1 and T1.

2) The distance between the two connected components should not be greater than T2 times the width of the wider one.

3) For text strings aligned approximately horizontally, the difference between the Y-coordinates of the connected component centroids should not be greater than T3 times the height of the higher one.

4) Two adjacent characters usually appear in the same font size, so their area ratio should be greater than 1/T4 and less than T4.

5) If the connected components are obtained from gradient-based partition, the color difference between them should be lower than a predefined threshold T5.

In their system, Chucai Yi and YingLi Tian set T1 = T4 = 2, T2 = 3, T3 = 0.5, and T5 = 40.
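The first four constraints can be sketched as a predicate over two components. This is an illustrative sketch: the distance in constraint 2 is taken between horizontal centroid coordinates, which is our assumption, and the color test (constraint 5) is omitted since it needs the underlying pixels:

```python
def are_siblings(a, b, T1=2.0, T2=3.0, T3=0.5, T4=2.0):
    # `a` and `b` are dicts with keys height, width, area, and
    # centroid = (x, y). Implements constraints 1-4 above; the color
    # constraint (T5) is omitted in this sketch.
    ratio = a["height"] / b["height"]
    if not (1 / T1 <= ratio <= T1):                   # constraint 1
        return False
    distance = abs(a["centroid"][0] - b["centroid"][0])
    if distance > T2 * max(a["width"], b["width"]):   # constraint 2
        return False
    dy = abs(a["centroid"][1] - b["centroid"][1])
    if dy > T3 * max(a["height"], b["height"]):       # constraint 3
        return False
    ratio = a["area"] / b["area"]
    return 1 / T4 <= ratio <= T4                      # constraint 4
```

Two similarly sized neighboring components pass, while a distant or much taller component fails.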

6.1. Mathematical Model

Two connected components C and C′ can be grouped together as sibling components if the above five constraints are satisfied. When C and C′ are grouped together, their sibling sets are updated according to their relative locations. That is, when C is located to the left of C′, C′ is added into the right-sibling set of C, and C is simultaneously added into the left-sibling set of C′. The reverse operation is applied when C is located to the right of C′. To create sibling groups corresponding to complete text strings, Chucai Yi and YingLi Tian merge together any two sibling groups SG(C1) and SG(C2) when their intersection contains no fewer than two connected components. At this point, each sibling group can be considered a fragment of a text string. The merge process is repeated until no sibling groups can be merged. A text string in a scene image can then be described by its corresponding adjacent character group. To extract the region containing a text string, Chucai Yi and YingLi Tian calculate the rectangle covering all of the connected components in the corresponding adjacent character group.

Advantages:

1) The structural features among sibling characters can be used to determine whether connected components belong to text characters or to unexpected noise.

2) Character grouping combines the candidate text characters into text strings which contain at least three character members in alignment.

Figure 5: Two detected adjacent character groups, marked in red and green [3]

7. Text Line Grouping

In order to locate text strings with arbitrary orientations, Chucai Yi and YingLi Tian develop a text line grouping method. To group together the connected components which correspond to text characters in the same, possibly non-horizontal, string, they use the centroid as the descriptor of each connected component.

7.1. Mathematical Model

Given a set of connected component centroids, groups of collinear character centroids are computed as shown below:

M = {m | C ∈ S and m = centroid(C)}

L = {G | G ⊆ M, |G| ≥ 3, the elements of G are character centroids and are collinear}

where M denotes the set of centroids of all of the connected components obtained from image partition, and L denotes the set of text lines, each composed of text character centroids in alignment.

Chucai Yi and YingLi Tian design an efficient algorithm to extract regions containing text strings. At first, they remove a centroid from the set M if the area of its corresponding connected component is smaller than a predefined threshold Ts. Then, three points mi, mj, mk are randomly selected from the set to form two line segments. They calculate the length difference and incline angle difference between the line segments mimj and mjmk as shown:

Δd = D(mi, mj) / D(mj, mk)

Δθ = |θij − θjk|, if |θij − θjk| ≤ π/2

Δθ = |θij − π − θjk|, if |θij − θjk| > π/2

The three centroids are approximately collinear if 1/T6 ≤ Δd ≤ T6 and Δθ ≤ T7. Here, T6 = 2 and T7 = π/12. Thus, they compose a preliminary fitted line lu = {mi, mj, mk}, where u is the index of the fitted line. Other collinear centroids along lu can be added incrementally at the end positions to form a complete text string. At this point, each text string is described by a fitted line. The location and size of the region containing a text string are defined by the connected components whose centroids are cascaded in the corresponding fitted line.
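The Δd and Δθ test for three candidate centroids can be sketched directly. This is an illustrative sketch; the modulo-π normalization below is our way of handling angle wrap-around so that the result matches the two-case definition of Δθ:

```python
import math

def approximately_collinear(mi, mj, mk, T6=2.0, T7=math.pi / 12):
    # Test whether three centroids form two segments of comparable
    # length (delta-d) and near-equal incline angle (delta-theta).
    def dist(p, q):
        return math.hypot(q[0] - p[0], q[1] - p[1])
    def incline(p, q):
        return math.atan2(q[1] - p[1], q[0] - p[0])
    dd = dist(mi, mj) / dist(mj, mk)
    dt = abs(incline(mi, mj) - incline(mj, mk)) % math.pi
    if dt > math.pi / 2:            # two-case definition of delta-theta
        dt = math.pi - dt
    return 1 / T6 <= dd <= T6 and dt <= T7
```

Three nearly collinear centroids pass, while a right-angle turn or a large length ratio fails.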

Advantages:

1) The text line grouping method locates text strings with arbitrary orientations.

2) Using the centroid as the descriptor of each connected component allows the components of the same, possibly non-horizontal, string to be grouped together.

Figure 6: Resulting fitted lines from centroid cascading. The red line corresponds to a text region, while the cyan lines are false positives to be removed [3]

8. Image Binarization and Connected Component Analysis Based Method

The proposed methodology is presented in Figure 7. Starting from the scene image, Basilios Gatos produces a gray level image and an inverted gray level image. Then, the two corresponding binary images are calculated using an adaptive binarization and image enhancement technique. Subsequently, the proposed technique applies a decision function that indicates which of the two binary images contains the text information. In Figure 7a the original binary image is selected, while in Figure 7b the inverted binary image is selected. Finally, a procedure that detects the connected components of text areas is applied.
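The gray/inverted/decide pipeline can be sketched as below. Both the global threshold and the sparsity-based decision function are stand-ins of our own: [2] uses an adaptive binarization and enhancement technique and its own decision function, whose details are not reproduced here:

```python
import numpy as np

def binarize(gray, t=128):
    # Stand-in global threshold; [2] uses adaptive binarization and
    # image enhancement instead.
    return (gray < t).astype(np.uint8)      # 1 marks dark foreground

def choose_text_image(gray):
    # Binarize the gray image and its inverse, then decide which binary
    # image contains the text. As a stand-in decision function we keep
    # the sparser foreground, since text strokes normally cover a small
    # fraction of the image.
    original = binarize(gray)
    inverted = binarize(255 - gray)
    return original if original.mean() <= inverted.mean() else inverted
```

With this stand-in, dark text on a light background selects the original binary image, and light text on a dark background selects the inverted one.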

8.1. Data Independence and Data Flow Architecture

The data flow architecture is represented in Figure 8.

8.2. Turing Machine

The state transition diagram is represented in Figure 9.

9. Conclusion

Due to the unpredictable text appearances and complex backgrounds, text detection in natural scene images is still an unsolved problem. In this paper we have presented methods to find strings of characters in natural scene images: extraction of character string regions from scenery images based on siblings and thickness of characters; an efficient binarization and enhancement technique followed by a suitable connected component analysis procedure; text string detection from natural scenes by structure-based partition and grouping; and a robust algorithm for text detection in images. We have also presented an approach to detect, localize, and extract texts appearing in grayscale or color images, as well as to locate text strings with arbitrary orientations.

Figure 7: a) Flowchart of the proposed method for text detection in natural scene images (original binary image is selected). b) Flowchart of the proposed method for text detection in natural scene images (inverted binary image is selected) [2]

Figure 8: Data flow architecture

Figure 9: State transition diagram

Our future work will focus on developing learning-based methods for text extraction from complex backgrounds and on text normalization for OCR recognition.


