Optical Character Recognition (OCR)

08 Feb 2018

Disclaimer:
This dissertation has been written and submitted by students and is not an example of our work. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of EssayCompany.

INTRODUCTION

1.1. Optical Character Recognition:

Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of handwritten, typewritten or printed text (usually captured by a scanner or tablet) into machine-editable text.

OCR is a field of research in pattern recognition, artificial intelligence and machine vision. An OCR system enables you to take a book or a magazine article, feed it directly into an electronic computer file, and then edit the file using a word processor.

All OCR systems include an optical scanner for reading text and sophisticated software for analyzing images. Most OCR systems use a combination of hardware (specialized circuit boards) and software to recognize characters, although some inexpensive systems do it entirely through software. Advanced OCR systems for Roman script can read text in a large variety of fonts, but they still have difficulty with handwritten text.

1.2. History Of Optical Character Recognition:

To understand the developments described in the section above, we have to look at the history of OCR [3, 4, 6], its improvement, recognition methods, computer technologies, and the differences between humans and machines [1, 2, 5, 7, 8]. It is always intriguing to find ways of enabling a computer to mimic human functions, such as the ability to read, to write, to see things, and so on. OCR research and development can be traced back to the early 1950s, when scientists tried to capture the images of characters and texts, first by mechanical and optical means using rotating disks and photomultipliers, then with flying-spot scanners using a cathode ray tube and lens, followed by photocells and arrays of them. At first, scanning was slow and only one line of characters could be digitized at a time by moving the scanner or the paper medium. Subsequently, drum and flatbed scanners were invented, which extended scanning to the full page. Then, advances in digital integrated circuits brought photo arrays with higher density, faster document transports, and higher speed in scanning and digital conversion.

These vital improvements greatly accelerated character recognition, reduced its cost, and opened up the possibility of processing a great range of forms and documents. Throughout the 1960s and 1970s, new OCR applications sprang up in retail businesses, banks, hospitals, post offices; insurance, railroad, and aircraft companies; newspaper publishers, and many other industries [3, 4]. In parallel with these advances in hardware development, intensive research on character recognition was taking place in the research laboratories of both academic and industrial sectors [6, 7]. Because neither recognition techniques nor computers were very powerful in the early days (1960s), OCR machines tended to make many errors when the print quality was poor, caused either by wide variations in type fonts and roughness of the paper surface or by the cotton ribbons of the typewriters [5]. To make OCR work efficiently and economically, there was a big push from OCR manufacturers and suppliers toward the standardization of print fonts, paper, and ink qualities for OCR applications. New fonts such as OCR-A and OCR-B were designed in the 1970s by the American National Standards Institute (ANSI) and the European Computer Manufacturers Association (ECMA), respectively. These special fonts were quickly adopted by the International Standards Organization (ISO) to facilitate the recognition process [3, 4, 6, 7]. As a result, very high recognition rates became achievable at high speed and at reasonable cost. Such accomplishments also brought better printing quality of data and paper for practical applications. In fact, they completely revolutionized the data input industry [6] and eliminated the jobs of thousands of keypunch operators who were doing the mundane work of keying data into the computer.

1.3. Common Steps Of OCR Processing:

The process of converting documents into electronic form, usually referred to as digitization, is carried out in several steps.

The process of scanning a document and representing the scanned image for further processing is called the pre-processing or imaging stage.

The process of manipulating the scanned image of a document to produce a searchable text is called the OCR processing stage.

1.3.1. The Imaging Stage:

The imaging procedure involves scanning the document and storing it as an image. The most popular image format used for this purpose is the Tagged Image File Format (TIFF).

The scanning resolution (number of dots per inch, dpi) strongly affects the accuracy of the OCR process.

1.3.2. The OCR Process:

The major steps of the OCR processing stage are described below.

1.3.3. Distinguishing Between Text And Images - Segmentation:

In this step, the process of recognizing the text and image blocks of the scanned image is undertaken. The boundaries of each image are analyzed in order to identify the text.

1.3.4. Character Recognition - Feature Extraction:

This step involves recognizing a character using a process known as feature extraction. OCR tools store rules about the characters of a given script, acquired through a process known as learning. A character is then identified by analyzing its shape and comparing its features against the set of rules stored in the OCR engine that distinguishes each character.

1.3.5. Recognition Of Character:

Following the character identification process, word recognition is performed by comparing each string of characters against an existing dictionary of words. Additional processes such as spell-checking are also performed in this step.
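The dictionary comparison described above can be illustrated with a small sketch. It is written in Python and uses the standard-library `difflib` module to pick the closest dictionary word; the word list, the function name `correct_word` and the similarity cutoff of 0.6 are assumptions made for illustration, not part of any particular OCR product.

```python
import difflib

# Hypothetical mini-dictionary; a real system would load a full word list.
DICTIONARY = ["optical", "character", "recognition", "scanner", "document"]

def correct_word(raw_word, dictionary=DICTIONARY, cutoff=0.6):
    """Return the closest dictionary word, or the raw OCR output if none is close."""
    if raw_word in dictionary:
        return raw_word
    # get_close_matches ranks candidates by a similarity ratio in [0, 1].
    matches = difflib.get_close_matches(raw_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_word

# A typical OCR confusion: "ni" misread as "m".
print(correct_word("recogmtion"))
```

Strings with no sufficiently similar dictionary entry are returned unchanged, so proper nouns and numbers are not forced onto a wrong word.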

1.3.6. Output Formatting:

The final step involves storing the output in one of the industry-standard formats such as RTF, PDF, Word and plain Unicode text.

1.4. Pattern Recognition:

Pattern recognition (also known as classification or pattern classification) is a field within artificial intelligence and can be defined as "the act of taking in raw data and taking an action based on the category of the data". It uses methods from statistics, machine learning and other areas.

Typical applications of pattern recognition are:

  • Automatic speech recognition.
  • Classification of text into several categories (e.g. spam/non-spam email messages).
  • The automatic recognition of handwritten postal codes on postal envelopes.
  • The automatic recognition of images of human faces.

The last two examples belong to the image analysis subtopic of pattern recognition, which deals with digital images as input to pattern recognition systems.

Some popular techniques for pattern recognition include:

  • Neural Networks (NN)
  • Hidden Markov Models (HMM)
  • Bayesian Networks (BN)

The application domains of pattern recognition include:

  • Computer Vision
  • Machine Vision
  • Medical Image Analysis
  • Optical Character Recognition
  • Credit Scoring.

1.5. Applications Of The Pattern Recognition:

Pattern recognition has many useful applications. Some of them are outlined below.

  • As a telecommunication aid for the deaf, in airline reservation, in postal departments for postal address reading (both handwritten and printed postal codes/addresses), and for medical diagnosis.
  • In customer billing, as in telephone exchange billing systems, in order data logging, in automatic fingerprint identification, and as an automatic inspection system.
  • In automated cartography, metallurgical industries, computer-assisted forensic linguistics systems, electronic mail, information units and libraries, and for facsimile.
  • For direct processing of documents: as a multipurpose document reader for large-scale data processing, as a microfilm reader data input system, for high-speed data entry, for converting text/graphics into a computer-readable form, and as an electronic page reader to handle large volumes of mail.

1.6. Scope Of This Work:

The project is designed to classify and identify a scanned image containing Arabic characters using a two-stage approach. In the first stage the Arabic text image is preprocessed, and in the second stage its features are extracted. Throughout the work it is assumed that there is no noise in the image and that the image is perfectly scanned with no deviation from its original angle (no skew).

1.7. Objectives And Applications Of This Work:

Arabic Optical Character Recognition can open a new way of realizing the dream of a natural mode of communication between man and machine in this part of the world. It will extend already available knowledge to new horizons. Centuries-old rare manuscripts in Arabic, Urdu and Persian will become available to the general public.

The ultimate goal of character recognition is to simulate human reading capabilities. Character recognition systems can contribute immensely to the advancement of automation and can improve the interaction between man and machine in many applications, including office automation, check verification and a large variety of banking, business and data entry applications, library archives, document identification, e-book production, invoice and shipping receipt processing, subscription collection, questionnaire processing, exam paper processing and many other applications [9], as well as online address and signboard reading.

1.8. Thesis Organization:

The remaining part of this thesis is divided into four chapters. Chapter 2 reviews the literature. Chapter 3 describes Arabic script, its peculiarities and problems. Chapter 4 covers the development of Arabic character recognition, and Chapter 5 presents conclusions and future directions.

Chapter 2

REVIEW OF LITERATURE

2.1. Optical Character Recognition:

Since the beginning of writing as a form of communication, paper has prevailed as the medium for writing. Over time, electronic media are replacing paper. Because they save space and are fast to access, electronic media are constantly gaining popularity. The convenience of paper, its widespread use for communication and archiving, and the quantity of information already on paper call for quick and accurate methods to automatically read that information and convert it into electronic form [Albadr95].

The potential application areas of automatic reading machines are numerous. One of the earliest, and most successful, applications is sorting checks in banks, as the volume of checks that circulates daily has proven to be too large for manual entry. Other applications are detailed in the next section [Govindan90, Mantas86].

The machine imitation of human reading (i.e. optical character recognition) has been the subject of extensive research for more than five decades. Character recognition is a pattern recognition application with the ultimate aim of simulating the human ability to read both machine-printed and handwritten cursive text. The currently available systems may read faster than humans, but cannot reliably read such a wide variety of text nor consider context. A great deal of further effort is required to at least narrow the gap between human and machine reading capabilities. The practical significance of OCR applications, as well as the interesting nature of the OCR problem, has led to great research interest and measurable advances in this field. Commercial OCR systems for Latin characters are now commonly available on personal computers, achieving recognition rates above 99% [McClelland91, Welch93]. Further, systems on the market can now read a variety of writing styles (e.g., hand-written, printed omni-font) and character sets including Chinese, Japanese, Korean, Cyrillic, and Arabic.

Since the 1950s, researchers have carried out extensive work and published many papers on character recognition. Nearly all of the published work on OCR has been on Latin, Japanese or Chinese characters. This work started in the mid-1940s for Latin and in the mid-1960s for Chinese and Japanese. The following are some surveys and reviews on Latin character recognition. Reference may be made to [Mori92] for a historical review of OCR research and development. The survey of [Govindan90] includes surveys of other languages; [Mantas86] gives an overview of character recognition methodologies, [Impedovo91] covers commercial OCR systems, [Tian91] machine-printed OCR, and [Tappert90, Wakahara92] on-line handwriting recognition. [Suen80] surveys automatic recognition of hand-printed characters (viz. numerals, alphanumeric, FORTRAN, and Katakana), while [Nouboud90] reviewed the recognition of hand-printed (non-cursive) characters and conducted tests on a commercial system. [Bozinovic89, Simon92] surveyed off-line cursive word recognition, Jain et al. [Jain2000] reviewed statistical pattern recognition methods, and [Plamondon2000] is a comprehensive survey of on-line and off-line handwriting recognition. Two bibliographies of the fields of OCR and document analysis appeared in [Jenkins93, Kasturi92]. [Stallings76, Mori84] produced surveys on recognition of Chinese machine-printed and hand-printed characters, respectively, and Liu et al. [Liu2004] addressed the state of the art of on-line recognition of Chinese characters.

2.2. General Review Of Arabic Character Recognition:

Although almost one billion people world-wide, in several different languages, use Arabic characters for writing (Arabic, Persian, and Urdu are the most noted examples), Arabic character recognition has not been researched as thoroughly as Latin, Japanese, or Chinese. The first published work on Arabic character recognition may be traced back to 1975, to the master's thesis of Nazif [Nazif75]. In his thesis a system for the recognition of printed Arabic characters was developed based on extracting strokes that he called radicals (20 radicals are used) and their positions. He used correlation between the templates of the radicals and the character image. A segmentation phase was included to segment the cursive text. Years later Badi and Shimura [Badi78, Badi80] and Nouh [Nouh80] worked on printed Arabic characters and Amin [Amin80] on hand-written Arabic characters. Surveys on Arabic optical text recognition (AOTR) may be found in [Amin85a, Amin98, Shoukry89, Jambi91, Albadr95, Nabawi2000, Ahmed94].

On-line systems are restricted to recognizing hand-written text. Some systems recognize isolated characters [Ali89, Amin80, Amin85b, Amin87, ElSheikh89, ElSheikh90b, ElWakil87, ElWakil89, Saadallah85] and hand-written mathematical formulas [ElSheikh90c, Amin91b], while others recognize cursive words [Badi78, Badi80, Badi82, Amin82a, Amin82b, Shaheen90, AlEmami90]. Since the segmentation problem in Arabic is non-trivial, the latter systems deal with a much harder problem.

While several off-line systems use video cameras to digitize pages of text (e.g., [Abbas86, Goraine92, Amin86, HajHassan85, HajHassan90, Nouh80, Nouh87, Nouh89, Sarfraz2003, Sarfraz2004]), the trend now is to use scanners with resolutions ranging from 200 to 400 dots per inch (e.g., [AbdelAzim89c, AbdelAzim90a, AlYousefi88, Amin91a, Bouhlila89, ElDabi90, ElSheikh88a, Ramsis88, Sarfraz2003a, Sarfraz2003b, Zidouri2002, Zidouri2005]). Scanners introduce less noise into an image, are less expensive, and are more convenient to use for character recognition, especially when coupled with automatic document feeders, automatic binarization, and image enhancement.

Among the off-line systems that recognize hand-written isolated characters are [Abuhaiba90, AlYousefi90, AlTikriti85, ElDesouky92, Hyder88]. [Abbas86, AbdelAzim89b, Goneid92] recognize hand-written Arabic (Hindi) numerals, and [Badi80, Badi82, Goraine92, Jambi92, Zahour91] recognize hand-written words. The majority of off-line systems recognize typewritten cursive words [AbdelAzim89c, AbdelAzim90a, Bouhlila89, ElDabi90, Amin86, ElKhaly90, ElSheikh88b, Goraine89, Khella92, Margner92, Nazif75, Nouh87, Ramsis88, Tolba89, Tolba90, ElRamly89c, HajHassan90, HajHassan91], while [ElShiekh88a, Mahdi89, Mahmoud94, Nouh80, Nouh89, NurulUla88, Fayek92, Sarfraz2005d, Zidouri2005] recognize only typewritten isolated characters. The systems of [Abdelazim90b, AlBadr92, ElGowely90, Kurdy92, Fakir93] are intended to recognize typeset words. One of the systems [Abdelazim89a] recognizes bilingual (Arabic/Latin) typewritten words. Examples of systems for the recognition of other languages that use Arabic script are [Parhami81, Yalabik88, Hyder88], which are designed for the recognition of Persian, Ottoman (Old Turkish), and Urdu, respectively.

2.3. Applications Of Optical Character Recognition:

Optical character recognition technology has many practical applications that are independent of the treated language. The following are some of these applications:

    • Financial Business Applications:

For sorting bank checks, since the number of checks per day has been far too large for manual sorting.

    • Commercial Data Processing:

For entering data into commercial data processing files, for example entering the names and addresses of mail order customers into a database. In addition, it can be used as a work sheet reader for payroll accounting.

    • In Postal Department:

For postal address reading and sorting, and as a reader for handwritten and printed postal codes.

    • In Newspaper Industry:

High-quality typescript may be read by recognition equipment into a computer typesetting system to avoid the typing errors that would be introduced by re-keying the text on computer peripheral equipment.

    • Use By Blind:

It is used as a reading aid using photo sensors and tactile stimulators, and as a sensory aid with sound output. Additionally, it can be used for reading text sheets and reproducing Braille originals.

    • In Facsimile Transmission:

This procedure involves transmission of pictorial data over communications channels. In practice, the pictorial data is mainly text. Instead of transmitting characters in their pictorial representation, a character recognition system could be used to recognize each character and then transmit its text code.

Finally, it is worth noting that the major potential application for automatic character recognition is as a general data entry tool for automating the work of an ordinary office typist.

2.4. Development Of New OCR Techniques:

As OCR research and development advanced, demands on handwriting recognition also increased because a lot of data (such as addresses written on envelopes; sums written on checks; names, addresses, identity numbers, and dollar values written on invoices and forms) were written by hand and had to be entered into the computer for processing. But early OCR techniques were based mostly on template matching, simple line and geometric features, stroke detection, and the extraction of their derivatives.

Such techniques were not sophisticated enough for practical recognition of data handwritten on forms or documents. To cope with this, the Standards Committees in the United States, Canada, Japan, and some countries in Europe designed handprint models in the 1970s and 1980s for people to follow when writing in boxes [7]. Characters written in such specified shapes did not vary too much in style, and they could be recognized more easily by OCR machines, especially when the data were entered by controlled groups of people; for example, employees of the same company were asked to write their data like the prescribed models. Sometimes writers were asked to follow certain additional instructions to enhance the quality of their samples, for example: write big, close the loops, use simple shapes, do not link characters, and so on. With such constraints, OCR recognition of handprint was able to flourish for a number of years.

2.5. Recent Trends And Movements:

As the years of intensive research and development went by, and with the birth of several new conferences and workshops such as IWFHR (International Workshop on Frontiers in Handwriting Recognition), ICDAR (International Conference on Document Analysis and Recognition), and others [13], recognition techniques advanced rapidly. Moreover, computers became much more powerful than before. People could write the way they normally did, characters no longer had to be written like specified models, and the subject of unconstrained handwriting recognition gained considerable momentum and grew quickly. By now, many new algorithms and techniques in pre-processing and feature extraction, as well as powerful classification methods, have been developed [8, 9].

Chapter 3

ARABIC: A CURSIVE SCRIPT

3.1. Arabic:

Arabic is a Semitic language used as a principal language in many countries. Arabic is spoken by 234 million people [9] and is essential in the culture of many more. While spoken Arabic varies across regions, written Arabic, sometimes called "Modern Standard Arabic" (MSA), is a uniform version used for official communication across the Arab world [9]. The characters of Arabic script and similar characters are used by a much higher percentage of the world's population to write languages such as Arabic, Persian (Farsi) and Urdu. Thus the ability to automate the understanding of written Arabic would have widespread benefits.

Urdu is normally written in the calligraphic Nastaliq style, whereas Naskh is more commonly used for Arabic. Usually, bare transliterations of Arabic into Roman letters omit many phonemic elements that have no counterpart in English or other languages commonly written in the Roman alphabet. The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but these can only be properly read by someone already familiar with Urdu, Persian, or Arabic for letters such as ? ? ? ? or ? and Hindi for letters. Most Arabic characters, when joined, form an angle of about 45 degrees to the horizontal line, because of which reading Arabic script is faster than reading Roman script; on the other hand, this makes it harder for novice readers and machines to identify a word or to segment one character from the rest.

Unlike the English script, there are no capital or small characters in Urdu, but the last character of a word can be regarded as a capital character, as in many cases it presents the full form of the character, and the characters at initial and middle positions are considered small. Every character has a standalone shape besides its different joining forms, but some letters, such as those making up the word Urdu (? ? ? ?) or others of the same category, are not joinable and cannot be connected. The Arabic alphabet uses consonant letters, vowels, diacritic marks, numerals, punctuation and a few superscript signs.

The graphical representation of each letter has more than one form, depending on its position and context in the word. In general each letter has four forms, namely beginning, middle, final and standalone, as shown in Table 3.1.

3.2. Arabic Letters:

The Arabic alphabet contains 28 letters. Each has between two and four shapes, and the choice of which shape to use depends on the position of the letter within its word or sub-word. The shapes correspond to the four positions: beginning of a (sub-)word, middle of a (sub-)word, end of a (sub-)word, and in isolation. Table 3.1 shows each shape for each letter. For letters without distinct "initial" shapes, the initial shape is simply the isolated shape, and the "medial" shape is the final shape.

Some letters have "descenders" or "ascenders", which are portions that extend below the primary line on which the letters sit or above the height of most letters. There is no upper or lower case, only one case. Arabic script is written from right to left, and letters within a word are usually joined even in machine print. Letter shapes, and whether or not to connect, depend on the letter and its neighbors. Letters are connected at the same relative height. The "baseline" is the line at the height at which letters are connected, and it is analogous to the line on which an English word sits. Letters are wholly above it except for descenders and some markings. There is no connection between separate words, so word boundaries are always represented by a space. Six letters, however, can be connected on only one side. When they occur in the middle of a word, the word is divided into multiple sub-words separated by space.

A "ligature" is a character formed by combining two or more letters in an accepted manner. Arabic has numerous standard ligatures, which are exceptions to the above rules for joining letters. The most common is "laam-alif", the combination of "laam" and "alif"; others include "yaa-meem".

3.3. Problems Of Arabic Script:

Despite its large character set, Arabic has only a small set of base shapes that are easily distinguishable from one another. The remaining characters differ from these shapes by dots or symbols placed above or below them [19]. Table 3.2 shows groups of similar characters and their derived forms.

As Table 3.2 shows, only 21 distinct groups exist in the 32-character set, which complicates the recognition phase for Arabic characters. Further study of the other forms (initial, middle and final) of these characters reveals that ein( ??) is similar to hamza(?), wow (?) might be confused with (?) , ze (?) resembles noon (??) and mem(?) can be confused with the middle form of ein (???) and with the standalone goal-he (?).

A key distinction between Latin scripts and Arabic script is the fact that many letters differ only by one or more dots while the primary stroke is exactly the same [19].

3.4. Other Problems In Arabic OCR:

All Muslims (almost ¼ of the people on the earth) can read Arabic because it is the language of Al-Quran, the holy book of Muslims. Even so, Arabic script recognition has not received enough attention from researchers. Little research progress has been achieved compared to that made on Latin and Chinese. The solutions available on the market are still far from perfect [11, 14]. Several reasons have led to this situation:

  • Lack of financial support and of a platform provided by any government (even where Arabic is the official language).
  • Lack of adequate support in terms of journals, books etc. and lack of interaction between researchers in this field.
  • Lack of general supporting utilities such as Arabic text databases, dictionaries, programming tools, and supporting staff.
  • Late start of Arabic text recognition (first publication in 1975, compared with the 1940s in the case of Latin character recognition).
  • The research carried out on the Arabic language is typically scattered and done outside the Arab world.
  • No specialized conferences or symposia have been held so far.
  • Algorithms developed for other scripts are not directly applicable to Arabic.

3.5. Characteristics Of Arabic Characters:

The calligraphic nature of the Arabic character set distinguishes it from other languages in several ways. For example,

  1. Arabic text is written from right to left.
  2. No upper or lower case exists in Arabic, but sometimes the last character of a word is considered as upper case because it always remains in its full form.
  3. Arabic has 28 basic characters, of which 16 have from one to three dots. Those dots differentiate between otherwise similar characters. Additionally, three characters can have a zigzag-like stroke. The dots are called secondaries and they are located above the character's primary part as in ALEF (?), or below like BAA (?), or in the middle like JEEM (?).
  4. Written Arabic text is cursive in both machine-printed and hand-written text. Within a word, some characters connect to the preceding and/or following characters, and some do not connect. The connectivity of characters results in a word having one or more connected components. We will refer to each connected piece of a word as a sub-word.
  5. The shape of an Arabic character depends on its location in the word; a character might have up to four different shapes depending on it being isolated, connected from the right (beginning form), connected from the left (ending form), or connected from both sides (middle form).
  6. A distinguishing feature of Arabic writing is the presence of a baseline. The baseline is a horizontal line that runs through the connected portions of text (i.e. where the characters' connection segments are located). The baseline has the highest number of text pixels. (See figure 3.2.)
  7. Characters in a word may overlap vertically (even without touching).
  8. Arabic characters do not have a fixed size (height and width). The character size varies according to its position in the word.
  9. Characters in a word can have diacritics. These diacritics are written as strokes, placed either on top of, or below, the characters. A different diacritic on a character may change the meaning of a word. Readers of Arabic are accustomed to reading undiacritized text by deducing the meaning from context.
  10. Several characters can combine vertically to form a ligature, especially in typeset and handwritten text.
  11. Arabic words may consist of one or more sub-words. Each sub-word may have one or more characters, because some Arabic characters are not joinable to others from the left side. As an example, the word Ketab ( ???? ) consists of two sub-words: Keta ( ??? ) which consists of three characters and BAA( ?) which is a single character.
  12. There are only three characters that represent vowels, ? , ? or ? . However, there are other shorter vowels represented by diacritics in the form of overscores or underscores, although these are used less often in practice.
  13. Dots may appear as two separated dots, touching dots, a hat, or a stroke.
  14. Another style of Arabic handwriting is artistic or decorative calligraphy, which is usually full of overlaps, making recognition difficult even for human readers, let alone computers.
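Item 6 in the list above states that the baseline is the row with the highest number of text pixels. A minimal sketch of that idea, written in Python, is a horizontal projection: count the text pixels in each row and take the row with the largest count. The function name `find_baseline` and the convention that text pixels are stored as 1 and background as 0 are assumptions made for this illustration.

```python
def find_baseline(binary):
    """Return the index of the row containing the most text pixels.

    `binary` is a list of rows; each row is a list of 0 (background)
    and 1 (text) values. This layout is an assumption of the sketch.
    """
    # Horizontal projection profile: number of text pixels per row.
    row_counts = [sum(row) for row in binary]
    # The baseline estimate is the row with the largest count
    # (the first such row if several rows tie).
    return max(range(len(row_counts)), key=row_counts.__getitem__)

# Toy 4 x 6 image: the third row (index 2) is fully set.
line_image = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
print(find_baseline(line_image))
```

On a real scan the profile would be computed per text line, after the page has been split into lines.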

3.6. Summary:

The distinguishing features of Arabic script include its cursive nature, its right-to-left direction of writing, the change of form and shape when a character is placed at different positions in a word, loops, half-closed characters, and dots above or below a character. The National Language Authority defined a 32-character set, but it has 21 working character shapes besides numerals and diacritics.

Chapter 4

ARABIC CHARACTER RECOGNITION

4.1. Phases Of Arabic Character Recognition:

In an off-line character recognition system, the user scans a particular script, runs the OCR and gets the document saved in a file format of their choice. The conversion of the text from the scanning phase to the final document involves a number of phases that are transparent to the user. The proposed system can be implemented in the following steps:

  • Image Acquisition;
  • Digitization;
  • Preprocessing;
  • Feature extraction;
  • Recognition.

Figure 4.1 shows the components of a recognition system. All of these steps are equally essential for obtaining an accurate result. An error in any stage will decrease the accuracy of the results, so each stage has to be handled very carefully. These phases are discussed in detail below.

4.2. Image Acquisition And Digitization Of Text:

The text is scanned with a scanner and the image is stores in a file. Scanners are capable of producing images represented in a variety of formats. One of the most common of these is the bit map (BMP) format. We have to convert BMP file to the pixel map representation to be used with the recognition techniques that follow. BMP file can be thought of as consisting three main parts. First, a header provides essential information regarding what is to follow. Such information includes the width and the depth of the pixel map, the number of bits per pixel, and a pointer to the beginning of pixel data. Second, a color palette follows the header files. This is typically represented in intensities in red, blue, and green (RGB), the composite of which designated the actual color. The size of the palette depends on the number of bits per pixel since this, the pixel value; serve as index into the palette. For grey scale images the RGB values will generally be equal and will serve as intensity.
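The header fields mentioned above can be read with a few lines of code. The following is a minimal sketch in Python using the standard-library `struct` module; it assumes the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER, and the function name `parse_bmp_header` and the returned dictionary keys are chosen for illustration only.

```python
import struct

def parse_bmp_header(data):
    """Extract basic fields from the start of a BMP file given as bytes.

    Assumes a 14-byte file header followed by a BITMAPINFOHEADER;
    other header variants are not handled by this sketch.
    """
    # File header: signature, file size, two reserved words, pixel data offset.
    signature, file_size, _r1, _r2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    if signature != b"BM":
        raise ValueError("not a BMP file")
    # Start of the info header: its size, width, height, planes, bits per pixel.
    _hdr_size, width, height, _planes, bits_per_pixel = struct.unpack_from("<IiiHH", data, 14)
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,   # where the pixel data begins
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }

# Build a synthetic header for a 4 x 2 image with 1 bit per pixel.
sample = struct.pack("<2sIHHI", b"BM", 70, 0, 0, 62)
sample += struct.pack("<IiiHHIIiiII", 40, 4, 2, 1, 1, 0, 8, 2835, 2835, 2, 2)
info = parse_bmp_header(sample)
print(info["width"], info["height"], info["bits_per_pixel"], info["pixel_offset"])
```

The `pixel_offset` field is the "pointer to the beginning of pixel data": everything between the headers and that offset is the color palette.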

We therefore have to separate the header and color information from the bitmap data and convert the data into raw pixel values. This raw data represents the image only and can be processed directly. Binarization is then achieved by comparing each grey value with a given threshold. The threshold is calculated by finding the dominant grey value in the image, which represents the background, and then choosing the threshold to be the midpoint between the dominant grey value and the maximum grey value. The following MATLAB 7.0 functions are used to read the image and to convert it to a binary image:

  • imread(filename,fmt)
  • im2bw(I, level)
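The threshold rule described above can be sketched as follows. This is an illustrative Python version, not the MATLAB code of the project; the function names are hypothetical, and the choice to map values above the threshold to "1" is an assumption (it suits an image whose background is the dominant, darker value). Note that the `level` argument of `im2bw` is normalized to the range [0, 1], so for 8-bit data the threshold computed here would be divided by 255 first.

```python
from collections import Counter

def background_threshold(gray):
    """Midpoint between the dominant grey value and the maximum grey value."""
    pixels = [v for row in gray for v in row]
    # The most frequent grey value is taken to be the background.
    dominant = Counter(pixels).most_common(1)[0][0]
    return (dominant + max(pixels)) / 2

def binarize(gray, threshold):
    """Map each pixel to 1 if it exceeds the threshold, otherwise 0."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]

# Small example: background value 10, two brighter "ink" pixels.
gray = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 190],
]
t = background_threshold(gray)
binary = binarize(gray, t)
```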

4.3. Preprocessing:

Preprocessing is the second phase of Arabic character recognition (AOCR). In this phase the character is extracted from the background and enclosed in a rectangle. It then passes through various processes so that the features of the character can be extracted easily in the next phase. Preprocessing is further divided into two phases:

  • Skeletonization / Thinning
  • Connected Component Construction.

4.3.1. Skeletonization:

As in any data acquisition system, noise will occur on the input, and smoothing is needed to eliminate it from the text image. Skeletonization is an important approach to representing the shape of a plane region. Its objective is to reduce the representation of a region to a chain of single-pixel width while preserving all other relevant features.

The width of the strokes within a character provides little useful information to the recognizer and may even obscure the classification. In this sense skeletonization, sometimes termed character thinning, can be thought of as removing unnecessary or redundant portions of the input.

Additionally, this process facilitates the extraction of geometric features (intersections, endpoints, and loops). Character thinning will ideally reduce the character representation to a single-pixel width while preserving all other relevant features. One method for doing this involves eroding the edges of the image until only the skeleton of the character remains. This can be done by raster scanning the image and checking it against templates stored in memory. Each template provides a set of conditions under which the center pixel should be deleted. A "1" pixel in the character bitmap matches the template if it has a "0" pixel in a specific direction. That "1" pixel will then be stripped off if and only if doing so would not cause any of the neighboring pixels (Xs) that are "1" to become disconnected.

4.3.2. Thinning:

Other methods of preprocessing include a thinning process that yields the character as a string of points. This makes it possible to derive dynamic information, such as stroke sequence, from a static image. First we apply Hilditch's method, which consists of removing the pixels that lie on the edge of the binary image until only a one-pixel-wide line remains. This is followed by some conditions that reduce clusters of junction points to a single junction point. Figure 4.3 shows the result of thinning on an input pattern of an Arabic character.
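To illustrate iterative edge removal, the sketch below implements the Zhang-Suen thinning algorithm in Python. This is a substitute chosen for its compact rules; it is not Hilditch's method and not the code used in this project, although it follows the same idea of deleting edge pixels only when connectivity is preserved. The function name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) with the Zhang-Suen rules."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def px(r, c):
        # Pixels outside the image are treated as background.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
                px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(h):
                for c in range(w):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    b = sum(n)  # number of "1" neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((r, c))
            # Delete in parallel, after the whole image has been examined.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img

# A three-pixel-thick horizontal bar surrounded by background.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen_thin(bar)
```

On this bar the result is a line one pixel wide along the middle row, somewhat shorter than the original stroke because the ends are eroded as well.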

4.3.3. Connected component construction:

The concluding stage of preprocessing is to find the connected components. Connected components are rectangular boxes bounding regions of 4- or 8-connected black pixels. The purpose of the connected component analysis of an image is to form rectangles around distinct components, whether they are textual elements such as characters or non-textual elements such as images.

The technique used to obtain the connected components is a simple iterative procedure that compares successive scan lines of an image to determine whether black pixels in adjacent scan lines are connected. Bounding rectangles are extended to enclose any grouping of connected black pixels between successive scan lines.
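A minimal Python sketch of connected component labeling with bounding rectangles is given below. For brevity it uses a breadth-first flood fill with 8-connectivity instead of the scan-line comparison described above; both approaches yield the same components. The function name and the `(min_row, min_col, max_row, max_col)` box layout are assumptions made for this example.

```python
from collections import deque

def connected_components(binary):
    """Label 8-connected regions of 1-pixels; return labels and bounding boxes."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]  # 0 is the background label
    boxes = {}
    next_label = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            min_r = max_r = r
            min_c = max_c = c
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Extend the bounding rectangle to enclose this pixel.
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes[next_label] = (min_r, min_c, max_r, max_c)
    return labels, boxes

image = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
labels, boxes = connected_components(image)
```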

4.4. Feature Extraction:

The feature extraction process removes redundancy from the data and represents the character image by global transformation, structural, and statistical features.

4.4.1. Structural features:

Structural features describe the geometrical and topological characteristics of a pattern by representing its global and local properties.

4.4.2. Statistical features:

Statistical features are derived from the statistical distribution of pixels and describe the characteristic measurements of the pattern.

4.4.3. Global transformation:

A global transformation maps the pixel representation to a more compact form. This reduces the dimensionality of the feature vector and provides features that are invariant to global deformations such as translation, dilation, and rotation.

For the feature extraction phase, the character in the binarized image is enclosed in a geometrical shape (most often a rectangle or square), and its features are extracted within that box. To do this, connected component analysis is performed to label each pixel of a character. All the pixels in each component are given an integer label 0, 1, 2, and so on, where 0 is the background. On the basis of these components, the minimum and maximum coordinates are found and a rectangle is constructed around the character. Then Hu moments and other statistical measures are computed for each character.
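As an illustration of such invariant measures, the sketch below computes the first two Hu moment invariants of a binary character image in Python. It is an example under stated assumptions rather than the project's own code: the function name `hu_first_two` is hypothetical, and only two of the seven Hu invariants are shown.

```python
def hu_first_two(binary):
    """Return the first two Hu moment invariants of the 1-pixels in an image."""
    pts = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v == 1]
    m00 = len(pts)  # zeroth moment: number of character pixels
    if m00 == 0:
        raise ValueError("image contains no character pixels")
    r_bar = sum(r for r, _ in pts) / m00
    c_bar = sum(c for _, c in pts) / m00

    def mu(p, q):
        # Central moment: measured relative to the centroid, so it does not
        # change when the character is translated.
        return sum((c - c_bar) ** p * (r - r_bar) ** q for r, c in pts)

    def eta(p, q):
        # Normalized central moment: dividing by a power of m00 removes
        # the dependence on scale.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2

shape = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
# The same shape shifted down and to the right inside a larger image.
shifted = [[0] * 6, [0] * 6] + [[0, 0] + row for row in shape]
a = hu_first_two(shape)
b = hu_first_two(shifted)
```

The two results agree up to floating-point rounding, which is the translation invariance mentioned above.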

Using a neighboring technique, the starting point, ending point, and loops or turning points of a character are found. For this purpose, certain pixels in the character skeleton are defined as starting, turning, and ending points. To find the starting point, the image is scanned row by row from top to bottom, and within each row from left to right. The first pixel found that has value "1" and exactly one neighbor is taken as the starting point and as the first pixel of the body of the character. After the starting point is found, we move to the next neighboring pixel, which is also part of the body and has more than one neighbor. The pointer then follows the neighboring pixels (those having value "1") until it finds a pixel that has no further neighbor. After the starting point, ending points, and turning points are obtained, all pixels having value "1" are traced and the result for each character is stored in a matrix.
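The scan and trace just described can be sketched in Python as follows. This is a simplified illustration with hypothetical function names: it assumes 8-connectivity and a skeleton that is a single open stroke, so loops and branches (turning points with several unvisited neighbors) are not resolved here.

```python
def neighbours(skel, r, c):
    """Coordinates of the 8-neighbours of (r, c) that have value 1."""
    h, w = len(skel), len(skel[0])
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w
            and skel[r + dr][c + dc] == 1]

def find_start(skel):
    """First pixel, scanning top to bottom and left to right, with one neighbour."""
    for r, row in enumerate(skel):
        for c, v in enumerate(row):
            if v == 1 and len(neighbours(skel, r, c)) == 1:
                return (r, c)
    return None

def trace(skel, start):
    """Follow unvisited neighbours from the start until none remain."""
    path, visited = [start], {start}
    current = start
    while True:
        unvisited = [p for p in neighbours(skel, *current) if p not in visited]
        if not unvisited:
            return path  # the last pixel reached is the ending point
        current = unvisited[0]
        visited.add(current)
        path.append(current)

skeleton = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
start = find_start(skeleton)
path = trace(skeleton, start)
```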

4.5. Recognition:

In the recognition phase, the last phase of Arabic character recognition (AOCR), features extracted from individual handwritten characters are fed into the system to train it. A database of these features is created for the training system. Collected handwritten samples are then fed to this trained system for recognition.

4.6. Results:

The final part of this project is to extract the features for all the characters in all the images, and to create a database of character features to be used in recognition. For this purpose, the extracted features are put into one big matrix.

All the necessary steps, from reading the image through preprocessing and extracting components, are added to a ".m" file (a MATLAB file), which can be called for different images. In this project a simple nearest-neighbor approach is used to match the features of Arabic characters.

The result obtained from the AOCR process is stored in the database. Those results are compared with the real images in the database that were gathered as input samples. On the basis of that comparison, the accuracy for a character is found to be approximately 60% to 80% or more.
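A nearest-neighbor match against a feature database can be sketched in Python as below. The sketch assumes Euclidean distance and fixed-length feature vectors; the function name, the labels, and the feature values are hypothetical and are not taken from the project's data.

```python
import math

def nearest_neighbour(sample, database):
    """Return the label of the stored feature vector closest to the sample."""
    best_label, best_dist = None, math.inf
    for label, features in database:
        # Euclidean distance between the sample and a stored feature vector.
        dist = math.sqrt(sum((s - f) ** 2 for s, f in zip(sample, features)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical database: one feature vector per known character.
database = [
    ("alif", [1.0, 0.0, 0.0]),
    ("ba", [0.0, 1.0, 1.0]),
]
result = nearest_neighbour([0.9, 0.1, 0.0], database)
```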

Chapter 5

CONCLUSIONS

5.1. Conclusions

In Arabic Character Recognition (AOCR), we used a technique to skeletonize the image and extract the features of a character by finding its starting, turning, and ending points. Once extracted, these features can be used to recognize Arabic characters without using a neural network. A database was created, and the data obtained from feature extraction was stored in it. The result was compared with other raw handwritten samples. With the help of this technique we can produce an effective recognition system that identifies each character uniquely.

Chapter 6

FUTURE RESEARCH DIRECTIONS

6.1. Future Directions

The feature extraction algorithm needs improvement to overcome the problem of garbage characters. Further research is required to incorporate diacritic characters into the system. The recognition part can be improved by using new algorithms and/or by introducing more techniques for training the system.

MATLAB And Image Processing Toolbox

The name MATLAB stands for matrix laboratory. MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:

  • Math and computation
  • Algorithm development
  • Modeling, simulation, and prototyping
  • Data analysis, exploration, and visualization
  • Scientific and engineering graphics

  • Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran. MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.

I decided to use MATLAB for the development of this project because of its toolboxes. Toolboxes allow you to learn and apply specialized technology. They are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. These include, among others, the Image Processing Toolbox.

References

  1. S. Mori, H. Nishida, and H. Yamada. Optical Character Recognition, Wiley Interscience, New Jersey, 1999.
  2. Optical Character Recognition and the Years Ahead. The Business Press, Elmhurst, IL, 1969.
  3. No author. Auerbach on Optical Character Recognition. Auerbach Publishers, Inc., Princeton, 1971.
  4. S. V. Rice, G. Nagy, and T. A. Nartker. Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, Boston, 1999.
  5. H. F. Schantz. The History of OCR. Recognition Technologies Users Association, Boston, 1982.
  6. C. Y. Suen. Character recognition by computer and applications. In T. Y. Young and K. S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, Inc., Orlando, FL, 1986, pp. 569-586.
  7. A. Amin, H. Al-Sadoun and S. Fischer, "Hand-Printed Arabic Character Recognition System using An Artificial Network", Pattern Recognition, Vol. 29, No. 4, pp. 663-675, 1996.
  8. Ethnologue: Languages of the World. 14th ed., SIL International, 2000.
  9. Ahmed M. Zeki and Mohamad S. Zakaria ,Challenges in Recognizing Arabic Character, International Islamic University Malaysia (IIUM), Kuala Lumpur, Malaysia, National University of Malaysia (UKM), Bangi, Selangor, Malaysia.
  10. A. Amin, "Off-line Arabic Character Recognition - the State of the Art", Pattern Recognition, Vol. 31, No. 5, 517-530, 1998.
  11. H. Bunke and P. S. P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing, Singapore, 1997.
  12. Proceedings of the following international workshops and conferences:
    • ICPR—International Conference on Pattern Recognition
    • ICDAR—International Conference on Document Analysis and Recognition
    • DAS—Document Analysis Systems
    • IWFHR—International Workshop on Frontiers in Handwriting Recognition.
  13. Erlandson, E. J., Trenkle, J.M., Vogt, R.C., "Word-level recognition of multifont Arabic text using a feature-vector matching approach" Proceedings of the SPIE, Vol. 2660-08, San Jose, 1996.
  14. Gillies, A.M, Erlandson, E.J., Trenkle, J.M.,Schlosser, S.G., "Arabic Text Recognition System", Proceedings of the Symposium on Document Image Understanding Technology, Annapolis, Maryland, 1999.
  15. Zhidong Lu, Issam Bazzi,Andras Kornai, John Makhoul, Premkumar Natarajan,Richard Schwartz, " A Robust, Language-Independent OCR System " , In: Robert J. Mericsko (ed): Proc. 27th AIPR Workshop: Advances in Computer-Assisted Recognition SPIE Proceedings 3584 1999
  16. Trenkle, J.M., Gillies, A.M, Erlandson, E. J.,Schlosser, S.G., "Arabic Character Recognition" Proceedings of Symposium on Document Image Understanding Technology. Bowie, Maryland, pp. 191-195, October 24-25, 1995.
  17. Trenkle, J. M. and R. C. Vogt, "Disambiguation and Spelling Correction for a Neural Networkbased Character Recognition System." In Proceedings Document Recognition, Proc. SPIE 2181, eds. Luc M. Vincent and Theo Pavlidis, San Jose, CA, pp. 322-333, 6-10 February 1994.
  18. N. Otsu, "A threshold selection method from gray- level histograms", IEEE transactions on systems, Man and Cybernetics, vol. 9, no. 1, pp 62-69, 1979.
  19. Starner, T., J. Makhoul, R. Schwartz, and G. Chou. "On-line Cursive Handwriting Recognition Using Speech Recognition Methods." In IEEE Proceedings International Conference on Acoustics, Speech, and Signal Processing, pp. 125-128, April 1994.
  20. Tapas Kanungo, Gregory A. Marton, and Osama Bulbul , "OmniPage vs. Sakhr: Paired Model Evaluation of Two Arabic OCR Products " ,Proceedings of SPIE Conference on Document Recognition and Retrieval (VI), vol. 3651 San Jose, CA; January 27-28, 1999
  21. Tapas Kanungo, Gregory A. Marton, and Osama Bulbul, "Performance Evaluation of Two Arabic OCR Products", Proceedings of AIPR Workshop on Advances in Computer Assisted Recognition, SPIE vol. 3584, Washington, D.C., October 14-16, 1998.
      MathWorks, "Image Processing Toolbox User's Guide Version 3", MathWorks.
  22. Tim Klassen ,"Towards Neural Network Recognition Of Handwritten Arabic Letters " Thesis, Faculty of Computer Science, Dalhousie University
  23. Zaheer Ahmad, Jehanzeb Khan, Urdu Nastaleeq OCR (Optical Character Recognition, Proceedings of World Academy of Science, Engineering and Technology, Volume 2, ISSN:1307-6884, December 2007.
  24. Amin, A. "Arabic Character Recognition", Handbook of Character Recognition and Document Image Analysis, World Scientific Publishing Company, 1997, pp. 398.
  25. Towards Neural Network Recognition Of Handwritten Arabic Letters By Tim Klassen thesis for MASTER OF COMPUTER SCIENCE (M.C.Sc.) 2001
  26. Ahmed M. Zeki and Mohamad S. Zakaria ,Challenges in Recognizing Arabic Character, International Islamic University Malaysia (IIUM), Kuala Lumpur, Malaysia, National University of Malaysia (UKM), Bangi, Selangor, Malaysia.
  27. F. Al-Fakhri, On-Line Computer Recognition of Hand-Written Arabic Text, Master's Thesis, Science University of Malaysia, 1997.
  28. Zeki, Plausable inference Approach to Character Recognition, Master's Thesis, National University of Malaysia, 1999.
  29. A. Amin, H. Al-Sadoun and S. Fischer, "Hand-Printed Arabic Character Recognition System using An Artificial Network", Pattern Recognition, Vol. 29, No. 4, pp. 663-675, 1996.
  30. A.Amin, "Off line Arabic Character Recognition - A Survey", in Proceeding of the 4th International Conference Document Analysis and Recognition (ICDAR '97), pp. 596-599, 1997.



