Mining Using The Traditional Approaches

Published Date: 02 Nov 2017

Abstract

Many organizations have large quantities of spatial data collected in various application areas. These data collections are growing rapidly and can therefore be considered as spatial data streams. For data stream classification, time is a major issue. The Peano Count Tree (P-tree) is a quadrant-based lossless tree representation of the original spatial data. The idea of the P-tree is to recursively divide the entire spatial data set, such as Remotely Sensed Imagery data, into quadrants and record the count of 1-bits for each quadrant, thus forming a quadrant count tree. Using the P-tree structure, all the count information can be calculated quickly, which facilitates efficient data mining. In this paper, we developed a new method for decision tree classification on spatial data streams using the P-tree. Using the P-tree structure, measurements such as information gain can be calculated quickly. We compare P-tree based decision tree induction classification and a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree and bit-sequential methods are significantly faster than existing classification methods, making them the preferred methods for mining on spatial data streams.

Keywords: data streams, Compression, BSQ, Peano Ordering, P-Tree, Image mining, and retrieval.

1. Introduction

Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the image database. Many issues of image mining [1], such as image processing, feature extraction, image indexing and retrieval, and pattern and knowledge discovery, can be optimized with different data mining techniques. These issues have not been considered for image retrieval. In this paper, the emphasis is on the performance of the transformation and retrieval processes using bSQ (bit Sequential), a new storage format for images.

The data is in BSQ format, with each line of the data followed immediately by the next line in the same spectral band. This format is optimal for spatial (X, Y) access of any part of a single spectral band. The main objective is to emphasize efficient image transformation and retrieval for fast image recognition.
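The band-sequential layout described above can be illustrated with a short sketch. It assumes the image is held as a hypothetical `values[band][row][col]` array and uses the offset formula band × rows × cols + row × cols + col; the class name `BandSequential` is illustrative only. Because each band occupies one contiguous slice, reading any (X, Y) region of a single band touches only that band's data.

```java
import java.util.Arrays;

// Sketch of the band-sequential (BSQ) layout. Assumption: the image is
// held as values[band][row][col]; BandSequential is a hypothetical name.
class BandSequential {

    // Position of (band, row, col) in the BSQ stream: every row of band 0,
    // then every row of band 1, and so on.
    static int offset(int band, int row, int col, int rows, int cols) {
        return band * rows * cols + row * cols + col;
    }

    // Writes each band line by line, so each line of a band is followed
    // immediately by the next line of the same band.
    static int[] toBsq(int[][][] values) {
        int bands = values.length, rows = values[0].length, cols = values[0][0].length;
        int[] out = new int[bands * rows * cols];
        for (int b = 0; b < bands; b++)
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    out[offset(b, r, c, rows, cols)] = values[b][r][c];
        return out;
    }

    // A single band is one contiguous slice, which is why (X, Y) access
    // within one spectral band is cheap in this layout.
    static int[] band(int[] bsq, int band, int rows, int cols) {
        int start = band * rows * cols;
        return Arrays.copyOfRange(bsq, start, start + rows * cols);
    }
}
```

For a two-band 2×2 image with band 0 = {{1, 2}, {3, 4}} and band 1 = {{5, 6}, {7, 8}}, `toBsq` yields 1 2 3 4 5 6 7 8.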

Pixel level features describe spectral and textural information about each individual pixel. Polygon level features describe connected groups of pixels. In the segmentation process, each polygon is described by its boundary and by a number of attributes that present information about the content of the region in terms of shape, size, etc. The spectral and texture properties are based on the pixel features of points within the polygon. Tile level features present spectral and texture information about whole image tiles.

2. Image Mining System

A general structural model for an image mining system is shown in Figure 1. The system takes a specified sample of images as input, and image features are extracted from them to represent the image content concisely. Besides the relevance of the features to the mining task, it is essential, when designing a feature extraction operator, to consider invariance to some geometric transformations and robustness with respect to noise and other distortions. After the image content is represented, the model description of a given image (the correct semantic interpretation of the image) is obtained. Mining results are obtained by matching the model description with its complementary symbolic description. The symbolic description might be just a feature or a set of features, a verbal description, or a phrase that identifies a particular semantic.

An image mining system has two main themes. The first is mining large collections of images, and the second is the combined data mining [6] of large collections of images and associated alphanumeric data. The following subsections sketch the concepts involved.

Getting information from a huge number of images is a hard task. A vast collection of image data should be mined to discover new and valuable knowledge. The next sections explain some relevant techniques applied to convert raw data into a minable form. Raw images consist of a two-dimensional array of pixels, usually called the iconic format. An image database containing only raw information cannot be used for mining purposes; the images must first be converted into a new representation. This new representation must reflect the semantics observed in the images, which cannot be detected through a raw analysis of pixel values.

Figure 1: General Image Mining System

2.1 PROBLEM SPECIFICATION

In this work, a new storage format for images, the bit Sequential (bSQ) format, and the peano count tree are used. Initially, any image (JPG, BMP, PNG) is converted into the corresponding bit sequential (bSQ) format, in which each pixel is converted into its corresponding red, green and blue bit values, and then peano count trees are generated for the bSQ files. In a peano count tree, the root count is the total number of ones present in a bSQ file, and the bSQ file is recursively divided into four equal quadrants.

2.2 BIT SEQUENTIAL FORMAT (bSQ)

In its simplest form, the data is in bSQ format, with each line of the data followed immediately by the next line in the same spectral band. This format is optimal for spatial (X, Y) access of any part of a single spectral band. In bit Sequential (bSQ) organization, each intensity value ranges from 0 to 255 and can therefore be represented as a byte; the bits at each bit position of a band are split into a separate file, called a bSQ file. Each bSQ file can be reorganized into a quadrant-based tree (P-tree). There are several reasons to use the bSQ format. First, different bits contribute to the value to different degrees; in some applications we do not need all the bits, because the high-order bits give enough information. Second, the bSQ format facilitates the representation of a precision hierarchy. Third, and most importantly, the bSQ format facilitates the creation of an efficient, rich data structure, the P-tree, and accommodates algorithm pruning based on a one-bit-at-a-time approach.
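The split of one band into bSQ files can be sketched as follows. The sketch assumes a band stored as a row-major `int` array with values 0 to 255 and uses a hypothetical `BitPlanes` class; plane 0 holds the most significant bit. The `merge` method rebuilds values from only the high-order planes, which illustrates the precision hierarchy mentioned above.

```java
// Sketch: splitting one 8-bit band into eight bSQ bit planes.
// Assumption: the band is a row-major int array with values 0..255;
// plane 0 holds the most significant bit, plane 7 the least significant.
class BitPlanes {

    static int[][] split(int[] band) {
        int[][] planes = new int[8][band.length];
        for (int i = 0; i < band.length; i++) {
            if (band[i] < 0 || band[i] > 255)
                throw new IllegalArgumentException("value out of range: " + band[i]);
            for (int bit = 0; bit < 8; bit++) {
                // Shift the requested bit (MSB first) down to position 0.
                planes[bit][i] = (band[i] >> (7 - bit)) & 1;
            }
        }
        return planes;
    }

    // Rebuilds byte values from the first keepBits (high-order) planes only;
    // keepBits = 8 restores the original values exactly.
    static int[] merge(int[][] planes, int keepBits) {
        int[] band = new int[planes[0].length];
        for (int bit = 0; bit < keepBits; bit++)
            for (int i = 0; i < band.length; i++)
                band[i] |= planes[bit][i] << (7 - bit);
        return band;
    }
}
```

For example, splitting {200, 13} and merging only the four high-order planes gives {192, 0}: only the high-order bits of each value survive.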

bSQ example

The following is an example of the band sequential (BSQ) file format as it would be written for the graphic below. An actual .bsq file is binary; however, for the purpose of this example it is shown using ASCII characters. Figure 2.5 shows the separation of the red, green and blue colors of the .bsq image, giving the integer value of each color with respect to the color contrast. As Figure 2.4 shows, each bit of the pixel values is taken and stored in one of eight different files belonging to one band. Similarly, eight different files are generated for every band.

Figure 2.4: Spatial data formats for a two band 2x2 image

Figure 2.5: .bSQ Image composition

Figure 2.6 : bSQ format for two band 2x2 image
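The color separation and the eight files per band can be sketched in Java. The sketch assumes pixels packed as 0xAARRGGBB `int` values (the form returned by `java.awt.image.BufferedImage.getRGB(x, y)`), bands ordered red, green, blue, and bits written most significant first. The hypothetical `RgbToBsq` class ends each of the 24 bit files with a `$` delimiter, as in Step 1 of Section 6, but the exact file layout of the original application is not known.

```java
// Sketch: separating packed RGB pixels into red, green and blue bands
// and then into 3 x 8 = 24 bSQ bit strings. Assumptions: pixels are
// packed 0xAARRGGBB ints; bands are ordered R, G, B; bits MSB first.
// The '$' terminator per bit file is a hypothetical layout.
class RgbToBsq {

    static String toBsq(int[] argbPixels) {
        StringBuilder out = new StringBuilder();
        int[] shifts = {16, 8, 0};                  // red, green, blue
        for (int shift : shifts) {
            for (int bit = 7; bit >= 0; bit--) {    // one bit file per bit position
                for (int p : argbPixels) {
                    int value = (p >> shift) & 0xFF; // 0..255 intensity in this band
                    out.append((value >> bit) & 1);
                }
                out.append('$');                     // end of this bit file
            }
        }
        return out.toString();
    }
}
```

For the two pixels 0xFFFF0000 (red) and 0xFF0000FF (blue), every red bit file is `10`, every green bit file is `00` and every blue bit file is `01`.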

2.3 QUAD TREES

A quad tree is a tree whose nodes are either leaves or have 4 children. The children are ordered 1, 2, 3, 4, and the children of a node represent its 4 quadrants. The root of the tree is the entire picture.

Figure 2.7 Levels of a Quad tree

One way to efficiently store the quad tree in binary format is to use the following scheme:

Figure 2.8 Quadrant order of Quad trees

Quad trees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. To represent a picture using a quad tree, each leaf must represent a uniform area of the picture. If the picture is black and white, we only need one bit to represent the color in each leaf; for example, 0 could mean black and 1 could mean white.

Figure 2.9 Example of a Quad tree
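A minimal Java sketch of this representation for a black-and-white picture follows. It assumes a square image whose side is a power of 2, children ordered upper left, upper right, lower left, lower right, and a hypothetical `QuadTree` class. For binary storage it uses one common preorder scheme (1 for an internal node, 0 followed by the color bit for a leaf), which is an assumption and not necessarily the scheme shown in Figure 2.8.

```java
// Sketch: region quadtree for a 0/1 image of side 2^k, stored in preorder.
// Assumption: encoding "1" = internal node, "0" + colour bit = leaf.
class QuadTree {
    final int color;            // 0 or 1 for a leaf, -1 for an internal node
    final QuadTree[] children;  // null for a leaf

    private QuadTree(int color, QuadTree[] children) {
        this.color = color;
        this.children = children;
    }

    static QuadTree build(int[][] img, int row, int col, int size) {
        if (isUniform(img, row, col, size))
            return new QuadTree(img[row][col], null);  // uniform area -> leaf
        int h = size / 2;
        QuadTree[] kids = {
            build(img, row, col, h), build(img, row, col + h, h),
            build(img, row + h, col, h), build(img, row + h, col + h, h)
        };
        return new QuadTree(-1, kids);
    }

    private static boolean isUniform(int[][] img, int row, int col, int size) {
        for (int r = row; r < row + size; r++)
            for (int c = col; c < col + size; c++)
                if (img[r][c] != img[row][col]) return false;
        return true;
    }

    // Preorder encoding of the tree as a bit string.
    String encode() {
        if (children == null) return "0" + color;
        StringBuilder sb = new StringBuilder("1");
        for (QuadTree child : children) sb.append(child.encode());
        return sb.toString();
    }
}
```

For the 4×4 picture with rows 1100, 1100, 0001, 0000, `encode()` returns `1010000100010000`: only the lower-right quadrant is mixed, so only it is subdivided.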

A similar partitioning is also known as a Q-tree. All forms of quadtrees share some common features:

They decompose space into adaptable cells

Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits

The tree directory follows the spatial decomposition of the Quadtree.
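The bucket behavior in this list can be sketched with a point quadtree. The hypothetical `BucketQuadTree` class assumes points inside a square region and a bucket capacity of 2 chosen only for illustration. When a leaf's bucket overflows, the cell splits into four quadrants and its points are redistributed, so the tree follows the spatial decomposition; a minimum cell size stops splitting when many points coincide.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: point quadtree with bucket capacity. CAPACITY and MIN_SIZE
// are hypothetical values chosen for illustration.
class BucketQuadTree {
    static final int CAPACITY = 2;         // maximum points per bucket
    static final double MIN_SIZE = 1e-6;   // stop splitting below this cell size

    final double x, y, size;               // cell covers [x, x+size) x [y, y+size)
    List<double[]> bucket = new ArrayList<>();
    BucketQuadTree[] children;             // null while the cell is a leaf

    BucketQuadTree(double x, double y, double size) {
        this.x = x; this.y = y; this.size = size;
    }

    void insert(double px, double py) {
        if (children != null) { child(px, py).insert(px, py); return; }
        bucket.add(new double[]{px, py});
        // Capacity exceeded: split into four quadrants and redistribute.
        if (bucket.size() > CAPACITY && size > MIN_SIZE) {
            double h = size / 2;
            children = new BucketQuadTree[]{
                new BucketQuadTree(x, y, h), new BucketQuadTree(x + h, y, h),
                new BucketQuadTree(x, y + h, h), new BucketQuadTree(x + h, y + h, h)
            };
            for (double[] p : bucket) child(p[0], p[1]).insert(p[0], p[1]);
            bucket = null;
        }
    }

    // Index 0..3 matches the order in which the children are created.
    private BucketQuadTree child(double px, double py) {
        int i = (px >= x + size / 2 ? 1 : 0) + (py >= y + size / 2 ? 2 : 0);
        return children[i];
    }

    int leafCount() {
        if (children == null) return 1;
        int n = 0;
        for (BucketQuadTree c : children) n += c.leafCount();
        return n;
    }
}
```

Inserting (1, 1), (2, 2) and (6, 6) into an 8×8 region splits the root once, giving four leaves; inserting (1.5, 1.5) then splits the quadrant containing (1, 1) again, giving seven leaves.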

3. PEANO COUNT TREES

A Peano count tree is a quadrant-based tree. The idea is to recursively divide the entire image into quadrants and record the count of 1-bits for each quadrant, thus forming a quadrant count tree. P-trees are somewhat similar in construction to other data structures in the literature (e.g., quadtrees and HHcodes). The P-tree is a spatial data organization that provides a lossless, compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Each new component in a spatial data stream is converted to P-trees and then added to the training set as soon as possible. Using the P-tree structure, measurements such as information gain can be calculated quickly. With the information in P-trees, the decision tree can be built rapidly. The P-tree structure is used to build the classifier. We compare P-tree based decision tree induction classification and a classical decision tree induction method with respect to the speed at which the classifier can be built. Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining on spatial data streams.
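The following sketch shows why counts are sufficient for information gain. The hypothetical `InfoGain` class takes a table `counts[v][c]` giving the number of pixels with attribute value v and class c. In a P-tree based classifier each such count could be obtained as the root count of the AND of the corresponding P-trees; here the counts are simply passed in. Gain is the class entropy minus the weighted entropy within each attribute value.

```java
// Sketch: information gain computed only from counts. Assumption: the
// counts[v][c] table would come from P-tree root counts; here it is input.
class InfoGain {

    // Entropy (in bits) of a class distribution given as counts.
    static double entropy(long[] classCounts) {
        long total = 0;
        for (long n : classCounts) total += n;
        double h = 0.0;
        for (long n : classCounts) {
            if (n == 0) continue;               // 0 * log 0 is taken as 0
            double p = (double) n / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Gain = H(class) - sum over values v of (|S_v| / |S|) * H(class | v).
    static double gain(long[][] counts) {
        int classes = counts[0].length;
        long[] overall = new long[classes];
        long total = 0;
        for (long[] row : counts)
            for (int c = 0; c < classes; c++) { overall[c] += row[c]; total += row[c]; }
        double remainder = 0.0;
        for (long[] row : counts) {
            long size = 0;
            for (long n : row) size += n;
            if (size > 0) remainder += (double) size / total * entropy(row);
        }
        return entropy(overall) - remainder;
    }
}
```

An attribute that separates two classes perfectly, `{{4, 0}, {0, 4}}`, has a gain of 1 bit; one that carries no information, `{{2, 2}, {2, 2}}`, has a gain of 0.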

We reorganize each bit file of the bSQ format into a Peano Count Tree (P-tree). For example, the P-tree of an 8×8 bSQ file (a one-bit, one-band file) is shown below.

bSQ FILE

1 1 1 1 1 1 0 0

1 1 1 1 0 0 0 0

1 1 1 1 1 1 0 0

1 1 1 1 1 1 1 0

1 1 1 1 0 0 0 0

1 1 1 1 0 0 0 0

0 0 1 1 0 0 0 0

0 1 1 1 0 0 0 0

P-TREE

Level 0 (root): 36

Level 1: 16  7  13  0

Level 2: 2 0 4 1 (children of 7);  4 4 1 4 (children of 13)

Level 3: 1100 (children of 2);  0010 (children of the 1 under 7);  0001 (children of the 1 under 13)

Figure 2.10 P-Tree for an 8x8 bSQ file

Figure 2.11 8x8 image and its p-tree

In the example of Figure 2.11, 55 is the number of 1s in the entire image. This root level is labeled level 0. The numbers 16, 8, 15 and 16 found at the next level (level 1) are the 1-bit counts for the four major quadrants in raster order, or Z order (upper left, upper right, lower left and lower right). Since the first and last level-1 quadrants are composed entirely of 1-bits (called pure-1 quadrants), sub-trees are not needed, and these branches terminate. Similarly, quadrants composed entirely of 0-bits are called pure-0 quadrants, which also cause termination of tree branches. This pattern is continued recursively using Peano, or Z-ordering (recursive raster ordering), of the four sub-quadrants at each new level. Eventually, every branch terminates (since, at the "leaf" level, all quadrants are pure). If we were to expand all sub-trees, including those for pure quadrants, the leaf sequence would be the Peano ordering of the image; hence the name Peano Count Tree.
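The recursive construction described above can be sketched in Java. The hypothetical `PTree` class builds the tree bottom-up from a square bit matrix whose side is a power of 2: the four sub-trees are built in Z order, and if the quadrant turns out to be pure-0 or pure-1 they are discarded so the branch terminates. For brevity it scans every bit rather than working from any compressed input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: Peano Count Tree built from a 2^k x 2^k bit matrix. Children
// follow Z order (UL, UR, LL, LR); pure quadrants keep no children.
class PTree {
    final int count;         // number of 1-bits in this quadrant
    final int size;          // side length of the quadrant
    final PTree[] children;  // null for a pure quadrant (including single bits)

    PTree(int count, int size, PTree[] children) {
        this.count = count; this.size = size; this.children = children;
    }

    static PTree build(int[][] bits, int row, int col, int size) {
        if (size == 1) return new PTree(bits[row][col], 1, null);
        int h = size / 2;
        PTree[] kids = {
            build(bits, row, col, h), build(bits, row, col + h, h),
            build(bits, row + h, col, h), build(bits, row + h, col + h, h)
        };
        int count = 0;
        for (PTree k : kids) count += k.count;
        boolean pure = count == 0 || count == size * size;
        return new PTree(count, size, pure ? null : kids);  // pure: branch terminates
    }

    // Counts of the nodes on one level, left to right (level 0 = root).
    static List<Integer> level(PTree root, int depth) {
        List<Integer> out = new ArrayList<>();
        collect(root, depth, out);
        return out;
    }

    private static void collect(PTree node, int depth, List<Integer> out) {
        if (depth == 0) { out.add(node.count); return; }
        if (node.children == null) return;
        for (PTree child : node.children) collect(child, depth - 1, out);
    }
}
```

Applied to the 8×8 bSQ file of Figure 2.10 (rows 11111100, 11110000, 11111100, 11111110, 11110000, 11110000, 00110000, 01110000), the root count is 36 and `level(root, 1)` returns [16, 7, 13, 0].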

ADVANTAGES OF PEANO COUNT TREE

The main advantage of peano count trees is that we can avoid expanding pure 0 and pure 1 quadrants.

The storage space required for an image is typically smaller than that of its bit sequential format, particularly when the image contains large pure-0 or pure-1 quadrants.

APPLICATIONS OF PEANO COUNT TREE

The Peano count tree (P-tree) technology provides an efficient way to store and mine images of any format, together with pertinent land data of still other formats.

P-trees provide a convenient technology for mining all of the media involved, in the same way that text is almost always mined today: pertinent features are first extracted into tables (i.e., structured records are extracted from the unstructured text), and the tables are then mined.

We use a data structure called the Peano Count Tree (P-tree), together with new data structures and algorithms, to derive "confident" rules (rules with high confidence only), especially for spatial data.

Peano count trees are also used in association rule mining. They offer an efficient way of feature matching, as they do not require scanning the database at all: a few algebraic operations perform the job. In addition, the AND operation that produces the tuple P-trees is fast.
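The speed of the AND operation comes from pure quadrants: a pure-0 quadrant ANDed with anything is pure-0, and a pure-1 quadrant simply yields the other operand, so only mixed quadrants are visited. The minimal sketch below uses a hypothetical `PTreeAnd` class and assumes both P-trees cover the same square area. The root count of the result is the number of positions where both bits are 1, the kind of count needed in association rule mining.

```java
// Sketch: ANDing two P-trees without touching the raw bits.
// Assumptions: both trees cover the same 2^k x 2^k area; a node whose
// kids are null is pure (all 0s or all 1s).
class PTreeAnd {
    static final class Node {
        final int count, size;
        final Node[] kids;
        Node(int count, int size, Node[] kids) { this.count = count; this.size = size; this.kids = kids; }
        boolean pure0() { return count == 0; }
        boolean pure1() { return count == size * size; }
    }

    // Bottom-up construction; pure quadrants keep no children.
    static Node build(int[][] b, int r, int c, int size) {
        if (size == 1) return new Node(b[r][c], 1, null);
        int h = size / 2;
        Node[] k = { build(b, r, c, h), build(b, r, c + h, h),
                     build(b, r + h, c, h), build(b, r + h, c + h, h) };
        int n = k[0].count + k[1].count + k[2].count + k[3].count;
        return new Node(n, size, (n == 0 || n == size * size) ? null : k);
    }

    static Node and(Node a, Node b) {
        if (a.pure0() || b.pure0()) return new Node(0, a.size, null); // 0 AND x = 0
        if (a.pure1()) return b;                                      // 1 AND x = x
        if (b.pure1()) return a;
        // Both mixed: recurse quadrant by quadrant. The result cannot be
        // pure-1 here, but it can become pure-0.
        Node[] k = new Node[4];
        int n = 0;
        for (int i = 0; i < 4; i++) { k[i] = and(a.kids[i], b.kids[i]); n += k[i].count; }
        return new Node(n, a.size, n == 0 ? null : k);
    }
}
```

ANDing a 4×4 tree whose top half is 1 with one whose left half is 1 yields a root count of 4 (the upper-left quadrant); the recursion stops at pure quadrants without scanning individual bits.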

4. SYSTEM REQUIREMENTS

The application requires only a basic operating system on the machine. Because it is developed in Java, a Java compiler and runtime environment are required; a JVM (Java Virtual Machine) must be installed if the operating system does not provide one by default. Since the application is developed in Java, it is platform independent: it runs on any operating system, provided a JVM is available to run it.

5. Flow chart diagrams and algorithm

The flow chart diagram in Figure 4 describes the data flow in the process.

Figure 4

6. Image conversion into peano count tree format

The peano count tree is generated from a .bSQ file as follows:

Input: The bit sequential format file.

Output: The peano count tree of the image.

Step 1: The bit sequential format file contains 24 delimiters ($). The bits up to each delimiter are considered to form one file.

Step 2: A peano count tree is generated for each file by following the steps below.

Step 3: Pad the file with zeros until the numbers of rows and columns are equal and a power of 2.

Step 4: Count the total number of ones in the file.

Step 5: If the number of ones equals the total number of bits in the current quadrant (initially the whole file), or is zero, store the count in a text document with a .ptree extension; this quadrant is not divided further.

Step 6: Otherwise, store the count in the .ptree text document and divide the quadrant into four equal quadrants.

Step 7: Count the total number of ones in each quadrant in raster order (Z-order).

Step 8: Go to Step 5 for each quadrant.
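Steps 3 to 8 can be sketched for a single bit file as follows. The hypothetical `PTreeSteps` class assumes the bit file (one segment between `$` delimiters, per Step 1) is a string of 0/1 characters in row-major order with known width and height. It records the counts in the order they are produced, processing quadrants breadth-first with a queue; the original application's .ptree text layout is not specified, so the returned list stands in for it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of Steps 3-8 for one bit file. Assumptions: row-major '0'/'1'
// string with known width/height; counts are emitted breadth-first.
class PTreeSteps {

    static List<Integer> counts(String bitFile, int width, int height) {
        // Step 3: pad with zeros to a square whose side is a power of 2.
        int side = 1;
        while (side < Math.max(width, height)) side *= 2;
        int[][] bits = new int[side][side];            // new arrays start as 0
        for (int r = 0; r < height; r++)
            for (int c = 0; c < width; c++)
                bits[r][c] = bitFile.charAt(r * width + c) - '0';

        List<Integer> out = new ArrayList<>();
        Deque<int[]> queue = new ArrayDeque<>();        // entries: {row, col, size}
        queue.add(new int[]{0, 0, side});               // Step 4: whole file first
        while (!queue.isEmpty()) {
            int[] q = queue.poll();
            int ones = countOnes(bits, q[0], q[1], q[2]);
            out.add(ones);                              // Steps 5/6: store the count
            if (ones == 0 || ones == q[2] * q[2]) continue;  // pure: not divided
            int h = q[2] / 2;                           // Steps 6/7: raster order
            queue.add(new int[]{q[0], q[1], h});
            queue.add(new int[]{q[0], q[1] + h, h});
            queue.add(new int[]{q[0] + h, q[1], h});
            queue.add(new int[]{q[0] + h, q[1] + h, h});
        }                                               // Step 8: loop to Step 5
        return out;
    }

    private static int countOnes(int[][] bits, int row, int col, int size) {
        int n = 0;
        for (int r = row; r < row + size; r++)
            for (int c = col; c < col + size; c++) n += bits[r][c];
        return n;
    }
}
```

For a 3×2 bit file `110011`, padding gives a 4×4 square and the counts are [4, 3, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]: the root, its four quadrants, then the bits of the two mixed quadrants.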

7. IMPLEMENTATION AND RESULT

The system is developed in Java. The experiments indicate that the proposed method reliably converts the image formats tested into peano count tree format, even when there is a small color variation. The developed system is tested with some examples that illustrate the generation of the basic peano count tree.

Description:

The above figure shows the steps to execute the Java code.

Description:

The above figure shows a .png image of size 207x212, which is first converted into .bSQ format and then into the peano count tree.

Description:

The above figure shows the bit sequential format of the above .png image.

Description:

The above figure shows the peano count tree generated from the above .bSQ format of the .png image.

Conclusion:

The proposed system was tested with several images. Images in formats such as JPEG, GIF, BMP and PNG were converted into bSQ format and then into the basic peano count tree.

The application accepts several image formats, such as .jpg, .png, .gif and .bmp. Every image is first converted to a bitwise image, and then processing starts. The results show that the accuracy depends on the size and clarity of the image. The red, green and blue colors in the .bsq image are separated and stored independently as bits, so maximum accuracy is possible. The stored bits are then converted into the basic peano count tree format, which reduces the storage space required for a given image.

As only free resources were used for development, the application is entirely free of cost and can serve as an open-source base for future enhancements. In the proposed system, any image is converted to peano count tree format, and maximum accuracy is possible because each bit value is compared. Future work on any system aims to overcome its drawbacks and limitations or to extend its functionality.

The present system converts any image into the basic peano count tree format. It can be extended to the generation of value and tuple P-trees, and to the searching, retrieval and comparison of images, which require additional implementation.
