Sector and Its Application in an E-Governance System

02 Nov 2017

E-Governance System

Abstract- Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. The system was created by Dr. Yunhong Gu in 2006 and is now maintained by a group of open source developers. Its goal is to handle big data on commodity clusters. Sector/Sphere can be broadly compared with Hadoop. It uses UDT, an open source UDP-based data transfer protocol. Sector/Sphere is written in C++. Sector is a distributed file system targeting data storage over a large number of commodity computers. There are four parts in Sector, namely, the security server, master servers, slave nodes and clients. The security server maintains the system security configurations such as user accounts, data IO permissions, and IP access control lists. The master servers maintain file system metadata, schedule jobs and respond to users' requests. Slave nodes are racks of computers that store and process data. The client includes tools and programming APIs to access and process Sector data. Sphere is the programming framework that supports massive in-storage parallel data processing for data stored in Sector. Sphere allows developers to write parallel data processing applications with a very simple set of APIs. It applies user-defined functions (UDFs) on all input data segments in parallel. In a Sphere application, both inputs and outputs are Sector files. Sector/Sphere is unique in its ability to operate in a wide area network (WAN) setting. Overall, it performs about 2-4 times faster than Hadoop. This paper covers an introduction to Sector/Sphere, a comparison of its performance to that of Hadoop, and technical viewpoints on its added advantages and future scope in this field. A proposed case study on the use of Sector/Sphere for the implementation of an 'E-Governance System' is also discussed.

I. INTRODUCTION

The paper starts with a brief description of the Sector/Sphere system architecture. The components of the architecture are described. The performance and features of Sector/Sphere in various aspects are compared with Hadoop, and finally a case study on the application of Sector/Sphere to implement a nationwide e-Governance system is presented.

II. HANDLING BIG DATA

Today there is still a growing gap between the amount of data being produced and the capability of current systems to store and analyze it. Google has developed a proprietary storage system called the Google File System (GFS) and an associated processing system called MapReduce that has very successfully managed and integrated large-scale storage and data processing.[1]

Hadoop is an open source implementation of the GFS/MapReduce design. It is now the dominant open source platform for distributed data storage and parallel data processing over commodity servers. Hadoop has impressive performance, but it faces technical challenges that need to be solved in order to improve its performance and broaden its applicability.

Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It consists of Sector, which is a distributed file system targeting data storage over a large number of commodity computers, and Sphere, which is the programming framework that supports massive in-storage parallel data processing for data stored in Sector.

The Sector/Sphere system was created by Dr. Yunhong Gu in 2006. It is now maintained by a group of open source developers.

There are some similarities between GFS/MapReduce and Sector/Sphere, but there are also some important differences. Unlike GFS and traditional file systems, Sector is an application-aware file system and it can organize files in a way that supports efficient data processing. Unlike MapReduce, Sphere allows arbitrary user-defined functions (UDFs) to process independent data segments in parallel.[1]

Cloud computing platforms (GFS/MapReduce/BigTable and Hadoop) that have been developed thus far have been designed with two important restrictions. First, clouds have assumed that all the nodes in the cloud are co-located, i.e. within one data centre, or that there is relatively small bandwidth available between the geographically distributed clusters containing the data. Second, these clouds have assumed that individual inputs and outputs to the cloud are relatively small, although the aggregate data managed and processed are very large. [2]

These restrictions do not apply to the Sector/Sphere system. Sector assumes that it has access to a large number of commodity computers, called nodes, which may be located either within or across data centres. The second assumption is that high-speed networks connect the various nodes in the system. The datasets stored by Sector are divided into one or more separate files, which are called Sector slices. The different files comprising a dataset are replicated and distributed over the various nodes managed by Sector.

Sector/Sphere is written in C++. It is open source and available from sector.sf.net.[1]

III. ARCHITECTURE OF SECTOR/SPHERE SYSTEM

The Sector/Sphere system consists of four components: a security server, one or more master servers, one or more slave nodes, and client nodes. Sector/Sphere assumes that the masters and slaves are connected with a high-speed network.[1] Sector/Sphere has one security server that is responsible for authenticating master servers, slave nodes, and users. The security server can be configured to retrieve the information required for authentication from various sources (such as a file, database, or LDAP service). One or more master servers can be started. The master servers are responsible for maintaining the metadata of the storage system and for scheduling requests from users. All Sector/Sphere masters are active and can be inserted or removed at run time. The system load is balanced between the various master servers. The slave nodes in the Sector/Sphere system are the computers that actually store and process the data. Sector/Sphere assumes that the slave nodes are racks of commodity computers that have internal disks or are connected to external storage via a high-speed connection. Also, all slaves need to be inter-connected by high-speed networks. The fourth component, a Sector client, is a computer that issues requests to the Sector system and accepts responses. File IO is performed using UDT [3] to transfer data between the slave nodes as well as between the client node and slave nodes.

Fig. 1 illustrates the architecture of Sector/Sphere.

Fig.1. Sector/Sphere system architecture

The security server maintains the system security policies such as user accounts and the IP access control list. One or more master servers control operations of the overall system in addition to responding to various user requests. The slave nodes store the data files and process them upon request. The clients are the users' computers from which system access and data processing requests are issued.

IV. COMPARISON BETWEEN SECTOR/SPHERE AND HADOOP

Both of these systems, Sector/Sphere and Hadoop, are cloud-based systems designed to support data intensive computing. Both include distributed file systems and closely coupled systems for processing data in parallel.[4] Given here is a comparison between these two systems in terms of their design goals, architecture, distributed file system, replication, security, wide area data access, compatibility with existing systems, speed and performance, etc.

TABLE I: Comparison of the two systems with respect to their design goals

Sector/Sphere: It has a three-layer functionality consisting of a distributed file system; data collection, sharing and distribution over wide area networks (WANs); and massive in-storage parallel data processing.

Hadoop: It has a two-layer functionality consisting of a distributed file system and massive in-storage parallel data processing.

A distinct advantage of Sector/Sphere over Hadoop is its ability to operate in a wide area network (WAN).

TABLE II: Comparison of the two systems with respect to their architecture

Sector/Sphere: The architecture is a master-slave system; the masters store metadata and the slaves store data. There can be multiple active masters present. The client nodes perform IO directly with the slave nodes.

Hadoop: The architecture is a master-slave system with a single namenode and multiple datanodes; the namenode stores metadata and the datanodes store data. Clients perform IO directly with the datanodes.

Hadoop can have only a single namenode, which is a single point of failure, whereas, in Sector/Sphere, there can be multiple active masters present.

TABLE III: Comparison of the two systems with respect to their distributed file system

Sector/Sphere: Sector has general-purpose IO. It is optimized for large files. It is file based, i.e., files are not split by Sector; users have to take care of splitting themselves. It uses replication for fault tolerance.

Hadoop: HDFS is write once, read many, i.e., no random writes can be done. It is optimized for large files. It is block based, with a 64 MB block size by default. It uses replication for fault tolerance.

Sector allows flexibility in the way files are split as it is a file based system, unlike the block based HDFS used in Hadoop.

TABLE IV: Comparison of the two systems with respect to replication

Sector/Sphere: The per-file replica factor can be specified in a configuration file and can be changed at run time. Replicas are stored as far away from each other as possible, within a distance limit that is configurable at the per-file level.

Hadoop: The per-file replica factor is set at file creation. Three replicas are created by default, of which two are on the same rack as the original and the third is on a different rack.

Sector is much more flexible than HDFS in terms of specifying the per-file replica factor. Also, the storage location of replicas can be configured in Sector, whereas it is fixed in HDFS.

TABLE V: Comparison of the two systems with respect to security

Sector/Sphere: A security server is present which manages user accounts, permissions, etc. Optional file transfer encryption is available.

Hadoop: User authentication is done by a token-based security framework. There is no file transfer encryption.

Sector uses a separate server for its security requirements, such as for user authentication and permissions. There is no such separate server in Hadoop. The optional file transfer encryption facility can be of immense use in the case of sensitive data, a feature not available in Hadoop.

TABLE VI: Comparison of the two systems with respect to wide area data access

Sector/Sphere: Sector ensures high-performance data transfer with UDT, a high-speed data transfer protocol. As Sector pushes replicas far away from each other, a remote Sector client may find a nearby replica.

Hadoop: HDFS has no special considerations for wide area access. Its security mechanism may cause problems for remote data access.

Sector/Sphere uses UDT, which is a reliable UDP based application level data transfer protocol for distributed data intensive applications over wide area high-speed networks. [3]

TABLE VII: Comparison of the two systems with respect to their compatibility with existing systems

Sector/Sphere: Sector files can be accessed from outside the system if necessary.

Hadoop: Data in HDFS can only be accessed via HDFS interfaces.

Another major point of comparison is that Sector/Sphere is written in C++, whereas Hadoop is written in Java. Hadoop transfers data over TCP, while Sector/Sphere uses UDT. Sector/Sphere performs about 2-4 times faster than Hadoop.[4]

Thus, Sector is a unique system that integrates a distributed file system, a content sharing network, and a parallel data processing framework. Hadoop, in contrast, is mainly focused on large-scale data processing within a single data centre.

V. CASE STUDY OF NATIONAL E-GOVERNANCE PLAN

NeGP (National e-Governance Plan) is a plan of the Government of India to make all government services available to the citizens of India via electronic media. This plan was an outcome of the recommendations of the second Administrative Reforms Commission. It is under the administration of the Department of Information Technology of the Ministry of Communications and Information Technology, Government of India.[5]

This plan has faced a lot of technological problems with respect to its implementation. One of the major problems was the lack of a technology for storage, distribution and processing of the vast amounts of data that this e-Governance system would require.

Sector provides scalable, fault-tolerant storage using commodity computers, while Sphere supports in-storage parallel data processing with a simplified programming interface.[6]

Consider the following graph for the e-Governance system. At the first level, every state and union territory is a node in the graph, giving 35 nodes. At the next level, every district headquarters is a node, giving 640 nodes. At the third level, every taluka headquarters is considered, giving approximately 5,500 nodes. The next level includes block development offices, approximately 50,000 in number. At the last level, every village Gram Panchayat is a node, giving about 6.5 lakh (650,000) nodes at the final level of the graph.

For the processing of this massive graph, Sector/Sphere is proposed. The breadth first search (BFS) algorithm in graph analysis is used as the benchmark application for this parallel data processing engine. BFS is one of the fundamental approaches to many types of graph queries.[6]

Thus, using Sector/Sphere, this massive graph for Indian e-Governance system can be implemented and processed. Sector/Sphere has a unique data distribution feature, which will be instrumental in the implementation of a nationwide graph system. Another major advantage is the security of data provided by Sector/Sphere in the form of optional file transfer encryption. Thus, even for transfer of sensitive data, Sector/Sphere will be able to retain the confidential nature of the data.

Hence, using Sector/Sphere, the implementation of a National e-Governance System with the proposed graph structure given will be possible.

VI. CONCLUSION

Sector/Sphere supports distributed data storage, distribution, and processing over large clusters of commodity computers, either within a data center or across multiple data centers. Sector is a high performance, scalable, and secure distributed file system. Sphere is a high performance parallel data processing engine that can process Sector data files on the storage nodes with very simple programming interfaces.[7]

Sector and Sphere are designed for applications involving large, geographically distributed datasets in which the data can be naturally processed in parallel. Sector manages the large distributed datasets with high reliability, high performance IO and a uniform access. Sphere makes use of the Sector-distributed storage system to simplify data access, increase data IO bandwidth and to exploit wide-area, high-performance networks. Sphere presents a very simple programming interface by hiding data movement, load balancing and fault tolerance.

Sector/Sphere has been found to be much more efficient than Hadoop, in terms of its features and performance.

Sector/Sphere is open source and available for free download.

Sector/Sphere can be used to implement an e-Governance system in India, which will be useful in terms of the speed of communication, cost reduction, and the transparency and accountability of the governing process.

ACKNOWLEDGMENT

I thank Mr. P. R. Sonawane, Assistant Professor, Army Institute of Technology, Pune for guidance and discussion on the subject.


