Evolution Of Grid Computing Systems And Its Framework


02 Nov 2017

Disclaimer:
This essay has been written and submitted by students and is not an example of our work. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of EssayCompany.

PhD Thesis

Abstract

Grid computing means different things to different individuals. The grand vision is often presented as an analogy to power grids where users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated. In this view of grid computing, computing becomes pervasive and individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed with little or no knowledge of where those resources are located or what the underlying technologies, hardware, operating system, and so on are.

Early implementations of grid computing have tended to be internal to a particular company or organization. However, cross-organizational grids are also being implemented and will be an important part of computing and business optimization in the future.

The distinctions between intra-organizational grids and inter-organizational grids are not based on technological differences. Instead, they are based on configuration choices: security domains, the degrees of isolation desired, the types of policies and their scope, and the contractual obligations between users and providers of the infrastructures. These issues are not fundamentally architectural in nature. It is in the industry's best interest to ensure that there is no artificial split of distributed computing paradigms and models across organizational boundaries and internal IT infrastructures.

Grid computing involves an evolving set of open standards for Web services and interfaces that make services, or computing resources, available over the Internet. Very often grid technologies are used on homogeneous clusters, and they can add value on those clusters by assisting, for example, with scheduling or provisioning of the resources in the cluster. The term grid, and its related technologies, applies across this entire spectrum.

If we focus our attention on distributed computing solutions, then we could consider one definition of grid computing to be distributed computing across virtualized resources. The goal is to create the illusion of a simple yet large and powerful virtual computer out of a collection of connected (and possibly heterogeneous) systems sharing various combinations of resources.

Grid evolution means adding capabilities to, or removing capabilities from, the grid. This research defines grid evolution as improving the performance of some nodes by adding new functions and/or equipment, and by removing unusable resources that degrade the performance of some nodes. In this thesis the researcher describes new techniques of grid evolution that allow it to be seamless and to operate at run time. Grid evolution also involves the integration of software and hardware, and can be of two distinct types, external and internal. Internal evolution occurs inside the grid boundary by migrating special resources, such as application software, from node to node inside the grid, while external evolution occurs between grids.

In this research work the researcher developed a framework for grid evolution that insulates users from the various complexities of grids. This framework combines a resource broker with a grid monitor to support internal and external evolution, the monitoring of the grid environment, increased resource utilization, advance reservation, fault tolerance and the high availability of grid resources.

The present framework of grid evolution is triggered when the grid receives a job whose requirements are not available on the target node. This node may obtain all the required resources from other nodes of the grid; this is internal evolution. If the required resources are not available within the grid itself, the node may obtain them from a nearby grid (permanent evolution), or the job may be sent to another grid for execution (just-in-time evolution). Both cases are known as external evolution of the grid.
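The decision flow just described can be sketched in a few lines of Python. This is a minimal illustration only; the `Node`, `Grid` and `evolve` names are hypothetical, not the thesis implementation, and a real system would perform actual resource transfer and job migration rather than returning labels.

```python
# Hypothetical sketch of the evolution decision flow; classes and
# method names are illustrative, not the thesis implementation.

class Node:
    def __init__(self, name, resources):
        self.name = name
        self.resources = set(resources)

class Grid:
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes

def evolve(job_requirements, target, grid, neighbour_grids):
    """Decide how the grid satisfies a job: run it locally, evolve
    internally, or evolve externally (permanent or just-in-time)."""
    missing = set(job_requirements) - target.resources
    if not missing:
        return "execute locally"

    # Internal evolution: migrate the missing resources from other
    # nodes inside the same grid onto the target node.
    pool = set().union(*(n.resources for n in grid.nodes))
    if missing <= pool:
        target.resources |= missing
        return "internal evolution"

    # External evolution: either pull the resources from a nearby grid
    # (permanent evolution) or send the job there (just-in-time).
    for other in neighbour_grids:
        other_pool = set().union(*(n.resources for n in other.nodes))
        if missing <= other_pool:
            return "external evolution (permanent or just-in-time)"

    # No grid can supply the requirements: the job is rejected.
    return "reject job"
```

In this sketch, a job whose requirements the target node already holds runs locally; otherwise the broker widens its search from the grid's own nodes to neighbouring grids before rejecting the job.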

As a last step, a simulation tool has been designed, developed and tested. It has been used to create four grids, each with a different setup, including different nodes, application software, data and policies. Different jobs have been assigned to these grids at run time, and the results have been compared in order to analyse the performance of grids that use the evolution approach against those that do not. The results of these experiments demonstrate that grid evolution significantly improves the performance of grid environments and provides excellent scheduling results. They also demonstrate that the number of rejected jobs decreases significantly.

Declaration

Acknowledgement

Publications

List of Abbreviations

Chapter 1

Introduction

In our day-to-day use of computers we find that requirements for disk storage capacity, network bandwidth and processor speed grow exponentially. This exponential growth in hardware requirements is changing the overall landscape of information technology, and high-speed access to networked information is set to become the dominant feature of computing in the near future.

The word "grid" is taken from the phrase "electrical power grid". The power grid provides universal access to power without regard for where and how that power is produced. In the same way, a computational grid should provide pervasive access to high-end computing resources, without consideration for where and how jobs are executed. Any user can use electricity by simply plugging into a wall socket, without being concerned with how or where the electricity is generated. The concept of grid computing is similar to that of the power grid [48, 82].

Grid computing is an ambitious research area that addresses the problem of efficiently using multi-institutional pools of resources. Its goal is to allow coordinated and collaborative resource sharing and problem solving across several institutions, so as to solve large-scale scientific problems that could not easily be solved within the boundaries of a single institution. This concept of grid computing has encompassed a wide range of applications in both the scientific and commercial fields [34].

The main goals of grid computing are [34]:

the provision of access to resources that cannot be accessed locally (known as "remote resources").

the simplification of collaboration between different organisations by the provision of direct access to computers, software and storage.

the faster execution of jobs due to their parallel execution on multiple resources (Multi-site computing/Co-allocation).

the provision of a framework for utilising unused computational power as well as unused storage capacity.

1.1 Research Motivation

In the past few years, computing environments have evolved from single-user to multi-user environments, and from single-processor machines through Massively Parallel Processor (MPP) machines, clusters of workstations and distributed systems to grid computing systems. This rapid growth in computing enables scientists and engineers to solve complex problems and build sophisticated applications. However, every transition has brought new challenges and problems in its wake, as well as the need for technical innovation.

In this era of computing, millions of machines are interconnected via the Internet, with various hardware and software configurations, access and authorization policies, capabilities, network topologies and so forth. This mix of hardware and software resources on the Internet has fuelled researchers' interest in investigating new ways to exploit such a difficult combination of resources economically and efficiently for different computer applications.

Grid computing is heterogeneous by nature: resources from different domains reside on different sites and are not owned or managed by a single administrator. This gives rise to many challenges, such as site autonomy, the heterogeneous substrate and policy extensibility [23]. The Globus [33, 6] middleware toolkit addresses these issues by providing services to assist users in the utilisation of grid resources.

One of the components in grid computing is resource brokering, which insulates the user from the complexities of grid middleware by performing the task of mapping the user's job requirements to resources that can meet these requirements. This includes searching multiple administrative domains in order to find a single resource with which to execute the job, or scheduling a single job to use multiple resources across one or more sites. Resource brokering can be classified into two categories, system-centric and user-centric[51]. A system-centric broker allocates resources based on parameters that enhance system utilization and throughput. On the other hand, a user-centric broker such as Nimrod-G [14] adheres to the user's requirements. A key goal of grid computing is to deliver utility of computation as defined by these requirements. Grid resource brokers are thus focused on providing user-centric services. The grid must also cater for multiple users from different sites, who may potentially be interested in the same resource, without knowledge of each other's existence or interests [22, 35].
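The matching step a user-centric broker performs can be illustrated with a small sketch. The attribute names and the least-loaded-first policy below are assumptions made for this example; they do not describe the actual interface of Nimrod-G or any real broker.

```python
# Hypothetical sketch of user-centric resource brokering: the broker maps
# a job's requirements onto resource advertisements gathered from
# multiple administrative domains, insulating the user from middleware.

def broker(job, resources):
    """Return the resources that satisfy the job's requirements,
    ranked by a user-centric preference (least loaded first)."""
    candidates = [
        r for r in resources
        if r["cpus"] >= job["cpus"]
        and r["disk_gb"] >= job["disk_gb"]
        and job["software"] <= r["software"]   # required software present
    ]
    # A user-centric policy honours the user's preference rather than
    # overall system throughput; here the preference is low load.
    return sorted(candidates, key=lambda r: r["load"])

# Example advertisements from three sites (illustrative values only).
job = {"cpus": 4, "disk_gb": 10, "software": {"blast"}}
resources = [
    {"name": "siteA", "cpus": 8, "disk_gb": 50, "software": {"blast"}, "load": 0.7},
    {"name": "siteB", "cpus": 4, "disk_gb": 20, "software": {"blast"}, "load": 0.2},
    {"name": "siteC", "cpus": 2, "disk_gb": 100, "software": {"blast"}, "load": 0.1},
]
ranked = broker(job, resources)
```

A system-centric broker would differ only in the ranking key, for example sorting to maximise utilisation or throughput instead of the user's preference.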

To maximise the benefits of grid computing, it is essential to keep track of all the available resources on the different sites and to map each job to the most suitable resource. Locating a required resource across sites and matching the job to suitable resources is a major challenge because of the large number of resources, their heterogeneity and availability, and the variety of resource attributes related to software and hardware, such as disk space and CPU load.

This research focuses on the problems related to grid evolution. It also addresses communication among different grids and the exchange of resources among grids to satisfy job requirements, thereby reducing the number of rejected jobs.

1.2 Research Methodology

The research method used in this research is a typical scientific research technique [50], which includes the following phases:

1. Literature Review.

The literature review phase formulated the research question by gathering relevant information and then studying and observing it.

2. Modeling.

The modeling phase analyses and studies the phenomena expressed in the research question. A model has therefore been built to understand and analyse these phenomena; it includes both the computational model and the grid design.

3. Algorithmic Development.

In this phase, the novel technique and its algorithm have been established in order to handle and cope with the various issues involved.

4. Prototyping and Evaluation.

In this phase of the research, a prototype based on the model has been built. Experiments have also been conducted, and the results tested and evaluated by the researcher.

1.3 Research Question

Grids work best in computationally intensive environments, which is why not every application is suitable for, or enabled to run on, a grid.

The research question is therefore:

How can a grid evolve in order to meet job requirements, either by obtaining the required resources within the grid or by sending jobs to another grid for execution?

This question encompasses the following sub-questions:

Transparency and evolution

How can the user be insulated from the complexities of middleware?

How can resource usage be maximized using evolution?

Fault Tolerance

How can the grid detect failures inside itself or between grids attached to each other?

How can a job complete its execution despite any hardware or software failures?

1.4 Contributions

The major contributions of this research work can be summarized as follows:

The development of the evolution framework that allows grids to communicate with each other in order to exchange application software, data and even jobs is one of the major contributions of this research work. This framework enhances performance and abilities of the evolvable grids. The result is the execution of a greater number of jobs and thus the ability to serve more users.

The design of a new grid based on a new computational model for evolution. This allows grid evolution to occur seamlessly and at run time when needed (just-in-time evolution).

The development of a new technique for grid monitoring inside and outside the grid environment. Because fault tolerance is an essential characteristic of grid evolution, such a function is necessary in these environments to avoid the loss of computational time. This technique allows grids to detect and handle failures, and thereby to recover from them.

The evaluation of the framework by comparing, in simulation, a grid that uses the evolution approach with one that does not. The performance of the evolving grid was compared and analysed against that of the non-evolving grid. The results of these experiments demonstrate that the performance of the grid environment improves significantly, providing excellent scheduling results with a decrease in the number of rejected jobs.

1.6 Thesis Outline

A summary of the work is organized as follows.

Chapter 2 surveys the background and related work on all major research issues covered in this thesis. Firstly, it surveys the architecture, types, characteristics and features of grid computing. The history of grid evolution and its generations is then outlined, a review of grid resource brokers is provided and an overview of migration is given, before major projects that utilize the grid are listed. GridFTP and its features are surveyed, the current efforts in grid computing are presented and software evolution is finally summarized.

Chapter 3 describes the techniques of evolution, which have been classified as external and internal. These are defined and their performance in grid environments described.

Chapter 4 describes the behavior of the system, which consists of computational model objects and techniques, by giving the properties of its components' behaviors and their interactions.

Chapter 5 describes the system design, its structure and its components with their functions. This chapter shows how the design supports both external and internal grid evolution. It also describes how the resource broker insulates the user from grid complexity.

Chapter 6 describes GridEvolutor, the simulator created to realise these ideas and to compare the performance of grid environments.

Chapter 7 evaluates the results of the simulation to demonstrate the notions of grid evolution, and describes the results of this evolution.

Chapter 8 concludes the thesis and outlines possibilities for future work.

Chapter 2

Background and Related Work

Objectives

To review the definition, architecture, types and aims of grid computing.

To describe grid evolution history and generation.

To describe grid technology infrastructure.

To review software evolution concepts.

To review existing resource brokers (schedulers).

To review the present state of migration in grid computing.

To discuss current efforts in grid computing.

2.1 Introduction

A distributed computer system consists of multiple software components that are on multiple computers, but run as a single system. The computers that are in a distributed system can be physically close together and connected by a local network, or they can be geographically distant and connected by a wide area network. A distributed system can consist of any number of possible configurations, such as mainframes, personal computers, workstations, minicomputers, and so on. The goal of distributed computing is to make such a network work as a single computer.

Distributed systems offer many benefits over centralized systems, including the following:

Scalability

The system can easily be expanded by adding more machines as needed.

Redundancy

Several machines can provide the same services, so if one is unavailable, work does not stop. Additionally, because many smaller machines can be used, this redundancy does not need to be prohibitively expensive.

Distributed computing systems can run on hardware that is provided by many vendors, and can use a variety of standards-based software components. Such systems are independent of the underlying software. They can run on various operating systems, and can use various communications protocols.


Grid computing can be considered an extension of distributed computing systems, one in which both the number and the heterogeneity of the systems are much greater than usual. It promotes the sharing of distributed resources that may be heterogeneous in nature, and enables scientists and engineers to solve large-scale computing problems. With grid computing, computer applications can efficiently utilise computing and data resources and combine them for large-capacity workloads. The primary benefit of grid computing is the ability to coordinate and share resources [15, 59]. Grid computing faces many challenges; one of them is finding all the required resources for a grid job, resources that may reside on other grid sites. This research proposes evolution as a solution to this problem. This chapter surveys the architecture, types, characteristics and features of grid computing in Section 2.2. The history of grid evolution and its generations is then outlined in Section 2.3. Grid technology infrastructure is explained in Section 2.4, current efforts in grid computing are discussed in Section 2.5, a review of grid resource brokers is provided in Section 2.6, and a survey of migration is finally summarised in Section 2.7.

2.2 Grid Computing

Grid computing means different things to different individuals. The grand vision is often presented as an analogy to power grids where users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated. In this view of grid computing, computing becomes pervasive and individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed with little or no knowledge of where those resources are located or what the underlying technologies, hardware, operating system, and so on are.

Early implementations of grid computing have tended to be internal to a particular company or organization. However, cross-organizational grids are also being implemented and will be an important part of computing and business optimization in the future.

The distinctions between intraorganizational grids and interorganizational grids are not based on technological differences. Instead, they are based on configuration choices: security domains, the degrees of isolation desired, the types of policies and their scope, and the contractual obligations between users and providers of the infrastructures. These issues are not fundamentally architectural in nature. It is in the industry's best interest to ensure that there is no artificial split of distributed computing paradigms and models across organizational boundaries and internal IT infrastructures.

Grid computing involves an evolving set of open standards for Web services and interfaces that make services, or computing resources, available over the Internet. Very often grid technologies are used on homogeneous clusters, and they can add value on those clusters by assisting, for example, with scheduling or provisioning of the resources in the cluster. The term grid, and its related technologies, applies across this entire spectrum.

If we focus our attention on distributed computing solutions, then we could consider one definition of grid computing to be distributed computing across virtualized resources. The goal is to create the illusion of a simple yet large and powerful virtual computer out of a collection of connected (and possibly heterogeneous) systems sharing various combinations of resources.

Ian Foster [32] gives a simple three-point checklist to define a grid:

1. It coordinates resources that are not subject to centralised control. A grid integrates and coordinates resources and users that exist within different control domains. The grid paradigm enables the coordination and sharing of a large number of geographically dispersed heterogeneous resources.

2. It uses standard, open, general-purpose protocols and interfaces. A grid is built from multi-purpose protocols and interfaces, and this is reflected in its architecture.

3. It delivers nontrivial qualities of service. A grid allows its constituent resources to be used in a coordinated style to deliver various qualities of service, such as response time and security.

Grid computing has been defined as a collection of nodes connected via a network. These nodes may be identified as computational, storage and network resources. They also share services, computing power and other resources such as disk storage, databases and software applications. They are under the control of different administrative domains with different levels of security. Grid computing is thus a way to share computing resources.

2.2.1 Features and Objectives of Grid Computing

Grid computing aims to achieve the following goals [48, 34]:

The sharing of distributed and heterogeneous computing resources belonging to different organisations.

Grid computing is the sharing, selection and aggregation of a group of resources such as supercomputers, mainframes, storage systems, data sources and management systems that behave like networks of computation [55, 3]. It promotes the sharing of distributed resources that may be heterogeneous in nature. The primary benefit of grid computing is the ability to coordinate and share distributed and heterogeneous resources such as Sharing Infrastructure and REsources iN Europe (SIEREN) [46].

SIEREN is a cooperative association between twelve European countries whose purpose is to share their infrastructure and resources.

The exploitation of underutilised resources.

In most organisations and companies there are huge numbers of underutilised computing resources; most of these resources are busy less than 5% of the time and otherwise sit relatively idle. Grid computing is designed to exploit these underutilised resources and increase the efficiency of resource usage. Users can also rent the resources that reside on the grid to execute their computationally intensive applications, instead of purchasing their own dedicated (and expensive) resources.

The enablement and simplification of collaboration among different organisations.

Another capability of grid computing is the provision of an environment for collaboration between organisations. Grid computing enables very heterogeneous and distributed systems to work together, thereby simplifying collaboration between different organisations by providing direct access to computers, software and data storage.

The provision of a single login service with secure access to grid resources while protecting security for both users and remote sites.

Grids provide a single login service to all users over all distributed resources using grid authentication mechanisms. They also provide secure access to any information anywhere over any type of network. This is achieved by providing access control mechanisms that govern these resources.

The provision of resource management, information services, monitoring and secure data transportation.

The shared resources and networks involved in grid computing are difficult to manage and monitor, but grid computing is able to meet these challenges because of its architecture and protocols.

The solution of large scale problems.

Grids are designed to exploit underutilised resources, meaning that they can employ a large number of them to solve a large-scale problem. They are a promising solution for problems, such as weather forecasting, that involve the storage and processing of massive amounts of data that even mainframe computers cannot handle. These resources may be high-capability devices such as high-capacity disk storage and high-performance computing systems.

The provision of fast results, delivered more efficiently.

Grid computing obtains results quickly and more efficiently, because it enables parallel processing; it may also have high capability devices. With grid computing, businesses can efficiently utilise computing and data resources and combine them for large capacity workloads.

2.2.2 Characteristics of Grid Environments

Grid computing has emerged as a new paradigm for distributed computing. Traditionally, distributed computing has been defined in terms of systems with a fixed number of nodes. With the advent of grid computing, the number of nodes in distributed systems is constantly changing, and such systems are thus known as dynamic distributed systems. They have the following characteristics [43]:

Heterogeneous resources.

Resources have highly heterogeneous configurations: there are different architectural platforms, different CPU speeds, memory hierarchy layouts and capacities, disk space and software configurations.

Dynamic resource availability.

The absence of tight controls in volunteer-computing-based grid environments means that a resource can withdraw from the computation at any time, at the wish of its owner. Resources may non-deterministically join or leave the available set, which means that they can join and leave the grid at any time. This renders the grid a best-effort computation medium, by analogy with the Internet's best-effort communication medium.

Dynamic resource load.

Applications run in a shared environment, where external jobs compete for the same resources. Fluctuations in machine loads are therefore expected to be high: a machine might be lightly loaded one minute and heavily loaded the next.
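These characteristics can be illustrated with a toy resource pool in which machines join and leave at will and loads fluctuate between scheduling decisions. The class below is a sketch for illustration only, not a real grid information service; its names and structure are assumptions.

```python
# Hypothetical sketch of a dynamic resource pool: volunteers join and
# leave at any time, and loads fluctuate, so a scheduler must re-query
# the pool before every placement decision rather than cache it.

import random

class ResourcePool:
    def __init__(self):
        self.machines = {}              # machine name -> current load in [0, 1]

    def join(self, name):
        self.machines[name] = 0.0       # a volunteer machine appears

    def leave(self, name):
        self.machines.pop(name, None)   # an owner withdraws the machine

    def tick(self, rng):
        # External jobs compete for the same machines, so loads change
        # non-deterministically between scheduling decisions.
        for name in self.machines:
            self.machines[name] = rng.random()

    def least_loaded(self):
        # Best-effort placement: pick the lightest machine still present.
        return min(self.machines, key=self.machines.get) if self.machines else None
```

Because `least_loaded` can return `None` once every owner has withdrawn, any scheduler built on such a pool must treat placement as best-effort, mirroring the grid's best-effort nature described above.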

2.2.3 Grid Architecture

To manage distributed systems we need a range of technologies, but traditional distributed technologies do not support an integrated approach to managing the wide variety of required resources and services. These technologies do not provide the flexibility and control needed to enable the kind of resource sharing that is necessary. To manage and support the heterogeneous aspects of the grid, a grid software infrastructure is therefore needed.

As described in [36, 37, 41], the grid strongly emphasises interoperability, as it is vital to ensure that virtual organisation participants can dynamically share heterogeneous resources. The grid infrastructure is based on a standard open architecture which provides extensibility, interoperability, portability and code sharing. This architecture is divided into several layers and organises components accordingly, as shown in Fig 2.1.

Components within each layer share common characteristics, but can build on the capabilities and behaviors of any lower layer.

Application
Collective
Resource
Connectivity
Fabric

Figure 2.1: Layers of Grid Architecture [37]

1. Fabric Layer

The fabric layer comprises the resources in the grid. These resources can be either logical (such as a distributed file system, computer cluster or distributed computer pool) or physical (such as a computational resource, storage system, catalogue, network resource or sensor). This layer provides the lowest level of access to actual native resources, and implements the low-level mechanisms that allow those resources to be accessed and used. More specifically, those mechanisms must include at least state-enquiry and resource-management mechanisms, each of which must be implemented for a large variety of local systems.

2. Connectivity layer

The connectivity layer provides the core communication and authentication protocols required for grid-specific network transactions. These protocols provide cryptographically secure mechanisms for verifying the identity of grid users and resources. Many communication protocols in the connectivity layer are drawn from the TCP/IP protocol stack, such as IP [63], ICMP [62], TCP [64], UDP [61] and DNS [57].

3. Resource Layer

This layer builds on the connectivity layer in order to implement protocols that enable the use and sharing of individual resources, such as the Grid Resource Access and Management protocol (GRAMP) used to allocate and monitor resources. Two fundamental components of this layer are information protocols, which query the state of a resource by calling fabric-layer functions to control and access resources, and management protocols, which are used to negotiate access to a shared resource.

4. Collective Layer

This layer provides protocols, such as the Grid Resource Information Protocol (GRIP) [37], for interacting across collections of resources. In other words, it focuses on the coordination of multiple resources. It includes directory, co-allocation, scheduling, brokerage, monitoring and diagnostics, data replication, software discovery, and community accounting and payment services.

5. Application Layer

This is the final layer of the grid architecture, and contains the user applications that operate in a grid environment. It includes languages and frameworks, which may themselves define protocols such as the Simple Workflow Access Protocol (SWAP) [78], services, and/or an Application Program Interface (API).
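The five layers above, and the rule that a component may build only on the capabilities of lower layers, can be condensed into a small sketch. The data structure below is only an illustrative summary of the descriptions in this section, not an API of any grid toolkit.

```python
# Illustrative condensation of the grid architecture layers described
# above, listed bottom (fabric) to top (application).

GRID_LAYERS = [
    ("Fabric",       ["local resource access", "state enquiry",
                      "resource management"]),
    ("Connectivity", ["communication (TCP/IP)", "authentication"]),
    ("Resource",     ["information protocols", "management protocols"]),
    ("Collective",   ["directory", "co-allocation", "scheduling",
                      "brokerage", "monitoring", "data replication"]),
    ("Application",  ["user applications", "languages and frameworks",
                      "APIs"]),
]

def layers_below(layer_name):
    """Components within a layer can build on the capabilities and
    behaviors of any lower layer, but not on the layers above it."""
    names = [name for name, _ in GRID_LAYERS]
    return names[:names.index(layer_name)]
```

For example, a resource-layer management protocol may call fabric-layer functions and connectivity-layer authentication, but nothing in the collective or application layers.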

2.2.4 Grid Types

From the perspective of applications, grids [49, 51] have generally been classified into three categories based on the solutions they seek to provide: computational grids, which provide access to heterogeneous resources; data grids, which provide data services such as storage, management and data access; and service grids, which supply services not provided by any single machine. These are described below:

2.2.4.1 Computational Grid

A computational grid is a collection of computing resources that may represent computers on multiple networks and locations, heterogeneous platforms and separate administrative domains with several owners. The purpose of such a grid is to run very large applications such as weather forecasting. In this type of grid, most machines are high-performance servers, and the resources are aggregated so as to act as a unified processing resource. The increase in the speed and reliability of networks, in addition to the use of distribution protocols, is an important factor that has led to the use of grids, which enable users to remotely utilise resources owned by different providers. Computational grids fall into the categories of distributed supercomputing and high-throughput computing, depending on how they utilise resources. Distributed supercomputing executes jobs in parallel on various resources, reducing completion times; jobs requiring this category of grid are those that present very large problems. By contrast, high-throughput computing increases job completion rates. Grid computing can solve the most complex as well as the most vital problems that face scientific and industrial organisations and even governments. The possible applications of grid computing range from the financial (analysing the value of investments, scaling businesses and leveraging existing hardware investments and resources) through product lines (reducing design cycles, numbers of products and operational expenses) and scalability (creating scalable and flexible enterprise IT infrastructure) to the scientific (connecting research teams located in different geographical areas, and accelerating research and production processes) [2].

2.2.4.2 Data Grid

In this type of grid, a large amount of data is distributed and/or replicated to remote sites spread worldwide. In general, a data grid refers to a system responsible for storing data and providing access to the users authorised to share it. In other words, data grids provide an infrastructure for creating new information repositories, such as data warehouses or digital libraries, that are distributed across several networks [51]. The aims of data grids overlap with those of heterogeneous distributed database systems, which deal with various kinds of database management systems whose hardware, operating systems and network connections are distributed across a heterogeneous environment. A data grid provides infrastructure to support data storage, discovery, handling, publication and manipulation. Enterprise data usually has the characteristics of large scale, dynamism, autonomy and distributed sources. In the academic world, there is a desire to share expensive experimental data in order to coordinate research. In business, the need for data sharing is even more urgent; reasons include the requirement for business data to be available in real time for applications such as e-marketing, where it is very important to maintain an up-to-date and consistent product catalogue. The objective of data grids, as presented in [27], is to integrate heterogeneous data archives into a distributed data management "grid", to identify services for high-performance, distributed, data-intensive computing, and to enable users to extract relevant information from the distributed databases.
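A central bookkeeping task in a data grid is mapping a logical file name to its physical replicas at remote sites. The following sketch is illustrative only: real systems, such as the Replica Location Service discussed later, provide this mapping as a distributed service, and all site names and URLs below are made up.

```python
# Minimal sketch of a replica catalog: a logical file name maps to the
# physical locations where copies of the file are stored. Real data
# grids implement this as a distributed, secured service (e.g. RLS).
class ReplicaCatalog:
    def __init__(self):
        self._replicas = {}  # logical name -> list of physical URLs

    def register(self, logical_name, physical_url):
        """Record that a replica of the logical file exists at this URL."""
        self._replicas.setdefault(logical_name, []).append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations of a logical file."""
        return list(self._replicas.get(logical_name, []))

catalog = ReplicaCatalog()
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://siteA.example.org/data/run42.dat")
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://siteB.example.org/mirror/run42.dat")
```

A client would then choose among the returned replicas, for example by network proximity or load.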

Data grids are compatible with computational grids and can integrate storage and computation. One practical application of a data grid is the EU DataGrid [71], a project financed by the European Union that develops computational grid technologies for sharing distributed files, databases, computers, scientific tools and devices. The main goal of the DataGrid project is to build the next generation of computing infrastructure, developing and testing the technological infrastructure that will enable the sharing of large-scale databases.

2.2.4.3 Service Grid

A service grid represents and enables a set of services provided by a set of resources. Service grids fall into three categories: on-demand service grids, which enable real-time interaction; collaborative service grids, which aggregate various resources to provide new services; and multimedia service grids, which supply the infrastructure for real-time multimedia applications.

This research is concerned with both computational grids and data grids.

2.3 History of Grid Evolution

This section outlines the history of grid evolution by describing the stages that grids passed through as they evolved into grid computing.

2.3.2 Stages of Development

From a topology perspective, grids have been classified into clusters, intra-grids and extra-grids [37]. These classifications denote the stages of development of grids, which are born as clusters, evolve into intra-grids and finally become extra-grids, as shown in Fig. 2.2.

2.3.2.1 Cluster

Clusters are critical to the evolution of a grid; they are the smallest grids in size and scope. They are aggregations of servers intended to provide increased computing power compared with stand-alone computers. Cluster computing is built on commodity processors and commodity operating systems. Clusters are designed to solve problems for particular groups of people within the same department. They are implemented within campus intranets by incorporating PCs, data and servers to maximise the use of computing resources and to increase user job throughput. Cluster grids can therefore operate within a heterogeneous environment consisting of mixed server types, operating systems and workloads. Resources are accessed at a particular known point in the grid, which has only one job queue [37].

2.3.2.2 Intragrid

A typical intragrid topology, as illustrated in Figure 1-5, exists within a single organization and provides a basic set of grid services. The single organization could comprise a number of computers that share a common security domain and share data internally on a private network. The primary characteristics of an intragrid are a single security provider; high, always-available bandwidth on the private network; and a single environment within a single network. Within an intragrid, it is easier to design and operate computational and data grids. An intragrid provides a relatively static set of computing resources and the ability to easily share data between grid systems. A business might consider an intragrid appropriate if it has an initiative to gain economies of scale in internal job management, or wants to start exploring grid use internally by enabling vertical enterprise applications.

Figure 1-5 : An Intragrid

2.3.2.3 Extragrid

An extragrid expands on the intragrid concept by bringing together two or more intragrids. An extragrid, as illustrated in Figure 1-5, typically involves more than one security provider, and the level of management complexity increases accordingly. The primary characteristics of an extragrid are dispersed security, multiple organizations, and remote/WAN connectivity.

Within an extragrid, the resources become more dynamic, and the grid needs to be more reactive to failed resources and components. The design becomes more complicated, and information services become relevant to ensure that grid resources have access to workload management at run time.

Figure 1-5 : An Extragrid

2.3.2.4 Intergrid

An intergrid requires the dynamic integration of applications, resources, and services with partners, customers, and any other authorized organizations that obtain access to the grid via the Internet/WAN. An intergrid topology is primarily used by engineering firms, life science industries, manufacturers, and businesses in the financial industry.

The primary characteristics of an intergrid include dispersed security, multiple organizations, and remote/WAN connectivity. The data in an intergrid is global public data, and applications (both vertical and horizontal) must be modified for a global audience. A business may deem an intergrid necessary if there is a need for peer-to-peer computing, a collaborative computing community, or simplified end-to-end processes with the organizations that will use the intergrid.

2.4 Grid Technology and its Infrastructure

Because equipment is supplied by several vendors, interoperability becomes critical and open standards are necessary; without them it is not easy to interchange parts and procedures between different vendors. The multilayer architecture of the grid involves a modular system structure, which makes interoperation easier. To standardise grid specifications, protocols and interfaces, the Globus Alliance and the Open Grid Forum (OGF) were established, as explained below.

2.4.1 Globus Alliance

The Globus Alliance [6] is an international collaboration of organisations and individuals conducting research into the development of fundamental grid technologies. The Globus Alliance introduced the Globus Toolkit, open source software for building grid systems and applications. The Globus Toolkit (GT) has been developed since the late 1990s to support the development of service-oriented distributed applications and infrastructures. Core GT components address basic issues relating to security, resource access and management, data movement and management, resource discovery, and so forth. Other projects have contributed to a broader Globus universe of tools and components that build on core GT functionality to provide many useful application-level functions. These tools have been used to develop a wide variety of grid systems and applications.

Version 4 of the Globus Toolkit, GT4, released in early 2005, represents a significant advance relative to earlier releases in terms of the range of components provided, functionality, standards conformance, usability, and quality of documentation. The architecture of Globus Toolkit 4 is shown in Fig. 2.3.

Figure 2.3 : Globus Toolkit 4 Architecture

As shown in the figure, GT4 comprises both a set of service implementations ("server" code) and associated "client" code. GT4 provides both Web services (WS) components (on the left) and non-WS components (on the right). The white boxes in the "client" domain denote custom applications and/or third-party tools that access GT4 services or GT4-enabled services.

2.4.1.1 Predefined GT4 Services and Other Components

GT4 provides a set of predefined services, described in a little more detail in the next section. Nine GT4 services implement Web services (WS) interfaces: job management (GRAM); reliable file transfer (RFT); delegation; MDS-Index, MDS-Trigger, and archiver (collectively termed the Monitoring and Discovery System, or MDS); community authorization (CAS); OGSA-DAI data access and integration; and the GTCP Grid TeleControl Protocol for online control of instrumentation. Of these, archiver, GTCP, and OGSA-DAI are "tech previews," meaning that their interfaces and implementations are likely to change in the future.

For two of those services, GRAM and MDS-Index, pre-WS "legacy" implementations are provided. They will be deprecated at some future time as experience is gained with WS implementations.

For three additional GT4 services, WS interfaces are not yet provided (but will be in the future): GridFTP data transport, replica location service (RLS), and MyProxy online credential repository.

Other libraries implement various security functionality, while the eXtensible I/O (XIO) library provides convenient access to a variety of underlying transport protocols. SimpleCA is a lightweight certification authority.

2.4.1.2 Globus Universe

GT4 components do not, in general, address end-user needs directly: they are more akin to a TCP/IP library or Web server implementation than a Web browser. Instead, GT4 enables a range of end-user components and tools that provide higher-level capabilities attuned to the needs of specific user communities. These components and tools constitute, together with GT4 itself, the "Globus universe." We introduce here some of its principal elements.

For the purposes of this presentation, we assign each Globus universe component to one of the following classes.

Execution management tools are concerned with the initiation, monitoring, management, scheduling, and/or coordination of remote computations.

Data management tools are concerned with data location, transfer, and management.

Interface tools are concerned with providing or supporting the development of graphical user interfaces for end-user or system administration applications.

Security tools are concerned with such issues as mapping between Grid credentials and other forms of credential and managing authorization policies.

Monitoring and discovery tools are concerned with monitoring various aspects of system behavior, managing monitoring data, discovering services, etc.

2.4.1.2.1 Execution Management

Execution management tools are concerned with the initiation, monitoring, management, scheduling, and/or coordination of remote computations. GT4 supports the Grid Resource Allocation and Management (GRAM) interface as a basic mechanism for these purposes. Its GRAM server is typically deployed in conjunction with Delegation and GridFTP servers to address data staging, delegation of proxy credentials, and computation monitoring and management in an integrated manner.

Associated tools fall into three main classes. First, we have GRAM-enabled schedulers for clusters or other computers on a local area network (Condor, OpenPBS, Torque, PBSPro, SGE, LSF). Second, we have systems that provide different interfaces to remote computers (OpenSSH) or that implement various parallel programming models in Grid environments by using GRAM to dispatch tasks to remote computers (Condor-G, DAGman, MPICH-G2, GriPhyN VDS, Nimrod-G). Third, we have various "meta-schedulers" that map different tasks to different clusters (CSF, Maui).
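The coordination problem that a tool such as DAGman solves can be illustrated in miniature. The sketch below is not DAGman itself: it is a toy, local, sequential version showing only the ordering constraint, namely that a task may run only after all of its predecessors have completed. The four-task workflow at the end is a made-up example.

```python
# Toy illustration of DAG-based workflow ordering (the problem DAGman
# addresses): produce a valid run order for tasks with dependencies.
def topological_order(deps):
    """deps maps task -> set of tasks it depends on; returns a run order."""
    order, done = [], set()
    pending = dict(deps)
    while pending:
        ready = [t for t, d in pending.items() if d <= done]
        if not ready:
            raise ValueError("cycle detected")
        for t in sorted(ready):  # sorted only to make the order deterministic
            order.append(t)
            done.add(t)
            del pending[t]
    return order

# A hypothetical four-task workflow: stage-in, two analyses, then a merge.
workflow = {"stage_in": set(),
            "analysis_a": {"stage_in"},
            "analysis_b": {"stage_in"},
            "merge": {"analysis_a", "analysis_b"}}
```

In a real grid, each "ready" task would be submitted to a GRAM server (via Condor-G) rather than executed locally, and the two analyses could run in parallel.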

The principal execution management components and their purposes are as follows:

Grid Resource Allocation & Management (GRAM) service: Supports submission, monitoring, and control of jobs on computers. Interfaces to the Unix shell ("fork") and to the Platform LSF, PBS, and Condor schedulers are provided; others may be developed. Includes support for MPICH-G2 jobs: multi-job submission, process coordination within a job, and sub-job coordination within a multi-job.

Java CoG Kit Workflow: Uses the Karajan workflow engine, which supports DAGs, conditions, and loops; directs tasks to GRAM servers for execution.

Community Scheduler Framework: CSF is an open source meta-scheduler based on the WS-Agreement specification.

GSI OpenSSH: A version of OpenSSH that supports GSI authentication; provides remote terminal (SSH) and file copy (SCP) functions.

Condor-G: Manages the execution of jobs on remote GRAM-enabled computers, addressing job monitoring, logging, notification, policy enforcement, fault tolerance, and credential management.

DAGman: Manages the execution of directed acyclic graphs (DAGs) of tasks that communicate by writing and reading files; works with Condor-G.

MPICH-G2: Executes parallel Message Passing Interface (MPI) programs over one or more distributed computers.

Nimrod-G: Provides graphical specification of parameter studies and management of their execution on distributed computers.

Ninf-G: An implementation of the GridRPC remote procedure call specification, for accessing remote services.

GriPhyN Virtual Data System: Tools for defining, scheduling, and managing complex data-intensive workflows. Workflows can be defined via a high-level virtual data language; a virtual data catalog is used to track current and past executions. Includes heuristics for job and data placement. Uses DAGman/Condor-G for execution management.

Condor, OpenPBS, Torque, PBSPro, Sun Grid Engine, Load Sharing Facility: Schedulers to which GSI-authenticated access is provided via a GRAM interface. The open source Condor is specialized for managing pools of desktop systems. OpenPBS and Torque are open source versions of the Portable Batch System (PBS) cluster scheduler; PBSPro is a commercial version produced by Altair. SGE is also available in both open source and commercial versions. LSF is a commercial system produced by Platform.

Maui Scheduler: An advanced job scheduler for use on clusters and supercomputers, with support for meta-scheduling.

2.4.1.2.2 Data Management

Data management tools are concerned with the location, transfer, and management of distributed data. GT4 provides a variety of basic tools, including GridFTP for high-performance and reliable data transport, RLS for maintaining location information for replicated files, and OGSA-DAI for accessing and integrating structured and semi-structured data.

Associated tools enhance GT4 components by addressing storage reservation (NeST), providing a command-line client for GridFTP (UberFTP), providing a uniform interface to distributed data (SRB), and supporting distributed data processing pipelines (DataCutter, STORM).

The principal data management components and their purposes are as follows:

GridFTP server: An enhanced FTP server supporting GSI authentication and high-performance throughput. Interfaces to Unix POSIX, HPSS, GPFS, and Unitree are provided; others can be developed.

globus-url-copy: A non-interactive command-line client for GridFTP.

Replica Location Service: RLS is a decentralized service for registering and discovering information about replicated files.

Reliable File Transfer service: RFT controls and monitors third-party, multi-file transfers using GridFTP. Features include exponential back-off on failure, all-or-none transfers of multi-file sets, optional use of parallel streams and TCP buffer size tuning, and recursive directory transfer.

Lightweight Data Replicator: LDR is a tool for replicating data to a set of sites. It builds on GridFTP, RLS, and pyGlobus.

OGSA Data Access & Integration: OGSA-DAI is an extensible framework for accessing and integrating data resources, including relational and XML databases and semi-structured files.

Network Storage: NeST allows GridFTP clients to negotiate reservations for disk space, which then apply to subsequent transfers.

UberFTP: An interactive command-line client for GridFTP.

Storage Resource Broker: Client-server middleware that provides a uniform interface for connecting to heterogeneous, distributed data resources, with GSI authentication and GridFTP transport.

DataCutter & STORM: DataCutter supports processing of large datasets via the execution of distributed pipelines of application-specific processing modules; STORM supports relational data.
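RFT's exponential back-off on failure is a general retry pattern worth making concrete. The helper below is an illustrative sketch, not RFT code: it only computes the schedule of delays (doubling each time, capped at a maximum) rather than actually sleeping or retrying, and all parameter values are assumptions.

```python
# Sketch of an exponential back-off schedule, as used by services like
# RFT when a transfer fails: wait base seconds, then base*factor, etc.,
# capped at max_delay. Delays are computed here, not slept.
def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=6):
    """Return the delay (in seconds) to wait before each retry attempt."""
    delay, delays = base, []
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays
```

The cap prevents retry intervals from growing without bound when a remote server stays down for a long time.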

2.4.1.2.3 Interface

Grid portal and user interface tools support the construction of graphical user interfaces for invoking, monitoring, and/or managing activities involving Grid resources.

Many (but not all) of these tools are concerned with enabling access to Grid systems from Web browsers. Many (but not all) of such Web browser-oriented systems are based on a three-tier architecture, in which a middle-tier portal server (e.g., uPortal with Tomcat, or GridSphere) hosts JSR 168-compliant portlets that both (a) generate the various elements of the first-tier Web interface with which users interact and (b) interact with third-tier Grid resources and services.

The principal interface components and their purposes are as follows:

Java CoG Desktop: A Java application that provides a "desktop" interface to a grid, so that, for example, a job is run or a file is copied by dragging and dropping its description onto a computer or storage system, respectively.

WebMDS: Uses XSLT to generate custom displays of monitoring data, whether from active services or archives.

Portal User Registration Service: PURSe provides Web-based registration of users and the subsequent generation and management of their GSI credentials, thus allowing easy access to grid resources by large user communities.

Open Grid Computing Environment: OGCE packages a range of components, including JSR 168-compliant portlets for proxy management, remote command execution, remote file management, and GPIR-based information services.

GridSphere: An open source JSR 168-compliant portlet environment.

Sakai: A JSR 168-compatible system for distributed learning and collaborative work, with tools for chat, shared documents, etc.

2.4.1.2.4 Security

Security tools are concerned with establishing the identity of users or services (authentication), protecting communications, and determining who is allowed to perform what actions (authorization), as well as with supporting functions such as managing user credentials and maintaining group membership information.

GT4 provides distinct WS and pre-WS authentication and authorization capabilities. Both build on the same base, namely standard X.509 end entity certificates and proxy certificates, which are used to identify persistent entities such as users and servers and to support the temporary delegation of privileges to other entities, respectively.

GT4’s WS security [1] comprises (a) Message-Level Security mechanisms, which implement the WS-Security standard and the WS-SecureConversation specification to provide message protection for GT4’s SOAP messages, and (b) an Authorization Framework that allows for a variety of authorization schemes, including a "grid-mapfile" access control list, an access control list defined by a service, a custom authorization handler, and access to an authorization service via the SAML protocol. For non-WS components, GT4 provides similar authentication, delegation, and authorization mechanisms, although with fewer authorization options.
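The simplest of the authorization schemes above, the "grid-mapfile" access control list, maps an X.509 distinguished name to a local account. The parser below is a sketch based on the common convention of a quoted DN followed by a username; the DNs shown are made-up examples, and a real deployment has additional syntax (e.g. multiple local accounts per DN) not handled here.

```python
# Sketch of grid-mapfile parsing: each line maps a quoted X.509
# distinguished name to a local user account. Comment lines start with #.
import shlex

def parse_gridmap(text):
    """Parse lines of the form: "/C=.../CN=Some User" localaccount."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, user = shlex.split(line)  # shlex handles the quoted DN
        mapping[dn] = user
    return mapping

gridmap = parse_gridmap(
    '"/C=UK/O=eScience/CN=Jane Doe" jdoe\n'
    '"/C=UK/O=eScience/CN=A N Other" aother\n'
)
```

Authorization then reduces to a dictionary lookup: a request is allowed if the authenticated DN appears in the map, and runs as the mapped local user.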

The principal security components and their purposes are as follows:

Message-Level Security: Implements the WS-Security standard and the WS-SecureConversation specification to provide message protection for SOAP messages.

Authorization Framework: Allows for a variety of authorization schemes, including file- and service-based access control lists, custom handlers, and the SAML protocol.

Pre-WS A&A: Authentication, delegation, and authorization for non-WS components.

Delegation Service: Enables storage and subsequent (authorized) retrieval of proxy credentials, thus enabling delegation when using WS protocols.

Community Authorization Service: Issues assertions to users granting fine-grained access rights to resources; servers recognize and enforce the assertions. CAS is currently supported by the GridFTP server.

SimpleCA: A simplified certification authority for issuing X.509 credentials.

MyProxy service: Allows federation of X.509 and other authentication mechanisms (e.g., username/password, one-time passwords) via SASL/PAM.

VOMS: A database of user roles and capabilities, with a user client interface that supports retrieval of attribute certificates for presentation to VOMS-enabled services.

VOX & VOMRS: Extend VOMS to provide Web registration capabilities, rather like PURSe.

PERMIS: An authorization service accessible via the SAML protocol.

GUMS: The Grid User Management System, an alternative to grid map files.

KX509 & KCA: KX509 is a "Kerberized" client that generates and stores proxy credentials, so that users authenticated via Kerberos can access the grid; KCA is a Kerberized certification authority used to support KX509.

PKINIT: A service that allows users with grid credentials to authenticate to a Kerberos domain.

2.4.1.2.5 Monitoring and Discovery

Monitoring and discovery mechanisms are concerned with obtaining, distributing, indexing, archiving, and otherwise processing information about the configuration and state of services and resources. In some cases, the motivation for collecting this information is to enable discovery of services or resources; in other cases, it is to enable monitoring of system status.

GT4's support, in its Java, C, and Python WS Core, for the WSRF and WS-Notification interfaces provides useful building blocks for monitoring and discovery, enabling the definition of properties for which monitoring and discovery are to be provided, with subsequent pull- and push-mode access. GT4 services such as GRAM and RFT define appropriate resource properties, providing a basis for service discovery and monitoring. Other GT4 services are designed specifically to enable discovery and monitoring, providing for the indexing, archiving, and analysis of data about significant events.
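The pull- and push-mode access pattern can be shown in miniature with a local observer sketch. This is illustrative only: GT4 implements the pattern over SOAP messages between services per WS-Notification, whereas the class below notifies in-process callbacks, and the `freeCPUs` property is a made-up example.

```python
# Miniature of WSRF-style resource properties with pull-mode queries
# (get) and push-mode notifications (subscribe/set). Local stand-in
# for what GT4 does over SOAP between services.
class ResourceProperty:
    def __init__(self, name, value):
        self.name, self._value, self._subscribers = name, value, []

    def subscribe(self, callback):
        """Register a callback to be invoked whenever the value changes."""
        self._subscribers.append(callback)

    def set(self, value):
        self._value = value
        for cb in self._subscribers:  # push mode: notify on change
            cb(self.name, value)

    def get(self):
        return self._value            # pull mode: query on demand

events = []
free_cpus = ResourceProperty("freeCPUs", 16)
free_cpus.subscribe(lambda name, value: events.append((name, value)))
free_cpus.set(12)  # subscriber is notified of the new value
```

An index service plays the role of a subscriber that aggregates such properties from many services; a trigger service is a subscriber that compares each new value against fault-detection rules.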

The principal monitoring and discovery components and their purposes are as follows:

Java, C, and Python WS Cores: Implement the WSRF and WS-Notification specifications, thus allowing Web services to define, and allow access to, resource properties. The container incorporates a local index, enabling discovery of services.

Index service: Collects live monitoring information from services and enables queries against that information.

Trigger service: Compares live monitoring information against rules to detect fault conditions, and notifies operators (for example, by email).

Archiver service: Stores historical monitoring data and enables queries against that data.

Aggregator framework: Facilitates the building of aggregating services (for example, the index, trigger and archiver services).

Hawkeye: Monitors individual clusters, using Condor as a base. GT4 includes a data provider that makes status information available in the GLUE schema.

Ganglia: Monitors individual clusters and sets of clusters. GT4 includes a data provider that makes status information available in the GLUE schema.

Inca: Monitors services in a distributed system by performing a set of specified tests at specified intervals, and publishes the results of these tests.

NetLogger: Generates, collects, and analyzes high-frequency data from distributed system components.

2.4.2 Open Grid Forum (OGF)

The Open Grid Forum (OGF) [31] is a community of users, developers, and vendors leading the global standardization effort for distributed computing (including clusters, grids and clouds). The OGF community consists of thousands of individuals in industry and research, representing over 400 organizations in more than 50 countries. Its members work together to accelerate the adoption of grid computing worldwide, in the belief that grids will lead to new discoveries, new opportunities, and better business practices. Various research groups within the OGF have created many standards, such as the Open Grid Services Architecture (OGSA), which presents a service-oriented view of shared physical resources and the services supported by those resources; the Open Grid Services Infrastructure (OGSI), which defines mechanisms for creating and managing grid services and acts as a technical specification for implementing them; GridFTP; and JSDL [7, 4]. Many other issues are currently being worked on.

2.4.2.1 Job Submission Description Language (JSDL) [7]

JSDL provides an XML-based language specifically for describing single job submission requirements. Since many different job management systems exist in distributed, heterogeneous computing systems, such as grids, a primary goal of JSDL is to provide a common language for describing job submission requirements. Hence, the JSDL vocabulary is informed by a number of existing job management systems such as Condor, Globus, Load Sharing Facility, Portable Batch System, Sun Grid Engine, and Unicore.

JSDL focuses on single job submission description and it must be combined with other specifications, from OGF or other standards bodies, to address broader requirements in job or workflow management. For example, JSDL is used with the OGSA Basic Execution Service, an OGF specification that provides a job submission and management interface. JSDL can also be used with BPEL as part of workflows. JSDL can also be combined with other scheduling, service agreement [WS-Agreement], or job policy languages. Attribute and element extensions are also allowed.

JSDL provides elements for:

Job identification. This includes a JobName, a description (any string for human consumption), a JobAnnotation (any string that may contain information for machine consumption) and a JobProject to which the job belongs.

Application information. This includes a name, description, and version number. This description can be extended with application-specific information. A normative extension for describing a POSIX application, including environment settings such as file size limit and core dump limit, is specified.

Resource requirements. As might be expected, the possible resource requirements are extensive, including 27 main elements, such as OS types, CPU types, file system types, physical memory, disk space, network bandwidth, and more.

Data requirements. The data requirement elements allow files to be identified that must be staged-in (to the remote host) prior to execution, and staged-out afterwards.
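A minimal JSDL job description can be assembled with a standard XML library. The sketch below is a simplified illustration, not a complete or validated JSDL document: it populates only the job identification and application elements described above, using the JSDL 2005/11 namespace; the job name and application name are made-up examples, and a real description would normally also carry the POSIX application extension, resource requirements, and data staging elements.

```python
# Sketch of a minimal JSDL job description built with ElementTree.
# Only JobIdentification and Application are populated here.
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"

def jsdl_job(job_name, application_name):
    """Return a minimal JSDL JobDefinition document as an XML string."""
    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")
    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    ET.SubElement(app, f"{{{JSDL_NS}}}ApplicationName").text = application_name
    return ET.tostring(job, encoding="unicode")

doc = jsdl_job("weather-run-1", "wrf")
```

Because JSDL is plain XML, the same document can be handed to any JSDL-aware job submission service (for example, an OGSA Basic Execution Service endpoint) regardless of the scheduler behind it.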

2.4.2.2 GridFTP

The GridFTP facility provides secure and reliable data transfer between grid hosts. Its protocol extends the well-known FTP standard with additional features, including support for authentication through GSI. One of the major features of GridFTP is that it enables third-party transfer, which is suitable for environments where a large file resides in remote storage and the client wants to copy it to another remote server.

Developed through the Open Grid Forum, GridFTP was designed to provide reliable, efficient and secure access to, and transfer of, huge amounts of data between distributed resources in the grid, using facilities such as multi-streamed transfer, auto-tuning and Globus-based security. It is a core data service offered by grid computing.

GridFTP is very important in this work because it can serve as a basis for grid evolution through the use of migration.

The FTP protocol was attractive for the following reasons:

It is one of the most common data transfer protocols, as well as being the most likely candidate to meet a grid's needs.

It includes many features, such as its provision of a well-defined architecture and the fact that it is used extensively.

It is a widely implemented and well-understood Internet Engineering Task Force (IETF) [44] standard protocol.

It supports third-party transfers, that is, the ability to transfer data directly between two servers, initiated and controlled by a separate client.

Numerous groups have added various extensions through the IETF. Some of these extensions would be particularly useful in grids.

GridFTP has the following features:

Grid Security Infrastructure (GSI) and Kerberos support.

Robust and flexible authentication, integrity, and confidentiality features are critical when transferring or accessing files. GridFTP must therefore support GSI and Kerberos authentication. It provides this capability by implementing the Generic Security Services Application Program Interface (GSSAPI) [54] authentication mechanisms.

Third-party control of data transfer.

In order to manage large data sets for large distributed communities, it is essential to provide third-party control of transfers between storage servers. GridFTP provides this capability by adding GSSAPI security to the third-party transfer capability defined in standard FTP. Third-party operation allows a user at one site to initiate, control and monitor a data transfer operation between two other parties (the source and destination).
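The control flow of a third-party transfer can be sketched as follows. This is a conceptual illustration only: the `Server` class and its methods are hypothetical stand-ins for GridFTP servers and their data channels, and security is omitted entirely; the point is that the initiating client never touches the data, which flows directly from source to destination.

```python
# Conceptual sketch of third-party transfer: a client instructs the
# source server to send and the destination server to receive, then
# monitors the outcome. The data never passes through the client.
class Server:
    """Hypothetical stand-in for a GridFTP server with a flat file store."""
    def __init__(self, name, files):
        self.name, self.files = name, dict(files)

    def send(self, path):
        return self.files[path]       # stand-in for opening a data channel

    def receive(self, path, data):
        self.files[path] = data       # stand-in for accepting a data channel

def third_party_transfer(client, source, dest, path):
    """'client' initiates and monitors; data flows source -> dest directly."""
    data = source.send(path)
    dest.receive(path, data)
    return {"initiator": client, "bytes": len(data)}

src = Server("siteA", {"/data/run42.dat": b"x" * 1024})
dst = Server("siteB", {})
report = third_party_transfer("user-client", src, dst, "/data/run42.dat")
```

In real GridFTP, the client additionally delegates credentials so both servers can authenticate each other on the data channel.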

Parallel data transfer.

Using multiple Transmission Control Protocol (TCP) [65] streams in parallel (even between the same source and destination) on wide-area links can improve aggregate bandwidth over using a single TCP stream. GridFTP supports parallel data transfer through FTP command extensions and data channel extensions.
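Parallel transfer rests on partitioning one file into byte ranges, one per TCP stream. The helper below is an illustration of that partitioning only; the actual streams in GridFTP are negotiated via FTP command and data channel extensions, not computed client-side like this.

```python
# Sketch of partitioning a file into contiguous byte ranges, one per
# parallel TCP stream. Any remainder bytes are spread over the first
# streams so the ranges cover the file exactly once.
def stream_ranges(file_size, streams):
    """Return a list of (offset, length) pairs, one per stream."""
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Striped transfer applies the same idea one level up: the ranges (or interleaved blocks) are assigned to different servers rather than to different streams of one server.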

Striped data transfer.

Data may be striped or interleaved across multiple servers. Striped transfers provide further bandwidth improvements over those achieved with parallel transfers. GridFTP includes extensions that initiate striped transfers, which use multiple TCP streams to transfer data that is partitioned among multiple servers.

Partial file transfer.

The transfer of partial files is required by many applications, such as high-energy physics analysis. However, standard FTP supports only the transfer of complete files, or the transfer of the remainder of a file starting at a particular offset. GridFTP introduces new FTP commands to support transfers of arbitrary subsets of a file.

Automatic negotiation of TCP buffer/window sizes.

Automatic negotiation of TCP buffer/window sizes is important in achieving maximum bandwidth with TCP/IP, thus improving transfer performance, especially over wide-area links. GridFTP extends standard FTP to support both manual setting and automatic negotiation of TCP buffer sizes, for large files and for large groups of small files.
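The underlying arithmetic is the bandwidth-delay product: to keep a long, fast link full, the TCP buffer must hold roughly the amount of data "in flight" at full rate. The helper below shows the calculation; the example figures are assumptions for illustration.

```python
# Bandwidth-delay product: the buffer must hold bandwidth * RTT of data
# to keep the pipe full. Bandwidth in megabits/s, RTT in milliseconds.
def tcp_buffer_bytes(bandwidth_mbps, rtt_ms):
    """Buffer size in bytes = bandwidth (bits/s) * RTT (s) / 8 bits per byte."""
    return int(bandwidth_mbps * 1_000_000 * (rtt_ms / 1000.0) / 8)

# Example: a 1 Gb/s path with 50 ms round-trip time needs ~6.25 MB of buffer.
size = tcp_buffer_bytes(1000, 50)
```

Default operating-system buffers are often far smaller than this, which is why untuned wide-area transfers fall well short of the available bandwidth.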

Support for reliable and restartable data transfer.

Reliable transfer is important for many applications that manage data. Fault recovery methods are needed to handle faults such as transient network failures and server outages. The FTP standard includes basic features for restarting failed transfers, but these are not widely implemented. The GridFTP protocol exploits these features and extends them to cover the new data channel protocol.
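The essence of a restartable transfer is a restart marker: the highest byte offset known to be safely written, so that after a failure the transfer resumes from the marker instead of from byte zero. The class below is a toy illustration of that bookkeeping, not GridFTP's actual marker protocol.

```python
# Sketch of restart-marker bookkeeping: track the highest committed
# offset so a failed transfer can resume rather than start over.
class RestartableTransfer:
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.marker = 0          # highest offset committed so far

    def commit(self, nbytes):
        """Advance the restart marker by nbytes safely written."""
        self.marker = min(self.total, self.marker + nbytes)

    def remaining(self):
        """(offset, length) still to be transferred after a restart."""
        return self.marker, self.total - self.marker

t = RestartableTransfer(10_000)
t.commit(4_000)                   # 4 kB arrive, then the network fails
offset, length = t.remaining()    # resume at offset 4000; 6000 bytes left
```

In GridFTP the server periodically sends such markers back to the initiating client, which is also what makes third-party transfers monitorable.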

Integrated instrumentation.

The protocol calls for restart and performance markers to be sent back during a transfer.


