The Solution Of Large Scale Problems

Published Date: 02 Nov 2017

Chapter 2

Objectives

To review the definition, architecture, types and aims of grid computing.

To describe the history of grid evolution and the generations of grids.

To describe grid technology infrastructure.

To review software evolution concepts.

To review existing resource brokers (schedulers).

To review the present state of migration in grid computing.

To discuss current efforts in grid computing.

2.1 Introduction

A distributed computer system consists of multiple software components that reside on multiple computers but run as a single system. The computers in a distributed system can be physically close together and connected by a local network, or geographically distant and connected by a wide area network. A distributed system can combine any mix of machines, such as mainframes, personal computers, workstations and minicomputers. The goal of distributed computing is to make such a network work as a single computer.

Distributed systems offer many benefits over centralized systems, including the following:

Scalability

The system can easily be expanded by adding more machines as needed.

Redundancy

Several machines can provide the same services, so if one is unavailable, work does not stop. Additionally, because many smaller machines can be used, this redundancy does not need to be prohibitively expensive.

Distributed computing systems can run on hardware that is provided by many vendors, and can use a variety of standards-based software components. Such systems are independent of the underlying software. They can run on various operating systems, and can use various communications protocols.

Grid computing can be considered an extension of distributed computing in which both the number and the heterogeneity of the participating systems are much greater than usual. It promotes the sharing of distributed resources that may be heterogeneous in nature, and enables scientists and engineers to solve large scale computing problems. With grid computing, computer applications can efficiently utilise computing and data resources and combine them for large capacity workloads. The primary benefit of grid computing is the ability to coordinate and share resources [15, 59]. Grid computing faces many challenges; one of them is finding all the resources a grid job requires, which may reside at other grid sites. This research proposes evolution as a solution to this problem. This chapter surveys the architecture, types, characteristics and features of grid computing in Section 2.2. Section 2.3 then outlines the history of grid evolution and its generations, Section 2.4 explains the grid technology infrastructure, Section 2.5 presents current efforts in grid computing, Section 2.6 reviews grid resource brokers, and Section 2.7 finally summarises a survey of migration.

2.2 Grid Computing

Grid computing means different things to different individuals. The grand vision is often presented as an analogy to power grids where users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated. In this view of grid computing, computing becomes pervasive and individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed with little or no knowledge of where those resources are located or what the underlying technologies, hardware, operating system, and so on are.

Early implementations of grid computing have tended to be internal to a particular company or organization. However, cross-organizational grids are also being implemented and will be an important part of computing and business optimization in the future.

The distinctions between intraorganizational grids and interorganizational grids are not based on technological differences. Instead, they are based on configuration choices such as security domains, the desired degree of isolation, the type and scope of policies, and the contractual obligations between users and providers of the infrastructures. These issues are not fundamentally architectural in nature. It is in the industry's best interest to ensure that there is no artificial split of distributed computing paradigms and models across organizational boundaries and internal IT infrastructures.

Grid computing involves an evolving set of open standards for Web services and interfaces that make services, or computing resources, available over the Internet. Very often grid technologies are used on homogeneous clusters, and they can add value on those clusters by assisting, for example, with scheduling or provisioning of the resources in the cluster. The term grid, and its related technologies, applies across this entire spectrum.

If we focus our attention on distributed computing solutions, then we could consider one definition of grid computing to be distributed computing across virtualized resources. The goal is to create the illusion of a simple yet large and powerful virtual computer out of a collection of connected (and possibly heterogeneous) systems sharing various combinations of resources.

Ian Foster [32] gives a simple three-point checklist to define a grid:

1. It coordinates resources that are not subject to centralised control. A grid integrates and coordinates resources and users that exist within different control domains. The grid paradigm enables the coordination and sharing of a large number of geographically dispersed heterogeneous resources.

2. It uses standard, open, general-purpose protocols and interfaces. A grid is built from multi-purpose protocols and interfaces, which are reflected in its architecture.

3. It delivers nontrivial qualities of service. A grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, such as response time and security.

Grid computing has also been defined as a collection of nodes connected via a network. These nodes may be computational, storage or network resources, and they share services, computing power and other resources such as disk storage, databases and software applications. The nodes are under the control of different administrative domains with different levels of security. In short, grid computing is a way to share computing resources.

2.2.1 Features and Objectives of Grid Computing

Grid computing aims to achieve the following goals [48, 34]:

The sharing of distributed and heterogeneous computing resources belonging to different organisations.

Grid computing is the sharing, selection and aggregation of a group of resources, such as supercomputers, mainframes, storage systems, data sources and management systems, that behave like networks of computation [55, 3]. It promotes the sharing of distributed resources that may be heterogeneous in nature. The primary benefit of grid computing is the ability to coordinate and share distributed and heterogeneous resources, as in Sharing Infrastructure and REsources iN Europe (SIEREN) [46].

SIEREN is a cooperative association between twelve European countries whose purpose is to share their infrastructure and resources.

The exploitation of underutilised resources.

In most organisations and companies there are huge numbers of underutilised computing resources; most of them are busy less than 5% of the time and otherwise sit idle. Grid computing is designed to exploit these underutilised resources and increase the efficiency of resource usage. Users can also rent resources on the grid to execute their computationally intensive applications, instead of purchasing their own dedicated (and expensive) resources.

The enablement and simplification of collaboration among different organisations.

Another capability of grid computing is the provision of an environment for collaboration between organisations. Grid computing enables very heterogeneous and distributed systems to work together, thereby simplifying collaboration between different organisations by providing direct access to computers, software and data storage.

The provision of a single login service with secure access to grid resources while protecting security for both users and remote sites.

Grids provide a single login service to all users over all distributed resources using grid authentication mechanisms. They also provide secure access to any information anywhere over any type of network. This is achieved by providing access control mechanisms that govern these resources.

The provision of resource management, information services, monitoring and secure data transportation.

The shared resources and networks involved in grid computing are difficult to manage and monitor, but grid computing is able to meet these challenges because of its architecture and protocols.

The solution of large scale problems.

Grids are designed to exploit underutilised resources, which means they can employ a large number of them to solve a large scale problem. This makes grid computing a promising solution for problems, such as weather forecasting, that involve storing and processing amounts of data too massive for even mainframe computers to handle. These resources may include high capability devices such as high capacity disk storage and high performance computing systems.

The provision of fast results, delivered more efficiently.

Grid computing delivers results quickly and efficiently because it enables parallel processing and can provide access to high capability devices. With grid computing, businesses can efficiently utilise computing and data resources and combine them for large capacity workloads.
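
The parallel-processing benefit can be illustrated with a minimal Python sketch, assuming a job that splits into independent tasks whose partial results are simply summed. A local thread pool stands in for idle grid nodes, and the names `split`, `simulate_task` and `run_job` are hypothetical; a real grid would dispatch the tasks to remote resources through a resource broker.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split data into n_chunks contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def simulate_task(chunk):
    """Hypothetical independent task: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def run_job(data, n_workers=4):
    """Run one task per worker in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(simulate_task, split(data, n_workers)))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(1000))))  # 332833500
```

Threads keep the sketch portable; for CPU-bound work in Python, `ProcessPoolExecutor` from the same module is the closer analogue of independent machines, since each worker then runs in its own process.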

2.2.2 Characteristics of Grid Environments

Grid computing has emerged as a new paradigm for distributed computing. Traditionally, distributed computing has been defined in terms of systems with a fixed number of nodes. With the advent of grid computing, the number of nodes in a distributed system changes constantly, so such systems are known as dynamic distributed systems. Grid environments have the following characteristics [43]:

Heterogeneous resources.

Resources have highly heterogeneous configurations: there are different architectural platforms, different CPU speeds, memory hierarchy layouts and capacities, disk space and software configurations.

Dynamic resource availability.

The absence of tight controls in volunteer computing-based grid environments means that a resource can withdraw from the computation at any time, at the wish of its owner. Resources may join or leave the available set non-deterministically, at any moment. This makes the grid a best-effort computation medium, analogous to the internet as a best-effort communication medium.

Dynamic resource load.

Applications run in a shared environment, where external jobs compete for the same resources. The fluctuations of the machine loads are therefore expected to be high. A machine might be lightly loaded now and heavily loaded the next minute.
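
The characteristics above can be made concrete with a minimal Python sketch of a best-effort resource selector, assuming each node reports a load between 0.0 and 1.0 and may join or leave at any time. The `ResourcePool` class and its methods are hypothetical and not part of any particular grid middleware.

```python
class ResourcePool:
    """Hypothetical registry of grid nodes whose membership and load change at run time."""

    def __init__(self):
        self._load = {}  # node name -> most recently reported load in [0.0, 1.0]

    def join(self, node, load=0.0):
        """A node volunteers to the grid (or re-joins with a fresh load report)."""
        self._load[node] = load

    def leave(self, node):
        """A node withdraws at its owner's wish; unknown nodes are ignored."""
        self._load.pop(node, None)

    def report_load(self, node, load):
        """Update the load of a node that is still a member; ignore departed nodes."""
        if node in self._load:
            self._load[node] = load

    def select_node(self):
        """Best effort: return the least-loaded node present right now, or None."""
        if not self._load:
            return None
        return min(self._load, key=self._load.get)


if __name__ == "__main__":
    pool = ResourcePool()
    pool.join("node-a", load=0.7)
    pool.join("node-b", load=0.2)
    print(pool.select_node())  # node-b is the least loaded
    pool.leave("node-b")       # the owner withdraws node-b
    print(pool.select_node())  # only node-a remains
```

Because both membership and load can change between two calls, the selection is only valid at the moment it is made, which is exactly what makes the grid a best-effort medium.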

2.2.3 Grid Architecture

Managing distributed systems requires a range of technologies, but traditional distributed technologies do not support an integrated approach to managing the wide variety of resources and services required. Nor do they provide the flexibility and control needed to enable the kind of resource sharing a grid demands. Managing and supporting the heterogeneous aspects of the grid therefore requires a grid software infrastructure.

As described in [36, 37, 41], the grid strongly emphasises interoperability, because it is vital to ensure that virtual organisation participants can dynamically share heterogeneous resources. The grid infrastructure is based on a standard open architecture that provides extensibility, interoperability, portability and code sharing. This architecture is divided into several layers and organises components accordingly, as shown in Fig. 2.1.

Components within each layer share common characteristics, but can build on the capabilities and behaviors of any lower layer.

Figure 2.1: Layers of Grid Architecture [37] (from top to bottom: Application, Collective, Resource, Connectivity, Fabric)

1. Fabric Layer

The fabric layer comprises the resources in the grid. A resource can be either logical (such as a distributed file system, computer cluster or distributed computer pool) or physical (such as a computational resource, storage system, catalogue, network resource or sensor). This layer provides the lowest level of access to actual native resources and implements the low-level mechanisms that allow those resources to be accessed and used. More specifically, those mechanisms must include at least state enquiry and resource management mechanisms, each of which must be implemented for a large variety of local systems.

2. Connectivity Layer

The connectivity layer provides the core communication and authentication protocols required for grid-specific network transactions. The authentication protocols provide cryptographically secure mechanisms for verifying the identity of grid users and resources. Many communication protocols in the connectivity layer are drawn from the TCP/IP protocol stack, such as IP [63], ICMP [62], TCP [64], UDP [61] and DNS [57].

3. Resource Layer

This layer builds on the connectivity layer to implement protocols that enable the use and sharing of individual resources, such as the Grid Resource Access and Management protocol (GRAMP), which is used to allocate and monitor resources. These protocols call fabric layer functions to access and control local resources. Two fundamental classes of protocol in this layer are information protocols, used to query the state of a resource, and management protocols, used to negotiate access to a shared resource.

4. Collective Layer

This layer provides protocols, such as the Grid Resource Information Protocol (GRIP) [37], for interacting across collections of resources. In other words, it focuses on the coordination of multiple resources. It includes directory, co-allocation, scheduling, brokerage, monitoring and diagnostics, data replication, software discovery, community accounting and payment services.

5. Application Layer

This is the top layer of the grid architecture and contains the user applications that operate in a grid environment. It includes languages and frameworks, which may themselves define protocols such as the Simple Workflow Access Protocol (SWAP) [78], services and/or an Application Program Interface (API).
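
To show how each layer builds on the layers beneath it, the following minimal Python sketch models the five layers as in-process classes, assuming a resource is described only by its count of free CPUs and that authentication is a simple membership check. All class and method names are hypothetical; real grid middleware implements these layers as network protocols and services, not local objects.

```python
class Fabric:
    """Lowest layer: native resources with state enquiry and management mechanisms."""

    def __init__(self, resources):
        self.resources = dict(resources)  # resource name -> free CPUs

    def query_state(self, name):
        return self.resources[name]

    def allocate(self, name, cpus):
        if self.resources[name] < cpus:
            raise RuntimeError(f"{name} has too few free CPUs")
        self.resources[name] -= cpus


class Connectivity:
    """Authenticates callers before any request reaches a resource (toy check)."""

    def __init__(self, trusted_users):
        self.trusted_users = set(trusted_users)

    def authenticate(self, user):
        if user not in self.trusted_users:
            raise PermissionError(f"unknown user {user}")


class Resource:
    """Information and management protocols for one resource at a time."""

    def __init__(self, fabric, connectivity):
        self.fabric, self.connectivity = fabric, connectivity

    def info(self, user, name):
        self.connectivity.authenticate(user)
        return self.fabric.query_state(name)

    def manage(self, user, name, cpus):
        self.connectivity.authenticate(user)
        self.fabric.allocate(name, cpus)


class Collective:
    """Coordinates many resources; here, a broker choosing where a job runs."""

    def __init__(self, resource_layer, names):
        self.resource_layer, self.names = resource_layer, list(names)

    def broker(self, user, cpus):
        for name in self.names:
            if self.resource_layer.info(user, name) >= cpus:
                self.resource_layer.manage(user, name, cpus)
                return name
        return None  # no single resource can host the job


class Application:
    """Top layer: a user application that talks only to the collective layer."""

    def __init__(self, collective):
        self.collective = collective

    def submit(self, user, cpus):
        return self.collective.broker(user, cpus)


if __name__ == "__main__":
    fabric = Fabric({"site-a": 4, "site-b": 16})
    app = Application(Collective(Resource(fabric, Connectivity(["alice"])), ["site-a", "site-b"]))
    print(app.submit("alice", 8))  # site-b
```

The sketch uses strict layering, so the application never touches the fabric directly; the architecture itself is looser and allows a component to use any lower layer.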

2.2.4 Grid Types

From an application perspective, grids [49, 51] have generally been classified into three categories based on the solutions they seek to provide: computational grids, which provide access to heterogeneous resources; data grids, which provide data services such as storage, management and data access; and service grids, which supply services not provided by any single machine. Each is described below.

2.2.4.1 Computational Grid

A computational grid is a collection of computing resources that may span computers on multiple networks and locations, heterogeneous platforms and separate administrative domains with several owners. The purpose of such a grid is to run very large applications such as weather forecasting. In this type of grid, most machines are high performance servers, and the resources are aggregated so as to act as a unified processing resource. Increases in the speed and reliability of networks, together with the use of distribution protocols, are important factors that have led to the use of grids, which enable users to remotely utilise resources owned by different providers. Depending on how they utilise resources, computational grids fall into two categories: distributed supercomputing and high throughput computing. In the former, the grid executes a job in parallel on various resources, reducing its completion time; the jobs that require this category of grid are those that present huge problems. By contrast, high throughput computing increases the rate at which jobs are completed. Grid computing can solve the most complex as well as the most vital problems that face scientific and industrial organisations and even governments. The possible applications of grid computing range from the financial (analysing the value of investments, scaling businesses and leveraging existing hardware in investments and resources) through product line (reducing designs, numbers of products and operational expenses) and scalability (creating scalable and flexible enterprise IT infrastructure) to the scientific (connecting research teams located in different geographical areas, accelerating research and production processes) [2].

2.2.4.2 Data Grid

In this type of grid, a large amount of data is distributed and/or replicated to remote sites spread worldwide. In general, a data grid refers to a system responsible for storing data and providing access to the users authorised to share it. In other words, data grids provide an infrastructure for creating new information repositories, such as data warehouses or digital libraries, that are distributed across several networks [51]. The aims of data grids overlap with those of heterogeneous distributed database systems, which deal with various kinds of database management systems running on different hardware, operating systems and network connections across a heterogeneous environment. Data grids provide infrastructures to support data storage, discovery, handling, publication and manipulation. Enterprise data is usually large scale, dynamic and autonomous, and comes from distributed data sources. In the academic world, there is a desire to share expensive experimental data in order to coordinate research. In business, the need for data sharing is even more urgent, for example because business data must be available in real time for applications such as e-marketing, where it is very important to maintain an up-to-date and consistent product catalogue. The objective of data grids, as presented in [27], is to integrate heterogeneous data archives into a distributed data management "grid" in order to identify services for high performance, distributed, data-intensive computing, and to enable users to extract relevant information from the distributed databases.

Data grids are compatible with computational grids and can integrate storage and computation. One practical application of a data grid is the EU's DataGrid [71], a project financed by the European Union to develop computational grid technologies that enable the sharing of distributed files, databases, computers, scientific tools and devices. The main goal of DataGrid is to build the next generation of computing infrastructure by developing and testing the technological infrastructure that will enable the sharing of large-scale databases.
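
Because a data grid replicates data to several remote sites, a job must pick which copy to read. The following minimal Python sketch shows one simple policy, assuming a replica catalogue keyed by logical file name and a table of measured latencies; the catalogue format, file and site names, and the lowest-latency rule are all hypothetical stand-ins for the replica management services a real data grid provides.

```python
def select_replica(lfn, catalogue, latency_ms, unavailable=()):
    """Pick the reachable site holding a copy of lfn with the lowest latency.

    lfn: logical file name used as the catalogue key (hypothetical naming scheme).
    catalogue: dict mapping logical file name -> list of sites holding a replica.
    latency_ms: dict mapping site -> most recently measured latency in milliseconds.
    unavailable: sites currently known to be down.
    Returns the chosen site, or None if no replica is reachable.
    """
    candidates = [
        site for site in catalogue.get(lfn, [])
        if site not in unavailable and site in latency_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=latency_ms.get)


if __name__ == "__main__":
    catalogue = {"lfn:experiment/run-001.dat": ["site-a", "site-b", "site-c"]}
    latency = {"site-a": 40.0, "site-b": 12.0, "site-c": 25.0}
    print(select_replica("lfn:experiment/run-001.dat", catalogue, latency))  # site-b
    print(select_replica("lfn:experiment/run-001.dat", catalogue, latency,
                         unavailable={"site-b"}))  # site-c
```

Latency is only one possible cost; bandwidth, site load or storage policy could be substituted in the `key` function without changing the rest of the sketch.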

2.2.4.3 Service Grid

A service grid represents and enables a set of services provided by a set of resources. Service grids fall into three categories: on-demand service grids enable real-time interaction, collaborative service grids aggregate various resources to provide new services, and multimedia service grids supply the infrastructure for real-time multimedia applications.

This research is concerned with both computational grids and data grids.

2.3 History of Grid Evolution

This section outlines the history of grid evolution by describing the stages that grids passed through as they evolved into grid computing.

2.3.2 Stages of Development

From a topological perspective, grids have been classified into clusters, intra-grids and extra-grids [37]. These classifications denote the stages of development of grids, which are born as clusters, evolve into intra-grids and finally become extra-grids, as shown in Fig. 2.2.

2.3.2.1 Cluster

Clusters are critical to the evolution of a grid; they are the smallest grids in size and scope. A cluster is an aggregation of servers that provides more computing power than stand-alone computers. Cluster computing is built on individual processors and commodity operating systems. Clusters are designed to solve problems for particular groups of people within the same department. They are implemented within campus intranets by incorporating PCs, data and servers to maximise the use of computing resources and to increase user job throughput. Cluster grids can operate within a heterogeneous environment consisting of mixed server types, operating systems and workloads. Resources are accessed through a single, known point in the grid, which has only one job queue [37].
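
The single job queue of a cluster grid can be simulated in a few lines. This minimal Python sketch assumes identical servers, known run times and no preemption: jobs leave one first-in, first-out queue in submission order, and each goes to whichever server becomes free first. The function name and job format are hypothetical.

```python
import heapq
from collections import deque


def dispatch_fifo(jobs, servers):
    """Simulate one FIFO job queue feeding a set of identical servers.

    jobs: list of (job_id, run_time) pairs in submission order.
    servers: list of server names.
    Returns (assignments, makespan), where assignments maps job_id -> (server, start_time).
    """
    queue = deque(jobs)
    free_at = [(0, name) for name in servers]  # (time the server becomes free, server)
    heapq.heapify(free_at)
    assignments, makespan = {}, 0
    while queue:
        job_id, run_time = queue.popleft()      # head of the single queue
        start, server = heapq.heappop(free_at)  # earliest-free server takes it
        assignments[job_id] = (server, start)
        finish = start + run_time
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, server))
    return assignments, makespan


if __name__ == "__main__":
    jobs = [("j1", 5), ("j2", 3), ("j3", 2), ("j4", 4)]
    assignments, makespan = dispatch_fifo(jobs, ["s1", "s2"])
    print(assignments)
    print(makespan)  # 9
```

A heap keyed on the time each server becomes free keeps the "next free server" lookup logarithmic in the number of servers.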

2.3.2.2 Intragrid

A typical intragrid topology, as illustrated in Figure 1-5, exists within a single organization and provides a basic set of grid services. The single organization could be made up of a number of computers that share a common security domain and share data internally on a private network. The primary characteristics of an intragrid are a single security provider, high and always-available bandwidth on the private network, and a single environment within a single network. Within an intragrid, it is easier to design and operate computational and data grids. An intragrid provides a relatively static set of computing resources and the ability to easily share data between grid systems. A business might consider an intragrid appropriate if it has an initiative to gain economies of scale on internal job management, or wants to start exploring the use of a grid internally first by enabling vertical enterprise applications.

Figure 1-5 : An Intragrid

2.3.2.3 Extragrid

Based on a single organization, the extragrid expands on the concept by bringing together two or more intragrids. An extragrid, as illustrated in Figure 1-5, typically involves more than one security provider, and the level of management complexity increases. The primary characteristics of an extragrid are dispersed security, multiple organizations, and remote/WAN connectivity.

Within an extragrid, the resources become more dynamic, and the grid needs to be more reactive to failed resources and failed components. The design becomes more complicated, and information services become relevant to ensure that grid resources have access to workload management at run time.
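
Reacting to failed resources can be as simple as resubmitting a job elsewhere. The following minimal Python sketch assumes an ordered list of candidate resources and a caller-supplied `execute` function that raises `ResourceFailure` when a resource fails; both names, and the try-the-next-resource policy, are hypothetical illustrations rather than the behaviour of any specific grid scheduler.

```python
class ResourceFailure(Exception):
    """Raised by a (hypothetical) execute function when a resource fails mid-job."""


def run_with_failover(job, resources, execute):
    """Try the job on each resource in preference order, moving on after a failure.

    resources: candidate resource names, most preferred first.
    execute: callable(resource, job) returning a result or raising ResourceFailure.
    Returns (resource, result) from the first resource that succeeds.
    """
    failures = []
    for resource in resources:
        try:
            return resource, execute(resource, job)
        except ResourceFailure as exc:
            failures.append((resource, str(exc)))  # record it, then resubmit elsewhere
    raise RuntimeError(f"job {job!r} failed on every resource: {failures}")


if __name__ == "__main__":
    def fake_execute(resource, job):
        # Hypothetical stand-in: site-a always fails, other sites succeed.
        if resource == "site-a":
            raise ResourceFailure("node crashed")
        return f"{job} done on {resource}"

    print(run_with_failover("job-1", ["site-a", "site-b"], fake_execute))
```

Keeping the list of failures makes the final error informative when every candidate has been exhausted.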

Figure 1-5 : An Extragrid

2.3.2.4 Inter-grid

An intergrid requires the dynamic integration of applications, resources and services with partners, customers and any other authorized organizations that will obtain access to the grid via the internet/WAN. An intergrid topology is primarily used by engineering firms, life science industries, manufacturers and businesses in the financial industry.

The primary characteristics of an intergrid include dispersed security, multiple organizations, and remote/WAN connectivity. The data in an intergrid is global public data, and applications (both vertical and horizontal) must be modified for a global audience. A business may deem an intergrid necessary if there is a need for peer-to-peer computing, a collaborative computing community, or simplified end-to-end processes with the organizations that will use the intergrid.
