DRG for Distributed Databases


02 Nov 2017


Allam Maalla, Wang Hu

School of Management, Department of Management Information System, Wuhan University of Technology, Wuhan 430070, Hubei, China

This paper presents a technique called Diagonal Replication on Grid (DRG) for managing data replication in distributed database systems. In DRG, sites are logically organized in a two-dimensional grid. For each site, a set of diagonal sites in the grid is selected. The primary copy of the data is placed on the site itself, while the replicas are distributed onto the diagonal sites, so only the diagonal nodes hold replicated data. Compared with previous techniques, DRG requires a lower communication cost per operation while providing higher system availability, which is preferable for large systems.

Keywords: Read-One-Write-All Protocol (ROWA); Voting Protocol (VT); Tree Quorum Protocol; Communication Costs Analysis.

INTRODUCTION:

A number of articles on distributed databases have been published, among them those by A. Heddaya [1], B. Bhargava [2], D. Agrawal [3], and Agrawal & Abbadi [4,5]. These articles show that data replication management remains an unsolved issue in distributed databases. Distributed databases are becoming an attractive alternative in today's computing environment because of the distributed nature of many businesses that rely on information to operate (D. Bell [6]). Specifically, a distributed database can be defined as a collection of databases that are connected via a network, either local area or wide area, that may involve different database management systems running on different architectures, and that distributes the execution of transactions.

Data Replication

Two approaches are commonly used for replication: synchronous and asynchronous. With synchronous replication, all copies of the data are kept exactly synchronized and consistent. If any copy is updated, the update is immediately applied to all other copies within the same transaction. Synchronous replication thus provides what is called "tight consistency" between data stores: the latency before the copies become consistent is zero, and the data at all sites or replicas always have the same value, regardless of which site initiated the update. With asynchronous replication, by contrast, copies of the data can become temporarily out of sync with each other. If one copy is updated, the change is propagated and applied to the other copies as a second step, within separate transactions that may occur seconds, minutes, hours, or even days later. Copies can therefore be temporarily out of sync, but over time the data should converge to the same values at all sites. Asynchronous replication thus provides what is called "loose consistency" between data stores.

Read-One-Write-All Protocol (ROWA)

The simplest technique for managing replicated data is Read-One-Write-All, which converts a logical read to a read on any one of the replicas, and converts a logical write to a write on all the replicas. Thus, when the update transaction commits, all of the replicas have the same value. This protocol works correctly since a transaction progresses from one correct state to another correct state. ROWA also has the lowest read cost, because only one replica is accessed by a read operation.
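The read and write rules described above can be sketched in a few lines of Python. This is only an illustrative model, not an implementation from the paper: the class name `RowaItem` and its methods are hypothetical, and the replicas are simulated by an in-memory list rather than real sites.

```python
class RowaItem:
    """One logical data item replicated on several sites (ROWA sketch)."""

    def __init__(self, n_replicas, initial=None):
        # Each list slot stands in for the copy held by one site (assumption).
        self.replicas = [initial] * n_replicas

    def read(self, site=0):
        # A logical read touches exactly one replica.
        return self.replicas[site]

    def write(self, value):
        # A logical write updates every replica before it is considered done.
        for i in range(len(self.replicas)):
            self.replicas[i] = value
        return len(self.replicas)  # number of copies written


item = RowaItem(n_replicas=4, initial="v0")
copies_written = item.write("v1")
print(copies_written, item.read(site=2))
```

A read returns the same value whichever replica is chosen, at the price of a write that touches all four copies.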

Voting Protocol (VT)

Voting as a technique for managing concurrent data accesses has been proposed by a number of researchers. Voting techniques became popular because they are flexible and easily implemented. The initial voting algorithm was discussed by Thomas (1979), and an early suggestion to use quorum-based voting for replica control is due to Gifford (1979). Thomas's algorithm works on fully replicated databases and assigns an equal vote to each site. For any operation of a transaction to execute, it must collect affirmative votes from a majority of the sites. Gifford's algorithm, on the other hand, works with partially replicated databases (as well as with fully replicated ones) and assigns a vote to each copy of a replicated data item.
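As a rough illustration of the majority rule, the sketch below checks whether the accessible sites hold a strict majority of the votes. The function names and the optional per-copy `votes` list are assumptions made for this example; equal votes correspond to the scheme attributed to Thomas above, and unequal votes only hint at the per-copy votes of Gifford's scheme.

```python
def has_majority(votes_collected, total_votes):
    """True if the collected votes form a strict majority of all votes."""
    return 2 * votes_collected > total_votes


def can_execute(site_up, votes=None):
    """Decide whether an operation can proceed.

    site_up: list of booleans, one per copy (True = accessible).
    votes:   optional list of vote weights; defaults to one vote per copy.
    """
    if votes is None:
        votes = [1] * len(site_up)
    collected = sum(v for up, v in zip(site_up, votes) if up)
    return has_majority(collected, sum(votes))


print(can_execute([True, True, False, True, False]))     # 3 of 5 votes
print(can_execute([True, False, False]))                 # 1 of 3 votes
print(can_execute([True, False, False], votes=[3, 1, 1]))  # 3 of 5 votes
```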

5. Tree Quorum Protocol

Agrawal and El Abbadi proposed a logical tree structure over a network of sites to achieve fault-tolerant distributed mutual exclusion. Figure 5.1 illustrates an example of a ternary tree with thirteen copies. For read operations, the root or a majority of the children of the root can form a read quorum. If any node in that majority fails, a majority of its own children can replace it, and so on recursively. In the best case, a read quorum consists of only the root, {1}. If the root fails, a quorum is formed by a majority of the copies at level 1, e.g., {2,3}, {3,4}, or {2,4}. If no majority at level 1 is accessible and only site 4 is accessible, node 2 or node 3 is replaced by a majority of its children; such quorums are {4,5,7} or {4,8,9}. If the copies at level 0 and level 1 all fail, majorities of the copies at level 2, e.g., {5,6,9,10} or {8,10,11,13}, form a quorum. For write operations, the size of a write quorum for a given tree is fixed, but the members can differ. Such quorums are {1,2,3,6,7,9,10}, {1,3,4,8,9,11,12}, etc.

Figure 5.1: A tree organization of 13 copies of a data object
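The recursive rule for read quorums can be sketched as follows. The sketch assumes the thirteen copies are numbered in level order, so that the children of node 1 are 2, 3, 4, the children of node 2 are 5, 6, 7, and so on, which matches the example quorums in the text; the function names and the preference for directly accessible children are choices made for this illustration.

```python
def children(node, degree=3, n_nodes=13):
    """Children of `node` in a complete tree numbered in level order from 1."""
    first = degree * (node - 1) + 2
    return [c for c in range(first, first + degree) if c <= n_nodes]


def read_quorum(node, up, degree=3, n_nodes=13):
    """Return a read quorum for the subtree rooted at `node`, or None.

    `up` is the set of accessible nodes. A node is used if it is accessible;
    otherwise it is replaced by quorums from a majority of its children.
    """
    if node in up:
        return {node}
    kids = children(node, degree, n_nodes)
    if not kids:
        return None  # an inaccessible leaf cannot be replaced
    kids.sort(key=lambda k: k not in up)  # try accessible children first
    majority = len(kids) // 2 + 1
    quorum, found = set(), 0
    for kid in kids:
        sub = read_quorum(kid, up, degree, n_nodes)
        if sub is not None:
            quorum |= sub
            found += 1
            if found == majority:
                return quorum
    return None


print(read_quorum(1, set(range(1, 14))))  # root accessible
print(read_quorum(1, set(range(2, 14))))  # root failed
print(read_quorum(1, set(range(4, 14))))  # nodes 1, 2, 3 failed
print(read_quorum(1, set(range(5, 14))))  # levels 0 and 1 failed
```

The four calls correspond to the four cases discussed above and yield quorums of size 1, 2, 3 and 4 respectively.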

6. Communication Costs Analysis

The communication cost is probably the most important factor considered in distributed databases. The communication cost of an operation is directly proportional to the size of the quorum required to execute the operation. Therefore, we express the communication cost in terms of the quorum size. $C_{X,Y}$ denotes the communication cost with protocol X for operation Y, which is R (read) or W (write).

6.1 Read-One-Write-All (ROWA)

In Read-One-Write-All (ROWA), a read operation can read any copy of the data item, while a write operation needs to write all n copies in the system. Thus, the communication cost of a read operation is:

$C_{ROWA,R} = 1$

and the communication cost of a write operation is:

$C_{ROWA,W} = n$ … (1)

6.2 Tree Quorum

Let h denote the height of the tree, D the degree of the copies in the tree, and $M = \lceil (D+1)/2 \rceil$ the majority of the degree of copies. When the root is accessible, the read quorum size is 1. If the root fails, a majority of its children replaces it, so the quorum size increases to M. Therefore, for a tree of height h, the maximum quorum size is $M^h$. Hence, the cost of a read operation, $C_{TQ,R}$, ranges from 1 to $M^h$ [3,4], i.e., $1 \le C_{TQ,R} \le M^h$.

For read operations, the root or a majority of the children of the root can form a read quorum. If any node in that majority fails, a majority of its children can replace it, and so on recursively. In the best case, a read quorum consists of only the root, {1}. If the root fails, a majority of the copies at level 1 forms a quorum. If the selected level-1 nodes (for example, nodes 2 and 3) are also inaccessible, each is replaced by a majority of its children, giving quorums such as {5,6,9,10} or {8,10,12,13}. Hence, for the tree in Figure 5.1 (D = 3, h = 2, M = 2), the size of a read quorum is at most |{5,6,9,10}| = |{8,10,12,13}| = 4 = $2^2$ = $M^h$.

For write operations, the size of a write quorum for a given tree is fixed, but the members can differ. Thus, the size of a write quorum equals |{1,2,3,5,7,9,10}| = |{1,3,4,8,9,11,12}| = 7 = $\sum_{i=0}^{h} M^{i}$.
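The quorum sizes above can be checked numerically. The helper below is a small sketch with a hypothetical name, `tree_quorum_costs`; it reads the bracket in the definition of M as a ceiling, which is an assumption, and returns the best-case read cost, the worst-case read cost and the write cost for a tree of degree D and height h.

```python
from math import ceil


def tree_quorum_costs(degree, height):
    """Return (min read cost, max read cost, write cost) in numbers of copies."""
    majority = ceil((degree + 1) / 2)          # M, majority of the children
    max_read = majority ** height              # M^h, only leaves accessible
    write = sum(majority ** i for i in range(height + 1))  # sum of M^i, i = 0..h
    return 1, max_read, write


print(tree_quorum_costs(3, 2))  # the 13-copy ternary tree of Figure 5.1
print(tree_quorum_costs(5, 2))  # a wider tree, for comparison
```

For the ternary tree of height 2 this gives a read cost between 1 and 4 and a write cost of 7, in line with the example quorums in the text.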

6.3 Grid Configuration

Let n be the number of copies, organized as a grid of dimension $\sqrt{n} \times \sqrt{n}$. Read operations on the replicated data are executed by acquiring a read quorum that consists of a copy from each column in the grid, and write operations are executed by acquiring a write quorum that consists of all copies in one column and a copy from each of the remaining columns. Therefore, the communication cost of a read operation, $C_{GC,R}$, can be represented as:

$C_{GC,R} = \sqrt{n}$ … (2)

and the communication cost of a write operation, $C_{GC,W}$, can be represented as:

$C_{GC,W} = \sqrt{n} + (\sqrt{n} - 1) = 2\sqrt{n} - 1$ … (3)

6.4 Operation Availability Model

The k-out-of-n model is introduced first, as it will be used for estimating the operation availability. The availability of a data item is defined as the probability that the data item is accessible for an operation at any given time, which can be represented using the k-out-of-n model. The assumptions of the k-out-of-n model are as follows:

The data item and its copies are in one of the two states: accessible or inaccessible,

The states of the copies are changed independently,

The data item is available for an operation if at least k of its n copies are accessible.

Therefore, the k-out-of-n model can be formulated as:

$\text{k-out-of-n} = \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i}, \quad k \ge 1$ … (4)
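A minimal sketch of the k-out-of-n model follows. It assumes that every copy has the same availability p and reads equation (4) as the standard binomial tail sum, i.e. the probability that at least k of n independent copies are accessible; the function name `k_out_of_n` is chosen for this example.

```python
from math import comb


def k_out_of_n(k, n, p):
    """Probability that at least k of n independent copies are accessible,
    each copy being accessible with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Two of three copies must be accessible, each with availability 0.9.
print(round(k_out_of_n(2, 3, 0.9), 3))
```

With k = 1 the expression reduces to 1 - (1 - p)^n, the probability that at least one copy is accessible.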

Performance Analysis

The communication cost of an operation is directly proportional to the size of the quorum required to execute the operation. Therefore, we represent the communication cost in terms of the quorum size. In estimating the availability, all copies are assumed to have the same availability p. $C_{X,Y}$ denotes the communication cost with technique X for operation Y, which is R (read) or W (write).

7. DRG Technique

Let $p_i$ denote the availability of site i. Read operations on the replicated data are executed by acquiring a read quorum, and write operations are executed by acquiring a write quorum. For simplicity, we choose the read quorum to be equal to the write quorum. Thus, the communication cost for read and write operations equals $\lceil L_{B_a}/2 \rceil$, that is,

$C_{DRG,R} = C_{DRG,W} = \lceil L_{B_a}/2 \rceil$. For example, if the primary site has four neighbors, each of which has one vote, then $C_{DRG,R} = C_{DRG,W} = \lceil 5/2 \rceil = 3$.

For any assignment B and quorum q for the data file a, define $\varphi(B_a, q)$ to be the probability that at least q sites in $S(B_a)$ are available, then

$\varphi(B_a, q) = \Pr\{\text{at least } q \text{ sites in } S(B_a) \text{ are available}\}$ … (5)

Thus, the availabilities of read and write operations for the data file a are $\varphi(B_a, R)$ and $\varphi(B_a, W)$, respectively. Let $Av(B_a, R, W)$ denote the read/write availability corresponding to the assignment $B_a$, read quorum R, and write quorum W. If the probabilities that an arriving operation for data file a is a read or a write are f and (1 - f), respectively, then

$Av(B_a, R, W) = f\,\varphi(B_a, R) + (1-f)\,\varphi(B_a, W)$ … (6)
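The weighted availability in equation (6) can be sketched as below. The sketch assumes that all sites holding a copy have the same availability p, so that the probability of reaching a quorum is a binomial tail sum; the names `phi` and `av` and the parameter `sites` (the number of sites holding a copy) are hypothetical.

```python
from math import comb


def phi(q, sites, p):
    """Probability that at least q of `sites` sites are available (same p each)."""
    return sum(comb(sites, i) * p**i * (1 - p)**(sites - i)
               for i in range(q, sites + 1))


def av(sites, read_q, write_q, f, p):
    """Read/write availability: reads arrive with probability f, writes with 1 - f."""
    return f * phi(read_q, sites, p) + (1 - f) * phi(write_q, sites, p)


# Equal read and write quorums (3 of 5 sites), as chosen for DRG in the text.
print(round(av(5, 3, 3, f=0.2, p=0.7), 3))
print(round(av(5, 3, 3, f=0.9, p=0.7), 3))
# Unequal quorums for contrast: read one site, write all five.
print(round(av(5, 1, 5, f=0.9, p=0.7), 3))
```

When the read and write quorums are equal, the result does not depend on the read fraction f, because both terms of the sum share the same probability.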

Definition: Let Av(x) be the availability function with respect to x. Av(x) is in closed form if $0 \le x \le 1$ implies $0 \le Av(x) \le 1$.

Theorem: The read/write availability under the DRG technique is in closed form.

Proof: For the case of read availability, from equation (3) and by definition 3.1.2, as $0 \le p_i \le 1$, i = 1, 2, …, $L_{B_a}$, then $0 \le \varphi(B_a, R) \le 1$. Similarly, for the case of write availability, $0 \le \varphi(B_a, W) \le 1$ as $0 \le p_i \le 1$.

We compare the system availability of the Grid Structure (GS) technique, based on equations (1) and (2), with that of our DRG technique, based on equations (3) and (4), for the case of N = 36 and 64. In estimating the availability of operations, all copies are assumed to have the same availability.

Table 7.1 Comparison of the read availability and write availability for different numbers of copies with p = 0.7.

Techniques    Number of Nodes (N)
              25       49       81
DRG (R)       0.837    0.874    0.901
DRG (W)       0.837    0.874    0.901
GS (R)        0.988    0.998    1.000
GS (W)        0.595    0.451    0.310

Table 7.1 and Figure 7.1 show the read and write availability of the GS and DRG techniques for different total numbers of copies. For the GS technique, the write availability decreases as the number of nodes increases.
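The figures in Table 7.1 can be approximated with a short script. Several things here are assumptions rather than statements from the text: for DRG, the replicas are taken to sit on the √N diagonal sites with a majority quorum for both reads and writes; for GS, the usual grid-protocol expressions are used (a read needs one available copy in every column, a write additionally needs one fully available column), since the text does not reproduce them. All function names are hypothetical.

```python
from math import comb, isqrt


def at_least(k, n, p):
    """Probability that at least k of n independent sites are available."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def drg_availability(n_nodes, p):
    side = isqrt(n_nodes)        # assumed: one replica on each diagonal site
    quorum = (side + 1) // 2     # ceil(side / 2), same for reads and writes
    return at_least(quorum, side, p)


def gs_read(n_nodes, p):
    side = isqrt(n_nodes)
    column_has_copy = 1 - (1 - p) ** side   # at least one copy up in a column
    return column_has_copy ** side


def gs_write(n_nodes, p):
    side = isqrt(n_nodes)
    column_has_copy = 1 - (1 - p) ** side
    column_full = p ** side                 # every copy in a column is up
    # every column has a copy up, minus the cases where no column is fully up
    return column_has_copy ** side - (column_has_copy - column_full) ** side


for n_nodes in (25, 49, 81):
    print(n_nodes,
          round(drg_availability(n_nodes, 0.7), 3),
          round(gs_read(n_nodes, 0.7), 3),
          round(gs_write(n_nodes, 0.7), 3))
```

Under these assumptions the computed values agree with Table 7.1 to three decimal places, including the falling GS write availability as N grows.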

Figure 7.1 Comparison of read availability between DRG and GS when p = 0.7

Conclusions

The problem of maintaining replicated data has been widely studied. The DRG technique and the other protocols were analyzed in terms of system availability and communication cost. The analysis showed that the DRG technique provides a convenient approach to achieving high availability for update-frequent operations, owing to the small quorum size it requires. In comparison to the Tree Quorum and Grid Structure techniques, DRG requires significantly lower communication cost for an operation, while providing higher system availability, which is preferred for a large system. Thus, this technique represents an alternative design philosophy to the quorum approach.

We plan to implement the proposed protocol on various architectures, including Data Grid, Peer-to-Peer, Cluster Computing, and Mobile systems.


