Reliable Cloud Storage Systems


According to the National Institute of Standards and Technology (NIST), cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction [1]. Cloud computing users do not own the physical infrastructure; rather, they rent service from a cloud provider, consuming resources and paying only for what they use. Cloud storage systems are housed in facilities called datacenters. A client communicates with one of the provider's servers, but the data itself may be stored across several datacenters. Since users do not have to deal with the difficulties of direct hardware management, moving files into the cloud offers them great convenience. However, a malicious cloud provider or any other outsider can modify the client's data, destroying its integrity. Maintaining the correctness of user files is not easy because the files tend to be updated frequently by the user.

The simplest approach to providing integrity for data is to replicate it. But replication evidently uses a lot of storage; the storage cost, or overhead, is high. Erasure coding schemes are the other key solution to this problem. In erasure coding, a file F consisting of k blocks is encoded using an (n, k) erasure code to create n coded blocks out of the original k file blocks, which are kept at n servers, one coded block per server. The original file can then be regenerated from any k out of the n servers. When the client discovers modification of one of the coded blocks, it can use the remaining healthy blocks to reconstruct the corrupted block. An erasure code used in a cloud storage system should possess the following desirable characteristics:

Reconstruction of data from a minimum number of chunks

Low storage overhead

Local Reconstruction Codes (LRC) provide both of these properties. In addition, we analyze Reed-Solomon encoding, a widely accepted erasure code, alongside Local Reconstruction Codes.

RELATED WORK

Cloud computing makes it possible to store large amounts of data, but data integrity is the main problem with cloud storage. Juels et al. [2] described a formal "proof of retrievability" (POR) model for ensuring the correctness of remote data. Their scheme ensures both possession and retrievability of files on archive service systems. Shacham et al. [3] built a homomorphic authenticator based on random linear functions, which enables an unlimited number of queries and requires less communication overhead. Bowers et al. [4] proposed an improved framework for POR protocols that generalizes both Juels's and Shacham's work. In subsequent work, Bowers et al. [7] extended the POR model to distributed systems as a High Availability and Integrity Layer (HAIL). Still, all these systems concentrate on static data. Any alteration to the contents of the file, even of a few bits, must propagate through the error-correcting code, introducing substantial computation and communication complexity.

A "provable data possession" (PDP) was proposed by Ateniese et al. [5] which assures possession of file on untrusted storages based on public key cryptography, therefore ensuring public verifiability. But, when compared with others this approach is suffering from sufficient computation overhead and can be expensive. Improved PDP scheme is defined by Ateniese et al. [6] based on symmetric key encryption for which overhead is lesser than preceding scheme and permits future modifications. But, this approach concentrates only on single server. Curtmola et al. [9] suggested a solution to this problem by allowing data possession of multiple replicas. Shah et al. [10] kept online storage honest by first scrambling the data then transferring a number of pre-computed symmetric keyed hashes over the encrypted data to the arbitrator. But, their scheme only works for encrypted files.

LOCAL RECONSTRUCTION CODES

If encoding and decoding take a lot of computational resources, they interfere with other operations such as encryption and compression. Because of their generality, Reed-Solomon codes are widely used in distributed storage systems. Reed-Solomon coding was designed for deep-space communication, where errors occur frequently. However, datacenters behave differently from deep-space communication. A well-designed and well-managed server has a very low failure rate, which means that most data files in the datacenters are healthy, with no failures. Even when errors are present, they are few in number and last only a short duration; failures are repaired as quickly as possible and the affected data is brought back to a healthy state.

The new coding approach enables data to be reconstructed more rapidly than with Reed-Solomon codes, because fewer data chunks must be accessed to regenerate the original data: only half the chunks are necessary. Moreover, these codes are mathematically simpler than earlier constructions. The "local" in the technique's name denotes that, in the event of a failure, the data required to reconstruct a lost chunk is not spread across all the servers.

Data durability is excellent with this type of code: a data fragment can be rebuilt with complete accuracy. The reduction in communication overhead helps decrease the storage cost, and reconstruction is faster than with previously available codes.

An erasure-correcting code may be used to survive numerous failures in distributed storage systems. The message to be stored is divided into several blocks. An (m, k) Reed-Solomon erasure-correcting code produces k redundant parity vectors from m data vectors in such a way that the original m data vectors can be recreated from any m out of the m + k data and parity vectors. By placing each of the m + k vectors on a different server, the original data file can tolerate the failure of any k of the m + k servers without any loss, with a storage overhead of k/m [8].

Fig. 1 Reed Solomon Codes Definitions
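The core property, that any m of the m + k symbols suffice to recover the data, can be illustrated with a small sketch. The code below is a minimal, non-systematic Reed-Solomon-style erasure code over the prime field GF(257) built from a Vandermonde matrix; production systems instead work over GF(2^8) with heavily optimized arithmetic, so this is purely illustrative and not the encoder used in any particular storage system.

```python
P = 257  # small prime field; each symbol is an integer in [0, 256)

def rs_encode(data, n):
    """Encode m data symbols into n >= m coded symbols using a Vandermonde matrix."""
    m = len(data)
    return [sum(data[j] * pow(x, j, P) for j in range(m)) % P for x in range(n)]

def rs_decode(survivors, m):
    """Recover the m data symbols from any m surviving (index, value) pairs
    by Gauss-Jordan elimination over GF(P)."""
    rows = [[pow(x, j, P) for j in range(m)] + [v % P] for x, v in survivors[:m]]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [(v * inv) % P for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[j][m] for j in range(m)]

data = [10, 20, 30, 40]                        # m = 4 data symbols
coded = rs_encode(data, 6)                     # n = 6 symbols, i.e. k = 2 parities
alive = [(i, coded[i]) for i in (1, 3, 4, 5)]  # any 4 of the 6 survive
assert rs_decode(alive, len(data)) == data
```

Any m surviving symbols give an invertible Vandermonde subsystem, which is exactly why the code tolerates the loss of any k servers.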

As an example [13], consider a (6, 3) Reed-Solomon code with 6 data chunks and 3 parity chunks, where each parity is calculated from all 6 data chunks. When a chunk fails, 6 chunks must be read to reconstruct it. If the reconstruction cost is defined as the number of chunks required to rebuild an unavailable data fragment, it equals 6 here. The aim of LRC is to decrease the reconstruction cost, which it achieves by calculating some of the parities from a subset of the data chunks.

For the same example, LRC creates 4 parities. Global parities p0 and p1 are computed from all the data chunks. LRC then splits the data chunks into two equal-size groups and computes one local parity for each group.

Fig. 2 Local Reconstruction Code Example

Analyzing this construction, we see that the reconstruction of any single data fragment needs only 3 chunks, half the number needed by the Reed-Solomon code, at the cost of one more parity chunk than Reed-Solomon.
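To make the local-repair idea concrete, the following sketch mimics the (6, 2, 2) layout with byte-wise XOR local parities. This is a simplification: a real LRC chooses Galois-field coefficients for all parities so that the Maximally Recoverable property discussed below holds, and the two global parities are not modeled here.

```python
def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# Six data chunks split into two local groups of three.
group_x = [b"abc", b"def", b"ghi"]
group_y = [b"jkl", b"mno", b"pqr"]

# One local parity per group (global parities p0, p1 omitted).
px = xor_chunks(group_x)
py = xor_chunks(group_y)

# Reconstruct the second chunk of group X after it is lost: only the two
# surviving chunks of its group plus the local parity are read -- 3 chunks
# instead of the 6 a (6, 3) Reed-Solomon code would need.
recovered = xor_chunks([group_x[0], group_x[2], px])
assert recovered == group_x[1]
```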

A (k, l, r) LRC divides k data chunks into l groups, with k/l data chunks in each group. It computes one local parity within each group and r global parities from all the data chunks. The total number of chunks (data + parity) is n = k + l + r. Therefore, the normalized storage overhead is n/k = 1 + (l + r)/k, so a (6, 2, 2) LRC has a storage cost of 1 + 4/6 = 1.67x, as depicted in Figure 3 [13].
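The trade-off against the (6, 3) Reed-Solomon layout can be checked with a few lines of arithmetic; the numbers below simply restate the figures quoted in the text.

```python
def storage_overhead(data_chunks, parity_chunks):
    """Total chunks stored per data chunk."""
    return (data_chunks + parity_chunks) / data_chunks

# (6, 3) Reed-Solomon: 3 parities, reconstruction reads 6 chunks.
print(storage_overhead(6, 3))      # 1.5
# (6, 2, 2) LRC: 2 local + 2 global parities, reconstruction reads k/l = 3 chunks.
print(storage_overhead(6, 2 + 2))  # ~1.67
```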

Fig. 3 Decoding Failures in LRC

The coding equations must be selected such that LRC attains the Maximally Recoverable (MR) property [14], meaning it can decode any failure pattern that is information-theoretically decodable. The (6, 2, 2) LRC has 4 parity chunks and can therefore survive at most 4 failures. However, it is not Maximum Distance Separable and thus cannot tolerate 4 arbitrary failures. For example, suppose x1, x2, x3 and px fail. This pattern is non-decodable: three data chunks of the same group are lost together with that group's local parity, leaving only the two global parities to recover three unknowns. Failure patterns that can in principle be reconstructed are called information-theoretically decodable.

In general, the essential properties of a (k, l, r) LRC are: i) a single data chunk failure can be decoded from k/l chunks; ii) arbitrary failures up to r + 1 can be decoded. By the following theorem, these properties enforce a lower bound on the number of parities [12].

Theorem 1. For any (n, k) linear code (with k data symbols and n − k parity symbols) to have the properties that

arbitrary r + 1 symbol failures can be decoded, and

a single data symbol failure can be recovered from k/l symbols,

the following condition is necessary:

n − k ≥ l + r.

Since the number of parities of LRC meets this lower bound exactly, LRC achieves its properties with the minimum number of parities.

Before uploading the file to the provider's site, the owner of the file performs LRC encoding; the outcome of this procedure is a collection of data chunks together with local and global parities. The cloud user then calculates verification tokens on the individual chunks and blinds the parity information before transferring it to the auditor's site. The cloud user finally disperses all encoded chunks to the datacenters. Later, the user can challenge the cloud server, and data correctness is verified by the auditor. The involvement of the auditor reduces the computation overhead of the user. This approach also supports future modifications to the file in an efficient and well-organized manner [11].
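A rough sketch of this owner-side preprocessing is given below. It is only an assumption-laden outline of the protocol in [11]: the names lrc_encode, blind, and token_key are hypothetical placeholders, and HMAC-SHA256 merely stands in for whatever token construction the cited scheme actually uses.

```python
import hashlib
import hmac

def make_token(token_key, index, chunk):
    """Per-chunk verification token the auditor can later recompute on a
    challenged chunk to check its integrity (HMAC used here as a stand-in)."""
    return hmac.new(token_key, index.to_bytes(4, "big") + chunk, hashlib.sha256).digest()

def prepare_upload(file_chunks, lrc_encode, blind, token_key):
    """Owner-side steps: encode, tokenize, blind parities, and return the
    material to disperse to the datacenters and to hand to the auditor."""
    data_chunks, local_parities, global_parities = lrc_encode(file_chunks)
    all_chunks = data_chunks + local_parities + global_parities
    tokens = [make_token(token_key, i, c) for i, c in enumerate(all_chunks)]
    # Parity information is blinded before it leaves the owner, so the
    # auditor can verify correctness without learning the file contents.
    blinded_parities = [blind(p) for p in local_parities + global_parities]
    return all_chunks, tokens, blinded_parities
```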

CONCLUSIONS

Erasure coding is important for lowering the cost of cloud storage, and the most important factor in these codes is fast reconstruction of offline data chunks. A (k, l, r) LRC divides k data chunks into l local groups and encodes l local parities, one for each local group, plus r global parities. A single data chunk failure can be decoded from the k/l chunks within its local group. Additionally, LRC achieves the Maximally Recoverable property: it tolerates up to r + 1 arbitrary fragment failures, and it also tolerates more than r + 1 failures (up to l + r), as long as those failures are information-theoretically decodable. Finally, LRC provides low storage overhead, achieving these properties with the minimum number of parities. Local Reconstruction Codes thus decrease the number of chunks that need to be read to perform reconstruction, with similar latency for small I/Os and improved latency for large I/Os.


