Toward Secure and Dependable Storage Services in Cloud Computing



Cong Wang, Student Member, IEEE, Qian Wang, Student Member, IEEE, Kui Ren, Senior Member, IEEE, Ning Cao, and Wenjing Lou, Senior Member, IEEE

- C. Wang is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, 1451 East 55th St., Apt. 1017 N, Chicago, IL 60616. E-mail: [email protected].
- Q. Wang is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, 500 East 33rd St., Apt. 602, Chicago, IL 60616. E-mail: [email protected].
- K. Ren is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, 3301 Dearborn St., Siegel Hall 319, Chicago, IL 60616. E-mail: [email protected].
- N. Cao is with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609. E-mail: [email protected].
- W. Lou is with the Department of Computer Science, Virginia Polytechnic Institute and State University, Falls Church, VA 22043. E-mail: [email protected].

Manuscript received 4 Apr. 2010; revised 14 Sept. 2010; accepted 25 Dec. 2010; published online 6 May 2011. For information on obtaining reprints of this article, please send e-mail to: [email protected] and reference IEEECS Log Number TSCSI-2010-04-0033. Digital Object Identifier no. 10.1109/TSC.2011.24.

Abstract—Cloud storage enables users to remotely store their data and enjoy on-demand high-quality cloud applications without the burden of local hardware and software management. Though the benefits are clear, such a service also relinquishes users' physical possession of their outsourced data, which inevitably poses new security risks to the correctness of the data in the cloud. In order to address this new problem and further achieve a secure and dependable cloud storage service, we propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. The auditing result not only ensures strong cloud storage correctness guarantees, but also simultaneously achieves fast data error localization, i.e., the identification of misbehaving server(s). Considering that cloud data are dynamic in nature, the proposed design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Analysis shows the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.

Index Terms—Data integrity, dependable distributed storage, error localization, data dynamics, cloud computing.


1 INTRODUCTION

Several trends are opening up the era of cloud computing, an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the Software as a Service (SaaS) computing architecture, are transforming data centers into pools of computing service on a huge scale. Increasing network bandwidth and reliable yet flexible network connections even make it possible for users to subscribe to high-quality services from data and software that reside solely on remote data centers.

Moving data into the cloud offers great convenience to users, since they do not have to worry about the complexities of direct hardware management. Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) [2], both from a pioneering cloud computing vendor, are well-known examples. While these Internet-based online services do provide huge amounts of storage space and customizable computing resources, this computing platform shift is, at the same time, eliminating the responsibility of local machines for data maintenance. As a result, users are at the mercy of their cloud service providers (CSP) for the availability and integrity of their data [3], [4]. On the one hand, although cloud infrastructures are much more powerful and reliable than personal computing devices, a broad range of both internal and external threats to data integrity still exists. Examples of outages and data loss incidents at noteworthy cloud storage services appear from time to time [5], [6], [7], [8], [9]. On the other hand, since users may not retain a local copy of the outsourced data, there exist various incentives for the CSP to behave unfaithfully toward the cloud users regarding the status of their outsourced data. For example, to increase the profit margin by reducing cost, it is possible for the CSP to discard rarely accessed data without being detected in a timely fashion [10]. Similarly, the CSP may even attempt to hide data loss incidents so as to maintain a reputation [11], [12], [13]. Therefore, although outsourcing data into the cloud is economically attractive for the cost and complexity of long-term large-scale data storage, its lack of strong assurances of data integrity and availability may impede its wide adoption by both enterprise and individual cloud users.

In order to achieve assurances of cloud data integrity and availability and to enforce the quality of cloud storage service, efficient methods that enable on-demand data correctness verification on behalf of cloud users have to be designed. However, the fact that users no longer have physical possession of their data in the cloud prohibits the direct adoption of traditional cryptographic primitives for the purpose of data integrity protection. Hence, the verification of cloud storage correctness must be conducted without explicit knowledge of the whole data files [10], [11], [12], [13]. Meanwhile, cloud storage is not just a third-party data warehouse. The data stored in the cloud may not only be accessed but also frequently updated by the users [14], [15], [16], through insertion, deletion, modification, appending, etc. Thus, it is also imperative to support the integration of this dynamic feature into the cloud storage correctness assurance, which makes the system design even more challenging. Last but not least, the deployment of cloud computing is powered by data centers running in a simultaneous, cooperative, and distributed manner [3]. It is more advantageous for individual users to store their data redundantly across multiple physical servers so as to reduce the threats to data integrity and availability. Thus, distributed protocols for storage correctness assurance are of the utmost importance in achieving a robust and secure cloud storage system. However, this important area remains to be fully explored in the literature.

Recently, the importance of ensuring remote data integrity has been highlighted by research works under different system and security models [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. These techniques, while useful for ensuring storage correctness without users possessing the data locally, all focus on the single-server scenario. They may be useful for quality-of-service testing [23], but do not guarantee data availability in case of server failures. Although directly applying these techniques to distributed storage (multiple servers) could be straightforward, the resulting storage verification overhead would be linear in the number of servers. As a complementary approach, researchers have also proposed distributed protocols [23], [24], [25] for ensuring storage correctness across multiple servers or peers. However, while providing efficient cross-server storage verification and data availability assurance, these schemes all focus on static or archival data. As a result, their capabilities of handling dynamic data remain unclear, which inevitably limits their full applicability in cloud storage scenarios.

In this paper, we propose an effective and flexible distributed storage verification scheme with explicit dynamic data support to ensure the correctness and availability of users' data in the cloud. We rely on erasure-correcting code in the file distribution preparation to provide redundancies and guarantee data dependability against Byzantine servers [26], where a storage server may fail in arbitrary ways. This construction drastically reduces the communication and storage overhead as compared to traditional replication-based file distribution techniques. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves storage correctness assurance as well as data error localization: whenever data corruption has been detected during the storage correctness verification, our scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s). In order to strike a good balance between error resilience and data dynamics, we further explore the algebraic property of our token computation and erasure-coded data, and demonstrate how to efficiently support dynamic operations on data blocks while maintaining the same level of storage correctness assurance. To save users' time, computation resources, and even the related online burden, we also provide an extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors (TPA) and be worry-free in using the cloud storage services.

Our work is among the first in this field to consider distributed data storage security in cloud computing. Our contribution can be summarized in the following three aspects: 1) Compared to many of its predecessors, which only provide binary results about the storage status across the distributed servers, the proposed scheme achieves the integration of storage correctness assurance and data error localization, i.e., the identification of misbehaving server(s). 2) Unlike most prior works for ensuring remote data integrity, the new scheme further supports secure and efficient dynamic operations on data blocks, including update, delete, and append. 3) The experimental results demonstrate that the proposed scheme is highly efficient. Extensive security analysis shows that our scheme is resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.

The rest of the paper is organized as follows: Section 2 introduces the system model, adversary model, design goals, and notation. We then provide the detailed description of our scheme in Sections 3 and 4. Section 5 gives the security analysis and performance evaluation, followed by Section 6, which overviews the related work. Finally, Section 7 concludes the paper.

2 PROBLEM STATEMENT

2.1 System Model

A representative network architecture for cloud storage service is illustrated in Fig. 1. Three different network entities can be identified as follows:

- User: an entity who has data to be stored in the cloud and relies on the cloud for data storage and computation; it can be either an enterprise or an individual customer.

- Cloud Server (CS): an entity that is managed by the cloud service provider (CSP) to provide data storage service and has significant storage space and computation resources (we will not differentiate CS and CSP hereafter).

- Third-Party Auditor (TPA): an optional entity that has expertise and capabilities users may not have and is trusted to assess and expose the risks of cloud storage services on behalf of the users upon request.

In cloud data storage, a user stores his data through a CSP into a set of cloud servers, which are running in a simultaneous, cooperative, and distributed manner. Data redundancy can be employed with the technique of erasure-correcting code to further tolerate faults or server crashes as the user's data grow in size and importance. Thereafter, for application purposes, the user interacts with the cloud servers via the CSP to access or retrieve his data. In some cases, the user may need to perform block-level operations on his data. The most general forms of these operations we are considering are block update, delete, insert, and append. Note that in this paper, we put more focus on the support of file-oriented cloud applications rather than nonfile application data, such as social networking data. In other words, the cloud data we are considering are not expected to change rapidly within a relatively short period.

Fig. 1. Cloud storage service architecture.

As users no longer possess their data locally, it is of critical importance to assure users that their data are being correctly stored and maintained. That is, users should be equipped with security means so that they can obtain continuous correctness assurance (to enforce the cloud storage service-level agreement) of their stored data even without the existence of local copies. In case users do not have the time, feasibility, or resources to monitor their data, they can delegate the monitoring tasks to an optional trusted TPA.

4 PROVIDING DYNAMIC DATA OPERATION SUPPORT

In this section, we show how our scheme can explicitly and efficiently handle dynamic data operations for cloud data storage, by utilizing the linear property of the Reed-Solomon code and the verification token construction.

4.1 Update Operation

In cloud data storage, a user may need to modify some data block(s) stored in the cloud. After the preparation of the update information, the user has to amend the unused tokens for each vector $G^{(j)}$ to maintain the same storage correctness assurance. In other words, for all the unused tokens, the user needs to exclude every occurrence of the old data block and replace it with the new one. Thanks to the homomorphic construction of our verification token, the user can perform this token update efficiently. To give more details, suppose a block $G^{(j)}[I_s]$, which is covered by the specific token $v_i^{(j)}$, has been changed to $G^{(j)}[I_s] + \Delta G^{(j)}[I_s]$, where $I_s = \phi_{k_{prp}^{(i)}}(s)$. To maintain the usability of token $v_i^{(j)}$, it is not hard to verify that the user can simply update it by $v_i^{(j)} \leftarrow v_i^{(j)} + \alpha_i^s \cdot \Delta G^{(j)}[I_s]$, without retrieving the other $r-1$ blocks required in the precomputation of $v_i^{(j)}$.
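To make the token update rule concrete, here is a minimal sketch (our illustration, not the authors' implementation) that checks $v_i^{(j)} \leftarrow v_i^{(j)} + \alpha_i^s \cdot \Delta G^{(j)}[I_s]$ over $GF(2^{16})$. The field polynomial POLY and the helper names (gf_mul, gf_pow, compute_token) are assumptions of this sketch, not identifiers from the paper.

```python
import secrets

POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1; an assumed GF(2^16) modulus

def gf_mul(a: int, b: int) -> int:
    """Carry-less multiplication modulo POLY; addition in GF(2^p) is XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= POLY
    return r

def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^16)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def compute_token(alpha: int, blocks: list) -> int:
    """v = sum over q of alpha^q * G[I_q], q = 1..r."""
    v = 0
    for q, g in enumerate(blocks, start=1):
        v ^= gf_mul(gf_pow(alpha, q), g)
    return v

r = 8
alpha = secrets.randbelow(1 << 16) or 1
blocks = [secrets.randbelow(1 << 16) for _ in range(r)]
v_old = compute_token(alpha, blocks)

s = 3                               # sampled position of the updated block
delta = secrets.randbelow(1 << 16)  # Delta G[I_s]
blocks[s - 1] ^= delta              # new block = old block + Delta (XOR)

# Incremental update: the other r-1 blocks are never touched.
v_new = v_old ^ gf_mul(gf_pow(alpha, s), delta)
assert v_new == compute_token(alpha, blocks)
```

The incremental update works because field multiplication distributes over addition, so adding $\alpha_i^s \cdot \Delta G^{(j)}[I_s]$ to the old token is exactly the difference between the old and new token values.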

After the amendment to the affected tokens,2 the user needs to blind the update information $\Delta g_i^{(j)}$ for each parity block in $(\Delta G^{(m+1)}, \ldots, \Delta G^{(n)})$ to hide the secret matrix $P$, by $\Delta g_i^{(j)} \leftarrow \Delta g_i^{(j)} + f_{k_j}(s_{ij}^{ver})$, $i \in \{1, \ldots, l\}$. Here, we use a new seed $s_{ij}^{ver}$ for the PRF. The version number $ver$ functions like a counter which helps the user keep track of the blinding information on the specific parity blocks. After blinding, the user sends the update information to the cloud servers, which perform the update operation as $G^{(j)} \leftarrow G^{(j)} + \Delta G^{(j)}$, $j \in \{1, \ldots, n\}$.

2. In practice, it is possible that only a fraction of the tokens need amendment, since the updated blocks may not be covered by all the tokens.

Discussion. Note that by using a new seed $s_{ij}^{ver}$ for the PRF every time (i.e., for each block update operation), we ensure the freshness of the random value embedded into the parity blocks. In other words, the cloud servers cannot simply strip away the random blinding information on the parity blocks by subtracting the old and the newly updated parity blocks. As a result, the secret matrix $P$ remains well protected, and the guarantee of storage correctness is preserved.
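The following sketch illustrates the versioned blinding step, assuming an HMAC-based PRF (the paper itself points to HMAC [35] as a practical PRF construction); the seed encoding and the function names are our own illustrative choices.

```python
import hmac, hashlib

def prf(key: bytes, seed: bytes) -> int:
    """f_kj(seed), truncated to a single GF(2^16) element."""
    return int.from_bytes(hmac.new(key, seed, hashlib.sha256).digest()[:2], "big")

def blind(parity: int, key: bytes, i: int, j: int, ver: int) -> int:
    """Blind parity block i on server j using the versioned seed s_ij^ver."""
    seed = f"{i}|{j}|{ver}".encode()  # illustrative seed layout
    return parity ^ prf(key, seed)    # '+' in GF(2^16) is XOR

key = b"per-server blinding key k_j"
raw_old = 0x1234                      # parity block before an update
raw_new = raw_old ^ 0x0F0F            # parity block after the update
p_old = blind(raw_old, key, i=5, j=11, ver=1)  # blinded under version 1
p_new = blind(raw_new, key, i=5, j=11, ver=2)  # re-blinded under version 2

# Colluding servers that XOR the old and new blinded parity do NOT recover
# the raw parity difference 0x0F0F, since the two PRF outputs differ.
print(f"raw delta: 0x0f0f, observed: {p_old ^ p_new:#06x}")
```

Had the same seed been reused for both versions, the XOR of the two blinded blocks would have exposed the raw parity difference, which is exactly the leakage the version counter prevents.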

4.2 Delete Operation

Sometimes, after being stored in the cloud, certain data blocks may need to be deleted. The delete operation we are considering is a general one, in which the user replaces the data block with zero or some special reserved data symbol. From this point of view, the delete operation is actually a special case of the data update operation, where the original data blocks are replaced with zeros or some predetermined special blocks. Therefore, we can rely on the update procedure to support the delete operation, i.e., by setting $\Delta f_{ij}$ in $\Delta F$ to be $-f_{ij}$. Also, all the affected tokens have to be modified, and the updated parity information has to be blinded using the same method specified in the update operation.

Fig. 2. Logical representation of data dynamics, including block update, append, and delete.
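As a quick numeric aside to the delete-as-update rule above (our observation; it is standard for characteristic-2 fields): in $GF(2^p)$ addition is XOR, so every element is its own additive inverse, and setting $\Delta f_{ij} = -f_{ij} = f_{ij}$ zeroes the block through the ordinary update path.

```python
# In GF(2^p), -f == f, because addition is XOR.
f = 0xBEEF             # a data block, viewed as a GF(2^16) element
delta = f              # Delta f_ij = -f_ij = f_ij
assert f ^ delta == 0  # f + Delta f = 0: the block is logically deleted
```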

4.3 Append Operation

In some cases, the user may want to increase the size of his stored data by adding blocks at the end of the data file, which we refer to as data append. We anticipate that the most frequent append operation in cloud data storage is bulk append, in which the user needs to upload a large number of blocks (not a single block) at one time.

Given the file matrix $F$ illustrated in the file distribution preparation, appending blocks toward the end of a data file is equivalent to concatenating corresponding rows at the bottom of the matrix layout for file $F$ (see Fig. 2). In the beginning, there are only $l$ rows in the file matrix. To simplify the presentation, we suppose the user wants to append $m$ blocks at the end of file $F$, denoted as $(f_{l+1,1}, f_{l+1,2}, \ldots, f_{l+1,m})$ (we can always use zero-padding to make a row of $m$ elements). With the secret matrix $P$, the user can directly calculate the append blocks for each parity server as $(f_{l+1,1}, \ldots, f_{l+1,m}) \cdot P = (g_{l+1}^{(m+1)}, \ldots, g_{l+1}^{(n)})$.

To ensure that the newly appended blocks are covered by our challenge tokens, we need a slight modification to our token precomputation. Specifically, we require the user to anticipate the maximum size in blocks, denoted as $l_{max}$, for each of his data vectors. This idea of supporting block append was first suggested by Ateniese et al. [14] in a single-server setting, and it relies on both the initial budget for the maximum anticipated data size $l_{max}$ in each encoded data vector and the system parameter $r_{max} = \lceil r \cdot (l_{max}/l) \rceil$ for each precomputed challenge-response token. The precomputation of the $i$th token on server $j$ is modified as follows: $v_i = \sum_{q=1}^{r_{max}} \alpha_i^q \cdot G^{(j)}[I_q]$, where

$$G^{(j)}[I_q] = \begin{cases} G^{(j)}\big[\phi_{k_{prp}^{(i)}}(q)\big], & \phi_{k_{prp}^{(i)}}(q) \le l, \\ 0, & \phi_{k_{prp}^{(i)}}(q) > l, \end{cases}$$

and the PRP $\phi(\cdot)$ is modified as $\phi(\cdot): \{0,1\}^{\log_2(l_{max})} \times key \rightarrow \{0,1\}^{\log_2(l_{max})}$. This guarantees that, on average, there will be $r$ indices falling into the range of the existing $l$ blocks. Because the cloud servers and the user have an agreement on the number of existing blocks in each vector $G^{(j)}$, the servers will follow exactly the above procedure when recomputing the token values upon receiving the user's challenge request.
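The sketch below (ours, not the paper's code) shows the $l_{max}$-budgeted token precomputation: sampled positions beyond the current length $l$ contribute zero until blocks are actually appended. The keyed PRF stands in for the paper's PRP, and the GF(2^16) helpers repeat the assumed field setup from the earlier sketch.

```python
import hmac, hashlib, secrets

POLY = 0x1100B  # assumed GF(2^16) modulus
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        b >>= 1; a <<= 1
        if a & 0x10000: a ^= POLY
    return r
def gf_pow(a, e):
    r = 1
    while e:
        if e & 1: r = gf_mul(r, a)
        a = gf_mul(a, a); e >>= 1
    return r

def sample_index(key: bytes, q: int, l_max: int) -> int:
    """Stand-in for the PRP phi_{k_prp^(i)}(q): pseudorandom index in [0, l_max)."""
    d = hmac.new(key, q.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(d[:4], "big") % l_max

def precompute_token(alpha, blocks, l, l_max, r, key):
    """v_i = sum_{q=1}^{r_max} alpha^q * G[I_q], with G[I_q] = 0 beyond row l."""
    r_max = -(-r * l_max // l)             # ceil(r * l_max / l)
    v = 0
    for q in range(1, r_max + 1):
        idx = sample_index(key, q, l_max)
        g = blocks[idx] if idx < l else 0  # budgeted-but-empty rows are zero
        v ^= gf_mul(gf_pow(alpha, q), g)
    return v

l, l_max = 1000, 4000
blocks = [secrets.randbelow(1 << 16) for _ in range(l)]
v = precompute_token(secrets.randbelow(1 << 16) or 1, blocks,
                     l, l_max, r=460, key=b"k_prp for token i")
```

After an append, the user folds $\alpha_i^s \cdot G^{(j)}[I_s]$ into the stored token for every sampled position $I_s$ that now holds a real block, exactly as in the update rule of Section 4.1.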

Now, when the user is ready to append new blocks, i.e., both the file blocks and the corresponding parity blocks are generated, the total length of each vector $G^{(j)}$ will increase and fall into the range $[l, l_{max}]$. Therefore, the user updates the affected tokens by adding $\alpha_i^s \cdot G^{(j)}[I_s]$ to the old $v_i$ whenever $G^{(j)}[I_s] \neq 0$ for $I_s > l$, where $I_s = \phi_{k_{prp}^{(i)}}(s)$. The parity blinding is similar to that introduced in the update operation and is thus omitted here.

4.4 Insert Operation

An insert operation on the data file refers to an append operation at a desired index position while maintaining the same data block structure for the whole data file, i.e., inserting a block $F[j]$ corresponds to shifting all blocks starting with index $j+1$ by one slot. Thus, an insert operation may affect many rows in the logical data file matrix $F$, and a substantial number of computations are required to renumber all the subsequent blocks as well as recompute the challenge-response tokens. Hence, a direct insert operation is difficult to support.

In order to fully support the block insertion operation, recent work [15], [16] suggests utilizing additional data structures (for example, a Merkle hash tree [32]) to maintain and enforce the block index information. Following this line of research, we can circumvent the dilemma of block insertion by viewing each insertion as a logical append operation at the end of file $F$. Specifically, if we also use an additional data structure to maintain such logical-to-physical block index mapping information, then all block insertions can be treated as logical append operations, which can be efficiently supported. On the other hand, using the block index mapping information, the user can still access or retrieve the file as it is. Note that, as a tradeoff, the extra data structure information has to be maintained locally on the user side.
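A minimal sketch (our illustration; the paper delegates this role to structures such as a Merkle hash tree [32]) of the logical-to-physical index mapping that turns a block insert into a logical append: physical rows are only ever appended, while the mapping preserves the user's logical view of the file.

```python
class LogicalFile:
    """Append-only physical storage behind a logical block index."""

    def __init__(self, blocks):
        self.physical = list(blocks)             # physical rows, append-only
        self.mapping = list(range(len(blocks)))  # logical -> physical index

    def append(self, block):
        self.physical.append(block)
        self.mapping.append(len(self.physical) - 1)

    def insert(self, logical_pos, block):
        """Insert = physical append + splice into the logical mapping."""
        self.physical.append(block)
        self.mapping.insert(logical_pos, len(self.physical) - 1)

    def read(self, logical_pos):
        return self.physical[self.mapping[logical_pos]]

f = LogicalFile(["b0", "b1", "b2"])
f.insert(1, "new")                    # logically lands between b0 and b1
print([f.read(i) for i in range(4)])  # ['b0', 'new', 'b1', 'b2']
```

As the text notes, the mapping must be kept at the user side; the servers continue to see nothing but appends, so the token machinery of Section 4.3 applies unchanged.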

5 SECURITY ANALYSIS AND PERFORMANCE EVALUATION

In this section, we analyze our proposed scheme in terms of correctness, security, and efficiency. Our security analysis focuses on the adversary model defined in Section 2. We also evaluate the efficiency of our scheme via implementation of both the file distribution preparation and the verification token precomputation.

5.1 Correctness Analysis

First, we analyze the correctness of the verification procedure. Upon obtaining all the responses $R_i^{(j)}$ from the servers and taking away the random blind values from $R_i^{(j)}$ $(j \in \{m+1, \ldots, n\})$, the user relies on the equation $(R_i^{(1)}, \ldots, R_i^{(m)}) \cdot P \stackrel{?}{=} (R_i^{(m+1)}, \ldots, R_i^{(n)})$ to check the storage correctness. To see why this is true, we can rewrite the equation according to the token computation:

$$\left( \sum_{q=1}^{r} \alpha_i^q g_{I_q}^{(1)}, \; \ldots, \; \sum_{q=1}^{r} \alpha_i^q g_{I_q}^{(m)} \right) \cdot P = \left( \sum_{q=1}^{r} \alpha_i^q g_{I_q}^{(m+1)}, \; \ldots, \; \sum_{q=1}^{r} \alpha_i^q g_{I_q}^{(n)} \right),$$

and, hence, the left-hand side (LHS) of the equation expands as

$$\mathrm{LHS} = \left( \alpha_i, \alpha_i^2, \ldots, \alpha_i^r \right) \begin{pmatrix} g_{I_1}^{(1)} & g_{I_1}^{(2)} & \cdots & g_{I_1}^{(m)} \\ g_{I_2}^{(1)} & g_{I_2}^{(2)} & \cdots & g_{I_2}^{(m)} \\ \vdots & \vdots & & \vdots \\ g_{I_r}^{(1)} & g_{I_r}^{(2)} & \cdots & g_{I_r}^{(m)} \end{pmatrix} P = \left( \alpha_i, \alpha_i^2, \ldots, \alpha_i^r \right) \begin{pmatrix} g_{I_1}^{(m+1)} & g_{I_1}^{(m+2)} & \cdots & g_{I_1}^{(n)} \\ g_{I_2}^{(m+1)} & g_{I_2}^{(m+2)} & \cdots & g_{I_2}^{(n)} \\ \vdots & \vdots & & \vdots \\ g_{I_r}^{(m+1)} & g_{I_r}^{(m+2)} & \cdots & g_{I_r}^{(n)} \end{pmatrix},$$


which equals the right-hand side as required. Thus, it is clear that as long as each server operates on the same specified subset of rows, the above checking equation will always hold.
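A numeric sanity check of this argument (our sketch, with an arbitrary random matrix $P$ standing in for the secret parity generation matrix and the assumed GF(2^16) setup from the earlier sketches):

```python
import secrets

POLY = 0x1100B
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        b >>= 1; a <<= 1
        if a & 0x10000: a ^= POLY
    return r
def gf_pow(a, e):
    r = 1
    while e:
        if e & 1: r = gf_mul(r, a)
        a = gf_mul(a, a); e >>= 1
    return r

m, k, l, r = 4, 2, 32, 8
rnd = lambda: secrets.randbelow(1 << 16)
data = [[rnd() for _ in range(l)] for _ in range(m)]  # m data vectors
P = [[rnd() for _ in range(k)] for _ in range(m)]     # secret m x k matrix

# Parity vector j, row s: sum over i of data[i][s] * P[i][j].
parity = [[0] * l for _ in range(k)]
for j in range(k):
    for s in range(l):
        acc = 0
        for i in range(m):
            acc ^= gf_mul(data[i][s], P[i][j])
        parity[j][s] = acc

alpha = rnd() or 1
idx = [secrets.randbelow(l) for _ in range(r)]        # sampled rows I_q

def response(vec):
    """R = sum_{q=1}^{r} alpha^q * vec[I_q]."""
    v = 0
    for q, s in enumerate(idx, start=1):
        v ^= gf_mul(gf_pow(alpha, q), vec[s])
    return v

R_data = [response(v) for v in data]
R_parity = [response(v) for v in parity]

# (R^(1), ..., R^(m)) * P == (R^(m+1), ..., R^(n)) holds by linearity.
for j in range(k):
    acc = 0
    for i in range(m):
        acc ^= gf_mul(R_data[i], P[i][j])
    assert acc == R_parity[j]
print("checking equation holds")
```

The assertion succeeds for any choice of rows and any $\alpha$, since the responses are the same linear combination of rows on every server.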

5.2 Security Strength

5.2.1 Detection Probability against Data Modification

In our scheme, servers are required to operate only on specified rows in each challenge-response protocol execution. We will show that this "sampling" strategy on selected rows, instead of all rows, can greatly reduce the computational overhead on the server, while maintaining high detection probability for data corruption.

Suppose $n_c$ servers are misbehaving due to possible compromise or Byzantine failure. In the following analysis, we do not limit the value of $n_c$, i.e., $n_c \le n$; thus, all the analysis results hold even if all the servers are compromised. We defer the explanation of the collusion resistance of our scheme against this worst case scenario to a later section. Assume the adversary modifies the data blocks in $z$ rows out of the $l$ rows in the encoded file matrix. Let $r$ be the number of different rows for which the user asks for checking in a challenge, and let $X$ be a discrete random variable defined as the number of rows chosen by the user that match the rows modified by the adversary. We first analyze the matching probability that at least one of the rows picked by the user matches one of the rows modified by the adversary: $P_m^r = 1 - P\{X = 0\} = 1 - \prod_{i=0}^{r-1} \left(1 - \min\left\{\frac{z}{l-i}, 1\right\}\right) \ge 1 - \left(\frac{l-z}{l}\right)^r$. If none of the specified $r$ rows in the $i$th verification process are deleted or modified, the adversary avoids detection.

Next, we study the probability of a false negative result: at least one of the responses calculated from the specified $r$ rows is invalid, yet the checking equation still holds. Consider the responses $R_i^{(1)}, \ldots, R_i^{(n)}$ returned from the data storage servers for the $i$th challenge; each response value $R_i^{(j)}$, calculated within $GF(2^p)$, is based on $r$ blocks on server $j$. The number of responses $R^{(m+1)}, \ldots, R^{(n)}$ from parity servers is $k = n - m$. Thus, according to Proposition 2 of our previous work in [33], the false negative probability is $P_f^r = Pr_1 + Pr_2$, where $Pr_1 = \frac{(1+2^{-p})^{n_c-1}}{2^{n_c-1}}$ and $Pr_2 = (1 - Pr_1) \cdot (2^{-p})^k$.

Based on the above discussion, it follows that the probability of data modification detection across all storage servers is $P_d = P_m^r \cdot (1 - P_f^r)$. Fig. 3 plots $P_d$ for different values of $l, r, z$ while we set $p = 16$, $n_c = 10$, and $k = 5$.3 From the figure we can see that even if only a small fraction of the data file is corrupted, it suffices to challenge a small constant number of rows in order to achieve detection with high probability. For example, if $z = 1\%$ of $l$, every token only needs to cover 460 indices in order to achieve a detection probability of at least 99 percent.
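The formula can be checked numerically; the sketch below (ours) evaluates $P_d = P_m^r \cdot (1 - P_f^r)$ with the example parameters, using the $Pr_1$ form as reconstructed in the text above (that form is an assumption of this sketch).

```python
from math import prod

def p_match(l: int, z: int, r: int) -> float:
    """P_m^r = 1 - prod_{i=0}^{r-1} (1 - min(z/(l-i), 1))."""
    return 1.0 - prod(1.0 - min(z / (l - i), 1.0) for i in range(r))

def p_false_negative(p: int, n_c: int, k: int) -> float:
    """P_f^r = Pr1 + Pr2, with Pr1 and Pr2 as given in the text."""
    pr1 = (1 + 2**-p) ** (n_c - 1) / 2 ** (n_c - 1)
    pr2 = (1 - pr1) * (2**-p) ** k
    return pr1 + pr2

l = 100_000
z = l // 100     # z = 1% of l
r = 460          # indices covered by each token
pd = p_match(l, z, r) * (1 - p_false_negative(p=16, n_c=10, k=5))
print(f"P_d = {pd:.4f}")  # roughly 0.99 for these parameters
```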

5.2.2 Identification Probability for Misbehaving Servers

We have shown that, if the adversary modifies data blocks on any of the data storage servers, our sampling checking scheme can detect the attack with high probability. Once the data modification is caught, the user can further determine which server is malfunctioning. This is achieved by comparing the response values $R_i^{(j)}$ with the prestored tokens $v_i^{(j)}$, where $j \in \{1, \ldots, n\}$. The probability of error localization, i.e., of identifying the misbehaving server(s), can be computed in a similar way. It is the product of the matching probability of the sampling check and the probability of the complementary event of the false negative result. Obviously, the matching probability is $\widehat{P}_m^r = 1 - \prod_{i=0}^{r-1} \left(1 - \min\left\{\frac{\hat{z}}{l-i}, 1\right\}\right)$, where $\hat{z} \le z$.

Next, we consider the false negative probability that $R_i^{(j)} = v_i^{(j)}$ when at least one of the $\hat{z}$ blocks is modified. According to [33, Proposition 1], tokens calculated in $GF(2^p)$ for two different data vectors collide with probability $\widehat{P}_f^r = 2^{-p}$. Thus, the identification probability for misbehaving server(s) is $\widehat{P}_d = \widehat{P}_m^r \cdot (1 - \widehat{P}_f^r)$. Following the analysis of the detection probability, if $z = 1\%$ of $l$ and each token covers 460 indices, the identification probability for misbehaving servers is at least 99 percent. Note that if the number of detected misbehaving servers is less than the number of parity vectors, we can use the erasure-correcting code to recover the corrupted data, achieving storage dependability as shown in Section 3.4 and Algorithm 3.

3. Note that $n_c$ and $k$ only affect the false negative probability $P_f^r$. However, in our scheme, since $p = 16$ almost dominates the negligibility of $P_f^r$, the values of $n_c$ and $k$ have little effect on the plot of $P_d$.

Fig. 3. The detection probability $P_d$ against data modification. We show $P_d$ as a function of $l$ (the number of blocks on each cloud storage server) and $r$ (the number of rows queried by the user, shown as a percentage of $l$) for two values of $z$ (the number of rows modified by the adversary). Both graphs are plotted under $p = 16$, $n_c = 10$, and $k = 5$, but with different scales. (a) $z = 1\%$ of $l$. (b) $z = 10\%$ of $l$.

5.2.3 Security Strength against Worst Case Scenario

We now explain why it is necessary to blind the parity blocks and how our proposed scheme achieves collusion resistance against the worst case scenario in the adversary model.

Recall that in the file distribution preparation, the redundancy parity vectors are calculated via multiplying the file matrix $F$ by $P$, where $P$ is the secret parity generation matrix we later rely on for storage correctness assurance. If we disperse all the generated vectors directly after token precomputation, i.e., without blinding, malicious servers that collaborate can reconstruct the secret matrix $P$ easily: they can pick blocks from the same rows among the data and parity vectors to establish a set of $m \cdot k$ linear equations and solve for the $m \cdot k$ entries of the parity generation matrix $P$. Once they have the knowledge of $P$, those malicious servers can consequently modify any part of the data blocks and calculate the corresponding parity blocks, and vice versa, keeping their codeword relationship always consistent. Therefore, our storage correctness challenge scheme would be undermined: even if the modified blocks are covered by the specified rows, the storage correctness check equation would still hold.

To prevent colluding servers from recovering $P$ and making up consistently related data and parity blocks, we utilize the technique of adding random perturbations to the encoded file matrix and hence hide the secret matrix $P$. We make use of a keyed pseudorandom function $f_{k_j}(\cdot)$ with key $k_j$ and seed $s_{ij}^{ver}$, both of which have been introduced previously. In order to maintain the systematic layout of the data file, we only blind the parity blocks with random perturbations (we can also blind only the data blocks and achieve privacy-preserving third-party auditing, as shown in Section 3.5). Our purpose is to add "noise" to the set of linear equations and make it computationally infeasible to solve for the correct secret matrix $P$. By blinding each parity block with a random perturbation, the malicious servers no longer have all the necessary information to build up the correct linear equation groups and therefore cannot derive the secret matrix $P$.

5.3 Performance Evaluation

We now assess the performance of the proposed storage auditing scheme. We focus on the cost of file distribution preparation as well as token generation. Our experiment is conducted on a system with an Intel Core 2 processor running at 1.86 GHz, 2,048 MB of RAM, and a 7,200 RPM Western Digital 250 GB Serial ATA drive. Algorithms are implemented using the open-source erasure coding library Jerasure [34], written in C. All results represent the mean of 20 trials.

5.3.1 File Distribution Preparation

As discussed, the file distribution preparation includes the generation of parity vectors (the encoding part) as well as the corresponding parity blinding part. We consider two sets of parameters for the $(m, k)$ Reed-Solomon encoding, both of which work over $GF(2^{16})$. Fig. 4 shows the total cost of preparing a 1 GB file before outsourcing. In the figure on the left, we keep the number of data vectors $m$ constant at 10, while decreasing the number of parity vectors $k$ from 10 to 2. In the one on the right, we keep the total number of data and parity vectors $m + k$ fixed at 22, and change the number of data vectors $m$ from 18 to 10. From the figure, we can see that the number $k$ is the dominant factor in the cost of both parity generation and parity blinding. This can be explained as follows: on the one hand, $k$ determines how many parity vectors are required before data outsourcing, and the parity generation cost increases almost linearly with the growth of $k$; on the other hand, a larger $k$ means a larger number of parity blocks to be blinded, which directly leads to more calls to our nonoptimized PRF generation in C. By using more practical PRF constructions, such as HMAC [35], the parity blinding cost is expected to improve further.

Fig. 4. Performance comparison between two different parameter settings for 1 GB file distribution preparation. The $(m, k)$ denotes the chosen parameters for the underlying Reed-Solomon coding. For example, (10, 2) means we divide the file into 10 data vectors and then generate two redundant parity vectors. (a) $m$ is fixed, and $k$ is decreasing. (b) $m + k$ is fixed.

Compared to the existing work [23], Fig. 4 shows that the file distribution preparation of our scheme is more efficient. This is because in [23] an additional layer of error-correcting code has to be applied to all the data and parity vectors right after the file distribution encoding. For the same reason, the two-layer coding structure makes the solution in [23] more suitable for static data only, as any change to the contents of file $F$ must propagate through the two-layer error-correcting code, which entails both high communication and computation complexity. In our scheme, by contrast, a file update only affects the specific "rows" of the encoded file matrix, striking a good balance between error resilience and data dynamics.

5.3.2 Challenge Token Computation

Although in our scheme the number of verification tokens $t$ is fixed a priori, determined before file distribution, we can overcome this limitation by choosing a sufficiently large $t$ in practice. For example, when $t$ is selected to be 7,300 and 14,600, the data file can be verified every day for the next 20 years and 40 years, respectively, which should be of enough use in practice. Note that instead of directly computing each token, our implementation uses the Horner algorithm suggested in [24] to calculate the token $v_i^{(j)}$ from the back, which achieves slightly faster performance. Specifically,

$$v_i^{(j)} = \sum_{q=1}^{r} \alpha_i^{r+1-q} \cdot G^{(j)}[I_q] = \Big( \cdots \big( G^{(j)}[I_1] \cdot \alpha_i + G^{(j)}[I_2] \big) \cdot \alpha_i + \cdots + G^{(j)}[I_r] \Big) \cdot \alpha_i,$$

which only requires $r$ multiplications and $(r-1)$ XOR operations. With the Jerasure library [34], the multiplication over $GF(2^{16})$ in our experiment is based on discrete logarithms.
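The sketch below (ours, with the assumed GF(2^16) setup from the earlier sketches) compares the direct evaluation with the Horner form and confirms they produce the same token.

```python
import secrets

POLY = 0x1100B
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        b >>= 1; a <<= 1
        if a & 0x10000: a ^= POLY
    return r
def gf_pow(a, e):
    r = 1
    while e:
        if e & 1: r = gf_mul(r, a)
        a = gf_mul(a, a); e >>= 1
    return r

def token_direct(alpha, blocks):
    """v = sum_{q=1}^{r} alpha^{r+1-q} * G[I_q]."""
    n = len(blocks)
    v = 0
    for q, g in enumerate(blocks, start=1):
        v ^= gf_mul(gf_pow(alpha, n + 1 - q), g)
    return v

def token_horner(alpha, blocks):
    """Same value with r multiplications and r-1 XORs."""
    v = 0
    for g in blocks:
        v = gf_mul(v ^ g, alpha)
    return v

alpha = secrets.randbelow(1 << 16) or 1
blocks = [secrets.randbelow(1 << 16) for _ in range(460)]
assert token_direct(alpha, blocks) == token_horner(alpha, blocks)
```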

Following the security analysis, we select the practical parameter $r = 460$ for our token precomputation (see Section 5.2.1), i.e., each token covers 460 different indices. Other parameters follow the file distribution preparation. Our implementation shows that the average token precomputation cost is about 0.4 ms. This is significantly faster than the hash-function-based token precomputation scheme proposed in [14]. To verify encoded data distributed over a typical number of 14 servers, the total cost of token precomputation is no more than 1 and 1.5 minutes for the next 20 and 40 years, respectively. Since each token is only an element of the field $GF(2^{16})$, the extra storage for the precomputed tokens is less than 1 MB and thus can be neglected. Table 1 summarizes the storage and computation cost of token precomputation for a 1 GB data file under different system settings.

6 RELATED WORK

Juels and Kaliski Jr. [10] described a formal "proof of retrievability" (POR) model for ensuring remote data integrity. Their scheme combines spot-checking and error-correcting code to ensure both possession and retrievability of files on archive service systems. Shacham and Waters [17] built on this model and constructed a random linear function-based homomorphic authenticator which enables an unlimited number of challenges and requires less communication overhead due to its use of a relatively small BLS signature. Bowers et al. [18] proposed an improved framework for POR protocols that generalizes both Juels's and Shacham's work. Later, in subsequent work, Bowers et al. [23] extended the POR model to distributed systems. However, all these schemes focus on static data. The effectiveness of their schemes rests primarily on the preprocessing steps that the user conducts before outsourcing the data file $F$. Any change to the contents of $F$, even of a few bits, must propagate through the error-correcting code and the corresponding random shuffling process, thus introducing significant computation and communication complexity. Recently, Dodis et al. [20] gave theoretical studies on a generalized framework for different variants of existing POR work.

Ateniese et al. [11] defined the "provable data possession" (PDP) model for ensuring possession of files on untrusted storage. Their scheme utilizes public-key-based homomorphic tags for auditing the data file. However, the precomputation of the tags imposes heavy computation overhead that can be expensive for an entire file. In their subsequent work, Ateniese et al. [14] described a PDP scheme that uses only symmetric-key-based cryptography. This method has lower overhead than their previous scheme and allows for block updates, deletions, and appends to the stored file, which is also supported in our work. However, their scheme focuses on the single-server scenario and does not provide a data availability guarantee against server failures, leaving both the distributed scenario and the data error recovery issue unexplored. The explicit support of data dynamics has been further studied in two recent works, [15] and [16]. Wang et al. [15] proposed combining a BLS-based homomorphic authenticator with a Merkle hash tree to support fully dynamic data, while Erway et al. [16] developed a skip-list-based scheme to enable provable data possession with full dynamics support. The incremental cryptography work by Bellare et al. [36] also provides a set of cryptographic building blocks, such as hash, MAC, and signature functions, that may be employed for storage integrity verification while supporting dynamic operations on data. However, this line of work falls within the traditional data integrity protection mechanism, where a local copy of the data has to be maintained for verification. It is not yet clear how this work can be adapted to the cloud storage scenario, where users no longer have their data at local sites but still need to ensure storage correctness efficiently in the cloud.

TABLE 1
The Storage and Computation Cost of Token Precomputation for a 1 GB Data File under Different System Settings
The $(m, k)$ denotes the parameters for the underlying Reed-Solomon coding, as illustrated in Fig. 4.

In other related work, Curtmola et al. [19] aimed to ensure data possession of multiple replicas across a distributed storage system. They extended the PDP scheme to cover multiple replicas without encoding each replica separately, providing a guarantee that multiple copies of the data are actually maintained. Lillibridge et al. [25] presented a P2P backup scheme in which blocks of a data file are dispersed across $m+k$ peers using an $(m, k)$-erasure code. Peers can request random blocks from their backup peers and verify the integrity using separate keyed cryptographic hashes attached to each block. Their scheme can detect data loss from free-riding peers, but does not ensure that all data are unchanged. Filho and Barreto [37] proposed verifying data integrity using an RSA-based hash to demonstrate uncheatable data possession in peer-to-peer file sharing networks. However, their proposal requires exponentiation over the entire data file, which is clearly impractical for the server whenever the file is large. Shah et al. [12], [13] proposed allowing a TPA to keep online storage honest by first encrypting the data and then sending a number of precomputed symmetric-keyed hashes over the encrypted data to the auditor. However, their scheme only works for encrypted files, and auditors must maintain long-term state. Schwarz and Miller [24] proposed ensuring static file integrity across multiple distributed servers, using erasure coding and block-level file integrity checks. We adopted some ideas of their distributed storage verification protocol. However, our scheme further supports data dynamics and explicitly studies the problem of misbehaving server identification, while theirs does not. Very recently, Wang et al. [31] surveyed existing solutions for remote data integrity checking and discussed their pros and cons under different design scenarios of secure cloud storage services.

Portions of the work presented in this paper previously appeared as an extended abstract in [1]. We have substantially revised the paper and added more technical details as compared to [1]. The primary improvements are as follows: First, we provide the protocol extension for privacy-preserving third-party auditing, and discuss the application scenarios for cloud storage service. Second, we add the correctness analysis of the proposed storage verification design. Third, we completely redo all the experiments in our performance evaluation, which achieve significantly improved results as compared to [1]. We also add a detailed discussion on the strength of our bounded usage for protocol verifications and its comparison with the state of the art.

7 CONCLUSION

In this paper, we investigate the problem of data security in cloud data storage, which is essentially a distributed storage system. To achieve assurances of cloud data integrity and availability and to enforce the quality of dependable cloud storage service for users, we propose an effective and flexible distributed scheme with explicit dynamic data support, including block update, delete, and append. We rely on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee data dependability. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves the integration of storage correctness assurance and data error localization, i.e., whenever data corruption has been detected during the storage correctness verification across the distributed servers, we can almost guarantee the simultaneous identification of the misbehaving server(s). Considering the time, computation resources, and even the related online burden of users, we also provide an extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and be worry-free in using the cloud storage services. Through detailed security analysis and extensive experimental results, we show that our scheme is highly efficient and resilient to Byzantine failure, malicious data modification attack, and even server colluding attacks.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation under grants CNS-1054317, CNS-1116939, CNS-1156318, and CNS-1117111, and by an Amazon Web Services research grant. A preliminary version [1] of this paper was presented at the 17th IEEE International Workshop on Quality of Service (IWQoS '09).


