Database security and encryption

23 Mar 2015 18 May 2017



Organisations are increasingly relying on distributed information systems to gain productivity and efficiency advantages, but at the same time they are becoming more vulnerable to security threats. Database systems are an integral component of these distributed information systems and hold all the data that enables the whole system to work. A database can be defined as a shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization. A database system is considered to comprise the database itself, the database management system (DBMS), i.e. the software that manages (defines, creates, and maintains) the database and controls access to it, and a collection of database applications: programs that interact with the database at some point in their execution, for example by issuing SQL statements [1].

Organisations have adopted database systems as the key data management technology for decision-making and day-to-day operations. Databases are designed to hold large amounts of data, and the management of data involves both defining structures for the storage of information and providing mechanisms for its manipulation. As the data is shared among several users, the system must avoid anomalous results and ensure the safety of the stored information despite system crashes and attempts at unauthorized access. The data involved can be highly sensitive or confidential, making the security of the data managed by these systems even more crucial: a security breach does not affect only a single application or user but can have disastrous consequences for the entire organisation. A number of security techniques have been suggested over time to tackle these issues; they can be classified as access control, inference control, flow control, and encryption.

1.1 A Short History

From the early days, when database applications were built on hierarchical and network systems, to the present day, with its many different database systems such as relational databases (RDBMS), object-oriented databases (OODBMS), object-relational databases (ORDBMS), and XML databases queried with XQuery, one factor that was, is, and will remain of the utmost importance is the security of the data involved. Data has always been a valuable asset for companies and must be protected. Organizations spend millions these days to achieve the best security standards for their DBMS. Most of an organization's sensitive and proprietary data resides in a DBMS, so the security of the DBMS is a primary concern. Securing a DBMS concerns both internal and external users. Internal users are organization employees such as database administrators, application developers, and end users who only use the application interface that fetches its data from one of the databases; external users can be employees who do not have access to the database, or outsiders who have nothing to do with the organization. Other factors that have made data security more crucial are the recent rapid growth of web-based information systems and applications and the advent of mobile databases.

Any intentional or accidental event that can adversely affect a database system is considered a threat to the database, and database security can be defined as a mechanism that protects the database against such intentional or accidental threats. Security breaches can be classified as unauthorized data observation, incorrect data modification, and data unavailability, which can lead to loss of confidentiality, availability, integrity, and privacy, as well as theft and fraud. Unauthorized data observation results in the disclosure of information to users who are not entitled to access it. Incorrect data modification, intentional or unintentional, leaves the database in an incorrect state. Data that is unavailable when needed can hamper the functioning of an entire organization. Security in terms of databases can thus be broadly classified into access security and internal security. Access security refers to the mechanisms implemented to restrict any sort of unauthorized access to the database; an example is authorization methods in which every user has a unique username and password to establish them as a legitimate user when connecting to the database. When a user tries to connect, the login credentials are checked against a set of username and password combinations set up under a security rule by a security administrator. Internal security can be referred to as an extra level of security that comes into the picture if someone has already breached access security, for example by getting hold of a valid username and password that grants access to the database. Security mechanisms implemented within the database itself, such as encrypting the data inside the database, can be classed as internal security, preventing the data from being compromised even if someone has gained unauthorized access to the database.

Every organization needs to identify the threats it might be subjected to, and subsequently the appropriate security plans and countermeasures should be adopted, taking into consideration their implementation costs and effects on performance. Addressing these threats helps the enterprise meet the compliance and risk-mitigation requirements of the most regulated industries in the world.

1.2 How Databases are Vulnerable

According to David Knox [2], "Securing the Database may be the single biggest action an organization can take, to protect its assets". The most commonly used database in an enterprise organization is the relational database. Data is a valuable resource in an enterprise, so there is a very strong need to strictly control and manage it. As discussed earlier, it is the responsibility of the DBMS to make sure that the data is kept secure and confidential, as it is the element that controls access to the database. Enterprise database infrastructure is subject to an overwhelming range of threats. The most common threats to which an enterprise database is exposed are:

  • Excessive Privilege Abuse - when a user or an application has been granted database access privileges that exceed the requirements of their job function. For example, an academic institute employee whose job requires only the ability to change a student's contact information could also change the student's grades.
  • Legitimate Privilege Abuse - legitimate database access privileges can also be abused for malicious purposes. There are two risks to consider in this situation. The first is that confidential or sensitive information can be copied using legitimate database access privileges and then sold for money. The second, and perhaps the more common, is retrieving and storing large amounts of information on a client machine for no malicious reason; but when the data is available on an endpoint machine rather than in the database itself, it is more susceptible to Trojans, laptop theft, etc.
  • Privilege Elevation - software vulnerabilities that can be found in stored procedures, built-in functions, protocol implementations, or even SQL statements. For example, a software developer can gain database administrative privileges by exploiting a vulnerability in a built-in function.
  • Database Platform Vulnerabilities - any additional services or the operating system installed on the database server can lead to unauthorized access, data corruption, or denial of service. For example, the Blaster worm took advantage of a vulnerability in Windows 2000 to create denial-of-service conditions.
  • SQL Injection - the most common attack technique. In a SQL injection attack, the attacker typically inserts unauthorized queries into the database via vulnerable web application input forms, and these queries are executed with the privileges of the application. The same can be done through internal applications or stored procedures by internal users. Access to the entire database can be gained using SQL injection.
  • Weak Audit - a strong database audit is essential in an enterprise organization, as it helps fulfil government regulatory requirements and provides investigators with forensic evidence linking intruders to a crime, thereby deterring attackers. Database audit is considered the last line of database defense. Audit data can identify the existence of a violation after the fact and can be used to link it to a particular user and to repair the system in case corruption or a denial-of-service attack has occurred. The main reasons for weak audit are: it degrades performance by consuming CPU and disk resources; administrators can turn off auditing to hide an attack; and organizations with mixed database environments cannot have a uniform, scalable audit process across the enterprise, as audit processes are unique to each database server platform.
  • Denial of Service - access to network applications or data is denied to the intended users. A simple example is crashing a database server by exploiting a vulnerability in the database platform. Other common denial-of-service techniques are data corruption, network flooding, and server resource overload (common in database environments).
  • Database Protocol Vulnerabilities - the SQL Slammer worm took advantage of a flaw in the Microsoft SQL Server protocol to force denial-of-service conditions. It affected 75,000 victims in just over 30 minutes, dramatically slowing down general internet traffic [Analysis of BGP Update Surge during Slammer Worm Attack].
  • Weak Authentication - obtaining legitimate login credentials by improper means contributes to weak authentication schemes. Attackers can gain access to a legitimate user's login details in various ways: by repeatedly entering username/password combinations until one works (common or weak passwords are easily guessed), by convincing someone to share their login credentials, or by stealing the credentials by copying password files or notes.
  • Backup Data Exposure - there have been several security breaches involving theft of database backup tapes and hard disks, as this media is thought of as least prone to attack and is often completely unprotected from attack [3].
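
To make the SQL injection threat above concrete, the following Python sketch, using the standard library's sqlite3 module, contrasts a query built by string concatenation with a parameterized one. The table, user name, and input string are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

# A classic injection payload entered into a login form
user_input = "alice' OR '1'='1"

# Vulnerable: the input is concatenated into the SQL text, so the
# OR clause becomes part of the query and matches every row
vulnerable = f"SELECT * FROM users WHERE name = '{user_input}'"
assert len(conn.execute(vulnerable).fetchall()) == 1

# Safe: a parameterized query treats the input as a literal value,
# so no user with that exact name exists and nothing is returned
safe = "SELECT * FROM users WHERE name = ?"
assert conn.execute(safe, (user_input,)).fetchall() == []
```

The fix is not to sanitize strings by hand but to let the database driver bind values separately from the SQL text.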

All these security threats can be accounted for as unauthorized data observation, incorrect data modification, and data unavailability. A complete data security solution must take into consideration the secrecy/confidentiality, integrity, and availability of data. Secrecy or confidentiality refers to the protection of data against unauthorized disclosure, integrity refers to the prevention of incorrect data modification, and availability refers to the prevention of hardware/software errors and malicious data-access denials that make the database unavailable.

1.3 Security Techniques

As organizations increase their adoption of database systems as the key data management technology for day-to-day operations and decision-making, the security of data managed by these systems has become crucial. Damage and misuse of data affect not only a single user or application, but may have disastrous consequences on the entire organization. There are four main control measures which can be used to provide security of data in databases. These are:

  • Access Control
  • Inference Control
  • Flow Control
  • Data Encryption

Chapter - 2

Literature Review

Secure and secret means of communication have always been desired in the field of database systems. There is always a possibility of interception by a party outside the sender-receiver domain when data is transmitted. Modern digital encryption methods form the basis of today's database security. In its earlier days encryption was used by military and government organizations to keep information secret, but in present times it is used for protecting information within many kinds of civilian systems. In 2007 the U.S. government reported that 71% of companies surveyed utilized encryption for some of their data in transit [4].

2.1 Encryption

Encryption is defined as the process of transforming information (plaintext) using an encryption algorithm (cipher) into an unreadable form (encrypted information called ciphertext), making it inaccessible to anyone not possessing the special knowledge needed to decrypt it. "The encoding of the data by a special algorithm that renders the data unreadable by any program without the decryption key" is called encryption [1].

Codes and ciphers are the two methods of encrypting data. The encryption of data or a message is accomplished by one, or both, of these methods: encoding or enciphering. Each involves distinct methodologies, and the two are differentiated by the level at which they are carried out. Encoding is performed at the word or block level and deals with the manipulation of groups of characters. Enciphering works at the character level; this includes scrambling individual characters in a message, referred to as transposition, and substitution, i.e. replacing characters with others. Codes generally are designed to replace entire words or blocks of data in a message with other words or blocks of data. Languages can be considered codes, since words and phrases represent ideas, objects, and actions. There are codes that substitute entire phrases or groups of numbers or symbols with others. A single system may employ both levels of encoding. For example, consider a code encryption scheme as follows: the = jam, man = barn, is = fly, dangerous = rest. The message "the man is dangerous" would then read in encrypted form "jam barn fly rest". Although overly simplistic, this example illustrates the basis of codes. With the advent of electrical communications, codes became more sophisticated in answer to the needs of those systems; for example, the inventions of Morse code and the telegraph dictated a need for more sophisticated secure transmission. Codes are very susceptible to breaking and possess a large exposure surface with regard to interception and decryption via analysis, and there are no easily implemented means by which to detect breaches in the system. The other method of encryption is the cipher. Instead of replacing words or blocks of numbers or symbols with others, as a code does, a cipher replaces individual or smaller sets of letters, numbers, or characters with others, based on a certain algorithm and key.
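
The toy code table above can be expressed directly as a lookup, as in this short Python sketch:

```python
# The code table from the example above
code = {"the": "jam", "man": "barn", "is": "fly", "dangerous": "rest"}
decode = {v: k for k, v in code.items()}  # invert the table for decoding

message = "the man is dangerous"
encoded = " ".join(code[w] for w in message.split())
assert encoded == "jam barn fly rest"

# Decoding applies the inverse table word by word
assert " ".join(decode[w] for w in encoded.split()) == message
```

The dictionary makes the key weakness of codes visible: the whole secret is the table itself, and anyone who captures or reconstructs it can decode every message.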
Digital data and information, including video, audio, and text, can be separated into groups, or blocks, of bits, and then manipulated for encryption by such methods as XOR (exclusive OR), encoding-decoding, and rotation. As an example, let us examine the basics of the XOR method. Here, a group of bits (e.g., a byte) of the data is compared to a digital key, and the exclusive-or operation is performed on the two to produce an encrypted result. Figure 2 illustrates the process.

Figure 2: The XOR process for Encryption

When the exclusive-or operation is performed on the plaintext and key, the ciphertext emerges and is sent. The receiver performs the exclusive-or operation on the ciphertext and the same key, and the original plaintext is reproduced [5].
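
The XOR process just described can be sketched in a few lines of Python with a repeating key; this illustrates the principle only and is not a secure cipher:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte of the data with the repeating key;
    # applying the same operation twice restores the original
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"the man is dangerous"
key = b"secret"

ciphertext = xor_cipher(plaintext, key)
assert ciphertext != plaintext

# The receiver XORs with the same key to recover the plaintext
assert xor_cipher(ciphertext, key) == plaintext
```

Because XOR is its own inverse, the same function serves as both encryption and decryption, exactly as the sender/receiver description above requires.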

Encryption can be reversible or irreversible. Irreversible techniques do not allow the encrypted data to be decrypted, but the encrypted data can still be used to obtain valid statistical information; irreversible techniques are rarely used compared to reversible ones. The whole arrangement for transmitting data securely over an insecure network is called a cryptosystem, which includes:

  • An encryption key to encrypt the data (plaintext)

  • An encryption algorithm that transforms the plaintext into encrypted information (ciphertext) with the encryption key

  • A decryption key to decrypt the ciphertext

  • A decryption algorithm that transforms the ciphertext back into plaintext using the decryption key [1]

2.2 Encryption Techniques

The goals of digital encryption are no different from those of historical encryption schemes; the difference is found in the methods, not the objectives. Secrecy of the message and keys is of paramount importance in any system, whether on parchment paper or in an electronic or optical format [5]. Various encryption techniques are available and can broadly be classified into two categories: symmetric and asymmetric encryption. In symmetric encryption the sender and receiver share the same algorithm and key for encryption and decryption, and depend on a safe communication channel to exchange the encryption key, whereas asymmetric encryption uses different keys for encryption and decryption. Asymmetric encryption gave birth to the concept of public and private keys and is often preferred to symmetric encryption, being considered more secure [1], [5].

2.2.1 Symmetric Encryption

Symmetric encryption, also known as single-key or conventional encryption, was the only kind of encryption before the concept of public-key encryption came into the picture, and it remains by far the more widely used of the two types. The figure below illustrates the symmetric encryption process. The original message (plaintext) is converted into apparently random information (ciphertext) using an algorithm and a key. The key is a value independent of the plaintext. The algorithm produces different output depending on the key in use, i.e. the output of the algorithm changes if the key is changed. The ciphertext produced is then transmitted and is transformed back into the original plaintext using a decryption algorithm and the same key that was used for encryption.

Figure: Simplified Model of Conventional Encryption [7 page - 22]

The model can be better understood through the following example. A source produces a message X = [X1, X2, X3, …, XM] in plaintext. The M elements of X are letters in some finite alphabet. Traditionally the alphabet consisted of the 26 capital letters, but nowadays the binary alphabet {0, 1} is used. An encryption key K = [K1, K2, K3, …, KJ] is generated and shared between the sender and the receiver via a secure channel; alternatively, a third party can generate the key and securely deliver it to both sender and receiver. Using the plaintext X and the encryption key K as input, the encryption algorithm produces the ciphertext Y = [Y1, Y2, Y3, …, YN] as

Y = EK(X)

where E is the encryption algorithm and the ciphertext Y is produced as a function of the plaintext X using E. At the receiver's end the ciphertext is converted back into the plaintext as

X = DK(Y)

where D is the decryption algorithm.
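
The Y = EK(X), X = DK(Y) model above can be sketched in Python with a toy XOR-based stream cipher standing in for the algorithms E and D; deriving the keystream from SHA-256 of the key and a counter is an assumption made purely for illustration, not any standard construction:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Expand the shared key K into a pseudo-random keystream
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def E(key: bytes, X: bytes) -> bytes:
    # Y = E_K(X): XOR the plaintext with the keystream
    return bytes(a ^ b for a, b in zip(X, keystream(key, len(X))))

# XOR is self-inverse, so decryption X = D_K(Y) is the same operation
D = E

K = b"shared secret key"   # exchanged over a secure channel
X = b"attack at dawn"
Y = E(K, X)
assert D(K, Y) == X
```

Sender and receiver hold the same K; only a party with that key can invert the transformation, which is exactly the conventional-cryptosystem model of the figure.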

Figure: Model of Conventional Cryptosystem [7 page - 23]

The common symmetric block ciphers are the Data Encryption Standard (DES), Triple DES, and the Advanced Encryption Standard (AES).

The Data Encryption Standard

The Data Encryption Standard has been used in some of the most widely deployed encryption schemes, including Kerberos 4.0. The National Bureau of Standards adopted it as a standard in 1977 [7]. DES operates on 64-bit blocks using a 56-bit key. Like other encryption schemes, DES takes two inputs to the encryption function: the plaintext to be encrypted and the key. The plaintext must be 64 bits in length, and the 56-bit key is obtained by stripping off the 8 parity bits, ignoring every eighth bit of the given 64-bit key. The output from the algorithm after 16 rounds of identical operations is a 64-bit block of ciphertext. A suitable combination of permutations and substitutions, applied 16 times to the plaintext, is the basic building block of DES. The same algorithm is used for both encryption and decryption, except that the key schedule is processed in reverse order [6], [7].
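
The reduction from a 64-bit key to 56 bits mentioned above can be sketched as follows; note that this toy version only shows the removal of every eighth (parity) bit and ignores the bit reordering that DES's actual PC-1 permutation also performs:

```python
def strip_parity(key64: int) -> int:
    # View the 64-bit key as a list of bits, most significant first
    bits = [(key64 >> (63 - i)) & 1 for i in range(64)]
    # Drop every eighth bit (positions 8, 16, ..., 64, 1-indexed)
    kept = [b for i, b in enumerate(bits) if (i + 1) % 8 != 0]
    assert len(kept) == 56
    out = 0
    for b in kept:
        out = (out << 1) | b
    return out

# A commonly cited sample DES key value, used here only as input
key56 = strip_parity(0x133457799BBCDFF1)
```

The parity bits contribute nothing to security; the effective key space of DES is therefore 2^56, which is the root of the brute-force concerns discussed below.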

The 64-bit plaintext is passed through an initial permutation (IP) that produces a permuted input by rearranging the bits. This is followed by 16 rounds of the same function, which involves both permutation and substitution functions. The last round produces a 64-bit output that is a function of the input plaintext and the key. The left and right halves of this output are swapped to produce the preoutput. The preoutput is passed through a final permutation (IP^-1), the inverse of the initial permutation, to produce the 64-bit ciphertext. The overall process of DES is shown in the diagram below.

Figure: General Depiction of DES Encryption Algorithm [7 page - 67]

The right-hand side of the diagram shows how the 56-bit key is used during the process. The key is first passed through a permutation function, and then for each of the 16 rounds a subkey (Ki) is generated by a combination of a left circular shift and a permutation. The permutation function is the same for every round, but the subkey is different because of the repeated shifting of the key bits.
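
The round structure described above (16 keyed rounds, a final swap of the halves, and decryption by running the same algorithm with the subkeys in reverse order) can be sketched with a generic Feistel network. The hash-based round function and the subkey values here are illustrative stand-ins, not DES's actual S-boxes or key schedule:

```python
import hashlib

def f(half: bytes, subkey: bytes) -> bytes:
    # Round function: mixes a half-block with a subkey
    # (a stand-in for DES's expansion, S-boxes, and permutation)
    return hashlib.sha256(half + subkey).digest()[:len(half)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def feistel(block: bytes, subkeys) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for k in subkeys:
        # Each round: new left = old right, new right = left XOR f(right, k)
        left, right = right, xor(left, f(right, k))
    # Final swap of the halves, as in DES's preoutput step
    return right + left

subkeys = [bytes([i]) * 4 for i in range(16)]  # 16 made-up round keys
plain = b"ABCDEFGH"                             # one 64-bit block
cipher = feistel(plain, subkeys)

# Decryption is the same algorithm with the subkeys reversed
assert feistel(cipher, list(reversed(subkeys))) == plain
```

The round function f never needs to be invertible; the XOR structure guarantees reversibility, which is why the same code decrypts when the key schedule runs backwards.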

Since the adoption of DES as a standard, there have always been concerns about the level of security it provides. The two areas of concern in DES are the key length and the fact that the design criteria for its internal structure, the S-boxes, were classified. The issue with the key length was that it was reduced to 56 bits from the 128 bits of the LUCIFER algorithm [add a new reference], on which DES was based, and many suspected that this was an enormous decrease, making the key too short to withstand brute-force attacks. Users also could not be assured that there were no weak points in the internal structure of DES that would allow the NSA to decipher messages without the benefit of the key. Later work on differential cryptanalysis and subsequent events indicated that the internal structure of DES is in fact very strong.

Triple DES

Triple DES was developed in response to the potential vulnerability of standard DES to brute-force attack, and it became very popular in Internet-based applications. Triple DES uses multiple encryptions with DES and multiple keys, as shown in the figure below. Triple DES with two keys is preferable to plain DES, but Triple DES with three keys is preferred overall. The plaintext P is encrypted with the first key K1, then decrypted with the second key K2, and finally encrypted again with the third key K3. As the figure shows, the ciphertext C is produced as

C = EK3[DK2[EK1[P]]]

The keys are applied in reverse order when decrypting: the ciphertext C is first decrypted with the third key K3, then encrypted with the second key K2, and finally decrypted with the first key K1. This arrangement is also called Encrypt-Decrypt-Encrypt (EDE) mode. The plaintext P is recovered as

P = DK1[EK2[DK3[C]]]
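
The EDE composition and its reversal can be sketched with a deliberately trivial stand-in for DES (byte-wise addition of a key value, with subtraction as its inverse); only the order in which the keyed operations are applied, not the cipher itself, is the point here:

```python
def E(key: int, block: bytes) -> bytes:
    # Toy stand-in for DES encryption: add the key to each byte mod 256
    return bytes((b + key) % 256 for b in block)

def D(key: int, block: bytes) -> bytes:
    # Toy stand-in for DES decryption: the inverse subtraction
    return bytes((b - key) % 256 for b in block)

K1, K2, K3 = 17, 42, 99   # three made-up keys
P = b"PLAINTXT"

# C = E_K3[D_K2[E_K1[P]]]
C = E(K3, D(K2, E(K1, P)))

# P = D_K1[E_K2[D_K3[C]]] -- the keys applied in reverse order
assert D(K1, E(K2, D(K3, C))) == P
```

Setting K1 = K2 in EDE mode reduces the scheme to a single DES encryption with K3, which is why the mode remains backward compatible with single-key DES.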

Figure: Triple DES encryption/decryption [6 page - 72]

Advanced Encryption Standard

2.3 Encryption in Database Security

Organizations are increasingly relying on possibly distributed information systems for daily business; hence they become more vulnerable to security breaches even as they gain productivity and efficiency advantages. Database security has gained substantial importance over time. Database security has always been about protecting the data - data in the form of customer information, intellectual property, financial assets, commercial transactions, and any number of other records that are retained, managed, and used on the systems. The confidentiality and integrity of this data need to be protected as it is converted into information and knowledge within the enterprise. Core enterprise data is stored in relational databases and then offered up via applications to users. These databases typically store the most valuable information assets of an enterprise and are under constant threat, not only from external users but also from legitimate users such as trusted insiders, super users, consultants, and partners - or perhaps their unprotected user accounts - who compromise the system and take or modify the data for some inappropriate purpose.

To begin with, classifying the types of information in the database and the security needs associated with them is the first and most important step. As databases are used in a multitude of ways, it is useful to characterize some of their primary functions in order to understand the different security requirements. A number of security techniques have been developed, and continue to be developed, for database security; encryption is one of them.

2.3.1 Access Encryption

There are multiple reasons why access control for confidential information in enterprise computing environments is challenging. A few of them are: first, the number of information services in an enterprise computing environment is huge, which makes the management of access rights essential. Second, a client might not know, before requesting access, which access rights are necessary in order to be granted access to the requested information. Third, flexible access rights, including context-sensitive constraints, must be supported by access control.

Access control schemes can be broadly classified into two types: proof-based and encryption-based access control schemes. In a proof-based scheme, "a client needs to assemble some access rights in a proof of access, which demonstrates to a service that the client is authorized to access the requested information". Proof-based access control is preferred for scenarios where the client-specific access rights required are flexible. It becomes easy to include support for constraints if the access rights are flexible; however, the same is not true of covert access requirements, since existing designs assume that a service can inform a client of the nature of the required proof of access. In a proof-based access control scheme the service does not need to locate the required access rights, which can be an expensive task. [9]

In an encryption-based access control scheme, the service provides confidential information to any client in encrypted form; clients authorized to access the information have the corresponding decryption key. Encryption-based access control is attractive for scenarios where a service receives many queries, since it shields the service from having to run client-specific access control. Compared with proof-based access control, it is straightforward to add support for covert access requirements to existing encryption-based architectures: the service encrypts all the information as usual, but the client is not told which decryption key to use; the client holds a set of decryption keys and must search this set for a matching key. On the other hand, considering that key management should remain simple, it is less straightforward to add support for constraints on access rights to the proposed architectures. [10]

Encryption-Based Access Control

Encryption-based access control is attractive when there are many requests for the same information, as it is independent of the individual clients issuing these requests. For example, an information item can be encrypted once and the service can use the ciphertext to answer multiple requests. However, the uniform treatment of requests makes it difficult to deal with constraints on access rights and with granularity-aware access rights. Covert access requirements and service-independent access rights present further challenges. The main requirements for encryption-based access control are:

  • The encrypted information must not reveal any knowledge about the encryption key used or the decryption key required.

  • For decrypting encrypted information, each value of a constraint must require a separate key that should be accessible only under the given constraint/value combination, and the scheme should support hierarchical constraints to keep key management simple.

  • The decryption key for coarse-grained information should be derivable from the key for fine-grained information, to further simplify key management.

  • Service-independent access rights imply that a single decryption key will be used to decrypt the same information offered by multiple services. In a symmetric cryptosystem this would allow a service that encrypts its own information to also decrypt the same information offered by other services; this problem can be avoided by using an asymmetric cryptosystem. [8]

Encryption-Based Access Control Techniques

An ideal access-control architecture is one in which access rights are simple to manage and the system is constrainable and granularity-aware. In the case of proof-based access control, the architecture also has to be asymmetric, provide indistinguishability, and be personalizable. Some common encryption-based access control techniques are:

Identity-Based Encryption - An identity-based encryption (IBE) scheme is specified by four randomized algorithms:

  • Setup: takes a security parameter k and returns the system parameters and a master key. The system parameters include a description of a finite message space m and a description of a finite ciphertext space c. Intuitively, the system parameters will be publicly known, while the master key will be known only to the "Private Key Generator" (PKG).

  • Extract: takes as input the system parameters, the master key, and an arbitrary ID ∈ {0,1}*, and returns a private key d. ID is an arbitrary string which is used as a public key, and d is the corresponding private decryption key. The Extract algorithm thus derives a private key from the given public key.

  • Encrypt: takes as input the system parameters, ID, and M ∈ m. It returns a ciphertext C ∈ c.

  • Decrypt: takes as input the system parameters, C ∈ c, and a private key d. It returns M ∈ m.

These algorithms must satisfy the standard consistency constraint: when d is the private key generated by the Extract algorithm for the public key ID, then

∀ M ∈ m: Decrypt(params, C, d) = M, where C = Encrypt(params, ID, M) [11]

Hierarchical Identity-Based Encryption - One of the first practical IBE schemes was presented by Boneh and Franklin. Gentry and Silverberg [7] introduced a Hierarchical Identity-Based Encryption (HIBE) scheme based on Boneh and Franklin's work. In HIBE, a root PKG gives out private keys to sub-PKGs, which in turn distribute private keys to the individuals (or further sub-PKGs) in their domains. An individual's public key corresponds to the IDs associated with the root PKG, any sub-PKGs on the path from the root PKG to the individual, and the individual itself. Only the root PKG's public parameters are required for encrypting messages. This has the advantage of reducing the amount of required storage and the complexity of access-right management. The following figure gives an example overview of the HIBE architecture:

Assuming that the service provides location information, we will analyze this architecture in terms of the four algorithms that specify basic identity-based encryption. [12]

Setup - Since encryption-based access control is not client-specific, there is no need for Alice to personalize her information and constraint hierarchies.

Access Control - When Bob queries information about Alice (7), the service encrypts the information (8) and returns the encrypted information to Bob (9). The service splits up the information based on its granularity properties and encrypts each piece separately. For example, the information "17 Grange Road Chester" is split up into "17", "Grange Road", and "Chester". For each piece, the service then locates the node in Alice's information hierarchy which describes the piece and gathers the IDs of all the nodes along the path from the root node to this node. Similarly, for each of the constraint hierarchies, the service chooses the leaf node that contains the current value of the constraint and gathers the IDs along the path from the root node. The service then calls Encrypt() with the gathered sequences of node IDs. Bob decrypts the received ciphertexts by calling Decrypt() with the required tuple of private keys (10) for each ciphertext. He can only decrypt a ciphertext if he has access to the granularity of the encrypted information.
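Gathering the node IDs along the path from the root can be sketched with a simple parent map. The hierarchy below is hypothetical, following the "17 Grange Road Chester" example, with "Chester" as the root (coarsest granularity):

```python
# Hypothetical fragment of Alice's information hierarchy: child -> parent,
# with None marking the root node.
parent = {"17": "Grange Road", "Grange Road": "Chester", "Chester": None}

def path_ids(node, parent):
    """Collect node IDs from the root node down to the given node."""
    ids = []
    while node is not None:
        ids.append(node)
        node = parent[node]
    return list(reversed(ids))

print(path_ids("17", parent))  # ['Chester', 'Grange Road', '17']
```

The resulting sequence of IDs is what the service would pass to Encrypt(); a coarser piece such as "Grange Road" yields a shorter prefix of the same path.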

Key management is simplified in IBE. For example, in an email system, Bob can encrypt an email and send it to Alice simply by using her email address as the public key. Alice does not need to be contacted beforehand to acquire a separate public key, which would otherwise be a disadvantage, since Alice would need to inform a service of her hierarchies and her public key. But we have already mentioned that we do not expect each policymaker to define their own hierarchies; instead, we can have a shared set of hierarchies which a service is aware of. A setup step is still necessary for IBE in an email system, for two reasons. First, IBE schemes require a set of public parameters for encryption, and Bob must acquire these before he can encrypt email for Alice. Second, the email address Bob is going to use to encrypt information for Alice should really belong to Alice; this address should only be used if provided either directly by Alice or by a trusted third party in a setup step. The HIBE scheme can be expensive in terms of performance and also requires the storage and transfer of a constant amount of additional information.

Attribute-Based Encryption - With the increase in the amount of sensitive data shared and stored by third-party sites on the Internet, the need to encrypt data stored at these sites has also increased. If this storage is compromised, the amount of information lost will be limited if the information on it is encrypted. One disadvantage of encrypting data, though, is that it severely limits the ability of users to selectively share their encrypted data at a fine-grained level. For example, suppose a user wants to grant a third party decryption access to all of its Internet traffic logs, but only for entries in a particular range of dates that had a source IP address from a particular subnet. The user must then either give the party its private decryption key or act as an intermediary and decrypt all relevant entries for the party. Neither of these options is particularly appealing.

Attribute-Based Encryption (ABE) was introduced by Sahai and Waters as an initial step toward solving this problem. In an ABE system, a user's keys and ciphertexts are labeled with sets of descriptive attributes. A particular key can decrypt a particular ciphertext only if there is a match between the attributes of the ciphertext and those of the user's key. The cryptosystem of Sahai and Waters allowed for decryption when at least k attributes overlapped between a ciphertext and a private key. While this primitive was shown to be useful for error-tolerant encryption with biometrics, the lack of expressibility seems to limit its applicability to larger systems. [13] [14]
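The k-of-n overlap rule itself is easy to state in code. The sketch below models only the matching logic, not the cryptography, and the attribute names are made up:

```python
def can_decrypt(key_attrs, ct_attrs, k):
    """Sahai-Waters style threshold test: a private key can open a
    ciphertext only if at least k attributes overlap between the sets."""
    return len(set(key_attrs) & set(ct_attrs)) >= k

key_attrs = {"finance", "2015", "audit"}       # attributes on a user's key
ct_attrs  = {"finance", "2015", "eu-region"}   # attributes on a ciphertext

print(can_decrypt(key_attrs, ct_attrs, k=2))   # True: two attributes overlap
print(can_decrypt(key_attrs, ct_attrs, k=3))   # False: only two overlap
```

In the real scheme this threshold is enforced by secret sharing inside the key material rather than by a set comparison, so a user cannot simply skip the check.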

2.3.2 Database Encryption

Today's computing environments have progressively shifted their scope and character from traditional, one-on-one client-server interaction to a new cooperative paradigm. Providing means of protecting the secrecy of information, while at the same time guaranteeing its availability to legitimate clients, has become of primary importance. It is very difficult to operate online querying services securely on open networks, which is the main reason many enterprise organizations outsource their data center operations to external application service providers. Encryption at the access level, and more recently at the data level, has been a promising direction toward preventing unauthorized access to outsourced data. However, data encryption is often supported for the sole purpose of protecting the data in storage, while allowing access to plaintext values by the server, which decrypts data for query execution. From my point of view, database encryption is a time-honored technique: it introduces an additional layer that prevents exposure of sensitive information even if the database server is compromised after the conventional network and application security levels. Database encryption prevents illegitimate users who break into a network from seeing the sensitive data in databases, and at the same time it allows database administrators to perform their tasks without accessing sensitive information (e.g., sales or payroll) in plaintext. [15]

Database encryption has long been proposed as a fundamental tool for providing strong security for data at rest. The idea of encrypting databases is now well recognized thanks to recent advances in processor capabilities and the development of fast encryption techniques. Database vendors like ORACLE and MICROSOFT (SQL Server) have introduced built-in database encryption. However, there are still many issues surrounding the development of a sound security strategy that includes database encryption. Key management and security are of prime importance in any encryption-based system and were therefore among the first issues to be investigated in the framework of database encryption. [17] [16]

Whole Database Encryption (Cell/Column Encryption)

A lot of research has been done on the security and privacy of database information at the storage level, focusing mainly on encrypting the database contents at rest in the database. This can prevent an illegitimate user from breaking into the database server and protects the data from the network or domain administrators, but it does not protect the privacy or integrity of the data travelling between the application client and the database over the network. On the other hand, there is a considerable performance impact, and there are limitations in certain database operations, like comparison queries and updates on encrypted data, as a result of the necessity to decrypt the encrypted data before it is processed by the database server. To decrease the performance impact and to relax some of the restrictions on basic database server operations, the concept of column-based encryption of database tables was introduced. Still, a significant performance decline is experienced when accessing and updating encrypted data or when performing comparison searches and queries on an encrypted column in large databases. [18]

ORACLE - Authentication, authorization, and auditing mechanisms are used in Oracle Database 10g to secure data in the database, but these do not protect the data in the operating system files where the data is actually stored. The concept of Transparent Data Encryption (TDE) was introduced in Oracle Database 10g to protect those files. This feature enables users to protect sensitive data in database columns stored in operating system files by encrypting it and, to prevent unauthorized decryption, it stores the encryption keys in a security module external to the database. With transparent data encryption, users and applications do not need to manage the encryption keys. This freedom can be extremely important when addressing, for example, regulatory compliance issues. Once a user has passed the access control checks the data is transparently decrypted, so there is no need to use views to decrypt data. Security administrators have the assurance that the data on disk is encrypted, yet handling encrypted data remains transparent to applications. Transparent data encryption can be used to protect confidential data such as credit card and social security numbers without having to manage key storage or create auxiliary tables, views, and triggers. An application that processes sensitive data can use this feature to provide strong data encryption with little or no change to the application. [19] [20]

How does it work?

Transparent data encryption is a key-based access control system. The encrypted data cannot be understood until authorized decryption occurs, so even if the data is compromised the loss will be limited, and decryption is automatic for legitimate users (those who have passed the access control checks). A single key is used for a table, regardless of the number of encrypted columns it contains. The database server's master key is used to encrypt the keys of all the tables containing encrypted columns, and those keys are then stored in a dictionary table in the database. No keys are ever stored in the clear.

As shown in the figure below, the master key of the server is stored in an external security module outside the database and is only accessible to the security administrator. For this external security module, Oracle uses an Oracle Wallet. Storing the master key in this way prevents its unauthorized use. In addition to storing the master key, the Oracle Wallet also generates encryption keys and performs encryption and decryption. Using an external security module also provides the option of separating ordinary program functions from encryption operations, making it possible to divide duties between database administrators and security administrators. Security is enhanced because no single administrator is granted complete access to all data. [19] [20]

SQL Server - Transparent data encryption (TDE) was introduced in Microsoft SQL Server 2008 as a new whole (or partial) database encryption technique. It is designed to protect the entire database at rest without affecting existing applications. Encrypting databases traditionally involved complicated application changes such as modifying table schemas, removing functionality, and significant performance degradations, all of which contribute to slow query performance. TDE solves these issues by simply encrypting everything: all data types, keys, indexes, and so on can be used to their full potential without sacrificing security or leaking information on the disk. Cell-level encryption cannot offer these benefits. Two Windows features, Encrypting File System (EFS) and BitLocker Drive Encryption, are often used instead; they provide protection on a similar scale and, like TDE, are transparent to the user. [21]

How does it Work?

Microsoft SQL Server offers two levels of encryption: database-level and cell-level, both using the same key management hierarchy. At the root of the encryption tree is the Windows Data Protection API (DPAPI), which secures the key hierarchy at the machine level and protects the service master key (SMK) of the database server instance. The SMK protects the database master key (DMK), which is stored at the user database level and which in turn protects certificates and asymmetric keys. These in turn protect symmetric keys, which protect the data. TDE uses a similar hierarchy down to the certificate. The primary difference is that in TDE the DMK and certificate must be stored in the master database rather than in the user database. A new key, used only for TDE and referred to as the database encryption key (DEK), is created and stored in the user database.
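The chain of keys, each protecting (wrapping) the one below it, can be sketched as follows. The wrapping here is a throwaway XOR construction, purely to show how opening the root lets the server walk down to the DEK; real servers use DPAPI/CAPI primitives, and the key names below only mirror the hierarchy described above:

```python
import hashlib
import os

def wrap(kek, key):
    """Toy key wrapping: XOR the key with a pad derived from the wrapping key."""
    pad = hashlib.sha256(kek).digest()[:len(key)]
    return bytes(a ^ b for a, b in zip(key, pad))

unwrap = wrap  # XOR wrapping is its own inverse

# The SQL Server style chain: DPAPI -> SMK -> DMK -> certificate -> DEK.
chain = ["DPAPI", "SMK", "DMK", "certificate", "DEK"]
keys = {name: os.urandom(32) for name in chain}

# Each key is stored wrapped under its parent key, never in the clear.
wrapped = {child: wrap(keys[parent], keys[child])
           for parent, child in zip(chain, chain[1:])}

# To open the DEK the server walks the chain from the root downward.
current = keys["DPAPI"]
for child in chain[1:]:
    current = unwrap(current, wrapped[child])
assert current == keys["DEK"]
```

Breaking any link in this chain (for example, protecting a middle key with a password instead, as cell-level encryption allows) means the walk can no longer proceed automatically, which is exactly the distinction drawn below.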

The figure on the next page shows the full encryption hierarchy; the hierarchy used by TDE is represented by the dotted lines. In both cell-level and database-level encryption, this hierarchy enables the server to automatically open keys and decrypt data. The important difference is that in cell-level encryption, all keys from the DMK down can be protected by a password instead of by another key, which breaks the decryption chain and forces the user to input a password to access data; whereas in TDE, the entire chain from DPAPI down to the DEK must be maintained so that the server can automatically provide access to files protected by TDE. The Windows Cryptographic API (CAPI) is used in both cell-level encryption and TDE to provide encryption and decryption through these keys. [21]

When TDE is enabled (or disabled), the database is marked as encrypted in the sys.databases catalog view, the DEK state is set to Encryption In Progress, and the server starts a background thread, called the encryption scan, which scans all database files and encrypts them (or decrypts them if TDE is being disabled). When the encryption scan is completed, the DEK state is set to the Encrypted state. At this point all database files on disk are encrypted, and subsequent database and log file writes to disk will be encrypted.

TDE in SQL Server 2008 supports AES with 128-bit, 192-bit, or 256-bit keys, or 3-key Triple DES, as encryption algorithms. Data is encrypted in the cipher block chaining (CBC) encryption mode. [21]
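In CBC mode, each plaintext block is XORed with the previous ciphertext block before being encrypted, so repeated plaintext blocks do not produce repeated ciphertext blocks. The sketch below demonstrates only the chaining, with a toy block "cipher" standing in for AES or Triple DES (a single XOR pad, trivially breakable and used here purely for illustration):

```python
import hashlib
import os

BLOCK = 16

def _E(key, block):
    """Toy block cipher standing in for AES: XOR with a key-derived pad."""
    pad = hashlib.sha256(key).digest()[:BLOCK]
    return bytes(a ^ b for a, b in zip(block, pad))

_D = _E  # the toy cipher is its own inverse

def cbc_encrypt(key, iv, blocks):
    prev, out = iv, []
    for p in blocks:
        c = _E(key, bytes(a ^ b for a, b in zip(p, prev)))  # chain with prev
        out.append(c)
        prev = c
    return out

def cbc_decrypt(key, iv, blocks):
    prev, out = iv, []
    for c in blocks:
        out.append(bytes(a ^ b for a, b in zip(_D(key, c), prev)))
        prev = c
    return out

key, iv = os.urandom(16), os.urandom(16)
pt = [b"A" * 16, b"A" * 16]            # two identical plaintext blocks
ct = cbc_encrypt(key, iv, pt)
assert ct[0] != ct[1]                  # chaining hides the repetition
assert cbc_decrypt(key, iv, ct) == pt  # round trip recovers the plaintext
```

This property matters for database pages, where identical values in neighbouring rows are common and must not leak as identical ciphertext on disk.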

2.4 Impact on Database/Conclusion/Evaluation

Encryption has been recommended as an effective measure for protecting databases against illegitimate access, but in practice the use of encryption has largely been limited to access encryption, on which most of the emphasis and research work has been concentrated. The idea of encrypting data at rest is still fairly new to the database world. The reason for this lies in the issues surrounding encrypting data at rest: performance degradation and extra cost with respect to performance and resources. The big ones are performance with respect to time, limitations in certain database operations (like comparison queries and updates on encrypted data in large databases), and key management. Remedies have been proposed for key management, such as using an external security module (the Oracle Wallet being one example), but reducing the performance decline is still a target to achieve. Vendors have been continuously trying to improve the performance decline involved with encrypting data at rest.

ORACLE - Oracle is another of the major database vendors, and it still has not recommended whole database encryption. They are still holding on to the partial (cell/column-level) encryption which they introduced in Oracle 10g. Their main argument about efficiency is: "This feature (TDE) affects performance only when data is retrieved from or inserted into an encrypted column. No reduction of performance occurs for such operations on other columns, even in a table containing encrypted columns. The total performance effect depends on the number of encrypted columns and their frequency of access. The columns most appropriate for encryption are those containing the most sensitive data, including regulatory mandates [20]". Transparent data encryption in Oracle also cannot be used with the following database features:

  • Index types other than B-tree
  • Range scan search through an index
  • Large object data types such as BLOB and CLOB
  • Original import/export utilities
  • Other database tools and utilities that directly access data files

SQL SERVER - Encrypting databases traditionally involved complicated application changes such as modifying table schemas, removing functionality, and significant performance degradations. For example, in Microsoft SQL Server 2005, the column data type must be changed to varbinary; ranged and equality searches are not allowed; and the application must call built-ins, or stored procedures or views that automatically use those built-ins, to handle encryption and decryption. All of these contribute to slow query performance. These issues are not unique to SQL Server; other database management systems face similar limitations. Custom schemes are often used to support equality searches, and ranged searches often cannot be used at all. Even basic database elements such as creating an index or using foreign keys often do not work with cell-level or column-level encryption schemes, because the use of these features inherently leaks information.

A lot more research, time, and money still needs to be spent on the topic of database encryption, mainly focusing on the encryption of data at rest, involving both partial and whole database encryption, rather than on access encryption. It is widely accepted, and has so far been proved, that access encryption is a very good technique for avoiding illegitimate access to the database. But as time progresses, there have been cases where people have successfully breached the access control mechanisms and a huge amount of sensitive information has been compromised. Adding to the problem is the new concept of mobile databases; in recent times most database breaches were linked to mobile databases in one way or another, the most common example being lost or stolen mobile data storage. This underlines the growing demand for encrypting the data within the database as an extra layer of defense, so that if someone internal or external breaches the access control and gets access to the data, there is still an extra layer of security protecting that data from being completely compromised.

Another factor which still stands in the way of database encryption being used on a wide scale is performance statistics. There is not enough data available to prove that database encryption would not cause a significant performance decline. Where vendors such as Oracle and Microsoft do publish performance statistics, we do not have sufficient proof of how those statistics were generated, and even these vendors have not been able to show how well their figures hold up in enterprise settings involving huge databases.

Chapter - 3

Research Methodology

Selecting a research approach is a very important aspect of good research. There are various research approaches in the computing world, and they can be classified in several ways, but the most common classification is into qualitative and quantitative research. Qualitative research involves the use of qualitative data, such as interviews, documents, and participant observation data, to understand and explain social phenomena. In qualitative research, researchers gather insight or knowledge about a topic in an attempt to understand perceptions, attitudes, and the reasoning behind actions. Using qualitative methods, researchers are able to understand not only what is happening but, more importantly, why. Qualitative research is particularly useful for determining the opinions and attitudes of research participants, understanding how specific groups construct their sense of social reality, and discovering the reasons rather than the causes for these opinions. [22]

The common research methods classified as qualitative are action research, case study research, ethnography, and grounded theory. I will discuss the case study in brief, as this is the method relevant to my research. The term "case study" can be used to describe a unit of analysis (e.g. a case study of a particular organization) or to describe a research method. Case study research is the most commonly used qualitative research method in information systems. Yin (2002) [24] defines the scope of a case study as: "A case study is an empirical inquiry which investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident".

Quantitative research is a set of methods and techniques which allow researchers to answer research questions about the interaction of humans and computers. There are two key elements in this approach: an emphasis on quantitative data and an emphasis on positivist philosophy. Statistical tools and packages are an essential element of quantitative research, because quantities are so predominant in this type of research. Quantities are numbers which represent the values and levels of theoretical constructs and concepts, and the interpretation of these numbers is considered strong evidence for how a phenomenon works. The second key element, the emphasis on positivist philosophy, also depends on numerical analysis and can be defined as the researcher's belief that a scientific theory is one that can be falsified. Examples of quantitative methods include survey methods, laboratory experiments, formal methods (e.g. econometrics), and numerical methods such as mathematical modeling. [23]

Although most researchers do either quantitative or qualitative research work, some have suggested that the two can be combined, using two or more research methods in a single study; this combination is called triangulation.

I have used case study research as the research approach for my literature review. According to the definition of case study research, it is particularly well-suited to information systems research, since the object of our discipline is the study of information systems in organizations, which is the case in my research: studying security technologies for databases in enterprise organizations, involving encryption. This included reading different books and journals on database encryption, and discussions with my dissertation guide, teachers, and students who are interested in this subject area. I have also studied and monitored a live enterprise database with some encryption applied, in an enterprise organization (not named for ethical reasons), in order to understand the concepts in the real world more closely. I had worked for this organization in the past, and I also discussed with the employees who work on the database side on a day-to-day basis the prospects of introducing encryption as an extra layer of defense in an enterprise database system, with regard to performance and cost.

Chapter - 4

Proposed Approach

In the proposed approach, a test strategy will be devised using the test-driven database development approach, which I will be using to derive results supporting my argument, i.e. "Is it worth implementing database encryption (partial/whole) as an extra layer of defense in an enterprise database environment, with regard to performance and cost?"

Again, as with the approaches for carrying out the literature review, there are various approaches available for carrying out these tests. Test-driven development (TDD) is part of a larger group of development methodologies referred to as agile software development. The reason this method appears attractive to developers over traditional methods like waterfall development is the advantages TDD offers over such methods. Some of these advantages are: reduced code complexity, a greater number of fulfilled customer requirements, reduced test and debug time at the end of the development cycle, and improved black-box testing. [25]

In the database world the concept of Test-Driven Database Development (TDDD) is fairly new and is based on the same principles as test-driven software development. It has also been described as a way of managing fear during development and can be defined as "a programming methodology which provides an iterative design cycle with integrated testing at each step", leading to reduced cost and time associated with end-of-cycle testing compared to the traditional waterfall method of development. The following figure shows an example of how TDD works: [26]

Test-Driven Development [26, 28]

These can be explained in terms of the following steps:

  • Add a test.
  • Run the test to ensure it fails.
  • Update the functional code so that it passes the new test.
  • Run the test again, and if it fails, repeat the previous step, i.e. update the code.
  • If it succeeds, remove duplication and tidy up the final functional code or specifications. [27]
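The cycle can be sketched in Python's unittest framework. The function under test, classify_query, is a made-up example: the tests are written first and fail until the functional code is updated to pass them, after which the code is tidied up.

```python
import unittest

def classify_query(sql):
    """Functional code written only after the tests below demanded it;
    in the 'run and watch it fail' step this body did not exist yet."""
    return "read" if sql.strip().lower().startswith("select") else "write"

class TestClassifyQuery(unittest.TestCase):
    # Step 1: add a test (before classify_query was implemented).
    def test_select_is_read(self):
        self.assertEqual(classify_query("SELECT * FROM payroll"), "read")

    def test_update_is_write(self):
        self.assertEqual(classify_query("UPDATE payroll SET x = 1"), "write")

# Steps 2-4: run the tests, update the code, and re-run until green.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClassifyQuery)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()  # step 5: green, so refactor with confidence
```

The same rhythm carries over to TDDD: each schema change or query is preceded by a test that initially fails against the database.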

After careful reading of a few journals on TDD and other traditional development techniques, I was able to design a test strategy based on one of the examples I came across in a TDD journal [26], as shown below in the figure:

My test strategy based on the above example combines the following dimensions:

  • Dataset and Usage: Large Dataset, Small Dataset
  • Data: Unencrypted Data, Encrypted Data, Unencrypted and Encrypted Data
  • Performance Criteria: Query Time, CPU Usage

The approach is to create a database system using Oracle 11g Enterprise Edition with different tables of varying sizes, which fulfils our criterion of having different datasets. The next step is to write a set of SQL queries, either using the command line or SQL Developer. The complexity of the SQL queries will vary, and they will be written bearing in mind that the data they fetch or manipulate may come from different tables which can be encrypted, unencrypted, or both. These queries will then be run against the different datasets specified in the table above and in environments with different CPU usages. I will not be able to check the CPU usage criterion properly, as I am doing all this testing on my laptop, which in reality is not a high-CPU-usage enterprise environment. To mitigate this and still propose results for high CPU usage, I will use the biggest dataset available to me and, while running the queries against it, will run several other processes at the same time, so that we can check the effect of a change in CPU usage. This process will be repeated three times: once for unencrypted data, once for encrypted data, and once for the dataset involving data from both the encrypted and unencrypted tables.

These queries will be tested on these datasets against two criteria:

  • The first is whether the data is encrypted or not.

  • The second is performance, which is further divided into various sub-categories, like cost, cardinality, projection, and the time taken to run the query, in conjunction with the first criterion.
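A minimal sketch of the measurement harness is shown below. sqlite3 stands in for the Oracle 11g instance, and the table and query are placeholders; against Oracle, the same loop would issue the queries through a database driver while recording wall-clock and CPU time per run.

```python
import sqlite3
import time

def time_query(conn, sql, repeats=5):
    """Run a query several times, recording wall-clock and CPU time;
    the best of n runs reduces noise from other processes."""
    wall, cpu = [], []
    for _ in range(repeats):
        t0, c0 = time.perf_counter(), time.process_time()
        conn.execute(sql).fetchall()
        wall.append(time.perf_counter() - t0)
        cpu.append(time.process_time() - c0)
    return min(wall), min(cpu)

# Placeholder dataset standing in for one of the Oracle test tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO payroll VALUES (?, ?)",
                 [(i, i * 100) for i in range(10000)])

wall, cpu = time_query(conn, "SELECT COUNT(*) FROM payroll WHERE salary > 500000")
print(f"wall={wall:.6f}s cpu={cpu:.6f}s")
```

In the actual tests, the same queries would be timed once against unencrypted tables, once against encrypted ones, and once against a mix, with the extra background processes supplying the varying-CPU-usage condition.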


  • [1] Connolly and Begg - see book for reference

    [2] Knox, David (2004), Effective Oracle Database 10g Security by Design, McGraw-Hill.

    [3] Imperva White Paper…see hard copy for reference

    [4] 2008 CSI Computer Crime and Security Survey, by Robert Richardson

    [5] Data Encryption: Mixing Up the Message in the Name of Security - see PDF for reference

    [6] Internet Security - Cryptographic Principles, Algorithms and Protocols by Man Young Rhee - see book for reference

    [7] Cryptography and Network Security - Principles and Practice 4th Edition by William Stallings - see book for reference

    [8] Exploiting Hierarchical Identity-Based Encryption for Access Control to Pervasive Computing Information (2005)

    [9] L. Bauer, M. A. Schneider, and E. W. Felten. A General and Flexible Access-Control System for theWeb (2002)

    [10] I. Ray, I. Ray, and N. Narasimhamurthi. A Cryptographic Solution to Implement Access control in a Hierarchy and More, June (2002)

    [11] Identity-Based Encryption from the Weil Pairing (2003)

    [12] Ran Canetti, Shai Halevi, Jonathan Katz, A Forward-Secure Public-Key Encryption Scheme

    [13] Vipul Goyal, Omkant Pandey, Amit Sahai, Brent Waters, Attribute-Based Encryption for Fine-Grained Access Control of Encrypted Data

    [14] Amit Sahai and Brent Waters. Fuzzy Identity Based Encryption. In Advances in Cryptology - 2005

    [15] Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs - 2003

    [16] Modeling and Assessing Inference Exposure in Encrypted Databases - 2005

    [17] [Davida et al. 1981; Hacigümüs and Mehrotra 2004]

    [18] An enterprise policy-based security protocol for protecting relational database network objects - pdf






    [24] Yin, R. K. Case Study Research, Design and Methods, 3rd ed. Newbury Park, Sage Publications, 2002.


    [26] Test-Driven Development Concepts, Taxonomy, and Future Direction - pdf

    [27] index

    [28] Test-Driven Development - pdf for TDD diagram
