NoSQL Data Storage Techniques


02 Nov 2017


Abstract

This paper discusses how the modeling of object dependencies varies between NoSQL and SQL open-source technologies and how these technologies function. Object dependencies are considered here in terms of performance and scalability: performance means high output (throughput), and scalability means how well the system uses additional resources to achieve better output. Big data is now a key concern for database systems, and the two kinds of database differ widely in performance and scalability. Many open-source technologies are available for both. NoSQL technologies include Cassandra, HBase, MongoDB, Accumulo, HPP, Amazon SimpleDB, Hypertable, Stratosphere, etc. SQL technologies include VoltDB, NuoDB, Sensei DB, Genie DB, ScalArc, ScaleDB, SQL, MySQL Cluster, etc. Performance and scalability are measured in terms of CPU workload, data transactions, and read and write times. Each technology performs differently, but each aims to deliver high performance and scalability. Open-source software makes its source code available, and anyone can use it without paying a fee.

Introduction

NoSQL and SQL databases are both used on different platforms to store large amounts of data. For many years many organizations depended on SQL databases, but technology has changed a lot, and organizations are now moving to NoSQL.

SQL (Structured Query Language) is a language used for database manipulation. It includes statements to create, insert, and delete data, among many others, in order to store data in an organized way. It comprises a data definition language (DDL) and a data manipulation language (DML).
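The split between DDL and DML can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for any SQL database; the table name, column names, and rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for any SQL DBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure of the data (hypothetical table).
cur.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# DML: insert, update, delete, and query rows.
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Leeds"), (2, "Ben", "York"), (3, "Chen", "Leeds")],
)
cur.execute("UPDATE customers SET city = ? WHERE id = ?", ("Hull", 2))
cur.execute("DELETE FROM customers WHERE id = ?", (3,))
conn.commit()

rows = cur.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()
print(rows)
```

The CREATE TABLE statement is the only DDL here; everything after it is DML operating on the structure that the DDL defined.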

NoSQL has no fixed structure. Leonard discusses what "NoSQL involves and the way it enables large data sets to be spread across racks of cheap commodity servers, allowing for almost infinite scalability and processing power. Notes that NoSQL applications are still relatively few and small, but they are becoming more widely used in enterprise, government telecommunications companies (telcos), and banks" (Leonard J, 2012, pp. 20-23).

Applications mainly access a database management system (DBMS) through ODBC or JDBC.

1. ODBC: Open Database Connectivity is a standard C language middleware API for accessing databases. An application written using ODBC can be ported to other platforms.

2. JDBC: Java Database Connectivity is a standard Java language middleware API for accessing relational databases.

The focus of this literature review is to critically analyze how performance and scalability vary between NoSQL open-source technologies and SQL technologies, how these databases work on different platforms, and what techniques are used to store large data. It analyzes the open-source technologies and how they work under different user conditions. The review discusses two SQL technologies and two NoSQL technologies. It is organized as follows: 1.0 SQL data storage techniques, 2.0 NoSQL data storage techniques, 3.0 Comparisons between SQL and NoSQL, 4.0 Conclusion, 5.0 Bibliography.

SQL data storage techniques:

The spread of dynamic websites on the World Wide Web today is largely due to the possibility for their content to be handled through databases. Database management is a complicated process, which has been considerably rationalized by the SQL programming language. As its full name (Structured Query Language) implies, SQL is responsible for querying and editing information stored in a certain database management system (http://www.ntchosting.com).

Most organizations have depended on SQL databases for many years; they are easy to construct and easy to work with.

SQL database systems mostly rely on the ACID properties:

A stands for Atomicity: a transaction is indivisible and irreducible; either all of its changes are applied or none are.

C stands for Consistency: every transaction takes the database from one valid state to another, so the database continues to return correct results.

I stands for Isolation: it determines when and how the changes made by one transaction become visible to other transactions.

D stands for Durability: committed transactions are stored permanently in the database; even if the system crashes, their effects remain.
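Atomicity and consistency can be made concrete with a small sketch. It again uses Python's sqlite3 module as a stand-in for an SQL database; the accounts table, the CHECK constraint, and the transfer helper are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)   # both updates are applied
ok_large = transfer(conn, "alice", "bob", 500)  # violates the CHECK, so it is rolled back
final = balances(conn)
print(ok_small, ok_large, final)
```

In the second transfer the credit to bob is executed before the debit fails, yet after the rollback neither change is visible, which is the all-or-nothing behaviour that atomicity describes.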

There are a number of SQL technologies; this review considers MySQL Cluster and VoltDB.

MySQL Cluster: This is part of the MySQL database technology.

"MySQL Cluster shards data over multiple database servers (a "shared nothing" architecture). Every shard is replicated, to support recovery. Bi-directional geographic replication is also supported" (Rick Cattell, 2010).

Figure: MySQL Cluster components (dev.mysql.com)

All data is stored in the data nodes, which are connected to the SQL nodes. The data nodes are linked to the NDB management server, which maintains the cluster. The other components are used to write the data. When an SQL node fails, the data is still available from the data nodes, so the cluster keeps serving requests without losing data. In this diagram each component has its own memory.

Data can be accessed from the client side using PHP, C, Java, etc.

NDB is the Network Database; it must be present on both the server side and the client side.

VoltDB: This is a newer open-source technology. It is a relational database system that gives high throughput.

Minhas, Umar Farooq et al. (2012) describe VoltDB in their paper as follows: "VoltDB has been designed to provide very high throughput and fault tolerance for transactional workloads". VoltDB makes the following design choices:

1. All data is stored in main memory, which avoids slow disk operations.

2. Transactions are run on the server side as stored procedures.

3. Transactions are executed serially at each database partition, so no two transactions run at the same time within a single partition.

4. This type of partitioning provides durability and fault tolerance.

VoltDB supports two types of transactions:

1. Single-partition transactions: these touch only one partition of the database and are therefore very fast.

2. Multi-partition transactions: these take data from more than one partition, so transaction speed is lower.

For a k-safety value of k, VoltDB keeps k+1 instances of each partition, so the database keeps running without failure when up to k instances are lost.
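The difference between the two transaction types, and the k+1 copies, can be sketched as follows. This is not VoltDB's actual routing code: the CRC32 hash, the placement rule, and all function names are assumptions chosen only to show the idea.

```python
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def classify(keys, num_partitions):
    """Return ('single', parts) if all keys live on one partition, else ('multi', parts)."""
    parts = {partition_of(k, num_partitions) for k in keys}
    return ("single" if len(parts) == 1 else "multi", parts)


def replica_nodes(partition: int, num_nodes: int, k: int):
    """Place k+1 copies of a partition on consecutive nodes (a hypothetical placement rule)."""
    if k + 1 > num_nodes:
        raise ValueError("need at least k+1 nodes")
    return [(partition + i) % num_nodes for i in range(k + 1)]


# A transaction that touches one key is always single-partition.
kind_one, _ = classify(["customer-17"], num_partitions=4)

# A transaction that touches many keys usually spans several partitions.
kind_many, _ = classify([str(i) for i in range(100)], num_partitions=4)

# With k = 1, each partition is stored on two distinct nodes.
copies = replica_nodes(partition=3, num_nodes=4, k=1)
print(kind_one, kind_many, copies)
```

A single-partition transaction can run on one partition without coordination, while a multi-partition transaction has to involve every partition in the returned set, which is why it is slower.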

How to increase performance in VoltDB:

Increase the size of the cluster.

Transfer database information between the nodes.

NoSQL data storage techniques:

Oracle (www.oracle.com) describes its NoSQL product as follows: "The recent launch of Oracle NoSQL Database has further spurred interest and excitement. Oracle NoSQL Database is a horizontally scalable key-value database. Built by the acclaimed Berkeley DB team, it features excellent performance, tunable consistency, integration with Hadoop, with a simple but powerful client API".

NoSQL systems are mainly designed around the CAP theorem and the BASE model.

CAP theorem: proposed by Brewer in the year 2000.

Seth Gilbert and Nancy A. Lynch state it as follows: "the CAP Theorem can be stated as follows: In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request."

C indicates Consistency: every client sees the same data, so a read returns the result of the most recent write.

A indicates Availability: the system responds to every request at all times.

P indicates Partition Tolerance: the system keeps working when its servers are split into multiple partitions and communication between them is delayed or lost.

BASE model:

BA stands for Basically Available.

S stands for Soft state.

E stands for Eventual consistency.
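Eventual consistency can be illustrated with a toy sketch of two replicas. The Replica class, the version numbers, and the anti_entropy function are assumptions for the example and do not describe any particular product.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica already has.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Exchange state so every replica ends with the highest version of each key."""
    merged = {}
    for r in replicas:
        for key, (version, value) in r.store.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for r in replicas:
        for key, (version, value) in merged.items():
            r.write(key, value, version)


a, b = Replica(), Replica()
a.write("cart", ["book"], version=1)  # accepted by replica a only
stale = b.read("cart")                # b has not seen the write yet
anti_entropy([a, b])                  # background synchronisation
fresh = b.read("cart")
print(stale, fresh)
```

Both replicas stay available for reads and writes the whole time (basically available), their contents may change without new client input (soft state), and they agree once synchronisation has run (eventual consistency).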

Michael Stonebraker explains in his paper that NoSQL systems are considered in the context of OLTP (Online Transaction Processing). OLTP is mainly used in sectors such as banking, railways, and supermarkets. There are two ways to improve OLTP performance: the first is automatic sharding over a shared-nothing processing environment, and the second is improving per-server OLTP performance.

OLTP time depends on four factors:

1. Logging: every change is written twice, once to the database and once to the log, so that it can be recovered after a failure.

2. Locking: a transaction sets a lock on a record before accessing it, so that concurrent transactions do not interfere.

3. Latching: short-term locks (latches) protect shared in-memory data structures while multiple threads update them.

4. Buffer management: pages read from disk are cached in a memory buffer pool, so that repeated accesses return results quickly.

NoSQL performance can be increased in two ways:

1. Add more nodes to the computation.

2. Increase the performance of each node.

Guy Harrison (2011) discusses Oracle NoSQL in his paper: "Oracle NoSQL is a distributed key-value store: values written to nodes in a cluster based on a hash of the key value. Unlike some NoSQL databases, there is no support for a partitioning scheme that allows adjacent keys to be located on the same node. However, Oracle NoSQL supports the concept of major and minor key paths:

A major key may have sub-keys all stored on the same node. These may be used to optimize the retrieval of master detail records".
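The idea of major and minor key paths in the quotation above can be sketched as follows. This is not the Oracle NoSQL client API: ToyStore, node_for, and the use of MD5 are hypothetical, chosen only to show why hashing the major key alone keeps all sub-keys on one node.

```python
import hashlib


def node_for(major_key: str, num_nodes: int) -> int:
    """Pick a node from a hash of the major key only (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ToyStore:
    """A hypothetical cluster of in-memory nodes, each a dict keyed by (major, minor)."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, major, minor, value):
        self.nodes[node_for(major, len(self.nodes))][(major, minor)] = value

    def get_all(self, major):
        """Fetch every sub-key of a major key from the single node that holds them."""
        node = self.nodes[node_for(major, len(self.nodes))]
        return {minor: v for (maj, minor), v in node.items() if maj == major}


store = ToyStore(num_nodes=4)
store.put("user/42", "name", "Ada")
store.put("user/42", "email", "ada@example.org")
store.put("user/42", "orders", [1001, 1002])

record = store.get_all("user/42")
holders = sum(1 for node in store.nodes if any(maj == "user/42" for maj, _ in node))
print(record, holders)
```

Because the minor key never enters the hash, a master record and its detail records are always found with a single node lookup.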

Oracle NoSQL is mainly aimed at big data processing, spreading large amounts of data across systems.

Cassandra:

Bagade, Prasanna et al. (2012) define it as follows: "Cassandra is NoSQL distributed database system which is known for managing large amount of distributed data. It provides high availability without single point of failure, the reason behind this is that it treats failure of node as norm rather than exception. It is also famous for high write throughput without harming read efficiency".

Cassandra is a leading new technology in NoSQL, and it makes a distributed database easy to configure. It has a column-oriented structure, and it partitions data across the cluster using schemes such as random partitioning and order-preserving partitioning.
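The two partitioning schemes can be contrasted with a short sketch. It is a simplification and not Cassandra's implementation: the token functions and the sample keys are assumptions made for the illustration.

```python
import hashlib


def random_token(key: str) -> int:
    """Token from an MD5 hash of the key, which spreads keys evenly but loses key order."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> str:
    """Token is the key itself, so token order equals key order."""
    return key


keys = ["user:001", "user:002", "user:003", "user:004"]

# Position of each key on the token ring under the two schemes.
by_random = sorted(keys, key=random_token)
by_order = sorted(keys, key=order_preserving_token)
print(by_random)
print(by_order)
```

With hashed tokens, neighbouring keys generally land far apart on the ring, which balances load. With order-preserving tokens, neighbouring keys stay together, which helps range scans but can create hot spots.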

Cassandra follows the SEDA (staged event-driven architecture) model. Work is divided into stages, and each stage has a queue of pending tasks that is served by a thread pool. Active tasks are processed first and waiting tasks are taken from the queue afterwards, which provides high performance.
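A staged, queue-based design of this kind can be sketched with Python's standard queue and threading modules. The Stage class, the stage names, and the two handlers are hypothetical and are not taken from Cassandra's source.

```python
import queue
import threading


class Stage:
    """One SEDA-style stage: an input queue drained by a small pool of worker threads."""

    def __init__(self, name, handler, workers=2, next_stage=None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for t in self.threads:
            t.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()  # blocks until a task is waiting
            try:
                result = self.handler(event)
                if self.next_stage is not None:
                    self.next_stage.submit(result)
            finally:
                self.inbox.task_done()


results = []
lock = threading.Lock()


def record(event):
    """Final stage handler: store the processed event."""
    with lock:
        results.append(event)
    return event


write_stage = Stage("write", record)
parse_stage = Stage("parse", lambda raw: raw.strip().upper(), next_stage=write_stage)

for raw in [" a ", " b ", " c "]:
    parse_stage.submit(raw)

parse_stage.inbox.join()  # wait until the parse stage has handed everything on
write_stage.inbox.join()  # then wait for the write stage to drain
print(sorted(results))
```

Each stage can be given its own number of workers, so a slow stage builds up a queue instead of blocking the stages in front of it.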

Cassandra provides the nodetool utility to monitor resources such as CPU utilization, memory statistics, and column family statistics.

Jason Brooks (2011) also describes Oracle NoSQL in his paper as "a key-value data store on which Oracle has layered services supporting scale out over large numbers of nodes". Here the key-value function is the main asset.

Jason Brooks (2011) further describes NoSQL in general: "NoSQL refers to a broad class of key-value stores, document-oriented databases, columnar databases and graph databases, each with its own data models, scaling strategies and use cases".

So NoSQL includes key-value, document, column, and graph databases; Cassandra belongs to the column-oriented class.

Cassandra key-value storage: it is an alternative to a relational database system.

Indexing technique for key-value storage: a unique node is assigned to each index node, and it is accessed through the API. VIX (value-embedded index) and RIX (reference-embedded index) are both used for storing the key-value index.

Erik Meijer and Gavin Bierman (2011) discuss key-value storage and relational tables as follows: "While we don’t often think of it this way, the RAM for storing object graphs is actually a key-value store where keys are addresses (l-values) and values are the data stored at some address in memory (r-values)."

Apache HBase: This is an open-source, column-oriented technology used for random read/write access to big data. It hosts very large tables, with billions of rows.

Features

Linear and modular scalability

Strictly consistent reads and writes

Automatic and configurable sharding of tables

Automatic failover support between RegionServers

Easy to use Java API for client access

Query predicate push down via server side Filters

- (hbase.apache.org)

HBase is mainly used when a huge amount of data has to be stored.

Mehul Nalin Vora (2011) defines HBase as follows: "HBase, an Apache open-source project, is a distributed fault-tolerant and highly scalable, column-oriented, noSQL database built on top of HDFS".

Mehul Nalin Vora discusses HBase performance in his paper by comparing it with an SQL database, MySQL. Data is stored in the form of image files on HDFS, and the HDFS location of each file is stored in HBase and in MySQL. This affects performance because a query has to search across two platforms.

HBase performance and response times depend on the workload:

random read

random write

equal read and write

heavy read and heavy write

Performance therefore changes under these conditions.
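The workload conditions listed above can be expressed as read fractions and replayed against any store. The sketch below drives a plain Python dict rather than HBase, and the fractions chosen for "heavy read" and "heavy write" (0.9 and 0.1) are assumptions, not values from the cited paper.

```python
import random

# Fraction of operations that are reads in each workload (assumed values).
MIXES = {
    "random read": 1.0,
    "random write": 0.0,
    "equal read and write": 0.5,
    "heavy read": 0.9,
    "heavy write": 0.1,
}


def run_workload(read_fraction, operations=1000, key_space=100, seed=0):
    """Replay a random read/write mix against a dict and count each kind of operation."""
    rng = random.Random(seed)
    table = {k: 0 for k in range(key_space)}
    reads = writes = 0
    for _ in range(operations):
        key = rng.randrange(key_space)
        if rng.random() < read_fraction:
            _ = table[key]   # read
            reads += 1
        else:
            table[key] += 1  # write
            writes += 1
    return reads, writes


counts = {name: run_workload(fraction) for name, fraction in MIXES.items()}
for name, (reads, writes) in counts.items():
    print(f"{name}: {reads} reads, {writes} writes")
```

To compare response times, the body of the loop could be replaced with calls to a real client and wrapped in a timer; the dict here only demonstrates how the mixes are generated.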

Comparisons between SQL and NoSQL:

SQL:

SQL performance depends on several comparative factors.

1. Query translation time: number of storages, number of translation algorithms, query types, network environment.

2. Query transmission time: number of storages, query type, network environment, storage structure, size of data set.

- Jiseong Son, Jeong-Dong Kim et al (2011).

The table shows that performance and scalability depend on query translation and transmission times, which vary according to user input.

SQL performance also depends on how storage is organized:

Storage-dependent systems: data is stored on multiple disks, so a query passed by the user takes more time to produce output.

Storage-independent systems: data is stored on a single disk, so the user gets output quickly and there is no transmission delay.

When the same query is passed to dependent and independent systems, the output times can therefore vary.

NOSQL:

NoSQL supports four types of component systems:

Column store/column family: 1. Cassandra 2. Hypertable 3. Cloud data 4. Amazon SimpleDB

Document store: 1. CouchDB 2. MongoDB 3. SisoDB 4. RavenDB

Key value: 1. Azure Table Storage 2. GenieDB 3. HamsterDB 4. Clouddb

Eventual consistent key value store: 1. MongoDB 2. DovetailDB 3. Amazon Dynamo 4. Voldemort

- Tudorica, Bogdan George et al (2011)

The table shows that NoSQL performance and scalability depend on four system types: column family, document store, key value, and eventually consistent key-value store. Each open-source technology implements these differently, so operating them yields different performance and scalability.

For both kinds of database, the analysis shows that performance and scalability vary with the system.

Conclusion:

This paper discussed various SQL and NoSQL technologies and how their performance and scalability vary. The modeling of object dependencies does vary between SQL and NoSQL environments. In an SQL environment, performance varies with query translation time and transmission time. In a NoSQL environment, the four system types affect performance and scalability.

When we comparing both Sql and Nosql environments user can


