Components Of A Data Virtualization System

Published Date: 02 Nov 2017

This report reviews the existing literature and secondary data on data virtualization and its benefits. It first describes the components of a data virtualization system, then traces the history and evolution of data virtualization in order to explain some of its benefits. That background frames the cases presented afterwards, in which organizations implemented data virtualization to help fix certain issues in their companies. Finally, the report critically analyzes how several organizations implemented data virtualization and compares the benefits they obtained.

Definitions and Components of a Data Virtualization System

Data Virtualization is defined by Rick van der Lans (Clearly Defining Data Virtualization, 2010) as 'the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology. Data virtualization provides an abstraction layer that data consumers can use to access data in a consistent manner. A data consumer can be any application retrieving or manipulating data, such as a reporting or data entry application. This abstraction layer hides all the technical aspects of data storage. The applications don't have to know where all the data has been stored physically, where the database servers run, what the source API and database language is, and so on.'

Data virtualization can be thought of as a technique that allows clients to combine several types of data, whatever their format or structure, into a single "virtual" data layer that provides integrated data services to a variety of consuming applications or users in real time. Data integration using data virtualization takes three steps: connect to and virtualize a variety of data types or sources, combine and integrate them into virtual views of the data, and finally publish those views as data services. Without any one of these steps, data virtualization cannot be done.
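The three steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular data virtualization product works: the two sources (an in-memory SQLite database and a CSV export), the table and column names, and the functions `connect_customers`, `connect_orders`, `combine` and `publish` are all hypothetical.

```python
import csv
import io
import json
import sqlite3

# Source 1 (hypothetical): a relational database holding customers.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source 2 (hypothetical): a CSV export holding orders.
ORDERS_CSV = "customer_id,amount\n1,120.50\n1,30.00\n2,75.25\n"


def connect_customers():
    """Step 1a: expose the SQL source as plain rows (dicts)."""
    cur = crm.execute("SELECT id, name FROM customers ORDER BY id")
    return [{"id": row[0], "name": row[1]} for row in cur]


def connect_orders():
    """Step 1b: expose the CSV source in the same row shape."""
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    return [
        {"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
        for r in reader
    ]


def combine():
    """Step 2: build the integrated view at request time; no data is copied into a new store."""
    totals = {}
    for order in connect_orders():
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return [
        {"id": c["id"], "name": c["name"], "total_spent": totals.get(c["id"], 0.0)}
        for c in connect_customers()
    ]


def publish():
    """Step 3: deliver the view as a data service payload (JSON in this sketch)."""
    return json.dumps(combine())


print(publish())
```

A consuming application only ever calls `publish()`; it does not need to know that one source is a database and the other a file, which is the point of the virtual data layer.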

The architecture of data virtualization is simpler to work with than that of existing data integration systems. The data virtualization system or platform sits between a wide range of data consumers (business applications - mobile, web, users, enterprise applications, SQL clients etc.) and data sources (formats - Excel, PDF, docs, emails etc.) and consolidates the data into a common view, thus providing a virtual data layer. Getting data from different data sources to different data consumers is often a troublesome task, and that is where data virtualization comes in. Using wrapper libraries (a layer of code that translates an existing library's interface into a compatible interface), it is able to read and write data to all of the major applications and data sources inside the enterprise. Web automation provides the ability to read and write data on the public web and in web applications that provide an application programming interface (API - a specification that software packages use as an interface to communicate with each other). Combining these technologies yields a normalized structure for all of this data, which, as mentioned before, can then be combined, improved in quality, transformed, integrated and published as data services in multiple formats such as RSS feeds, widgets, web services etc. At run time, the consuming applications call these data services in their preferred format. The data virtualization system uses a set of optimization tools, including intelligent caching (storing data so that it loads more quickly, which enhances the user experience) and preloading of data, to fetch information from multiple sources. The data virtualization system also provides data management tools such as monitoring, security and authentication, and metadata stores (data about data - how, when and by whom specific data was collected and how it has been altered).
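To make the wrapper and caching ideas concrete, the following Python sketch shows one possible shape for them. It is an assumption-laden illustration rather than the design of any real platform: the class names `SourceWrapper` and `CachingLayer`, the `ttl` parameter and the sample spreadsheet row are all invented for this example.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class SourceWrapper:
    """Hypothetical wrapper: hides a source-specific fetch function behind one common read()."""

    def __init__(self, name: str, fetch: Callable[[], List[Row]]):
        self.name = name
        self._fetch = fetch
        self.calls = 0  # number of times the underlying source was actually hit

    def read(self) -> List[Row]:
        self.calls += 1
        return self._fetch()


class CachingLayer:
    """Hypothetical cache: serves repeated reads from memory until an entry is older than ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[Row]]] = {}

    def read(self, wrapper: SourceWrapper) -> List[Row]:
        now = self._clock()
        hit = self._cache.get(wrapper.name)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: skip the source entirely
        rows = wrapper.read()
        self._cache[wrapper.name] = (now, rows)
        return rows


# Usage: a made-up spreadsheet source read twice within the cache lifetime.
spreadsheet = SourceWrapper("spreadsheet", lambda: [{"sku": "A1", "qty": 3}])
layer = CachingLayer(ttl=60.0)
layer.read(spreadsheet)
layer.read(spreadsheet)  # served from the cache; the source is not contacted again
print(spreadsheet.calls)
```

The trade-off in such a cache is freshness against load on the sources: a longer `ttl` means fewer trips to the source but staler data for consumers.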

Figure: Data Virtualization (adapted from Davis and Eve, 2011)

There are several mechanisms for accessing data from the various data sources, such as Open Database Connectivity (ODBC - developed by the SQL Access Group, it gives applications a standard way to access data in database systems) and Java Database Connectivity (JDBC - an API specification used for connecting programs written in the Java language to the data in databases). Similarly, delivering the data uses the same mechanisms as accessing it, with the addition of Web Services (software systems designed to support interoperable machine-to-machine communication over a network).

Put simply, data virtualization is about accessing data from the various data sources, combining the data if necessary, and delivering it to the business applications that need it. It is about pulling data on key assets and delivering it to the applications that use it to make the right business decisions and help run the business. This approach to data integration has several benefits: fresher data, less replication, quicker time to solution, ease of use and an overall cost that is lower than that of the alternatives. Data virtualization should not be confused with other data integration systems; it is all about providing data on demand and delivering it directly to the applications that need it.

The History and Evolution Behind Data Virtualization

Before turning to the pursuit of business agility and what researchers and organizations have to say about data virtualization, it is important to understand the history behind it.

The term data virtualization is not an old one, and it is unclear when it was first used. Many technologies contributed to its development, which is why data virtualization products are so rich in features. To give an idea of how data virtualization started, we must therefore take account of the history of data federation servers, distributed databases, XSLT and XQuery.

One of the key features of data virtualization is data federation technology, which merges data from a varied set of data stores. IBM's DataJoiner (1990) and Information Builders' EDA/SQL (Enterprise Data Access, 1991) can be claimed as the first dedicated data federation servers. They were used primarily to integrate data from different types of data sources and should not be confused with actual database servers. Both products were the first able to access non-SQL databases while still being able to access SQL databases. Both have since matured and gone through several name changes: IBM's DataJoiner is now named InfoSphere Federation Server, and Information Builders' EDA/SQL is now called iWay Data Hub and is part of a suite.

Data federation technology was first put into practice in distributed database servers. In a distributed database server, several independent database servers operate as one logical database; its main task is to make many databases look like one large usable database. This lets consumers enter a query that joins multiple tables managed by different database servers. Data federation technology was implemented in order to join data from several different types of database servers. According to van der Lans (2011), such a distributed join must be processed efficiently. When these products were released, networks were slow compared to today, so the majority of the research focused on reducing the amount of network traffic. The first commercially used database servers to support distributed joins, which included Oracle and Ingres, were released in the 1980s.
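One common way to reduce network traffic in a distributed join is to filter rows at each source and request only the matching rows from the other server, rather than shipping whole tables. The Python sketch below illustrates that idea with two in-memory SQLite databases standing in for separate servers. It is not taken from van der Lans or from any of the products named above; the table names, the sample data and the function `federated_join` are hypothetical.

```python
import sqlite3

# Two independent "servers", simulated with separate in-memory databases.
sales_db = sqlite3.connect(":memory:")
sales_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
sales_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 40.0), (3, 10, 900.0), (4, 12, 15.0)],
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "Acme"), (11, "Birch"), (12, "Corex"), (13, "Dune")],
)


def federated_join(min_amount):
    """Join orders with customers across both servers.

    Returns (joined_rows, rows_transferred) so the traffic saving is visible.
    """
    # Push the filter down: only qualifying orders leave the sales server.
    orders = sales_db.execute(
        "SELECT order_id, customer_id, amount FROM orders "
        "WHERE amount >= ? ORDER BY order_id",
        (min_amount,),
    ).fetchall()

    # Ask the CRM server only for the customers those orders reference.
    ids = sorted({customer_id for _, customer_id, _ in orders})
    if not ids:
        return [], 0
    placeholders = ",".join("?" for _ in ids)
    customers = crm_db.execute(
        f"SELECT customer_id, name FROM customers "
        f"WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()

    # Perform the join locally on the reduced row sets.
    names = dict(customers)
    joined = [
        (order_id, names[customer_id], amount)
        for order_id, customer_id, amount in orders
        if customer_id in names
    ]
    return joined, len(orders) + len(customers)


rows, transferred = federated_join(200)
print(rows, transferred)
```

With the sample data, a naive plan would move all eight rows to the federation layer before joining, whereas this plan moves three.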

According to van der Lans (2011), the initial research on data federation was done by IBM and started in 1979. IBM's System R project was the birthplace of SQL and led to the development of most commercially used SQL database servers. The System R* project, which IBM started after System R, included extensive research on data federation. Van der Lans (2011: 75) states that 'the goal of the project was to implement a distributed database servers.' Besides IBM's famous System R* project, the Ingres project also contributed heavily to research on distributed databases and helped to create Ingres, an open source SQL database server. Today's data virtualization products inherit strongly from distributed database technologies, which were the first products to implement data federation.

The success of Extensible Markup Language (XML) has led to an increase in the amount of data, on the internet and in organizations, that is available in the form of XML documents. As Margaret Rouse puts it (What is XML?, 2007): 'XML is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere.' By 2000, Extensible Stylesheet Language Transformations (XSLT), a standard and powerful language, was being used to transform XML documents. XSLT is now a recognized language and has been applied by many vendors in a variety of products. It matters to data virtualization because such systems must be able to transform data that is designed as XML: restructuring the hierarchical form of XML documents and assigning a hierarchical form to tabular data. Data virtualization benefits from all the research and practice done in this area.

XQuery, first published in 2001, is a programming language used to insert, update, delete and query collections of XML documents. XQuery is a much more powerful language than XSLT because, among several other features, it can select and extract the essential parts of XML documents, join XML documents and merge relational data with XML data. It is the language most comparable to SQL, in part because one of the main designers of SQL is also one of the designers of XQuery, and a number of SQL database systems support XQuery. Again, data virtualization draws on all the research and practice that has gone into combining XQuery and SQL and puts that development to use in the system.
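The kind of work described in the last two paragraphs, flattening hierarchical XML into rows and merging it with relational data, can be illustrated as follows. The sketch uses Python's standard library as a stand-in; it is not XSLT or XQuery, and the catalog document, the `stock` table and both function names are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source: a small hierarchical product catalog.
XML_DOC = """
<catalog>
  <product sku="A1"><title>Router</title></product>
  <product sku="B2"><title>Switch</title></product>
</catalog>
"""

# Hypothetical relational source: stock levels keyed by the same SKU.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
db.executemany("INSERT INTO stock VALUES (?, ?)", [("A1", 4), ("B2", 0)])


def flatten_products(xml_text):
    """Turn the hierarchical document into flat (sku, title) rows."""
    root = ET.fromstring(xml_text)
    return [(p.get("sku"), p.findtext("title")) for p in root.findall("product")]


def products_with_stock(xml_text):
    """Merge the flattened XML rows with the relational rows on sku."""
    stock = dict(db.execute("SELECT sku, qty FROM stock"))
    return [
        (sku, title, stock.get(sku))
        for sku, title in flatten_products(xml_text)
    ]


print(products_with_stock(XML_DOC))
```

In XQuery the selection and the join would be written declaratively in a single expression; the point here is only to show the two data shapes being reconciled.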

Other technologies have also contributed to the development of data virtualization, but the main contributors, which must be mentioned, are data federation servers, distributed databases, XSLT and XQuery. As we now know, data virtualization systems were called data federation systems in the 1990s and were initially introduced by Information Builders and IBM. Many vendors entered the market after the year 2000. The Composite Data Virtualization Platform and the Denodo Platform were both introduced in 2002 as data virtualization systems, and more products were released in 2010.

Data virtualization was once seen as a technology for solving only one specific problem, and most companies considered it a non-strategic technology. Organizations did not see its true potential or the beneficial features it had. This is one of the areas of challenge that this research project seeks to address. Around the year 2008 this all changed. Business Intelligence (BI) experts started to realize how beneficial the technology is, and organizations looking for better data integration systems found data virtualization. It is now used by many companies, has reached a high maturity level and is a strong alternative to other data integration systems.

Data Virtualization and the Pursuit of Business Agility

There are many cases where organizations have implemented data virtualization into their business and have achieved beneficial outcomes. This literature review will analyze these cases.

In an extensive study of the business problems of Comcast Corporation, a leading media, entertainment and communications organization, Davis and Eve (2011: 101) noted that 'Comcast recognized the potential value of creating a federated, or virtualized, data architecture to integrate the data.' The need for a data virtualization system grew because millions of Comcast customers use its websites as a portal to the internet, to access many web sites and email, and to manage their products and services online. The problem was that supporting this process involves, depending on what the consumer needs, several products and many data sources that have to be integrated in many ways. Data virtualization was implemented in late 2009 and brought several significant benefits, including improved customer satisfaction, self-sufficiency and reduced support costs.

Davis and Eve (2011) also studied the pursuit of business agility at Qualcomm Incorporated. Qualcomm was founded in 1985 and is a global leader in next-generation mobile technologies. It manufactures chipsets, licenses technology and offers communication services, primarily to the telecommunications industry. Qualcomm operates in a rapidly changing environment because it is a leader in a fast-moving industry. The company came to realize that, because the industry changes quickly, it needs to get things done ever more quickly in order to maintain its leadership position. In 2009, it recognized data virtualization as a way to get applications up and running, prototype, and obtain feedback from customers in a matter of weeks rather than months. It also realized that maintaining and monitoring all the systems it uses, which contain significant amounts of data, and keeping them in sync was costly. Therefore, data virtualization was needed.

Once data virtualization was implemented, the company cited three major benefits: agility and speed of execution, reduced support costs and more efficient data management. Comcast and Qualcomm both implemented data virtualization and both found it very beneficial to their organizations. It is worth noting that the two companies gained many advantages from the implementation and that those advantages were almost the same.

Two further case studies that describe how enterprises have successfully implemented data virtualization to increase their business agility are those of NYSE Euronext and Compassion International. As stated by Davis and Eve (2011: 125), Compassion International is an organization that helps release children from poverty and enables them to become responsible Christian adults. Founded in 1952, it is the world's main Christian child sponsorship and development organization. In 2007, Compassion realized that it needed to update its information technology systems in order to achieve its goal of greatly increasing the number of children enrolled over the next decade, as the information systems used at the time could hardly meet its information requirements. In 2009, it implemented data virtualization and gained many benefits: it reduced the IT cost of data services, improved data quality and is now able to build agile business solutions.

According to Davis and Eve (2011: 240), NYSE (New York Stock Exchange) Euronext, a world leader among the most liquid equities and derivatives exchanges, is a complex organization with a complex operating environment: it has gone through many mergers, trades 14 different equities with all kinds of different data structures and has to deal with massive data volumes every day. The company decided to implement data virtualization for its business. In addition to the benefits reported by Compassion, NYSE Euronext obtained several others. Through the implementation of data virtualization it enhanced the agility of the business with a flexible data delivery infrastructure and optimized the performance of its data delivery environment. The advantages gained by the two organizations are largely the same, which suggests that implementing data virtualization was a sound decision for both.

The book chapters used in this research agree that the organizations have implemented data virtualization; it is equally important to note that the organizations report almost the same benefits.

Conclusion

In summary, this literature review has covered the background and evolution of data virtualization, how it functions and its benefits. It is clear that whilst data virtualization has many obvious benefits, adoption of data virtualization is still in its early stages, even if some of the leaders of certain industries have already implemented the system in their organizations. This review has also briefly examined some of the cases where organizations have implemented data virtualization to help fix certain issues in their companies, including NYSE Euronext (saved 4.5 million on a single project), Qualcomm (saved $2.2 million across five initial projects), Compassion (achieved development cost reductions of 30 percent) and Comcast (saved 35% on developing data services). The purpose of this review is to help understand, through several cases, the beneficial results obtained with data virtualization.
