The History About The Google File System


02 Nov 2017


A distributed file system (DFS) is a file system whose storage is spread across multiple workstations and machines. Its purpose is to share dispersed files across the platforms in a network quickly and securely. Resources held on a machine are local to that machine, whereas resources held on other machines are remote. A file system provides a service to clients, and the server interface is the normal set of operations on files, such as create and read.
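The idea of a server interface made up of ordinary file operations can be sketched in a few lines of Python. This is only an illustration, not any particular system's API: the names `FileService`, `InMemoryFileService` and `save_report`, and the path used, are hypothetical. The point is that client code is written against the interface and does not change when the implementation behind it is local or remote.

```python
from abc import ABC, abstractmethod


class FileService(ABC):
    """The set of file operations a server exposes to its clients."""

    @abstractmethod
    def create(self, path): ...

    @abstractmethod
    def write(self, path, data): ...

    @abstractmethod
    def read(self, path): ...


class InMemoryFileService(FileService):
    """Stand-in for a server; a real one would keep the data on disk."""

    def __init__(self):
        self._files = {}  # path -> file contents (bytes)

    def create(self, path):
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = b""

    def write(self, path, data):
        if path not in self._files:
            raise FileNotFoundError(path)
        self._files[path] = data

    def read(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]


def save_report(service, text):
    """Client code: identical whether `service` is local or remote."""
    path = "/reports/summary.txt"  # hypothetical path
    service.create(path)
    service.write(path, text.encode("utf-8"))
    return service.read(path)
```

A remote implementation would forward the same three calls over the network, which is what makes the location of the files transparent to the client.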

Servers, clients and storage are usually spread across different machines, so their implementation and configuration vary by platform and usage. Servers and clients often run on different machines, but they can also share a machine, depending on the users' requirements. File sharing is the important aspect here, since files must be shared among the machines when and where required.

Distributed file systems are designed to support client/server architectures. From an implementation point of view, a distributed file system is not managed as a local component or disk; it is managed over the network through remote systems. Likewise, files are not stored on a single machine but on a cluster of storage nodes, that is, on distributed storage (Chen, Li and Gou, 2011).

Distributed file systems are successful for several reasons. First, they let different machines and users share information. Second, they increase the mobility of individual users by giving access to data anywhere and at any time. Third, they are extensible, so storage can be upgraded in a cost-effective way. Finally, they simplify the management and administration of users and of large numbers of machines (Howard et al., 1988).

Fault tolerance for network storage and resources is an important characteristic of a distributed file system (Microsoft, 2000). A distributed file system is useful for many reasons: it presents a common, centralized view of a file system that is distributed over the network, and it allows any file to be opened and updated from any machine on the network. It also offers data sharing among multiple users, user mobility, location transparency and location independence, along with synchronization and backup capability. Distributed file systems are deployed to make data locations scalable when users need high data-processing speed, reliability, performance, availability and fault tolerance (Adamov, 2012).

Figure 1: NFS architecture

Figure 1 shows an NFS client-server model. The server implements the shared space and file system, and each client mounts the shared file system on its local system in order to access it.

The Network File System (NFS) is used to share data over the network. It offers some benefits over other file systems and makes accessing data on a remote computer as easy as accessing it on the local system. Some of its key benefits are:

1. It stores common data on a single machine, and other users on the network access the data from that machine, so the local systems use less space.

2. It allows home directories to be set up on the network server, so that machines on the network can access them from the server.

3. It allows storage devices to be used over the network, which reduces the need for removable drives.

An NFS deployment requires a few daemons to run on the client and the server so that data can be accessed remotely between them.

The nfsiod daemon runs on the client side and services requests from the server. The nfsd daemon runs on the server side and processes requests from NFS clients. Requests received by nfsd are carried out by the mountd daemon. The rpcbind daemon tells NFS clients which port the NFS server is using (FreeBSD, n.d.).

II. AIM & OBJECTIVES

Problem Statement: Many file systems are available for sharing files across a network or across different platforms. Each follows a different approach and implementation method, which leads to differences in security, performance and access, and to different issues in each area. This calls for a critical evaluation of these distributed file systems, so that readers can choose the file system best suited to their specific environment and requirements.

Aim of the project: This project aims to compare three file systems, Samba, NFS and SSHFS, and to evaluate them critically.

Objectives:

Research the different file systems and the methodology.

Research the features of the different distributed file systems and carry out a comparative and performance analysis of them.

Implement the three file systems, Samba, NFS and SSHFS, on Linux servers.

Perform experiments to analyze the file systems critically with respect to performance, application and security.

Evaluate and conclude the research study on the different distributed file systems.

III. LITERATURE REVIEW

Many distributed file systems have been implemented and are in use all over the world. Some of them are described below.

Google File System (GFS) is a distributed file system developed by Google for use in its own environment. It is designed to provide reliable and efficient access to data and is used, for example, by the search engine, which generates large volumes of data that must be retained. A GFS cluster consists of multiple nodes, divided into a master node and many chunk servers. Each file is divided into fixed-size chunks, and each chunk is assigned a unique 64-bit label by the master node when it is created; the master maintains the logical mapping from files to their chunks. Every chunk is replicated three times across the network, and more often when more redundancy is required (Tan et al., 2009).
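The bookkeeping described above can be sketched as a toy master node in Python. This is a minimal illustration under stated assumptions, not Google's implementation: it assumes files are split into fixed-size chunks, the 64 MB chunk size is an assumption not given in the text, replica placement is simply random, and the names `Master`, `create_file` and `locate` are hypothetical.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size; not stated in the text above
REPLICAS = 3                   # replication factor from the text above


class Master:
    """Toy master node: maps files to chunk labels and labels to servers."""

    def __init__(self, chunk_servers, seed=None):
        self.chunk_servers = list(chunk_servers)
        self.file_chunks = {}      # file name -> ordered list of chunk labels
        self.chunk_locations = {}  # chunk label -> servers holding a replica
        self._rng = random.Random(seed)

    def create_file(self, name, size_bytes):
        """Split a file into chunks, label each one, and place its replicas."""
        n_chunks = max(1, -(-size_bytes // CHUNK_SIZE))  # ceiling division
        labels = []
        for _ in range(n_chunks):
            label = self._rng.getrandbits(64)  # 64-bit chunk label
            while label in self.chunk_locations:  # keep labels unique
                label = self._rng.getrandbits(64)
            # Replicas go to distinct servers, so one failure loses one copy.
            self.chunk_locations[label] = self._rng.sample(
                self.chunk_servers, REPLICAS
            )
            labels.append(label)
        self.file_chunks[name] = labels
        return labels

    def locate(self, name, offset):
        """Return the chunk label and replica servers for a byte offset."""
        label = self.file_chunks[name][offset // CHUNK_SIZE]
        return label, self.chunk_locations[label]


master = Master(["cs1", "cs2", "cs3", "cs4", "cs5"], seed=1)
labels = master.create_file("crawl.log", 150 * 1024 * 1024)
print(len(labels))  # 3 chunks for a 150 MB file
```

A client that wants to read at a given offset would ask the master for the label and replica list through `locate`, then fetch the data from one of the listed chunk servers; raising `REPLICAS` models the extra redundancy mentioned above.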

Gluster file system (GlusterFS) is an open source distributed file system. It aggregates storage servers over InfiniBand RDMA or Ethernet into a single parallel network file system. It runs in user space and also supports cloud computing. GlusterFS is simple yet feature-rich: a user can export an existing directory as it is, and client-side translators can then structure the store. The latest version allows volumes to be added, deleted or migrated dynamically, which helps avoid configuration issues. GlusterFS also allows the hardware to be scaled up and avoids the bottlenecks that can affect tightly coupled file systems (Gluster, Inc., 2012).

Andrew File System (AFS) is another distributed file system; it uses a set of trusted servers to present a homogeneous name space to client workstations. It has advantages over other distributed file systems, mainly in scalability and security. Its features include authentication, access lists that control reading, writing, inserting, deleting, looking up and locking files, and management of logical volumes (Howard, 1988).
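An access list of this kind can be modelled as a set of rights per user. The Python sketch below is illustrative only and does not reproduce AFS's actual data structures or commands: the `Right` enum mirrors the six operations named above, while the user names, the `acl` table and the `is_allowed` helper are hypothetical.

```python
from enum import Flag, auto


class Right(Flag):
    """Access rights named in the text above."""
    READ = auto()
    WRITE = auto()
    INSERT = auto()
    DELETE = auto()
    LOOKUP = auto()
    LOCK = auto()


# Hypothetical access list: user name -> rights granted to that user.
acl = {
    "alice": Right.READ | Right.WRITE | Right.LOOKUP | Right.LOCK,
    "bob": Right.READ | Right.LOOKUP,
}


def is_allowed(user, needed):
    """True only if every right in `needed` has been granted to `user`."""
    granted = acl.get(user, Right(0))  # unknown users hold no rights
    return (granted & needed) == needed
```

With this table, `is_allowed("bob", Right.READ | Right.LOOKUP)` is true, while `is_allowed("bob", Right.WRITE)` is false because the write right was never granted.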

Server Message Block (SMB) is a Microsoft protocol, shipped with the Windows operating system, for sharing printers and files; it forms the basis of Network Neighborhood. It operates at the application layer, above TCP/UDP/IP. SMB 2.1 was introduced with Windows 7 and Windows Server 2008 R2 and brought a few performance enhancements and new locking mechanisms (SNIA, 2012).

Common Internet File System (CIFS) is a file-sharing protocol that client systems use to access file services on servers over the network. CIFS is nowadays a key file-sharing protocol, and enhancements to it have improved its suitability for file sharing and Internet authoring (Samba, 2012).

IV. PROJECT PLAN

This is the project plan for the research and implementation that I will follow during my MSc project. It gives a tentative effort estimate and schedule for the tasks to be carried out during my experiments, including the documentation and submission tasks. The project is expected to be completed by 24th May 2013.

V. THESIS REPORT AND ORGANISATION

Chapter-1: Introduction

This chapter will introduce the topic and cover the types of distributed file systems currently available. It will give a brief overview of distributed file systems to help readers understand their basic use and why they are needed. It will also set out the purpose of the research and the problem statement, and break the aim of the project down into generic and specific objectives.

Chapter-2: Literature Review

This chapter will review previous work on distributed file systems, most of it on open source distributed file systems, and will set out their strengths and weaknesses. I will also select a few distributed file systems on which to focus my detailed study.

Chapter-3: Design of the System

This chapter will present the design of the system that will later be implemented in virtual environments. The design will be a high-level blueprint of the system to be implemented and tested. The chapter will also describe the tools and procedures to be used during the implementation.

Chapter-4: Implementation of the System

This chapter will describe the implementation method and the high-level steps for each type of implementation. It will also record the settings used during deployment and the conditions and environments in which the systems are implemented, with a brief overview of all the tools and methods used.

Chapter-5: Experimentation

This chapter will describe the experiments performed on the implemented systems, based on performance evaluation and other factors. It will also describe the conditions and isolated environments in which the experiments were carried out.

Chapter-6: Analysis and evaluation of the experimental results

This chapter will present a critical analysis of the experimental findings, supported by numerical data. The experiments will be performed in isolated environments, and the results documented during experimentation will be analyzed in this chapter to reach the final conclusions.

Chapter-7: Conclusions and future work

This is the final chapter of the thesis. It will present the conclusions of the work done and suggest what can be done in the future to carry the research forward.

VI. REFERENCES

Adamov, A. (2012). Distributed File System as a basis of Data-Intensive Computing. 6th International Conference on Application of Information and Communication Technologies. Pages: 1–3.

Chen, P., Li, J. and Gou, X. (2011). Research of Distributed File System Based on Massive Resources and Application in the Network Teaching System. International Conference on Advanced Intelligence and Awareness Internet. Pages: 154-158.

Gluster, Inc. (2012). Gluster FS Concepts. Available at: http://www.gluster.org/community/documentation/index.php/GlusterFS_Concepts

Howard, J. H., Kazar, M. L., Menees, S. G., Nichols, D. A., Satyanarayanan, M., Sidebotham, R. N., and West, M. J. (1988). Performance in a Distributed File System. ACM Transactions on Computer Systems 6. Pages: 51–81.

Howard, J.H. (1988). An overview of Andrew file system. Available at: http://ra.adm.cs.cmu.edu/anon/usr/ftp/itc/CMU-ITC-062.pdf

Microsoft (2000). Step-by-Step Guide to Distributed File System. Available at: http://technet.microsoft.com/en-us/library/bb727150.aspx

Tan, J., Pan, X., Kavulya, S., Gandhi, R. and Narasimhan, P. (2009). Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop. Electrical & Computer Engineering Department. Available at: http://static.usenix.org/event/hotcloud09/tech/full_papers/tan.pdf

SNIA (2012). SMB remote file protocol. Available at: http://www.snia.org/sites/default/education/tutorials/2012/fall/file/JoseBarreto_SMB3_Remote_File_Protocol_revision.pdf

Samba (2012). Available at: http://www.samba.org/samba/docs/

FreeBSD (n.d.). Available at: http://www.freebsd.org/doc/en/books/handbook/network-nfs.html

VII. APPENDIX

a. Proposal document

b. Ethics form


