Software System And The Source Of Information


02 Nov 2017


The Google File System (GFS) is a distributed file system that gives multiple host computers reliable access to files over a computer network. Designed for system-to-system interaction, it is used primarily by Google’s data storage platform and search engine. Features of this proprietary system include recovery from component failures, efficient management of huge files, support for large streaming reads and for concurrent large appends to the same file, and high sustained bandwidth.

The source of information for this software system is the Google File System paper presented at SOSP 2003 (Ghemawat, Gobioff and Leung, 2003): http://www.cs.rochester.edu/meetings/sosp2003/papers/p125-ghemawat.pdf

2. THE ARCHITECTURE DIAGRAM

The Google File System (GFS) is organized into clusters of computers. Within each cluster there are three types of nodes: the client node, the master node and the chunk servers. A client node can be any computer or application that makes a file request. Each cluster is coordinated and controlled by a single master node. Coordination between the master server and the chunk servers is established via control lines between their manager components. The responsibilities of the master node include managing the file namespace and the reports from chunk servers, creating, removing and replicating chunks, and using the file namespace and chunk list to map each client request to the respective chunk servers. Each cluster also contains a large number of chunk servers, which store the distributed chunks of data in their local storage. Each chunk has a fixed size of 64 MB and is identified by a unique 64-bit chunk handle assigned by the master server when the chunk is created. The chunk is then replicated onto several other chunk servers to increase the reliability of the system.
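The mapping performed by the master can be illustrated with a minimal sketch. It assumes a toy in-memory master; the class name `Master`, its methods and the server names are hypothetical and are not taken from the GFS paper. Only the 64 MB chunk size and the 64-bit handle come from the description above.

```python
import itertools
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size, as stated in the text


@dataclass
class Master:
    """Toy master (hypothetical): holds only metadata, never file contents."""
    # file name -> ordered list of chunk handles
    namespace: dict = field(default_factory=dict)
    # chunk handle -> names of the chunk servers holding a replica
    locations: dict = field(default_factory=dict)
    _next_handle: itertools.count = field(
        default_factory=lambda: itertools.count(1))

    def create_chunk(self, filename, servers):
        """Assign a new unique handle and record where its replicas live."""
        handle = next(self._next_handle) & 0xFFFFFFFFFFFFFFFF  # fits in 64 bits
        self.namespace.setdefault(filename, []).append(handle)
        self.locations[handle] = list(servers)
        return handle

    def lookup(self, filename, byte_offset):
        """Translate (file, byte offset) into (chunk handle, replica servers)."""
        chunk_index = byte_offset // CHUNK_SIZE  # which chunk holds this byte
        handle = self.namespace[filename][chunk_index]
        return handle, self.locations[handle]
```

A client would call `lookup("/logs/a", offset)` once and then contact one of the returned chunk servers directly, so file data never passes through the master.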


Figure 1 Google File System Architecture

3. QUALITY ATTRIBUTES ADDRESSED BY THE DESIGN

Scalability

As Google requires a file system platform that can handle a very large amount of data, the Google File System must be highly scalable. Scalability here means that storage capacity can be added without degrading performance or requiring an overhaul of the system design. With the above design, capacity is increased simply by adding new chunk servers to a cluster.

Reliability

Reliability is always a concern for a heavily utilized system. Holding such a large amount of data, the Google File System must preserve that data when components fail. The above design achieves this by replicating each chunk of data onto several chunk servers. If a chunk server fails, its data can still be found on other chunk servers. The master server’s state is also replicated, and replica servers (Shadow Masters) in the cluster can continue to serve read requests should the master fail.
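How replication masks a chunk server failure can be sketched as follows. This is an illustration under stated assumptions, not the actual GFS procedure: the function name `handle_server_failure`, the data layout and the replica target of three are all chosen for this example.

```python
REPLICATION_TARGET = 3  # assumed number of replicas per chunk for this sketch


def handle_server_failure(locations, failed, live_servers):
    """Drop the failed server and re-replicate chunks that fell below target.

    locations: dict mapping chunk handle -> list of server names (mutated).
    Returns a list of (handle, source, destination) copy instructions.
    """
    copies = []
    for handle, servers in locations.items():
        if failed in servers:
            servers.remove(failed)
        # A chunk with no surviving replica cannot be copied from anywhere.
        while servers and len(servers) < REPLICATION_TARGET:
            candidates = [s for s in live_servers
                          if s != failed and s not in servers]
            if not candidates:
                break  # not enough live servers to restore the target
            destination = candidates[0]
            copies.append((handle, servers[0], destination))
            servers.append(destination)
    return copies
```

For example, with `locations = {1: ["a", "b", "c"]}` and live servers `["b", "c", "d", "e"]`, a failure of `"a"` yields one copy instruction from `"b"` to `"d"`, and chunk 1 is again held by three servers.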

Availability

Alongside reliability, the Google File System must remain available to its clients. The above design provides high aggregate bandwidth for read/write operations, which maximizes the number of clients the system can serve. The work of handling a request is distributed among the nodes in the system: the master server handles only metadata operations on files and directs the client to the relevant chunk servers, from which the file data is retrieved directly. This distribution of operations keeps the master from becoming a bottleneck and increases the availability of the system.

Simplicity

The Google File System is a huge system, and simplicity pays off when the system undergoes maintenance or when chunk servers are added. By using the single-master paradigm, the above design simplifies coordination among the many components in the architecture.

4. DESIGN STYLE USED IN THE DESIGN

The design style of the above architecture is single master, multiple chunk servers. The master server acts as coordinator for the cluster of computers and manages the metadata describing files, chunks and chunk servers. It also handles file requests from clients and directs them to the respective chunk servers.

In our opinion, this design can be described as a heterogeneous style that combines the Client-Server and data-centered architectural styles. The Google File System provides file access to clients through a single master server, as in a Client-Server style where clients share access to information resources. The master server directs each operation to the chunk servers determined by the file request. This resembles a data-centered style in which the master server acts as the central data store component and the chunk servers act as the knowledge sources.

5. SPECIAL TECHNIQUES OR SOLUTIONS USED

The Google File System employs several special techniques to keep the system performing smoothly. They were built into the system to compensate for the unreliability of its components.

Replicating data chunks across several chunk servers and keeping shadow masters as copies of the master server give the system added reliability. The components coordinate by exchanging messages, such as the periodic heartbeat messages between the master and the chunk servers, so that when a component fails, another can take over its role. Unique chunk identifiers allow the validity of each replicated chunk to be verified. This provides a reliable recovery process in which data is not lost should a hardware component fail.

The master server uses rebalancing: it monitors the entire cluster and moves chunks of data from one chunk server to another. This spreads the workload evenly across the large number of chunk servers.
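A rebalancing pass might look roughly like the sketch below. It is a simplified greedy scheme written for illustration, assuming that the number of chunks on a server is an adequate proxy for its load; the function name `rebalance` and the data layout are hypothetical, and the real master may weigh other factors.

```python
def rebalance(server_chunks):
    """Greedily move chunks from the fullest server to the emptiest one.

    server_chunks: dict mapping server name -> set of chunk handles (mutated).
    Returns the list of (handle, source, destination) moves performed.
    """
    moves = []
    while True:
        src = max(server_chunks, key=lambda s: len(server_chunks[s]))
        dst = min(server_chunks, key=lambda s: len(server_chunks[s]))
        # Stop once the fullest and emptiest servers differ by at most one.
        if len(server_chunks[src]) - len(server_chunks[dst]) <= 1:
            return moves
        # Only move a chunk the destination does not already hold, so two
        # replicas of the same chunk never end up on one server.
        movable = sorted(server_chunks[src] - server_chunks[dst])
        if not movable:
            return moves
        handle = movable[0]
        server_chunks[src].remove(handle)
        server_chunks[dst].add(handle)
        moves.append((handle, src, dst))
```

Starting from one server holding four chunks, one holding a single chunk and one empty server, the loop stops after two moves with every server holding one or two chunks.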

Stale replica detection and garbage collection are used when replicas of chunks are no longer valid. The master server marks such chunks as stale replicas, which then become garbage chunks. After an allotted amount of time, the master server deletes the garbage chunks permanently.
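The two steps can be sketched as below, assuming that each chunk carries a version number that the master compares against the version reported by each replica, and that garbage is reclaimed after a configurable grace period. The function names, parameters and data layout are hypothetical.

```python
def find_stale_replicas(master_version, replica_versions):
    """Return the servers whose replica version lags the master's version.

    replica_versions: dict mapping server name -> reported chunk version.
    """
    return sorted(server for server, version in replica_versions.items()
                  if version < master_version)


def collect_garbage(garbage, now, grace_seconds):
    """Delete garbage chunks whose grace period has elapsed.

    garbage: dict mapping chunk handle -> time it was marked as garbage
    (mutated). Returns the handles that were permanently removed.
    """
    expired = sorted(handle for handle, marked_at in garbage.items()
                     if now - marked_at >= grace_seconds)
    for handle in expired:
        del garbage[handle]
    return expired
```

Delaying the deletion gives a window in which a chunk marked as garbage by mistake can still be recovered.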

To detect data corruption, the Google File System uses checksums. Each chunk is divided into 64 KB blocks, and each block has a 32-bit checksum that is used to verify the integrity of the data held in a chunk’s replicas.
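As a sketch of checksum-based verification, the example below assumes that checksums are kept per 64 KB block of a chunk (the granularity described in the GFS paper) and uses CRC-32 from Python's standard `zlib` module as the 32-bit checksum. The choice of CRC-32 and the function names are assumptions made for illustration; the paper does not commit to this exact scheme.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks


def block_checksums(data: bytes):
    """Compute one 32-bit CRC per 64 KB block of the given chunk data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def corrupted_blocks(data: bytes, expected):
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data)
    if len(actual) != len(expected):
        raise ValueError("block count differs from the stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A chunk server following this idea would store the checksums when data is written and call `corrupted_blocks` before returning data to a reader. If any index comes back, the affected block can be restored from another replica.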

6. DISCUSSIONS

To meet the demands of the World Wide Web, Google devised its own architectural style for its distributed file system, the Google File System. Faced with the routine use of very large files, Google knew that a traditional computer file system would not solve its problem. Instead, Google designed a system focused on reliability, scalability and simplicity. The "one master, multiple chunk servers" design allowed Google to build the Google File System on inexpensive commodity hardware while remaining a reliable, scalable storage platform. It also uses software to manage the inevitable failure of hardware, allowing Google to replace failed components at significantly lower cost. By building its own platform, Google is able to control its system and distinguish it from off-the-shelf alternatives.

In conclusion, it is important for software engineers to choose an appropriate software architecture for their projects. While there are no right or wrong designs, an inappropriate design style may hinder a software project by lacking crucial quality attributes, and may even introduce unnecessary costs. Engineers must therefore rely not only on technical know-how but also on experience and innovation to deliver higher-quality software systems.


