Intelligent Web Caching Solution


02 Nov 2017


Web caching is a widely deployed technique used to reduce the latency observed by web browsers, decrease the aggregate bandwidth consumption of an organization's network, and reduce the load incident on web servers on the Internet. Web caches are often deployed on dedicated machines at the boundary of corporate networks, and at Internet service providers.

Here I discuss the current solutions available for web caching at the corporate level as well as at the global level. I also discuss the features of each of these solutions and the pros and cons of adopting each of them in a given local area network (LAN).

This project aims to develop and implement a centralized web caching solution that optimizes the internet bandwidth usage of a given corporate or domestic network. I describe the barriers that I have to overcome in implementing such a solution, and how I can leverage existing technologies in doing so.

Table of Contents

Abstract

Table of contents

List of figures

1.0 Introduction

2.0 Background

3.0 Objectives

4.0 Procedures

5.0 Evaluation

6.0 Personnel and Facilities

7.0 Budget

8.0 References

List of figures

Figure 1 Overview of a LAN with the caching server in place.

Figure 2 HTTP traffic flow in a LAN during a Cache Hit.

Figure 3 HTTP traffic flow in a LAN during a Cache Miss.

Chapter 1

1.0 Introduction

In the world of IT, where almost everything is computerized, resource management is a key factor for success. With advances in cloud technologies such as SaaS (Software as a Service) and PaaS (Platform as a Service), internet bandwidth is now used mostly for business-critical applications rather than for traditional web browsing and video streaming. Because the latter remain important as well, internet bandwidth must be managed to support both needs.

Internet bandwidth usage for web traffic is increasing rapidly with the spread of web applications that are richer in dynamic content than ever before, including audio and video streaming and social networking. This intense use of internet bandwidth degrades the performance of the business-critical activities mentioned above.

Internet bandwidth can also be prioritized for business-critical applications by using Quality of Service (QoS). When internet bandwidth usage is at its peak, QoS comes into action and allocates the necessary bandwidth to traffic in high-priority classes, for example Voice over IP (VoIP). This ensures that business-critical applications receive adequate bandwidth, but QoS alone does not reduce the overall demand for bandwidth, so it cannot resolve the issue on its own.

The problem of high bandwidth usage can be handled effectively by implementing a web caching solution in the local area network. If web content is cached in a centralized location within the LAN when one user accesses a specific web resource, the next user can use the local copy of that resource instead of downloading or streaming it again from the original source over the internet connection. This also reduces the time taken for the content to reach the end user. The biggest gain, however, is the saving of internet bandwidth that would otherwise be used to download or stream the same content each time a similar request is received.

The rest of the proposal discusses the background of this project, related research by others, the objectives to be accomplished, the procedures that will be carried out, how the final solution will be evaluated, the personnel involved and the allocation of resources, and the budget.

Chapter 2

2.0 Background

Over the past few years, several studies have been carried out on web caching with the intention of optimizing internet bandwidth usage. Most of them address the issue with a distinct approach.

At present, Content Delivery Networks (CDNs) are among the most widely used caching mechanisms in the web community [2]. A CDN appears to be the ideal solution for the latency experienced when accessing web resources located in different geographical regions. In a CDN, web caching is achieved hierarchically: the original content is cached on a single master server or a cluster of master servers and then replicated to downstream servers, called CDN Surrogates, in different regions. This is a different flavor of web caching, in which content redirection is achieved using the Domain Name System (DNS) [1]. With the customers' consent, the CDN provider manages the customers' DNS records and updates them to redirect web traffic to its surrogate servers, which hold cached copies of the original content. When a web request arrives and the content is not available on the surrogate server the request was redirected to, the surrogate requests the content from the CDN master server(s). If the content is also unavailable at the master server, the master requests it from the original web server, caches the retrieved content, and delivers it to the surrogate server. Finally, the surrogate server delivers the content to the client that made the web request.
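The surrogate, master, and origin lookup chain described above can be sketched in a few lines of Python. This is only an illustration of the tiered fallback, not how any particular CDN is implemented; the class name `CacheTier` and the `fetch_upstream` callback are hypothetical names introduced for this example, and DNS redirection is left out entirely.

```python
from typing import Callable, Dict


class CacheTier:
    """One level of a cache hierarchy (hypothetical helper for illustration).

    `fetch_upstream` is whatever the tier asks on a miss: a surrogate asks
    the master, and the master asks the original web server.
    """

    def __init__(self, name: str, fetch_upstream: Callable[[str], bytes]):
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.fetch_upstream = fetch_upstream

    def get(self, url: str) -> bytes:
        # Cache hit: answer from the local store.
        if url in self.store:
            return self.store[url]
        # Cache miss: ask the next tier up, keep a copy, then answer.
        body = self.fetch_upstream(url)
        self.store[url] = body
        return body
```

A master tier would be built with a function that contacts the original web server, and each surrogate with `master.get` as its upstream, so that a second surrogate asking for the same URL is served from the master's copy rather than from the origin.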

Akamai Technologies is the market leader in content delivery services [2]. It owns more than 12,000 caching servers spread over 1,000 networks in 62 countries worldwide [2]. This approach is suitable for web caching at a global scale but not at a local or domestic scale, so a CDN is not a good fit for a typical local area network. Although the CDN's hierarchical design could be used in such a solution for load balancing and high availability, content redirection via DNS would be impossible to achieve, because every web request to an external resource resolves to a public IP address. For a small to medium-sized environment, a hierarchical design of this type would add little value because the volume of web traffic is comparatively low.

In [8], Sitaram Iyer, Antony Rowstron and Peter Druschel discuss the possibility of a decentralized web cache that makes use of the local cache of every web browser on every personal computer in a local area network. One advantage of this approach is that it avoids the hardware investment that a centralized web cache would require. The disadvantage I see is the possible congestion in the corporate network caused by the cache traffic exchanged between the computers. However, this may not be a critical issue in a multi-gigabit Ethernet network.

There are a few commercial products as well, and Cachebox [4] is one of them. It is a hardware appliance, rather than a pure software solution, in which the hardware is customized for optimum performance of the caching software. The vendor, ApplianSys, delivers this solution in different models to meet different demands. In my opinion this is a great product, since it is an all-in-one, plug-and-play solution: there is no need to buy new hardware or to struggle with tuning the hardware for the software. The disadvantage is the cost of the product, which may not be affordable for small to medium-sized businesses.

OracleAS Web Cache is another commercial product, delivered as web caching application server software. As described in [3], it is a content-aware server accelerator, or "reverse proxy", that improves the performance, scalability, and availability of web sites that run on Oracle Application Server. By storing frequently accessed URLs in memory, OracleAS Web Cache eliminates the need to repeatedly process requests for those URLs on the application web server and database tiers. Unlike legacy proxies that handle only static objects, OracleAS Web Cache caches both static and dynamically generated content from one or more application web servers. Because it is able to cache more content than legacy proxies, it greatly reduces the load on the web application server and database tiers. According to [3], as an external cache it is also an order of magnitude faster than object caches that run within the application tier.

Among the free products of this kind, one of the best known is Squid [5], an open source web caching proxy. In my opinion it is the ideal solution for web caching in a local area network, for the following reasons.

Affordable for any business, since it is free.

Can be customized as required, since it is open source.

Designed to run on Linux, which is also a free operating system. (Recent versions are designed to run on Microsoft Windows as well.)

One problem with this product is the additional administrative overhead of configuring every web browser to use it, since it is a proxy. This is not a problem when the user base is small. The other problem is that web content can be accessed without going through the cache in networks where the use of a proxy is optional. In such cases, web browsers that are not configured to use the Squid proxy retrieve web content directly from the original source without using the cache.

Chapter 3

3.0 Objectives

The primary objective of this project is to minimize the bandwidth usage of a given internet connection by caching web content that has already been accessed during previous sessions. This will be accomplished by implementing a web caching server in the local area network that caches web content once it has been accessed by a user and delivers the cached content to the next user, or to the same user, upon the next request for the same web resource.

The secondary objective of this project is to deploy the solution in a given local area network seamlessly, with little administrative overhead. I plan to achieve this by deploying the caching server in-line with the network, so that the outbound traffic of the LAN is routed through it. This will enable it to see the outbound internet traffic and filter out the HTTP traffic from it for caching. This in-line deployment is similar to the concept of a surrogate server in a CDN architecture, except that there will be no traffic redirection using DNS, nor will there be any CDN Masters from which unavailable content is downloaded or streamed. Such a hierarchy could be added in a more distributed setup.

This server should be able to analyze the internet traffic of a given LAN and identify the HTTP (Hypertext Transfer Protocol) traffic in it. Once the HTTP traffic is filtered from the main traffic flow, the server should check the HTTP header of each conversation and decide whether the request should be forwarded to the actual destination or serviced by the server itself. The Cache Manager* makes this decision. It should be able to check the cache store quickly and identify whether the requested content has already been cached. The Conversation Manager* handles the communication between the client and the server. It asks the Cache Manager to fetch the content requested by the client, and the Cache Manager then checks the cache store for that content. If the content is found, the Cache Manager delivers it to the Conversation Manager, which in turn delivers it to the end user. If the content is not present, the Cache Manager forwards the request to the actual destination, fetches the content, caches it, and then delivers it to the Conversation Manager, which in turn delivers it to the client.
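The division of work between the Cache Manager and the Conversation Manager could look roughly like the following Python sketch. It is a simplified illustration under several assumptions that are not part of the proposal: only plain GET requests are handled, the cache is an in-memory dictionary keyed by URL, the origin fetch is passed in as a function (`fetch_origin`, a hypothetical name), and the `X-Cache` response header is used purely to make the hit or miss visible.

```python
from typing import Callable, Dict, Tuple


class CacheManager:
    """Decides whether a request is served from the cache or forwarded."""

    def __init__(self, fetch_origin: Callable[[str], bytes]):
        self._store: Dict[str, bytes] = {}   # assumed in-memory cache store
        self._fetch_origin = fetch_origin    # hypothetical origin fetcher

    def lookup(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, was_hit) for the requested URL."""
        if url in self._store:
            return self._store[url], True            # Cache Hit
        body = self._fetch_origin(url)               # Cache Miss: forward
        self._store[url] = body                      # keep a copy
        return body, False


class ConversationManager:
    """Talks to the client and delegates retrieval to the Cache Manager."""

    def __init__(self, cache_manager: CacheManager):
        self._cache = cache_manager

    def handle(self, raw_request: bytes) -> bytes:
        # Split off the header section and parse the request line.
        head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
        lines = head.split("\r\n")
        method, target, _version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if method != "GET":
            raise ValueError("this sketch only handles GET requests")
        # The Host header plus the request target identify the resource.
        url = "http://" + headers["host"] + target
        body, hit = self._cache.lookup(url)
        status_line = "HTTP/1.1 200 OK\r\n"
        extra = f"Content-Length: {len(body)}\r\nX-Cache: {'HIT' if hit else 'MISS'}\r\n\r\n"
        return (status_line + extra).encode("iso-8859-1") + body
```

Passing the origin fetch in as a function keeps the two modules testable without a network connection, which fits the plan of testing each module during development.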

The other objective is to measure internet bandwidth usage before and after the caching server is implemented in a local area network. I will carry out this test in a multi-user environment where a single internet connection is shared among the users. The evaluation procedure is discussed further in the Evaluation section of the proposal.

* These are modules of the caching solution that facilitate the caching functionality.

Chapter 4

4.0 Procedures

Planning

Identifying the dependencies in modules

This solution will be a composite product made up of many modules with different functions. A modular approach is convenient because it simplifies both development and troubleshooting. During this phase I plan to identify the dependencies among these modules. This will help me test them throughout the development process, and it will also help me allocate development time effectively, based on each module's complexity and on how many other modules depend on it.

Choosing the main software language for the development

As this solution is a software product, choosing a suitable programming language is crucial. During this activity I will evaluate the features of the available programming languages to identify the best one for the development.

I find this step necessary because, if I start with an arbitrary language and discover after a few months that a certain feature or function cannot be implemented in it, rewriting every line of code in another language to continue the development would be a laborious task.

Choosing the platform to host the solution

By platform I mean the target operating system on which this solution will be hosted. I treat this as a separate task because the choice strongly affects the timeline. For example, if I choose an operating system that does not give external software access to certain features, such as packet-level analysis (which I need in order to identify HTTP headers), rewriting the code to support another target host would again be a laborious task.

Development

Development of the individual modules

This activity is the heart of the project. Every module that collectively makes up the final solution will be developed during this phase.

I plan to test each module as it is developed so that errors do not pile up at the end. I believe this approach will reduce the probability of a large number of unexpected software bugs.

Implementation

Deploying the solution in production

In this activity I am planning to implement the web caching solution in a sample, multi-user environment and evaluate its impact on the internet bandwidth usage.

I will be deploying the caching server in-line with the LAN during this procedure.

The following diagram depicts the deployment of the caching server in a production network.

Figure 1: Overview of a LAN with the caching server in place.

The following diagrams depict the HTTP traffic flow during a Cache Hit and a Cache Miss at the caching server.

Cache Hit

Figure 2: HTTP traffic flow in a LAN during a Cache Hit.

Cache Miss


Figure 3: HTTP traffic flow in a LAN during a Cache Miss.

Chapter 5

5.0 Evaluation

To evaluate the product I will create a sample multi-user environment, such as a small Windows domain environment.

Evaluating the caching functionality

To demonstrate this I will view an online document, such as a PDF, as a normal user and let the server cache it (which should happen automatically). Then I will view the same file as another user. This time the file should be delivered from the caching server instead of the actual web server. I will monitor the usage of the internet connection in both scenarios to show that the bandwidth is not used in the second one.

If time permits, I will try to design the solution to cache video and audio as well. If I complete this portion by the end of the timeline, I will demonstrate it by streaming a YouTube video as one user and then viewing the same video, streamed from the cache, as another user or as the same user.

Evaluating the effective bandwidth usage

I will measure the internet bandwidth used by a few web-related activities, such as viewing online documents and images, without the caching server in place, then measure the bandwidth used by the same activities with the caching server in place, and compare the two. This way I will be able to state the specific percentage of bandwidth that can be saved by using this solution.
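The comparison above reduces to a simple percentage. As a sketch, assuming the two measurements are total bytes transferred over the internet link for the same set of activities (the function name and the figures in the note below are hypothetical, not results):

```python
def bandwidth_saved_percent(bytes_without_cache: int, bytes_with_cache: int) -> float:
    """Percentage of internet traffic avoided by serving content from the cache.

    Both arguments are totals measured on the internet link for the same
    activities, first without and then with the caching server in place.
    """
    if bytes_without_cache <= 0:
        raise ValueError("the baseline measurement must be positive")
    saved = bytes_without_cache - bytes_with_cache
    return 100.0 * saved / bytes_without_cache
```

For instance, with purely illustrative figures of 500 MB transferred without the cache and 350 MB with it, the function returns 30.0, meaning 30% of the bandwidth was saved.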

Chapter 6

6.0 Personnel and Facilities

I am working on this project alone and I will thus be assuming all the project roles. I will be handling all the phases of the project. The roles I will be assuming, in brief, are as follows.

Choosing the main software development language for the product development.

Choosing the target platform to host the final solution.

Development of the individual modules of the product.

Testing the developed modules.

Debugging the developed modules.

Inter-connecting the modules to work in harmony to achieve the final goal.

Testing for conflicts in operations among the modules.

Testing for the caching functionality (whether web content gets cached when the solution receives web requests).

Testing for request forwarding (whether the web requests are forwarded to the original destination during a Cache Miss).

Testing for updated content delivery (whether the solution always delivers only the updated content to the user and not the stale content).

Testing how the solution reacts in a critical condition such as insufficient storage for caching.

Testing whether the solution services web requests from its cache during Cache Hits when the internet connection is offline.
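One of the tests listed above concerns insufficient storage for caching. The proposal does not say how the solution should react in that case; one common policy, assumed here purely for illustration, is to evict the least recently used (LRU) entries until the new content fits. The class name `BoundedCache` and its byte-based capacity are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class BoundedCache:
    """Cache store with a byte limit and LRU eviction (an assumed policy)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        body = self._items.get(url)
        if body is not None:
            # Mark the entry as most recently used.
            self._items.move_to_end(url)
        return body

    def put(self, url: str, body: bytes) -> bool:
        # Content larger than the whole store is simply not cached.
        if len(body) > self.capacity_bytes:
            return False
        # Replacing an existing entry frees its old space first.
        if url in self._items:
            self.used_bytes -= len(self._items.pop(url))
        # Evict least recently used entries until the new body fits.
        while self.used_bytes + len(body) > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._items[url] = body
        self.used_bytes += len(body)
        return True
```

A test for the critical condition could then fill such a store to its limit, add one more object, and check that requests for the evicted content are forwarded to the original destination as a normal Cache Miss.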

For demonstration purposes I will use my own equipment to host the caching server and the sample Windows domain environment that mimics the multi-user environment. I will use an internet connection of my own (probably a portable USB modem) to mimic an average internet connection.

Chapter 7

7.0 Budget

I will be developing my own software solution using existing hardware. I will also use freely available resources, such as open source languages and tools, to assist the development. Thus no material costs will be incurred during this project.

Chapter 8


