A Standard Definition Of Web Analytics


"the objective tracking, collection, measurement, reporting, and analysis of quantitative Internet data to optimize websites and web marketing initiatives" (http://www.webanalyticsassociation.org/ 2006).

Looking at this definition it becomes clear that Web Analytics can help to understand where problems exist, but it does not explain how to solve these problems or how to optimize the website (Wu et al. 2009, p.164). That further step needs to be taken by an analyst who understands the metrics produced through WA and can translate them into actions.

Historically, Web Analytics was part of the work of the IT department, as it dealt with the web server's log files and databases. Today, with the help of analytic tools, this has changed. It is often still located within IT, but it is increasingly viewed as a business function (Hassler 2010; Kaushik 2007). The data is used for website improvements on the one hand, but, as the definition above shows, it is also used for marketing, sales and support on the other (Kaushik 2007, p.93). This means the strategic goals of Web Analytics are not only the prediction of user behavior, but also the redesign of the website according to user interests (Norguet et al. 2006, p.430), the improvement of the quality of a website (Weischedel & Huizingh 2006, p.463), and the generation of insights that support better decisions. In order to achieve all this and to improve the usefulness and usability of a website, it is key to assess the users' reactions and interactions (Sullivan 1997). And "The only information left behind by many users visiting a Web site is the trace through the pages they have accessed" (Spiliopoulou 2000, p.128).

By 2002 Web Analytics was already a billion-dollar-a-year industry (Chi 2002, p.64). Companies spend millions of dollars on Web Analytics to optimize websites and campaigns, which in turn should bring them billions of dollars in online revenue (Kaushik 2007, p.xxviii). Web Analytics is a booming business (Weischedel & Huizingh 2006, p.463). But as Albert Einstein argued: "Not everything that counts can be counted, and not everything that can be counted counts." Therefore it is important to really understand the data, the metrics that are drawn from this data, and how to interpret them in a meaningful way.

Data collection for Web Analytics can be divided into quantitative and qualitative methods (see Table 3). Within the quantitative methods a further distinction is made between server-side data collection, client-side data collection and alternative methods that collect data at a third level. Server-side data collection captures data from the perspective of the server and what the server processes; log files are the data captured here. The client-side collection methods, page tagging and web beacons, show what happened on the website from a user perspective. Methods that capture data at an alternative point are, for example, packet sniffers or reverse proxies; here, the data collection is placed between the server and the user (Hassler 2010; Kaushik 2007). Common qualitative data collection methods include lab usability testing and surveys.

Besides these methods, Kaushik (2007, pp.24 & 39) outlines a different grouping of data. He distinguishes between four groups: clickstream (web logs, web beacons, JavaScript tags, packet sniffing), outcomes (visitors, page views, time, referrers), research/qualitative (surveys, heuristic evaluation, usability testing, site visits) and competitive data.

Within this research project the aim is to get an overview of the quantitative methods (clickstream) and the metrics (outcomes) as these are the data used by Web Analytic tools. Therefore only the quantitative methods are further outlined here.

Historically, the first method used for Web Analytics data collection was to capture the log files generated on the web server (Kaushik 2007; Hasan et al. 2009). However, as other methods have since been developed that try to overcome the issues of log files, all methods are outlined below.

Originally, log files were developed for debugging purposes (Suneetha & Krishnamoorthi 2009, p.328). The web log is a file created by the web server; each time a particular resource is requested, the server writes information to the log (Pani et al. 2011, p.19). When a user requests data from a server via a URL, the server accepts the request and creates an entry (log) for this particular request before sending the web page to the visitor (Kaushik 2007). It thus records information about the activities on the server. The first logs, server error logs, were developed to discover errors during the transmission of files through the web (Kaushik 2007) and not to provide information for Web Analytics (Kohavi et al. 2004, p.86). Nevertheless, they were the original data source of the web, and their potential was recognized quite early: in 1995 Dr. Stephen Turner developed the first log analyzer tool, called Analog (Wu et al. 2009, p.163). As the number of accesses to web pages increased rapidly, the number of log entries increased as well (Kumar Jain et al. 2009). Peacock (2002, p.6) stated that "No other survey technique generates as much data for so little effort. The trick is to turn it into useful information and practical applications". When managed properly, log files can be a good data source for extracting interesting patterns about user behavior (Pani et al. 2011, p.15) and for obtaining information to improve decision making (Kumar Jain et al. 2009). Used systematically, they can be a very effective tool for gathering passive feedback from users (Sullivan 1997).

Web server logs are plain text (ASCII) files and independent of the server platform. Different web servers may save different information within their logs. Table 4 gives a short overview of this basic information.

In addition to the examples outlined above, demographic information such as country information can also be captured. It is important to note that it is not personal information about the user that is captured here, but technological details, i.e. the location of the server used by the user (Weischedel & Huizingh 2006, p.465).

The format in which this data is captured and displayed depends on the log file format used on the web server. Besides custom log formats, which can be configured individually, common standard formats for web server log files are the W3C Common Log Format (CLF) and the Extended Common Log Format (ECLF) (Pani et al. 2011). The ECLF is the most commonly used format (Sen et al. 2006, p.85). The following is an example of what an ECLF log entry can look like:

www.lyot.obspm.fr - - [01/Jan/97:23:12:24 +0000] "GET /index.html HTTP/1.0" 200 1220 "http://www.w3perl.com/softs/" "Mozilla/4.01 (X11; I; SunOS 5.3 sun4m)" (W3perl 2011)

Notwithstanding that different web servers may save different types of information, the basic information maintained is often similar. Because of this standardized way in which web server logs are saved, it is also possible to analyze data from the past (Hassler 2010, p.58).
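
To illustrate how such an entry could be broken into its individual fields for analysis, the following is a minimal sketch in JavaScript, using the example line above; the regular expression and the field names are assumptions chosen for illustration, not part of any standard tool:

    // Minimal sketch: splitting an ECLF-style log line into its fields.
    var line = 'www.lyot.obspm.fr - - [01/Jan/97:23:12:24 +0000] ' +
        '"GET /index.html HTTP/1.0" 200 1220 ' +
        '"http://www.w3perl.com/softs/" "Mozilla/4.01 (X11; I; SunOS 5.3 sun4m)"';
    var pattern = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;
    var m = line.match(pattern);
    if (m) {
        var entry = {
            host: m[1],              // client host or IP-address
            timestamp: m[4],         // date and time of the request
            request: m[5],           // method, URL and protocol
            status: Number(m[6]),    // HTTP status code
            bytes: m[7] === '-' ? 0 : Number(m[7]),
            referrer: m[8],          // page the visitor came from
            userAgent: m[9]          // browser or bot identification
        };
        console.log(entry);
    }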

Besides the advantage of log files being able to collect huge amounts of data with little effort, a number of challenges and limitations exist. As server logs capture each activity of the server when a site is requested, one site request by a user can lead to several log entries. If, for example, a site is requested that includes a picture, loading the picture will produce a separate entry; the same applies to CSS files or Flash videos. Image entries and similar ones need to be identified correctly and removed for the purpose of usage mining (Pani et al. 2011, p.20; Hassler 2010, p.46). In addition, entries created by web robots, spiders and crawlers (bots) need to be removed (Pani et al. 2011, p.21). Bots are automated programs that visit a website (Heaton 2002). They are used, for example, by web search engines or site monitoring software in order to see what is available at a site. As the number of bots today is enormous, they can dramatically distort the results (Kohavi et al. 2004, p.95). Furthermore, if a page was cached and a user receives the cached page, no server log entry will be written, as the server does not receive the request; thus the measures would be underestimated. The same applies if users use the back button of the browser to switch between pages (Pani et al. 2011, p.21; Cooley et al. 1997).
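
A rough sketch of such a cleaning step is shown below, assuming the log has already been parsed into objects carrying a request URL and a user-agent string; the field names and filter patterns are illustrative only and by no means exhaustive:

    // Minimal sketch: removing image/CSS requests and obvious bot traffic before usage mining.
    var ASSET_SUFFIXES = ['.gif', '.jpg', '.jpeg', '.png', '.css', '.js', '.swf'];
    var BOT_HINTS = ['bot', 'crawler', 'spider', 'slurp'];

    function isAssetRequest(url) {
        var path = url.split('?')[0].toLowerCase();
        return ASSET_SUFFIXES.some(function (suffix) { return path.endsWith(suffix); });
    }

    function looksLikeBot(userAgent) {
        var ua = (userAgent || '').toLowerCase();
        return BOT_HINTS.some(function (hint) { return ua.indexOf(hint) !== -1; });
    }

    function cleanEntries(entries) {
        return entries.filter(function (e) {
            return !isAssetRequest(e.url) && !looksLikeBot(e.userAgent);
        });
    }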

Another limitation of server logs is the fact that the time spent on the last page cannot be measured. The request for the following website will generate a log entry on that site's web server, but no hint is sent back to the first server; thus the time spent on the exit page cannot be determined (Hofacker 2005, p.233). The identification of individual users is another problem. If users are not logged in to a site, the IP-address is often the information used to identify a user. But in a big company network, for example, multiple users share one IP-address, and individual users cannot be separated (Weischedel & Huizingh 2006, p.465). Furthermore, with dynamic IP-addresses it can happen that a user visiting a site for the second time is counted as a new visitor only because his/her IP-address has changed since the last visit (Hassler 2010, p.47). Reliable distinction of users can only be guaranteed by user authentication or by using cookies (Spiliopoulou 2000, p.129). A cookie is "a message given to a web browser by a web server. The browser stores the message in a text file. The message is then sent back to the server each time the browser requests a page from the server" (Web Analytics Association). The problem here is that more and more users do not allow the storing of cookies or delete them regularly. Hassler (2010) refers to a study by the American Internet analysis company comScore (Abraham et al. 2007) which concludes that about one third of all Internet users regularly delete cookies and around 12 percent do not allow cookies to be saved at all. These users will distort, i.e. lead to an underestimation of, the measurements.
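
As a rough illustration of the mechanism only, a first-party visitor cookie could be set and read in the browser roughly as follows; the cookie name, lifetime and ID scheme are arbitrary assumptions and not taken from any particular tool:

    // Minimal sketch (browser JavaScript): a persistent visitor-ID cookie.
    // If the user blocks or deletes cookies, the ID is lost and the visitor
    // will be counted as new on the next visit.
    function getCookie(name) {
        var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
        return match ? decodeURIComponent(match[1]) : null;
    }

    function ensureVisitorId() {
        var id = getCookie('visitor_id');
        if (!id) {
            id = 'v-' + Math.random().toString(36).slice(2); // illustrative, not a robust identifier
            var twoYears = 60 * 60 * 24 * 365 * 2;           // lifetime in seconds
            document.cookie = 'visitor_id=' + encodeURIComponent(id) +
                '; max-age=' + twoYears + '; path=/';
        }
        return id;
    }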

A few years ago the majority of web metrics were generated using server log files. They can provide an overview of users' behavior, existing problems within the website, and the technology used (Weischedel & Huizingh 2006, p.464). But in recent years other data collection techniques for Web Analytics have been developed (e.g. page tagging) which try to overcome the challenges and limitations of web server logs and even try to analyze more or different data. Log files only capture server-side data at the moment of a request; they contain no information about what the user is doing between clicks on new pages, nor about which settings are used (Kaushik 2007, p.54). Because of the little development in web logs and other positive innovations such as JavaScript tags, Kaushik (2007, p.27) recommends using web logs now only to analyze search engine robot behavior and to measure success in search engine optimization. In all other cases other data capturing methods should be used.

When using JavaScript tagging, each web page needs to include a short piece of JavaScript code. When a visitor requests a URL from a web server, the web server sends back the page, including the JavaScript code. This code is executed while the page loads; it captures data such as page views and cookies and sends them to a data collection server. The variety of data that can be captured is huge, ranging from clicks and the position of the cursor, to mouse movements and keystrokes, to the window size of the browser and installed plug-ins. Moreover, any information that can be captured by log files can also be captured with tagging (Kaushik 2007, p.54). The following is an example of the JavaScript code that needs to be included in each page when using the tool Piwik:

<!-- Piwik -->
<script type="text/javascript">
  var pkBaseURL = (("https:" == document.location.protocol) ? "https://{$PIWIK_URL}/" : "http://{$PIWIK_URL}/");
  document.write(unescape("%3Cscript src='" + pkBaseURL + "piwik.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
  try {
    var piwikTracker = Piwik.getTracker(pkBaseURL + "piwik.php", {$IDSITE});
    piwikTracker.trackPageView();
    piwikTracker.enableLinkTracking();
  } catch( err ) {}
</script>
<!-- End Piwik Code -->

(Source: Piwik 2011)

With this data collection method it is possible to capture more data, more accurately, than with web beacons or log files. JavaScript tagging was developed at the end of the 1990s and currently seems to be the most widely used data gathering technique (Kaushik 2007, p.30ff; Hassler 2010, p.29).

One clear benefit of JavaScript tagging is the possibility of also getting data from cached pages. As the code is executed each time a website loads, it is also executed if the site is loaded from a cache. In contrast, if users have switched off JavaScript, which 2 to 6 percent of users have, the data of these users will not be captured at all. The implementation of JavaScript tagging is normally easy. The JavaScript code does need to be included in every single page, but it is only a few lines, and it is therefore possible to control what data is being collected. It should, however, always be implemented within the footer of a web page, so that a page does not fail to load just because the tagging causes problems. Furthermore, it is possible that an ASP vendor will collect the data; here the question of who owns the data needs to be clarified. But besides all these benefits, JavaScript tagging also has limitations. What was described about cookies for log files applies to tagging as well: if users switch them off or delete them, this information is lost. Furthermore, even if it is possible, it is much harder to capture data about downloads with tagging than with log files. PDF files, for example, do not include executable JavaScript code. If they are requested through a URL of the web page, the request can be captured; but if they are opened directly from a search engine, the request will not be recognized. In addition, if a website already uses a lot of JavaScript, the tagging can cause conflicts and sometimes is not even possible (Kaushik 2007, p.32f; Hassler 2010, p.60f).

Web beacons are 1 x 1 pixel transparent images which are sent from the web server along with the website and are then requested in order to send data to a data collection server. They were developed, and are mostly used, to measure data about banners, ads and e-mails and to track users across multiple websites. Web beacons are easily implemented in web pages via an <img src> HTML image tag. They can capture data such as the page viewed, the time, cookie values or referrers. As robots do not execute image requests, they will not appear in web beacon data. But if users turn off image requests in their e-mail programs and web browsers, they will not be measured either. Furthermore, as web beacons often come from third-party servers, they raise privacy issues, and antispyware programs, for example, will delete them.
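
A minimal sketch of what such a beacon might look like is given below; the collection host and the query parameters are hypothetical and only meant to show the principle:

    // Minimal sketch (browser JavaScript): firing a 1 x 1 image beacon.
    // The equivalent HTML form would be:
    //   <img src="https://collect.example.com/beacon.gif?page=..." width="1" height="1" alt="">
    var beacon = new Image(1, 1);
    beacon.src = 'https://collect.example.com/beacon.gif' +
        '?page=' + encodeURIComponent(document.location.pathname) +
        '&ref=' + encodeURIComponent(document.referrer);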

Web beacons are not as powerful as JavaScript tags, so they should not be used as the main data collection method. But they stand out when trying to get data across multiple websites (Kaushik 2007, p.28f).

Packet sniffers are placed between the user and the website server. When a user requests a page, the request runs through a software- or hardware-based packet sniffer, which collects data, before it is processed by the website server. On the way back to the user the packet sniffer again sits between them and collects data. This method of data collection yields the most comprehensive amount of data, as every communication is observed; technical as well as business-related data can be captured. Only cached pages will not be measured, unless they have additional JavaScript tags. One clear benefit of packet sniffing in contrast to JavaScript tagging is that there is no need to touch the website; nothing on the actual website needs to be changed. On the other hand, additional hardware and software is needed, which has to be installed, controlled and maintained. Furthermore, packet sniffing raises privacy issues. With this method, raw packets of the web server's Internet traffic are collected, which means that information such as passwords, addresses, etc. will be saved. Careful consideration is needed to handle this data correctly (Kaushik 2007, p.35f).

Only a few Web Analytic tools support packet sniffing today, and because of the additional hardware and software it can become quite expensive. Therefore Kaushik (2007) recommends using packet sniffing only if JavaScript tagging has significant shortfalls.

The data collection methods discussed above all have limitations, and there are a few factors that need to be kept in mind with all of them. Probably the most important point is to make sure that the website keeps working for the user: the customer comes first, not the analysis. If the analysis is not working, that is bad, but if the site is not working because of problems with the data collection, that is even worse. Furthermore, user privacy needs to be maintained; privacy policies need to be in place and should be observed. Using web logs on one's own server directly implies ownership of the log files, but with all other collection methods attention should be paid to data ownership if the analysis is handed to a third party. In addition, if the website is hosted on several servers, the data needs to be aggregated in order to get the full overview. The issues with cookies and with measuring the time spent on the last page, as already outlined for log files, are true for every collection method (Kaushik 2007, p.36f).

Regardless of the method used for data collection, no approach is 100% accurate; there is no single way to go (Kaushik 2007, p.37). As already outlined in the explanation of the different methods above, each method has its own benefits and challenges. Web logs and JavaScript tags are the most used data collection techniques at the moment (Kaushik 2007, p.100), but there is considerable debate about which of the two to use. Hints about when to use which method were given above, but the question to ask is "what do we want to get out of the data?" No technique will collect "all" data, and it is also hard to make solid statements about the quality of clickstream data. The web is a nonstandard environment: websites and their technologies are constantly changing, visitors use different mediums to access pages, and not everything is captured in clickstream data, for various reasons (Kaushik 2007, p.108). Kaushik expressed the general state of data quality very clearly by saying: "Data quality on the Internet absolutely sucks" (2007, p.109). But beyond the importance of data quality, what matters more is how confident someone is in the data. If one knows how the data was generated and what drawbacks the method has, it is still possible to use the data in a meaningful way. The important point is that the data leads to a decision so that it is possible to move forward (Kaushik 2007, p.110).

The data itself is not very useful until it is analyzed. But even when different metrics are produced, their interpretation can be problematic. For example, does a user who looked at a lot of pages represent a satisfied user who found a lot of information, or a lost user who clicked around because he could not find what he was searching for (Sullivan 1997)? And what does a long viewing time of a page say? The visitor may actually be reading the text, but it is also possible that he is making a phone call or getting a coffee and the page is simply left open in the meantime (Weischedel & Huizingh 2006, p.465).

Even though they are not further outlined here, qualitative data collection techniques can further help to understand users. With the help of metrics the quantitative data can be analyzed, but metrics only show what happened on a specific site in a given timeframe; they all fail to tell why something happened. To really understand user behavior, the quantitative analysis answering the "what" (clicks, visitor counts, etc.) and the qualitative analysis answering the "why" (intent, motivation) need to be combined (Kaushik 2007, p.13f). In addition, having competitive data and knowing how competitors are doing in the business helps to estimate whether one's own business is doing well or not (Kaushik 2007, p.44).

As outlined in the section above, there are different ways to collect data for Web Usage Mining. The effort to obtain these data is not too high, but the analysis and interpretation are very time-consuming (Hasan et al. 2009, p.697). The first and most frequently asked questions when talking about Web Analytics often are: "How many visitors came to our site? How long did they stay? How many pages did they visit?" (Kaushik 2007, p.132). In order to get objective answers to these questions, defined measurable attributes can be used. These quantifiable attributes are called web metrics (Hassler 2010, p.90) or key performance indicators (KPIs) (Burby & Atchison 2007). They help to understand how visitors are using a specific site (Weischedel & Huizingh 2006, p.463) and to answer these questions.

Defining business metrics is one of the first steps in the analytical process (Burby & Atchison 2007, p.49). In their book, Burby & Atchison (2007) describe business metrics as measurements of performance, specific to one organization, which are based on the most important business and web goals and which lead to long-term considerations. They argue that most KPIs are financial in nature and give examples such as top-line revenue or higher average revenue per purchase. They do so because they look at Web Analytics solely from the commercial perspective of a business running an e-commerce website.

In general, however, metrics can be distinguished into three types: counts (quantities), ratios, which relate metrics to each other, and values, which represent, for example, a URL or a search referrer (Hassler 2010, p.105). Most commonly, the following web metrics are used:

Page View: The page view represents the number of times a single page was viewed (Web Analytics Association 2008). In the past, hits, which represent requests to the server, were measured from log files; they are probably the oldest metric in Web Analytics (Hassler 2010, p.90). But since web pages nowadays include more than just text, one request can easily lead to 5 or more hits, as every picture, for example, is counted separately. The metric hit is therefore no longer that important; instead the page as a whole is looked at. It is a good metric to estimate the general demand of a website. The question to ask is whether it is good or bad to have a high number of page views. If the structure of a website does not lead users to answers, the number of page views might be quite high, yet people might not be happy with the site. From the number of page views alone it cannot be determined whether a site is successful or not (Kaushik 2007, pp.9, 140). Kaushik points this out by saying "... a page view is a page view; it is just a viewed page. Nothing more, nothing less" (2007, p.77). One aspect that may become more and more important in the future is the informational value of page views when Rich Internet Applications like AJAX are used. They can lead to an underestimate of page views, as it can happen that a user requests only one page and parts of the site are merely reloaded while surfing (Hassler 2010, p.93).

Most Viewed Page: Also called most popular pages or most requested URLs, the most viewed page simply counts which page had the greatest number of visits (Kaushik 2007, p.150).

Visitors and Visits/Sessions: These metrics are among the most basic ones in Web Analytics and are provided by every tool. The naming, however, often differs and ranges from visitors and visits, through total visitors and unique visitors, to sessions and cookies. It is important to understand how a tool really measures the specific metric. Most commonly the differentiation is the following:

Visits/Sessions: represents the number of visits (sessions) to a website during a specified time period. One visit is an interaction of an individual with one or more pages of a website (Kaushik 2007, p.133; Web Analytics Association 2008).

Visit Duration/Time on Site/Length of Visit/Time Spent per Visitor: It is calculated as the difference between the timestamps of the last and the first activity in a session and shows the length of a session. This time-based metric is often hard to evaluate unless it is known what the user did in between: did he really view the pages or was he doing something else? (Web Analytics Association 2008; Kaushik 2007, p.136). Furthermore, when a user stays on a page for, e.g., five minutes, that can mean that he is interested, that he is distracted, or that he cannot find what he is looking for (Sullivan 1997).

Visitors: counts the number of users of a website (Hassler 2010, p.98). It is important to note that, for example, the number of unique visitors can be quite different from the total number of visitors, and the question needs to be asked how the visitors were identified (Kaushik 2007, p.132).

Unique Visitors (unique browsers) tries to count the number of distinct users within a given timeframe; each individual is counted only once. To distinguish between individuals, authentication, IP-addresses or cookies can be used. All have different limitations, so it is important to understand how the number was derived in the specific case and which timeframe it covers (Web Analytics Association 2008; Kaushik 2007, p.133).

New Visitors shows the number of unique visitors who viewed the site for the first time ever (Web Analytics Association 2008).

Return(ing) Visitor shows the number of unique visitors who visit the site, but also have visited it before (Web Analytics Association 2008).

Visits per Visitor: The number of visits divided by the number of unique visitors in a given time period (Web Analytics Association 2008).

Click-through: This number indicates how often a link was clicked by a user (Web Analytics Association 2008).

Referrer: Referrers can show where users came from and what they searched for. The referring website shows the page the user came from before viewing the current page (Web Analytics Association 2008). Search key phrases show what users searched for in search engines before they got to the current site. But it is not always possible to obtain referrer information; around 40 to 60% of referrer data will be empty. Reasons include, among others, that the page was bookmarked or that the site was entered by typing the URL directly (Kaushik 2007, p.145).

Entry Page: It shows the first page of a visit (the URL). As usually many entry pages exist, it is often displayed as a list of URLs with the number of visits (Web Analytics Association 2008).

Exit Page: The exit page represents the end of a visit. It is the last page accessed on a site before leaving it (Web Analytics Association 2008).

Exit Link: Pages often include external links. When a user clicks an external link and thereby leaves the website, the link is called an exit link.

Top Exit Page: The top exit page is the page from which most visitors leave the site. The exit rate is the percentage of people who left from a specific page. However, how can these numbers be interpreted? Why do people leave at a specific page? Either because they found what they were looking for (easy to discover with the thank-you page of an e-commerce site) or because they could not find it (Kaushik 2007, p.9).

Page Exit Ratio: The number of exits from a specific page divided by the total number of page views of that same page (Web Analytics Association 2008).

Landing Page: A page viewed at the beginning of a user's experience as a result of a marketing effort is defined as a landing page (Web Analytics Association 2008).

Event: An event is any recorded action, with a specific date and time, directed at the browser or the server. It can be seen either as a count (number of events) or as a dimension (how many visits can be associated with a specific event) (Web Analytics Association 2008), and it is often associated with a specific page or session.

Completion/Order Conversion Rate: The percentage of visitors who purchase on the site (Ogle 2010; Hasan et al. 2009, p.703).

Bounce Rate: Depending on the author, the bounce rate is defined differently: it is either the percentage of single-page visits (Hasan et al. 2009, p.701) or the percentage of visits that stayed on the site only for a very short time (Kaushik 2007).

Assessment of Content Popularity: A ranking of the content of a website, e.g. the most frequently shown texts (Ogle 2010).

Path Analysis: Path analysis shows how users click through a site (Ogle 2010). It is one kind of pattern mining as used within pattern discovery.

This list could be endless, as each tool is able to generate hundreds of different metrics; for example, it could go on with the most popular pages, internal referrers and internal search phrases. The list above should only give an idea of what is possible. Each Web Analytic tool measures the metrics a little differently, and two different tools will never produce the same numbers for one website; differences of around 10 to 20% are quite common. But as long as the numbers generated over time come from the same tool, this does not matter much. Furthermore, absolute numbers do not really matter: whether the real number of visitors is 5,000 or 6,000 is not important, as long as the number is stable or growing (Hassler 2010, p.104). However, to really understand a metric and to be able to estimate its inaccuracies, it is always important to know how the analytic tool in use measures it.
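
To make the relationship between some of these metrics concrete, the following sketch derives page views, visits, unique visitors and visits per visitor from a toy list of page-view records; the record layout (a visitor ID and a visit ID per page view) is an assumption made purely for illustration:

    // Minimal sketch: deriving a few basic metrics from page-view records.
    var pageViews = [
        { visitorId: 'A', visitId: 'A-1', url: '/index.html' },
        { visitorId: 'A', visitId: 'A-1', url: '/products.html' },
        { visitorId: 'A', visitId: 'A-2', url: '/index.html' },
        { visitorId: 'B', visitId: 'B-1', url: '/index.html' }
    ];

    var totalPageViews = pageViews.length;                                                  // 4
    var visits = new Set(pageViews.map(function (p) { return p.visitId; })).size;           // 3
    var uniqueVisitors = new Set(pageViews.map(function (p) { return p.visitorId; })).size; // 2
    var visitsPerVisitor = visits / uniqueVisitors;                                         // 1.5

    console.log(totalPageViews, visits, uniqueVisitors, visitsPerVisitor);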

In addition, Hassler (2010, p.104) points out four important issues to keep in mind regarding metrics and their measurement inaccuracies:

Look at ratios rather than at absolute numbers

When trying to compare metrics the same analytic tool should be used

Keep in mind the measurement inaccuracies when comparing between websites

Benchmarking against one's own measurements from today and the past is relatively resistant to measurement inaccuracies

All in all, the quality of the data may be as good as possible and the metrics as exact as possible, but "The data is not telling me what I should do" (Kaushik 2007, p.6). Therefore analysts are needed who further interpret the data and draw conclusions from it. Furthermore, metrics fail to report usability problems such as a lack of privacy or security or an inconsistent design; for these, heuristic evaluations could be used (Hasan et al. 2009, p.705). Metrics are a good starting point for Web Analytics, but they should not be viewed in isolation; other analysis methods and further investigation by analysts are always required.

Whenever user information is stored, legal issues around privacy arise. What makes this particularly difficult in the web domain is that websites are mostly available from more than one country. An American website, for example, whether implicitly or explicitly, also addresses European users and thus needs to handle international legal regulations. A problem here is that different countries have different laws and regulations: the USA, for example, has soft laws concerning the protection of user information, whereas the regulations in Europe are much more restrictive (Schubert et al. 2004; Hassler 2010). In the following, some of the main legal aspects are outlined. The discussion represents the German viewpoint, but these issues are similar for many other countries, especially countries within the European Union.

When looking at Web Analytics, the area where it can get legally problematic is the protection of user information. Conducting Web Analytics means saving user data. Impersonal or anonymous data of any kind are not problematic, but as soon as personal data that is clearly assignable to a natural person is involved, general privacy regulations apply (Hassler 2010, p.71). A distinction must be made between explicit profiles (identification profiles), where the user is recognized or deliberately adds data to a system, and implicit profiles (interaction/transaction profiles), which are generated and stored in the background. Personal data obtained via, e.g., a login can be saved and used if the user grants permission. Permission can be requested with a checkbox, but the checkbox must use the opt-in method: the user must check the box himself, and the box must not be pre-checked (Hassler 2010, p.72). But if the user does not need to log in to the site, it is difficult to ask him.

The storage and analysis of user data such as date, time, page views etc. are not legally forbidden, but when it comes to IP-addresses it becomes problematic.

Therefore the suggestions given by Hassler (2010, p.73ff) provide good guidance:

Include a data privacy statement on the website

Use cookies instead of IP-addresses

Do not combine user data with personal data

Include an option to turn off the tracking (a minimal opt-out sketch follows this list)

Consider the location of the data storage
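
The suggestion above to include an option to turn off tracking could, for example, be realized with a simple opt-out cookie that the page-tagging code checks before doing anything; the cookie name and lifetime in this sketch are illustrative assumptions:

    // Minimal sketch (browser JavaScript): respecting a tracking opt-out.
    function trackingAllowed() {
        return document.cookie.indexOf('analytics_optout=1') === -1;
    }

    function optOut() {
        // Persist the visitor's choice for roughly two years.
        document.cookie = 'analytics_optout=1; max-age=' + (60 * 60 * 24 * 365 * 2) + '; path=/';
    }

    if (trackingAllowed()) {
        // Only here would the page-tagging code (e.g. the Piwik snippet above) be loaded.
    }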

Within Web Analytics, unique privacy issues arise. The Web Analytics Association tries to outline and discuss the initiatives and issues as well as best practices on a dedicated webpage (Web Analytics Association 2011).

5.5 Summary of Key Challenges

Most challenges have already been described, between the lines, throughout this chapter. This section provides a short summary of the challenges mentioned in order to draw them together and to stress their importance again. The most important areas where problems and challenges occur are:

The identification of visitors

Filtering of bots

Issues with caching

Data overload

Type of metrics

Time on last page

Appropriate tool

Broad overview of whole site

Interpretation of findings

Privacy issues

The identification of visitors can be done through IP-addresses, cookies, or user profiles if the user needs to log in. Each method has its drawbacks, and for all of them it needs to be kept in mind that they will not be 100% accurate and will not count every single individual user. The user profile might be the easiest and most accurate way, as the users identify themselves. But if they do not allow this information to be used for web analytic purposes, the data cannot be used. Furthermore, care must be taken that sensitive information such as passwords is not connected with the analytical data. When using cookies, information about some users may be missed because people block or delete them. Finally, when using the IP-address, privacy issues might arise, and the same users might actually be different ones and vice versa, as already outlined above.

In particular, when using log files for Web Analytics, spiders and bots need to be identified and removed before the data is analyzed. If not, they will dramatically blur human traces and distort the statistics. Another difficulty lies in caching: the only data collection method that is able to measure cached pages viewed by users is JavaScript tagging. All other methods miss these pages, which leads to underestimated figures in the statistics.

The volume of data to measure can be another problem area. Not all tools are capable of handling very large amounts of data (Sen et al. 2006, p.87). Especially when not analyzing in real time but trying to analyze large amounts of data from the past, this can cause problems.

As discussed earlier, there are hundreds of metrics available within Web Analytic tools. It is not always easy to understand the differences between metrics and to find the appropriate one for the purpose at hand (Sen et al. 2006, p.87). Furthermore, it needs to be clear how the metrics were derived. The time spent on the last page before leaving, for example, cannot be measured accurately; as there is no incoming information from the next server, most tools terminate a session after 29 minutes of inactivity (Kaushik 2007, p.36).
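
As an illustration of such an inactivity rule, the following sketch groups one visitor's timestamped requests into sessions whenever more than 29 minutes pass between two requests; the data layout is assumed for illustration:

    // Minimal sketch: splitting one visitor's request timestamps (in milliseconds)
    // into sessions using a 29-minute inactivity threshold.
    var INACTIVITY_LIMIT_MS = 29 * 60 * 1000;

    function toSessions(timestampsMs) {
        var sorted = timestampsMs.slice().sort(function (a, b) { return a - b; });
        var sessions = [];
        var current = [];
        sorted.forEach(function (t) {
            if (current.length > 0 && t - current[current.length - 1] > INACTIVITY_LIMIT_MS) {
                sessions.push(current);
                current = [];
            }
            current.push(t);
        });
        if (current.length > 0) sessions.push(current);
        return sessions;
    }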

The market for Web Analytic tools has grown in recent years. But which tool is the appropriate one, and where do the differences lie? An answer to this question is given in Chapter 6. Beyond the analytical part achieved with the tool, there is a need to understand the website as a whole, including its content and structure (Cooley 2003, p.94). It is necessary to understand the metrics and to generate useful findings from the statistics. Many analyses stop with findings such as how many people visited the website. But really good Web Analytic investigations should go beyond that and try to conclude with actions, e.g. redesigning the website or generating ideas for marketing campaigns.

Aside from changes in the environment, such as the introduction of a new product, understanding the user behavior on a website is the main reason why websites are subject to change today (Weischedel & Huizingh 2006, p.463). In the past, consumers were mostly passive elements in the overall ecosystem. With the rise of the Internet this has changed completely (Burby & Atchison 2007, p.6). Today they are very important participants and can provide much interesting data; thus, it is desirable to understand how to support them best. With Web Analytics it is possible to follow the entire process and to see a website from the perspective of its users (Spiliopoulou 2000, p.133). Knowing how users access a site and which paths they take is critical for the development of effective marketing strategies, as it allows organizations to predict user visits. It further helps to optimize the logical structure of a website (Cooley et al. 1997). "Improving Web communication is essential to better satisfy the objectives of both the Web site and its target audience" (Norguet et al. 2006, p.430).

In the past, WA has been used widely in the commercial field (Wu et al. 2009, p.163), and most analytical investigations described in the literature were done for commercial websites. Nevertheless, there is no reason why Web Analytics cannot also be helpful for other kinds of websites. Some aspects are much easier to determine for e-commerce websites, such as a successful visit ending in a purchase, but many other aspects are also applicable to other kinds of websites.

It needs to be kept in mind that Web Analytics is only an analysis; it is not an exact science, and no numbers will show reality 100% accurately. But "It is better to be approximately right than precisely wrong" (Hassler 2010, p.34f), because approximate data can still be used to draw conclusions. It cannot give clear answers, as no direct answers from users are available. With the help of analytic tools it might be easy to quantify a site, but the data needs to be interpreted by an analyst who takes action for improvements (Ogle 2010, p.2604).

"In web analytics, "going wrong" often means just going halfway." Very often vast amounts of money are invested into tools, but in the end only reports are produced and nothing more (Burby & Atchison 2007, p.43). The part of taking action after the analysis might be the most critical aspect in Web Analytics but is often neglected.


