Web analytics is one of the top tools used by modern sales and marketing teams. It’s practically inconceivable to make serious business decisions without having solid numbers on your website performance.
But objective as web analytics results may seem, there are some common issues that can skew your reports. And when the problem of false data occurs, it may be hard to get rid of, or even to spot, because at first glance your data can seem plausible in most cases. You might only discover inaccuracies when comparing your website's performance results with another tracking platform. But let’s be honest, do you have time to review your data from different platforms on a daily basis? Most likely you don’t.
Better safe than sorry, right? That's why it's essential to be aware of problems that may be skewing your reports. In this piece I want to discuss 5 top challenges as well as share some advice from the Piwik PRO team on how to overcome them in order to restore full data accuracy.
So I invite you to read on, and not only if you’re worried you may already be missing out on some important insights. We’ll also mention some actionable ideas on how to ensure your analytics data is always as accurate as possible. Okay, let’s dive in!
The first challenge is missing tracking code. No, we’re not kidding you :) I know this might seem overly obvious, but you’d be astonished how often it gets overlooked, especially on big websites with millions of pages. And when tracking code is missing on a given page, its traffic will simply not be recorded. That means you’re not getting the full picture.
There are several ways to collect data in Google Analytics, and things work in a very similar way in other tools such as Piwik PRO. Depending on whether you want to track a static or dynamic website, an app, or other connected device, you’ll get a different code. You need to paste that snippet before the closing </head> tag on every web page on your site, or in a general header file that is included at the top.
If your CMS doesn’t use a special add-on, extension or plugin, you need to insert the tracking tag manually. And that can be a pain, particularly if you happen to run a large website.
Better safe than sorry, so it’s essential to be 100% sure that your CMS adds tracking code to every new page by default. If you’re not sure this is the case, we recommend using software such as Web Link Validator or W3C Link Checker to identify all the missing tags and add the code where it’s absent.
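If you already have the HTML of your pages at hand (for example from a crawl), a check like this can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `TRACKING_MARKER` value and the `pages_missing_tracking` helper are hypothetical names, and the marker should be replaced with whatever string uniquely identifies your own snippet.

```python
import re

# Hypothetical marker: replace with a string that uniquely identifies
# your own tracking snippet (e.g. your site ID or tracker file name).
TRACKING_MARKER = "UA-XXXXXX-1"


def pages_missing_tracking(pages: dict[str, str], marker: str = TRACKING_MARKER) -> list[str]:
    """Return the URLs whose <head> section does not contain the marker.

    `pages` maps a URL to the HTML source of that page.
    """
    missing = []
    for url, html in pages.items():
        # Capture everything between <head ...> and </head>, ignoring case.
        head = re.search(r"<head\b[^>]*>(.*?)</head>", html, re.IGNORECASE | re.DOTALL)
        if head is None or marker not in head.group(1):
            missing.append(url)
    return sorted(missing)


# Usage with two made-up pages: only the second one lacks the snippet.
crawled = {
    "/home": "<html><head><script>track('UA-XXXXXX-1')</script></head><body></body></html>",
    "/pricing": "<html><head><title>Pricing</title></head><body></body></html>",
}
print(pages_missing_tracking(crawled))
```

A plain substring match is a deliberately simple choice here. It will not notice a snippet that is present but broken, so treat the result as a first pass rather than proof that tracking works.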
You can also benefit from tools like Google Tag Manager or Piwik PRO Tag Manager. They make the process of adding code to all the pages of your website super quick and easy. You just can’t go wrong with them.
When attaching a tag, simply specify the triggers on which you want the tag to fire. If you select Page View without setting further conditions, your tracking code will load on all pages.
Figure 1: This is what adding the "All Pages" trigger looks like in Google Tag Manager.
Of course, OnPage.org can also help you find out whether the Analytics code is implemented correctly on every page of your site. To check this, you can use the "Google Analytics ID" report. You will find this report under "OnPage.org Zoom", in the "Content" category, where you can select "My custom Fields".
Figure 2: Check with OnPage.org if the Google Analytics code is implemented correctly
Another way is to create custom fields that check whether each page of your website contains a reference to your Analytics code. To add a custom field, go to "Project Settings" and select "Custom Fields".
Referrers, also called referrals or traffic source analysis, are an important group of reports presenting segments of traffic from external sources. They offer valuable information on your audience and where they come from.
Useful and powerful, referrers also pose various problems from a web analysis point of view.
A common issue is self-referral. Have you ever opened your referrers report only to see your domain at the top? If you have, that means you’ve encountered a self-referral problem. This may happen when you’re missing tracking code or have a configuration issue that causes one visitor to trigger several sessions when there should only be one. A few self-referrals can appear if your analytics is configured to track across multiple domains or subdomains. But if this occurs frequently it means you’re potentially facing badly skewed data.
Another problem that can seriously undermine the reliability of your referrers reports is so-called "referral spam", which fills your analytics with fake data. It starts with spam bots that detect weak websites and then send fake referrer information, including domain names. Since this information is tracked by your analytics, it will show up in your reports.
So instead of valuable insights, your reports may become filled with multiple URLs linking to dodgy websites trying to improve their search engine rankings through backlinks. Such spam may also affect your site’s loading time, leading to higher bounce rates and degrading your SEO. All websites can receive some bogus traffic from time to time, but alarm bells should go off if you continuously observe high levels of suspicious links in your reports.
Figure 3: Any suspicious entries in your referrers reports?
In many cases self-referral is related to the lack of tracking code. If you’re missing tracking code on a given page, this will bring a visit to an end and automatically start counting a new session when your user moves to the next page that includes tracking code. The good news is that the problem usually ceases when you add the missing snippet of code.
If your landing page and referrer are on separate domains or subdomains (e.g. mywebsite.net and offers.mywebsite.net), then the existence of self-referral means that you have incorrectly configured cross-domain tracking. That calls for a review of your setup. If you’re not sure what to look for or how to fix it, we advise you to seek professional advice.
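To judge whether self-referrals are an occasional blip or a frequent problem, you can measure their share in an exported list of referrer URLs. The sketch below assumes such an export exists. The function names are hypothetical, and the domain matching is a simple suffix comparison rather than anything your analytics tool does internally.

```python
from urllib.parse import urlsplit


def is_self_referral(referrer_url: str, own_domain: str) -> bool:
    """True if the referrer's host is your own domain or one of its subdomains."""
    host = (urlsplit(referrer_url).hostname or "").lower()
    own = own_domain.lower()
    return host == own or host.endswith("." + own)


def self_referral_share(referrer_urls: list[str], own_domain: str) -> float:
    """Fraction of referrers (0.0 to 1.0) that point back to your own domain."""
    if not referrer_urls:
        return 0.0
    own_hits = sum(is_self_referral(url, own_domain) for url in referrer_urls)
    return own_hits / len(referrer_urls)


# Usage with made-up referrers; mywebsite.net is the example domain from the text.
referrers = [
    "https://offers.mywebsite.net/deal",
    "https://www.google.com/search?q=analytics",
    "https://news.example.org/article",
    "https://partner.example.com/links",
]
print(self_referral_share(referrers, "mywebsite.net"))
```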
Things can get a bit more complicated when trying to deal with referral spam. Unfortunately, there is no universal solution. Simple IP blocking of your spam may not be enough in the face of powerful botnets – networks of infected computers accessing your website from many different IPs.
You could think about blocking suspicious URLs through your .htaccess file in the root directory of your site domain. This method is also far from ideal because it only stops crawler referrer spam. It’s the other type, the so-called ghost spam, that is far more widespread. Ghost spam can only be blocked from pinging your analytics account if you use specific filters, like those described in this post on Search Engine Journal.
Alternatively, you can try using tools that deploy community-maintained and regularly expanded spam blacklists (such as Piwik or Piwik PRO). Since new spam sources are added to the platform with every release, to automatically exclude a great deal of spam it’s enough to keep your analytics software up to date.
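The blacklist idea itself is simple enough to illustrate. The following is a minimal sketch, not the mechanism Piwik or Piwik PRO actually use: the blacklist entries, the `referrer` key on each hit, and both function names are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries for illustration only; a real list would be
# community-maintained and far longer.
SPAM_BLACKLIST = {"spam-example.test", "free-traffic.example"}


def is_spam_referrer(referrer_url: str, blacklist: set[str] = SPAM_BLACKLIST) -> bool:
    """True if the referrer's host is a blacklisted domain or a subdomain of one."""
    host = (urlsplit(referrer_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


def drop_spam(hits: list[dict], blacklist: set[str] = SPAM_BLACKLIST) -> list[dict]:
    """Return a new list of hits without those coming from blacklisted referrers."""
    return [hit for hit in hits if not is_spam_referrer(hit.get("referrer", ""), blacklist)]


# Usage with made-up hits.
hits = [
    {"page": "/home", "referrer": "https://www.spam-example.test/seo-offer"},
    {"page": "/blog", "referrer": "https://news.example.org/article"},
    {"page": "/home", "referrer": ""},  # direct visit, no referrer
]
print(drop_spam(hits))
```

A static list only catches sources that are already known, which is why keeping it up to date matters as much as the filter itself.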
Sampling is a common method in statistics. It’s based on the simple assumption that in order to determine the most popular trend in a given group you don’t really have to talk to all of its members. Instead, you select only a subset of people, hoping it will be representative enough to make the results accurate.
Web analytics sampling works in a very similar way. Only a subset of your traffic data is selected, analysed and used to estimate global results.
Many web analytics platforms automatically sample data when you reach a particular limit of actions tracked on your website. You know that you have this option activated in Google Analytics when you see a message at the top of your report saying "The report is based on x visits (x% of visits)."
The lower the sample size, the bigger the problem of inaccuracies you face. Sampled data can show some ups and downs in your reports, but not much more. If you are serious about growing your business, you need solid numbers and reliable insights rather than guesswork.
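To put a rough number on this, you can use the textbook margin of error for a proportion. The sketch below assumes a simple random sample, which is an assumption for illustration only, since analytics platforms may sample in other ways. The function name is hypothetical.

```python
import math


def margin_of_error(rate: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error for a rate estimated from a random sample.

    rate        -- observed proportion, e.g. 0.02 for a 2% conversion rate
    sample_size -- number of sampled visits the rate is based on
    z           -- z-score; 1.96 corresponds to roughly 95% confidence
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(rate * (1 - rate) / sample_size)


# Usage: the same 2% conversion rate measured on different sample sizes.
for visits in (1_000, 10_000, 100_000):
    error = margin_of_error(0.02, visits)
    print(f"{visits} sampled visits: 2% plus or minus {error * 100:.2f} percentage points")
```

With 1,000 sampled visits the uncertainty is close to 0.9 percentage points, which is almost half of the 2% rate itself. With 100,000 visits it shrinks to under 0.1 percentage points.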
Let’s be honest: only with 100% of data can you be fully confident that your reports are correct. Accept nothing less than that, and avoid sampling at all costs.
If your tool allows automatic data sampling when you reach your monthly limit of hits, then you have two options. You either need to upgrade to a plan with a higher data allowance or start looking for another tool that comes without sampling.
Always treat sampled data with caution. It can provide you with useful reports, but only of a very general nature. More granular data, like conversion rates or revenue, should by no means be sampled.
Tired of too many online ads? Then you are probably already familiar with ad blocks, smart little pieces of software you can install to prevent ads from cluttering pages and which stop your data from being sent back by third parties. They are available as desktop browser extensions or as mobile apps.
PageFair and Adobe research confirm that install rates of ad blocks are continuously on the rise:
Figure 4: The install rates of ad blocks are continuously on the rise.
Practical as they may seem from the user’s point of view, ad blocks can badly damage your business analytics data. This happens because they can prevent pages from rendering properly, thereby hamstringing your analytics platform through the processes of element hiding and asset blocking. And since these ad blocks are becoming ubiquitous, you may not even know how little your reports have in common with reality.
Another problem is the growing phenomenon of third-party cookie rejection. Web analytics relies on cookies to provide you with insights. One of the key attributes of a cookie is its host. We speak of a third-party cookie when the host name doesn’t match the domain in the browser’s address bar at the time it is set or retrieved. This seemingly minor detail can have a major impact on your data accuracy.
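The definition above can be expressed as a small check. This is a deliberately simplified sketch: real browsers decide this using more elaborate rules (such as comparing registrable domains), and the function name is hypothetical.

```python
def is_third_party_cookie(cookie_domain: str, page_host: str) -> bool:
    """Simplified test: does the cookie's host differ from the address-bar host?

    A cookie is treated as first-party when the page host equals the cookie
    domain or is a subdomain of it. Everything else counts as third-party.
    """
    cookie = cookie_domain.lower().lstrip(".")  # ".mywebsite.net" -> "mywebsite.net"
    page = page_host.lower()
    first_party = page == cookie or page.endswith("." + cookie)
    return not first_party


# Usage: the visitor is on www.mywebsite.net.
print(is_third_party_cookie(".mywebsite.net", "www.mywebsite.net"))       # cookie set by the site itself
print(is_third_party_cookie("tracker.example", "www.mywebsite.net"))      # cookie set by an external host
```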
Some reports suggest that third-party cookie rejection is on the rise. Increasing numbers of users are manually blocking or even deleting all third-party cookies. According to stats provided by Webtrends, this may account for anywhere from 12% to 18% of all Internet users. Some common problems include inaccurate visitor, retention-based, e-commerce and conversion metrics, as well as unreliable campaigns and search reporting.
Figure 5: Third-party cookies can pose quite a challenge to your analytics data accuracy
Adblock can instruct your browser to hide or avoid downloading any assets from URLs that include specific keywords or expressions referring to advertising or analytics. This basically means your cloud-hosted analytics data can suffer from inaccuracies. To discover how many of your users deploy these tools you could go for solutions like Adblock Analytics.
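The keyword matching described above can be mimicked roughly as follows. Real ad blockers rely on much richer filter lists and rule syntax, so treat this purely as an illustration. The keyword list and function name are made up for the example.

```python
# Hypothetical keywords for illustration; real filter lists are far larger
# and use their own rule syntax.
BLOCKED_KEYWORDS = ("analytics", "tracker", "/ads/")


def would_be_blocked(asset_url: str, keywords: tuple[str, ...] = BLOCKED_KEYWORDS) -> bool:
    """True if the asset URL contains any of the blocked keywords."""
    url = asset_url.lower()
    return any(keyword in url for keyword in keywords)


# Usage: check which assets of a page a keyword-based blocker might skip.
assets = [
    "https://cdn.example.com/js/analytics.js",
    "https://www.mywebsite.net/js/app.js",
    "https://cdn.example.com/ads/banner.png",
]
for asset in assets:
    print(asset, would_be_blocked(asset))
```

This also hints at why a tracking script served from your own domain under a neutral file name is less likely to match such rules.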
As a rule of thumb, we recommend that you consider hosting your analytics files locally or deploying a self-hosted platform. Comparing your website’s performance using data from on-premises and cloud-hosted instances can give you a rough idea of how adblocks might be impacting your results.
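The comparison boils down to simple arithmetic. Here is a rough sketch, assuming both instances tracked the same pages over the same period and that the self-hosted one is not blocked at all (both are simplifications). The function name and the visit counts are invented for the example.

```python
def estimated_blocked_share(self_hosted_visits: int, cloud_hosted_visits: int) -> float:
    """Rough share of visits (0.0 to 1.0) missing from the cloud-hosted instance.

    Assumes both instances covered the same pages and period, and that the
    self-hosted count is the more complete one.
    """
    if self_hosted_visits <= 0:
        raise ValueError("self_hosted_visits must be positive")
    share = 1 - cloud_hosted_visits / self_hosted_visits
    return max(0.0, share)  # never report a negative share


# Usage with made-up monthly totals.
share = estimated_blocked_share(self_hosted_visits=10_000, cloud_hosted_visits=8_500)
print(f"Estimated share of visits lost to blocking: {share:.0%}")
```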
And when it comes to cookie rejection, as a rule of thumb, steer away from platforms that use third-party cookies by default. Go for tools that deploy first-party cookies instead, as fewer people block them, while anti-spyware software and privacy settings do not usually target them.
So before anything else, find out if your platform uses first- or third-party cookies. Check if you have alternative ways of user tracking with cookies disabled, and see how this impacts your data accuracy.
The majority of analytics platforms let you tinker with the amount and types of data you can view. Such filters are a powerful feature allowing you to limit and modify the numbers that you get.
Figure 6: Filters in Google Analytics are available in the Admin section
A common example is excluding traffic from particular IP addresses, such as your home or office. Many websites use a filter that removes company traffic from reports, as your employees and customers behave in very different ways.
Figure 7: Examples of items you can filter out or add in Piwik PRO.
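Conceptually, an IP exclusion filter works like the sketch below. It uses Python's standard `ipaddress` module. The network ranges are reserved documentation addresses standing in for your office network, and the `ip` key on each hit as well as the function names are assumptions for illustration.

```python
import ipaddress

# Hypothetical office ranges (reserved documentation addresses);
# replace them with your own networks.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_internal(ip: str, networks=INTERNAL_NETWORKS) -> bool:
    """True if the IP address belongs to one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


def exclude_internal(hits: list[dict], networks=INTERNAL_NETWORKS) -> list[dict]:
    """Return a new list without hits from internal IPs; the input is not modified."""
    return [hit for hit in hits if not is_internal(hit["ip"], networks)]


# Usage with made-up hits.
hits = [
    {"page": "/pricing", "ip": "203.0.113.42"},  # office traffic
    {"page": "/pricing", "ip": "192.0.2.17"},    # external visitor
]
print(exclude_internal(hits))
```

Note that the function returns a filtered copy and leaves the original list alone, which keeps the raw data available if the filter turns out to be wrong.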
Practical as filters may seem, if used incorrectly they can absolutely skew your data for good. This is because once your filters or settings are applied to raw data, there’s no going back. After everything has been implemented and the data is flowing in accordance with the new rules, you know it’s over.
What if you select the wrong option or insert an incorrect parameter by accident? All it takes is one little slip-up, which is so easy to do, especially if you are choosing from competing options on a dropdown menu.
Again, better safe than sorry. Set up safety procedures in advance. Rules can be as short and sweet as something like this: think twice before adding any new filters, then verify and test your preferences, double-check your settings, and only then click "Save". That should do the job.
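One way to "verify and test" is to dry-run a new filter on a copy of some exported data before saving it. The sketch below is a hypothetical helper, not a feature of any particular analytics tool. It only reports what a filter would keep and drop.

```python
from typing import Callable, Iterable


def preview_filter(hits: Iterable[dict], keep: Callable[[dict], bool]) -> dict:
    """Report how many hits a filter would keep and drop, without changing anything.

    `keep` is the candidate filter: it returns True for hits that should stay.
    """
    all_hits = list(hits)
    kept = sum(1 for hit in all_hits if keep(hit))
    dropped = len(all_hits) - kept
    return {
        "total": len(all_hits),
        "kept": kept,
        "dropped": dropped,
        "dropped_share": dropped / len(all_hits) if all_hits else 0.0,
    }


# Usage: a filter meant to exclude one test page, checked on made-up data.
sample = [
    {"page": "/home"},
    {"page": "/test-page"},
    {"page": "/pricing"},
    {"page": "/home"},
]
print(preview_filter(sample, keep=lambda hit: hit["page"] != "/test-page"))
```

If the dropped share is far higher than you expected, that is a strong hint the filter has a wrong option or parameter, and you found out before any real data was affected.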
As you can see, there are many things that can skew your data. So if you want to see the whole picture, it’s best to take preventative action in advance.
Awareness of these problems is the first step on the road to ensuring your analytics data’s accuracy. We hope that the tips outlined in this guide will help you to protect your analytics insights from the curse of damaged data.
Along that path, you definitely want to avoid tools that come with data sampling, heavy referral spam or default usage of third-party cookies, as these can be detrimental to your insights. Taking the right precautions will also help, so pay close attention to how you apply data filters.
Published on 01/19/2017 by Ewa Bałazińska.
Ewa Agata Bałazińska is a Content & Communications Manager at Piwik PRO, the enterprise analytics and tag management suite. Passionate about online culture and business, Ewa obtained an MA in Digital Media at Goldsmiths, London. In her work, Ewa mainly researches and implements projects in marketing technology, data privacy and startups. She regularly writes for industry blogs and media.