As soon as a website is online, every access to the server is recorded in log files. The more visitors your website has, the more data you can evaluate with the help of a log file analysis. In this article, we explain how this works, what the advantages of log file analysis are, and what you need to pay attention to.
Every time your web server is accessed, a data entry is automatically created. This data collection is called a log file and is usually stored in the root directory of your website. Analyzing these files is known as log file analysis.
The server can store up to ten different data points for a single access, typically including the client's IP address, the timestamp of the request, the requested URL and request method, the HTTP status code, the number of bytes transferred, the referrer, and the user agent (i.e. the browser and operating system).
You can download the log files directly from the server and view and process them with a tool such as Excel or Google Sheets.
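As an illustration, here is a minimal Python sketch that parses a single line in the widely used "combined" access log format. The exact fields and their order depend on how your server is configured, and the sample line and identifiers are only examples.

```python
import re

# Pattern for the common "combined" log format (Apache/NGINX defaults are similar);
# adjust it if your server is configured to log different or additional fields.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return the logged fields of one access as a dict, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('66.249.66.1 - - [08/Mar/2019:10:15:32 +0000] '
          '"GET /magazine/log-file-analysis HTTP/1.1" 200 5123 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(parse_line(sample)["user_agent"])
```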
A log file analysis is an essential part of SEO and should always be carried out alongside an SEO audit, because it reveals valuable information that a standard SEO audit can't. While SEO focuses largely on user experience (if users enjoy using your site, this sends positive signals to search engines), a log file analysis gives in-depth information about bot behavior on your site. It can help ensure that search engine bots can crawl and index your site efficiently.
The data you gain from log file analysis helps you draw important conclusions about your website’s performance, as well as giving you insights about how search engine bots, including the Googlebot, behave on your site.
It is important to limit the amount of data recorded in log files, as they can take up a lot of storage space. The easiest way to do this is to limit the period under consideration. Often it is enough to look at a time window of one week, or to check the log files after you have made changes to your site or after a Google update.
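Assuming entries parsed with the parse_line sketch above, filtering to such a window might look like this; the file name is only a placeholder.

```python
from datetime import datetime, timedelta, timezone

def within_last_days(timestamp, days=7):
    """True if a combined-format timestamp such as '08/Mar/2019:10:15:32 +0000'
    falls within the last `days` days."""
    logged = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
    return logged >= datetime.now(timezone.utc) - timedelta(days=days)

# "access.log" is a placeholder path; parse_line is the helper sketched earlier.
with open("access.log", encoding="utf-8") as fh:
    entries = [e for e in map(parse_line, fh) if e]
recent = [e for e in entries if within_last_days(e["timestamp"])]
```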
You can always analyze all data at once in any time period, but you should keep in mind that you need the necessary resources to do this.
How is your website crawled?
If you identify the Googlebot as a client, you can use log file analysis to find out how the bot handles your URLs. For example, you can see how often it crawls which pages. You can also see how the search engine bot handles parameters.
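For example, with the entries filtered above, a rough count of Googlebot requests per URL might look like the sketch below. Note that matching on the user agent alone can be spoofed; the DNS check described further down addresses this.

```python
from collections import Counter

def googlebot_hits(entries):
    """Count Googlebot requests per URL; URLs with query strings are counted
    separately, which also makes parameter handling visible."""
    hits = Counter()
    for e in entries:
        if "Googlebot" in e["user_agent"]:
            hits[e["url"]] += 1
    return hits

for url, count in googlebot_hits(recent).most_common(10):
    print(count, url)
```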
How fast is your website?
There are many tools, such as Google Analytics or Ryte, that show you how fast your server responds. But a log file analysis gives you deeper insights and shows you how long it takes a bot to download a resource from your server. Fast page speed is vital for SEO: it helps ensure a good user experience, and Google has officially named page speed as a ranking factor.
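Log files only contain timing information if the server is configured to record it (for example via Apache's %D or NGINX's $request_time), so the duration field in this sketch is an assumption; it averages the serve time of Googlebot requests per URL.

```python
from collections import defaultdict

def avg_bot_serve_time(entries):
    """Average serve time per URL for Googlebot requests.
    Assumes each entry carries a 'duration_ms' field, which most default log
    formats do not include -- adapt the field name to your own configuration."""
    totals, counts = defaultdict(float), defaultdict(int)
    for e in entries:
        if "Googlebot" in e["user_agent"] and "duration_ms" in e:
            totals[e["url"]] += float(e["duration_ms"])
            counts[e["url"]] += 1
    return {url: totals[url] / counts[url] for url in totals}
```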
Are there indexing problems?
A log file analysis can show you exactly whether bots download your pages completely or only partially. The access logs also help you to identify poorly performing pages. If the Googlebot cannot load your URLs completely, there may be technical problems that prevent it from crawling. For example, the robots.txt file, which can prevent areas of the website from being crawled, could contain errors.
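A simple heuristic for spotting fetches that were not served completely is to look at the status code and the number of bytes transferred. The sketch below flags partial (206) responses and 200 responses where nothing was sent; it is only a rough indicator, not a definitive diagnosis.

```python
def incomplete_bot_fetches(entries):
    """URLs where a Googlebot request apparently wasn't served in full:
    206 means partial content; a 200 with '-' or 0 bytes means no body was sent."""
    flagged = set()
    for e in entries:
        if "Googlebot" not in e["user_agent"]:
            continue
        if e["status"] == "206" or (e["status"] == "200" and e["bytes"] in ("-", "0")):
            flagged.add(e["url"])
    return sorted(flagged)
```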
Which bots crawl your page?
Your website will be crawled by many different search engine bots, for example the Googlebot desktop crawler, the Googlebot smartphone crawler, or the Bingbot. The activity of these crawlers can show you whether your site is being crawled for mobile content.
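A rough breakdown by crawler can be read straight from the user agents: the smartphone Googlebot identifies itself with "Mobile" in its user agent string, the desktop Googlebot does not. A minimal sketch based on the entries parsed above:

```python
from collections import Counter

def crawler_breakdown(entries):
    """Share of requests per crawler, based on user-agent substrings."""
    counts = Counter()
    for e in entries:
        ua = e["user_agent"]
        if "Googlebot" in ua:
            counts["Googlebot smartphone" if "Mobile" in ua else "Googlebot desktop"] += 1
        elif "bingbot" in ua.lower():
            counts["Bingbot"] += 1
    return counts

print(crawler_breakdown(recent))
```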
How often is a URL crawled?
You can use the crawl frequency to see whether your URLs are relevant for the Googlebot. URLs that are less important for Google are crawled less often. You can then derive optimization measures from this: for example, URLs that are never crawled, or crawled very infrequently, could be deleted and redirected to a similar URL that is crawled more often. The freshness of a URL also plays an important role in helping the Googlebot make optimal use of its crawl budget, which is why you should regularly supplement or update content to increase the Googlebot's crawl frequency.
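To find URLs that are never crawled, you can compare the URLs the Googlebot requested against the URLs you actually want indexed. The sitemap_urls set below is an assumption, for example parsed from your sitemap.xml.

```python
def never_crawled(entries, sitemap_urls):
    """URLs from your own list (e.g. the sitemap) that never show up in Googlebot requests."""
    crawled = {e["url"] for e in entries if "Googlebot" in e["user_agent"]}
    return sorted(set(sitemap_urls) - crawled)

# Hypothetical example list; in practice you would load this from sitemap.xml.
print(never_crawled(recent, {"/magazine/log-file-analysis", "/old-landing-page"}))
```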
Are there crawling problems?
The server response codes, which are also stored in the log files, tell you whether bots encounter problems when crawling your URLs. You also get hints about possible redirect chains or incorrect URLs. Redirect chains prevent the Googlebot from crawling your page optimally and can, in the worst case, cause crawling to be abandoned.
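A quick way to surface such problems is to group the status codes the Googlebot received: a pile-up of 3xx responses hints at redirect chains, while 4xx and 5xx responses point to crawl errors. A minimal sketch, again based on the entries parsed above:

```python
from collections import Counter

def bot_status_report(entries):
    """Status code classes returned to the Googlebot (2xx, 3xx, 4xx, 5xx)."""
    return Counter(e["status"][0] + "xx" for e in entries
                   if "Googlebot" in e["user_agent"])

print(bot_status_report(recent))
```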
You can carry out your log file analysis on the basis of these questions. Then you can derive measures to improve your website's technical SEO from the data.
The log files store users' IP addresses as well as the browser and operating system they use, so you must take data protection into account when processing this data.
This is what you should do:
Adapt your privacy policy: Inform your users in your privacy policy that you evaluate this data. You should also state that the data records are used exclusively for statistical purposes and will not be passed on to third parties. If you use a tool for this purpose, you must have a corresponding data processing agreement in accordance with the GDPR.
No linking to other data: For data protection reasons, you must not merge or link the data from the log file analysis with other data, e.g. with personal customer data.
Anonymize IP addresses: In order to comply with high data protection standards, you should anonymize the IP addresses or have them stored anonymously by your server.
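How you anonymize depends on your setup. A common convention, shown here only as a sketch and not as a guarantee of GDPR compliance, is to mask the last octet of IPv4 addresses (and a larger block of IPv6 addresses) before any further processing.

```python
import ipaddress

def anonymize_ip(ip):
    """Mask the host part of an IP address: the last octet for IPv4,
    everything after the /48 prefix for IPv6. A common convention, not legal advice."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

print(anonymize_ip("66.249.66.1"))  # -> 66.249.66.0
```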
You can analyze a lot of data with a log file analysis. However, problems can arise. Here we list possible sources of error and how you can solve them:
Crawlers from SEO tools pose as the Googlebot
In this case, an analysis of your log files can lead to incorrect data. You can fix this error by performing a DNS lookup, which tells you whether a request really came from the Googlebot or from another bot.
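Google documents this verification method: do a reverse DNS lookup of the requesting IP, check that the hostname ends in googlebot.com or google.com, then do a forward lookup of that hostname and confirm it resolves back to the same IP. A minimal sketch with Python's standard library:

```python
import socket

def is_real_googlebot(ip):
    """Reverse/forward DNS check for an IP that claims to be the Googlebot."""
    try:
        host = socket.gethostbyaddr(ip)[0]          # reverse lookup
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]  # forward lookup must match
    except socket.gaierror:
        return False

print(is_real_googlebot("66.249.66.1"))  # a genuine Googlebot address
```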
The amount of data grows until you can no longer process it
If you run a larger website and around 1,000 visitors access your server every day, more than 1,000 entries are created in the log file each day, including bot accesses. Within a month, more than 30,000 rows will have accumulated in an Excel file. But what if you receive ten times that amount of traffic? Tools such as Excel or Google Sheets reach their limits at just over a million rows. Simple operations such as sorting or counting then require so much memory and computing power that a conventional workstation can no longer handle them, and only cloud services built for larger computations can provide a remedy.
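One way to push that limit back is to stream the log file line by line instead of loading everything into a spreadsheet, so memory use stays roughly constant regardless of file size. A sketch, reusing the parse_line helper from above with a placeholder path:

```python
from collections import Counter

def bot_hits_streaming(path):
    """Count Googlebot requests per URL without holding the whole file in memory."""
    hits = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            entry = parse_line(line)
            if entry and "Googlebot" in entry["user_agent"]:
                hits[entry["url"]] += 1
    return hits
```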
Not all relevant data is collected
You can customize which data is stored, but this carries the risk that logging is not set up correctly and too little data is collected, or that your records are incomplete. Therefore, you should evaluate the log files regularly and check them for completeness.
Log files are insufficient for analyzing user behavior
Log file analysis is primarily an analysis of bot accesses, because the data records are not detailed enough for a thorough user analysis. For this reason, log files can only supplement existing web analytics tools; they cannot replace them for user analysis.
Log file analysis is hugely important for SEO. Log files provide a large body of data that the server stores automatically, allowing website operators to analyze user and bot behavior. Unfortunately, it can be difficult to access and work with this data.
On small websites, the collected data can be processed and analyzed with standard spreadsheet tools like Excel or Google Sheets. However, once you have several thousand hits per day, you need considerable resources to analyze the log files efficiently. Without appropriate tools or computing power, you will reach your limits. You also have to take the GDPR into consideration and ensure that user sessions do not fall into the wrong hands. All this requires significant resources (employees, technology, time), which leads to high costs and, because of the effort involved, to irregular analyses.
Ryte provides you with a fast, simple alternative to the classic log file analysis.
Ryte BotLogs is based on two key principles: intelligent data selection and intelligent technology.
The data selection ensures that only the bot traffic is processed, so the incoming user sessions are ignored. This reduces the amount of data to be analyzed, helps you to focus on bot behavior, and alleviates concerns regarding GDPR and data protection.
In addition, BotLogs allows easy access to the extracted data via a clear dashboard with visualizations of the most important data.
Ryte BotLogs makes it easy for you to view your URLs from the perspective of a search engine bot, analyze them professionally and optimize your crawl budget. In addition, the tool is GDPR compliant, and you have unlimited access to historical data. Find out more in this article.
BotLogs - the smart alternative to log file analysis
Published on 03/08/2019 by Philipp Roos.
Philipp is an extended member of the Ryte family and supports Ryte with the latest SEO know-how and digital marketing news.