A sitemap.xml is a machine-readable list of the URLs of your website. It helps Google crawl and index your website efficiently. In this article we show you how to create, monitor and optimize your sitemap.xml file.
A sitemap.xml is a file in XML format. Search engines can read it, and it lists the URLs of your website along with information on when each URL was last modified and how often it is updated.
The XML sitemap is important because it can help the bots of Google and other search engines crawl and index your site. With the sitemap, the Googlebot can more easily crawl deeper directories of your website.
An XML sitemap is also useful in the following cases:
New directory: If you have created a new directory on your website, its URLs are not yet linked externally, meaning the Googlebot cannot easily access them. Furthermore, the crawler may not yet have discovered these new URLs via internal links on your pages. With the sitemap, you give the Googlebot a clear indication that you have new content.
New website: Creating an XML sitemap is particularly important for completely new websites. You can use it to inform Google about all existing URLs so that the Googlebot can index them bit by bit.
Very large websites: If your website has many directories and many URLs, it can take a very long time for the Googlebot to crawl and index all URLs. If there is an XML sitemap, there is a higher chance that the URLs it contains will also be crawled.
However, bear in mind that even with a sitemap, you can’t guarantee that the URLs listed will actually be indexed.
<?xml version="1.0" encoding="UTF-8"?>
The first line states the XML version in which the sitemap was created, followed by the character encoding used.
The second line indicates the schema according to which the sitemap.xml is created.
Then follow the URL elements, which contain a loc tag and can contain further tags. Common tags are … (specify the frequency at which the URL changes) and … (specify how the Googlebot should prioritize the URLs when crawling).
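To make this structure concrete, here is a minimal Python sketch that generates such a file with the standard library. The `<urlset>` namespace and the `<loc>`, `<lastmod>`, `<changefreq>` and `<priority>` tag names come from the sitemaps.org protocol. The function name `build_sitemap`, the dictionary layout of the entries, and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as UTF-8 bytes (hypothetical helper).

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration=True emits the <?xml ...?> line first.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Example with made-up values:
example = build_sitemap([
    {
        "loc": "https://www.yourwebsite.com/directory",
        "changefreq": "weekly",
        "priority": 0.8,
    },
])
print(example.decode("utf-8"))
```

The output consists of the XML declaration, one `<urlset>` element carrying the schema namespace, and one `<url>` element per entry.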
You should watch out for the following points when creating a sitemap:
Use only complete URLs and no relative URLs. The Googlebot cannot process relative URLs. A complete URL is: https://www.yourwebsite.com/directory and not /directory.
Store URLs without session IDs, otherwise the Googlebot might crawl your URLs several times. Additionally, multiple indexing can result in duplicate content.
Encode your sitemap in UTF-8. This makes URLs with special characters machine-readable.
A single sitemap may contain at most 50,000 URLs. If your website has more than 50,000 URLs, you should split them across several sitemaps.
If you create several sitemaps, you should generate a sitemap index file.
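The points above can be sketched in Python as follows. This is only an illustration under stated assumptions: the session parameter names in `SESSION_PARAMS`, all function names, and the sitemap file names in the usage note are hypothetical and would need to be adapted to your site. The 50,000-URL limit and the `<sitemapindex>` format come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit of the sitemaps.org protocol
# Assumed session parameter names; adjust them to your own website.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def is_absolute(url):
    """True for complete URLs such as https://www.yourwebsite.com/directory."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def strip_session_id(url):
    """Remove session parameters from the query string.

    Note: the remaining query is re-encoded with urlencode.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (UTF-8 bytes) that points to each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

A possible workflow: keep only URLs for which `is_absolute` returns true, pass them through `strip_session_id`, remove duplicates, split the result with `chunk_urls`, write one sitemap per chunk (for example under hypothetical names such as sitemap-1.xml and sitemap-2.xml), and list those files with `build_sitemap_index`.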
You can usually generate a sitemap.xml with common content management systems such as WordPress or TYPO3. There are also many plugins that act as sitemap generators.
After you have created the sitemap, you store it in the root directory of your domain, ideally: www.yourwebsite.com/sitemap.xml
Then, to inform the Googlebot about your sitemap, you should submit it to the Google Search Console. Log into your Google Search Console account, click on "Sitemaps", and then enter the URL path in the field provided. Then click "Submit" and your XML Sitemap will be retrieved by Google and analyzed in the Search Console.
Figure 1: Submit sitemap to the Google Search Console
If you use the Bing Webmaster Tools, you can also submit the sitemap there.
Once you have created your sitemap.xml on your domain, you can monitor and analyze the file with Ryte Website Success. The sitemap reports can be found in Website Success, under "Sitemaps".
First, check with Ryte whether all important URLs are contained in your sitemap.xml. Click on "Included in Sitemaps".
Figure 2: Check that all URLs are included in the sitemap
Then you can retrieve the status codes of the URLs stored in the sitemap. It is important, for example, to check whether the sitemap still contains URLs that have already been deleted on the server. You can also use this report to quickly determine whether your sitemap contains URLs that redirect to another URL via a 301 redirect.
Figure 3: Find the status codes of your URLs
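If you want to spot-check status codes yourself, the idea can be approximated with a short Python script. The following is a rough sketch and not how Ryte works internally: all function names are hypothetical, it sends one HEAD request per URL without following redirects, and it does not handle network failures (a `URLError` would simply propagate).

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(xml_data):
    """Return all <loc> values of the <url> entries in a sitemap (bytes or str)."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so that 301/302 responses stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Raised for 3xx (redirects are not followed), 4xx and 5xx responses.
        return error.code


def problem_urls(xml_data, get_status=fetch_status):
    """Map every sitemap URL whose status is not 200 to that status."""
    result = {}
    for url in urls_in_sitemap(xml_data):
        status = get_status(url)
        if status != 200:
            result[url] = status
    return result
```

Calling `problem_urls` with the contents of your sitemap.xml returns a dictionary of the URLs that are deleted (for example 404) or redirected (for example 301). The `get_status` parameter exists so that the lookup can be replaced, for instance with a cached or rate-limited variant.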
With the priority settings, Ryte Website Success shows you which URLs in the XML Sitemap have been assigned the highest priority. This gives you a quick overview of which URLs Googlebot should give priority to.
Figure 4: Check the importance of each URL
When you create an XML sitemap, you often cannot track changes to the file. However, Ryte Website Success can tell you exactly when the XML file was last modified. To do this, simply click on "Last changes".
In addition to the standard reports, you can also analyze the sitemap.xml with further filters. For example, you can find URLs that have been excluded from indexing with the noindex tag.
Figure 5: Find URLs in the sitemap that have a noindex tag
Or you can check with Ryte whether your sitemap lists URLs that contain a canonical that points to another file.
Figure 7: Filter URLs with canonical tags
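The noindex and canonical checks described above can also be illustrated in a few lines of Python. This is a simplified sketch, not Ryte's implementation: the class and function names are hypothetical, only the robots meta tag and the canonical link element in the HTML are inspected (not the `X-Robots-Tag` HTTP header), and the canonical URL is compared to the page URL as a plain string.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect the robots meta tag and the canonical link of one HTML page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            directives = [part.strip().lower() for part in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def indexing_hints(url, html):
    """Report whether a page is noindex or canonicalizes to another URL."""
    parser = IndexingHints()
    parser.feed(html)
    # Simplification: exact string comparison, no URL normalization.
    elsewhere = parser.canonical is not None and parser.canonical != url
    return {"noindex": parser.noindex, "canonical_elsewhere": elsewhere}
```

A URL for which either value is true is a candidate for removal from the sitemap, since the sitemap should list the pages you actually want indexed.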
With Ryte's sitemap reports you can quickly find out whether and where you should adjust your XML file. Using the reports, you can either modify the entire configuration of your sitemap or manually remove or adjust individual URLs in your file.
Published on Jun 12, 2019 by Philipp Roos