Customers using the JavaScript crawler can now also include and exclude URLs in their website analysis. This lets you tailor the analysis to your individual needs.
This feature has long been popular among our users, and it is now available with the JavaScript crawler as well. You can include or exclude parts of your domain from your crawl, which gives you more flexibility in your analysis. You can find the function in the project settings, under advanced analysis -> Include or exclude URLs.
When building the new function, we took user feedback into account to ensure everyone could use it easily.
We have created pre-defined filters for the different components of a URL: protocol, domain, path, and query string. All you need to do is choose from the dropdown list and enter the protocol, domain, URL path, or query string that you want to include or exclude.
The following filter options are available:
Is
Contains
Starts with
Ends with
Doesn’t equal
Doesn’t contain
Doesn’t start with
Doesn’t end with
When you’ve set up your desired filters, click "save".
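To make the filter options concrete, here is a minimal Python sketch of how such filters could be evaluated against the four URL components. It is an illustration only, not Ryte's implementation: the names `OPERATORS`, `component`, and `matches`, the `negate` flag used for the "Doesn't ..." variants, and the sample URL are all assumptions made for this example.

```python
from urllib.parse import urlsplit

# Hypothetical names for illustration; they mirror the dropdown labels
# but are not part of any Ryte API.
OPERATORS = {
    "is": lambda value, target: value == target,
    "contains": lambda value, target: target in value,
    "starts with": lambda value, target: value.startswith(target),
    "ends with": lambda value, target: value.endswith(target),
}


def component(url, part):
    """Return one of the four URL components the filters operate on."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname or "",
        "path": parts.path,
        "query string": parts.query,
    }[part]


def matches(url, part, operator, target, negate=False):
    """Apply one filter; negate=True models the "Doesn't ..." variants."""
    result = OPERATORS[operator](component(url, part), target)
    return not result if negate else result


# Sample URL chosen for illustration only.
url = "https://en.ryte.com/wiki/SEO?lang=en"
print(matches(url, "domain", "is", "en.ryte.com"))              # True
print(matches(url, "path", "starts with", "/wiki"))             # True
print(matches(url, "path", "ends with", ".pdf", negate=True))   # True
```

Each filter in this sketch looks at exactly one component, which is why a "Path" filter never sees the domain or the query string.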
We recommend setting up your include and exclude filters when you create a new project so that you can track the progress of your optimizations more easily. Changing the number of URLs crawled partway through a project would skew your data.
For example, if you exclude assets from your website crawl, this will affect your performance score in the Web Vitals Report: a page crawled without its assets loads faster and therefore receives a better performance score.
The feature also helps you save crawl budget, because you can exclude URLs that are not relevant to your work.
Here are some specific cases where the feature is useful:
Analyze specific subdomains
In this example, both the English and German subdomains of the Ryte website are analyzed, using the filters "Domain is en.ryte.com" and "Domain is de.ryte.com".
Analyze a subdomain, but focus on or exclude a directory
In this example, we analyze the subdomain en.ryte.com and focus on its "wiki" subdirectory.
Ignore specific file types such as .jpg, .pdf, and .gif
In this example, we exclude URLs with the file types .jpg, .pdf, and .gif.
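The three cases above can also be written as small Python predicates, which is a handy way to check a list of URLs before you configure the filters. This is a sketch under stated assumptions, not a description of how the crawler works internally: the function names and sample URLs are hypothetical, the "wiki" directory is assumed to sit at the path /wiki, and matching here is case-sensitive, which may differ from the tool's behavior.

```python
from urllib.parse import urlsplit


def host(url):
    """Domain component of the URL (empty string if missing)."""
    return urlsplit(url).hostname or ""


def path(url):
    """Path component of the URL, without the query string."""
    return urlsplit(url).path


def in_selected_subdomains(url):
    # Case 1: "Domain is en.ryte.com" or "Domain is de.ryte.com".
    return host(url) in {"en.ryte.com", "de.ryte.com"}


def in_english_wiki(url):
    # Case 2: subdomain en.ryte.com, path assumed to start with /wiki.
    return host(url) == "en.ryte.com" and path(url).startswith("/wiki")


def is_ignored_file_type(url):
    # Case 3: path ends with one of the excluded file types.
    return path(url).endswith((".jpg", ".pdf", ".gif"))


# Sample URLs invented for illustration.
urls = [
    "https://en.ryte.com/wiki/SEO",
    "https://de.ryte.com/magazine/",
    "https://fr.ryte.com/wiki/SEO",
    "https://en.ryte.com/images/logo.jpg",
]

print([u for u in urls if in_selected_subdomains(u)])
print([u for u in urls if in_english_wiki(u)])
print([u for u in urls if not is_ignored_file_type(u)])
```

Checking the path rather than the full URL means a query string such as ?file=report.pdf does not trigger the file type rule in this sketch.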
Published on Apr 15, 2020 by Olivia Willson