SEO Horror Stories: What are the most common fundamental mistakes to avoid in 2016?

Nearly three months ago, to celebrate ‘Halloween’ in the best SEO way, I decided to start sharing some day-to-day SEO “horror” stories with the hashtag “#SEOhorrorStories”.

The hashtag "#SEOhorrorStories" soon started trending when hundreds of SEOs around the world began to share their own stories not only in English but also in Spanish, German and French, with even Matt Cutts and the official Google Analytics account participating – you can see a little more of that in this post I made later, commenting on the participation.

However, I found it curious that so many of these stories, which shared "extreme" cases of SEO issues (some of them clearly humorous), had the following characteristics in common:

Essential for Implementation: The vast majority of the stories dealt with “basic” problems that were really simple to implement or solve in an SEO process: errors in crawling, indexing, content, penalties, etc.
Not only concerning implementation but also process management: Interestingly, many of the stories did not focus on the implementation of the process, but rather on its customer management, especially “unreasonable” requests and mismatched expectations during the SEO process, arising from a lack of knowledge of the area among customers, company “bosses” and “non-technical” business decision makers.
Common and repetitive: The same stories that SEOs in the United States shared were also shared in Europe, over and over again.
Entirely avoidable: Interestingly, the vast majority are problems that have a solution, and preventing them should be part of the SEO process.

This is why I would like to share different scenarios of some of the SEO horror stories that were shared and discuss how they can be prevented so that we can avoid them as much as possible – not a bad resolution for SEO in 2016:

1. Blocking search engines from crawling the site

Search engines can end up blocked from a site in completely different scenarios. Sometimes it is done on purpose, to try to avoid overloading the server:

Or by mistake at the robots.txt level, when a website is launched to production and nobody remembers to remove the blocking rule:

How can this be avoided?

This can largely be avoided by using a monitoring system such as the one offered by Ryte with its “Monitoring” functionality, which does not just send alerts via email when the robots.txt is updated, but also when the server is down.
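If you want a quick self-made check alongside a monitoring service, the following is a minimal sketch of the idea, not how any particular tool works. It uses Python's standard `urllib.robotparser` to test whether a given robots.txt body blocks a crawler from key paths. The function name `blocked_paths`, the example domain and the sample paths are my own assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, site: str, paths: list[str],
                  user_agent: str = "Googlebot") -> list[str]:
    """Return the paths that robots_txt forbids user_agent from crawling."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network call is needed.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, site + p)]


# A robots.txt left over from a pre-launch setup: it blocks everything.
leftover = "User-agent: *\nDisallow: /\n"
print(blocked_paths(leftover, "https://www.example.com", ["/", "/products/"]))
```

In practice you would fetch the live robots.txt on a schedule and raise an alert whenever the returned list is not empty for pages that must stay crawlable.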

2. Deindexation of sites

This issue usually happens after a redesign or migration, when people often forget to change the robots meta tag so that the site can be indexed again. However, it can also occur at any time when the website is updated:

How can this be avoided?

It is essential to monitor the changes of the critical elements of the site code, as well as the content of the website pages with tools like Versionista, OnWebChange or NetWatcher, which will alert us and allow us to see the changes that are generated in them.
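As a rough illustration of what such a check looks for, here is a small sketch that scans a page's HTML for a robots meta tag carrying `noindex` (or `none`, which Google treats as `noindex, nofollow`). It only relies on Python's standard `html.parser`; the class and function names are hypothetical, and it does not cover the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            # Directives are comma separated, e.g. "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool({"noindex", "none"} & parser.directives)


page = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
print(is_noindexed(page))
```

Running this against your most important templates after every release is a cheap safety net next to a full change-monitoring tool.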

3. Open test or pre-production environments

One of the first questions that should be asked when starting an SEO process is where the pre-production or test environment is located, not only to know if the changes can be validated before launching them (which is ideal), but also to ensure that it is not accessible to search engines.

How can this be avoided?

The test environment should be blocked not only through the robots.txt but also by requiring a password or by making it accessible only through a specific DNS.

As configurations can be changed by accident, in addition to monitoring the robots.txt, I configure alerts with Google Alerts for a “site:” search of the test or pre-production environment, which is usually on a subdomain. This way I’m notified if any of its content gets indexed:

4. Internal site content duplication

It is very common to find websites that are not redirecting or canonicalising to their original version, which is usually identified at the time of the initial SEO audit, when you check whether there is an internal content duplication issue on the website.

However, the reality is that site changes and continuous updates can cause this problem at any time.

How can this be avoided?

SEO crawlers will facilitate this validation by offering a report directly listing pages with duplicate content:

It is therefore essential to schedule frequent, ongoing validations so that changes we have not been told about do not create new content duplication issues.

Duplication can occur, for example, when a site is migrated from HTTP to HTTPS without you even being notified:
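To make the idea of a duplicate content validation concrete, here is a simplified sketch that groups URLs whose extracted text is identical once case and whitespace are normalized. It assumes you already have the text of each page (for example, from a crawl export); the function names and sample URLs are made up for the example. It only catches exact duplicates, while SEO crawlers typically also flag near duplicates.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose text is identical after normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    # Only groups with more than one URL are duplication candidates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


crawl = {
    "http://example.com/shoes": "Red running shoes\n for every budget",
    "https://example.com/shoes": "Red running shoes for every budget",
    "https://example.com/about": "About our shop",
}
print(find_duplicates(crawl))
```

Each group that comes back is a set of URLs that should either redirect or canonicalize to a single original version.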

Or when someone publishes the same content on multiple pages without letting you know:

To tackle this, you can schedule frequent crawls – a function offered by most SEO crawlers such as OnPage Zoom – so that once the SEO process has begun it continuously checks for internal duplicate content that could be generated along the way:

5. Website migration without redirects or SEO validation

This is one of the scenarios that generates duplicate content, but it is so common, and is a problem in itself with other consequences too, that it deserves a separate point. It can be summarised in these two tweets:

The problem is not just forgetting to implement the 301 redirects from the old URLs to the new ones (pointing each page to its new version), but also failing to update the links and XML sitemaps, to register the new site in the Google Search Console or to notify the change of address there, among other steps that Google itself specifies in this best practices guide for migration.

How can this be avoided?

By following an action plan that must start before launching, when planning the website migration, and not just at the time of and after the launch.

Besides Google’s best practices, here you have a couple of resources and guides for an SEO-friendly website migration to avoid losing all your organic traffic and rankings:

6. Non-relevant, chained or looped redirects

As with blocked crawling and erroneous deindexation, problems with redirects are common when redesigning websites, and not just because of forgetting to redirect the old URLs to the new ones, but also because of not doing it in a relevant way.

For example, massively redirecting mobile users (and search engine robots) to the desktop version of the site:

Redirecting to error pages:

Chaining redirects, not just with permanent 301 redirects but also with temporary 302 redirects, which do not transfer the popularity of the old page to the new one:

How can this be avoided?

Similar to the above problems, these types of redirect issues are usually identified and solved when doing the technical audit at the beginning of the SEO process, using a crawler such as, in this case, OnPage, which directly shows us the type of each redirect and the URL it points to:

Additionally, in the "Crawl Errors" report of the Google Search Console we can identify non-relevant redirects, which are marked as "Soft 404" errors. Pages that redirect to error pages appear in the "Not found" section, those that redirect to blocked pages appear under "Access Denied", and those that are redirected using 302 appear in "URLs not followed".

What’s most important is not to forget to check this regularly with re-scans and frequent validations, and not only to solve the incidents that have been identified, but also to figure out what causes the redirect issues.
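As an illustration of what a crawler does when it reports these issues, the sketch below walks a redirect map (source URL to target URL) and separates chains from loops. The map would normally come from a crawl export or the server's redirect rules; the function names and sample paths here are assumptions for the example.

```python
def trace_redirect(start: str, redirects: dict[str, str],
                   max_hops: int = 10) -> tuple[list[str], bool]:
    """Follow start through the redirect map and return (path, is_loop)."""
    path = [start]
    seen = {start}
    while path[-1] in redirects and len(path) <= max_hops:
        target = redirects[path[-1]]
        if target in seen:
            # We came back to a URL already visited: this is a loop.
            return path + [target], True
        path.append(target)
        seen.add(target)
    return path, False


def audit_redirects(redirects: dict[str, str]):
    """Return (chains, loops), each keyed by the URL where the problem starts."""
    chains, loops = {}, {}
    for source in redirects:
        path, is_loop = trace_redirect(source, redirects)
        if is_loop:
            loops[source] = path
        elif len(path) > 2:
            # More than one hop before reaching the final URL.
            chains[source] = path
    return chains, loops


rules = {"/old": "/interim", "/interim": "/new", "/a": "/b", "/b": "/a"}
chains, loops = audit_redirects(rules)
print(chains)
print(loops)
```

Every chain found this way can be flattened by pointing the first URL straight at the last one, and every loop needs one of its rules removed or corrected.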

7. Canonicalization to erroneous or non-relevant URLs

Something similar happens with erroneous canonicalizations, when canonical tags point to pages that are not their original versions, which can happen in many different scenarios:

When the canonical tags of all the website's pages point to the same URL, for example, the home page:

When IP addresses are included instead of the site domain name:

When canonical tags are kept in the pages of a newly launched site pointing to their preproduction URLs:

Or when an international version of a website is launched on a new ccTLD that canonicalizes to the initial gTLD:

How can this be avoided?

These problems with canonical tags are usually also identified when doing the technical audit with an SEO crawler, by verifying which pages are canonicalized to others (those that are not pointing to themselves) and which URLs they are pointing to, in order to check whether those really are their original versions, whether they generate errors, whether they point to another website or to other URLs that are not relevant, etc.:

Likewise, recurring scheduled scans should be enabled so they are regularly validated.
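The checks described above can be sketched in a few lines. The example below is only an illustration with hypothetical names: it reads the `rel="canonical"` link from a page's HTML and flags three of the scenarios mentioned in this section (an IP address as host, a different host such as a pre-production subdomain or another TLD, and an inner page canonicalized to the home page).

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href", "")


def canonical_issues(page_url: str, html: str) -> list[str]:
    """List suspicious properties of the page's canonical tag."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return ["no canonical tag found"]
    page = urlsplit(page_url)
    target = urlsplit(urljoin(page_url, parser.canonical))
    host = target.hostname or ""
    issues = []
    try:
        ipaddress.ip_address(host)
        issues.append("canonical uses an IP address")
    except ValueError:
        pass  # The host is a normal domain name.
    if host != (page.hostname or ""):
        issues.append("canonical points to another host")
    if target.path in ("", "/") and page.path not in ("", "/"):
        issues.append("canonical points to the home page")
    return issues


html = '<link rel="canonical" href="https://www.example.com/">'
print(canonical_issues("https://www.example.com/shoes/", html))
```

A canonical pointing to another host is not always wrong, so treat the output as a list of URLs to review rather than a list of confirmed errors.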

8. Hiding content with CSS

Hiding content as a cloaking technique is a "classic" example of what’s not recommended, as it goes against Google guidelines. However, it is still used, sometimes on purpose and sometimes not, to show different content to search engines than to users:

How can this be avoided?

As fundamental as it might seem, it is essential to always check how the page content is indexed (especially at the beginning of an SEO process with a new website) and what is shown in the search engine cache itself, in the text version:

It also helps to use the “Fetch as Google” functionality of the Google Search Console to verify potential differences in the content of the site:

Once you have started the process, monitoring the content and HTML changes of the pages, as discussed above in section 2, will alert us to potential changes in the code or content of the pages, including in this case changes that hide or "cloak" content.
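A very rough first pass for spotting hidden text can also be scripted. The sketch below collects text that sits inside elements hidden through an inline `style` attribute (`display: none` or `visibility: hidden`). It is a deliberately limited illustration with hypothetical names: it assumes reasonably well-formed markup and does not evaluate external stylesheets, classes or JavaScript, which a rendering-based check would need to cover.

```python
import re
from html.parser import HTMLParser

HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE
)
# Elements without a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextParser(HTMLParser):
    """Collect text nested inside elements hidden by inline styles."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []        # One bool per open element: hidden or not.
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        self.stack.append(bool(HIDING_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text is hidden if any open ancestor element is hidden.
        if any(self.stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    parser = HiddenTextParser()
    parser.feed(html)
    parser.close()
    return parser.hidden_text


sample = ('<div><p>Visible</p><div style="display: none">'
          '<p>cheap shoes cheap shoes</p></div></div>')
print(find_hidden_text(sample))
```

Hidden elements are often legitimate (menus, tabs, modals), so the output is only a list of candidates to compare against what users actually see.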

9. Total blockage of content on CDNs

Making use of a CDN can really help improve the speed of a website. However, without proper configuration you risk either getting the copies of the website's content indexed (these are usually served from subdomains set up for the CDN) or entirely blocking the crawling of those subdomains, including access to files distributed through them that should be accessible to search engines, such as images, JS & CSS. If those files are blocked, the pages will not be rendered correctly.

How can this be avoided?

The easiest way to avoid this problem is by using the SEO-focused settings that CDNs usually offer and enabling the inclusion of a canonical header in the files that are “duplicated” through the CDN's different subdomains, so they point to their original URLs.

That should be enough, but additionally, if you want to prevent the crawling of the content on these subdomains and only leave it enabled for static files such as images, JS & CSS (which are the ones usually served through CDNs), you can also configure a custom robots.txt for them.
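To show what these two settings amount to, here is a small sketch under stated assumptions. The first function builds the value of an HTTP `Link` header with `rel="canonical"` that a CDN copy could send to point back to its original URL. The second part tests a sample robots.txt for a CDN subdomain that only leaves static folders crawlable. The hostnames, folder names and function names are all hypothetical, and how you actually set the header depends on your CDN.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def canonical_link_header(cdn_url: str, origin_host: str) -> str:
    """Build a Link header value pointing a CDN copy to its original URL."""
    parts = urlsplit(cdn_url)
    origin_url = urlunsplit(
        (parts.scheme, origin_host, parts.path, parts.query, "")
    )
    return f'<{origin_url}>; rel="canonical"'


# Sample robots.txt for the CDN subdomain: static folders only.
# The Allow rules come first because Python's parser applies the
# first matching rule.
CDN_ROBOTS = """User-agent: *
Allow: /images/
Allow: /js/
Allow: /css/
Disallow: /
"""


def cdn_allows(path: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the sample CDN robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(CDN_ROBOTS.splitlines())
    return parser.can_fetch(user_agent, "https://cdn1.example.com" + path)


print(canonical_link_header("https://cdn1.example.com/guides/seo.pdf",
                            "www.example.com"))
print(cdn_allows("/images/logo.png"), cdn_allows("/guides/seo.html"))
```

With this setup the static files stay crawlable, so pages can be rendered, while the duplicated HTML copies on the CDN subdomain are kept out of the crawl.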

To verify if the Googlebot can successfully access the content and files from your website you can make use of the functionality of “Fetch as Google” on the Google Search Console.

10. Websites depending on Ajax & Javascript to feature content

Google announced a few months ago that they were no longer recommending their AJAX crawling scheme, as they could now render and process JavaScript, so they started recommending following the principles of progressive enhancement instead.

The truth is that although Google has a greater ability to interpret JavaScript, this doesn’t mean that we can rely completely on this capacity and do nothing, as we might end up finding HTML with no content:

How can this be avoided?

To avoid this situation, it is recommended to verify the correct crawling and indexing of all the critical elements and areas of your website, such as navigation and content; they should always be implemented directly in the HTML and not rely on scripts.

If you use JavaScript frameworks such as AngularJS, it is recommended to create “snapshots” of the pages using PhantomJS or services such as Prerender.io. Check out this post from BuiltVisible, where they comment further on its implementation step by step.
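One simple way to verify this yourself is to look at the raw HTML, before any JavaScript runs, and check that the phrases you care about are already there. The sketch below does that with Python's standard `html.parser`, ignoring anything inside `<script>` and `<style>` tags. The function name and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Extract text from raw HTML, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(html: str, expected_phrases: list[str]) -> list[str]:
    """Return the expected phrases that are absent from the static HTML."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in expected_phrases if p.lower() not in text]


# An "app shell" page: the content only exists after JavaScript runs.
shell = ('<html><body><div id="app"></div>'
         '<script>render("Red running shoes")</script></body></html>')
print(missing_from_raw_html(shell, ["Red running shoes"]))
```

If a key heading, navigation label or product name shows up in the returned list, that element depends on scripts and is a candidate for being moved into the HTML or served through a pre-rendered snapshot.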

A resolution for your SEO in 2016? Avoid these SEO Horror Stories!

Yes, they can be very funny if they do not happen to you, but it is definitely best to avoid them. Here's to a 2016 without SEO horror stories!

Published on Jan 28, 2016 by Aleyda Solis
