SEO Site Audits are a key offering here at WebFuel. Accordingly, we need to know all the nuances of a little file called robots.txt.

A little history: The robots.txt protocol was established almost 20 years ago (19 to be exact). It’s a methodology with which website owners identify what parts of their site should be crawled and indexed by the plethora of Spiders that scour the Internet.

Your Average Search Engine Spider (BOT)

At least that’s the idea. A lot has happened since 1994. There are now thousands of BOTS constantly scanning the Internet (and not all of them are looking for fresh content). Most of the reputable BOTS (Spiders) adhere to the cryptic instructions laid out in the robots.txt file, but only up to a point.

Through the hundreds of SEO Site Audits we have completed, we have discovered that the robots.txt file is often a dumping ground for useless, and often self-defeating, instructions. Let me explain. Just because you indicate that you don’t want a certain directory crawled does not mean that Google (Bing, etc.) will keep that content out of its index. It’s really only a “request”, and a disallowed URL can still end up indexed if other pages link to it. In the end Google et al. may or may not follow your request. Our experience has been that, more often than not, they ignore such requests. This can cause many SEO headaches, some of which can be quite embarrassing. We’ve seen plenty of content that a site owner thought was “hidden” turn out to be publicly available, for example a directory called “/newproductlaunch” (no kidding!). Savvy competitors will look at your robots.txt file for cracks. They often find them, thanks to the carelessness of the person who created the file in the first place.
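To see how little effort this takes a competitor, here is a minimal sketch in Python. The function name disallowed_paths and the sample file contents are our own illustration, not taken from any real site; the sketch simply lists every path a robots.txt file asks robots to stay away from.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /newproductlaunch
Disallow: /admin/  # staff only
Sitemap: https://www.example.com/sitemap.xml
"""

print(disallowed_paths(SAMPLE))  # ['/newproductlaunch', '/admin/']
```

Anyone can fetch the file from the root of a site, so every Disallow line doubles as a public list of the places you would rather nobody looked.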

A robots.txt file was meant to be very simple. Use it to indicate that robots (Spiders, BOTS) have full access to all files on your website and to direct robots to your sitemap.xml file. That’s it. Anything else you do here may be inconsistent and possibly ineffective. Unfortunately, the file is often implemented by people who do not understand its use and are very often not around to see the results of its implementation (enter an SEO professional).
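As a rough illustration of that “very simple” file, the sketch below holds a minimal robots.txt in a Python string and checks it with the standard library’s urllib.robotparser. The example.com URLs and the “AnyBot” user agent are placeholders we made up, and site_maps() assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: an empty Disallow grants full access,
# and the Sitemap line points robots at the sitemap.
# The domain is a placeholder, not a real site.
MINIMAL_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(MINIMAL_ROBOTS_TXT.splitlines())

# "AnyBot" is a made-up user agent; any name falls under "User-agent: *".
print(parser.can_fetch("AnyBot", "https://www.example.com/any/page.html"))  # True
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

The same parser can be pointed at your own file to confirm that it says what you think it says.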

There are numerous SEO strategies that present your content to the Search Engines in the best possible light. Implemented correctly, these strategies will get you the most positive online traction possible.

Think tagging. Think strategic use of the Noindex tag, the Canonical Tag, 301 Redirects, and in some cases simple Password Protection of certain content. And so on.
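For the first two items on that list, a quick audit check might look like the sketch below. It uses Python’s built-in html.parser to report whether a page carries a robots Noindex meta tag and which Canonical URL it declares. The IndexingHints class and the sample page are hypothetical, written only for this example.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect the robots noindex directive and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            directives = (attr.get("content") or "").lower()
            self.noindex = self.noindex or "noindex" in directives
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


# Hypothetical page markup, for illustration only.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://www.example.com/products/">
</head><body>Draft launch page</body></html>"""

hints = IndexingHints()
hints.feed(PAGE)
print(hints.noindex, hints.canonical)  # True https://www.example.com/products/
```

Keep in mind that a Spider has to be able to crawl a page in order to see its Noindex tag, so blocking that same page in robots.txt works against you.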

Have you looked at your robots.txt file recently?

  • Brock Murray

    The dreaded robots.txt!