
Test Your Robot

A little bit of tech talk today, embracing the digital end of this blog’s spectrum rather than the behavioral one.

Did you know about the Google robots.txt tester? This tool determines whether the server’s robots.txt file is accidentally blocking Googlebot from crawling the site, or, conversely, letting it crawl files you don’t want crawled.

Googlebot is a user-agent, the specific search engine robot that crawls files on a web server and creates a list of them, so that the search engine can then index them in a database for searches.
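
For example, here is a minimal robots.txt file (the directory names are hypothetical), with one block of rules for Googlebot and a second for every other crawler:

    User-agent: Googlebot
    Disallow: /drafts/

    User-agent: *
    Disallow: /drafts/
    Disallow: /tmp/

Each User-agent line names a robot, and the Disallow lines beneath it list the paths that robot should not crawl.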

Enter the robots.txt file content, and the tool reads it in the same way Googlebot does, listing the effects of the file and any problems it finds.

To test a site’s robots.txt file:

  1. On the Webmaster Tools Home page, click the site you want. If you do not see a list of sites, this is your first time in and you will need to add your site.
  2. Under Crawl, click Blocked URLs.
  3. If it’s not already selected, click the Test robots.txt tab.
  4. Copy the content of your robots.txt file, and paste it into the first box.
  5. In the URLs box, list the URLs you want to test against that file (a worked example follows these steps).
  6. In the User-agents list, select the user-agents you want.
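
For instance, pasting the sample file from above into the first box, you might then test a couple of hypothetical URLs against it:

    http://example.com/drafts/post.html
    http://example.com/about.html

With Googlebot selected in the User-agents list, the tool would report the first URL as blocked (it matches Disallow: /drafts/) and the second as allowed; example.com and both paths are placeholders here.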

Remember, to save any changes you’ll need to copy the contents and paste them into your robots.txt file, which lives at the root of your site (for example, http://example.com/robots.txt).

This tool provides results only for Google user-agents (such as Googlebot). Other bots may not interpret the robots.txt file in the same way. For instance, Googlebot supports an extended definition of the standard robots.txt protocol. It understands Allow: directives, as well as some pattern matching. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site.
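
As an illustration, Googlebot would interpret every line in a file like this one (the paths are again hypothetical), while a crawler that implements only the original standard would ignore the Allow: line and the wildcard patterns:

    User-agent: Googlebot
    Disallow: /archive/
    Allow: /archive/index.html
    Disallow: /*.pdf$

Here Googlebot may still fetch /archive/index.html even though the rest of /archive/ is blocked, and the last line blocks any URL ending in .pdf; both behaviors depend on Googlebot’s extensions, not on the base protocol.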

[Image: robots.txt explained]