Syntax
User-agent: * (Agent Name)
Disallow: / (File Path)
In the above lines, Agent Name is to be replaced by the name of any search engine bot you wish to exclude, and the file path is to be replaced by the path of the file or folder you wish to exclude. Note that the path is relative to the root of your site (for example, /private/page.html), not an absolute URL.
Example 1
User-agent: *
Disallow: /
The above lines disallow all robots from crawling the entire site, thereby stopping them from accessing any of the files contained therein.
Example 2
User-agent: Googlebot
Disallow: /
The above lines disallow Googlebot (Google's crawler) from crawling the entire site, thereby stopping it from accessing any of the files contained therein.
When should you use robots.txt?
A pretty useful question, right? Well, you should use robots.txt if there are scripts or pages on your server which you do not want bots to access, or if you want specific bots not to crawl the contents of the site. For smaller sites of fewer than a hundred pages, there is rarely any need for robots.txt, as there is usually nothing special to hide. But larger sites with big databases behind them may have special pages or scripts which need to be kept away from the bots. In that case, you should use a robots.txt file.
I have created robots.txt, now my secret pages are safe!
I have heard many people say this, but in reality robots.txt works only for obedient robots. The instructions contained in the file may or may not be followed by search engine bots. The obedient ones will follow them and will not crawl the blocked pages, while the disobedient ones will simply ignore the instructions and crawl them anyway. So if you want to keep pages out of the index, use the noindex meta tag. For more information on blocking pages from search engines, see How to block pages from search engines.
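As a quick sketch, the noindex directive goes in the head of the page itself rather than in robots.txt (the file name and title below are just placeholders):

<!-- secret-page.html: asks search engines not to index this page -->
<html>
<head>
<meta name="robots" content="noindex">
<title>My Secret Page</title>
</head>
<body>...</body>
</html>

Keep in mind that a bot can only see this tag if it is allowed to fetch the page, so do not also block the page in robots.txt if you are relying on noindex.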
Robots.txt Example Entries and Use of Wildcards
1- To disallow crawling of a folder named “Abs”
User-agent: *
Disallow: /Abs/
2- To block a page named “Soc.html”
User-agent: *
Disallow: /Soc.html
3- To block web pages whose file names end with .php
User-agent: *
Disallow: /*.php$
4- To block Googlebot (Google’s crawling agent) from accessing contents in the folder named “B”
User-agent: googlebot
Disallow: /B/
5- To disallow crawlers from accessing all images with the .jpg extension
User-agent: *
Disallow: *.jpg
6- To exclude a file named “joker.php” contained in the folder named “circus”
User-agent: *
Disallow: /circus/joker.php
7- To prevent the Googlebot-Image from accessing images on your site
User-agent: Googlebot-Image
Disallow: /
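Putting a few of these rules together: a single robots.txt file can contain several groups, one per user-agent, and may also point crawlers to your sitemap. The folder, file, and sitemap names below are only illustrative:

User-agent: *
Disallow: /circus/joker.php
Disallow: /*.php$

User-agent: Googlebot-Image
Disallow: /

Sitemap: http://www.example.com/sitemap.xml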
Tricky Question
What will the following entry do?
User-agent: *
Disallow:
Answer- It will allow the crawlers to access every folder and every web page on your server, because an empty Disallow value means nothing is disallowed.
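In other words, an empty Disallow is the exact opposite of Disallow: /. Major crawlers such as Googlebot also understand an explicit Allow directive (an extension to the original standard), so the following sketch has the same effect as the entry above:

User-agent: *
Allow: /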
Free Robots.txt Generators
There are some free tools on the web which can help you create your own robots.txt file. A few of these tools are listed below:
Seobook robots.txt generator
Advanced robots.txt generator (Software free download)
Yellowpipe robots.txt generator
Seochat robots generator
1 hit robots generator