What is a Robots.txt File?
Robots.txt is a text file placed on a website to tell search robots which pages they may and may not visit. Search engines are not obliged to follow it, but reputable ones generally obey what they are asked not to do. Robots.txt is therefore by no means a guaranteed way to prevent search engines from crawling your site: it is neither a firewall nor a form of password protection. An example makes its purpose clear. Putting up a robots.txt file is very similar to putting a note saying “Kindly, do not enter” on an unlocked door. The note would not stop thieves from entering, but honest visitors would respect it and stay out. This is why robots.txt is not meant for sensitive data: you cannot rely on it to keep that data from being indexed and shown in search results.
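To illustrate that robots.txt is only advisory, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the example.com URLs and the helper functions polite_crawl_list and rude_crawl_list are hypothetical. The point is that the robots.txt check lives entirely inside the crawler: nothing in the file itself stops a crawler that never performs the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site rules: every robot is asked to stay out of /private/.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

candidate_urls = [
    "https://www.example.com/index.html",
    "https://www.example.com/private/salaries.html",
]


def polite_crawl_list(urls, user_agent="ExampleBot"):
    """A well-behaved crawler drops URLs that robots.txt asks it to skip."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def rude_crawl_list(urls):
    """Nothing forces a crawler to consult robots.txt; this one simply doesn't."""
    return list(urls)


print(polite_crawl_list(candidate_urls))  # ['https://www.example.com/index.html']
print(rude_crawl_list(candidate_urls))    # both URLs, including the /private/ one
```

The "Kindly, do not enter" note is the Disallow line; whether it is respected depends on the visitor, which is why private pages need real access control on the server.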
Robots.txt must be placed in the main (root) directory; otherwise search engines won’t find it. Search engines do not search the whole website for robots.txt. Instead, user agents look only in the main directory, and if robots.txt is not found there, they assume the site has no robots.txt file and crawl and index everything they come across. So if robots.txt is not placed in the right location, search engines may index the whole site. The structure of the file is very simple: a list of user agents, each followed by the directories and files it is not allowed to access.
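As a rough illustration of where crawlers look, the Python sketch below derives the single robots.txt location for any page URL. The helper robots_url is hypothetical, not a library function. The sketch then uses urllib.robotparser to mimic a site with no robots.txt: with no rules parsed, every URL is treated as allowed. When RobotFileParser.read() gets a 4xx response other than 401 or 403, it likewise allows everything.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the one robots.txt location a crawler checks for page_url.

    Only the scheme and host (including any port) are kept; the page's
    path, query string and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2020/post.html?ref=home"))
# https://www.example.com/robots.txt
print(robots_url("http://example.com:8080/shop/robots.txt"))
# http://example.com:8080/robots.txt (a copy under /shop/ is never consulted)

# A site with no robots.txt: no rules are parsed, so every URL is allowed.
no_file = RobotFileParser()
no_file.parse([])
print(no_file.can_fetch("AnyBot", "https://www.example.com/private/page.html"))  # True
```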
A robots.txt file is placed on your server to tell search engine spiders not to index certain pages of your site. It can also be used to keep whole areas of the site from being indexed, or to give individual indexing instructions to particular search engines. Most search engines look for a robots.txt file when their spiders or bots arrive at your site. Therefore, even if you do not currently need to exclude spiders from any part of your site, having a robots.txt file is a good option, as it acts as an invitation to crawl your site. The file is a plain text file and can be created in Notepad or whichever text editor you like. It should be saved to your site’s root directory, i.e. the directory that contains your index page or home page.
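The sketch below assembles a small robots.txt with a hypothetical helper, build_robots_txt, and then checks it with Python's urllib.robotparser. The policy is an assumption for illustration: Googlebot is asked to skip a /drafts/ directory, while every other robot (User-agent: *) may crawl everything. The resulting text is what you would save as robots.txt in your site's root directory.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render a robots.txt body from {user_agent: [disallowed paths]}.

    An empty list produces an empty "Disallow:" line, which means
    nothing is disallowed for that agent.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines += [f"Disallow: {p}" for p in paths]
        else:
            lines.append("Disallow:")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


# Hypothetical policy: Googlebot skips /drafts/, every other robot may crawl everything.
robots_txt = build_robots_txt({
    "Googlebot": ["/drafts/"],
    "*": [],
})
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "/drafts/new-post.html"))  # False
print(parser.can_fetch("Bingbot", "/drafts/new-post.html"))    # True
```

The "*" group with an empty Disallow line is the allow-everything file that works as an invitation to crawl; the Googlebot group shows an individual instruction for one search engine.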
Robots.txt is basically a set of instructions for the search engine spiders, or robots, that visit and index the content of your site. The file must reside in the root directory of your website. A robots.txt file is a good way to ask robots not to index particular web pages, but not every site can use it: spiders read only a robots.txt file located in the top-level directory of the server, so if your site lives in a subdirectory of someone else's domain, you cannot place one where crawlers will look. Robots.txt files are useful for avoiding wasted server resources, saving bandwidth, removing clutter from your web statistics, and refusing a particular spider permission to index your site. Robots.txt is simply a text file that implements the Robots Exclusion Standard.
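As a hedged example of refusing one spider while throttling the others, the rules below block a hypothetical BadBot entirely and ask every other robot to wait between requests using Crawl-delay. Crawl-delay is a non-standard extension that not every search engine honors (Google, for example, ignores it), but Python's urllib.robotparser exposes it through crawl_delay(). The agent names and paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one unwanted spider ("BadBot") is refused entirely;
# everyone else is asked to wait 10 seconds between requests and to
# stay out of /cgi-bin/.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("BadBot", "FriendlyBot"):
    allowed = parser.can_fetch(agent, "/index.html")
    delay = parser.crawl_delay(agent)
    print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
# BadBot: allowed=False, crawl_delay=None
# FriendlyBot: allowed=True, crawl_delay=10
```

Keeping crawlers out of script directories and slowing their request rate are the bandwidth and server-resource savings mentioned above, provided the crawler chooses to honor the file.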