What Is a robots.txt File?
Robots.txt is a text file that must be placed in the root directory of your website so that search engine spiders can easily find it. So what is a robots.txt file, in short?
A robots.txt file tells search engine spiders how to interact with your content when they crawl and index it.
Search engines are hungry for content. They crawl the web at a certain crawl rate to find and index high-quality content automatically. Suppose you never submitted a particular page to a search engine, yet the page still appears in the search results. Why does this happen? There are many factors, but the main reason is that a link to your page sits somewhere on the web, on a page that is already indexed. When the search engine spiders crawl that page, they follow its links as well, and that is how your page ends up indexed.
If you wish to keep any part of your blog out of the index, such as pages, posts, feeds, directories, or sub-directories, you can write a rule for the search engine spiders in the robots.txt file. For example,
Allow indexing of everything
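A rule for this case is commonly written with an empty Disallow line, which blocks nothing. The following is a minimal sketch rather than the exact snippet from the original post:

```text
# Applies to every bot; an empty Disallow value blocks no URLs
User-agent: *
Disallow:
```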
Disallow indexing of everything
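The opposite case is usually expressed by disallowing the root path, which covers every URL on the site. Again, this is a minimal sketch:

```text
# Applies to every bot; "/" matches every URL on the site
User-agent: *
Disallow: /
```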
The examples above use User-agent: *. User-agent names the search engine bot, and * means that the rules apply to every search engine bot. To write a rule only for Google, use User-agent: Googlebot instead, which applies to Google’s bot only.
Disallow Googlebot from indexing a folder, but allow the rest of the website.
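A sketch of such a rule, using the wp-content folder that the explanation below refers to:

```text
# Applies to Googlebot only; everything outside /wp-content/ stays crawlable
User-agent: Googlebot
Disallow: /wp-content/
```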
The example above shows that Googlebot will index your entire WordPress blog except the content in the wp-content folder and its sub-folders (e.g., /wp-content/uploads).
Disallow Googlebot from indexing a folder, but allow one sub-folder inside it.
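A sketch of this rule, based on the wp-content and uploads folders discussed below. The Allow line is placed first as a precaution: Google applies the most specific matching rule regardless of order, but some other parsers may stop at the first match.

```text
# Applies to Googlebot only
User-agent: Googlebot
# Keep the uploads sub-folder (post images) crawlable
Allow: /wp-content/uploads/
# Block the rest of wp-content (plugins, themes, upgrade)
Disallow: /wp-content/
```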
If you are familiar with WordPress, you can browse your WordPress files and folders through your FTP account and look at the wp-content folder, which contains at least four sub-folders: plugins, themes, upgrade, and uploads. The plugins, themes, and upgrade folders do not need to appear in search engine results. The uploads folder holds all the images from your WordPress blog posts, so you may want it indexed. The example above does exactly this for the Google search engine. To apply the rule to all search engine spiders, replace Googlebot with *.
Disallow indexing of a folder, but allow one file inside it.
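The same pattern works for a single file. In the sketch below, the folder /private/ and the file public-page.html are hypothetical placeholders, not names from the original post; replace them with your own paths.

```text
# Applies to every bot
User-agent: *
# Hypothetical file that should stay crawlable
Allow: /private/public-page.html
# Hypothetical folder that should otherwise be blocked
Disallow: /private/
```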
Advantages of a robots.txt File for WordPress SEO
If you know something about SEO, you are probably familiar with the Google Panda and Penguin updates. Panda mostly targeted sites with thin or duplicate content, whether on internal or external pages. Let’s look at what duplicate content on internal pages means.
Suppose your blog publishes 25 posts in a month and nobody else writes for it, so you are the only author of those 25 posts. The author archive page and the monthly archive page then show the same content, which creates a duplicate content issue.
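One way to handle this case is to ask bots to skip one of the two archives. The sketch below assumes that your author archives use the default WordPress base, /author/; if your permalink settings use a different base, the path would need to change.

```text
# Assumption: author archives live under the default /author/ base
User-agent: *
Disallow: /author/
```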
You can list any URLs that bots should not crawl in your WordPress robots.txt file. Here is how to add a robots.txt file to WordPress for better SEO.
robots.txt File Code for WordPress SEO
We have put together a general robots.txt file with SEO in mind. You may copy all of the code and paste it into your robots.txt file. The code is:
Sitemap: http://www.yoursite.com/yourindex.xml
Sitemap: http://www.yoursite.com/yourindex-video.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /go/
Disallow: /comments/feed/
Disallow: /feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /*?comments=
Disallow: /search?
Disallow: /?p=*

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
Make sure to replace yoursite, yourindex, and yourindex-video with your own values. If you use Go URLs on your WordPress blog, the line “Disallow: /go/” blocks them; if you do not use Go URLs, you can delete that line from the code above.
Below is the method for editing the robots.txt file if you are using the SEO by Yoast plugin.
Edit the robots.txt File via the SEO by Yoast Plugin
- Go to your WordPress dashboard admin panel.
- Click on SEO in the left menu, and then click on Edit files.
- The robots.txt file editor will open. Paste your copied code there, and finally click on Save changes to Robots.txt.
Now you can check your robots.txt file by going to http://www.yoursitename.com/robots.txt, where the updated code should appear. Below is another method, which adds the robots.txt file via the FileZilla FTP client.
How to Add a robots.txt File to WordPress With the FileZilla FTP Client
Copy the code above, open Notepad on your desktop, paste the code into it, and save the file as “robots” (without the quotes) so that it is named robots.txt. Now you have to upload your robots.txt file to the root directory of your WordPress blog.
To host your robots.txt file, you need the FileZilla FTP client and access to your FTP account.
Once you open FileZilla, go to the root directory of your WordPress blog and upload the robots.txt file from your desktop to that directory. Now you can check your robots.txt file by going to http://www.yoursitename.com/robots.txt, where the updated code should appear.