When it comes to optimizing your website for search engines, every detail matters. One often-overlooked yet crucial aspect of search engine optimization (SEO) is the *robots.txt* file.
This simple text file can significantly impact how search engine crawlers interact with your website. In this guide, we’ll delve into robots.txt and explore how to use it to improve your website’s visibility in search engine results pages (SERPs).
What Is a Robots.txt File?
A robots.txt file is like a signpost for search engine bots when they visit your website. It tells them which parts of your site they’re allowed to explore and which areas they should stay away from. This file helps you control how search engines see and rank your website.
Why Is Robots.txt Important for SEO?
- Control Over Crawling: Robots.txt provides webmasters with control over which pages and sections of their website are crawled by search engine bots.
- Preventing Duplicate Content: Duplicate content can hurt your rankings. By using robots.txt effectively, you can keep crawlers away from duplicate pages so they are less likely to appear in search results.
- Protecting Sensitive Data: Robots.txt can be used to keep search engines from crawling sensitive areas of your site and displaying them in search results (though it is not a security measure, since the file itself is publicly accessible).
- Crawl Budget Management: Search engines allocate a specific budget for crawling your site. By guiding crawlers through robots.txt, you can ensure they prioritize important pages, helping to maximize your crawl budget.
Syntax
The syntax of a robots.txt file is relatively simple. It consists of two main components: User-agent and Disallow/Allow directives. Here’s a breakdown of the syntax:
- User-agent: specifies the web crawler (search engine bot) that the following rules apply to
- Disallow: specifies which parts of your website should not be crawled by the specified user agent
- Allow: specifies that a particular user agent is allowed to crawl a specific area
- Comments: you can include comments in your robots.txt file by using the “#” symbol
- Crawl-delay: specifies the amount of time (in seconds) that web crawlers should wait between successive requests to your website’s pages
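Putting these directives together, a minimal robots.txt might look like this (the directory names are purely illustrative):

```text
# Rules for all crawlers
User-agent: *
Disallow: /private/
Crawl-delay: 5

# Rules just for Googlebot
User-agent: Googlebot
Disallow: /admin/
Allow: /admin/public-docs/
```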
How to Create a Robots.txt File?
Creating a robots.txt file is relatively simple. Follow these steps to create one for your website:
Open a Text Editor
Use a plain text editor like Notepad (Windows), TextEdit (Mac), or any code editor you prefer.
Define User-Agents
User-agents are search engine bots or web crawlers.
Here’s a list of the user-agents you can use in your robots.txt file:
| Search Engine | User-agent |
| --- | --- |
| Baidu | Baiduspider |
| Yahoo | Slurp |
| Bing | Bingbot |
| Facebook | facebot |
| Google | Googlebot |
| Yandex | YandexBot |
| DuckDuckGo | DuckDuckBot |
Specify which user-agent you want to give instructions to. For example, if you want to address Google’s crawler, you’d write:
User-agent: Googlebot
If you want to address all web crawlers, you can use an asterisk (*):
User-agent: *
Set Permissions
After specifying the user-agent, you can set permissions for that user-agent.
Use the “Allow” and “Disallow” directives to specify which parts of your website should be crawled or excluded.
User-agent: Googlebot
Disallow: /admin/
Allow: /public/
In this example, Googlebot is disallowed from accessing the “/admin/” directory but is allowed to crawl the “/public/” directory.
User-agent: *
Crawl-delay: 10
Here, the “*” symbol indicates that the directive applies to all user agents, and the “Crawl-delay: 10” line specifies a 10-second delay between successive requests from the crawler. (Note that not all crawlers honor this directive; Googlebot, for example, ignores Crawl-delay.)
# Allow Googlebot
User-agent: Googlebot
Allow: / # Allow Googlebot to crawl the entire site
# Allow Bingbot
User-agent: Bingbot
Allow: / # Allow Bingbot to crawl the entire site
# Allow MobileBot
User-agent: MobileBot
Allow: / # Allow MobileBot to crawl the entire site
# Sitemaps
Sitemap: https://www.example.com/googlebot_sitemap.xml # Sitemap for Googlebot
Sitemap: https://www.example.com/bingbot_sitemap.xml # Sitemap for Bingbot
Sitemap: https://www.example.com/mobilebot_sitemap.xml # Sitemap for MobileBot
- The robots.txt file starts by allowing Googlebot, Bingbot, and MobileBot access to the entire site by using the “Allow: /” directive for each of them.
- Then, it specifies the sitemaps for each bot using the “Sitemap” directive. Each bot’s sitemap URL is provided, allowing the respective bot to find and crawl the URLs listed in its specific sitemap.
Alternatively, you can use an online generator tool to create a robots.txt file for you.
Upload the Robots.txt File
To upload the file, you’ll need access to your website’s server or hosting account.
- Launch your FTP client, connect to your server, and upload the robots.txt file to the root directory of your website.
- Alternatively, if your host provides a web-based file manager, locate the robots.txt file on your computer, select it, and drag it into your site’s root directory in the browser.
- If you’re using a CMS like WordPress, you can use the Yoast SEO plugin to create the file for you.
- To ensure the file is uploaded correctly, open a web browser and enter your website’s URL followed by /robots.txt (e.g., https://www.yourwebsite.com/robots.txt). You should be able to view the content of your robots.txt file in the browser.
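Once the file is live, you can also sanity-check its rules programmatically. Below is a minimal sketch using Python’s standard urllib.robotparser module; the rules string mirrors the Googlebot example from earlier, so no live site is required:

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the earlier Googlebot example
rules = """\
User-agent: Googlebot
Disallow: /admin/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a given user-agent may fetch a given path
print(parser.can_fetch("Googlebot", "/admin/page.html"))   # False: /admin/ is disallowed
print(parser.can_fetch("Googlebot", "/public/page.html"))  # True: /public/ is allowed
```

Note that Python’s parser follows the original robots.txt standard, so its handling of wildcard patterns may differ from Google’s interpretation.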
Checking Your Robots.txt File
Here’s how you can test your robots.txt file:
- The robots.txt Tester in Search Console
- Semrush’s Site Audit tool can check for issues regarding your file.
Robots.txt Best Practices
- Use Specific User-Agents
- Use ‘$’ to Indicate the End of a URL
- Use the Hash (#) to Add Comments
- Use Separate Files for Different Subdomains
- Regularly Update Your File
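For instance, the “$” anchor and “#” comments from the list above can be combined in a single rule (the file pattern is illustrative):

```text
User-agent: *
# The $ anchors the pattern to the end of the URL,
# so this blocks only URLs that end in .pdf
Disallow: /*.pdf$
```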