Robots.txt is a file that tells search engine robots which pages or sections of your website should not be crawled. When a robot visits your site, it looks for this file before it starts crawling.
When you install WordPress, a robots.txt file is served from the root of your site (e.g., yoursite.com/robots.txt). By default, WordPress generates this file virtually rather than writing a physical file, with rules that let crawlers access the site while keeping them out of the admin area. You can replace it with your own robots.txt file to exclude specific pages or sections of your site from being crawled.
When setting up your robots.txt file, first identify which pages or sections of your site you want to exclude from crawling. Here are some common configurations:
This example allows all web crawlers to access all pages on the site, which is helpful for indexing.
User-agent: *
Allow: /
To block web crawlers from accessing the entire site, add the following line to your robots.txt file:
User-agent: *
Disallow: /
This example keeps category pages from being accessed by web crawlers.
User-agent: *
Disallow: /category/
To allow all pages but disallow the admin area, you can use:
User-agent: *
Allow: /
Disallow: /wp-admin/
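Once your rules are written, you can sanity-check them before deploying. The sketch below uses Python's standard urllib.robotparser to parse an inline copy of admin-blocking rules like the ones above; the example.com URLs are placeholders:

```python
# Sketch: checking URLs against robots.txt rules with Python's
# standard-library urllib.robotparser. URLs are placeholders.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ordinary pages are crawlable; the admin area is not.
print(parser.can_fetch("*", "https://example.com/blog/post-1/"))  # True
print(parser.can_fetch("*", "https://example.com/wp-admin/"))     # False
```

To test a live site instead, you can point the parser at the file with parser.set_url("https://example.com/robots.txt") followed by parser.read(). Note that Python's parser applies rules in file order, which can differ from Google's longest-path-wins matching, so keep rule ordering in mind when mixing Allow and Disallow.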
Note that excluding pages in robots.txt only stops compliant crawlers from fetching them; the pages can still appear in search results if other sites link to them. To keep a page out of search results, add a "noindex" robots meta tag to the page itself, and do not block that page in robots.txt, since a crawler that cannot fetch the page will never see the tag.
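As an illustration, here is a minimal way to check a page's HTML for a noindex robots meta tag, using Python's standard html.parser; the sample HTML is made up:

```python
# Sketch: detecting a "noindex" robots meta tag with Python's
# standard html.parser. The sample HTML below is illustrative.
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.noindex = True

html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
detector = NoindexDetector()
detector.feed(html)
print(detector.noindex)  # True
```

Running the same detector over a page without the tag would leave detector.noindex as False.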
Avoid blocking your home page and other important pages, such as your blog, contact page, and service pages. Blocking them can hurt your search engine optimization (SEO) and your visibility on the web. Also avoid blocking your XML sitemap from being crawled by search engines.
For example, this configuration blocks the sitemap and should be avoided:
User-agent: *
Disallow: /wp-admin/
Disallow: /sitemap.xml
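Rather than blocking the sitemap, robots.txt supports a Sitemap directive that points crawlers straight at it. A sketch, where yoursite.com is a placeholder for your own domain:

```
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yoursite.com/sitemap.xml
```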
You should not rely on robots.txt to protect sensitive information: it is a voluntary convention, not an access control, and not every crawler obeys it. The file itself is also publicly readable, so it can even advertise the paths you were trying to hide.
For more examples you can refer to this: