Pages to exclude in robots.txt for WordPress site

You can use the “Disallow” directive in the robots.txt file to specify which pages of your WordPress site should not be crawled. The “Disallow” directive tells search engine crawlers not to fetch certain pages of your website, such as pages with sensitive information or duplicate content. Keep in mind that robots.txt is a request, not an enforcement mechanism: compliant crawlers like Googlebot honor it, but not all crawlers do, and a disallowed URL can still be indexed (without its content) if other sites link to it.
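To see how a well-behaved crawler interprets these rules, here is a small sketch using Python’s standard urllib.robotparser module. The rules and the example.com domain below are placeholders mirroring the examples in this post:

```python
# Sketch: how a compliant crawler evaluates Disallow rules,
# using Python's built-in urllib.robotparser.
from urllib.robotparser import RobotFileParser

# Placeholder rules matching the examples discussed in this post.
rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Disallowed prefixes return False; everything else is allowed by default.
print(parser.can_fetch("*", "https://example.com/wp-admin/"))    # False
print(parser.can_fetch("*", "https://example.com/checkout/"))    # False
print(parser.can_fetch("*", "https://example.com/blog/post-1"))  # True
```

This is also a handy way to sanity-check your own robots.txt before deploying it.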

Here are some examples of pages that you might want to exclude from your WordPress site’s robots.txt file:

Pages that contain sensitive information: pages with personal or financial details that you don’t want exposed to search engines. Example:

User-agent: *
Disallow: /my-account/
Disallow: /preferences/
Disallow: /checkout/
Disallow: /cart/

Pages that aren’t important for SEO: welcome pages, thank-you pages, payment-confirmation pages, account-expired notices shown after form submission, and pages that are accessible only to logged-in users. Example:

User-agent: *
Disallow: /welcome-page/
Disallow: /thank-you-page/
Disallow: /payment-completed/
Disallow: /account

If multiple pages on your site contain the same or similar content, you can exclude the duplicates from being crawled so that they don’t compete with your canonical pages in search results. Example:

User-agent: *
Disallow: /category/
Disallow: /sections/

If you are working on a new section of your site that is not ready to be indexed by search engines, you can prevent it from being crawled. Example:

User-agent: *
Disallow: /your-under-construction-pages/
Disallow: /your-new-pages/

You can exclude admin login pages from being crawled to prevent search engines from including them in search results. Example:

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/

Dashboard pages live under /wp-admin/, so the same rule keeps them out of crawls and out of the index. Example:

User-agent: *
Disallow: /wp-admin/
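One commonly recommended refinement when blocking /wp-admin/ wholesale: WordPress serves front-end AJAX requests through /wp-admin/admin-ajax.php, so many configurations (including WordPress’s own default virtual robots.txt) explicitly allow that file while blocking the rest of the directory:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```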

It is also important not to block your XML sitemap file, so that search engines can discover and crawl your site’s pages easily.
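Beyond not blocking the sitemap, you can point crawlers at it directly with the Sitemap directive in robots.txt (the URL below is a placeholder; use your site’s actual sitemap URL):

```
Sitemap: https://example.com/sitemap.xml
```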

It should also be noted that you shouldn’t use the robots.txt file to hide content from users. The file is meant only for communicating with web crawlers and does nothing to prevent visitors from accessing your site’s content; in fact, robots.txt is publicly readable, so it can reveal the very paths you list in it.

Further, you can refer to this post for more detail: Understanding and Configuring the robots.txt File for WordPress Websites.