I love optimizing my WordPress-based blog. The only problem is that I rarely have enough time for it. Still, some small improvements take less than five minutes of your time and yet have a tangible impact on your overall blog optimization.

One of them is preventing Google (and other search engines) from indexing (searching) the WordPress RSS feeds.

The next few lines will be dedicated to this problem (and how we can solve it).

Where to start?

I remember that some time ago I was checking which pages of optimiced.com are indexed in Google.

I was puzzled to find that, besides the blog posts, a lot of RSS feeds had also been indexed.

Why don’t you need Google to index (spider) the RSS feeds?

First of all, the indexed (searched) content is duplicated: the last 10 posts or the latest comments, available via RSS, can be read on the blog itself. Second, RSS is meant to be used with an RSS reader, not read in the browser window (text and images won’t be formatted, for example). Last but not least, who would want to search the Internet and land on an unformatted RSS page of comments, for example, instead of on the blog post those comments belong to? This has happened to me, and more than once…

(Example: you can use this link to subscribe to the RSS feed of my blog, or just to check the ten last blog posts from optimiced in RSS format.)

Can we prevent this from happening?

I searched the Internet for some time until I finally landed on the WordPress Support forum, where I found the solution in a thread titled “Prevent indexing of feed pages”, which was marked as ‘resolved’.

Here’s the way to do it – you must use a robots.txt file.

What is robots.txt?

As the name itself suggests, robots.txt* is a text file in the standard text format (.TXT), intended for use by robots :-)

But not by all robots, of course (Roomba, for example, doesn’t count ;-), only by search engine crawlers (spiders), like those of Google, Live Search (until recently MSN Search), Yahoo!, Alta Vista and all other search (ro)bots.

To use it, create a new blank file and save it as a simple text file with the name robots and the extension .txt. The file must be placed in the main (root) directory of the website/blog; for optimiced, for example, this is https://www.optimiced.com/robots.txt.

After you have created the text file, paste the following two lines into it:

User-Agent: *
Disallow: */feed/

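The first line, with the asterisk, means that the rule on the next line applies to all search engines. The second one, Disallow: */feed/, means that URLs matching */feed/ should not be crawled, which is what keeps their content from being indexed (searched). Strictly speaking, the asterisk stands for any sequence of characters and the rule only has to match the beginning of the URL path, so it covers every URL that contains /feed/, not just those that end with it. All of the feed URLs listed below end with /feed/, so they are covered.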
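To make the wildcard rule more concrete, here is a minimal Python sketch of how such a pattern can be matched against URLs. It follows the wildcard behaviour that Google documents for Googlebot ('*' for any run of characters, an optional trailing '$' for "end of URL", prefix matching otherwise). The function names rule_to_regex and is_blocked are my own, and this only simulates the matching locally; it says nothing about how a particular crawler actually behaves.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (Google-style wildcards).

    '*' matches any run of characters, a trailing '$' anchors the end of the
    URL, and otherwise the rule only has to match a prefix of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url: str, rule: str) -> bool:
    """Return True if the URL's path (plus query string) matches the rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start of the string, like robots.txt matching.
    return rule_to_regex(rule).match(target) is not None


if __name__ == "__main__":
    urls = [
        "https://www.optimiced.com/en/feed/",
        "https://www.optimiced.com/en/comments/feed/",
        "https://www.optimiced.com/en/2007/09/16/title-of-post/feed/",
        "https://www.optimiced.com/en/2007/09/16/title-of-post/",
        # Illustrative URL, not from the article: /feed/ in the middle.
        "https://www.optimiced.com/en/feed/atom/",
    ]
    for url in urls:
        print(is_blocked(url, "*/feed/"), url)
```

If you really want "ends with /feed/" rather than "contains /feed/", the same sketch can be called with the rule */feed/$ to see the difference.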

In my case, I use WordPress permalinks (permanent links) of the ‘Date and name based’ type:
https://www.optimiced.com/bg/%year%/%monthnum%/%day%/%postname%/
…so the RSS feeds for my blog are as follows:

https://www.optimiced.com/en/feed/
https://www.optimiced.com/en/comments/feed/
https://www.optimiced.com/en/name-of-category/feed/
https://www.optimiced.com/en/2007/09/16/title-of-post/feed/

(The last one is an example of the comments RSS feed for a specific blog post.)

If your blog uses a different URL structure, for example the ‘short’ version (https://www.optimiced.com/en/?p=1234), the RSS feeds for the blog will have this URL format:

https://www.optimiced.com/en/?feed=rss2
https://www.optimiced.com/en/?feed=comments-rss2
https://www.optimiced.com/en/?feed=rss2&cat=123
https://www.optimiced.com/en/?feed=rss2&p=1234

In this case, I guess, you should change the rule in the robots.txt file to:

User-Agent: *
Disallow: *?feed*

(Note: This scenario was not tested by me!)
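Since this rule was not tested on a live site, a quick local check can at least show what the pattern would and would not match. The Python sketch below is only a simulation under the assumption that the crawler treats '*' as "any run of characters" and matches from the start of the path plus query string; the helper name matches_rule is hypothetical and the trailing '$' anchor is not handled.

```python
import re
from urllib.parse import urlsplit


def matches_rule(url: str, rule: str) -> bool:
    """Return True if the URL's path-plus-query matches a wildcard rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # '*' becomes '.*'; everything else is matched literally, from the start.
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.match(regex, target) is not None


if __name__ == "__main__":
    rule = "*?feed*"
    urls = [
        "https://www.optimiced.com/en/?feed=rss2",
        "https://www.optimiced.com/en/?feed=comments-rss2",
        "https://www.optimiced.com/en/?feed=rss2&cat=123",
        "https://www.optimiced.com/en/?feed=rss2&p=1234",
        "https://www.optimiced.com/en/?p=1234",
        # Illustrative URL, not from the article: 'feed' is not the first parameter.
        "https://www.optimiced.com/en/?cat=123&feed=rss2",
    ]
    for url in urls:
        print(matches_rule(url, rule), url)
```

One thing the sketch makes visible: the pattern looks for the literal text ?feed, so a URL where feed is not the first query parameter (it would appear as &feed) would not be matched by this rule.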

After you make the required changes and place the robots.txt file on the server, you’ll have to wait a couple of days to see the intended effect.

That’s it:)

The described method is very simple and works well. Because I use Google Webmaster Tools, I can check the list of all indexed URLs for optimiced.com. Soon after I added the two lines to robots.txt, all of my RSS feeds appeared in the list “URLs restricted by robots.txt”, exactly as expected :) I guess the other search engines obey the same rules, so you should be quite safe using robots.txt to ‘filter’ the RSS feeds from search.

Other uses of robots.txt

Of course, the use of robots.txt is not limited to keeping RSS feeds out of the index. You can, for example, restrict a certain category of your blog, and that category (or categories) will become ‘invisible’ to the search engines. You can restrict other sections of your website (it does not have to be a blog) just by adding more rules for them to the robots.txt file.

Robots.txt has many more uses in practice, but here I just wanted to write about how to bring your WordPress blog a bit closer to perfection :)
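As a small illustration of a robots.txt file with more than one rule, here is a Python sketch that assembles the file contents from a list of paths. The function name build_robots_txt and the category path are made up for the example; only the */feed/ rule comes from this article.

```python
def build_robots_txt(disallowed, user_agent="*"):
    """Build robots.txt text: one User-Agent line, one Disallow line per path."""
    lines = [f"User-Agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rules = [
        "*/feed/",                      # the feed rule from this article
        "/en/name-of-category/",        # hypothetical category to hide
    ]
    # Print the result; save it as robots.txt in the site's main directory.
    print(build_robots_txt(rules), end="")
```

Generating the file this way is optional, of course; typing the same lines by hand in a text editor works just as well.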

Final remarks

While doing my little research on the subject, I thought of another way that might achieve a similar effect: you can add the attribute rel="nofollow" to the RSS feed links. But this would require editing the code of your WordPress theme, and in more than one place.

So definitely, the robots.txt way is much easier:)

____________
Notes:
(*) You can learn more about robots.txt from the official Robots.txt FAQ and from Wikipedia.
(**) You can learn more about the use of the asterisk (*, or wildcard) in robots.txt for Googlebot and other search engines from Google itself ;-)

3 thoughts on “How to prevent Google from indexing WordPress RSS feeds”

  1. Thanks, Jonathan (and thanks for passing by, too!) :)

    I’ve decided to find a way to exclude my RSS feeds after I’ve discovered that maybe half or one-third of my indexed blog pages in Google are RSS, and this was something I didn’t want, for sure:)
