Wednesday, July 18, 2007

Blogger Adds Robots.txt

If you use Google Sitemaps for your Blogger blog and are seeing a sudden increase in the number of links blocked by robots.txt, don't panic: it is not anything you did wrong. It is because Blogger has recently started adding a robots.txt file to blogs by default. As you can see from the rules below, all pages under the /search directory are disallowed. Crawlers that honor robots.txt will not fetch them, so pages under /search should not appear in the search results of major search engines (e.g. Google, Yahoo and Live.com).

User-agent: *
Disallow: /search
Sitemap: http://gspy.blogspot.com/feeds/posts/default?orderby=updated

This is, in fact, good news for Blogger users, as Google treats most of these blocked pages as duplicate content and lists them as 'Supplemental Results'. Furthermore, the more duplicate content your site has, the less weight Google gives to your site's content.
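To check which URLs the rules above actually block, you can feed the file to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module and parses the text directly instead of downloading it. The post and label URLs are hypothetical examples of Blogger's URL layout, not real pages from this blog. Because the rules sit under "User-agent: *", any crawler that honors robots.txt gets the same answer, so the sketch checks both Googlebot (Google) and Slurp (Yahoo).

```python
from urllib.robotparser import RobotFileParser

# The default robots.txt quoted above (gspy.blogspot.com).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Sitemap: http://gspy.blogspot.com/feeds/posts/default?orderby=updated
"""

# Parse the text directly instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs illustrating Blogger's layout: posts live under
# /YYYY/MM/, while label pages and search results live under /search.
URLS = [
    "http://gspy.blogspot.com/",
    "http://gspy.blogspot.com/2007/07/example-post.html",
    "http://gspy.blogspot.com/search/label/Blogger",
    "http://gspy.blogspot.com/search?q=robots",
]

# No group names these agents specifically, so both fall back to
# the "User-agent: *" group.
for agent in ("Googlebot", "Slurp"):
    for url in URLS:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:9} {verdict:7} {url}")

# The Sitemap line is independent of any User-agent group (Python 3.8+).
print("Sitemaps:", parser.site_maps())
```

Note that "Disallow: /search" is a path-prefix match, which is why it covers both /search/label/... pages and /search?q=... result pages while leaving the home page and individual posts crawlable.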

A further improvement to this great feature would be to let Blogger users customize their own robots.txt, so they could keep unwanted content out of search results.

Note: If you use Blogger and have recently redirected your feed to FeedBurner, make sure you change your sitemap URL in Google Sitemaps to http://yourblog.blogspot.com/rss.xml?orderby=updated instead of just http://yourblog.blogspot.com/rss.xml; otherwise Google Sitemaps will report an error.
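If you look after several blogs, the corrected sitemap address can be generated rather than typed by hand. This is a minimal sketch, assuming each blog lives at a plain http://yourblog.blogspot.com address and uses the /rss.xml feed path from the note above; the helper name feed_sitemap_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def feed_sitemap_url(blog_url: str, feed_path: str = "/rss.xml") -> str:
    """Hypothetical helper: return the blog's feed URL with ?orderby=updated."""
    parts = urlsplit(blog_url)
    query = urlencode({"orderby": "updated"})
    # Keep the scheme and host, replace the path, drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, feed_path, query, ""))


print(feed_sitemap_url("http://yourblog.blogspot.com"))
```

The resulting address is what you would paste into Google Sitemaps in place of the bare /rss.xml URL.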

47 comments:

Devi Mahapatra said...

Hi Keith,
Thanks for the valuable post.
I have been using Blogger + Google Webmaster Tools for a long time, but I was not aware that /rss.xml can be submitted as a sitemap. I am now using /rss.xml?orderby=updated
as my sitemap. Thanks a lot.

ivilla

forex trader said...

It's a pity these robots.txt rules do not work with other SEs.

Keith Chan said...

@ devi You're welcome.
@ forex The robots.txt rules apply to all SEs that honor robots.txt, like Yahoo and MSN.

Mike Dayoub said...

This is bad news for a Blogger.com user who wants all posts to be crawled and doesn't care what weight Google gives them.

Is there any way to get these lines:

User-agent: *
Disallow: /search

removed from my robots file?

Keith said...

@ Mike Unfortunately, there isn't an option for Blogger users to modify the robots.txt file of their blogs. However, I don't see the point of Google crawling duplicate content, because 99% of the /search pages duplicate your existing posts.

Mike Dayoub said...

Well, my posts are actually not duplicates. Much of the content is the same (fire incidents), but the actual details (when, where, why) are what I want to be searchable on Google.

Chef Mom said...

I have two Blogger blogs. One has minimal traffic and one has had much better results. I just checked AdSense today, and my "minimal traffic" blog is making me some chump change, but the blog with all the traffic is showing ZERO page impressions (although I've had 10,000+ visitors). I then used Google's tools, which told me that 47 pages of content are being blocked by robots.txt. I understand the whole "duplicate content" vs. "original content" issue, but why would nothing at all be showing up?

cindydanda said...

Sorry, I am confused. I am a user of both Blogger and Google Webmaster Tools, but I don't know how to access that robots.txt. I am having this problem because I really need to remove a page that Google has cached. I tried everything but it still hasn't worked. Now I want to try adding a robots.txt to my Blogger blog, but I don't know how to do it. Help?

Keith said...

@ cindydanda Currently, Blogger users aren't able to edit their blog's robots.txt. To remove a cached page from Google, follow the instructions listed here: http://www.google.com/support/webmasters/bin/answer.py?answer=61062

सारंग पतकी said...

Hello Keith,
I am seeing my robots.txt as

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /

My main page itself is blocked by robots.txt, so nothing shows up in the search results when I search for my link in Google.

How do I get rid of this? Any ideas?

Jane Air said...

Hi,
Thanks for this.
I saw there were errors in my sitemap; now it's OK.
So, thanks to you ;)

gagi said...

Nice work on the robots.txt. I searched the net a lot to find any info on why Google says I block some pages.

PALS said...

The only solution to this is to create a separate blog for each post you have. That means if you have 50 posts, you should have 50 blogs with 1 post each. But I still think this does not work.

Shall we all move to WordPress now? Let's have a mass migration of bloggers to WordPress.

Thank you.

Technical Details said...

Thanks for your tips. I was worried about why Google Sitemaps showed certain URLs of my blog as blocked by robots.txt.
Visit my blog for tips and tricks about Blogger and for any computer user.

Mitchie said...

I've had problems editing the robots.txt. I did some research about it and found that it can't be edited in Blogger unless you have your own server, as with WordPress or page.ph. Anyway, I was just adding a sitemap to get indexed by Google, and I did it. I don't need a sitemap anymore.

Book said...

Thank you very much. But robots.txt is not changing. Blogger doesn't give me any permission to change it.

Gunawan said...

Thanks for your tutorial, bro. I shared this tutorial on my blog.

Ann Donnelly said...

Thanks -- I've been to a few other pages on the topic and none actually had the correct answer. Good one!

Livingstrong said...

Thank you so much for this great information. It was very helpful and I'm subscribing right now!

Emmanuel said...

thanks for the information.

Tim Freeman said...

Thanks for the information. I looked at it today and noticed it and wondered why. I guess I got lucky finding your site first.

New subscriber to your blog!

Shane Montgomery said...

I have verified my site with Google and changed all Blogger settings to allow publishing. Yet Google will not index the site due to the auto-generated robots file:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search

What is the solution? I do not have duplicate posts, do not want 75 blogs for my 75 posts, and can't be the only one having this problem.

Epenschmiede said...

Quite helpful. Thanks a lot.

hendri susanto said...

Thanks for this information; it is very useful for me as I'm learning Blogspot.

Vidit said...

I got errors when I added rss.xml as sitemap, but adding rss.xml?orderby=updated solved my problem. Thanks.

Tanzy said...

Hello Keith,

Thank you for your great article about sitemaps. I have now added orderby=updated to my Google sitemaps.

Yesterday I submitted my site to Bing and added the meta tag for verification, and today my website, which was being listed in the first and sometimes 5th position on Google for various keywords, is not visible at all, even past the 2nd page.

Keith, do you have any idea? I suppose it's because of the Bing meta tag code?? As soon as I realised this, I removed the meta tag from the blog. Please advise. I am panicking. What could the reason be??

Any ideas ??

Thanks for reading!!

Vikingdread said...

@Tanzy: I have exactly the same problem. I added the Bing Meta Tag and now my blog no longer shows up in Google search results.

Removing the Bing Meta Tag is not resolving the problem. Haven't got a clue what to do about it.

PennyAuction said...

How can I change the robots.txt for my blog? Is there something to worry about?

Facebook Pretender said...

How can I confirm whether my post is a duplicate or not? I mean, how does Google determine this?

ldii said...

Really, I still don't understand how to create a new robots.txt file for a new site. Do you have that exact information? Thank you.

shiva said...

thnxs

Animals Zoo Park said...

Wow ... excellent information ... this is what I have been searching for over the last 10 days. This info is very useful for us.

Thank you.

Eurasia Review said...

This is actually bad news, as the robots.txt prevents sites that have their material in Google News from being read ... unless there is a workaround. You cannot change the robots.txt in Webmaster Tools either, unless the site is hosted on a non-Google host.

Andrew said...

Nice useful information there.. thanks..

Harendra Gusain said...

Hi
This is a very nice thing to share with each other. The Web lets us share our knowledge worldwide.
Harendra Gusain

Rikimaroo said...

This is odd. I go to Google Webmaster Tools and it shows that Googlebot crawled my site, but I see no information, no statistics. I am not sure what else to do at this time. My robots.txt is the same:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search

Sitemap: http://www.freepctechtips.com/feeds/posts/default?orderby=updated

It is the same as everyone else's. What I don't understand is: does this mean that other crawlers and spiders from other search engines are blocked from indexing my blog? This is really what I need to know. Also, is Google not crawling my blog, meaning it won't show up in the search results?

Thanks,
Rikimaroo
www.freepctechtips.com

George Nicolae said...

Okay, okay.

And the solution for unblocking the content is... ?

robots.txt is blocking some labels on my blog. Is the solution to remove the labels?

faraz said...

Very nice and useful article.

Amit Gupta said...

Hello Keith,
I am seeing my robots.txt as

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /

My main page itself is blocked by robots.txt, so nothing shows up in the search results when I search for my link in Google.

How do I get rid of this? Any ideas?
http://learnmicrosoftbi.blogspot.com

Wonder Twins said...

Thanks. We changed our sitemap URL and we think it's going to work. Good work on your post! :-)

Alex M. said...

Very useful information! Thank you! But the problem is still there.. (

davao city jobs directory said...

thanks for sharing this

JediMasterArt said...

This post is really good; I just made the changes at Google.
Thanks man

reviews said...

Do you know if there is a way to edit robots.txt file as of today?

Mat said...

Do I understand correctly that Google crawls only pages that are in the sitemap? Our blog is growing very fast, and sometimes some pages are not crawled because they drop out of the RSS feed before they are crawled. Does that mean some posts are never crawled? :(
Regards!
PS. My blog is http://cdcoverdesign.blogspot.com/

Prsna said...

How do I change robots.txt on my blog www.easymovieshyderabad.co.in (a domain connected to Blogger)? Please help.

MLM Software said...

Robots.txt is just a request; you can't rely on it 100%. Google Webmaster Tools also provides a robots.txt generator tool for creating a robots.txt file.
