Blogger Adds Robots.txt
If you use Google Sitemaps for your Blogger blog and you are seeing a sudden increase in the number of links being blocked by robots.txt, don't panic: it is not anything you did wrong. It is because Blogger has recently started adding a robots.txt file by default. As you can see from the rules below, all pages under the /search directory are disallowed, meaning that pages under /search will not appear in the search results of major search engines (e.g. Google, Yahoo and Live.com).
User-agent: *
Disallow: /search
Sitemap: http://gspy.blogspot.com/feeds/posts/default?orderby=updated

This is, in fact, good news for Blogger users, as Google treats most of these blocked pages as duplicate content and lists them as 'Supplemental Results'. Furthermore, the more duplicate content your site has, the less weight Google gives your site's content.
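If you want to check for yourself which of your blog's URLs these rules block, Python's standard-library robots.txt parser can tell you. Below is only a minimal sketch: the blog address is a placeholder and the post URL is hypothetical.

# Minimal sketch (Python 3): test a few URLs against a blog's live robots.txt.
# "yourblog" is a placeholder and the post URL is hypothetical.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://yourblog.blogspot.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

urls = (
    "http://yourblog.blogspot.com/",                     # home page
    "http://yourblog.blogspot.com/2007/05/a-post.html",  # an ordinary post
    "http://yourblog.blogspot.com/search/label/news",    # a label/search page
)
for url in urls:
    print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")

With the rules shown above, only the /search URL should come back as blocked.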
A further improvement to this feature would be to allow Blogger users to customize their own robots.txt so they can keep undesired content out of search results.
Note: If you are using Blogger and have recently redirected your feed to FeedBurner, make sure you change your sitemap URL in Google Sitemaps to http://yourblog.blogspot.com/rss.xml?orderby=updated instead of just http://yourblog.blogspot.com/rss.xml; otherwise Google Sitemaps will report an error.
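If you want to double-check the feed before (or after) changing the sitemap URL, fetching and parsing it yourself shows whether it responds and actually contains posts. A rough sketch follows; "yourblog" is a placeholder and the response is assumed to come back as an ordinary RSS 2.0 document.

# Rough sketch: confirm the feed you plan to submit as a sitemap returns
# HTTP 200 and actually contains posts. "yourblog" is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

feed_url = "http://yourblog.blogspot.com/rss.xml?orderby=updated"

with urllib.request.urlopen(feed_url) as resp:
    print("HTTP status:", resp.status)
    tree = ET.parse(resp)

items = tree.findall("./channel/item")  # RSS 2.0 keeps posts in <channel><item>
print("posts in feed:", len(items))
for item in items[:5]:                  # show the first few post links
    print(" ", item.findtext("link"))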
46 comments:
Hi Keith,
Thanks for the precious post. I have been using Blogger + Google Webmaster for a long time, but I was not aware that /rss.xml could be used as a sitemap. I am now using /rss.xml?orderby=updated as my sitemap. Thanks a lot.
ivilla
It's a pity these robots.txt rules do not work with other SEs.
@ devi You're welcome.
@ forex The robots.txt rule applies to all SEs like Yahoo and MSN.
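To illustrate the point, the sketch below feeds the rules Blogger generates (the full file also carries a Mediapartners-Google section, as commenters note further down) into Python's robots.txt parser and checks how different crawlers treat a label/search page; the blog address is a placeholder.

# Sketch: the wildcard rule binds any crawler without its own section, so
# Googlebot, Yahoo's Slurp and msnbot all skip /search, while the AdSense
# crawler (Mediapartners-Google) has an empty Disallow and is unaffected.
from urllib.robotparser import RobotFileParser

rules = """User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

label_page = "http://yourblog.blogspot.com/search/label/news"  # placeholder URL
for bot in ("Googlebot", "Slurp", "msnbot", "Mediapartners-Google"):
    print(bot, "->", "allowed" if rp.can_fetch(bot, label_page) else "blocked")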
This is bad news for a Blogger.com user who wants all posts to be crawled and doesn't care what weight Google gives them.
Do I have any way to get the User-agent: *
Disallow: /search
removed from my robots file?
@ Mike Unfortunately, there isn't an option for Blogger users to modify the robots.txt file of their blogs. However, I don't see the point in Google crawling duplicated content, because 99% of the /search pages are duplicates of your existing posts.
Well, my posts are actually not duplicated. Much of the content is the same (fire incidents), but the actual details (when, where, why) are what I want to be searchable on Google.
I have two blogger blogs. One has minimal traffic and one has had much better results. I just checked in Adsense today, and my "minimal traffic" blog is making me some chump change, but the blog with all the traffic is showing as ZERO page impressions (although I've had 10,000+ visitors). I then used the Google Tools which told me that 47 pages of content are being blocked by robots.txt. I understand the whole "duplicate content" vs "original content", but why would nothing at all be showing up?
Sorry, I am confused. I am a user of both Blogger and Google Webmaster, but I don't know how to access that robots.txt. I am having this problem because I really need to remove a page that Google has cached. I tried everything but it still hasn't worked. Now I want to try adding a robots.txt to my Blogger blog, but I don't know how to do it. Help?
@ cindydanda Currently, Blogger users aren't able to edit their blog's robots.txt. To remove a cached page from Google, follow the instructions listed here: http://www.google.com/support/webmasters/bin/answer.py?answer=61062
Hello Keith,
I am seeing my robots.txt as
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Disallow: /
My main page itself is blocked by the robots.txt, so nothing comes up in the search results when I search for my link on Google.
How to get rid of this? Any idea?
Hi,
thanks for this.
I saw there were errors in my sitemap, now it's ok.
so, thanks to you ;)
Nice work on the robots.txt. I searched the net a lot to find any info on why Google says I am blocking some pages.
The only solution to this is to create another blog for every post you have, meaning if you have 50 posts, you should have 50 blogs with 1 post each. But still, I think this does not work.
Shall we all transfer to WordPress now? Let's have a massive blogger transfer to WordPress.
Thank you.
Thanks for your tips. I was worried about why Google Sitemaps showed certain URLs of my blog being blocked by robots.txt.
Visit my blog for tips and tricks for Blogger and computer users in general.
I've had problems editing the robots.txt. I did some research about it and found that it couldn't be edited in Blogger unless you have your own server, like WordPress or page.ph. Anyway, I was just adding a sitemap to get indexed by Google and I did it. I don't need a sitemap anymore.
Thank you very much. But the robots.txt is not changing; Blogger hasn't given me any permission.
Thanks for your tutorial, bro. I will share this tutorial on my blog.
Thanks -- I've been to a few other pages on the topic and none actually had the correct answer. Good one!
Thank you so much for this great information. It was very helpful and I'm subscribing right now!
Thanks for the information. I looked at it today and noticed it and wondered why. I guess I got lucky finding your site first.
New subscriber to your blog!
I have Google-verified my site and changed all Blogger settings to allow publishing. Yet Google will not index the site due to the automatic robots file:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
What is the solution? I do not have duplicate posts, do not want 75 blogs for 75 postings, and can't be the only one having this problem.
Quite helpful. Thanks a lot.
Thanks for this information; this is very useful for me as I am learning Blogspot.
I got errors when I added rss.xml as sitemap, but adding rss.xml?orderby=updated solved my problem. Thanks.
Hello Keith,
Thank you for your great article about sitemaps, I have now added orderby=updated to my google sitemaps.
Yesterday I submitted my site to Bing and added the meta tag validation, and today I see my website, which was being listed in the first and sometimes 5th position on Google for various keywords, is totally not visible even after the 2nd page.
Keith, do you have any idea? I suppose it's because of the Bing meta tag code? As soon as I realised this, I removed the meta tag from the blog. Please advise. I am panicking. What must be the reason?
Any ideas ??
Thanks for reading!!
@Tanzy: I have exactly the same problem. I added the Bing Meta Tag and now my blog no longer shows up in Google search results.
Removing the Bing Meta Tag is not resolving the problem. Haven't got a clue what to do about it.
How can I change the robots.txt for my blog? Is there something to worry about?
How can I confirm whether my post is a duplicate or not? I mean, how does Google do this?
Really, I still don't understand how to create a new robots.txt file for a new site. Do you have that exact information? Thank you.
Thanks.
Wow ... excellent information ... this is what I have been searching for over the last 10 days. This info is very useful for us.
Thank you.
This is actually bad news, as the robots.txt prevents sites that have their material in Google News from being read ... unless there is a workaround. You cannot change the robots.txt in Webmaster Tools either, unless the site is hosted on a non-Google host.
Nice useful information there.. thanks..
Hi
This is a very nice thing to share with each other; the Web lets us share our knowledge worldwide.
Harendra Gusain
This is odd. I go to Google Webmaster Tools and it shows that Googlebot crawled my site, but I see no information, no statistics. I am not sure what else to do at this time. My robots.txt is the same:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Sitemap: http://www.freepctechtips.com/feeds/posts/default?orderby=updated
It is the same as everyone else's. What I don't understand is: does this mean that crawlers and spiders from other search engines are blocked from indexing my blog? This is really what I need to know. Also, is Google not crawling my blog, so it won't show up in the search results?
Thanks,
Rikimaroo
www.freepctechtips.com
Okay, okay.
And the solution for unblocking the content is... ?
The robots.txt is blocking some labels on my blog. Is the solution to remove the labels?
Very nice and useful article.
Thanks. We changed our sitemap URL and we think it's gonna work. Good work on your post! :-)
Very useful information! Thank you! But the problem is still there.
thanks for sharing this
This post is really good; I just did the changes at Google.
Thanks man
Do you know if there is a way to edit the robots.txt file as of today?
Do I understand it correctly that Google only crawls pages that are in the sitemap? Our blog is growing very fast, and sometimes pages drop out of the RSS feed before they are crawled. Does that mean some posts are never crawled? :(
Regards!
PS. My blog is http://cdcoverdesign.blogspot.com/
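For what it's worth, the sitemap only covers whatever the feed currently contains (Google can still find older posts through normal crawling and internal links). Here is a sketch that lists the post URLs in the submitted feed, so you can see whether older posts have already dropped out; the blog address is a placeholder and the default Atom feed is assumed.

# Sketch: list the post URLs the sitemap feed currently contains.
# "yourblog" is a placeholder; feeds/posts/default returns an Atom feed.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
feed_url = "http://yourblog.blogspot.com/feeds/posts/default?orderby=updated"

with urllib.request.urlopen(feed_url) as resp:
    root = ET.parse(resp).getroot()

entries = root.findall(ATOM + "entry")
print("posts currently in the feed:", len(entries))
for entry in entries:
    for link in entry.findall(ATOM + "link"):
        if link.get("rel") == "alternate":  # the post's public URL
            print(" ", link.get("href"))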
How do I change robots.txt on my blog www.easymovieshyderabad.co.in (domain connected to Blogger)? Please help.
Robots.txt is just a request; you can't rely on it 100%. Google Webmaster also provides a robots generator tool for creating a robots.txt file.