How to avoid annoying Magento URL params in search engine index
For my latest Magento ecommerce project, I noticed that search engines had indexed some URLs with annoying params like ?dir=asc&order=position or ?limit=9, along with some unnecessary URLs such as the catalog search and checkout pages.
Usually we don’t want Google to index these pages because they can count as duplicate content, which can hurt our ecommerce shop’s ranking. One option would be to have nofollow links in your template, but the easier way is to build a robots.txt file for your Magento ecommerce shop. I also included some other important directories we usually don’t want to have indexed by search engines.
Here is my robots.txt example:
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /404/
Disallow: /admin/
Disallow: /api/
Disallow: /app/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product_compare/
Disallow: /catalogsearch/
Disallow: /catalogsearch/advanced/
Disallow: /catalogsearch/term/popular/
Disallow: /cgi-bin/
Disallow: /checkout/
Disallow: /checkout/cart/
Disallow: /contacts/
Disallow: /contacts/index/
Disallow: /contacts/index/post/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
Disallow: /downloader/
Disallow: /install/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /newsletter/
Disallow: /pkginfo/
Disallow: /private/
Disallow: /poll/
Disallow: /report/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /skin/
Disallow: /tag/
Disallow: /var/
Disallow: /wishlist/
Disallow: /anyothercontentyouwouldliketodisallow/
Sitemap: http://www.your-amazing-magento-ecommerce-shop.com/sitemap.xml
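If you want to sanity-check which URLs a few of the rules above would catch before uploading the file, here is a rough Python sketch. It assumes the wildcard behaviour that major crawlers such as Google apply (* matches any run of characters, a trailing $ anchors the end of the URL) and it only handles Disallow lines, so treat it as an approximation rather than a full robots.txt parser. The names DISALLOW_RULES, rule_to_regex and is_blocked are made up for this example, and the sample paths are hypothetical.

```python
import re

# A handful of rules copied from the robots.txt above (not the full list).
DISALLOW_RULES = ["/*?", "/*.js$", "/*.css$", "/catalogsearch/", "/checkout/"]


def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is matched literally
    from the start of the path.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' where '*' stood.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(path, rules=DISALLOW_RULES):
    """Return True if any Disallow rule matches the path (query string included)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical shop URLs, only for illustration.
    samples = [
        "/shoes.html",
        "/shoes.html?dir=asc&order=position",
        "/shoes.html?limit=9",
        "/checkout/cart/",
    ]
    for path in samples:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

The plain category URL stays crawlable, while the same URL with sorting or limit params is caught by the /*? rule. Note that Python's built-in urllib.robotparser does prefix matching only, which is why this sketch translates the patterns by hand.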
Very helpful – thanks very much! I was also annoyed by loads of unnecessary URLs indexed by Google.
Thanks so much, this is really helpful. I am a Magento newbie, and I knew I wanted to disallow some things in the robots.txt file, but hadn’t thought about it much. Now that I have seen your list, it has put a lot in perspective. Thanks again. Great work!
~Kaylaa
This post is good. Thank you!
Is this still good? Has anyone had any negative results from using this robots.txt file?
Thanks for a REALLY useful post. I’ve just started to notice a lot of search results pages being indexed by Google, so this was a real help. Thanks…
@LarryE
As long as you are generating a valid sitemap and have submitted it to Google, this should be just fine. It’s one of the least restrictive exclusion files I’ve come across.
Exactly what I was looking for. Has anyone had trouble with their Google ranking after using this robots.txt for a while?