
osCommerce

The e-commerce.

Google will ignore the noindex in the robots.txt as of September 1, 2019


bonbec

Recommended Posts

Google says not to try to block using the robots file (see the "You should not use robots.txt...").  Their reason for the robots change is that they are trying to establish a standard, which they will probably achieve. So we all need to start adjusting our thinking to be what they want. :(

Support Links:

For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc.

Get the latest versions of my addons

Recommended SEO Addons


3 minutes ago, Hotclutch said:

you can't use a robots.txt disallow directive to stop google indexing

I know it has nothing to do with indexing; it is, however, one of the recommended alternatives listed by Google, and I have been using it for years. As Google says, if you have content you don't wish to be seen, then you can password-protect it or use disallow; if, however, you don't wish it to be indexed but still wish it to be seen, then you have to use one of the other alternatives. As always, if you're not sure, get professional help.

" For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options:

  • Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
  • 404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google's index once they're crawled and processed.
  • Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google's index.
  • Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed.  While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
  • Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google's search results."
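As a concrete illustration of the first alternative in the list above, the noindex directive can be carried either in the page's HTML or in the HTTP response. A minimal sketch (the page is illustrative, not from any particular shop):

```html
<!-- Option 1: robots meta tag in the page's <head>; a crawler that is
     allowed to fetch the page will see this and keep the URL out of
     the index -->
<meta name="robots" content="noindex">

<!-- Equivalent HTTP response header, useful for non-HTML files
     such as PDFs:
     X-Robots-Tag: noindex -->
```

Either form only works if the crawler is allowed to fetch the URL in the first place.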

 


16 minutes ago, JcMagpie said:
  • Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed.  While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.

This is not true, and most often misunderstood.

If you have something in the index, then putting a disallow in robots.txt won't cause it to drop out of the index. In fact, it will now stay there indefinitely, because Google cannot crawl the URL to see a noindex directive.

Alternatively, if you don't have something in the index, and you put a disallow in the robots.txt because you think it will prevent search engines from listing the content, then you're mistaken, because an external link to that URL will cause the search engine to still list the URL.
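The crawl-versus-index distinction can be seen with Python's standard `urllib.robotparser`, which evaluates Disallow rules the same way a compliant crawler does; the domain and paths below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (illustrative only)
rules = """\
User-agent: *
Disallow: /checkout/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler will not fetch anything under /checkout/ ...
print(rp.can_fetch("*", "https://example.com/checkout/basket"))   # False
# ...but other paths remain crawlable
print(rp.can_fetch("*", "https://example.com/products/widget"))   # True
```

Note that the blocked page is never fetched, so any meta noindex inside it is never seen; that is exactly why a disallowed URL can still end up listed via external links.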

There are only 2 ways to prevent indexing.

1) meta noindex in the header.

2) 301 the URL

A URL that 404s, eventually drops out of the index, but search engines continue to crawl that URL indefinitely, with reduced frequency between crawls. And there's doubt as to how Google handles a 410 response code. 
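For the second option, a 301 is configured at the web server rather than in the page. A minimal Apache .htaccess sketch, assuming mod_alias is available and using made-up paths:

```apache
# Permanently redirect an old URL; search engines consolidate on the
# target and drop the old URL from the index over time
Redirect 301 /old-product.php https://www.example.com/new-product.php
```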


Thank you for your feedback; I'm happy with my understanding of Google's recommendations. Others will have to decide what's best for their website for themselves. As I said above...

1 hour ago, JcMagpie said:

As always, if you're not sure, get professional help.

It's not a big issue, as all you need to do is turn on the Robot NoIndex header_tags module in CE, so most people should be fine.

 


2 hours ago, Allen Solly said:

If this is implemented in September, then for what purpose will we use the robots.txt file?

The only thing I put in my robots.txt file is a link to the sitemap. But putting disallow rules in robots.txt can be useful if you're trying to optimise your crawl budget.
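A robots.txt along those lines might look like this (the sitemap URL and paths are placeholders, not recommendations for any particular shop):

```
# Tell crawlers where the sitemap lives
Sitemap: https://www.example.com/sitemap.xml

# Keep crawlers out of pages that waste crawl budget
# (note: this limits crawling, not indexing)
User-agent: *
Disallow: /checkout/
Disallow: /login.php
```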


Archived

This topic is now archived and is closed to further replies.
