robots.txt


valquiria23

Hello everyone, I hope you are very well!

What is the robots.txt file for? Apparently it is necessary to use it for search engines.

Does anyone have an up-to-date robots.txt file?
Which folder does it go in: the root or the catalog folder?

Thank you very much in advance!

Valqui


It's a text file located at the root of your website. It's for providing directions to robots or bots.

There is no single template file that fits all sites; you have to build your own to suit your site.

You can specify the location of an XML sitemap there for robots to follow, with an entry like this:

Sitemap: https://www.shop.com/sitemap.xml

You can also specify other things, such as which URLs crawlers should not visit.
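To see how a rule-following bot would read those two kinds of entries, here is a rough Python sketch using the standard library's urllib.robotparser. The shop URL is the one from the example above; the /checkout.php rule and the product_info.php URL are assumptions made up for illustration, not a recommendation for any particular shop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Disallow rule is only an example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout.php

Sitemap: https://www.shop.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap entries are exposed through site_maps() (Python 3.8 or later).
sitemaps = parser.site_maps()

# can_fetch() reports whether a rule-following bot may crawl a given URL.
checkout_allowed = parser.can_fetch("*", "https://www.shop.com/checkout.php")
product_allowed = parser.can_fetch("*", "https://www.shop.com/product_info.php")

print(sitemaps, checkout_allowed, product_allowed)
```

This only models a bot that chooses to follow the file; nothing here stops a bot that ignores it.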


Don't add a Disallow entry for your admin directory, as that just gives away its location.


/robots.txt provides guidance to robots (including search engines), but provides no security. "Well-behaved" bots will follow the instructions, but others will use it as information for where to sniff around. For example, you would never put in

Disallow: /my_credit_card_numbers

because that's an engraved invitation to look around. There's good information and tutorials online (I have http://www.searchengineworld.com/robots/ bookmarked, but at the moment there seems to be something wrong with it).

Remember to add

sitemap: URL_to_your_sitemap
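To make the "no security" point concrete: the file is plain text that anyone can download, so pulling out every path it names takes only a few lines. The sketch below is a simplified reader written for illustration (it is not a full parser for the format), and the sample rules are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every path named in a Disallow line, as any visitor could."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, mirroring the bad example above.
SAMPLE = """\
User-agent: *
Disallow: /my_credit_card_numbers
Disallow: /admin/
"""

leaked = disallowed_paths(SAMPLE)
print(leaked)
```

Anything that shows up in that list is advertised to every visitor, well-behaved or not.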


Hey,

If you want to understand what it is and how it works then read this,

http://www.robotstxt.org/robotstxt.html

If you just want to add the file, it needs to go in the same place as your index.php. A basic file for osC is attached; you can add to it as you wish.

Google has a very good help page on this, along with a tester for your robots.txt file:

https://support.google.com/webmasters/answer/6062596?hl=en

Google also has a tool for making robots.txt files from your site, but you need a Google account to access it:

https://www.google.com/webmasters/tools/home?hl=en

You can read about it in this blog post and decide if it's for you:

https://webmasters.googleblog.com/2008/03/speaking-language-of-robots.html

It's a lot to take in, but I hope it helps.

[Attached file: robots.txt]


Apart from the sitemap entry, which is important, there's not much, if anything, that you need to put in there. Bots will never get to the admin or include folders because there are no public links pointing there. In any event, there is a meta noindex tag in the admin template_top.php, so again a bot will never index an admin file. Be sure to password-protect the admin folder and the problem is solved. As for disallowing other files, there really is no need. osC has the header tag module -> robots that you can use to apply noindex tags, which is a better option than restricting with robots.txt.


Hello everyone and thanks for your comments.

This is my header tag module -> robots configuration. Would you add or remove something?

[Screenshot: robots.jpg]

If I also add a robots.txt, will anything happen?

My index.php is in the catalog folder (catalog\index.php). Where should I put the robots.txt? In the catalog folder?

Thanks for your clarifications!

Valqui


The robots file has to go in the root of the account, not the shop. So if your shop is located in public_html/catalog/, the robots file goes in public_html/. If it has entries that are specific to the catalog directory, then the path needs to be included. So if there is a line like Disallow: /images/, it should be Disallow: /catalog/images/ if it is the shop's images directory being blocked. You can read more about the specification here.
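The path point is easy to check with Python's standard urllib.robotparser. This is just a sketch: www.example.com and logo.png are placeholder names, and the shop is assumed to live under /catalog/ as in the example above.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str) -> bool:
    """Would a rule-following bot be allowed to fetch this URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# Placeholder URL for an image inside a shop installed under /catalog/.
IMAGE_URL = "https://www.example.com/catalog/images/logo.png"

without_prefix = "User-agent: *\nDisallow: /images/\n"
with_prefix = "User-agent: *\nDisallow: /catalog/images/\n"

# Rules match from the start of the URL path, so /images/ alone misses it.
missed = allowed(without_prefix, IMAGE_URL)
blocked = not allowed(with_prefix, IMAGE_URL)
print(missed, blocked)
```

Without the /catalog/ prefix the rule points at a directory that does not exist on the site, so the shop's images stay crawlable.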


3 hours ago, Hotclutch said:

Apart from the sitemap entry, which is important, there's not much, if anything, that you need to put in there. Bots will never get to the admin or include folders because there are no public links pointing there. In any event, there is a meta noindex tag in the admin template_top.php, so again a bot will never index an admin file. Be sure to password-protect the admin folder and the problem is solved. As for disallowing other files, there really is no need. osC has the header tag module -> robots that you can use to apply noindex tags, which is a better option than restricting with robots.txt.

Not entirely true. As with robots.txt, "noindex" and "nofollow" are only suggestions to search engines, and they are free to ignore them! Not having public links to admin, etc. is no guarantee that they can't be guessed or obtained by other means (phishing, careless forum posts, etc.), so don't depend on that for security. admin should require an ID and password, which should keep out the curious even if they find the name.


^^

Noindex is a directive that will always be obeyed, unlike rel="canonical", at least by Googlebot, and I am sure by all the others as well. The only time it may appear that Google is ignoring the noindex tag is when the page is blocked by a robots.txt file, in which case the bot can't see the noindex directive.
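Here is a rough model of that interaction in Python. It assumes an idealized bot that checks robots.txt first and only then reads the page's meta tags; it is not a description of how Googlebot is actually implemented, and the URL, rules and HTML are made up for the example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def what_bot_sees(robots_txt: str, url: str, html: str) -> str:
    """Idealized bot: robots.txt is checked before the page is ever read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("*", url):
        return "blocked by robots.txt, noindex never read"
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex read" if "noindex" in finder.directives else "no noindex found"


# Hypothetical page and rules.
URL = "https://www.example.com/catalog/old_page.php"
HTML = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

blocked_case = what_bot_sees("User-agent: *\nDisallow: /catalog/old_page.php\n", URL, HTML)
open_case = what_bot_sees("User-agent: *\nDisallow:\n", URL, HTML)
print(blocked_case)
print(open_case)
```

In the blocked case the tag is on the page but the bot never gets far enough to read it, which is the situation described above.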


@Hotclutch @MrPhil @Jack_mcs

My host allows me to have multiple domains all on one account (and server). My primary domain (for example, <domain 1>) is on their server in the public_html folder. All of my other domains are in sub-directories under that ...

public_html --+-- <domain 2>
              |
              +-- <domain 3>
              |
              +-- <domain 4>
              |
            (etc)

Wouldn't each domain need its own sitemap? And if I need a robots.txt to tell the robots where the sitemap file is, how can I have a robots.txt file for each of the individual domains?

Or, am I confusing things (again)?

M


Each domain can have its own sitemap, robots.txt and .htaccess files, which would go into the folder of the respective domain. But I guess this might depend on how multiple-domain hosting is set up by your host.


6 hours ago, Hotclutch said:

Noindex is a directive that will always be obeyed, unlike rel="canonical",  at least by googlebot

Not exactly. Google will ignore the noindex if they think a page needs to be listed. You can see that alluded to here. They won't rank such a page but they may list it.


15 hours ago, JcMagpie said:

If you just want to add the file, it needs to go in the same place as your index.php,

Sorry for any confusion. As all my sites are installed in the root, I just defaulted to that. It's not going to be true for everyone. Below are some basic guidelines from Google (as it is my largest customer for indexing, I tend to follow them).

Format and location rules:

  • The file must be named robots.txt
  • Your site can have only one robots.txt file.
  • The robots.txt file must be located at the root of the website host that it applies to. For instance, to control crawling on all URLs below http://www.example.com/, the robots.txt file must be located at http://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at http://example.com/pages/robots.txt). If you're unsure about how to access your website root, or need permissions to do so, contact your web hosting service provider. If you can't access your website root, use an alternative blocking method such as meta tags.
  • A robots.txt file can apply to subdomains (for example, http://website.example.com/robots.txt) or on non-standard ports (for example, http://example.com:8181/robots.txt).
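One way to picture the location rules above is to work out, for any page URL, where its robots.txt must live. The helper below is a small sketch using Python's standard urllib.parse; the function name robots_txt_url is made up for this example, and the page paths are placeholders on the hosts Google's text uses.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Keep the scheme, host and port; replace everything else with /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder page URLs covering a subdirectory, a subdomain and a port.
examples = [
    "http://www.example.com/pages/item.html",
    "http://website.example.com/a/b?x=1",
    "http://example.com:8181/shop/index.php",
]
locations = [robots_txt_url(u) for u in examples]
print(locations)
```

Each distinct host (and port) resolves to its own file, which is also why separate domains hosted on one account each end up with their own robots.txt.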

 


Archived

This topic is now archived and is closed to further replies.
