valquiria23 Posted July 4, 2018
Hello everyone, I hope you are all well! What is the robots.txt file for? Apparently search engines need it. Does anyone have an up-to-date robots.txt file? Which folder does it go in: the root or the catalog folder? Thank you very much in advance!
Valqui
Hotclutch Posted July 4, 2018
It's a text file located at the root of your website. It gives directions to robots (bots). There is no single template file that fits all sites; you have to build your own to suit your site. You can specify the location of an XML sitemap for robots to follow there, with an entry like this: Sitemap: https://www.shop.com/sitemap.xml You can also specify other things, such as which URLs bots should not crawl.
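As a rough illustration of the sitemap entry described above, here is a minimal sketch using Python's standard urllib.robotparser module (the site_maps() method needs Python 3.8 or later). The file contents are an assumed example built around the sitemap URL from the post, not a recommended file for any particular shop.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file: allow everything and point bots at the sitemap.
# An empty "Disallow:" line means "nothing is disallowed".
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.shop.com/sitemap.xml
"""

sample_parser = RobotFileParser()
sample_parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if the file has none.
sample_sitemaps = sample_parser.site_maps()
print(sample_sitemaps)

# With an empty Disallow, a well-behaved bot may fetch any URL on the site.
print(sample_parser.can_fetch("*", "https://www.shop.com/index.php"))
```

Parsing the file locally like this is only a sanity check; each search engine applies its own parser to the live file.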
241 Posted July 4, 2018
Don't put in a Disallow entry for admin, as that just gives away the location of your admin.
MrPhil Posted July 4, 2018
/robots.txt provides guidance to robots (including search engines), but provides no security. "Well-behaved" bots will follow the instructions, but others will use it as a guide to where to sniff around. For example, you would never put in Disallow: /my_credit_card_numbers because that's an engraved invitation to look around. There is good information online, including tutorials (I have http://www.searchengineworld.com/robots/ bookmarked, but at the moment there seems to be something wrong with it). Remember to add sitemap: URL_to_your_sitemap
♥JcMagpie Posted July 5, 2018
Hey, if you want to understand what it is and how it works, read this: http://www.robotstxt.org/robotstxt.html
If you just want to add the file, it needs to go in the same place as your index.php. A basic file for osC is attached; you can add to it as you wish.
Google has a very good help page on this, and a tester for your robots.txt file: https://support.google.com/webmasters/answer/6062596?hl=en
Google also has a tool to generate robots.txt files for your site, but you need a Google account to access it: https://www.google.com/webmasters/tools/home?hl=en You can read about it in this blog post and decide if it's for you: https://webmasters.googleblog.com/2008/03/speaking-language-of-robots.html
It's a lot to take in, but I hope it helps.
Attached file: robots.txt
Hotclutch Posted July 5, 2018
Apart from the sitemap entry, which is important, there's not much, if anything, that you need to put in there. Bots will never get to the admin or include folders because there are no public links pointing there. In any event, there is a meta noindex tag in the admin template_top.php, so again a bot will never index an admin file. Be sure to password protect the admin folder and the problem is solved. As for disallowing other files, there really is no need. osC has the header tag module -> robots, which you can use to apply noindex tags; that is a better option than restricting with robots.txt.
valquiria23 Posted July 5, 2018 (Author)
Hello everyone, and thanks for your comments. This is my header tag module -> robots configuration. Would you add or remove anything? If I also add a robots.txt, will that change anything? My index.php is in the catalog folder (catalog\index.php). Should robots.txt go in the catalog folder as well? Thanks for your clarifications!
Valqui
Hotclutch Posted July 5, 2018
^^ Yes, add robots.txt where you have index.php (in catalog). No need to add anything to it except, as I said above, the sitemap reference.
Jack_mcs Posted July 5, 2018
The robots file has to go in the root of the account, not the shop. So if your shop is located in public_html/catalog/, the robots file goes in public_html/. If it has entries that are specific to the catalog directory, then the path needs to be included. So if there is a line like disallow: /Images/, it should be disallow: /catalog/images/, if it is the shop's images directory being blocked. You can read more about the specification here.
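To make the path rule above concrete, here is a small sketch, again using Python's standard urllib.robotparser. The host www.example.com and the helper name is_allowed are assumptions for illustration; the point is only that Disallow paths are matched from the site root, so a rule for a shop in /catalog/ has to include that prefix.

```python
from urllib.robotparser import RobotFileParser

# Assumed setup: the shop lives in public_html/catalog/, and robots.txt is
# served from the site root, so rule paths start at the root as well.
RULE_WITH_PREFIX = "User-agent: *\nDisallow: /catalog/images/\n"
RULE_WITHOUT_PREFIX = "User-agent: *\nDisallow: /images/\n"


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Hypothetical helper: would a rule-following bot be allowed to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


shop_image = "https://www.example.com/catalog/images/logo.png"

# The rule that includes /catalog/ blocks the shop's images directory.
print(is_allowed(RULE_WITH_PREFIX, shop_image))
# The rule without the prefix only covers /images/ at the site root,
# so the shop's images remain crawlable.
print(is_allowed(RULE_WITHOUT_PREFIX, shop_image))
```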
Hotclutch Posted July 5, 2018
My mistake, robots.txt goes in the root of the website.
MrPhil Posted July 5, 2018
3 hours ago, Hotclutch said: Apart from the sitemap entry, which is important, there's not much if anything that you need to put in there. Bots will never get to the admin or include folders because there are no public links pointing there. In any event there is a meta noindex tag in the admin template_top.php so again a bot will never index an admin file. Be sure to password protect the admin folder then the problem is solved. As for disallowing other files, there really is no need. osC has the header tag module -> robots that you can use to apply noindex tags, which is a better option than restricting with robots.txt
Not entirely true. As with robots.txt, "noindex" and "nofollow" are only suggestions to search engines, and they are free to ignore them! Not having public links to admin, etc. is no guarantee that the location can't be guessed or obtained by other means (phishing, careless forum posts, etc.), so don't depend on that for security. admin should require an ID and password, which should keep out the curious even if they find the name.
Hotclutch Posted July 5, 2018
^^ Noindex is a directive that will always be obeyed, unlike rel="canonical", at least by Googlebot, and I am sure by all others too. The only time it may appear that Google is ignoring the noindex tag is when the page is blocked by a robots.txt file, in which case the bot can't see the noindex directive.
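The interaction described above (a robots.txt block hiding a noindex tag) can be sketched with Python's standard library. This is a simplified model under stated assumptions, not how any real search engine is implemented: the function name noindex_visible, the URL, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindex_visible(robots_txt: str, url: str, html: str, agent: str = "*") -> bool:
    """Hypothetical check: can a rule-following bot actually see the noindex tag?"""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Blocked by robots.txt: the page is never downloaded, so the tag is unseen.
        return False
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


PAGE_URL = "https://www.example.com/catalog/account.php"
PAGE_HTML = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'

print(noindex_visible("User-agent: *\nDisallow: /catalog/account.php\n", PAGE_URL, PAGE_HTML))
print(noindex_visible("User-agent: *\nDisallow:\n", PAGE_URL, PAGE_HTML))
```

In this model, the practical consequence is that a page meant to carry noindex should not also be disallowed in robots.txt.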
ArtcoInc Posted July 5, 2018
@Hotclutch @MrPhil @Jack_mcs My host allows me to have multiple domains all on one account (and server). My primary domain (for example, <domain 1>) is on their server in the public_html folder. All of my other domains are in sub-directories under that ...
public_html --+-- <domain 2>
              |
              +-- <domain 3>
              |
              +-- <domain 4>
              |
            (etc)
Wouldn't each domain need its own sitemap? And, if I need a robots.txt to tell the robots where the sitemap file is, how can I have a robots.txt file for each of the individual domains? Or am I confusing things (again)?
M
Hotclutch Posted July 5, 2018
Each domain can have its own sitemap, robots.txt and .htaccess files, which would go into the folder of the respective domain. But I guess this might depend on how multiple-domain hosting is set up by your host.
Jack_mcs Posted July 6, 2018
6 hours ago, Hotclutch said: Noindex is a directive that will always be obeyed, unlike rel="canonical", at least by googlebot
Not exactly. Google will ignore the noindex if they think a page needs to be listed. You can see that alluded to here. They won't rank such a page, but they may list it.
♥JcMagpie Posted July 6, 2018
15 hours ago, JcMagpie said: If you just want to add the file, it need to go same place as your index.php,
Sorry for any confusion. As all my sites are installed in the root, I just defaulted to that. It's not going to be true for everyone. Below are some basic guidelines from Google (as it is my largest customer for indexing, I tend to follow them).
Format and location rules:
- The file must be named robots.txt.
- Your site can have only one robots.txt file.
- The robots.txt file must be located at the root of the website host that it applies to. For instance, to control crawling on all URLs below http://www.example.com/, the robots.txt file must be located at http://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at http://example.com/pages/robots.txt). If you're unsure about how to access your website root, or need permissions to do so, contact your web hosting service provider. If you can't access your website root, use an alternative blocking method such as meta tags.
- A robots.txt file can apply to subdomains (for example, http://website.example.com/robots.txt) or on non-standard ports (for example, http://example.com:8181/robots.txt).
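The location rule quoted above can be sketched as a small Python function that, given any page URL, works out where a bot would look for robots.txt. The function name robots_txt_url and the sample page URLs are assumptions for illustration; the hosts are the example hosts from the guidelines.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Hypothetical helper: the robots.txt location that governs page_url.

    Keeps the scheme and host (including any port), and replaces the path,
    query string and fragment with /robots.txt at the host root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A shop page inside /catalog/ is still governed by the file at the host root.
print(robots_txt_url("http://www.example.com/catalog/index.php?cPath=1"))
# A subdomain and a non-standard port each count as a separate host.
print(robots_txt_url("http://website.example.com/pages/about.html"))
print(robots_txt_url("http://example.com:8181/pages/about.html"))
```

Because the host is part of the result, each domain or subdomain ends up with its own robots.txt location, whatever folder layout the hosting account uses behind the scenes.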
Archived
This topic is now archived and is closed to further replies.