stevel (Author) - Posted January 31, 2006

Christine,

There is a robots.txt contribution and, unlike spiders.txt, it does not need to be kept updated. The robots.txt file goes at the top level of your site, so that it can be found at the URL http://www.example.com/robots.txt for a site example.com. Note that you may have to adjust the contents of robots.txt to show the path you use for your store.

I recommend that you subscribe to the announcement topic so you can learn of updates. By the way, I occasionally see others post updates of their own, but most of these seem to seriously misunderstand the way this file is used, with most of the additions being redundant or incorrect.

Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
Guest - Posted January 31, 2006

Hi Steve, and thank you.

I'm glad I'm learning about this now, before my new and updated store opens. If I can prevent this sort of malicious behavior from happening, I will. :)

Thanks for all of your help, Steve. I've subscribed to the topic you mentioned and will be keeping up to date with it for sure. Oh, and I did have to do a little work on the robots.txt, since I don't use the catalog folder. Thanks for pointing that out.

Christine
stevel (Author) - Posted February 3, 2006

Sigh - subscribing to the announcement topic doesn't do anything useful anymore. Harald took a major step backwards in usability by preventing contrib authors from updating the announcements topics. Instead, any new or changed contrib gets a new thread automatically added. It's now impossible to track individual contributions. You get all or none. I understand the problem he was trying to solve, but in my view the cure is worse than the disease.

So perhaps what you should do is set a reminder for yourself once a month to check the contrib page for updates. Usually I update twice a month, but it depends on how many new spiders I see.

Steve
Guest - Posted February 3, 2006

Quoting stevel: "Sigh - subscribing to the announcement topic doesn't do anything useful anymore. Harald took a major step backwards in usability by preventing contrib authors from updating the announcements topics. Instead, any new or changed contrib gets a new thread automatically added. It's now impossible to track individual contributions. You get all or none. I understand the problem he was trying to solve, but in my view the cure is worse than the disease. So perhaps what you should do is set a reminder for yourself once a month to check the contrib page for updates. Usually I update twice a month, but it depends on how many new spiders I see."

Thanks again, Steve. :) I've already started a folder in my favorites in IE, so I'm adding anything important that I need to refer to, and I'm also checking in on the latest contributions every day. :P

Thanks,
Christine
kaligear - Posted April 1, 2006

Cookies warning page in Firefox.

I updated the spiders.txt file and changed the "prevent spider sessions" setting to true. After changing the setting to false, I got a cookies warning box when trying to add a product to my cart. I am using Firefox, which is set to allow cookies in its preferences, but somehow I was unable to add a product.

Have I done something wrong in installing or applying this update? I saw several references in the forum saying that this does not prevent spiders from accessing websites, but I am still unsure of what I am doing. I also saw that there are some words in the text file that will keep out Mozilla and Firefox users. Do I need to edit these?

I am very new to learning about spiders, robots and the like. Any help is greatly appreciated.

Thank you,
Leslie
stevel (Author) - Posted April 2, 2006

My mistake - one of the strings I added to the 2006-03-31 file disabled Firefox. I've uploaded a new file. My sincere apologies.

Steve
warrenerjm - Posted April 2, 2006

Quoting stevel: "Christine, There is a robots.txt contribution and, unlike spiders.txt, it does not need to be kept updated. The robots.txt file goes at the top level of your site, so that it can be found with the url http://www.example.com/robots.txt for a site example.com Note that you may have to adjust the contents of robots.txt to show the path you use for your store."

Hi,

I have "prevent spider sessions" set to true. Do I use the newest contribution (2/4/06) and replace my existing file in catalog/includes/spiders.txt? I am assuming this is a full list? This is the first time I have touched this file, and it is still the default. Once I have done this, do I need to check this contribution for updates every so many weeks and replace the file again?

I do not have a robots.txt, so do I create (upload) the file to catalog/ or /? What amendments do I need to make in it, as I am a little confused? I am still amending my site before moving it all up a level.

Thanks for your help.
Julie
stevel (Author) - Posted April 2, 2006

Yes, you replace the existing file includes/spiders.txt with the one in the latest contribution, which is a full list. I recommend checking once a month or so for updates.

robots.txt is a different file for a different purpose. That file goes in the home directory of your web site and is used by search engines. There is a contribution with a suggested robots.txt - the idea is that you want to ask robots to not bother with certain parts of your site, especially those that only an active customer would use (cart, login, my account, write review, etc.).

Steve
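A minimal sketch of the kind of robots.txt described above, checked with Python's standard `urllib.robotparser`. The file names (`shopping_cart.php`, `login.php`, `account.php`, `product_reviews_write.php`), the bot name `ExampleBot`, and the assumption that the store sits at the site root are illustrative guesses, not the contents of the contribution; a store under a subfolder would need that folder prefixed to every path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a store installed at the site root.
# The file names are assumptions; adjust them (for example, prefix
# /catalog/) so the paths match your own store.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shopping_cart.php
Disallow: /login.php
Disallow: /account.php
Disallow: /product_reviews_write.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Customer-only pages are disallowed for every robot...
cart_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/shopping_cart.php")

# ...while product pages remain crawlable.
product_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/product_info.php?products_id=298")
```

Note that robots.txt is only a request: well-behaved search engines honor it, but it does not block anything by itself.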
djo - Posted April 3, 2006

Thank you for this contribution. I discovered it after finding that MSN had a ton of links to my site with session IDs. I do have a couple of questions.

1. Now that I have updated the spiders.txt, will the MSN links with IDs eventually go away?
2. Is there anything I can do in the meantime to avoid customers getting mixed up and viewing each other's info?

Thanks again
stevel (Author) - Posted April 3, 2006

The MSN links may or may not go away over time. See http://www.oscommerce.com/community/contributions,2819 for a contribution that will remove SIDs coming from search engine links. It is not foolproof, though.

One alternative I thought of, but have not tried, is to change the string that osC uses to identify the SID, "osCsid". If you change this to something else ("xyzsid" or whatever strikes your fancy), then all of the indexed links with SIDs will no longer have sessions. The risk is that any customers currently in the store will lose their sessions as soon as this happens.

Steve
djo - Posted April 3, 2006

Thanks Steve. I might give that a try in the late hours when traffic is slow. I don't know much about this, so this may be a silly question, but would the links with the "osCsid" still work if I change to "xyzsid"?
stevel (Author) - Posted April 3, 2006

Nope. That's the whole idea - if you change the session identifier, then all existing links become links without sessions.

Steve
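The effect of renaming the session identifier can be illustrated with a small model. This is a sketch, not osCommerce code: it assumes the store resumes a session only when the URL carries a query parameter whose name equals the configured session name, and the function `session_id_from_url` and the sample link are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def session_id_from_url(url, session_name):
    """Return the session ID carried in the URL under session_name, or None.

    Models a store that only recognizes a session when the query string
    contains a parameter named exactly like the configured session name.
    """
    params = parse_qs(urlparse(url).query)
    values = params.get(session_name)
    return values[0] if values else None


# A link as a search engine might have indexed it (made-up session ID).
indexed_link = ("http://www.example.com/product_info.php"
                "?products_id=298&osCsid=abc123")

# Store still uses the default name: the indexed link carries a session.
before = session_id_from_url(indexed_link, "osCsid")

# Store renamed its identifier: the old parameter is simply ignored.
after = session_id_from_url(indexed_link, "xyzsid")
```

Under this model the old links still load the page; they just arrive without a session, which is also why customers who are mid-visit at the moment of the change would lose theirs.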
djo - Posted April 3, 2006

I'll give it a shot. Thanks for your help.
gregy - Posted April 7, 2006

Hi guys,

What would I have to enter into spiders.txt to recognize this spider: http://www.showmyip.com/?ip=195.250.198.45? It is the Slovenian search engine Najdi.si.

Thanks in advance
stevel (Author) - Posted April 7, 2006

Show me a line from your access log for this spider.

Steve
gregy - Posted April 7, 2006

I can only give you this from Who's Online; I don't have any other access:

Active with Cart 01:09:06 Guest 195.250.198.45 20:52:14 22:01:20 /ask_a_question.php?products_id=298&language=It Yes Not Found
stevel (Author) - Posted April 7, 2006

Ok - add the line:

interseek

I'll add this to my next update.

Steve
warrenerjm - Posted April 8, 2006

Quoting stevel: "Yes, you replace the existing file includes/spiders.txt with the one in the latest contribution, which is a fill list. I recommend checking once a month or so for updates. robots.txt is a different file for a different purpose. That file goes in the home directory of your web site and is used by search engines. There is a contribution with a suggested robots.txt - the idea is that you want to ask robots to not bother with certain parts of your site, especially those that only an active customer would use (cart, login, my account, write review, etc.)"

Thanks Steve,

Sorry it has taken a while to reply. Is this the contribution you would suggest for the robots.txt?

http://www.oscommerce.com/community/contri...l/search,robots

Thanks,
Julie
gregy - Posted April 8, 2006

Quoting stevel: "Ok - add the line: interseek I'll add this to my next update."

I added interseek, but the Najdi.si bot is still not recognized. It shows as green, while all the other bots (zyborg, google, yahoo) are red in Who's Online.
stevel (Author) - Posted April 8, 2006

I can't help further unless I know what user agent string is being presented by this bot. According to the research I did, this bot includes the string "interseek" in its UA.

I'll comment that if the bot is following a URL with a session ID, it will keep the session even if listed in spiders.txt.

Steve
stevel (Author) - Posted April 8, 2006

Julie, yes, that's the one. Make sure that the paths in the robots.txt are consistent with your own store.

Steve
Guest - Posted April 8, 2006

Quoting gregy: "i added interseek .. but Najdi.si bot is still not recognized ... and it is green .. all bots (zyborg, google, yahoo) are red in Who's online?"

Gegor,

When that bot comes back, click the Customer profile link in Who's Online (I assume you're using Who's Online Enhanced, since you mentioned green vs. red). That should give you the user agent. Get that and paste it here for Steve.

ed
gregy - Posted April 8, 2006 (edited April 8, 2006 by gregy)

It says it is a guest, but I found the following FAQ on their homepage:

Q: What is User Agent of Najdi.si spiders?
A: User agent of Najdi.si spiders is Mozilla/5.0 (compatible; Najdi.si/3.1)

Does this help? By the way, Najdi.si is the biggest Slovenian search engine, so it is important. :)
stevel (Author) - Posted April 8, 2006

Ok, then add the string:

najdi

Steve
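Why "najdi" works where "interseek" did not can be sketched as follows, assuming each spiders.txt entry is tested as a case-insensitive substring of the visitor's user agent. The function `is_spider` is a hypothetical model rather than the store's actual code, the Firefox user agent is an illustrative example, and the entry "fox" is a made-up stand-in for an overly broad string.

```python
def is_spider(user_agent, spider_strings):
    """Return True if any non-empty entry occurs in the user agent.

    Assumes one entry per line of spiders.txt, compared as a
    case-insensitive substring; blank lines are ignored.
    """
    ua = user_agent.lower()
    for entry in spider_strings:
        needle = entry.strip().lower()
        if needle and needle in ua:
            return True
    return False


# User agent quoted from the Najdi.si FAQ earlier in the thread.
NAJDI_UA = "Mozilla/5.0 (compatible; Najdi.si/3.1)"

# Illustrative Firefox user agent (an assumption, not taken from the thread).
FIREFOX_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) "
              "Gecko/20060111 Firefox/1.5.0.1")

missed = is_spider(NAJDI_UA, ["interseek"])            # string absent from the UA
caught = is_spider(NAJDI_UA, ["interseek", "najdi"])   # "najdi" is in "Najdi.si"

# An entry that is too broad also matches ordinary browsers, which is how
# a single bad line can lock real customers out of sessions.
browser_hit = is_spider(FIREFOX_UA, ["fox"])
```

The design consequence is that entries should be as specific as possible: long enough to avoid matching real browsers, short enough to survive version-number changes in the bot's user agent.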
minuteman1970 - Posted April 11, 2006

Hello,

I am considering using this contribution, and it seems easy enough to implement. Just so I am clear as to its purpose: I am under the impression that the contribution is designed to prevent spiders from creating SIDs, which show up when my site is found in a Google search.

I am currently running the following contributions; hopefully they won't conflict:

1. Ultimate URL
2. Header tags
3. Who's online

Am I on the right track? Thanks.