Loon Posted May 31, 2006
A client just noticed a particular IP address adding things to his cart but never checking out. Web server log entry:
209.249.86.4 - - [31/May/2006:17:29:43 +0000] "GET /african-conga-drums-c-44.html?products_id=187&action=buy_now&sort=product_sort_order&osCsid=d729d9bc091bdd7b6472e4744dc02d17 HTTP/1.0" 302 0 "-" "Mozilla/5.0 (compatible; Charlotte/1.0b; [email protected])"
I added 'charlotte' to my spiders.txt file (the newest, from May) and am monitoring. Is this the right thing to do so they quit adding to the cart?
Info on Charlotte: http://www.betaspider.com/
Thanks, anyone!
stevel Posted May 31, 2006
Since that UA includes the word "spider", it would be detected already. The question is: how did it get the session to begin with? Did it have it from a previous visit when Prevent Spider Sessions was not on? It isn't accepting a cookie, or else the osCsid wouldn't be in the URL (well, it depends on how it got to that URL...). A more detailed analysis of the access logs would be helpful in seeing what other URLs it visited before this one.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
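A minimal sketch of the kind of access-log analysis Steve suggests, assuming the log is in the Apache "combined" format seen in the entry above. The names `parse_line` and `requests_from` are hypothetical helpers, not part of osCommerce; the sample line is the one from Loon's post.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache "combined" log format, as in the entry quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = (
    '209.249.86.4 - - [31/May/2006:17:29:43 +0000] '
    '"GET /african-conga-drums-c-44.html?products_id=187&action=buy_now'
    '&sort=product_sort_order&osCsid=d729d9bc091bdd7b6472e4744dc02d17 HTTP/1.0" '
    '302 0 "-" "Mozilla/5.0 (compatible; Charlotte/1.0b; [email protected])"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Pull the session ID out of the query string, if the URL carries one.
    query = parse_qs(urlsplit(rec["url"]).query)
    rec["osCsid"] = query.get("osCsid", [None])[0]
    return rec


def requests_from(lines, ip):
    """Yield (time, url, osCsid) for every request made by the given IP."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["ip"] == ip:
            yield rec["time"], rec["url"], rec["osCsid"]


if __name__ == "__main__":
    for when, url, sid in requests_from([SAMPLE], "209.249.86.4"):
        print(when, sid, url)
```

Run over a full day's log (for example `requests_from(open("access.log"), "209.249.86.4")`), the first request from the IP shows whether the session ID was already in the URL when the spider arrived.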
Loon Posted May 31, 2006
Steve, I'm a little confused ... nothing new. This store has been online for almost two years and "Prevent Spider Sessions" has been set to True that whole time. I can give you today's log if it would help, but it looks like every hit from this IP (209.249.86.4) has an osCsid on it. It seems to have a different osCsid every so often too. What am I doing wrong? Thanks --
stevel Posted May 31, 2006
Well, it's getting the SID from somewhere and perhaps not your site. For example, if someone posted a URL from your site including a session ID into a user forum, you could get this problem. Look into the contribution Spider Session Remover and tweak it to look for "Charlotte". This may help. It would also help to turn your Buy Now links into forms. There is another contrib that does that.
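To illustrate the general idea of removing a session ID from a spider's URL, here is a rough sketch. It is not the code of the Spider Session Remover contribution (which is not shown in this thread); it only assumes that the fix amounts to sending a known spider to the same URL without the `osCsid` parameter. `strip_session_id` and `redirect_target` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_session_id(url, param="osCsid"):
    """Return the URL with the session parameter removed from its query string."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def redirect_target(user_agent, url, spider_strings, param="osCsid"):
    """Return the clean URL a spider should be sent to, or None if no redirect is needed.

    Assumption: spider_strings are lowercase substrings, in the style of spiders.txt.
    """
    ua = user_agent.lower()
    is_spider = any(s in ua for s in spider_strings)
    has_sid = any(k == param
                  for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if is_spider and has_sid:
        return strip_session_id(url, param)
    return None


if __name__ == "__main__":
    url = ("/african-conga-drums-c-44.html?products_id=187&action=buy_now"
           "&sort=product_sort_order&osCsid=d729d9bc091bdd7b6472e4744dc02d17")
    print(redirect_target("Mozilla/5.0 (compatible; Charlotte/1.0b)", url, ["charlotte"]))
```

In a real shop the returned URL would be issued as an HTTP redirect; ordinary visitors (no spider match) are left alone so their sessions keep working.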
Loon Posted May 31, 2006
Thanks Steve, I will read up on the Spider Session Remover and apply it. It's odd that there are multiple session IDs, and that Googlebot and others don't have them. Here's another one; I wonder where they're starting? I'll tweak for this one too, I guess. "tbot" should have stopped it, right? Always something new...
64.124.140.152 - - [31/May/2006:12:27:47 +0000] "GET /deep-carved-djembes-c-21.html?osCsid=6053444377c5e798a5a9c22fc43912ea&products_id=46&sort=product_sort_order HTTP/1.1" 200 37518 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
stevel Posted May 31, 2006
Someone mentioned Fatbot earlier. Note that if the incoming link already has a session ID, then the check of spiders.txt is skipped.
boxtel Posted June 1, 2006
What makes you think that?
Treasurer MFC
warrenerjm Posted June 16, 2006
Hi Steve,
Normally in the Who's Online Enhancement I see spiders which have the light as bots with sessions, but the details in the columns below show session "No" and no osCsids appear. Today I noticed this one: a spider (I think) which is showing as a session in the session column. I have the updated spiders.txt (20/5/06) but not the 30/6/05 one. Should I have this one, and what do I do now? Just in case some get through, do I need the SID killer contribution installed too? Thanks for your help.
Julie
Edited June 16, 2006 by warrenerjm
stevel Posted June 16, 2006
picsearch is already in the list. I'm not familiar with the "Who's Online Enhancement", but from what I see from others it does seem to be unreliable in identifying spiders. As long as there's no session, then everything is working correctly. You WANT spiders to be visiting your site - you just don't want them to get sessions.
Guest Posted June 17, 2006
Steve, WOE uses the spider_flag set by bots when Prevent Spider Sessions is used. So it only works if that is set to true, and it works best if the shop has the latest and greatest spiders.txt contrib. The other problem it has is if sessions are not set up just right. It is a little finicky then, e.g. catalog and admin sessions must be stored the same way.
ed
warrenerjm Posted June 17, 2006
Hi Steve,
Thanks for that. I have your latest one, so that is all OK. It just worried me this time, 'cos I saw it registered "Yes" under the session column.
Ed,
I do have "Prevent Spider Sessions" set to true. I promise I am not constantly on WOE :lol: , so this one I caught by luck, and it was different from the way other spiders have been shown. Is there anything else I should be checking, i.e. you mentioned the way sessions are set up?
Thanks guys
Julie
yanarasod Posted June 20, 2006
This is a question which I posted in another topic, after which choosealogin told me to report it here; sorry for posting twice. This is the URL of the post: http://www.oscommerce.com/forums/index.php?showtopic=214897
I went to download the latest spiders.txt from the contributions section (by stonebridgecomputing, 13 Jun 2006), but it says the file is password protected. Does it require a password? What should I do now? Really confused. I tried to skip, but it says all the other files are password-protected too: readme, spiders-large, spiders. So should I just stay with stevel's update, or is there any other way?
stevel Posted June 20, 2006
I was not aware of this recent update. Like a lot of well-intentioned updates by others, it is incorrect. Ignore it and just use the most recent one I posted. When I open it, however, I don't see any password protection.
Edited June 20, 2006 by stevel
yanarasod Posted June 20, 2006
Thanks for confirming. This was one more question I asked there, but it suits here:
"OK, downloaded stevel's update, but this spiders file has no googlebot or any reference to it. Have you (choosealogin) had any problems with it? This is what the readme says:
The purpose of this file is to tell if an incoming request is from a spider, NOT to identify a particular spider. Therefore, there are some common substrings in the list such as "spider", "crawl", and "obot" which match many different spiders. For example, "ebot" matches Googlebot, "nbot" matches msnbot.
The strings in this file MUST be all lowercase, or else they will be ignored.
If you think a particular spider is missing from the list, please post in the support topic and include a line from your access log showing the spider access including the full user agent string.
Please do not update this contribution unless you fully understand how it works."
stevel Posted June 20, 2006
As the readme says, the string "ebot" catches Googlebot.
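The matching rule the readme describes can be sketched as follows. This is an illustrative Python sketch, not the osCommerce code itself (which is PHP); it assumes the user agent is lowercased and each spiders.txt entry is tested as a plain substring. `load_spider_strings` and `is_spider` are hypothetical names, and the user agent strings are typical examples rather than lines from this thread's logs.

```python
def load_spider_strings(text):
    """Split spiders.txt content into non-empty, whitespace-trimmed entries."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_spider(user_agent, spider_strings):
    """True if any entry occurs as a substring of the lowercased user agent."""
    ua = user_agent.lower()
    return any(s in ua for s in spider_strings)


# A few of the generic substrings the readme mentions.
SPIDERS = load_spider_strings("spider\ncrawl\nobot\nebot\nnbot\n")

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MSNBOT = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
BROWSER = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

if __name__ == "__main__":
    for ua in (GOOGLEBOT, MSNBOT, BROWSER):
        print(is_spider(ua, SPIDERS), ua)
    # An entry with uppercase letters can never match the lowercased user agent,
    # which is why the readme insists on all-lowercase strings.
    print(is_spider("Mozilla/5.0 (compatible; Charlotte/1.0b)", ["Charlotte"]))
```

Because the comparison is against the lowercased user agent, "ebot" matches "googlebot" and "nbot" matches "msnbot", while an entry such as "Charlotte" with a capital letter is effectively ignored.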
yanarasod Posted June 20, 2006
Thanks for clearing up the doubts.
stevel Posted July 4, 2006
I just uploaded a new update with several new strings.
Guest Posted July 4, 2006
Has anybody spotted mothra/netscan yet?
Name: Guest
Customer ID: 0
IP Address: 68.91.253.93
User Agent: mothra/netscan (+http;//www.rexrobards.com
I put a ";" in the URL, as it's moving quickly around my site and may be a bad bot. There seems to be no information available online about it.
stevel Posted July 4, 2006
I haven't seen it on my sites.
kunal247 Posted July 9, 2006
Hi guys, I am getting a lot of sessions being created by this IP address: 66.249.66.101. Any idea if this is a search engine?
Kunal
stevel Posted July 9, 2006
That's Google.
stevel Posted July 9, 2006
Are you sure it's getting a session? Don't rely on "Who's Online" contributions; look at your access log.
Andreas2003 Posted July 19, 2006
I got hits from: United Kingdom, 212.95.252.16, dex-252-16.dxi.net. Is this a spider and, if yes, how do I exclude it so that it doesn't get a session ID? Thanks in advance,
Andreas
stevel Posted July 19, 2006
Doesn't look like a spider - looks more like a DSL or cable modem user. More important than the IP is the user agent string - do you have that? Unfortunately, there are many individuals who run their own indexing or "site scraper" programs which are not readily detectable as spiders.
Andreas2003 Posted July 19, 2006
No, not at hand. The user agent string was only winnt ..... nothing special. I guess you're right that this is a user. I should be happy :-) Thanks,
Andreas