Updated spiders.txt Official Support Topic


stevel


A client just noticed a particular IP address adding things to his cart but never checking out. Web Server log entries:

 

209.249.86.4 - - [31/May/2006:17:29:43 +0000] "GET /african-conga-drums-c-44.html?products_id=187&action=buy_now&sort=product_sort_order&osCsid=d729d9bc091bdd7b6472e4744dc02d17 HTTP/1.0" 302 0 "-" "Mozilla/5.0 (compatible; Charlotte/1.0b; [email protected])"

 

I added 'charlotte' to my spiders.txt file (the newest from May) and am monitoring ... is this the right thing to do so they quit adding to the cart?

 

Info on Charlotte: http://www.betaspider.com/

 

Thanks anyone!


Since that UA includes the word "spider", it would be detected already. The question is - how did it get the session to begin with? Did it have it from a previous visit when Prevent Spider Sessions was not on? It isn't accepting a cookie or else the osCsid wouldn't be in the URL (well, it depends on how it got to that URL...) A more detailed analysis of the access logs would be helpful in seeing what other URLs it visited before this one.
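For anyone wanting to do the kind of access-log analysis suggested here, this is a minimal Python sketch. It assumes the server writes Apache combined log format (which the entries quoted in this topic appear to follow) and that the session parameter is named osCsid. The function names and the sample lines are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Combined log format:
# ip ident user [time] "METHOD url PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of log fields plus the osCsid value, or None if unparseable."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    hit = match.groupdict()
    query = parse_qs(urlsplit(hit["url"]).query)
    hit["sid"] = query.get("osCsid", [None])[0]
    return hit


def trace_by_client(lines):
    """Group hits by (ip, user agent), keeping them in log order."""
    trace = defaultdict(list)
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            trace[(hit["ip"], hit["agent"])].append(hit)
    return trace


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    sample = [
        '192.0.2.10 - - [31/May/2006:12:27:40 +0000] "GET /index.php HTTP/1.1" '
        '200 1000 "-" "ExampleBot/1.0"',
        '192.0.2.10 - - [31/May/2006:12:27:47 +0000] "GET /a.html?osCsid=abc123 HTTP/1.1" '
        '200 1000 "-" "ExampleBot/1.0"',
    ]
    for (ip, agent), hits in trace_by_client(sample).items():
        sids = {h["sid"] for h in hits if h["sid"]}
        print(ip, agent, f"{len(hits)} hits, {len(sids)} distinct osCsid values")
        for h in hits:
            print("  ", h["time"], h["url"], "referer:", h["referer"])
```

Listing the hits in order shows the first URL where an osCsid appears and whether a referer was sent, which may hint at where the session ID came from.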

Steve, I'm a little confused ... nothing new.

 

This store has been online for almost two years and "Prevent Spider Sessions" has been set to True that whole time. I can give you today's log if it would help, but it looks like every hit from this IP (209.249.86.4) has an osCsid on it. It seems to have a different osCsid every so often too. What am I doing wrong?

 

Thanks --


Well, it's getting the SID from somewhere and perhaps not your site. For example, if someone posted a URL from your site including a session ID into a user forum, you could get this problem.

 

Look into the contribution Spider Session Remover and tweak it to look for "Charlotte". This may help.

 

It would also help to turn your Buy Now links into forms. There is another contrib that does that.
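The general idea of removing a session ID for a known spider can be sketched as follows. This is not the code of the Spider Session Remover contribution (which is PHP and whose internals are not shown in this topic); it is only a Python illustration, assuming the parameter is named osCsid and that a spider is recognised by a lowercase substring such as "charlotte". All function names here are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative list only; entries are lowercase substrings of user agents.
SPIDER_WORDS = ("charlotte", "spider", "crawl")


def looks_like_spider(user_agent, words=SPIDER_WORDS):
    """True if any listed substring occurs in the lowercased user agent."""
    ua = user_agent.lower()
    return any(word in ua for word in words)


def has_session_id(url, param="osCsid"):
    """True if the URL's query string carries the session parameter."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == param for key, _ in pairs)


def strip_session_id(url, param="osCsid"):
    """Return the URL with the session parameter removed, other parameters kept."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def redirect_target(url, user_agent):
    """Return the clean URL a spider should be redirected to, or None."""
    if looks_like_spider(user_agent) and has_session_id(url):
        return strip_session_id(url)
    return None
```

A store would answer such a request with a redirect to the cleaned URL, so the spider stops following links that carry someone else's session.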


Thanks Steve, I will read up on the Spider Session Remover and apply it. It's odd that there are multiple session IDs, and that Googlebot and others don't have them.

 

Here's another -- wonder where they're starting? I'll tweak for this one too I guess. "tbot" should have stopped it, right? Always something new...

 

64.124.140.152 - - [31/May/2006:12:27:47 +0000] "GET /deep-carved-djembes-c-21.html?osCsid=6053444377c5e798a5a9c22fc43912ea&products_id=46&sort=product_sort_order HTTP/1.1" 200 37518 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"


Someone mentioned Fatbot earlier. Note that if the incoming link already has a session ID, then the check of spiders.txt is skipped.

 

What makes you think that?

Treasurer MFC


  • 3 weeks later...

Hi Steve

 

Normally in the Who's Online Enhancement I see spiders marked with the light for bots with sessions, but the details in the columns below show the session as "No" and no osCsids appear. Today I noticed this one:

 

spider (i think)

 

which is showing as a session in the session column. I have the updated spiders.txt (20/5/06) but not the 30/6/05 one. Should I have this one, and what do I do now? Just in case some get through, do I need the SID Killer contribution installed too?

 

Thanks for your help.

 

Julie

Edited by warrenerjm

picsearch is already in the list. I'm not familiar with the "Who's Online Enhancement", but it does seem to be unreliable in identifying spiders from what I see from others. As long as there's no session, then everything is working correctly. You WANT spiders to be visiting your site - you just don't want them to get sessions.


Steve,

 

WOE uses the spider_flag that is set for bots when Prevent Spider Sessions is used. So it only works if that is set to true, and it works best if the shop has the latest spiders.txt contrib. The other problem is that it is a little finicky if sessions are not set up just right; e.g., catalog and admin sessions must be stored the same way.

 

ed


Hi Steve

 

Thanks for that. I have your latest one, so that is all OK. It just worried me this time, 'cos I saw it registered "Yes" under the session column.

 

Ed

I do have "Prevent spider sessions" set to true. I promise I am not constantly on WOE :lol: , so I caught this one by luck, and it was different from the way other spiders have been shown. Is there anything else I should be checking, i.e. you mentioned the way sessions are set up?

 

Thanks guys

 

Julie


This is a question I posted in another topic, after which choosealogin told me to report it here. Sorry for posting twice.

 

This is the URL of the post: http://www.oscommerce.com/forums/index.php?showtopic=214897

 

 

I went to download the latest spiders.txt from the contributions section (by stonebridgecomputing, 13 Jun 2006), but it says the file is password protected. Does it require a password? What should I do now? I'm really confused.

I tried to skip it, but it says all the other files are password-protected too: readme, spiders-large, spiders. So should I just stick with stevel's update, or is there another way?

 

 



I was not aware of this recent update. Like a lot of well-intentioned updates by others, it is incorrect. Ignore it and just use the most recent one I posted.

 

When I open it, however, I don't see any password protection.

Edited by stevel

Thanks for confirming.

 

This is one more question I asked there, but it fits here:

 

"OK, downloaded stevel's update.

 

But this spiders file has no googlebot or any reference to it. Have you (choosealogin) had any problems with it?

 

This is what the readme says:

 

The purpose of this file is to tell if an incoming request is from a spider, NOT to identify a particular spider. Therefore, there are some common substrings in the list such as "spider", "crawl", and "obot" which match many different spiders. For example, "ebot" matches Googlebot, "nbot" matches msnbot. The strings in this file MUST be all lowercase, or else they will be ignored. If you think a particular spider is missing from the list, please post in the support topic and include a line from your access log showing the spider access including the full user agent string. Please do not update this contribution unless you fully understand how it works."
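The substring matching the readme describes can be illustrated with a short Python sketch. It assumes the check lowercases the incoming user agent and then tests whether any line of spiders.txt occurs inside it, which would explain why entries must be lowercase; the actual osCommerce check is PHP and is not shown in this topic. The function names and the short list below are hypothetical, not the real file contents.

```python
def load_spider_strings(text):
    """One substring per line of a spiders.txt-style file; blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def matching_strings(user_agent, spider_strings):
    """Return every listed substring found in the lowercased user agent."""
    ua = user_agent.lower()
    return [s for s in spider_strings if s in ua]


def is_spider(user_agent, spider_strings):
    """True if at least one listed substring matches."""
    return bool(matching_strings(user_agent, spider_strings))


if __name__ == "__main__":
    # Illustrative entries only; "Charlotte" is deliberately mixed-case.
    strings = load_spider_strings("spider\ncrawl\nobot\nebot\nnbot\ntbot\nCharlotte\n")
    agents = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)",
        "Mozilla/5.0 (compatible; Charlotte/1.0b)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for agent in agents:
        print(agent, "->", matching_strings(agent, strings))
```

Under this assumption, Googlebot is caught by "ebot" without being listed by name, while the mixed-case "Charlotte" entry never matches because the user agent has already been lowercased.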


  • 2 weeks later...

Has anybody spotted mothra/netscan yet?

 

useragent:

Name: Guest

Customer ID: 0

IP Address: 68.91.253.93

User Agent: mothra/netscan (+http;//www.rexrobards.com

 

 

I put a ";" in the URL, as it's moving quickly around my site and may be a bad bot. There seems to be no information available online about it.


  • 2 weeks later...

Doesn't look like a spider - looks more like a DSL or cable modem user. More important than the IP is the user agent string - do you have that?

 

Unfortunately, there are many individuals who run their own indexing or "site scraper" programs which are not readily detectable as spiders.
