stevel

Updated spiders.txt Official Support Topic


A client just noticed a particular IP address adding things to his cart but never checking out. Web Server log entries:

 

209.249.86.4 - - [31/May/2006:17:29:43 +0000] "GET /african-conga-drums-c-44.html?products_id=187&action=buy_now&sort=product_sort_order&osCsid=d729d9bc091bdd7b6472e4744dc02d17 HTTP/1.0" 302 0 "-" "Mozilla/5.0 (compatible; Charlotte/1.0b; charlotte@betaspider.com)"

 

I added 'charlotte' to my spiders.txt file (the newest from May) and am monitoring ... is this the right thing to do so they quit adding to the cart?

 

Info on Charlotte: http://www.betaspider.com/

 

Thanks anyone!


Since that UA includes the word "spider", it would be detected already. The question is - how did it get the session to begin with? Did it have it from a previous visit when Prevent Spider Sessions was not on? It isn't accepting a cookie or else the osCsid wouldn't be in the URL (well, it depends on how it got to that URL...) A more detailed analysis of the access logs would be helpful in seeing what other URLs it visited before this one.
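A minimal sketch of that kind of log analysis, assuming the server writes the Apache "combined" log format as in the entry quoted above. The function name `requests_from` and the regular expression are illustrative, not part of osCommerce. It lists every hit from one IP in log order, with the session ID (if any) and the referer, which makes it easier to see where the first osCsid came from.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def requests_from(lines, ip):
    """Return (time, path, osCsid or None, referer) for each hit from `ip`, in log order."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or m.group("ip") != ip:
            continue  # skip other visitors and lines that do not parse
        parts = urlsplit(m.group("url"))
        sid = parse_qs(parts.query).get("osCsid", [None])[0]
        hits.append((m.group("time"), parts.path, sid, m.group("referer")))
    return hits
```

Feeding it the lines of the access log together with "209.249.86.4" gives the sequence of pages that IP requested. A first hit that already carries an osCsid and has no referer would suggest the session ID came from a link outside the store.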


 

Steve, I'm a little confused ... nothing new.

 

This store has been online for almost two years and "Prevent Spider Sessions" has been set to True that whole time. I can give you today's log if it would help, but it looks like every hit from this IP (209.249.86.4) has an osCsid on it. It also seems to get a different osCsid every so often. What am I doing wrong?

 

Thanks --


Well, it's getting the SID from somewhere and perhaps not your site. For example, if someone posted a URL from your site including a session ID into a user forum, you could get this problem.

 

Look into the contribution Spider Session Remover and tweak it to look for "Charlotte". This may help.

 

It would also help to turn your Buy Now links into forms. There is another contrib that does that.
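For illustration, here is a rough sketch of the general idea behind removing a session ID from a spider's request: if the user agent matches a listed string and the URL carries an osCsid, compute the same URL without it so the request can be redirected there. This is not the contribution's actual code. The function names and the `needles` default are hypothetical, and Python is used only to show the logic.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_id(url, param="osCsid"):
    """Return `url` without the session parameter, or None if it was not present."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != param]
    if len(kept) == len(pairs):
        return None  # no session ID in the URL, nothing to do
    return urlunsplit(parts._replace(query=urlencode(kept)))

def redirect_target(url, user_agent, needles=("charlotte",)):
    """Return the clean URL to redirect to if the agent matches a needle, else None."""
    ua = user_agent.lower()
    if any(n in ua for n in needles):
        return strip_session_id(url)
    return None
```

A real implementation would send a permanent redirect to the returned URL before any session is started. Ordinary browsers get None from `redirect_target` and are left alone.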


Thanks Steve, I will read the Spider Session Remover and apply. It's odd that there are multiple session IDs and Googlebot and others don't have them.

 

Here's another -- wonder where they're starting? I'll tweak for this one too I guess. "tbot" should have stopped it, right? Always something new...

 

64.124.140.152 - - [31/May/2006:12:27:47 +0000] "GET /deep-carved-djembes-c-21.html?osCsid=6053444377c5e798a5a9c22fc43912ea&products_id=46&sort=product_sort_order HTTP/1.1" 200 37518 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"

Someone mentioned Fatbot earlier. Note that if the incoming link already has a session ID, then the check of spiders.txt is skipped.

 

What makes you think that?


Treasurer MFC


Hi steve

 

Normally in the Who's Online Enhancement I see spiders marked with the bot light, and the columns below show session "No" with no osCsid. Today I noticed this one:

 

spider (i think)

 

which is showing as a session in the session column. I have the spiders.txt update from 20/5/06 but not the 30/6/05 one. Should I have this one, and what do I do now? Just in case some get through, do I need the SID Killer contribution installed too?

 

Thanks for your help.

 

Julie

Edited by warrenerjm


picsearch is already in the list. I'm not familiar with the "Who's Online Enhancement", but it does seem to be unreliable in identifying spiders from what I see from others. As long as there's no session, then everything is working correctly. You WANT spiders to be visiting your site - you just don't want them to get sessions.


Steve,

 

WOE uses the spider_flag that is set for bots when Prevent Spider Sessions is used. So it only works if that is set to true, and it works best if the shop has the latest and greatest spiders.txt contrib. The other problem it has is if sessions are not set up just right. It is a little finicky then; for example, catalog and admin sessions must be stored the same way.

 

ed



Hi Steve

 

Thanks for that. I have your latest one, so that is all OK. It just worried me this time, 'cos I saw it registered "Yes" under the session column.

 

Ed

I do have "Prevent spider sessions" set to true. I promise I am not constantly on WOE :lol: , so this one I caught by luck & it was different to the way other spiders have been shown. Is there anything else I should be checking, i.e. you mentioned the way sessions are set up?

 

Thanks guys

 

Julie


This is a question I posted in another topic, after which choosealogin told me to report it here. Sorry for posting twice.

 

this is the url of the post : http://forums.oscommerce.com/index.php?showtopic=214897

 

 

I went to download the latest spiders.txt from the contributions section (by stonebridgecomputing, 13 Jun 2006), but it says the file is password-protected. Does it require a password? What should I do now? I'm really confused.

I tried to skip, but it says all the other files are password-protected too: readme, spiders-large, spiders. So should I just stick with stevel's update, or is there another way?

 

 


I was not aware of this recent update. Like a lot of well-intentioned updates by others, it is incorrect. Ignore it and just use the most recent one I posted.

 

When I open it, however, I don't see any password protection.

Edited by stevel


thanks for confirming.

 

This was one more question I asked there, but it fits here:

 

"ok, downloaded stevels update.

 

But this spiders file has no googlebot entry or any reference to it. Have you (choosealogin) had any problems with it?

 

this is what the readme says

 

The purpose of this file is to tell if an incoming request is from a spider, NOT to identify a particular spider. Therefore, there are some common substrings in the list such as "spider", "crawl", and "obot" which match many different spiders. For example, "ebot" matches Googlebot, "nbot" matches msnbot. The strings in this file MUST be all lowercase, or else they will be ignored. If you think a particular spider is missing from the list, please post in the support topic and include a line from your access log showing the spider access including the full user agent string. Please do not update this contribution unless you fully understand how it works."
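The matching the readme describes can be sketched roughly as follows. This is an illustration in Python, not the store's own code, and the names `load_needles` and `is_spider` are made up for the example. It assumes the user agent is lowercased before each entry is searched for as a plain substring, which is why one short entry can cover many spiders and why an entry containing capitals would never match.

```python
def load_needles(text):
    """One substring per line; blank lines are skipped. Entries are used as written."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def is_spider(user_agent, needles):
    """True if any entry occurs anywhere in the lowercased user agent string."""
    ua = user_agent.lower()
    return any(n in ua for n in needles)

# A tiny stand-in list using substrings mentioned in this topic
needles = load_needles("spider\ncrawl\nobot\nebot\nnbot\ntbot\n")

# "ebot" is a substring of "googlebot", so no explicit Googlebot entry is needed
print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", needles))

# An entry with capitals can never match, because the agent is lowercased first
print(is_spider("Googlebot/2.1", ["Googlebot"]))
```

With this logic the Charlotte agent quoted earlier matches on "spider" (from its betaspider.com address) and FatBot matches on "tbot", so neither needs its own line in the file.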


has anybody spotted mothra/netscan yet?

 

useragent:

Name: Guest

Customer ID: 0

IP Address: 68.91.253.93

User Agent: mothra/netscan (+http;//www.rexrobards.com

 

 

I put a ";" in the URL since it's moving quickly around my site and may be a bad bot. There seems to be no information available online about it.


Hi guys,

I am getting a lot of sessions being created by this IP address - 66.249.66.101

any idea if this is a search engine?

 

Kuanl


I got Hits from :

United Kingdom 212.95.252.16

dex-252-16.dxi.net

 

Is this a spider, and if so, how do I exclude it so that it doesn't get a session ID?

 

Thanks in advance

Andreas


Doesn't look like a spider - looks more like a DSL or cable modem user. More important than the IP is the user agent string - do you have that?

 

Unfortunately, there are many individuals who run their own indexing or "site scraper" programs which are not readily detectable as spiders.


No, not at hand. User string was only winnt ..... nothing special.

I guess you're right that this is a user. I should be happy :-)

 

Thanks,

Andreas
