stevel

Updated spiders.txt Official Support Topic

Hi,

 

I have installed osCommerce 2.2 along with the SID Killer and Ultimate SEO URLs contributions, but nothing has changed.

I want to remove the SID from the URLs and make them SEO friendly.

Can anybody suggest how to solve the problem?

 

Thanks

Bharathi

A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later.

 

Comments, questions and suggestions welcomed here.

 

http://www.oscommerce.com/community/contributions,2455

 

I am getting the following errors after installing the Who's Online Enhancement contribution.

 

In my browser:

 

1054 - Unknown column 'hostname' in 'field list'

 

insert into whos_online (customer_id, full_name, session_id, ip_address, hostname, time_entry, time_last_click, last_page_url, http_referer, user_agent) values ('0', 'Guest', '068c9e28d9d03f07cfffc77be2792723', '127.0.0.1', 'localhost', '1237867301', '1237867301', '/osctest1/', '', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322)')

 

[TEP STOP]

 

 

In my admin panel:

1054 - Unknown column 'hostname' in 'field list'

 

select customer_id, full_name, ip_address, hostname, time_entry, time_last_click, last_page_url, http_referer, user_agent, session_id from whos_online order by time_last_click DESC

 

[TEP STOP]

Steve,

 

First and foremost, you da Man!!! :thumbsup:

 

I wanted to make you aware that the current spiders list will send a browser with the Teoma Toolbar to the cookies_usage page.

 

Based on my research, Teoma's robot reports itself as:

 

Mozilla/2.0 (compatible; Ask Jeeves/Teoma)

 

I would recommend just adding a / in front of the current entry for teoma. I added this to my site and successfully tested the Teoma Toolbar.

I think I'm seeing Teoma getting sessions and adding stuff to carts now (based on the IPs I see on Who's Online). Has anyone else seen this, or am I wrong? Should the / be removed? Can it be listed both with and without the /?

If you leave off the slash, then users with the toolbar won't see your store. What's a sample entry from the access log?

66.235.127.136 - - [01/Apr/2009:06:00:56 -0400] "GET /catalog/index.php?cPath=326&sort=products_sort_order&action=buy_now&products_id=735 HTTP/1.1" 302 680 "-" "RedCarpet/1.4 (http://www.pronto.com/robots.html)"

 

Maybe it wasn't Teoma after all..........

 

This is what made me think it was:

NetRange: 66.235.112.0 - 66.235.127.255

CIDR: 66.235.112.0/20

OriginAS: AS16798

NetName: ASK-DOT-COM-NETWORK

NetHandle: NET-66-235-112-0-1

Parent: NET-66-0-0-0-0

NetType: Direct Assignment

NameServer: NAME1.ASK.COM

NameServer: NAME2.ASK.COM

NameServer: NAME5.ASK.COM

NameServer: NAME6.ASK.COM

Comment: http://www.ask.com

 

Need to add RedCarpet?


No, because the user agent string already includes "obot" (in "robots.html"), which spiders.txt matches. Note that there is no osCsid in the URL. Unless this is an unusual robot that accepts cookies, it cannot add things to the cart. And because the UA has "obot", it should not be getting a session at all if you have properly enabled Prevent Spider Sessions.


Well, I have Prevent Spider Sessions set to True and I can tell you that this thing puts stuff in a cart.


I don't know what to tell you, then, other than suggesting debugging this with the Firefox add-in User Agent Switcher to "pretend" that you are this robot and see how your store behaves.
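If a browser add-in is not convenient, the same "pretend to be the robot" check can be scripted. What follows is only a rough Python sketch, not part of the contribution: the function names are made up for illustration, the store URL in the last comment is a placeholder, and it assumes that a session shows up as "osCsid=" in the links of the returned page. The standard library client used here keeps no cookies, so it behaves like a typical cookie-less spider.

```python
import urllib.request

# User agent string taken from the access log entry posted in this thread.
REDCARPET_UA = "RedCarpet/1.4 (http://www.pronto.com/robots.html)"


def build_request(url, user_agent):
    """Build a request that announces itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def has_session_id(html):
    """True if the page source contains an osCsid parameter in any link."""
    return "osCsid=" in html


def spider_gets_session(url, user_agent=REDCARPET_UA):
    """Fetch the page as the given user agent and look for session IDs."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
    return has_session_id(body)


# Example call (placeholder URL, replace with your own store):
# print(spider_gets_session("http://localhost/catalog/index.php"))
```

If the call returns True for a user agent that spiders.txt should match, the spider check is probably not being reached, which points at the store setup rather than at the list.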


I've never used that tool before. I'll play around with it. Can you suggest what I should enter in the boxes to set up RedCarpet as a new User Agent?

 

Description:

User Agent:

App Name:

App Version:

Platform:

Vendor:

Vendor Sub:

 

Thanks.

Is there any particular reason why Googlebot and its variations are missing from this file?

I think you will find that they are covered. Do they get a session?



The latest version of spiders.txt that I downloaded from http://addons.oscommerce.com/info/2455 clearly doesn't include the user agents specified by Google at http://www.google.com/support/webmasters/b...mp;answer=40364. However, when I spoof my user agent as Googlebot with cookies disabled, session IDs are indeed removed from links.

 

Strange.


 

Not strange at all. In order to match the largest number of bots with the fewest and shortest comparisons, the spiders.txt file contains string fragments that are compared against the User Agent strings.

 

Googlebot is matched by the string "ebot" in the file.

 

The file is optimized to put the most common strings at the top, to reduce overhead.
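The fragment matching described above can be sketched in a few lines. This is a minimal Python illustration, not the store's actual PHP code: it assumes the user agent is lower-cased and each non-empty line of spiders.txt is tested as a plain substring, stopping at the first hit. The function name and the three-entry fragment list are made up for the example.

```python
def is_spider(user_agent, fragments):
    """Return the first fragment found in the user agent, or None."""
    ua = user_agent.lower()
    for fragment in fragments:
        fragment = fragment.strip()
        # Skip blank lines; stop at the first fragment that matches.
        if fragment and fragment in ua:
            return fragment
    return None


# Illustrative fragments only; the real spiders.txt is much longer.
FRAGMENTS = ["ebot", "obot", "/teoma"]

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

print(is_spider(googlebot, FRAGMENTS))  # ebot
print(is_spider(browser, FRAGMENTS))    # None
```

Because the loop stops at the first hit, putting the most frequently matched fragments first keeps the number of comparisons per page request low.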

 

--Glen


Thanks for the update. I hope this works. Googlebot is causing havoc on my download site. Every time I get visitors it seems to be tagging along and messing up their downloads.


I've found an unlisted spider: gonzo*

 

It's from a German search site, "suchen.de". Please add this to your updates. Thank you for the great contrib!


I uploaded the new spiders.txt. Will this make my site a bit slow?

In catalog/includes/configure.php, I have define('STORE_SESSIONS', 'mysql');

Do I have to change mysql to session?


Is this the latest spiders.txt file (28/12/2008)?

 

I cannot find any other variants.

 

Bling

 

65.55.109.244,

65.55.110.23

64.233.173.2

msnbot-65-55-110-23.search.msn.com

 

and

 

AOL?

195.93.21.68

cache-los-XXXX.proxy.aol.com

 

are picking up session IDs and are not being reported as bots in Visitors Tracking or Supertracker.


Andreas, I cannot find any reference to a spider named "gonzo". What is the user agent string?

 

nudylady, typically you want to store sessions in MySQL. Using spiders.txt will not noticeably slow down your pages, and will help by preventing search engine spiders from getting session IDs and, in most cases, adding items to carts. It also prevents session IDs from showing up in search results.

 

Chris, yes, that is the latest I have updated. I have not seen any new spiders come along in a while. IPs aren't useful to me - I need to see user agent strings from the access log. msnbot is definitely picked up by spiders.txt. The AOL IPs are from AOL users, not spiders.
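For anyone who wants to pull user agent strings out of an access log, a short script will do it. The following is a minimal Python sketch, assuming the log is in Apache Combined Log Format, where the user agent is the last quoted field on each line. The function name is made up for the example, and the sample line is the RedCarpet entry posted earlier in this thread.

```python
import re
from collections import Counter

# Combined Log Format ends with: "referer" "user agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def user_agents(lines):
    """Count how often each user agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    '66.235.127.136 - - [01/Apr/2009:06:00:56 -0400] '
    '"GET /catalog/index.php?cPath=326&sort=products_sort_order'
    '&action=buy_now&products_id=735 HTTP/1.1" 302 680 "-" '
    '"RedCarpet/1.4 (http://www.pronto.com/robots.html)"',
]

# Most frequent user agents first.
for agent, hits in user_agents(sample).most_common():
    print(hits, agent)
```

To run it on a real log, pass an open file object instead of the sample list. Lines that do not fit the assumed format are skipped.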


 

Hi,

 

It's gonzo*, e.g. "GET /robots.txt HTTP/1.1" 200 391 "-" "gonzo2[P] +http://www.suchen.de/faq.html"

 

cheers


Steve,

 

Thanks for your reply. msnbot was not being picked up. I understand that if a user agent contains any of the strings in spiders.txt, it is regarded as a spider. The extract of your spiders.txt below contains 'nbot', which in theory should pick up msnbot. However, I have had to add 'msnbot' specifically for it to be recognised in Supertracker and Visitors Tracking (it may be that those mods are not using spiders.txt correctly).

 

lbot

mbot

nbot

pbot

rbot

sbot

tbot

vbot

ybot

zbot

bot.

bot/

_bot

.bot

/bot

-bot

:bot
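One way to check the theory is to test the extract above against an msnbot user agent outside the store. This is a rough Python sketch under two assumptions: that matching is a lower-cased substring test, and that msnbot sends a user agent like the one below (a typical string quoted from memory, not taken from this thread's logs). The function name is made up for the example.

```python
# The fragments from the extract above, in the same order.
EXTRACT = [
    "lbot", "mbot", "nbot", "pbot", "rbot", "sbot", "tbot", "vbot",
    "ybot", "zbot", "bot.", "bot/", "_bot", ".bot", "/bot", "-bot", ":bot",
]


def matching_fragments(user_agent, fragments):
    """List every fragment that occurs in the lower-cased user agent."""
    ua = user_agent.lower()
    return [fragment for fragment in fragments if fragment in ua]


# Assumed msnbot user agent; check your own access log for the real one.
msnbot = "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"

print(matching_fragments(msnbot, EXTRACT))  # ['nbot', 'bot.', 'bot/']
```

If several fragments match here but the tracking mods still miss msnbot, that would suggest the mods read or compare spiders.txt differently from the store itself.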
