
osCommerce

The e-commerce.

Updated spiders.txt Official Support Topic


stevel


Hi,

I have installed osCommerce 2.2 along with the SID Killer and Ultimate SEO URL contributions, but nothing seems to happen.

I want to remove the SID and change the URLs into SEO-friendly ones.

Can anybody suggest how to solve this problem?

Thanks,

Bharathi


  • 3 weeks later...
A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later.

 

Comments, questions and suggestions welcomed here.

 

http://www.oscommerce.com/community/contributions,2455

 

I am getting the following errors after installing the Who's Online Enhancement.

In my browser:

1054 - Unknown column 'hostname' in 'field list'

insert into whos_online (customer_id, full_name, session_id, ip_address, hostname, time_entry, time_last_click, last_page_url, http_referer, user_agent) values ('0', 'Guest', '068c9e28d9d03f07cfffc77be2792723', '127.0.0.1', 'localhost', '1237867301', '1237867301', '/osctest1/', '', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322)')

[TEP STOP]

In my admin panel:

1054 - Unknown column 'hostname' in 'field list'

select customer_id, full_name, ip_address, hostname, time_entry, time_last_click, last_page_url, http_referer, user_agent, session_id from whos_online order by time_last_click DESC

[TEP STOP]



Who's Online Enhancement has its own support thread, which you will find at http://www.oscommerce.com/forums/index.php?showtopic=124853.

 

The above link appears in the description of most full uploads to Who's Online Enhancement.

 

--Glen


  • 2 weeks later...
Steve,

 

First and foremost, you da Man!!! :thumbsup:

 

I wanted to make you aware that the current spiders list will send a browser with the Teoma Toolbar to the cookies_usage page.

 

Based on my research, Teoma's robot reports itself as:

 

Mozilla/2.0 (compatible; Ask Jeeves/Teoma)

 

I would recommend just adding a / in front of the current entry for teoma. I added this to my site and successfully tested the Teoma Toolbar.
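To see why the leading slash matters, here is a minimal Python sketch of the case-insensitive substring test that each spiders.txt entry is put through. The robot string is the one quoted above; the toolbar string is a hypothetical example (not taken from a real log) that mentions Teoma without the "/Teoma" sequence.

```python
# User agent reported by Teoma's robot (quoted above).
ROBOT_UA = "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"
# Hypothetical toolbar user agent, for illustration only.
TOOLBAR_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Teoma Toolbar)"


def matches(fragment: str, user_agent: str) -> bool:
    """Case-insensitive substring test, as applied to each spiders.txt line."""
    return fragment.lower() in user_agent.lower()


for fragment in ("teoma", "/teoma"):
    print(f"{fragment!r}: robot={matches(fragment, ROBOT_UA)}, "
          f"toolbar={matches(fragment, TOOLBAR_UA)}")
```

With the plain "teoma" entry both strings are treated as spiders; with "/teoma" only the robot is.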

I think I'm seeing Teoma getting sessions and adding stuff to carts now (based on the IPs I see on Who's Online). Has anyone else seen this, or am I wrong? Should the / be removed? Can it be listed both with and without the /?


If you leave off the slash, then users with the toolbar won't see your store. What's a sample entry from the access log?

66.235.127.136 - - [01/Apr/2009:06:00:56 -0400] "GET /catalog/index.php?cPath=326&sort=products_sort_order&action=buy_now&products_id=735 HTTP/1.1" 302 680 "-" "RedCarpet/1.4 (http://www.pronto.com/robots.html)"

 

Maybe it wasn't Teoma after all...

 

This is what made me think it was:

NetRange: 66.235.112.0 - 66.235.127.255
CIDR: 66.235.112.0/20
OriginAS: AS16798
NetName: ASK-DOT-COM-NETWORK
NetHandle: NET-66-235-112-0-1
Parent: NET-66-0-0-0-0
NetType: Direct Assignment
NameServer: NAME1.ASK.COM
NameServer: NAME2.ASK.COM
NameServer: NAME5.ASK.COM
NameServer: NAME6.ASK.COM
Comment: http://www.ask.com

 

Need to add RedCarpet?

Edited by baddog

No, because the user agent string includes "obot" (in "robots.html"). Note that there is no osCsid in the URL. Unless this is an unusual robot that accepts cookies, it cannot add things to the cart. And because the UA contains "obot", it should not be getting a session at all if you have properly enabled Prevent Spider Sessions.
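Both points can be checked mechanically. A minimal Python sketch, assuming the standard Apache combined log format, that pulls the request path and user agent out of the access-log line quoted above, reports where "obot" matches, and whether an osCsid was passed in the URL. The function name `inspect_log_line` is just for illustration.

```python
import re

# Access-log line quoted earlier in this thread.
LOG_LINE = ('66.235.127.136 - - [01/Apr/2009:06:00:56 -0400] '
            '"GET /catalog/index.php?cPath=326&sort=products_sort_order'
            '&action=buy_now&products_id=735 HTTP/1.1" 302 680 "-" '
            '"RedCarpet/1.4 (http://www.pronto.com/robots.html)"')

# Combined log format tail: "METHOD PATH PROTOCOL" STATUS BYTES "REFERER" "USER-AGENT"
COMBINED_TAIL = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$')


def inspect_log_line(line: str, fragment: str = "obot") -> dict:
    """Report where `fragment` hits the user agent and whether the URL carried osCsid."""
    match = COMBINED_TAIL.search(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    user_agent = match.group("ua")
    return {
        "user_agent": user_agent,
        "status": int(match.group("status")),
        # str.find returns -1 when the fragment does not occur.
        "fragment_at": user_agent.lower().find(fragment),
        "has_oscsid": "oscsid=" in match.group("path").lower(),
    }


print(inspect_log_line(LOG_LINE))
```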



Well, I have Prevent Spider Sessions set to True and I can tell you that this thing puts stuff in a cart.


I don't know what to tell you, then, other than to suggest debugging this with the Firefox add-on User Agent Switcher: "pretend" to be this robot and see how your store behaves.
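If you would rather script the test than use the add-on, here is a minimal Python sketch under stated assumptions: STORE_URL is a placeholder for your own catalog page, the user agent string is the RedCarpet one from the access log above, and Python's default urllib opener sends no cookies, so the request behaves like a cookieless robot. It only reports whether any link on the returned page still carries osCsid.

```python
import re
import urllib.request

# Placeholder: point this at your own catalog front page.
STORE_URL = "http://www.example.com/catalog/index.php"
# RedCarpet's user agent, copied from the access-log line quoted above.
REDCARPET_UA = "RedCarpet/1.4 (http://www.pronto.com/robots.html)"

# osCsid appended to a link, whether the separator is "?", "&" or the "&amp;" HTML entity.
SID_IN_LINK = re.compile(r'href="[^"]*[?&;]osCsid=', re.IGNORECASE)


def links_carry_session_id(html: str) -> bool:
    """True if any href on the page still contains an osCsid parameter."""
    return SID_IN_LINK.search(html) is not None


def fetch_as(url: str, user_agent: str) -> str:
    """Fetch a page presenting `user_agent`; the default opener sends no cookies."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "iso-8859-1"
        return response.read().decode(charset, errors="replace")


# Usage (performs a real HTTP request, so it is left commented out):
# page = fetch_as(STORE_URL, REDCARPET_UA)
# print("osCsid in links:", links_carry_session_id(page))
```

If Prevent Spider Sessions is working, the page fetched with this user agent should contain no osCsid in its links.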



I've never used that tool before. I'll play around with it. Can you suggest what I should enter in the boxes to set up RedCarpet as a new User Agent?

 

Description:

User Agent:

App Name:

App Version:

Platform:

Vendor:

Vendor Sub:

 

Thanks.


  • 2 weeks later...
Is there any particular reason why Googlebot and its variations are missing from this file?

I think you will find that they are covered. Do they get a session?


The latest version of spiders.txt, which I downloaded from http://addons.oscommerce.com/info/2455, clearly doesn't include the user agents specified by Google at http://www.google.com/support/webmasters/b...mp;answer=40364. However, when I spoof my user agent as Googlebot with cookies disabled, session IDs are indeed removed from links.

 

Strange.



Not strange at all. In order to match the largest number of bots with the fewest and shortest comparisons, the spiders.txt file contains string fragments that are compared against the User Agent strings.

 

Googlebot is matched by the string "ebot" in the file.
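A minimal Python sketch of the fragment matching described above, assuming, as Glen describes, that each non-blank line of spiders.txt is compared as a substring of the lowercased user agent. The four-line excerpt uses only fragments mentioned in this thread (the real file is much longer), and the Googlebot string is the one Google publishes for its crawler.

```python
# Excerpt limited to fragments mentioned in this thread.
SPIDERS_EXCERPT = """ebot
obot
nbot
/teoma
"""

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Browser string taken from the whos_online error earlier in this thread.
BROWSER_UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 "
              "(compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322)")


def load_fragments(text: str) -> list[str]:
    """One lowercase fragment per non-blank line, surrounding whitespace removed."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent: str, fragments: list[str]) -> bool:
    """A user agent is a spider if any fragment occurs anywhere inside it."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in fragments)


fragments = load_fragments(SPIDERS_EXCERPT)
print(is_spider(GOOGLEBOT_UA, fragments))  # True: "googlebot" contains "ebot"
print(is_spider(BROWSER_UA, fragments))    # False
```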

 

The file is optimized to put the most common strings at the top, to reduce overhead.
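Why the ordering helps: the scan can stop at the first fragment that matches, so a common spider whose fragment sits near the top costs only a few comparisons. A hedged Python sketch counting substring tests for the Googlebot string against two orderings of a small illustrative fragment list; note that an ordinary browser never matches, so it always pays for the full list.

```python
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"  # hypothetical shopper


def comparisons_until_match(user_agent: str, fragments: list[str]) -> int:
    """Substring tests performed before the first hit, or all of them on a miss."""
    ua = user_agent.lower()
    for count, fragment in enumerate(fragments, start=1):
        if fragment in ua:
            return count  # stop scanning at the first match
    return len(fragments)


common_first = ["ebot", "obot", "nbot", "/teoma"]
common_last = ["/teoma", "nbot", "obot", "ebot"]

print(comparisons_until_match(GOOGLEBOT_UA, common_first))  # 1
print(comparisons_until_match(GOOGLEBOT_UA, common_last))   # 4
print(comparisons_until_match(BROWSER_UA, common_first))    # 4 (a browser never matches)
```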

 

--Glen

Edited by SteveDallas

  • 1 month later...

Thanks for the update. I hope this works. Googlebot is causing havoc on my download site: every time I get visitors, it seems to tag along and mess up their downloads.


  • 2 months later...
  • 2 weeks later...
  • 2 weeks later...

Is this the latest spiders.txt file (28/12/2008)?

I cannot find any other variants.

Bling

The following are picking up session IDs and are not being reported as bots in Visitors Tracking or Supertracker:

65.55.109.244
65.55.110.23
64.233.173.2
msnbot-65-55-110-23.search.msn.com

and AOL?

195.93.21.68
cache-los-XXXX.proxy.aol.com


Andreas, I cannot find any reference to a spider named "gonzo". What is the user agent string?

 

nudylady, typically you want to store sessions in MySQL. Using spiders.txt will not noticeably slow down your pages, and it will help by preventing search engine spiders from getting session IDs and, in most cases, from adding items to carts. It also prevents session IDs from showing up in search results.

 

Chris, yes, that is the latest I have updated. I have not seen any new spiders come along in a while. IPs aren't useful to me - I need to see user agent strings from the access log. msnbot is definitely picked up by spiders.txt. The AOL IPs are from AOL users, not spiders.


Steve,

 

Thanks for your reply. msnbot was not being picked up. I understand that if a user agent contains any of the strings in spiders.txt, it is regarded as a spider. The following extract of your spiders.txt contains 'nbot', which in theory should pick up msnbot; however, I have had to add 'msnbot' explicitly for it to be recognised in Supertracker and Visitors Tracking (it may be those mods that are not using spiders.txt correctly).

 

lbot
mbot
nbot
pbot
rbot
sbot
tbot
vbot
ybot
zbot
bot.
bot/
_bot
.bot
/bot
-bot
:bot
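One way a contribution could fail to recognise msnbot even though "nbot" is in the file, sketched in Python as a hypothesis rather than a diagnosis of Supertracker or Visitors Tracking: if lines read from the file keep their trailing newline, then "nbot\n" is never a substring of a user agent. The msnbot user agent string below is an assumption; it is not quoted in this thread.

```python
import io
from typing import List, Optional

# The fragments quoted in the post above.
FRAGMENTS = ["lbot", "mbot", "nbot", "pbot", "rbot", "sbot", "tbot", "vbot", "ybot",
             "zbot", "bot.", "bot/", "_bot", ".bot", "/bot", "-bot", ":bot"]
# Assumed msnbot user agent string (not quoted in this thread).
MSNBOT_UA = "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"


def first_match(user_agent: str, raw_lines: List[str], strip: bool = True) -> Optional[str]:
    """Return the first fragment found in the user agent, or None."""
    ua = user_agent.lower()
    for line in raw_lines:
        # Without strip(), each fragment still ends in "\n".
        fragment = line.strip().lower() if strip else line
        if fragment and fragment in ua:
            return fragment
    return None


# Simulate reading the file: readlines() keeps each trailing "\n".
raw_lines = io.StringIO("\n".join(FRAGMENTS) + "\n").readlines()

print(first_match(MSNBOT_UA, raw_lines))               # nbot
print(first_match(MSNBOT_UA, raw_lines, strip=False))  # None: "nbot\n" never occurs
```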
