
Updated spiders.txt Official Support Topic


stevel


... do I just FTP spiders.txt to my site and replace the default one? Is that it?

If you're looking at the same contribution I am (http://addons.oscommerce.com/info/2455), it says "A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later." The readme file is worth looking at as well.

 

To answer your question directly, yes, that's all you have to do. Decide which file you want to use, rename it if you pick the large one, and replace the stock one.


Thanks! Very easy!


  • 3 weeks later...

Hi all, I am having problems with livebot still getting session IDs. Googlebot does not, but livebot and msnbot are starting to annoy me.

 

livebot-65-55-210-42.search.live.com 22:55:16 22:55:16 /cookie_usage.php Yes Not Found
Name: Guest
ID: 0
IP Address: 65.55.210.42
User Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)

 

I do have nbot in my spiders.txt, but it does not seem to work. Any clues would be appreciated.


I do not trust the display you are showing here. Post the entry from your web access log showing the GET of the page from msnbot.

 

 

GET /index.php cPath=42 80 - 65.55.210.37 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 5654 298

GET /shopping_cart.php osCsid=1red00mdgjglncmjijk8rg5ig0 80 - 65.55.210.35 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 4751 331

 

It keeps getting a session ID and being identified in Who's Online as a customer, not a bot.

I also have this one; I am not sure why it is spidering my site:

 

GET /index.php - 80 - 208.122.4.142 FreeWebMonitoring+SiteChecker/0.1+(+http://www.freewebmonitoring.com) 200 0 64 343 238


Are you sure that msnbot hasn't held onto an old osCsid and is using that to access your site? If the first access has the ID, that is likely.

 

That SiteChecker is probably not spidering your site - it is just looking to see if the site is up. You should not see any access other than index.php.


That is what I thought at first, but I have never seen msnbot without a session ID.

 

Is there a way I can force a 301 redirect if the page is hit by a bot listed in spiders.txt, which should over time remove any session IDs from the index?

Your help, by the way, is greatly appreciated!


There's a contrib called "spider session killer" or similar that does this.

 

If you'll give me your store URL I'll test it to see if spiders.txt is properly being used.


url is http:// shop . calibraweighing . co.uk

Edited by Jan Zonjee

My test shows that msnbot does not get assigned a session on new visits. You may need to get rid of the sessions that it has previously indexed with Spider Session Remover.


Thank you for that. The mod is an Apache rewrite, which IIS cannot do out of the box, so it is not much use to me.


Hmm. Well, you can do the equivalent in PHP by searching the user agent string for msnbot and if you find it and $session_started is true, use the "header" command to do a 301 redirect to the same URL minus the sid.


Hmm, what an idea. But would it be even better to spend some time turning spiders.txt into an array first and then do the check? That would speed up the processing somewhat, do you think?


It gets turned into an array when processed in application_top.php. But what you're implying is that the array will get searched for every connection. As it is now, it gets searched only if there is no sid in the URL (or cookie), and then the only effect is to not start a new session.
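
The lookup described here could be sketched roughly as follows. This is a simplified illustration, not the actual application_top.php code: the function names load_spiders and is_spider are hypothetical, and it assumes spiders.txt holds one substring per line that is matched anywhere in the lowercased user agent.

```php
<?php
// Hypothetical helper: read spiders.txt into an array, one entry per line.
function load_spiders(string $path): array {
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array(); // unreadable file: treat as an empty spider list
    }
    // Trim stray whitespace and drop anything that ends up empty.
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Hypothetical helper: true if any entry occurs anywhere in the user agent.
function is_spider(string $user_agent, array $spiders): bool {
    $user_agent = strtolower($user_agent);
    foreach ($spiders as $spider) {
        if (strpos($user_agent, strtolower($spider)) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch (assumed path): only start a session for non-spiders.
// $spiders = load_spiders('includes/spiders.txt');
// if (!is_spider($_SERVER['HTTP_USER_AGENT'] ?? '', $spiders)) { /* start session */ }
```

Because the match is a plain substring test, a short entry such as nbot is enough to catch a user agent like msnbot/1.1.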


Maybe something like this; my PHP is a bit rusty, so don't laugh!

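<?php
// eregi() was removed in PHP 7 and cannot take an array; stripos() does the
// same case-insensitive substring test, one bot string at a time.
if (stripos($_SERVER['REQUEST_URI'], 'oscsid') !== false) {
  $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
  $bots = array("msnbot", "nbot");
  foreach ($bots as $bot) {
    if (stripos($user_agent, $bot) !== false) {
      // Placeholder URL from the original post; the third argument sets the 301 status.
      header('Location: http://www.example.com/newurl.html', true, 301);
      exit();
    }
  }
}
?>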


Or maybe add an & as a separator between the entries in spiders.txt, then:

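<?php
// eregi() is gone in PHP 7 and never accepted an array, so use stripos() per entry.
if (stripos($_SERVER['REQUEST_URI'], 'oscsid') !== false) {
  // Assumes spiders.txt has been edited so that entries are separated by "&".
  $filename = "spiders.txt";
  $contents = file_get_contents($filename);
  // Split on "&", trim whitespace and newlines, and drop empty entries.
  $spiders_array = array_filter(array_map('trim', explode("&", (string) $contents)), 'strlen');
  $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

  foreach ($spiders_array as $spider) {
    if (stripos($user_agent, $spider) !== false) {
      // Placeholder URL from the original post; the third argument sets the 301 status.
      header('Location: http://www.example.com/newurl.html', true, 301);
      exit();
    }
  }
}
?>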

Edited by parksey

You like eregi, don't you? :D I would generally code the test for the SID as isset($_GET['osCsid']). I don't quite get what you're doing with the &. The "prevent spider sessions" code already creates an array, one element per record in spiders.txt.

 

I think a good compromise would be to test for the GET parameter, because for MOST users, that will be present for only one page and the cookie will take care of the rest. So if the osCsid GET parameter is present, do the spiders.txt search anyway, and if found (and the sid is in the URL), do the 301 redirect.
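
A minimal sketch of that compromise, assuming the session parameter is named osCsid as in the log lines earlier in the thread. The function names strip_session_id and spider_redirect_target are hypothetical, and the spider list is passed in as an array rather than read from the file here.

```php
<?php
// Hypothetical helper: return the request URI without the session parameter.
function strip_session_id(string $uri, string $param = 'osCsid'): string {
    $parts = parse_url($uri);
    if ($parts === false) {
        return $uri; // leave unparseable URIs untouched
    }
    $path = $parts['path'] ?? '/';
    parse_str($parts['query'] ?? '', $query);
    unset($query[$param]);
    $qs = http_build_query($query, '', '&');
    return $qs === '' ? $path : $path . '?' . $qs;
}

// Hypothetical helper: the URI to redirect to, or null when no redirect is needed.
function spider_redirect_target(array $get, string $user_agent, array $spiders, string $uri): ?string {
    if (!isset($get['osCsid'])) {
        return null; // cheap test first: most requests carry no sid in the URL
    }
    $user_agent = strtolower($user_agent);
    foreach ($spiders as $spider) {
        if ($spider !== '' && strpos($user_agent, strtolower($spider)) !== false) {
            return strip_session_id($uri);
        }
    }
    return null;
}

// Usage sketch (not executed here); $spiders would come from spiders.txt:
// $target = spider_redirect_target($_GET, $_SERVER['HTTP_USER_AGENT'] ?? '', $spiders, $_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: http://' . $_SERVER['HTTP_HOST'] . $target, true, 301);
//     exit;
// }
```

Keeping the decision in a function that only returns a target makes it easy to check without sending headers; the actual redirect stays a three-line call at the top of the page.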


  • 1 month later...

Hi, I downloaded the updated spiders.txt, but right now it seems a bot is crawling my site:

64.124.148.21 k01.fatlens.com
64.124.148.22 k02.fatlens.com
64.124.148.23 k03.fatlens.com
64.124.148.24 k04.fatlens.com
64.124.148.26 k06.fatlens.com
64.124.148.27 k07.fatlens.com
64.124.148.28 k08.fatlens.com
64.124.148.65 k10.fatlens.com
64.124.148.66 k11.fatlens.com
64.124.148.67 k12.fatlens.com

 

And it's creating sessions. I searched Google for fatlens and it seems to be a bot from thefind.com (I added my site there some days ago).

I tried to find something similar to "fatlens" in spiders.txt but didn't find anything. What should I do?

 

thanks


Show me a line from your access log for one of these. The IP doesn't help.

 

A Google search suggests that the user agent includes the string "Fatbot" which the spiders.txt string "tbot" should pick up.


Thanks for the fast reply, Steve. Sorry, I'm a newbie at this and not sure where to look for my access log. I installed "Visitor Web Stats" and "Who's online enhancement" and that's what I'm using, but I think you mean something else.

In Visitor Web Stats it just shows as:

64.124.148.67
k12.fatlens.com 05/04/2008 00:00:36 1>>> 00:00:00 Guest en-us,en;q=0.5
english Direct

For comparison, another line for the Googlebot shows:

66.249.72.137
crawl-66-249-72-137.googlebot.com 05/03/2008 07:23:11 36>>> 16:36:46 Guest [Mozilla]
english Direct

And in Who's Online, it's just k01.fatlens.com.

Where should I look for the log? Does my host have to give me access?

Thanks, and sorry for my English :)
