adam777 Posted February 11, 2008
... do I just FTP spiders.txt to my site and replace the default one? Is that it?

Guest Posted February 11, 2008
If you're looking at the same contribution I am (http://addons.oscommerce.com/info/2455), it says "A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later." The readme file is worth looking at as well.

To answer your question directly, yes, that's all you have to do. Decide which file you want to use, rename it if you pick the large one, and replace the stock one.

adam777 Posted February 11, 2008
Thanks! Very easy!

parksey Posted March 2, 2008
Hi all, I am having problems with Livebot still getting session IDs. Googlebot does not, but Livebot and msnbot are starting to annoy me.

livebot-65-55-210-42.search.live.com 22:55:16 22:55:16 /cookie_usage.php Yes Not Found
Name: Guest
ID: 0
IP Address: 65.55.210.42
User Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)

I do have nbot in my spiders.txt, but it does not seem to work. Any clues would be appreciated.

stevel Posted March 3, 2008 (Author)
I do not trust the display you are showing here. Post the entry from your web access log showing the GET of the page from msnbot.

Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description

parksey Posted March 3, 2008

GET /index.php cPath=42 80 - 65.55.210.37 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 5654 298
GET /shopping_cart.php osCsid=1red00mdgjglncmjijk8rg5ig0 80 - 65.55.210.35 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 4751 331

It keeps getting a session ID and being identified in Who's Online as a customer, not a bot.

I also have this one; I'm not sure why it is spidering my site:

GET /index.php - 80 - 208.122.4.142 FreeWebMonitoring+SiteChecker/0.1+(+http://www.freewebmonitoring.com) 200 0 64 343 238

stevel Posted March 3, 2008 (Author)
Are you sure that msnbot hasn't held onto an old osCsid and is using that to access your site? If the first access has the ID, that is likely.

That SiteChecker is probably not spidering your site; it is just checking whether the site is up. You should not see any access other than index.php.

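One way to answer that question from the log itself is sketched below: scan request lines in the layout of the excerpt posted above and list spider requests whose query string already carries an osCsid. The field positions (method, URI stem, query, port, username, client IP, user agent, then status fields), the treatment of "+" as a space in the user agent (as in that IIS excerpt), and the helper name spider_hits_with_sid() are all assumptions for illustration, not part of osCommerce.

```php
<?php
// Sketch: list spider requests that arrive with an osCsid already in the query string.
// Assumed field order, taken from the IIS log excerpt above:
// [0] method, [1] URI stem, [2] query ("-" if none), [3] port,
// [4] username, [5] client IP, [6] user agent ("+" for spaces), ...
// spider_hits_with_sid() is a hypothetical helper, not part of osCommerce.
function spider_hits_with_sid(array $log_lines, array $spider_lines): array
{
    $hits = [];
    foreach ($log_lines as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 7) {
            continue; // not a request line in the assumed layout
        }
        $query = $fields[2];
        if (stripos($query, 'osCsid=') === false) {
            continue; // no session ID in the URL
        }
        // Undo the "+" encoding and lower-case, since spiders.txt entries are lower case
        $agent = strtolower(str_replace('+', ' ', $fields[6]));
        foreach ($spider_lines as $spider) {
            $needle = strtolower(trim($spider));
            if ($needle !== '' && strpos($agent, $needle) !== false) {
                $hits[] = $fields[1] . '?' . $query;
                break;
            }
        }
    }
    return $hits;
}

$log = [
    'GET /index.php cPath=42 80 - 65.55.210.37 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 5654 298',
    'GET /shopping_cart.php osCsid=1red00mdgjglncmjijk8rg5ig0 80 - 65.55.210.35 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 4751 331',
];
print_r(spider_hits_with_sid($log, ['nbot']));
```

If the earliest msnbot hit for a page already has an osCsid in it, the ID most likely came from an older crawl rather than being issued on a new visit.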
parksey Posted March 3, 2008
That is what I thought at first, but I have never seen msnbot without a session ID.

Is there a way I can force a 301 redirect if the page is hit by a bot listed in spiders.txt? Over time that should remove any session IDs from the index.

Your help, by the way, is greatly appreciated!

stevel Posted March 4, 2008 (Author)
There's a contrib called "spider session killer" or similar that does this.

If you give me your store URL, I'll test it to see whether spiders.txt is being used properly.

parksey Posted March 4, 2008
The URL is http:// shop . calibraweighing . co.uk
(Edited March 4, 2008 by Jan Zonjee)

stevel Posted March 5, 2008 (Author)
My test shows that msnbot does not get assigned a session on new visits. You may need to get rid of the sessions that it has previously indexed with Spider Session Remover.

parksey Posted March 5, 2008
Thank you for that. The mod is an Apache rewrite, and IIS does not have the functionality to do this out of the box, so it is not much use to me.

stevel Posted March 5, 2008 (Author)
Hmm. Well, you can do the equivalent in PHP by searching the user agent string for msnbot, and if you find it and $session_started is true, use the header() function to do a 301 redirect to the same URL minus the SID.

parksey Posted March 5, 2008
Hmm, what an idea. But wouldn't it be even better to turn spiders.txt into an array and then do it? Do you think that would speed up the processing time somewhat?

stevel Posted March 5, 2008 (Author)
It gets turned into an array when processed in application_top.php. But what you're implying is that the array will get searched for every connection. As it is now, it gets searched only if there is no SID in the URL (or cookie), and then the only effect is to not start a new session.

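A minimal sketch of the flow Steve describes, assuming the spiders.txt lines have already been read into an array (the stock code reads the file with PHP's file()). should_start_session() is a hypothetical name; the real logic sits inline in application_top.php and may differ in detail. The matching is a plain substring test against the lower-cased user agent, which is why "nbot" catches msnbot and "tbot" catches a "Fatbot" user agent.

```php
<?php
// Sketch of the session decision described above (not the stock osCommerce code).
// should_start_session() is a hypothetical helper used for illustration.
function should_start_session(bool $has_sid, string $user_agent, array $spider_lines): bool
{
    if ($has_sid) {
        // An existing session ID (URL or cookie) is honoured; spiders.txt is never searched
        return true;
    }
    $agent = strtolower($user_agent);
    foreach ($spider_lines as $line) {
        $needle = trim($line);
        // Plain substring test: "nbot" matches "msnbot", "tbot" matches "fatbot"
        if ($needle !== '' && strpos($agent, $needle) !== false) {
            return false; // spider: do not start a new session
        }
    }
    return true;
}
```

The second case in the tests below is the situation in this thread: a spider that presents an old osCsid keeps its session, because the list is only consulted when no SID is present.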
parksey Posted March 5, 2008
Maybe something like this; my PHP is a bit rusty, so don't laugh!

<?php
// Only act when the session ID appears in the URL.
// eregi() works on PHP 5 but was removed in PHP 7; stripos() is the modern replacement.
if (eregi('oscsid', $_SERVER['REQUEST_URI'])) {
    $user_agent = $_SERVER['HTTP_USER_AGENT'];
    $bots = array("msnbot", "nbot");
    // eregi() takes one pattern, not an array, so test each bot string in turn
    foreach ($bots as $bot) {
        if (eregi($bot, $user_agent)) {
            // Send the 301 status together with the Location header
            header('Location: http://www.example.com/newurl.html', true, 301);
            exit();
        }
    }
}
?>

stevel Posted March 5, 2008 (Author)
You'll want to downcase the user agent, but that's the general idea.

parksey Posted March 5, 2008
Or maybe add an & to spiders.txt, then:

<?php
if (eregi('oscsid', $_SERVER['REQUEST_URI'])) {
    $filename = "spiders.txt";
    $handle = fopen($filename, "r");
    $contents = fread($handle, filesize($filename));
    fclose($handle);
    // Split on "&" (assumes an & has been added between entries in spiders.txt)
    $spiders_array = explode("&", $contents);
    $user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);
    // eregi() cannot take an array, so test each entry as a plain substring
    foreach ($spiders_array as $spider) {
        $spider = trim($spider);
        if ($spider !== '' && strpos($user_agent, $spider) !== false) {
            header('Location: http://www.example.com/newurl.html', true, 301);
            exit();
        }
    }
}
(Edited March 5, 2008 by parksey)

stevel Posted March 6, 2008 (Author)
You like eregi, don't you? :D I would generally code the test for the SID as isset($_GET['osCsid']).

I don't quite get what you're doing with the &. The "prevent spider sessions" code already creates an array, one element per record in spiders.txt.

I think a good compromise would be to test for the GET parameter, because for MOST users it will be present for only one page and the cookie will take care of the rest. So if the osCsid GET parameter is present, do the spiders.txt search anyway, and if found (and the SID is in the URL), do the 301 redirect.

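Steve's compromise could be sketched as below, assuming $_GET, the user agent, the request URI and the spiders.txt lines are passed in so the decision can be tested without sending headers. spider_redirect_target() is a hypothetical helper. The commented-out lines at the end are an untested assumption about how it might be hooked into application_top.php, including the use of HTTP_SERVER from includes/configure.php to make the Location header absolute.

```php
<?php
// Sketch: only when osCsid arrives as a GET parameter is spiders.txt searched;
// a matching spider gets the request URI back without the SID, otherwise null.
// spider_redirect_target() is a hypothetical helper, not part of osCommerce.
function spider_redirect_target(array $get, string $user_agent, string $request_uri, array $spider_lines): ?string
{
    if (!isset($get['osCsid'])) {
        return null; // most visitors: SID is in the cookie, nothing to search
    }
    $agent = strtolower($user_agent);
    $is_spider = false;
    foreach ($spider_lines as $line) {
        $needle = trim($line);
        if ($needle !== '' && strpos($agent, $needle) !== false) {
            $is_spider = true;
            break;
        }
    }
    if (!$is_spider) {
        return null; // a real customer with the SID in the URL keeps it
    }
    // Rebuild the URI from the remaining GET parameters, minus osCsid
    $path = (string) parse_url($request_uri, PHP_URL_PATH);
    $params = $get;
    unset($params['osCsid']);
    $query = http_build_query($params);
    return $path . ($query === '' ? '' : '?' . $query);
}

// Possible hook (assumption, untested):
// $target = spider_redirect_target($_GET, $_SERVER['HTTP_USER_AGENT'] ?? '',
//     $_SERVER['REQUEST_URI'], file(DIR_WS_INCLUDES . 'spiders.txt'));
// if ($target !== null) { header('Location: ' . HTTP_SERVER . $target, true, 301); exit; }
```

Returning the target instead of calling header() directly keeps the decision testable. Rebuilding the query from $_GET is simpler than editing the raw query string, but it re-encodes the parameters, which should be fine for simple values such as cPath.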
mr_absinthe Posted March 6, 2008
Many sites are part of the Hacker Safe program, so my suggestion would be to add ScanAlert (http://www.scanalert.com/bot.jsp) to the list. Just a suggestion... Thanks for keeping it up to date!

Absinthe Original Liquor Store

stevel Posted March 6, 2008 (Author)
OK, thanks.

stevel Posted March 8, 2008 (Author)
ScanAlert will already be detected, as its user agent includes the string "/bot".
(Edited March 8, 2008 by stevel)

papillon Posted May 3, 2008
Hi, I downloaded the updated spiders.txt, but right now a bot seems to be crawling my site:

64.124.148.21 k01.fatlens.com
64.124.148.22 k02.fatlens.com
64.124.148.23 k03.fatlens.com
64.124.148.24 k04.fatlens.com
64.124.148.26 k06.fatlens.com
64.124.148.27 k07.fatlens.com
64.124.148.28 k08.fatlens.com
64.124.148.65 k10.fatlens.com
64.124.148.66 k11.fatlens.com
64.124.148.67 k12.fatlens.com

and it's creating sessions... I searched Google for fatlens, and it seems to be a bot from thefind.com (I added my site there a few days ago).

I tried to find something similar to "fatlens" in spiders.txt but didn't find anything... What should I do?

Thanks

stevel Posted May 3, 2008 (Author)
Show me a line from your access log for one of these. The IP doesn't help.

A Google search suggests that the user agent includes the string "Fatbot", which the spiders.txt string "tbot" should pick up.

papillon Posted May 3, 2008
Thanks for the fast reply, Steve. Sorry, I'm a newbie at this and not sure where to look for my access log... I installed "Visitor Web Stats" and "Who's Online Enhancement", and that's what I'm using, but I think you mean something else. In Visitor Web Stats it just shows as:

64.124.148.67 k12.fatlens.com 05/04/2008 00:00:36 1>>> 00:00:00 Guest en-us,en;q=0.5 english Direct

For comparison, a line for Googlebot shows:

66.249.72.137 crawl-66-249-72-137.googlebot.com 05/03/2008 07:23:11 36>>> 16:36:46 Guest [Mozilla] english Direct

In Who's Online it's just k01.fatlens.com...

Where should I look for the log? Does my host have to give me access?

Thanks, and sorry for my English :)