adam777 0 Posted February 11, 2008 ... Do I just FTP spiders.txt to my site and replace the default one? Is that it?
baddog 2 Posted February 11, 2008 If you're looking at the same contribution I am (http://addons.oscommerce.com/info/2455), it says "A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later." The readme file is worth looking at as well. To answer your question directly, yes, that's all you have to do. Decide which file you want to use, rename it if you pick the large one, and replace the stock one.
adam777 0 Posted February 11, 2008 Thanks! Very easy!
parksey 0 Posted March 2, 2008
Hi all, I am still having problems with livebot getting session IDs. Googlebot does not, but livebot and msnbot are starting to annoy me.

    livebot-65-55-210-42.search.live.com 22:55:16 22:55:16 /cookie_usage.php Yes Not Found
    Name: Guest ID: 0
    IP Address: 65.55.210.42
    User Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)

I do have nbot in my spiders.txt, but it does not seem to work. Any clues would be appreciated.
stevel 2 Posted March 3, 2008 I do not trust the display you are showing here. Post the entry from your web access log showing the GET of the page from msnbot. Steve Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
parksey 0 Posted March 3, 2008

    GET /index.php cPath=42 80 - 65.55.210.37 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 5654 298
    GET /shopping_cart.php osCsid=1red00mdgjglncmjijk8rg5ig0 80 - 65.55.210.35 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200 0 0 4751 331

It keeps getting a session ID and being identified in Who's Online as a customer, not a bot. I also have this one, and I am not sure why it is spidering my site:

    GET /index.php - 80 - 208.122.4.142 FreeWebMonitoring+SiteChecker/0.1+(+http://www.freewebmonitoring.com) 200 0 64 343 238
stevel 2 Posted March 3, 2008 Are you sure that msnbot hasn't held onto an old osCsid and is using that to access your site? If the first access has the ID, that is likely. That SiteChecker is probably not spidering your site - it is just looking to see if the site is up. You should not see any access other than index.php.
parksey 0 Posted March 3, 2008 That is what I thought at first, but I have never seen msnbot without a session ID. Is there a way I can force a 301 redirect if the page is hit by a bot listed in spiders.txt, which should over time remove any session IDs from the index? Your help, by the way, is greatly appreciated!
stevel 2 Posted March 4, 2008 There's a contrib called "spider session killer" or similar that does this. If you'll give me your store URL I'll test it to see if spiders.txt is properly being used.
parksey 0 Posted March 4, 2008 (edited) URL is http:// shop . calibraweighing . co.uk Edited March 4, 2008 by Jan Zonjee
stevel 2 Posted March 5, 2008 My test shows that msnbot does not get assigned a session on new visits. You may need to get rid of the sessions that it has previously indexed with Spider Session Remover.
parksey 0 Posted March 5, 2008 Thank you for that. The mod is an Apache rewrite, and IIS does not have that functionality out of the box, so it is not much use to me.
stevel 2 Posted March 5, 2008 Hmm. Well, you can do the equivalent in PHP by searching the user agent string for msnbot and if you find it and $session_started is true, use the "header" command to do a 301 redirect to the same URL minus the sid.
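The PHP fallback described in this post could look roughly like the following. This is only a sketch under stated assumptions: the session parameter is taken to be osCsid (as in the log lines earlier in the thread), the helper names strip_sid and should_redirect_msnbot are hypothetical and not part of osCommerce, and $session_started is passed in as an argument rather than read from application_top.php.

```php
<?php
// Hypothetical helper: return the request URI with the session parameter removed.
function strip_sid($request_uri, $sid_name = 'osCsid')
{
    $parts = parse_url($request_uri);
    $path  = isset($parts['path']) ? $parts['path'] : '/';

    $query = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }
    unset($query[$sid_name]);

    // Rebuild the URI, adding "?" only when other parameters remain.
    $qs = http_build_query($query);
    return $qs === '' ? $path : $path . '?' . $qs;
}

// Hypothetical helper: true when the visitor looks like msnbot and a session was started.
function should_redirect_msnbot($user_agent, $session_started)
{
    return $session_started && strpos(strtolower($user_agent), 'msnbot') !== false;
}

// Possible use near the top of a page (assumes $session_started is already set):
// if (should_redirect_msnbot($_SERVER['HTTP_USER_AGENT'], $session_started)) {
//     header('Location: http://' . $_SERVER['HTTP_HOST'] . strip_sid($_SERVER['REQUEST_URI']), true, 301);
//     exit;
// }
```

Passing 301 as the third argument to header() sets the status code together with the Location header, so no separate status line is needed.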
parksey 0 Posted March 5, 2008 Hmm, what an idea. But it would be even better if we spent some time turning spiders.txt into an array and then did it; this would speed up the processing time somewhat, don't you think?
stevel 2 Posted March 5, 2008 It gets turned into an array when processed in application_top.php. But what you're implying is that the array will get searched for every connection. As it is now, it gets searched only if there is no sid in the URL (or cookie), and then the only effect is to not start a new session.
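The behaviour described in this post can be sketched as below. This is an illustration, not the actual application_top.php code: the function names load_spiders, is_spider and should_start_session are hypothetical, and the file is assumed to hold one pattern per line.

```php
<?php
// Hypothetical helper: read spiders.txt into an array, one lowercase pattern per element.
function load_spiders($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }
    $patterns = array();
    foreach ($lines as $line) {
        $line = strtolower(trim($line));
        if ($line !== '') {
            $patterns[] = $line;
        }
    }
    return $patterns;
}

// Hypothetical helper: substring match of each pattern against the downcased user agent.
function is_spider($user_agent, array $spiders)
{
    $ua = strtolower($user_agent);
    foreach ($spiders as $pattern) {
        if ($pattern !== '' && strpos($ua, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

// The list is consulted only when the request carries no sid (URL or cookie);
// a matching spider simply does not get a new session.
function should_start_session($has_sid, $user_agent, array $spiders)
{
    if ($has_sid) {
        return true; // existing session: the spider list is not searched
    }
    return !is_spider($user_agent, $spiders);
}
```

This also shows why a bot that already holds an old osCsid keeps its session: with a sid present, the list is never checked.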
parksey 0 Posted March 5, 2008
Maybe something like this; my PHP is a bit rusty, so don't laugh!

    <?php
    if (eregi ('oscsid', $_SERVER['REQUEST_URI'])) {
      $user_agent = $_SERVER['HTTP_USER_AGENT'];
      $bots= array("msnbot", "nbot");
      if (eregi ($bots, $user_agent)){
        header('Status: 301 Moved Permanently');
        header('Location: http://www.example.com/newurl.html');
        exit();
      }
    }
    ?>
stevel 2 Posted March 5, 2008 You'll want to downcase the user agent, but that's the general idea.
parksey 0 Posted March 5, 2008 (edited)
Or maybe add an & to the spiders.txt, then:

    <?PHP
    if (eregi ('oscsid', $_SERVER['REQUEST_URI'])) {
      $filename = "spiders.txt";
      $handle = fopen($filename, "r");
      $contents = fread($handle, filesize($filename));
      fclose($handle);
      $spiders_array = explode("&",$contents);
      $user_agent = $_SERVER['HTTP_USER_AGENT'];
      if (eregi ($spiders_array, $user_agent)){
        header('Status: 301 Moved Permanently');
        header('Location: http://www.example.com/newurl.html');
        exit();
      }
    }

Edited March 5, 2008 by parksey
stevel 2 Posted March 6, 2008 You like eregi, don't you? :D I would generally code the test for the SID as isset($_GET['osCsid']). I don't quite get what you're doing with the &. The "prevent spider sessions" code already creates an array, one element per record in spiders.txt. I think a good compromise would be to test for the GET parameter, because for MOST users, that will be present for only one page and the cookie will take care of the rest. So if the osCsid GET parameter is present, do the spiders.txt search anyway, and if found (and the sid is in the URL), do the 301 redirect.
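One way the compromise in this post might be written is sketched below. The function name spider_redirect_target is hypothetical, the spider list is assumed to be already loaded as an array with one pattern per element, and the caller is assumed to send the actual redirect. Plain substring matching with strpos() stands in for eregi(), which expects a string pattern rather than an array.

```php
<?php
// Hypothetical helper: return the sid-free URI to redirect a spider to,
// or null when no redirect is needed.
function spider_redirect_target(array $get, $user_agent, $request_uri, array $spiders)
{
    // Cheap test first: most requests carry no osCsid in the URL,
    // so the spider list is not searched for them at all.
    if (!isset($get['osCsid'])) {
        return null;
    }

    $ua = strtolower($user_agent); // downcase before comparing
    foreach ($spiders as $pattern) {
        $pattern = strtolower(trim($pattern));
        if ($pattern !== '' && strpos($ua, $pattern) !== false) {
            // Drop the osCsid parameter, then tidy any dangling "?" or "&".
            $clean = preg_replace('/([?&])osCsid=[^&]*(&|$)/', '$1', $request_uri);
            return rtrim($clean, '?&');
        }
    }
    return null;
}

// Possible use (assumes $spiders was built from spiders.txt beforehand):
// $target = spider_redirect_target($_GET, $_SERVER['HTTP_USER_AGENT'], $_SERVER['REQUEST_URI'], $spiders);
// if ($target !== null) {
//     header('Location: http://' . $_SERVER['HTTP_HOST'] . $target, true, 301);
//     exit;
// }
```

Ordinary visitors pay for the list search only on the few requests that carry the sid in the URL; once the cookie takes over, the early return applies.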
mr_absinthe 0 Posted March 6, 2008 Many sites are part of the Hacker Safe program, so my suggestion would be to add ScanAlert (http://www.scanalert.com/bot.jsp) to the list. Just a suggestion... Thanks for keeping it up to date! Absinthe Original Liquor Store
stevel 2 Posted March 6, 2008 Ok - thanks.
stevel 2 Posted March 8, 2008 (edited) ScanAlert will already be detected as its UA includes the string "/bot". Edited March 8, 2008 by stevel
papillon 0 Posted May 3, 2008
Hi, I downloaded the updated spiders.txt, but right now it seems a bot is crawling my site:

    64.124.148.21 k01.fatlens.com
    64.124.148.22 k02.fatlens.com
    64.124.148.23 k03.fatlens.com
    64.124.148.24 k04.fatlens.com
    64.124.148.26 k06.fatlens.com
    64.124.148.27 k07.fatlens.com
    64.124.148.28 k08.fatlens.com
    64.124.148.65 k10.fatlens.com
    64.124.148.66 k11.fatlens.com
    64.124.148.67 k12.fatlens.com

and it's creating sessions... I searched Google for fatlens and it seems it's a bot from thefind.com (I added my site some days ago). I tried to find something similar to "fatlens" in spiders.txt but didn't find anything... What should I do? Thanks
stevel 2 Posted May 3, 2008 Show me a line from your access log for one of these. The IP doesn't help. A Google search suggests that the user agent includes the string "Fatbot" which the spiders.txt string "tbot" should pick up.
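To see which spiders.txt entries would catch a given user agent, a small diagnostic along these lines may help. The function name matching_patterns is hypothetical, and the user agent in the usage comment is a made-up stand-in that merely contains "Fatbot"; the real string has to come from the access log.

```php
<?php
// Hypothetical diagnostic: list every pattern that occurs as a substring
// of the downcased user agent.
function matching_patterns($user_agent, array $patterns)
{
    $ua   = strtolower($user_agent);
    $hits = array();
    foreach ($patterns as $pattern) {
        $pattern = strtolower(trim($pattern));
        if ($pattern !== '' && strpos($ua, $pattern) !== false) {
            $hits[] = $pattern;
        }
    }
    return $hits;
}

// Example with an assumed user agent string:
// print_r(matching_patterns('Fatbot 2.0', array('msnbot', 'tbot', '/bot')));
```

Because the match is a plain substring test on the downcased string, "tbot" hits anything containing "Fatbot", while "/bot" only hits agents that literally contain a slash followed by "bot".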
papillon 0 Posted May 3, 2008
Thanks for the fast reply, Steve. Sorry, I'm a newbie at this and not sure where to look for my access log... I installed "Visitor Web Stats" and "Who's online enhancement" and that's what I'm using, but I think you mean something else. In Visitor Web Stats it just shows as

    64.124.148.67 k12.fatlens.com 05/04/2008 00:00:36 1>>> 00:00:00 Guest en-us,en;q=0.5 english Direct

For comparison, another line for the googlebot shows

    66.249.72.137 crawl-66-249-72-137.googlebot.com 05/03/2008 07:23:11 36>>> 16:36:46 Guest [Mozilla] english Direct

and in Who's Online it's just k01.fatlens.com... Where should I look for the log? Does my host have to give me access? Thanks, and sorry for my English :)