[Contribution] Spider Session Remover


peterr

Hi,

 

I already found some entries in Google that include session IDs for my shop, which isn't even live yet. I wonder how this could have happened, as I have had Prevent Spider Sessions set to true from the very start.

 

Will this have an impact on the performance of the shop ?

 

I tried a google on this:

 

mod_rewrite +server +load

 

There are lots of interesting articles there, and there are certainly a lot of factors involved in determining any impact.

Even if there is a (slight) degradation in 'performance' (which I doubt, as there are only a few conditions and a few rules), you may well ask yourself whether you would rather have the session IDs continue to show up in search engines, or put up with a slightly slower response time (if there is one at all) for a short period of time.

After all, you don't need the mod_rewrite code to stay there forever. Once you are 100% certain that all the 'offending' spiders have updated their databases and indexes for your site, in effect no more SIDs in results, you can comment out the mod_rewrite code. :D

If you then find, for some reason, that you missed a spider, or that another one has (somehow) got 'osCsid' in its results, just add that one name to the code.
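To make "just add that one name" concrete, here is a minimal Python sketch of how an agent list such as (msnbot|slurp|googlebot) behaves when matched case-insensitively against a User-Agent header. It only illustrates the matching idea and is not part of the contribution; the function names and the extra agent "examplebot" are made up for the example.

```python
import re

# The three agent fragments used in the .htaccess posted in this thread.
SPIDERS = ["msnbot", "slurp", "googlebot"]

def spider_pattern(names):
    """Build a case-insensitive alternation such as (msnbot|slurp|googlebot)."""
    return re.compile("(" + "|".join(re.escape(n) for n in names) + ")", re.IGNORECASE)

def is_listed_spider(user_agent, names=SPIDERS):
    """True if any listed fragment occurs anywhere in the User-Agent string."""
    return spider_pattern(names).search(user_agent) is not None

if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_listed_spider(ua))
    # "examplebot" is a hypothetical agent; it only matches once it is added to the list.
    print(is_listed_spider("ExampleBot/1.0"))
    print(is_listed_spider("ExampleBot/1.0", SPIDERS + ["examplebot"]))
```

Adding a spider is therefore a one-word change to the list; every agent that is not listed is left alone.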

 

Peter



Thanks for the link to this official thread and the explanation !

I'll read up via Google later on, but I guess it is indeed more important to get rid of the spidered osC session IDs than to worry about a possible performance impact for a short (or slightly longer) while ;)

 

It must be very early for you too or very late in Aussie land (haven't synchronized my time with Australia) :D


I've just installed your contribution, but I get:

500 Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

 

Please contact the server administrator, [email protected] and inform them of the time the error occurred, and anything you might have done that may have caused the error.

 

More information about this error may be available in the server error log.

 

Thank you for support

Skype: centoasa

Skype: remigioruberto


I installed the contribution today as instructed but kept getting a 403 error page. I then decided to have a play and commented out one of the options, FollowSymLinks. This seemed to do the trick, as I can now access my site.

What I need to know is whether this will still work. My webroot is at public_html. Is that option a requirement for this to work?

 

# Set some options

Options -Indexes

#Options -FollowSymLinks

 

 

Thanks in advance. By the way this is a fantastic idea.

Edited by djmatrix

Hi,

 

I've just installed your contribution, but I've:

500 Internal Server Error

 

More information about this error may be available in the server error log.

 

 

Set AllowOverride FileInfo Options at a minimum.

 

Also, examine your web server logs, they will tell you more than the 500 error, usually.

 

Peter


Hi,

 

I installed the contribution today as instructed but kept getting a 403 error page. I then decided to have a play and commented out one of the options, FollowSymLinks. This seemed to do the trick, as I can now access my site.

What I need to know is whether this will still work. My webroot is at public_html. Is that option a requirement for this to work?

 

# Set some options

Options -Indexes

#Options -FollowSymLinks

Thanks in advance. By the way this is a fantastic idea.

 

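I wouldn't comment out the FollowSymLinks line, because mod_rewrite needs FollowSymLinks (or SymLinksIfOwnerMatch) to be enabled before it will work from .htaccess. Note that this means the option has to be switched on (+FollowSymLinks), not off (-FollowSymLinks).

Rather, comment out Options -Indexes, or possibly fix the problem by replacing the FollowSymLinks line with:

Options +SymLinksIfOwnerMatch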

 

Peter


I have this installed and it is working well for me in that Yahoo's spider is getting redirected properly. However, I am skeptical that Yahoo, in particular, pays any attention to 301s. When I put up osC a year ago, I added RedirectPermanent lines for the old product pages. Most spiders took the hint - Google in particular - but a year later, Yahoo is still hitting the old pages.

 

Time will tell... I do appreciate your putting this together, as it WILL be effective for some spiders.


Hi,

 

I installed this .htaccess file (no changes made, except that I added a DirectoryIndex), however I still see:

 

http://www.awedeals.com/catalog/mcafee-vir...e63bf462e60a366

 

osCsid=e381bd39e75ef02bae63bf462e60a366

 

is there another way to get rid of this?

 

What was the spider agent name ?

 

Also, remember that there is absolutely nothing anyone can ever do to stop a spider, or any other web visitor, from doing a GET with the "osCsid" in the URL. What this contribution does is remove the 'osCsid' if it is used by one of the agents specified in .htaccess, after the spider does the GET.
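For anyone who wants to see what the removal amounts to, here is a rough Python sketch that applies the same two query-string patterns used by the RewriteCond lines of the .htaccess posted in this thread. It only imitates the pattern matching, not Apache itself, and the function name strip_sid is made up for the example.

```python
import re

# Same patterns as the two RewriteCond lines, compiled case-insensitively like [NC].
BOTH_SIDES = re.compile(r"^(.+)&osCsid=[0-9a-z]+&(.+)$", re.IGNORECASE)
ONE_SIDE = re.compile(r"^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$", re.IGNORECASE)

def strip_sid(query):
    """Return the query string without osCsid, or None if there is nothing to remove."""
    m = BOTH_SIDES.match(query)
    if m:
        # osCsid sits between other parameters: keep both sides.
        return m.group(1) + "&" + m.group(2)
    m = ONE_SIDE.match(query)
    if m:
        # Only one of the two groups takes part in a match; keep whichever did.
        return m.group(1) or m.group(2) or ""
    return None

if __name__ == "__main__":
    for q in ["cPath=1&osCsid=abc123&page=2",
              "products_id=5&osCsid=abc123",
              "osCsid=abc123&page=2",
              "osCsid=abc123",
              "cPath=1"]:
        print(repr(q), "->", repr(strip_sid(q)))
```

One thing the sketch makes visible: in the trailing-only case (osCsid first, other parameters after it) the remaining parameters land in the second group of the pattern. The sketch keeps them, whereas the posted RewriteRule substitutes only %1, so it may be worth checking on your own server that trailing parameters survive that case.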

 

In fact, if you look at your web logs, the first GET will return the 301 and the second GET will return a "200", because mod_rewrite has removed the osCsid from the URL.
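If you would rather not read the logs by eye, something like the following Python sketch can pair each 301 issued for an osCsid URL with the status of the follow-up request. It assumes the usual Apache common/combined log layout; the sample lines, the IP address and the session ID in them are made up for illustration.

```python
import re

# Matches the request and status fields of a common/combined access-log line.
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})')

def parse(line):
    """Return (url, status) for one log line, or None if the line does not match."""
    m = REQUEST.search(line)
    return (m.group(1), int(m.group(2))) if m else None

def sid_redirect_outcomes(lines):
    """For each 301 on a URL carrying osCsid, report the status of the next
    request to the same path that no longer carries osCsid."""
    entries = [e for e in map(parse, lines) if e]
    outcomes = []
    for i, (url, status) in enumerate(entries):
        if status == 301 and "osCsid=" in url:
            path = url.split("?", 1)[0]
            for later_url, later_status in entries[i + 1:]:
                if later_url.split("?", 1)[0] == path and "osCsid=" not in later_url:
                    outcomes.append((path, later_status))
                    break
    return outcomes

# Hypothetical log lines, invented for this example.
sample = [
    '203.0.113.5 - - [01/Feb/2005:10:00:00 +0000] "GET /product_info.php?products_id=1&osCsid=abc123 HTTP/1.1" 301 250 "-" "msnbot/1.0"',
    '203.0.113.5 - - [01/Feb/2005:10:00:01 +0000] "GET /product_info.php?products_id=1 HTTP/1.1" 200 5120 "-" "msnbot/1.0"',
]

if __name__ == "__main__":
    print(sid_redirect_outcomes(sample))
```

A healthy result pairs every path with 200. Any other status after the 301 means the redirect target itself is not being served and needs looking at.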

 

Peter


Can anyone help? Firstly, I know nothing about this 'sid' business, but I seem to be getting very high hits from the USA in my stats, and from MSNBot. I think it relates to what you are all discussing.

I am in the UK - http://www.pleasurezone.uk.com

I just tried to install SID Killer v1.2 but ended up with a completely blank page when I reloaded.

I tried the direct replacement of the .htaccess file for the mod, and when that didn't work I tried my existing .htaccess contents WITH the mod's .htaccess contents.

Whatever I tried, I got a blank screen when loading my site.

I take it that all you do is load the file, .htaccess, as it is, without altering it?

Here is what I put in .htaccess:

 

 

# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $

# Set some options
Options -Indexes
Options FollowSymLinks

RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [s=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]

#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)

#<IfModule mod_php4.c>
# php_value session.use_trans_sid 0
# php_value register_globals 1
#</IfModule>

 

Then I tried it WITH my original, which is this:

 

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://pleasurezone.uk.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://pleasurezone.uk.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.pleasurezone.uk.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.pleasurezone.uk.com$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.pleasurezone.uk.com [R,NC]

 

This alone obviously works, but the mod screwed it up.

I have tried to enable Friendly URLs in admin/configuration, but when I do so it screws up my site. It doesn't load my right-hand boxes and some content, so I have to disregard this.

I really don't know if I have a spider/search engine problem or if someone is stealing bandwidth by linking to my pictures.

My bandwidth has tripled in the last 30 days, from about 350 MB per month to 1200 MB now in January, and I had to upgrade via my hosting company.

I know something's wrong because, although I suddenly had 1000s more hits, I had zero sales!

I know it's a stupid comment, but I thought the idea of search engines spidering our sites was a good thing? Also, I have recently enabled 'Forced Cookie' usage. Is that a good thing to do, or not?

 

Thank you for your time and patience.

 

Steve

http://www.pleasurezone.uk.com


Hi,

 

Can anyone help? Firstly, I know nothing about this 'sid' business, but I seem to be getting very high hits from the USA in my stats, and from MSNBot. I think it relates to what you are all discussing.

 

The number of hits you are getting from any spider/bot/crawler has absolutely nothing to do with what this contribution is designed for.

 

Please read _all_ of the description on this contribution, at:

 

http://www.oscommerce.com/community/contributions,2819

 

especially 'the problem' and 'the solution'.

 

I would advise that you do not need this contribution at all.

 

I just tried to install SID Killer v1.2 but ended up with a completely blank page when I reloaded.

 

Why do you need the SID Killer? The osCommerce MS-2 admin setting to turn sessions off for spiders is sufficient. Just make sure you keep the file /spiders.txt updated, and especially that 'msnbot' is included in the list.
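As a rough illustration of why the contents of spiders.txt matter, the Python sketch below treats the file as one agent fragment per line and checks whether any fragment occurs in the visitor's User-Agent. The plain case-insensitive substring matching is an assumption made for the example, not a copy of osCommerce's own PHP code, and the function names are made up.

```python
def load_spiders(text):
    """One agent fragment per line; blank lines are ignored, everything lowercased."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def is_spider(user_agent, spiders):
    """True if any fragment from the list occurs in the User-Agent string."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in spiders)

if __name__ == "__main__":
    # Stand-in for the file contents; a real spiders.txt is much longer.
    spiders = load_spiders("googlebot\nslurp\n\nmsnbot\n")
    print(is_spider("msnbot/1.0 (+http://search.msn.com/msnbot.htm)", spiders))
    print(is_spider("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0", spiders))
```

If 'msnbot' is missing from the list, that visitor is treated like any other browser and gets a session ID, which is the situation the admin setting is meant to prevent.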

 

I tried the direct replacement of the .htaccess file for the mod, and when that didn't work I tried my existing .htaccess contents WITH the mod's .htaccess contents.

 

Whatever you have tried or will try with .htaccess, I'd advise that you do not need this contribution; using it is only complicating your situation.

 

You haven't stated what the problem is ?? I mean the initial problem, exactly why there is a need to use this contribution ?

 

Peter


  • 3 weeks later...
Hi,

 

Just a quick one to say this is the official support thread for the Spider Session Remover.

 

Peter

 

 

Hi,

Just a quick question from a newbie. I have recently installed the SEO Contribution that states in the install instructions to set "Prevent Spider Sessions" to "false".

 

Now, it seems that if I want to get the most benefit from the Spider Session Remover that I should probably have the Prevent Spider Sessions set to True.

 

Any thoughts on this? How do I manage to accomplish all these things at the same time? Or should I worry first about preventing the spider sessions until the Spider Session Remover does its trick, and then use the SEO Sitemap contribution? Is there a way to use both together?

 

A little confused,

Kathy


Hi Kathy,

 

1. There is no doubt a thread for the SEO Contribution, so as I don't know anything about that contribution, best to ask _some_ questions there.

 

2. In summary, to answer your questions about what to use and what to set, etc, the Spider Session Remover is only needed as follows:

 

The problem

=========

 

You may use one of the following:

 

* 2-2MS2 "Prevent Spider Sessions" admin feature is set to true.

* SID Killer contribution (http://www.oscommerce.com/community/contributions,952)

* Spider Killer for MS1 contribution (http://www.oscommerce.com/community/contributions,1089)

 

All of these features are very good, and aim to prevent spiders from adding a session ID (osCsid) to the URL.

However, what if a spider started to crawl your website BEFORE you enabled one of the above features? What can happen is that the (previously) harvested URLs with SIDs in them will show up as results in search engines. Afterwards, often many months later, you will still see the spider trying to access the URLs it harvested earlier with the session ID in them.

 

In summary, URL's with sessions ID's were harvested PRIOR to any session disabling, and therefore these URL's are now indexed in search engines, and the spiders continue to re-visit your website using the URL's with the 'osCsid' in them.

 

It is only needed IF search engines have picked up session ID's, and the session ID is appearing in search results.

 

If you have a new site, you have no need for this contribution at all, assuming that you have _some_ means or methods to turn off session ID's for spiders.

 

Peter



Hi Again,

 

No, my site is not new, so yes, I have some session ID's that I don't want appearing... hence why I really want to use your contribution.... but I am assuming that I need to use some sort of sid killer , session killer with it, be it the OSC one or not... is this correct?

I have just posted a similar question at the SEO Urls thread to see what is suggested there.

 

I was just hoping that you may already know about any issues between this contribution and some others and I had also wanted to confirm that I need some sort of session killer used along with your contribution.

 

Thanks for your help,

Kathy


Hi Kathy,

 

No, my site is not new, so yes, I have some session ID's that I don't want appearing... hence why I really want to use your contribution....

 

If the search results for your website are appearing in any search engine WITH the session ID ("osCsid"), then , and only then, would you need this contribution.

 

Don't use it if there are no session ID's appearing in search engine results, as simple as that.

 

.. but I am assuming that I need to use some sort of sid killer , session killer with it, be it the OSC one or not...  is this correct?

 

Yes, you will definitely need to turn off the sessions, for ANY spiders, what method you use is up to you, but this contribution is NOT for that purpose.

 

In summary, you will need to decide what to do, choose one of the following:

 

1. If session ID's, for your website, are appearing in search engine results, then you will need:

 

(i) This contribution

(ii) Plus ONE of these ...........

 

* 2-2MS2 "Prevent Spider Sessions" admin feature is set to true.

* SID Killer contribution (http://www.oscommerce.com/community/contributions,952)

* Spider Killer for MS1 contribution (http://www.oscommerce.com/community/contributions,1089)

 

 

2. If session ID's, for your website, are NOT appearing in search engine results, then you will need ONE of these:

 

* 2-2MS2 "Prevent Spider Sessions" admin feature is set to true.

* SID Killer contribution (http://www.oscommerce.com/community/contributions,952)

* Spider Killer for MS1 contribution (http://www.oscommerce.com/community/contributions,1089)

 

Peter



Hi,

Me again.

 

I have been seeing spiders such as Yahoo and MSN coming to my site last night and today, and I am seeing things in my logs where the spider requests a URL like /privacy.html?osCsid=43ca0d2a0592ed69f888b4cc20a15e05, gets the 301 redirect, then goes to /privacy.html but gets a 404 error. /privacy.html doesn't work, only privacy.php does. I know that since yesterday or so, MSN has reindexed my site and now only has my index page as a working link....

I also have SEO sitemap installed and thus have both Spider Session Remover and SEO sitemap information in .htaccess. Could there be a problem with my .htaccess file? Is it possibly a conflict between the two contributions (as I think the SEO sitemap is responsible for converting .php files to .html)?

The spider session remover might not have anything to do with my problem but wanted to check.

Here is my .htaccess file. I have the original .htaccess, the SEO sitemap and the Spider Session Remover code all in there, so possibly I messed something up.

 

 

# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $

# Set some options

RewriteEngine on
RewriteBase /

Options +FollowSymLinks
Options -Indexes

DirectoryIndex index.php index.html
AddType application/x-httpd-php php php4 php3 html htm

#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [s=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]

RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-(.*).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}

RewriteRule ^sitemap_categories.html$ sitemap_categories.php [L]
RewriteRule ^sitemap_products.html$ sitemap_products.php [L]
RewriteRule ^category_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)\.html$ index.php?cPath=$1_$2_$3_$4_$5_$6 [L]
RewriteRule ^category_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)\.html$ index.php?cPath=$1_$2_$3_$4_$5 [L]
RewriteRule ^category_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)\.html$ index.php?cPath=$1_$2_$3_$4 [L]
RewriteRule ^category_([1-9][0-9]*)_([1-9][0-9]*)_([1-9][0-9]*)\.html$ index.php?cPath=$1_$2_$3 [L]
RewriteRule ^category_([1-9][0-9]*)_([1-9][0-9]*)\.html$ index.php?cPath=$1_$2 [L]
RewriteRule ^category_([1-9][0-9]*)\.html$ index.php?cPath=$1 [L]
RewriteRule ^product_([1-9][0-9]*)\.html$ product_info.php?&products_id=$1 [L]

#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)

#<IfModule mod_php4.c>
# php_value session.use_trans_sid 0
# php_value register_globals 1
#</IfModule>

 

 

Also, is it possible that the spider session remover could have anything to do with the "Redirection limit for this url exceeded. Unable to load the requested page. This may be caused by cookies that are being blocked" message that I get when I click on my shopping cart? I have only done a few things to my site recently, one of them being the spider session remover contribution (and the .htaccess modifications), so I thought that I would ask. Could this problem have something to do with my URLs not working properly?

 

Thanks,

Kathy


Hi,

 

The spider session remover might not have anything to do with my problem but wanted to check.

 

I made it very clear in post #19, that you need to decide if you need this contribution, there were two simple choices, and the user needs to make the choice.

 

I don't know why you would need this contribution ??

 

Please answer this question, 'why do you think you need this contribution' ?

 

Also, please read "The Problem" thoroughly at:

 

http://www.oscommerce.com/community/contributions,2819

 

If you don't have _that_ problem, you don't need this contribution, and will only complicate matters by using contributions you do not need, or do not understand why there is a need for them.

 

........if it ain't broke, ......don't fix it. :D

 

Peter



Hi again,

 

I actually do need this contribution, as my site is an existing one and all the spiders have been listing my site with the sid attached.

But if you don't think that your contribution could have anything to do with my problem, then I will probably remove it for now, as I think at least MSN grabbed many of my URLs without the sid (the problem is that the new URLs it has don't work now, but that must be an issue with another contribution).

 

Thanks anyway,

Kathy


Hi,

 

I actually do need this contribution, as my site is an existing one and all the spiders have been listing my site with the sid attached.

 

Okay, then that is established: if search engine results (i.e. not the actual spider/crawl, but SE _results_) are showing your site with the sid attached, then yes, you do need this contribution. You need some 'method' to force the 301, but after the 301 the spider needs a 200 (okay/found), definitely not a 404.

 

I don't know anything about the SEO sitemap, but you do need to have this functionality for spiders:

 

(i) Recognise the spider

(ii) Don't append the 'osCsid' to the url, if it is a spider/crawler/bot,etc.

 

In regards to the mod_rewrites in the .htaccess, I'm sure it could be narrowed down a lot by more use of wildcards, and I have no idea why anyone would want to turn a URL with ".php" into ".htm" or ".html", it's a myth that it affects SEO. Spiders, bots,etc don't have a preference for 'non PHP' files. :D

 

If you must have all those rules in the .htaccess, you may well be better off cutting and pasting the code for the 'spider session remover' to the end, because otherwise the "forced 301" will be done before all the other rewriting. From memory, the mod_rewrite code for the '301' does something like an 'exit', so if the agent is any of those 3 spiders, I assume no further processing of rules will be done by Apache ??

 

Peter


  • 2 weeks later...

This contrib will not instantly change what spiders have already indexed. It should make things better over time, at least for some search engines.

 

For MSN, you probably also need my Updated spiders.txt.
