SID Killer


Vger


I am testing out a contribution I haven't installed before, and am unclear about one thing in the installation instructions.

 

Where it says to install this

 

global $kill_sid, $HTTP_GET_VARS;

 

it doesn't say to delete this

 

global $request_type, $session_started, $SID;

 

I'm fairly certain you do delete that, but would like clarification.

 

Thanks

 

Vger


Turning on 'Prevent Spider Sessions' has some major drawbacks. For one thing, some spiders then get locked into what is called the 'Google Dance' where they go around and around in ever decreasing circles. It also means that you have to keep the spiders.txt file up to date all of the time, as new spiders won't be affected by it. It also seems to badly affect the indexing of web sites by search engines - which is unacceptable if you want to get the best rankings you can for your site. No, the SID Killer is, in my opinion, a good solution to the problem of spiders creating sessions.

 

Normally I would prevent that by using a full SSL, and turning on 'Force Cookie Use', but on one site that's not possible because the type of Credit/Debit card module requires that the site generates a session id before the customer goes to the site of the Merchant Bank - else the session id coming back will be one generated by the Bank and won't be recognised by osCommerce - hence no e-mails being sent out, and nothing added to the database.

 

Vger


Turning on 'Prevent Spider Sessions' has some major drawbacks. For one thing, some spiders then get locked into what is called the 'Google Dance' where they go around and around in ever decreasing circles. It also means that you have to keep the spiders.txt file up to date all of the time, as new spiders won't be affected by it. It also seems to badly affect the indexing of web sites by search engines - which is unacceptable if you want to get the best rankings you can for your site. No, the SID Killer is, in my opinion, a good solution to the problem of spiders creating sessions.

 

Not sure where you got that idea; you've got it backwards. By NOT turning on 'Prevent Spider Sessions', search engines can get locked into an endless loop, not the other way around. Preventing spider sessions has absolutely no adverse effects on indexing or parsing your site whatsoever.

 

You do have to maintain a short list of spider user agents; however, realistically, there are only about 20 or so spiders that get trapped by sessions. I haven't updated my list in over a year, and I have no spiders trapped, and enjoy top rankings for all of my SERPs.

 

The SID killer is fundamentally flawed, because it doesn't catch the sessions caused by the 'buy now' links. It also has trouble with SEF URLs. I'd recommend not using it. In fact, it should be removed from the contributions section.

-------------------------------------------------------------------------------------------------------------------------

NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit.

If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.


You are entitled to your opinion.  I just happen to disagree with it.

 

Vger

 

You can disagree with it all you want. The fact is, it doesn't work, and never did.

 

Ask the author.

-------------------------------------------------------------------------------------------------------------------------


Okay, you've had your say, and I'm not going to get into one of these forum discussions where someone says (paraphrased) "You have to agree with me, because I'm right and you're wrong".

 

So, can you now retire gracefully, and let the thread get back to the question that was posed by me initially?

 

Vger


I've already said you are free to dissent.

 

As I also said, you should ask the author about that contribution. I worked with him pretty closely on its development, along with several other community members. We all came to the conclusion that trying to prevent spider sessions in this fashion would not work. There are a couple of different flaws in the basic premise that are not fixable. I'm sorry to have to be the one to tell you about this. I'm not sure why this contribution is still active.

 

I just don't want the hundreds of other unsuspecting osC members out there to get the wrong idea, and think that this is a good method to use to try to get rid of spider sessions. To be clear for all of those new users out there, the 'Prevent Spider Sessions' feature in your admin works great. The SID killer does not.

-------------------------------------------------------------------------------------------------------------------------


Just looked at the contribution again, and while there are still flaws with the SID killer in there, the 'buy now links to forms' is a very good contribution by itself, and I would recommend it.

 

Just don't depend on this SID killer getting rid of all of your spider sids.

-------------------------------------------------------------------------------------------------------------------------


OK, I went and dug up the fatal problem with that contribution, for anyone that cares.

 

From the Author himself....

 

The problem seems to be when moving from ssl to nonssl on systems where the ssl domain is different from the nonssl domain.

 

I must admit to not being 100% sure how internally the domain checking works. I would have thought that if you were having problems going from ssl->nonssl you would also have problems going from nonssl->ssl

 

This issue was never addressed. To this day, people using this contribution will have this problem. If you have a shared SSL, and move from SSL to NONSSL, your customers will lose their session.

 

So, while it may work for you after all, it's not suggested for anyone on a shared SSL. There are also some problems with the 'buy now' links and spiders, which the 'buy now' links to forms doesn't fully address. But I don't think it's as big of a problem.

Edited by wizardsandwars

-------------------------------------------------------------------------------------------------------------------------


Thanks for that very instructive and illuminative input. It fits in with what I am trying to do. Let me explain.

 

I have a site with a full SSL, which is IP based. I was using Force Cookie Use, and it was working fine. Then, in place of the PayPal IPN, I installed the HSBC Module for their e-Secure payments system. The way this system works is that if osCommerce does not pass a session id to HSBC then HSBC generates its own, which it passes back to osCommerce. When the transaction was complete it returned to checkout_success, but because osCommerce did not recognise the session id it did not write the sale to the database, and did not fire off any e-mails, so the owner of the site did not even know if a sale had been made. So, I had to turn off Force Cookie Use and revert to sessions.

 

I then turned on 'Prevent Spider Sessions', and immediately in the stats I noticed a very large number of entries going to the cookie_usage.php page, and monitored spiders stuck on the site for hours (msnbot and googlebot in particular). This is why I was trying to use the SID Killer with the Buy Now Buttons feature. On my own web server I tested the SID Killer and it appeared to work fine. However, when I installed the Buy Now Buttons feature (per the install instructions) items stopped being added to the cart.

 

My original post on this related to the Sid Killer, but I'm more interested now in why the Buy Now Buttons feature did not work. But even without this feature, I think that the Sid Killer may be useful for me - as this site has a full ssl. It is very useful to know that it does not work with a shared ssl.

 

Thanks for the input again - Vger


Well, I suppose that it could very well be useful for you.

 

However, I'm afraid that what you had witnessed was a little misleading. Setting 'Prevent Spider Sessions' will not cause spiders to get stuck on your 'cookie_usage.php' page.

 

What causes that is a very well documented issue that occurs when you set 'Force Cookie Use' to true. If needed, I can point you to the thread that fully discusses that issue.

 

What probably happened was that while you had 'Force Cookie Use' set to true, those bots 'harvested' the 'cookie_usage.php' pages (which is typical SE spider behavior) and were revisiting them at a later time.

 

The explanation must be something along those lines, because 'Prevent Spider Sessions' set to true can in no way cause spiders to visit the 'cookie_usage' page.

 

As far as the 'sid killer' contribution, the other MAJOR issue with it was that sessions would be dropped when customers use the 'buy now' links. Converting the buy now links to forms was 'supposed' to fix that issue, but as far as I know, it never actually did. The last time I tried it, using a buy now link would still cause the session to be dropped.

 

So, as it is currently constructed, you can use the 'sid killer' but you have to have your own SSL, and you can't use any of the 'Buy Now' links.

-------------------------------------------------------------------------------------------------------------------------


The explanation must be something along those lines, because 'Prevent Spider Sessions' set to true can in no way cause spiders to visit the 'cookie_usage' page.

 

Yes, if there were just a lot of entries directly for cookie_usage I would go with that. Unfortunately every entry for cookie_usage.php was preceded by a visit to one page (could be any page on the site). I have Urchin Stats, which tracks the progression from page to page on each visit, and this is how I know what was happening.

 

So, the situation was that spiders were visiting one page (checked the ip addresses), and getting redirected to cookie_usage.php for the next page.

 

Oh well, maybe this is just one of those things that will never be fully figured out. There are so many variables - type of server, installs on server, mods of installs, patches etc.

 

I may try to figure this out on another server (Unix this time not Linux), and see if the same thing happens.

 

Vger


Yes, if there were just a lot of entries directly for cookie_usage I would go with that. Unfortunately every entry for cookie_usage.php was preceded by a visit to one page (could be any page on the site). I have Urchin Stats, which tracks the progression from page to page on each visit, and this is how I know what was happening.

 

Right, but those could still be previously harvested URLs. We see this all the time. Often, it's even a little hard to discern whether your spider killer is working, because the bot may have harvested URLs with SIDs in them BEFORE 'Prevent Spider Sessions' was set to true. Afterwards, sometimes MONTHS afterwards, you'll still see the bot trying to access the URLs it harvested earlier with the SIDs in them. Typical spider behavior.

 

There's just no possible way that 'Prevent Spider Sessions' would cause a spider to be redirected to the 'cookie_usage' page. The logic is just not there to cause that. It can cause a spider to lose a session (or not), but a spider without a SID will never be redirected to cookie_usage unless 'Force Cookie Use' is set to true.

 

If the bot was being redirected live, then it was because 'Force Cookie Use' was either still enabled, or the logic that forces it was still in place. The latter could be caused by a mis-installed contribution or something.

-------------------------------------------------------------------------------------------------------------------------


  • 1 month later...
  • 3 weeks later...

Hi,

 

Right, but those could still be previously harvested URLs. We see this all the time. Often, it's even a little hard to discern whether your spider killer is working, because the bot may have harvested URLs with SIDs in them BEFORE 'Prevent Spider Sessions' was set to true. Afterwards, sometimes MONTHS afterwards, you'll still see the bot trying to access the URLs it harvested earlier with the SIDs in them. Typical spider behavior.

 

We were noticing session IDs from Yahoo and msnbot, and as the site is a snapshot 'somewhere' between MS-1 and MS-2, we of course can't use the "Admin/Prevent Spiders, etc." setting. These session IDs were being noticed even with the SID Killer installed, so we uninstalled it and installed the "Spider Killer" instead. After testing on various 'spider simulators', forcing 'slurp' and 'msnbot' as the agents, there were absolutely no session IDs, so we assumed the "Spider Killer" was going to do the job correctly.

 

That was a few weeks ago, and now we see session IDs once again appearing in the server logs, despite the fact that I went to the spider simulators again and tested several URLs, all of which resulted in no session IDs.

 

Therefore, even though this is of some concern of course, it was of 'some relief' to read your post about "previously harvested urls", and that is all I can put it down to at present.

 

It seems there is little more we can do. :(

 

Peter


  • 2 weeks later...

Hi,

 

Right, but those could still be previously harvested URLs. We see this all the time. Often, it's even a little hard to discern whether your spider killer is working, because the bot may have harvested URLs with SIDs in them BEFORE 'Prevent Spider Sessions' was set to true. Afterwards, sometimes MONTHS afterwards, you'll still see the bot trying to access the URLs it harvested earlier with the SIDs in them. Typical spider behavior.

 

I'm glad you posted this, because we finally realised what was happening, exactly as you have explained.

 

With the help of some 'mod_rewrite gurus', I have been testing some code to go in .htaccess, the only way to _really_ force the url to be re-written without 'osCsid' in it.

 

Looks good so far, another week or so of checking the server logs, and I will post the code.

 

Do you think this should be a contribution? Even with either of the two 'spider killer' contributions, or with MS-2 and "Prevent Spider Sessions" set to true, if your site has been spidered beforehand, those session IDs are almost impossible to 'extract' out of search engine results.

 

Peter


... those session ID's are almost impossible to 'extract' out of search engine results.

 

Peter

 

 

Any luck on extracting them? I started killing sessions about 9 months ago, but Yahoo! still visits with osC ID's it harvested from between 12 and 9 months ago, before I started killing sessions. With a modified "Who's Online" I can see that when Yahoo! visits with an osC ID, that it is an old ID since looking at the visit with the aforementioned modified "Who's Online" displays NO session being stored and thus NO cart being created.

 

So, how do we get rid of all the Yahoo! URLs that contain previously harvested osC IDs? Google didn't seem to have a problem dropping them in the time since I started killing osC IDs 9 months ago.

 

Thanks,

BD


Hi,

 

Any luck on extracting them?

 

I left it for a few days (after putting in the mod_rewrite), then checked 5 days of logs. All 3 spiders (msnbot, googlebot and slurp (Yahoo)) had significant activity on the site during that period. Unfortunately, even though there was a decrease in the number of SIDs used by the spiders, Yahoo still continued to have a 'high volume' of GETs on those old URLs. :(

 

I asked a guru on the subject, and a minor bug was fixed. I'm about to look at the logs again in the next day or so; I'm expecting zero log entries for those 3 spiders where the osCsid is used.

 

Peter


Ahh,

So you "301" the GET with the unwanted osC ID (redirect it) to the same URL minus the osC ID? Then Yahoo! is smart enough to drop the original GET's URL and replace it in its database with the alternative URL (the one without the osC ID) that was sent with the 301?

 

Is there a check to make sure that the GET was requested by a bot (either by checking the User Agent against the "spiders.txt" file or by known bot IP)?

 

Sounds great. Boy would this help end an unwanted problem,

BD


Hi,

 

Ahh,

So you "301" the GET with the unwanted osC ID (redirect it) to the same URL minus the osC ID? Then Yahoo! is smart enough to drop the original GET's URL and replace it in its database with the alternative URL (the one without the osC ID) that was sent with the 301?

 

Yes, that is my understanding of it. There is a nice short article here on the '301', a permanent move.
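
For anyone following along, here is a minimal sketch of what such a 301 might look like in .htaccess. It is only an illustration, not the code being tested in this thread. It assumes the shop sits in the document root and that the session parameter is named osCsid. As written it has no user-agent check, so it would also strip the session ID from cookie-less customers and should not be used without one.

```apache
RewriteEngine On

# Match a query string that contains osCsid, capturing whatever
# comes before it (%1) and after it (%2).
RewriteCond %{QUERY_STRING} ^(.*&)?osCsid=[^&]*&?(.*)$
# Permanently redirect to the same path with osCsid removed.
# If osCsid was the only parameter, the bare trailing "?" drops
# the query string altogether. If osCsid was the last of several
# parameters, a harmless trailing "&" is left behind.
RewriteRule ^(.*)$ /$1?%1%2 [R=301,L]
```

The idea is that a spider requesting an old URL gets pointed at the clean one and, over time, should replace the stored URL with the redirect target.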

 

Is there a check to make sure that the GET was requested by a bot (either by checking the User Agent against the "spiders.txt" file or by known bot IP)?

 

Well, as far as we could tell, there were only 3 spiders that continued to use the old URL's:

 

msnbot

googlebot

slurp(yahoo)

 

So, rather than make the mod_rewrite lengthy and always needing maintenance, I only added those 3 spiders. The objective is to 'force' them to do a re-index, so only add the bot/spider that is still doing this. The two lines for that part are just:

 

# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]

 

I personally don't like using NOTs, but I was assured that doing it that way was the most efficient in the long run, in terms of code maintenance, etc. It is just stating the 'condition' that if the bot is NOT msnbot, slurp, or googlebot, then skip the next 2 rules, which do all the "get rid of the osCsid" work.
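
To illustrate the maintenance argument: a RewriteCond only applies to the RewriteRule directly after it, so the positive form needs the spider list repeated in front of every rule. The two osCsid rules below are hypothetical stand-ins (the real ones have not been posted), and the sketch assumes the shop is in the document root.

```apache
RewriteEngine On

# Positive form: the spider condition has to be repeated before
# each rule it should guard.

# Hypothetical rule 1: osCsid is the only parameter.
RewriteCond %{HTTP_USER_AGENT} (msnbot|slurp|googlebot) [NC]
RewriteCond %{QUERY_STRING} ^osCsid=[^&]*$
RewriteRule ^(.*)$ /$1? [R=301,L]

# Hypothetical rule 2: osCsid sits among other parameters.
# %1 and %2 refer to the last matched RewriteCond (the query string).
RewriteCond %{HTTP_USER_AGENT} (msnbot|slurp|googlebot) [NC]
RewriteCond %{QUERY_STRING} ^(.*&)?osCsid=[^&]*&?(.*)$
RewriteRule ^(.*)$ /$1?%1%2 [R=301,L]
```

With the negated condition and the skip flag, the spider list lives in one place, so adding or removing a bot means editing a single line.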

 

Peter


Hi,

 

Word up!

If it works for you, please post the full code somewhere so that it can be verified.

 

I wouldn't like to post it until I'm 100% certain it does work and does what it is meant to do. The problem (spiders using old URLs) is important enough for it to be made a contrib, even though it's only about 10 lines of Apache mod_rewrite code. :D

 

Sounds like you need it now though, drop me an email if you do, and I'll send it to you.

 

Peter
