humbled

How to remove session ID appended URLs from the search engine index

A common scenario is for store owners who were not aware of the "Prevent Spider Sessions" option to have several URLs indexed by spiders with the session ID appended. This situation is troublesome, and there are a few options for handling referrals sent through the "wild" session ID URL.

 

However, the true solution to the problem is to REMOVE THE SESSION IDs from the search engine index! So, how hard is it? Pretty easy!

 

In includes/application_top.php find this code:

 

// include the language translations
 require(DIR_WS_LANGUAGES . $language . '.php');

 

Under that paste this code:

 

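  // Only spiders are affected; $spider_flag is set earlier in application_top.php
  if ($spider_flag == true) {
    // eregi() was removed in PHP 7; stripos() does the same case-insensitive check
    if (stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false) {
      // Rebuild the URL for this page with every GET parameter except the session ID
      $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
      header("HTTP/1.0 301 Moved Permanently");
      header("Location: $location"); // redirect...bye bye
      exit; // stop here so no page output follows the redirect
    }
  }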
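To make the logic above easier to test outside osCommerce, here is a minimal stand-alone sketch of the same decision in plain PHP. The function name spider_redirect_target() is hypothetical, and parse_url(), parse_str() and http_build_query() stand in for tep_href_link() and tep_get_all_get_params(). Unlike the substring check on REQUEST_URI, it looks for the session parameter itself.

```php
<?php
// Hypothetical stand-alone version of the spider redirect decision.
// Returns the path (plus query) a spider should be 301-redirected to,
// or null when no redirect is needed.
function spider_redirect_target(bool $isSpider, string $requestUri, string $sessionName): ?string
{
    if (!$isSpider) {
        return null; // regular visitors keep their session ID
    }
    $path  = parse_url($requestUri, PHP_URL_PATH) ?? '/';
    $query = parse_url($requestUri, PHP_URL_QUERY) ?? '';
    parse_str($query, $params);
    if (!array_key_exists($sessionName, $params)) {
        return null; // no session ID in the URL, nothing to clean up
    }
    unset($params[$sessionName]); // drop only the session ID
    $clean = http_build_query($params);
    return $clean === '' ? $path : $path . '?' . $clean;
}
```

The result is path-relative; to match what tep_href_link() produces you would prefix the store's HTTP_SERVER before sending it in the Location header.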

That is not going to help you much, as spiders can cache results for a very long time. There are other alternatives if you check the contributions.

What other alternatives are there in the contribution area?

 

Sending a 301 header signals the search engines to remove that entry from the index and replace it with the new one:

domain.com/index.php?osCsid=xxx => [ 301 header ] => domain.com/index.php

 

Bobby

http://www.oscommerce.com/community/contributions,952

http://www.oscommerce.com/community/contributions,2819

 

There are also threads in the Tips and Tricks forum. I personally use this one:

http://www.oscommerce.com/community/contributions,4112

because it changes the session during login/account creation, so even if a store has backlinks with the session ID appended, a new session is created. On top of that, I redirect spiders to the cookie usage page if I detect them trying to access any page that goes to SSL.

There is stiff competition among search engines, so they keep their caches for a very long time: the more content you have, the more popular you are.

Also, REQUEST_URI could be manipulated, or you may have visitors who have already stored the URL in their favorites. It also depends on the server your store is running on.

Fair enough... but what if the store is hosted on a Windows (IIS) server? Would the .htaccess method still be viable? The answer is no.

 

Allow me to address each of your mentioned contributions.

 

SID Killer

It is an urban legend that this contribution is even needed anymore. It has its roots back in the MS1 days (yes, don't let the newly registered account fool you... I've been around since TEP). Back then, the application had issues with spider sessions, but the MS2 code is reliable as long as you keep the spiders.txt file updated (thanks to Steve).

 

Spider Session Remover

Although this is a nice modification, it does not work across operating systems. Further, if AllowOverride is restricted, it may not work at all. However, I will concede that most *nix servers can use this option and that it addresses the issue most directly.

 

Session Regeneration

Nice coding and great implementation... but why not just use session_regenerate_id()?
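A minimal sketch of the session_regenerate_id() route, assuming it is called right after a successful login or account creation (the same point where the session contribution does its work). The function name refresh_session_after_login() is hypothetical, and I haven't checked how it interacts with osCommerce's tep_session_* wrappers or MySQL session storage.

```php
<?php
// Hypothetical hook, called right after a successful login or account
// creation. session_regenerate_id(true) issues a fresh ID, keeps the
// $_SESSION data, and deletes the old session, so an ID copied from an
// indexed osCsid link no longer points at this customer's session.
function refresh_session_after_login(): bool
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        return false; // no running session, nothing to regenerate
    }
    return session_regenerate_id(true);
}
```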

 

I understand that search engines keep a cached copy of the URL; however, that is what 301 headers are for :)

 

REFERENCE: RFC 2616, Status Code Definitions (section 10.3.2, 301 Moved Permanently)

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references...

 

Regarding your comment about manipulating $_SERVER globals: please provide an example of an exploit capable of manipulating this global.

As for customers and other non-spider visitors, the code does not affect them, because the very first conditional verifies that the visitor is a bot.
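That first conditional relies on $spider_flag. As far as I recall, MS2 sets it by lowercasing the visitor's User-Agent and checking each line of includes/spiders.txt against it as a substring, which is why keeping that file updated matters. A minimal sketch of that kind of check, with a hypothetical is_spider() and a caller-supplied list standing in for spiders.txt:

```php
<?php
// Sketch of substring-based spider detection: lowercase the User-Agent and
// report a spider if any (lowercased) list entry appears in it.
function is_spider(string $userAgent, array $spiderEntries): bool
{
    $ua = strtolower($userAgent);
    if ($ua === '') {
        return false; // no User-Agent header, treat as a regular visitor
    }
    foreach ($spiderEntries as $entry) {
        $entry = strtolower(trim($entry)); // file() keeps the trailing newline
        // Skip blank lines: strpos() with an empty needle matches anything in PHP 8
        if ($entry !== '' && strpos($ua, $entry) !== false) {
            return true;
        }
    }
    return false;
}
```

In a store this might be called as is_spider($_SERVER['HTTP_USER_AGENT'] ?? '', file(DIR_WS_INCLUDES . 'spiders.txt')).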

Unfortunately, Google views this as cloaking and bans your site, because you are returning output to a spider that is different from the output the user sees. Believe me, I am talking from experience here. The instructions in the contributions linked earlier in this thread will get you kicked out of Google.

Nonsense.


Treasurer MFC

I'm not convinced :-" I don't think the concern is nonsense. Not that I know a lot, but I think handling this in a search-engine-friendly way is a big issue, and if anything, we need to look at what the search engines themselves say about it and what the best way to go would be.

Can anyone provide a reference? Has anyone asked Matt Cutts or someone similar?

If you were talking from a little more experience, you would know that these facilities are there to prevent spiders from obtaining sessions and to remove already-indexed links with session IDs from search engine indexes.

Both are a must, and both have absolutely nothing to do with providing altered content specifically targeted at search engines.

As for "has anyone asked Matt Cutts or similar": I suggest you take this on your plate and get back to us all.


Treasurer MFC

Regarding "Google views this as cloaking and bans your site": I don't think so, but if you think it will help, go ahead and compromise your customers' private info by exposing sessions to search engines. See what happens.

OK, I'm getting mixed reports on this. Could someone kindly tell me what one has to do to 1) get rid of the osCsid URLs already listed and 2) keep future ones from being listed?

As far as code goes, I have in place what boxtel was kind enough to share with us in this thread: http://forums.oscommerce.com/index.php?sho...&st=0

Is this everything I need, or do I need a contribution of some sort?

 

Thanks,

Eric


Eric

 

Keep up on osCommerce changes and updates at

Github | Understand osCommerce a little further at OsCommerce Documentation | Copy and paste your error message in Google add "in osCommerce" at the end to get relevant answers to most issues.

So does this work? Do I need anything else to go along with it to get the already-indexed links removed from the search engines?

I also need something like a redirect, so that when a link from a search engine is clicked, the customer is given a new session ID.

 

Anyone??

 

Thanks,

Eric


Hi Eric, use this in your .htaccess:

 

# Rewrite rules must be switched on first
RewriteEngine On

# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]

# The leading "/" in the targets assumes this .htaccess is in the document root
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) /$1?%1&%2 [R=301,L]

# case: leading-only, trailing-only or no additional parameters
# (only one of %1/%2 is non-empty; a bare trailing "?" drops the query string)
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) /$1?%1%2 [R=301,L]
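The two query-string patterns can be checked in PHP before they go live. This sketch (hypothetical strip_oscsid_query()) mirrors the two RewriteCond patterns, written with "|" as the alternation and combining %1 and %2 so trailing parameters survive when osCsid comes first. It reproduces only the query-string rewriting, not the user-agent skip or the redirect itself.

```php
<?php
// PHP mirror of the two RewriteCond patterns for osCsid.
// Returns the cleaned query string, or null if no osCsid is present.
function strip_oscsid_query(string $query): ?string
{
    // case: leading and trailing parameters -> keep both sides
    if (preg_match('/^(.+)&osCsid=[0-9a-z]+&(.+)$/i', $query, $m)) {
        return $m[1] . '&' . $m[2];
    }
    // case: leading-only, trailing-only or no other parameters
    if (preg_match('/^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$/i', $query, $m)) {
        // Only one of the two groups captured anything, like %1 and %2.
        return ($m[1] ?? '') . ($m[2] ?? '');
    }
    return null; // no session ID: no redirect
}
```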

At what point in time would one remove the spider redirect code from the first post (the block pasted under the language include in includes/application_top.php)? I added it way back when I first set up the site. Should I remove it now? I don't have any pages indexed with the osCsid any longer.

Can someone confirm whether the code in the first post actually works?

It would be far easier, I think, just to add this to your robots.txt file (more can be found by searching Google for robots.txt):

User-agent: *

Disallow: /osCid.*

The .* is a wildcard for everything.

Also see the post above.


Eric

 

Keep up on osCommerce changes and updates at

Github | Understand osCommerce a little further at OsCommerce Documentation | Copy and paste your error message in Google add "in osCommerce" at the end to get relevant answers to most issues.

Share this post


Link to post
Share on other sites

Unfortunately it's not that easy. Your code will block the search engines from accessing a file named osCid.* in the root of the site.

 

Regards

Jim


See my profile for a list of my addons and ways to get support.
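To illustrate Jim's point: robots.txt rules are matched from the start of the URL path, so "/osCid.*" only covers URLs that begin with /osCid. (the "." is literal). The sketch below (hypothetical robots_rule_matches()) follows the matching rules written down in RFC 9309; "*" and a trailing "$" are only honoured by crawlers that support them, and real crawlers add details such as percent-encoding normalisation that are left out here.

```php
<?php
// Sketch of robots.txt path matching as described in RFC 9309: a rule is a
// prefix of the URL path (query string included), "*" matches any run of
// characters, and a trailing "$" anchors the rule to the end of the URL.
function robots_rule_matches(string $rule, string $pathAndQuery): bool
{
    if ($rule === '') {
        return false; // an empty Disallow value blocks nothing
    }
    $anchored = substr($rule, -1) === '$';
    if ($anchored) {
        $rule = substr($rule, 0, -1);
    }
    // Escape regex metacharacters, then turn each escaped "*" into ".*".
    $regex = str_replace('\*', '.*', preg_quote($rule, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $pathAndQuery) === 1;
}
```

A rule such as /*osCsid= is what would match the session parameter anywhere in the URL. Keep in mind that a blocked URL is not fetched at all, so a crawler will never see a 301 served on it.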

Just sign up for Google Webmaster Central, add your site, and then add osCsid to the "ignore" list.

 

Note: this will affect Google search listings only

Is there something (a setting, contribution, code change, etc.) that will remove the osCsid from the URLs?

I have been looking for an answer to this, and all I find are confusing posts going back and forth about whether post "A" will work, whether post "B" is better, or whether post "A" will get you banned from Google.

In Admin > Configuration, I have the Session setup so that "Prevent Spider Sessions" and "Recreate Sessions" are set to true. But I still have the osCsid at the end of my URLs.

*Edit*

I just looked at the URLs that Google has for my items (in my Google merchant account), and they show the URL fine (without the osCsid at the end).

But when I browse my site, I see the osCsid in my URLs.

There is a way to get rid of the osCsid in the actual URLs, right? I don't know if Google is ignoring them or what, but my concern is that they still appear when browsing the site.

You just need to set up your configure file correctly: http://forums.oscomm...topic=193738=

I was told this in another thread (I think by you), and I have gone through that thread (and just did it again), but I don't see anything set wrong.

Is there a specific line/setting I should be focusing on? I just get overwhelmed looking at all the similar lines and URL info. Maybe I am continually missing something.

It depends on what you are trying to accomplish.

If you merely want to change the session name osCsid to something like sid or id, that is on line 133 of application_top.php:

// set the session name and save path
 tep_session_name('osCsid');  // <<<<--------- this line here
 tep_session_save_path(SESSION_WRITE_DIRECTORY);
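If you do rename it, note that the wrapper is only a thin layer over PHP's session_name(). A minimal sketch, assuming MS2's tep_session_name() in includes/functions/sessions.php simply forwards to session_name() (that is how I remember it; check your copy). my_session_name() is a hypothetical stand-in. The name has to be set before the session starts, which is why that line sits above the session start-up code.

```php
<?php
// Sketch of a tep_session_name()-style wrapper: with an argument it sets the
// session name (returning the previous one), without one it reads it back.
// session_name() must be called before the session is started.
function my_session_name(string $name = '')
{
    return $name !== '' ? session_name($name) : session_name();
}
```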

If you want to remove the SID altogether, use one of the SEO URL add-ons.

 

I always recommend locking down a site so only I can view it, if I am unable to develop it locally. To do this, use this bit in your .htaccess file where your catalog's index.php is located. (I also use this to add another layer of security to the admin folder.)

# Apache 2.2 access syntax (Order/Deny/Allow); on Apache 2.4 the equivalent is "Require ip"
AuthUserFile /dev/null
AuthGroupFile /dev/null
AuthName "Access Control"
AuthType Basic
order deny,allow
deny from all
# IP address of my 2nd home computer
allow from xxx.xxx.xxx.xxx
# IP addresses of my two work computers
allow from 00.000.000.000

Like I said at the beginning, though, it's all in what you're trying to accomplish.

 

Hope this helps some


Eric

 
