
How to remove session ID appended URLs from the search engine index


23 replies to this topic

#1 humbled

  • Banned
  • 23 posts
  • Real Name:B.E.

Posted 10 July 2006, 19:02

A common scenario is for store owners that were not aware of the "Prevent Spider Sessions" option to have several URLs indexed by spiders with the session ID appended. This situation is troublesome and there are a few options to handle referrals sent through the "wild" session ID URL.

However, the true solution to the problem is to REMOVE THE SESSION IDs from the search engine index! So, how hard is it? Pretty easy!

In includes/application_top.php find this code:

// include the language translations
  require(DIR_WS_LANGUAGES . $language . '.php');

Under that paste this code:

 if ( $spider_flag == true ){
  // stripos() replaces eregi(), which was removed in PHP 7
  if ( stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false ){
   $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
   header("HTTP/1.0 301 Moved Permanently");
   header("Location: $location"); // redirect...bye bye
   exit; // stop here so the page is not rendered after the redirect
  }
 }
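
For anyone who wants to check what URL the spider ends up at without a full osCommerce install, here is a minimal standalone sketch of the parameter-stripping step. The function name strip_session_param() is hypothetical and not part of osCommerce; in the store itself, tep_get_all_get_params() and tep_href_link() do this job. It assumes the session parameter is named osCsid.

```php
<?php
// Hypothetical helper (not part of osCommerce): drop the session parameter
// from a request URI and return the cleaned path plus query string.
function strip_session_param(string $requestUri, string $sessionName = 'osCsid'): string
{
    $parts = parse_url($requestUri);
    if ($parts === false) {
        return $requestUri; // seriously malformed URI: leave it untouched
    }
    $path = $parts['path'] ?? '/';

    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }

    // Compare the parameter name case-insensitively.
    foreach (array_keys($params) as $key) {
        if (strcasecmp((string) $key, $sessionName) === 0) {
            unset($params[$key]);
        }
    }

    $query = http_build_query($params);
    return $query === '' ? $path : $path . '?' . $query;
}
```

With this, /index.php?osCsid=xxx maps to /index.php, and other parameters such as products_id are kept in the redirect target.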


#2 enigma1

  • Community Member
  • 8,206 posts
  • Real Name:Mark Samios

Posted 10 July 2006, 21:37

That is not going to help you much, as spiders can cache results for a very long time. There are other alternatives if you check the contributions.

#3 humbled

  • Banned
  • 23 posts
  • Real Name:B.E.

Posted 10 July 2006, 21:41

What other alternatives are there in the contribution area?

By sending a 301 header it signals the search engines to remove that entry in the index and replace it with the new one.

domain.com/index.php?osCsid=xxx => [ 301 header ] => domain.com/index.php

Bobby

#4 enigma1

  • Community Member
  • 8,206 posts
  • Real Name:Mark Samios

Posted 10 July 2006, 22:14

http://www.oscommerce.com/community/contributions,952
http://www.oscommerce.com/community/contributions,2819

There are also threads in the tips and tricks forum. I personally use this one
http://www.oscommerce.com/community/contributions,4112
because it alters the session during login/create account, so even if a store has backlinks with the session appended, a new session is created. On top of everything else, I redirect spiders to the cookie usage page if I detect that they are trying to access any page that goes to SSL.

There is stiff competition among search engines, so they keep the cache for a very long time. The more content you have, the more popular you are, you see.

Also, the REQUEST_URI could be manipulated, or you may have visitors who already stored the URL in their favorites. It also depends on the server your store is running on.

Edited by enigma1, 10 July 2006, 22:17.


#5 humbled

  • Banned
  • 23 posts
  • Real Name:B.E.

Posted 10 July 2006, 22:39

Fair enough...but what if the store is hosted on a Windows (IIS) server? Would the htaccess method still be viable? The answer is no.

Allow me to address each of your mentioned contributions.

SID Killer
It is an urban legend that this contribution is even needed anymore. It has its roots back in the MS1 days (yes, don't let the newly registered account fool you...I've been around since TEP). Back then the application had issues with spider sessions, but the MS2 code is reliable as long as you keep the spiders.txt file updated (thanks to Steve).

Spider Session Remover
Although this is a nice modification, it does not work cross-OS. Further, if there are restrictions with AllowOverride it may not work at all. However, I will concede that most *NIX servers can use this option, and that it addresses the issue most directly.

Session Regeneration
Nice coding and great implementation...but why not just use
session_regenerate_id()?

I understand that SEs keep a cached copy of the URL; however, that is what 301 headers are for :)

REFERENCE: RFC 2616 Status Code Definitions

RFC 2616, section 10.3.2, says:

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references...

With respect to your comment of manipulating SERVER global vars...please provide an example for an exploit that is capable of manipulating this global.

With respect to customers or non-spider visitors the code would not affect them as the very first conditional verifies that it is a bot.
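
To illustrate that "very first conditional": $spider_flag is set earlier in application_top.php by comparing the visitor's user agent against the entries in spiders.txt. Below is a rough standalone sketch of that kind of check. The function is_spider() and the sample signatures are assumptions for illustration, not the actual osCommerce code.

```php
<?php
// Hypothetical helper: returns true when the user agent contains any of the
// given spider signatures (case-insensitive substring match).
function is_spider(string $userAgent, array $signatures): bool
{
    $userAgent = strtolower($userAgent);
    if ($userAgent === '') {
        return false; // no user agent, nothing to match against
    }
    foreach ($signatures as $signature) {
        $signature = strtolower(trim($signature));
        if ($signature !== '' && strpos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Sample entries only; a real list would be read from a file such as spiders.txt.
$signatures = ['googlebot', 'slurp', 'msnbot'];
```

A regular browser user agent matches none of the signatures, so the redirect block is skipped for customers. The check is only as good as the signature list, which is why keeping spiders.txt updated matters.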

#6 skylla

  • Community Member
  • 274 posts
  • Real Name:Sjaak
  • Location:Edinburgh, Scotland

Posted 27 July 2006, 20:10

Elvis!

#7 Chance

  • Community Member
  • 566 posts
  • Real Name:Jason Chance
  • Location:Atlanta, GA, USA

Posted 28 July 2006, 17:11

Moved to Tips and Tricks forum.

#8 pedrovitch

  • Community Member
  • 1 posts
  • Real Name:Peter Lorimer

Posted 22 August 2006, 10:39

Unfortunately, Google views this as cloaking and bans your site because you are returning output to a spider that is different from the output that the user sees. Believe me, I am talking from experience here. The instructions in the links below will get you kicked out of Google.

enigma1, on Jul 10 2006, 11:14 PM, said:

http://www.oscommerce.com/community/contributions,952
http://www.oscommerce.com/community/contributions,2819

There are also threads in the tips and tricks forum. I personally use this one
http://www.oscommerce.com/community/contributions,4112
because it alters the session during login/create account, so even if a store has backlinks with the session appended, a new session is created. On top of everything else, I redirect spiders to the cookie usage page if I detect that they are trying to access any page that goes to SSL.

There is stiff competition among search engines, so they keep the cache for a very long time. The more content you have, the more popular you are, you see.

Also, the REQUEST_URI could be manipulated, or you may have visitors who already stored the URL in their favorites. It also depends on the server your store is running on.


#9 boxtel

  • Community Member
  • 5,237 posts
  • Real Name:amanda
  • Location:Taipei, Taiwan

Posted 22 August 2006, 12:32

pedrovitch, on Aug 22 2006, 06:39 PM, said:

Unfortunately, Google views this as cloaking and bans your site because you are returning output to a spider that is different from the output that the user sees. Believe me, I am talking from experience here. The instructions in the links below will get you kicked out of Google.

nonsense.
Treasurer MFC

#10 excell

  • Community Member
  • 6 posts
  • Real Name:Excell Ent

Posted 22 August 2006, 13:09

boxtel, on Aug 22 2006, 10:32 PM, said:

nonsense.

I'm not convinced :-" - I don't think the concern is nonsense - not that I know a lot, but I think this is a big issue for how to handle such things in a search engine friendly way and if anything, we need to look at what they would say about it and what would be the best way to go...

Can anyone provide a reference? Has anyone asked Matt Cutts or similar?

#11 boxtel

  • Community Member
  • 5,237 posts
  • Real Name:amanda
  • Location:Taipei, Taiwan

Posted 22 August 2006, 16:16

excell, on Aug 22 2006, 09:09 PM, said:

I'm not convinced :-" - I don't think the concern is nonsense - not that I know a lot, but I think this is a big issue for how to handle such things in a search engine friendly way and if anything, we need to look at what they would say about it and what would be the best way to go...

Can anyone provide a reference? Has anyone asked Matt Cutts or similar?

If you were talking from a little more experience, you would know that these facilities are there to prevent spiders from obtaining sessions and to remove already indexed links with session IDs from SE indexes.
Both are a must, and both have absolutely nothing to do with providing altered content specifically targeted at search engines.


Quote

has anyone asked Matt Cutts or similar

I suggest you take this on yourself and get back to us all.
Treasurer MFC

#12 enigma1

  • Community Member
  • 8,206 posts
  • Real Name:Mark Samios

Posted 24 August 2006, 00:59

pedrovitch, on Aug 22 2006, 05:39 AM, said:

Unfortunately, Google views this as cloaking and bans your site because you are returning output to a spider that is different from the output that the user sees. Believe me, I am talking from experience here. The instructions in the links below will get you kicked out of Google.

I don't think so, but if you think it will help you, go ahead and compromise your customers' private info by exposing sessions to search engines. See what happens.

#13 matrix2223

  • Community Member
  • 859 posts
  • Real Name:Eric
  • Gender:Male
  • Location:MD USA

Posted 01 September 2006, 22:21

OK, I'm getting mixed reports on this. Could someone kindly tell me what one has to do to 1) get rid of the osCsids already listed and 2) keep future ones from being listed?

As far as coding goes I have this in place that boxtel was kind enough to share with us on this thread: http://forums.oscommerce.com/index.php?sho...&st=0&#

Is this everything I need or do I need a contrib of some sort???

Thanks,
Eric
Eric

Keep up on osCommerce changes and updates at Github | Understand osCommerce a little further at OsCommerce Documentation | Copy and paste your error message in Google add "in osCommerce" at the end to get relevant answers to most issues.


#14 matrix2223

  • Community Member
  • 859 posts
  • Real Name:Eric
  • Gender:Male
  • Location:MD USA

Posted 07 September 2006, 00:35

So does this work? Do I need anything else to go along with this to get the already indexed links unlisted in the SEs?

I also need a redirect of sorts, so that when a link from the SE is clicked, the customer gets a new ID.

Anyone??

Thanks,
Eric


#15 Debs

  • Community Member
  • 133 posts
  • Real Name:Debs
  • Gender:Female
  • Location:Fargo, ND UNITED STATES

Posted 23 September 2006, 18:15

matrix2223, on Sep 7 2006, 12:35 AM, said:

So does this work? Do I need anything else to go along with this to get the already indexed links unlisted in the SEs?

I also need a redirect of sorts, so that when a link from the SE is clicked, the customer gets a new ID.

Anyone??

Thanks,
Eric


////////////////////////////////////////////

Hi Eric, use this in your .htaccess:

# Skip the next two RewriteRules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]

# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCSid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
# (%1 is set by the leading-only branch, %2 by the trailing-only branch)
RewriteCond %{QUERY_STRING} ^(.+)&osCSid=[0-9a-z]+$|^osCSid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1%2 [R=301,L]

#16 theantiquestore

  • Community Member
  • 584 posts
  • Real Name:Rachael
  • Gender:Female

Posted 03 June 2007, 22:32

At what point in time would one remove this code from their application_top?

Quote

In includes/application_top.php find this code:

CODE
// include the language translations
require(DIR_WS_LANGUAGES . $language . '.php');


Under that paste this code:

CODE
 if ( $spider_flag == true ){
  // stripos() replaces eregi(), which was removed in PHP 7
  if ( stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false ){
   $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
   header("HTTP/1.0 301 Moved Permanently");
   header("Location: $location"); // redirect...bye bye
   exit; // stop here so the page is not rendered after the redirect
  }
 }

I added this way back when I first set up the site. Should I remove it now? I don't have any pages indexed with the osCsid any longer.

Edited by theantiquestore, 03 June 2007, 22:32.


#17 troubleshooter2000

  • Community Member
  • 30 posts
  • Real Name:SS
  • Gender:Male

Posted 19 August 2010, 20:54

humbled, on 10 July 2006, 19:02, said:

A common scenario is for store owners that were not aware of the "Prevent Spider Sessions" option to have several URLs indexed by spiders with the session ID appended. This situation is troublesome and there are a few options to handle referrals sent through the "wild" session ID URL.

However, the true solution to the problem is to REMOVE THE SESSION IDs from the search engine index! So, how hard is it? Pretty easy!

In includes/application_top.php find this code:

// include the language translations
  require(DIR_WS_LANGUAGES . $language . '.php');

Under that paste this code:

 if ( $spider_flag == true ){
  // stripos() replaces eregi(), which was removed in PHP 7
  if ( stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false ){
   $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
   header("HTTP/1.0 301 Moved Permanently");
   header("Location: $location"); // redirect...bye bye
   exit; // stop here so the page is not rendered after the redirect
  }
 }

Can someone confirm whether this actually works?

#18 matrix2223

  • Community Member
  • 859 posts
  • Real Name:Eric
  • Gender:Male
  • Location:MD USA

Posted 16 August 2011, 22:49

It would be far easier, I think, just to add this to your robots.txt file. More can be found by searching Google for robots.txt.

User-agent: *
Disallow: /osCid.*

The .* is a wild card for everything

Also see above post as well
Eric



#19 kymation

  • Community Sponsor
  • 5,657 posts
  • Real Name:Jim Keebaugh
  • Gender:Male
  • Location:Aberdeen WA USA

Posted 17 August 2011, 03:07

Unfortunately it's not that easy. Your code will block the search engines from accessing a file named osCid.* in the root of the site.

Regards
Jim
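
To show why that rule misses the session ID URLs: under the original robots.txt convention, a Disallow value is matched as a literal prefix of the URL path, so ".*" is not treated as a regular expression. The sketch below models only that basic prefix rule. The function robots_rule_blocks() is hypothetical, and wildcard extensions that some crawlers support are deliberately not modeled.

```php
<?php
// Hypothetical helper: basic robots.txt semantics, where a Disallow value is
// a literal prefix of the URL path and "." or "*" have no special meaning.
function robots_rule_blocks(string $disallow, string $urlPath): bool
{
    if ($disallow === '') {
        return false; // an empty Disallow blocks nothing
    }
    return strncmp($urlPath, $disallow, strlen($disallow)) === 0;
}

// The rule from the post above only matches paths that literally start
// with "/osCid.*", not URLs that carry the session ID as a query parameter.
$blocksSessionUrl = robots_rule_blocks('/osCid.*', '/index.php?osCsid=abc123');
$blocksLiteral    = robots_rule_blocks('/osCid.*', '/osCid.*');
```

Here $blocksSessionUrl is false and $blocksLiteral is true, which is the behavior described above: the session ID URLs stay crawlable.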

#20 toyicebear

  • Community Sponsor
  • 5,692 posts
  • Real Name:Nick
  • Location:World Citizen

Posted 17 August 2011, 03:20

Just sign up at Google webmaster central, add your site, and then add osCsid to the "ignore" list.

Note: this will affect Google search listings only

Edited by toyicebear, 17 August 2011, 03:21.
