Ian

[Contribution] Googlebot/Spider session id killer


Does this code:

 

"if (basename($_SERVER['HTTP_REFERER']) == 'allprods.php' ) $kill_sid = true;

if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered('kill_sid') ) ) $kill_sid = true;

if (basename($PHP_SELF) == FILENAME_LOGIN ) $kill_sid = false;"

 

replace this line:

 

if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered($kill_sid) ) ) $kill_sid = true;

 

?


Do I have to make changes in html_output.php? Every time I change this file I get the error below:

Parse error: parse error, expecting `')'' in /home/quickly/public_html/shop/includes/functions/html_output.php on line 269

But when I checked the code I did not see any problem.


I can't type a trick like i can type.
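
A parse error of the form expecting `')'` usually means that a parenthesis was opened and never closed on, or shortly before, the reported line. As a rough aid, here is a minimal sketch that counts the difference on a single line. The helper name `paren_balance` is made up for this illustration, it assumes PHP 7 or later, and it does not understand string literals or comments, so treat its result only as a hint.

```php
<?php
// Hypothetical helper: number of "(" minus number of ")" in one line.
// It ignores strings and comments, so it is a hint, not a parser.
function paren_balance(string $line): int
{
    return substr_count($line, '(') - substr_count($line, ')');
}

// A line with one parenthesis left open:
echo paren_balance('if ( (isset($sid) ) {'), "\n";   // prints 1
// A balanced line:
echo paren_balance('if ( (isset($sid) ) && ( !$kill_sid ) ) {'), "\n";   // prints 0
```

A non-zero result on a hand-edited line is a good place to start looking.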


I use phpbb, and recently I made this change to my forum:

 

#-----[ OPEN  ]------------------------------------------ 

includes/sessions.php 



#-----[ FIND ]------------------------------------------ 

  global $SID; 



  if ( !empty($SID) && !eregi('sid=', $url) ) 



#-----[ REPLACE WITH ]------------------------------------------ 

  global $SID, $HTTP_SERVER_VARS; 



  if ( !empty($SID) && !eregi('sid=', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))

 

I'm new to osCommerce and haven't dabbled in the code yet, but this change enabled my entire forum to get crawled when it couldn't before.

 

Maybe this will apply here.
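
The user-agent test in the phpBB change above can be sketched as a small standalone function. This is only an illustration, assuming PHP 7 or later: the name `is_spider` and the marker list are assumptions (the markers are the two strings from the post), and the match is case-insensitive, unlike the original `strstr()` calls.

```php
<?php
// Hypothetical helper: true when the User-Agent contains a known spider marker.
// The default markers are the two strings used in the phpBB change above.
function is_spider(string $userAgent, array $markers = ['Googlebot', 'slurp@inktomi.com']): bool
{
    foreach ($markers as $marker) {
        // stripos() is case-insensitive and returns false when nothing is found.
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Append the session id only for visitors that do not look like spiders.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$appendSid = !is_spider($agent);
```

User-agent strings are easy to fake and the list needs maintenance, so this is a convenience rather than a guarantee.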


hi..

I'd like to make all these session ID changes suggested by Ian, but I'm working with CVS 2.1 instead of 2.2. Is there an equivalent file for html_output.php? There isn't such a file in 2.1. Unfortunately 2.1 has to stay as it is, because the shop is finished and functioning.

 

I really appreciate any help on this

 

Thanks

Miguel


Hi,

 

I've looked through this thread and the contribution section and I can't find a download. Is it possible that the different changes made in this thread could be combined in one download?

That's it for now, and a happy new year

Dominik


Maintainer German Banktransfer 0.92

The answer to the great question of life, the universe and everything is 42 (Douglas Adams)

ok, but where does $HTTP_SERVER_VARS get assigned a value ?

 

:shock:

 

OK, $HTTP_SERVER_VARS is a global PHP variable and is set by PHP! Is it that silly questions never get answered on this BB? Ah well, I'm just learning PHP; now ask me any traditionally Windows-based platform development question :wink:

 

PS: this thread is about the ALLPRODS contribution and Ian's changes (in progress) to kill the session id, so that Googlebot doesn't treat each page as different and unnecessarily increase data traffic (and therefore hosting costs).

 

I believe the answer is to point the search engine at allprods.php and suppress every other page (except perhaps further advert pages) with robots.txt. I shall try this out in the New year and report back.

The question was 1. how to ensure all products are SEO indexed and 2. how to suppress irrelevant traffic from session ids.
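
The robots.txt idea above can be sketched as a small generator. This is an illustration only, assuming PHP 7 or later: the function name `build_robots_txt` and the example paths are assumptions, and the output uses only the basic `User-agent` and `Disallow` lines. Pages that are not listed, such as allprods.php, stay crawlable.

```php
<?php
// Hypothetical helper: builds robots.txt text asking all robots to skip
// the listed paths. Anything not listed remains open to crawlers.
function build_robots_txt(array $disallowedPaths): string
{
    $lines = ['User-agent: *'];
    foreach ($disallowedPaths as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    return implode("\n", $lines) . "\n";
}

// Example paths are assumptions; list whatever pages should not be crawled.
echo build_robots_txt(['/login.php', '/shopping_cart.php', '/checkout_shipping.php']);
```

Keep in mind that `Disallow` matches by path prefix, and that robots.txt is only a request that well-behaved spiders honour.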


So if I follow, the only weakness in Ian's sid killer is the "buy now" buttons, since they are links?

Well, I have changed the "buy now" buttons from links to forms (and they work quite well), so I should be a stone wall when Googlebot or any other bot comes to visit me?


The only thing necessary for evil to flourish is for good men to do nothing

- Edmund Burke
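
To illustrate the link-to-form idea mentioned above: spiders follow links but generally do not submit forms, so a button rendered as a POST form gives them nothing to crawl. The sketch below is not the posted modification. The helper name `buy_now_form` and the URL layout (`action=buy_now`, `products_id`) are assumptions for illustration, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: renders a "buy now" button as a POST form instead of a link.
// The query-string layout is an assumption, not taken from the contribution.
function buy_now_form(string $page, int $productsId, string $label = 'Buy Now'): string
{
    $action = $page . '?action=buy_now&products_id=' . $productsId;
    // htmlspecialchars() escapes "&" and quotes so the attribute stays valid HTML.
    return '<form method="post" action="' . htmlspecialchars($action) . '">'
         . '<input type="submit" value="' . htmlspecialchars($label) . '">'
         . '</form>';
}

echo buy_now_form('default.php', 12), "\n";
```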


my actual site is not online yet... but I have made the changes to a test store.

 

http://www.allthingsnewage.com - hit it hard burt

 

also, I have posted how to make the modifications to change "buy now" buttons to forms (3 very easy and simple steps) in the Another Google thread - sorry


The only thing necessary for evil to flourish is for good men to do nothing

- Edmund Burke


Ian,

 

After your code fixed the direct login problem, it gives a sid to the spider on default.php. Will this cause the spider to get trapped?

 

David


The one session id upon the first hit is actually because of breadcrumb.

 

In application_top.php I added

$kill_sid=true;

after

$breadcrumb = new breadcrumb;

and it works fine and makes the session id go away.

This may address brandon's issue. It does generate more sids from the 2nd hit.

 

Thanks to sean's post, I tried the spider simulator below:

http://www.searchengineworld.com/cgi-bin/s.../sim_spider.cgi

You can use it to check whether any sid is still left and how your page is seen by a spider.

 

burt,

 

if you have time, could you please mimic google again for my site?

 

David
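
A local check along the same lines as the spider simulator can be sketched in a few lines: scan a page's HTML for links that still carry a session id. This is an illustration only, assuming PHP 7 or later. The helper name `find_sid_links` is made up, `osCsid` is the parameter name seen in the URLs quoted elsewhere in this thread, and the regular expression only handles quoted `href` attributes.

```php
<?php
// Hypothetical helper: returns every quoted href value in $html that still
// contains the session id parameter name.
function find_sid_links(string $html, string $sidName = 'osCsid'): array
{
    // Capture the value of each double- or single-quoted href attribute.
    preg_match_all('/href\s*=\s*(["\'])(.*?)\1/i', $html, $matches);
    $found = [];
    foreach ($matches[2] as $href) {
        if (stripos($href, $sidName) !== false) {
            $found[] = $href;
        }
    }
    return $found;
}

$page = '<a href="default.php?osCsid=abc123">Home</a> <a href="allprods.php">All products</a>';
print_r(find_sid_links($page));   // lists only default.php?osCsid=abc123
```

An empty result for a page fetched without cookies suggests the session id is no longer being added to its links.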


@m5n6

 

I tried the spider.cgi after adding Ian's session killer. The session killer works great, but I see that the spider finds no keywords. The meta tags are correct.

If I try another HTML page, the spider.cgi also shows no keywords. Is this a bug in the script or in my page?


Ian-

 

Have you made this as a contribution? I can't find it in the downloads...

 

If not, what is the code needed? I've gone through a couple threads, but there are different parts and I want to make sure that I have it all.

 

Aodhan


Hi, I'm a new user of osCommerce and new to your forum! (sorry, if my english isn't the best)

 

I have read this thread and had a new idea for letting robots in without conflicting with sids, users or other contributions. I haven't tested it yet, but let's hear what you think about it.

First, set up a new directory with a second copy of osC and apply the allprods contribution. This page with all products can be linked from the main page of your shop (mostly visible only), or you can register it explicitly with the search engines.

All other pages of osC must be disallowed in robots.txt.

The allprods copy must use a sid killer like Ian's. In the first copy of osC you don't change anything.

The first copy of osC is now used by your users, and the second (only the allprods) by the robots.

Because users find your product-info pages in search engines via the second osC, you must use absolute links there that point to the first osC. Search engines don't follow those links, because you have placed all of the first osC in robots.txt.

I hope I have explained it well. What do you think about this solution?

 

Greetings,

Dimitrios


sorry i have another question

 

The session is killed in the shop. But if I go to "Who is online" in the admin, I see a session for every user. Is this correct?

 

00:08:39 0 Guest 216.239.46.164 13:25:23 13:25:23 /shop/catalog/index.php/cPath/33/sort/2a/page/1/language/en/osCs

 

00:08:38 0 Guest 216.239.46.118 13:25:24 13:25:24 /shop/catalog/webmaster.php/osCsid/727d633fb2b9b264c14b5be475592

 

00:08:38 0 Guest 216.239.46.87 13:25:24 13:25:24 /shop/catalog/product_info.php/products_id/149/osCsid/04f7d7874c

 

I put Ian's script in the catalog html_output.php and application_top.php. Is this correct?


I must bother you with my questions once more...

Could it be that the add-on for small, medium and big images produces the sessions? See the entry from my error script below.

The image is on the server and works fine, but the script produces this error.

------------------------------------------------------------------------------

Site: xxx.com (www.xxx.com)

Error Code: 404 Missing URL (File does not exist: /www/virtual/fetish/www.xxx.com/shop/catalog/images/Harness_med.jpg/osCsid/fef23046cf6b292d237a21d0ee68d657)

Occurred: Thu Jan 9 14:44:34 EST 2003

Requested URL: /shop/catalog/images/Harness_med.jpg/osCsid/fef23046cf6b292d237a21d0ee68d657

User Address: 216.239.46.118

User Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)

Referer:

------------------------------------------------------------------------------


It is indeed disappointing that I am still not able to block session id generation even after applying Ian's mod! Could someone please help me out with my problem?

 

My site under construction is http://shop.amplescope.com

 

(It is a live site, so please don't put your cc number etc.)

 

I checked the session id through http://www.searchengineworld.com/cgi-bin/s.../sim_spider.cgi

 

Sunny

 

PS: BTW, I have also installed the latest add-on titled: Changing the "buy now" buttons from links to forms. Is that likely to cause any conflict?


What's the fun if it still looks like osCommerce!


Oh, I had misplaced the code for html_output.php ... just moved it one line higher and now it seems perfect!

 

but one link on my default.php (http://shop.amplescope.com/default.php?osCsid=d566f14ac5fcf1faa3ae394f108dab48) still seems to be generating session id ... I guess it must be from the header navigation bar. Perplexed on this.

 

:roll:

 

Sunny


What's the fun if it still looks like osCommerce!


I finally got it in place ... blame it on the splintered posts in this thread. I think I should post the whole code once more in one go (as contributed by everyone else on this thread; I am just putting the jigsaw pieces in one place to make the picture whole).

 

Step 1:

 

Add the following code just before the ?> tag at the end of /catalog/includes/application_top.php file:

 

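//================================================================
// Visitor explicitly chose a currency or language: keep the session id.
if ( !empty($HTTP_GET_VARS['currency']) ) {
  tep_session_register('kill_sid');
  $kill_sid = false;
}
if ( !empty($HTTP_GET_VARS['language']) ) {
  tep_session_register('kill_sid');
  $kill_sid = false;
}
// Arrived from the all-products page: drop the session id.
if ( isset($_SERVER['HTTP_REFERER']) && basename($_SERVER['HTTP_REFERER']) == 'allprods.php' ) $kill_sid = true;
// Not logged in, empty cart, kill_sid not registered in the session: drop the session id.
if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && ( !tep_session_is_registered('kill_sid') ) ) $kill_sid = true;
// Never drop the session id on the login page.
if ( basename($PHP_SELF) == FILENAME_LOGIN ) $kill_sid = false;
//================================================================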

Step 2:

 

Find (also in application_top.php):

 

$breadcrumb = new breadcrumb;

 

Add after the above:

 

  $kill_sid=true;

 

Step 3:

 

Now find the function tep_href_link (should be the first one) in /includes/functions/html_output.php

 

After the first line

  function tep_href_link($page = '', $parameters = '', $connection = 'NONSSL', $add_session_id = true, $search_engine_safe = true) {

 

add the line

 

    global $kill_sid;

 

Step 4:

 

Still in the same html_output.php file:

find the lines

if (isset($sid)) {

     $link .= $separator . $sid;

   }

 

change to:

 

if ( (isset($sid) ) && ( !$kill_sid ) ) { 

$link .= $separator . $sid; 

}

 

Warnings. The code above has been tested on a clean oscommerce installation. The only problem reported has been with 'USE_DEFAULT_LANGUAGE_CURRENCY' set in admin.

 

You may also want to include another mod (download here: http://www.oscommerce.com/contributions/bu...tton_v1.01.zip) that changes the buy-now buttons from links (*get* requests, the default) into forms submitted with *post*, so that they do *not generate* session ids.

Well, the story ends here, and the Kings and Queens of the osC world may now rest in peace for ever, despite a monthly visit from the dreaded Google Robot!

 

Sunny


What's the fun if it still looks like osCommerce!
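
The effect of Steps 3 and 4 above can be sketched in isolation. This is a simplified stand-in for the last part of tep_href_link(), not the real function: the name `append_sid` and its parameters are assumptions for illustration, and it assumes PHP 7.1 or later.

```php
<?php
// Simplified stand-in for the tail of tep_href_link(); not the real function.
// $sid is the "name=value" session string, or null when there is none.
function append_sid(string $link, ?string $sid, bool $killSid): string
{
    // Use "&" when the link already has a query string, "?" otherwise.
    $separator = (strpos($link, '?') !== false) ? '&' : '?';
    if (isset($sid) && !$killSid) {
        $link .= $separator . $sid;
    }
    return $link;
}

echo append_sid('product_info.php?products_id=149', 'osCsid=abc123', false), "\n";
// prints product_info.php?products_id=149&osCsid=abc123
echo append_sid('product_info.php?products_id=149', 'osCsid=abc123', true), "\n";
// prints product_info.php?products_id=149
```

With the flag set, the link comes out clean no matter whether a session string exists, which is what lets a spider see one stable URL per page.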


correction:

 

after line

$breadcrumb = new breadcrumb;

 

please change

 

$kill_sid=true;

 

to

 

if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered('kill_sid') ) ) {$kill_sid = true;}

 

This will survive the 1st round spider hit.

 

But the login, my account and checkout links on the default page generate a sid on the 2nd round.

I have changed them to forms, and they test fine with the simulator. I will post the change later.

 

regards

 

david


What might be better is to move all of my code

 

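//================================================================
// Visitor explicitly chose a currency or language: keep the session id.
if ( !empty($HTTP_GET_VARS['currency']) ) {
  tep_session_register('kill_sid');
  $kill_sid = false;
}
if ( !empty($HTTP_GET_VARS['language']) ) {
  tep_session_register('kill_sid');
  $kill_sid = false;
}
// Arrived from the all-products page: drop the session id.
if ( isset($_SERVER['HTTP_REFERER']) && basename($_SERVER['HTTP_REFERER']) == 'allprods.php' ) $kill_sid = true;
// Not logged in, empty cart, kill_sid not registered in the session: drop the session id.
if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && ( !tep_session_is_registered('kill_sid') ) ) $kill_sid = true;
// Never drop the session id on the login page.
if ( basename($PHP_SELF) == FILENAME_LOGIN ) $kill_sid = false;
//================================================================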

 

to after

 

 $breadcrumb = new breadcrumb;


Trust me, I'm an Accountant.
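
For anyone who wants to reason about the rules in the block above without a running shop, they can be restated as a pure function. This is a sketch under stated assumptions, not part of the contribution: the name `should_kill_sid` and its parameters are made up, `'login.php'` stands in for FILENAME_LOGIN, and it assumes PHP 7 or later. The order of the checks mirrors the block above.

```php
<?php
// Hypothetical pure-function version of the kill_sid rules, for experimenting only.
// The parameters replace the osCommerce globals and session lookups.
function should_kill_sid(
    bool $loggedIn,
    int $cartCount,
    bool $killSidRegistered,   // was kill_sid registered in the session earlier?
    array $get,
    string $referer,
    string $page
): bool {
    $killSid = false;
    // Choosing a currency or language registers kill_sid and keeps the session id.
    if (!empty($get['currency']) || !empty($get['language'])) {
        $killSidRegistered = true;
        $killSid = false;
    }
    // Arriving from the all-products page drops the session id.
    if (basename($referer) === 'allprods.php') {
        $killSid = true;
    }
    // Anonymous visitor with an empty cart and no registered kill_sid.
    if (!$loggedIn && $cartCount === 0 && !$killSidRegistered) {
        $killSid = true;
    }
    // 'login.php' is an assumed value for FILENAME_LOGIN.
    if (basename($page) === 'login.php') {
        $killSid = false;
    }
    return $killSid;
}

var_dump(should_kill_sid(false, 0, false, [], '', 'default.php'));   // bool(true)
var_dump(should_kill_sid(true, 0, false, [], '', 'default.php'));    // bool(false)
var_dump(should_kill_sid(false, 0, false, [], '', 'login.php'));     // bool(false)
```

The first call is the spider case: no login, empty cart, nothing registered, so links are generated without a session id.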
