Remove & Prevent duplicate content with the canonical tag


spooks


Sam's Remove & Prevent duplicate content with the canonical tag

 

This script dynamically reads the query string of the requested page, removes any parameters that cause (or may cause) duplicate content issues with search engines, and generates a canonical tag from the modified query string. Search engines will effectively treat this as a 301 redirect; there is no effect on the page or URL for your visitors.

 

  • Produces a correctly formatted canonical tag based on your current URL (SEO or otherwise).
  • Search engines will modify their index according to the tag, removing any duplicates.
  • Any ranking spread caused by the duplicates will be consolidated onto the correct page.
  • Any unlisted query string params remain unaffected, other than ensuring a correctly formatted string.
  • Very simple, short code addition.
  • Optionally removes duplicate content issues with SSL pages as well.
  • Any instances of the osCsid param will be removed.
  • Params removed by default: currency, language, page, sort, ref, affiliate_banner_id and osCsid.
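The behaviour listed above can be illustrated with a minimal sketch. This is not the contribution's actual code: the function name canonical_tag_sketch and the example URL are made up for illustration, and it assumes an absolute URL without a port number.

```php
<?php
// Hypothetical helper, NOT the contribution's code: strip the listed params
// from a URL and return a canonical link tag for what is left.
function canonical_tag_sketch( $url, $remove = array( 'currency', 'language', 'page', 'sort', 'ref', 'affiliate_banner_id', 'osCsid' ) ) {
  $parts  = parse_url( $url );
  $params = array();
  if ( isset( $parts['query'] ) ) {
    parse_str( $parts['query'], $params ); // query string -> array
  }
  foreach ( $remove as $key ) {
    unset( $params[$key] ); // drop params that only create duplicates
  }
  $canonical = $parts['scheme'] . '://' . $parts['host'] . ( isset( $parts['path'] ) ? $parts['path'] : '/' );
  if ( !empty( $params ) ) {
    $canonical .= '?' . http_build_query( $params, '', '&' );
  }
  // Escape for use inside an HTML attribute ( & becomes &amp; )
  return '<link rel="canonical" href="' . htmlspecialchars( $canonical ) . '" />';
}

echo canonical_tag_sketch( 'http://www.mystore.com/index.php?cPath=3_10&sort=2a&page=2&osCsid=abc123' ), PHP_EOL;
// prints: <link rel="canonical" href="http://www.mystore.com/index.php?cPath=3_10" />
```

Unlisted params such as cPath pass through untouched, which is the "remain unaffected" point in the list.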

 

If you have an existing install of a meta tag contribution, the install only requires the addition of a single line of code and one added function.

 

 

 

The contribution can be downloaded here

 

 

Enjoy!


Sam

 

Remember, What you think I ment may not be what I thought I ment when I said it.

 

Contributions:

 

Auto Backup your Database, Easy way

 

Multi Images with Fancy Pop-ups, Easy way

 

Products in columns with multi buy etc etc

 

Disable any Category or Product, Easy way

 

Secure & Improve your account pages et al.



 

osCommerce could do with canonical tags, especially for the "split page" type pages. I must confess that when I wrote the Prevent Duplicates contribution I wasn't aware of them. It will be interesting to see how the bots react.

 

There are a few things that I think could be better in the function; do you want me to post them?

 

Also, the title text for the instructions is for the wrong contrib m8 ;)


 

 

Yes, please do.

 

My research showed that search engines generally honor the tag, though a few cases have been noted where it was ignored.

 

The only case where the tag is advised against is where the resulting link would point to a page that has a redirect in force.

 

 

Sam

 


Sam, just a few bits I'd have done differently; obviously use none of it, part of it or all of it as you wish. This is untested, I hasten to add.

 

On the site page ..

 

<?php sams_canonical( $xhtml = true ); ?>

 

the function ..

 

<?php
function sams_canonical( $xhtml = false ) {
 global $request_type;

 $close_tag = ' />';
 if ( false === $xhtml ) {
   $close_tag = '>';
 } 
 $domain = ( $request_type == 'SSL' ? HTTPS_SERVER : HTTP_SERVER );
 // Find the file basename safely = PHP_SELF is unreliable - SCRIPT_NAME can show path to phpcgi
 if ( array_key_exists( 'SCRIPT_NAME', $_SERVER ) && ( substr( basename( $_SERVER['SCRIPT_NAME'] ), -4, 4 ) == '.php' ) ) {
  $basefile = basename( $_SERVER['SCRIPT_NAME'] );
 } elseif ( array_key_exists( 'PHP_SELF', $_SERVER ) && ( substr( basename( $_SERVER['PHP_SELF'] ), -4, 4 ) == '.php' ) ) {
  $basefile = basename( $_SERVER['PHP_SELF'] );
 } else {
   // No base file so we have to return nothing
   return false;
 }
 // Don't produce canonicals for SSL pages that bots shouldn't see.
 // Match on the filename prefix so that e.g. account_edit.php and checkout_shipping.php are caught too.
 $ignore_array = array( 'account', 'address', 'checkout', 'login', 'password', 'logoff' );
 foreach ( $ignore_array as $prefix ) {
   if ( strpos( $basefile, $prefix ) === 0 ) {
     // Bots shouldn't be here so show no canonical
     return false;
   }
 }
 // REQUEST_URI usually doesn't exist on Windows servers ( sometimes ORIG_PATH_INFO doesn't either )
 if ( array_key_exists( 'REQUEST_URI', $_SERVER ) ) {
   $request_uri = str_replace( strrchr( $_SERVER['REQUEST_URI'], '?' ), '', $_SERVER['REQUEST_URI'] );
 } elseif( array_key_exists( 'ORIG_PATH_INFO', $_SERVER ) ) {
   $request_uri = str_replace( strrchr( $_SERVER['ORIG_PATH_INFO'], '?' ), '', $_SERVER['ORIG_PATH_INFO'] );
 } else {
   // we need to fail here as we have no REQUEST_URI and return no canonical link html
   return false;
 }
 // We want these _GET keys removed from the canonical link
 $ignore_get_keys = array( 'currency', 'language', 'page', 'sort', 'ref', 'affiliate_banner_id', 'osCsid' );

 // $get must end up as a string: concatenating an emptied array would print "Array"
 $get = '';
 if ( !empty( $_GET ) ) {
   $params = $_GET;
   foreach ( $ignore_get_keys as $key ) {
     unset( $params[$key] );
   }
   // Only add a query string if something is left after filtering
   if ( !empty( $params ) ) {
     $get = '?' . http_build_query( $params );
   }
 }

 $canonical = $domain . $request_uri . $get;
 // Escape the URL for use in an HTML attribute ( & becomes &amp; )
 echo '<link rel="canonical" href="' . htmlspecialchars( $canonical ) . '"' . $close_tag . PHP_EOL;
} // function
?>

 

Hope it helps or gives some ideas (http_build_query is PHP5 but has a wrapper in RC2a; not sure about MS2.2).
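The base-file fallback in the function above (SCRIPT_NAME first, then PHP_SELF) can be tried in isolation. A rough sketch, assuming a hypothetical helper find_basefile_sketch that reads from an array passed in rather than from $_SERVER, so it runs without a web server:

```php
<?php
// Hypothetical helper for illustration only: same fallback order as the
// function above, but taking the server values as a parameter.
function find_basefile_sketch( array $server ) {
  foreach ( array( 'SCRIPT_NAME', 'PHP_SELF' ) as $key ) {
    if ( isset( $server[$key] ) && substr( basename( $server[$key] ), -4 ) == '.php' ) {
      return basename( $server[$key] );
    }
  }
  return false; // no usable script name, so no canonical should be produced
}

// CGI-style setup where SCRIPT_NAME points at the PHP binary, not the page
var_dump( find_basefile_sketch( array(
  'SCRIPT_NAME' => '/cgi-bin/php-cgi',
  'PHP_SELF'    => '/catalog/product_info.php',
) ) ); // string(16) "product_info.php"
```

The paths in the example are invented; the point is only that a SCRIPT_NAME not ending in .php falls through to PHP_SELF.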


 

 

Thanks for that. It deals with some issues I didn't need to worry about when I was writing the code and some I hadn't thought about. I'll create an update in time with some of that, as it'll widen its usefulness.

 

Cheers

Sam

 


Uploaded new version 1.1

 

Expanded server support, with many thanks to the code provided by Robert Fisher (FWR Media).

Added controls for XHTML output and SSL duplicated pages.

Sam

 


Thank you for another very useful contribution. May I ask you to clarify the following:

 

1/ what is the difference between this contribution and Duplicate Content Manager ?

 

2/ If Duplicate Content Manager is installed, should this contribution (Remove & Prevent duplicate content) also be installed, i.e. should both be installed together?

Absinthe Original Liquor Store


 

 

This uses the canonical tag to point search engines to the single correct URL for a site page, thereby consolidating any page ranking spread caused by multiple URL variations and removing any duplicate penalty. It will make no visible change to the page or its URL.

 

As I understand it (I haven't dissected it), Robert's add-on deals with duplicates by presenting a no-follow tag on pages where duplicates have occurred, thus causing those to be removed from the indexes, and otherwise by modifying the page URL to remove problem params.

 

I believe Robert's may cause the site a temporary ranking dip as pages are removed. I also suspect he would have used the canonical had it been available when he wrote that add-on.

 

I would advise you use only one or the other.


Sam

 


 

Sort of. Prevent Duplicates has two modes. The first adds text to the title/description/keywords, making it a truly unique page. The second is for where indexed content has to be removed, in which case it adds <meta name="ROBOTS" content="NOINDEX, FOLLOW" />; you would switch back to the standard mode once the duplicates had been removed.

 

Sam is correct, I would have made use of the canonical tag had it been available at the time; in fact I'm using it in a new meta tags contrib, KissMT.

Edited by FWR Media

Thank you for your replies - can I be a little vampire and ask your opinion relating to the tep_href_link we've been discussing elsewhere?

 

This is Sam's support thread and I've probably posted in it too much already. I'm certainly not going to answer off-topic subjects here.

 

One final thing I'll say about canonical is: beware. It's a great new thing, but it's not good for everything; there are places where a robots noindex, follow tag is far, far better.


Expanding the removed params list


 

Sam, can these params also be included: testimonial_id and reviews_id?

 

 

That can be done, but to keep it simple it requires changing the code a bit (I could have done this before, but it was late, so I didn't).

 

in the function, find the line:

 

$search = array('/&*currency=[A-Z]*/','/&*language=[a-z]*/','/&*page=\d*/','/&*sort=\w*/'
,'/&*ref=\w*/','/&*affiliate_banner_id=\w*/','/&*osCsid.*/','/\?\z/');

 

 

replace with:

 

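$remove_array = array( 'currency', 'language', 'page', 'sort', 'ref', 'affiliate_banner_id' );
$search = array(); // start from an empty pattern list
foreach ( $remove_array as $value ) {
  $search[] = '/&*' . $value . '=\w*/';
}
$search[] = '/&*osCsid.*/'; // session id and anything after it
$search[] = '/\?\z/';       // a trailing ? left behind once params are removed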

 

Then, to expand the removed params list, just add to the $remove_array in the same format, separated by commas.
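As a rough illustration of the change above with testimonial_id and reviews_id added: the sketch below assumes the contribution applies the $search patterns with preg_replace (the function itself is not shown in this thread), and the example URL is made up.

```php
<?php
// Pattern list built as in the replacement code above, with the two extra params
$remove_array = array( 'currency', 'language', 'page', 'sort', 'ref', 'affiliate_banner_id', 'testimonial_id', 'reviews_id' );
$search = array();
foreach ( $remove_array as $value ) {
  $search[] = '/&*' . $value . '=\w*/';
}
$search[] = '/&*osCsid.*/';
$search[] = '/\?\z/';

// Assumption: the patterns are applied with preg_replace, each match replaced by ''
$url = 'product_reviews_info.php?products_id=12&reviews_id=5&language=en';
echo preg_replace( $search, '', $url ), PHP_EOL;
// prints: product_reviews_info.php?products_id=12
```

One thing to keep in mind with this pattern format: the names are matched anywhere in the string, so a param whose name merely ends in a listed name (say mypage=2 with page in the list) would also be hit.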

Sam

 


I had absolutely no intention to discuss that topic here - I'm sorry if it wasn't clear. I was just curious if I can have your ideas there, in the other thread.

 

Thank you Sam, I'll try that.


Hi Sam

 

I decided to try your code, even though Header Tags SEO already generates a canonical url. I am also using your product listing contrib and notice that on maximum sort (show all) the canonical url does not ignore the ?max... parameter.


 

 

Use the code mod I gave earlier and add 'max' to the $remove_array.

Sam

 


Sam, I believe that the final / at the end of the link tag is missing. What we have atm is:

<link rel="canonical" href="http://www.mystore.com/customer_testimonials.php" >

instead of this:

<link rel="canonical" href="http://www.mystore.com/customer_testimonials.php" />


 

Changing the function call parameter from $xhtml = false to $xhtml = true will get you that.

 

 

Sam

 


I'm using this contrib and it works well for me.

 

However, is it a good idea to remove language_id from the canonical tag?

 

For example, if I have a product_info.php page in 10 languages with 10 different descriptions (one per language), are these considered duplicate pages?


 

osCommerce cannot deal with multiple languages; the extra language pages simply won't get indexed.

 

If you need multi language SEO support then you need the following contribution (in development) ..

 

http://www.oscommerce.com/forums/topic/349814-ultimate-seo-urls-5-pro/

 

For correct use of canonical tags with meta tags you should look at KissMT. This is also in development, but both are at a stage of near release.

 

KissMT would make Prevent duplicates unnecessary.

 

Hope that helps.


Can you please tell us where would it be better to use noindex?


 

Yes, sure.

 

What has to be remembered is that canonical tags are not the holy grail; very useful, to be sure, but not good for everything.

 

Canonical tags are effectively a soft 301 redirect to the bots. So take split-page results pages (a typical duplicate content example): adding a canonical tag where there is e.g. page and sort in the query string informs the bots that this is a non-page and that only the main page is the canonical version. This in turn means that the links on those pages will not be followed, possibly leading to those products not being indexed.

 

So look at what we actually want from these pages. We don't want them indexed, as they are "non-pages" with no true content or meaning. We do, however, want the bots to follow the links and index the pages they find. So in comes:

 

<meta name="robots" content="noindex, follow" />

 

It does exactly this: it tells the bots not to index the page but to follow the links and index the pages they find. Perfect!
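One way the suggestion in this post could be expressed in code is sketched below. It is only an illustration of the idea, not code from Prevent Duplicates or KissMT: the helper name head_tag_sketch is hypothetical, and treating page and sort as the markers of a split-page listing is an assumption taken from the example in the post.

```php
<?php
// Hypothetical helper: decide which head tag to emit for the current request.
function head_tag_sketch( array $get, $canonical_url ) {
  // Assumption: 'page' or 'sort' in the query string means a split-page listing
  if ( isset( $get['page'] ) || isset( $get['sort'] ) ) {
    // Keep the listing out of the index but let the bots follow its links
    return '<meta name="robots" content="noindex, follow" />';
  }
  // Everything else gets a canonical tag pointing at the clean URL
  return '<link rel="canonical" href="' . htmlspecialchars( $canonical_url ) . '" />';
}

echo head_tag_sketch( array( 'cPath' => '3', 'page' => '2' ), 'http://www.mystore.com/index.php?cPath=3' ), PHP_EOL;
// prints: <meta name="robots" content="noindex, follow" />
echo head_tag_sketch( array( 'cPath' => '3' ), 'http://www.mystore.com/index.php?cPath=3' ), PHP_EOL;
// prints: <link rel="canonical" href="http://www.mystore.com/index.php?cPath=3" />
```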


 

Do you have any evidence to support this statement? After all, the tag does not say don't follow links and does not prevent pages being visited.

Sam

 


 

Firstly, Sam, my apologies for replying to this at all in your support topic. I had assumed the post to be in one of my own support topics when I saw the email (all I saw was "prevent duplicate"); I just clicked on it and answered.

 

Which bits did you want confirmation of: the canonical tag or the robots tag? I'll try to find something.
