spooks

Remove & Prevent duplicate content with the canonical tag


Firstly, Sam, my apologies for replying to this at all in your support topic. I had assumed the post was in one of my own support topics when I saw the email (all I saw was "prevent duplicate"), so I just clicked on it and answered.

 

Which bits did you want confirmation of? the canonical tag or the robots tag? I'll try to find something.

 

 

That the canonical tag could prevent child pages being indexed, and that once the bots see the tag they will only visit the page it indicates, which is what I think you were saying.


Sam

 

Remember, what you think I meant may not be what I thought I meant when I said it.

 

Contributions:

 

Auto Backup your Database, Easy way

 

Multi Images with Fancy Pop-ups, Easy way

 

Products in columns with multi buy etc etc

 

Disable any Category or Product, Easy way

 

Secure & Improve your account pages et al.


That the canonical tag could prevent child pages being indexed, and that once the bots see the tag they will only visit the page it indicates, which is what I think you were saying.

 

If you watch the video by Matt Cutts, it gives a great insight into the canonical element.

The key bit for me is where he explains that we should view the element as a 301 redirect.

If you 301 redirected a page, would Google index it and follow the links on the page? The answer is, of course, no. I also have an example of this in practice on another forum, where canonical tags were used for split pages and the products dropped from the index.

 

Hope this helps.

 

Rob


Something is wrong

 

Hi, this contrib looks good, but now that I have installed it I see that part of the tag is missing, so it doesn't work. Look at this.

Now it's like this: <link rel="canonical" href="http://www.homepage.com/" >

But it must be like this: <link rel="canonical" href="http://www.homepage.com/" />

The /> is missing at the end of it.

Something in this line must be changed: echo '<link rel="canonical" href="' . $domain . $request_uri . '"' . $close_tag . PHP_EOL;

Does anyone know how?

 

/mvh Marcus


Require self closing tag

 

Please take care not to make wild statements before you're sure you know what you are saying!

Self-closing tags are only valid in certain doctypes and are invalid otherwise.

Changing the function call parameter $xhtml = false to $xhtml = true will give you a self-closing tag.
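
To illustrate the idea, here is a minimal sketch of how a flag like $xhtml could select the closing style. The function name canonical_link_tag() and its signature are made up for this example; only $xhtml and $close_tag appear in this thread, and the contribution's real function may look different.

```php
<?php
// Hypothetical helper, not the contribution's real function: builds a canonical
// link element and picks the closing style from the $xhtml flag.
function canonical_link_tag(string $url, bool $xhtml = false): string
{
    // XHTML doctypes need the self-closing " />"; HTML 4 doctypes use a plain " >".
    $close_tag = $xhtml ? ' />' : ' >';
    return '<link rel="canonical" href="' . htmlspecialchars($url) . '"' . $close_tag;
}

echo canonical_link_tag('http://www.homepage.com/'), PHP_EOL;        // HTML style
echo canonical_link_tag('http://www.homepage.com/', true), PHP_EOL;  // XHTML style
```

Which of the two you want depends on the doctype your store's pages declare.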

 


Sam

 


First of all, Happy New Year and thank you for this great contribution!

 

Sam, I've done some tweaks to remove "index.php" from the URL, including this in the .htaccess

#redirect index.php to root 
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/ 
RewriteRule ^index\.php$ http://www.yourdomain.com/ [R=301,L]

 

However, when I click any language on the home page, the canonical URL for /index.php?language=fr is including the index.php:

<link rel="canonical" href="http://www.mystore.com/index.php" />

 

Is there a way of removing the index.php from the above?


Absinthe Original Liquor Store


remove index.php from the uri


 

 

This assumes you have done the code mod I gave before to expand the list of removed params.

Before:

$request_uri = preg_replace('/\?&/', '?', preg_replace($search, '', $request_uri )); 

add:

$search[] = '/index\.php\/*/';

Note that if you are using Search-Engine Safe URLs (still in development), this change is likely to cause problems with the created URLs. Other SEO URLs should be fine, but make sure you test them.
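
As a rough sketch of what that extra pattern does, the two lines can be wrapped in a small test function. The name strip_index_php() is hypothetical, and the dot in the pattern is escaped here so it only matches a literal "index.php"; the rest follows the lines quoted in this post.

```php
<?php
// Hypothetical wrapper around the two lines from the post, for testing only.
function strip_index_php(string $request_uri): string
{
    $search = array();
    // Matches "index.php" plus any slashes that directly follow it.
    $search[] = '/index\.php\/*/';
    // Same clean-up as the quoted line: remove the matches, then turn "?&" into "?".
    return preg_replace('/\?&/', '?', preg_replace($search, '', $request_uri));
}

echo strip_index_php('/index.php'), PHP_EOL;
echo strip_index_php('/catalog/index.php/test-m-12'), PHP_EOL;
```

The second call shows why path-style URLs need testing: the pattern also swallows the slash after index.php.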


Sam

 


Thank you for your fast reply, it seems to work just fine. I have never used Search-Engine Safe URLs; I'm using Ultimate SEO URLs and it works fine. I would definitely vote this contribution into the TOP TEN!!!



osCommerce could do with canonical tags especially for the "split page" type pages. I must confess that at the time of writing the prevent duplicates thing that I wrote I wasn't aware of these. Will be interesting to see how the bots react to it.

 

There are a few things that I think could be better in the function. Do you want me to post them?

Also, the title text for the instructions is for the wrong contrib, m8 ;)

 

 

Hi, thanks for making these great contribs.

I have a problem on my site: I can't remove the currency parameter with this canonical tag, and on the all-products pages the canonical tag should point to http://www.swehealth.se/allprods.php, because at the moment I get duplicates on the all-products pages.

These are the pages:

http://www. swehealth.se/index.php?currency=eur

http://www. swehealth.se/allprods.php?fl=B

When I install the canonical tag, it outputs <link rel="canonical" href="http://www. swehealth.se/allprods.php?fl=B" >

I want it to be <link rel="canonical" href="http://www. swehealth.se/allprods.php" > on the all-products pages.

 

/Best Regards Marcus


Hi Sam,

 

If I have Header Tags SEO installed, and I use your canonical tag contribution, do I unselect the "canonical" checkbox in the Header Tags SEO meta tags control box for each page then?

 

Google had put some extra gibberish at the end of my index.php file... "index.php/test-m-12?sort=3a&page=1" and I'm assuming this is an example of duplicate content?

 

My site was back to number one on google for a week, then it disappeared again. I'm thinking I had a canonical configuration error, and now am trying to fix that.

 

I recently installed your new contribution for the anti-hacker account creation, and it works like a charm, but now I've got to figure out why Google dropped my site. The last time it did this was when I had the php insertion hack happen. That was about four months ago. I have installed all sorts of security methods since rebuilding the newer, better site.

 

So should I install this new canonical tag contribution of yours? How does it work with the one for Header Tags SEO? Is it better, or does it expand on its function?

 

Thank you,

 

Jason


Header Tags SEO

 

 

If I have Header Tags SEO installed, and I use your canonical tag contribution, do I unselect the "canonical" checkbox in the Header Tags SEO meta tags control box for each page then?

 

Yes

 

Google had put some extra gibberish at the end of my index.php file... "index.php/test-m-12?sort=3a&page=1" and I'm assuming this is an example of duplicate content?

That's precisely the sort of issue this will address.

How does it work with the one for Header Tags SEO?

Clearly, if you have the above, then the Header Tags canonical isn't working for you.

This one leaves the URL in whatever format you have, so it won't affect other add-ons, and it has no effect on other tags on the page.


Sam

 


Another excellent piece of work Sam, could you give me an example of how I would test this contribution?

 

After telling Header Tags to not use "canonical" across all the pages, I double checked on the source code to make sure Header Tags was no longer adding that tag.

 

After installing your contrib, this is my output for a random product:

 

Address bar location: http://www.oldfoundry.com/catalog/product_info.php?cPath=36&products_id=59

 

Source code when viewed from Safari:

 

<meta name="robots" content="noodp" >

<meta name="slurp" content="noydir" >

<link rel="canonical" href="http://www.oldfoundry.com/catalog/product_info.php?products_id=59" >

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >

<!-- EOF: Header Tags SEO Generated Meta Tags -->

<link rel="canonical" href="http://www.oldfoundry.com/catalog/product_info.php?cPath=36&products_id=59" >

<base href="http://www.oldfoundry.com/catalog/">

<link rel="stylesheet" type="text/css" href="stylesheet.css">

 

From what I'm seeing, Header Tags may still be doing its canonical output, with yours coming right after the EOF of Header Tags.

 

Not sure if what I'm doing is correct.

 

Thank you,

 

Jason


I went ahead and removed all the canonical code references to the header_tags.php file, I hope that was the correct thing to do... I left your code addition alone of course.

 

It's now showing a single canonical link reference outside of the EOF header tags file:

 

<meta name="robots" content="noodp" >

<meta name="slurp" content="noydir" >

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >

<!-- EOF: Header Tags SEO Generated Meta Tags -->

<link rel="canonical" href="http://www.oldfoundry.com/catalog/product_info.php?products_id=32" >

<base href="http://www.oldfoundry.com/catalog/">

<link rel="stylesheet" type="text/css" href="stylesheet.css">

 

Is this correct?

 

Thanks again,

 

Jason


 

 

Yes, it looks like your Header Tags is still generating a canonical tag. Check again that you have followed the instructions; if you have, raise the issue in its support thread.

There may be a line in the file that you could simply comment out (i.e. look for "canonical").

It also looks like you need to add cPath to the list of removed params. Use the code change detailed earlier in this thread, or use the latest version I just uploaded, as that makes adding the param easier.
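
For anyone wanting to try the effect of adding cPath before touching the live files, here is a small sketch built only from the lines quoted in this thread (the remove array, the per-param pattern and the final preg_replace). The function name canonical_uri() and the shortened param list are assumptions made for the example; the real function in the contribution may differ.

```php
<?php
// Hypothetical helper modelled on the lines quoted in this thread; the real
// function in the contribution may differ.
function canonical_uri(string $request_uri, array $remove_array): string
{
    $search = array();
    foreach ($remove_array as $value) {
        // Matches "name=value" plus any "&" in front of it (case sensitive).
        $search[] = '/&*' . $value . '=\w*/';
    }
    // Drop the listed params, then repair a "?&" left behind at the front.
    return preg_replace('/\?&/', '?', preg_replace($search, '', $request_uri));
}

$remove_array = array('currency', 'language', 'sort', 'cPath'); // cPath added
echo canonical_uri('/catalog/product_info.php?cPath=36&products_id=59', $remove_array), PHP_EOL;
```

With cPath in the list, both product URLs from the post above reduce to the same product_info.php?products_id=... form.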


Sam

 


Hi Sam,

 

thank you again for taking the time to look at this.

 

After installing your latest version of the contribution, this is my source code now.

 

<meta name="robots" content="noodp" >

<meta name="slurp" content="noydir" >

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >

<!-- EOF: Header Tags SEO Generated Meta Tags -->

<link rel="canonical" href="http://www.oldfoundry.com/catalog/product_info.php?products_id=32" >

<base href="http://www.oldfoundry.com/catalog/">

<link rel="stylesheet" type="text/css" href="stylesheet.css">

 

All good now? Google bot made some suggestions on my webmaster tools part.

 

Googlebot has crawled your site and made a few parameter suggestions.

cPath Suggested: Don't ignore

sort Suggested: Don't ignore

filter_id Suggested: Don't ignore

articles_id Suggested: Don't ignore

 

Should I be telling Google bot to ignore any of the above?

 

Thanks again,

 

Jason

 

 


 


 

 

 

cPath is known to cause duplicate content issues.

sort will cause duplicate content issues.

The others I'm not familiar with; they were likely added by add-ons.


Sam

 


Hi Sam,

 

I've downloaded this great contribution and am now testing its functionality. Right now I don't call the function from header_tags.php; instead I call it in each page where I need it. After my tests I will implement it as you suggest.

I saw that the remove-param array can be extended, and I added manufacturers_id.

Now, on every page where I use the function "CanonicalLink", the param manufacturers_id is removed from the generated <link rel="canonical"> tag. That is the right result for product_info.php, but not for index.php when I select a manufacturer from the manufacturer drop-down.

Generated canonical tags (manufacturer id selected = 62, in my case Kokam):

index.php = http://www.hobbynet.ch/shop/index.php? instead of http://www.hobbynet....facturers_id=62

product_info.php = <link rel="canonical" href="http://www.hobbynet.ch/shop/product_info.php?products_id=1410" /> This is OK and prevents duplicates, because e.g. <link rel="canonical" href="http://www.hobbynet.ch/shop/product_info.php?manufacturers_id=62&products_id=1410" /> links to the same page due to products_id=1410.

Question:

Is it possible to extend the array with the page name as a key? That way the params to remove could be set for each page separately. The lookup code would then have to search the array using the current page.

I'm looking forward to reading a few lines about this.

 

kind regards

 

peter


Hi

 

Great contrib. I have a question about these URLs:

http://www. homepage.se/?currency=eur&cpath=36_43 gives, with this contrib, the canonical link <link rel="canonical" http://www. homepage.se/?cpath=36_43

http://www. homepage.se/?eur&cpath=21_65 gives, with this contrib, the canonical link <link rel="canonical" http://www. homepage.se/?eur&cpath=21_65

Is it possible to make the canonical link for these URLs <link rel="canonical" http://www.homepage.se/

instead of http://www. homepage.se/?cpath=36_43 and http://www. homepage.se/?eur&cpath=21_65?

 

/Best Regards Marcus


Remove param on selected pages

 

 

Yes, you've raised a good point. I'll look at the issue when I can.


Sam

 


 

Is it possible to make the canonical link for these URLs <link rel="canonical" http://www.homepage.se/

instead of http://www. homepage.se/?cpath=36_43 and http://www. homepage.se/?eur&cpath=21_65?

 

 

Just add cpath to the remove array and it will be removed. Note that it's normally cPath, unless it has been modded on your site; the routine is case sensitive.

eur on its own is not a valid param; you would be better off finding the cause of that erroneous param in the URI.


Sam

 

When I install the canonical tag, it outputs <link rel="canonical" href="http://www. swehealth.se/allprods.php?fl=B" >
I want it to be <link rel="canonical" href="http://www. swehealth.se/allprods.php" > on the all-products pages.

 

 

 

 

Use the latest (1.2) and add fl to the list of removed params.

Sam

 



 

 

Hi Sam,

 

I have implemented some lines of code within your function. Please take a look:

 

 

//hobbynet: deactivate the original array setting code
//$remove_array = array( 'currency','language','main_page','page','sort','ref','affiliate_banner_id','max');
// Add to this array any additional params you need to remove in the same format as the existing

//hobbynet: params to be removed on every page
$remove_array_general = array( 'currency','language','main_page','page','sort','ref','affiliate_banner_id','max');

//hobbynet: params to be removed only on specific pages
//hobbynet: key = page name (e.g. specials.php), value = array of params to remove on that page
//hobbynet: keep in mind that the page-specific entries have to be separated by "," (comma)
$remove_array_page = array('product_info.php' => array('manufacturers_id') // e.g. manufacturers_id has to be removed in product_info.php
,'index.php' => array() // e.g. nothing to remove in addition to the general remove array
);

//hobbynet: merge the general removal array with the page-specific removal array
//hobbynet: $basefile is set to the currently requested page
//hobbynet: pages without an entry in $remove_array_page fall back to an empty array
$remove_array_merged = array_merge($remove_array_general, isset($remove_array_page[$basefile]) ? $remove_array_page[$basefile] : array());

//hobbynet: use the new $remove_array_merged instead of the original $remove_array
foreach ( $remove_array_merged as $value ) {

 

 

It works: it removes e.g. manufacturers_id in product_info.php, but not in index.php or other pages.
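
The merge step can be tried on its own with a sketch like the one below. The function name params_to_remove() and the shortened general list are assumptions made for the example; the array names follow the code above. It also guards against pages that have no entry in the page-specific array, which would otherwise hand a missing value to array_merge().

```php
<?php
// Hypothetical helper: returns the general removal list plus any extras
// registered for the current page. Array names follow the post above.
function params_to_remove(string $basefile): array
{
    $remove_array_general = array('currency', 'language', 'sort');
    $remove_array_page = array(
        'product_info.php' => array('manufacturers_id'),
        'allprods.php'     => array('fl'),
    );
    // Fall back to an empty list so pages without an entry do not trigger an
    // undefined-index notice or pass null to array_merge().
    $page_specific = isset($remove_array_page[$basefile]) ? $remove_array_page[$basefile] : array();
    return array_merge($remove_array_general, $page_specific);
}

print_r(params_to_remove('product_info.php')); // general list plus manufacturers_id
print_r(params_to_remove('index.php'));        // general list only
```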

 

Kind regards

 

peter


Use the latest (1.2) and add fl to the list of removed params.

 

Hi

 

You are great! I have removed many problems now, but these will not disappear:

 

/allprods.php/sort/2a/action/buy_now/products_id/366

/allprods.php/sort/2a/action/buy_now/products_id/54

/allprods.php/sort/2a/page/2

/allprods.php/sort/2a/page/4

/allprods.php/sort/2a/page/5

/allprods.php/sort/2a/page/6

 

/specials.php/products_id/491

/specials.php?products_id=477

 

 

For this URL your script works fine: /allprods.php?sort=2a&page=4 becomes allprods.php

But for the others I can't work out how to remove those 8 URLs. Do you have any idea?

 

/Best Regards Marcus



 

Hi Marcus,

 

Oh, it seems your problem is related to search-engine-friendly URLs (SEFU). SEFU handles the URL differently from the normal mode: in normal mode the URL is allprods.php?sort=2a&page=2, while SEFU generates the same URL as /allprods.php/sort/2a/page/2. I don't really know whether the "Remove & Prevent duplicate content with the canonical tag" contribution can handle the URLs generated by the SEFU module.

If you would like to remove products_id for specials.php, you have to set it to:

 

$remove_array_page = array('product_info.php' => array('manufacturers_id') // e.g. manufacturers_id has to be removed in product_info.php

,'index.php' => array() // e.g. nothing to remove in addition to the general-remove-array in index.php

,'allprods.php' => array('fl')

,'specials.php' => array('products_id')

);

 

Please ignore the product_info.php, index.php and allprods.php lines within the array.

If you like, you can send me the lines that define your $remove_array_page.

 

regards

 

peter



 

 

This does not support search-engine-friendly URLs (still in development). To add some support, change:

 

$search[] = '/&*' . $value . '=\w*/';

 

to

 

$search[] = '/&*' . $value . '[=\/]+\w*/';

 

However, issues could arise, e.g. if a param is set but has no value.
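
A quick way to see what the changed pattern does is to run it against both URL styles. The sketch below uses only the modified pattern and the clean-up line quoted earlier in this thread; the function name strip_params() is hypothetical, and the full contribution may tidy the result further.

```php
<?php
// Hypothetical helper: applies the modified pattern from the post above to a URI.
function strip_params(string $request_uri, array $remove_array): string
{
    $search = array();
    foreach ($remove_array as $value) {
        // "[=\/]+" accepts either "name=value" or the path style "name/value".
        $search[] = '/&*' . $value . '[=\/]+\w*/';
    }
    return preg_replace('/\?&/', '?', preg_replace($search, '', $request_uri));
}

echo strip_params('/allprods.php?sort=2a&page=4', array('sort', 'page')), PHP_EOL;
echo strip_params('/allprods.php/sort/2a/page/2', array('sort', 'page')), PHP_EOL;
```

On their own these lines leave a trailing "?" on the query-string URL and "//" on the path-style URL, so check the generated canonical tags in the page source after making the change.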

 

products_id should not be added for removal as it is required.

 

buy_now implies you have issues with cart pages, but they should be excluded from the bots with robots.txt


Sam

 
