Google Duplicate Content with Strange cPath Variable Indexed
#1
Posted 16 November 2011 - 05:17 PM
Tim
#2
Posted 16 November 2011 - 05:25 PM
#3
Posted 16 November 2011 - 06:11 PM
#4
Posted 16 November 2011 - 07:47 PM
#5
Posted 17 November 2011 - 01:36 PM
Hope it will be fixed.
#6
Posted 17 November 2011 - 05:19 PM
/*
Copyright © 2011 clustersolutions.net
Released under the GNU General Public License.
Please give credit where credit is due.
*/
// Validate URL cPath Parameter
// Returns true when the cPath in the URL is one of the product's real category paths.
function tep_validate_url_cpath() {
  global $HTTP_GET_VARS, $products_id;
  if (isset($HTTP_GET_VARS['cPath']) && tep_not_null($products_id)) {
    $bb = array(); // every valid cPath for this product
    $prod_cat_check_query = tep_db_query("select categories_id from " . TABLE_PRODUCTS_TO_CATEGORIES . " where products_id = " . (int)$products_id);
    while ($prod_cat_check = tep_db_fetch_array($prod_cat_check_query)) {
      $aa = array();
      $path_check['parent_id'] = $prod_cat_check['categories_id'];
      // walk up the tree from the product's category to the top level
      do {
        array_push($aa, $path_check['parent_id']);
        $path_check_query = tep_db_query("select parent_id from " . TABLE_CATEGORIES . " where categories_id = " . (int)$path_check['parent_id']);
        $path_check = tep_db_fetch_array($path_check_query);
      } while ($path_check && $path_check['parent_id'] != 0);
      array_push($bb, implode('_', array_reverse($aa)));
    }
    return in_array($HTTP_GET_VARS['cPath'], $bb);
  } else {
    return true;
  }
}
#7
Posted 27 November 2011 - 08:05 PM
#8
Posted 03 December 2011 - 08:04 AM
#9
Posted 24 December 2011 - 07:53 AM
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
which will allow Google to index only the correct pages
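The post linked above describes the rel="canonical" link element. As a rough illustration only, here is a minimal sketch of building such a tag for a product page. The helper name canonical_product_link() and the product_info.php?products_id=N URL form are assumptions for the example, not code from any of the contributions mentioned in this thread.

```php
<?php
// Hypothetical helper (not part of osCommerce): build a rel="canonical"
// tag that always points at the cPath-free product URL.
function canonical_product_link($base_url, $products_id) {
  // Casting to int drops anything appended after the numeric ID.
  $url = rtrim($base_url, '/') . '/product_info.php?products_id=' . (int)$products_id;
  return '<link rel="canonical" href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '" />';
}

echo canonical_product_link('http://www.example.com/', 42), "\n";
```

Every cPath variant of the same product page would then carry the same canonical URL in its head section.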
#10
Posted 12 January 2012 - 05:19 PM
I'm trying to use this function and changed the code in application_top.php to the following:
// calculate category path
if (isset($_GET['cPath'])) {
  $cPath = $_GET['cPath'];
} elseif (isset($_GET['products_id']) && !isset($_GET['manufacturers_id'])) {
  $cPath = tep_get_product_path($_GET['products_id']);
} else {
  $cPath = '';
}
if (tep_validate_url_cpath($cPath) === false) {
  header('HTTP/1.1 404 Not Found');
  echo '<h1>404 Not Found</h1>';
  tep_exit();
} else {
  if (tep_not_null($cPath)) {
    $cPath_array = tep_parse_category_path($cPath);
    $cPath = implode('_', $cPath_array);
    $current_category_id = $cPath_array[(sizeof($cPath_array)-1)];
  } else {
    $current_category_id = 0;
  }
}
But the function always returns true, and it goes to the 404.
I'm using a canonical contribution, but it also returns these duplicate-content categories as canonical links, because this is part of the osCommerce core.
I think that this is the right place to put the check.
#11
Posted 08 March 2012 - 01:29 PM
Unfortunately, I tried several canonical URL contributions and I am using Ultimate SEO URLs 5 PRO, but they cannot resolve this problem.
Google Webmaster Tools reported up to 2000 duplicate-content pages with a random cPath in the URL.
#12
Posted 10 March 2012 - 10:49 AM
Try to add Disallow: /*?* to robots.txt to remove the duplicate issue.
One SEO problem still remains: links to product pages from category pages end with .html?cPath...
Can anyone suggest how to fix the categories template so the links end with ".html"?
#13
Posted 10 March 2012 - 02:19 PM
<code>
$duplicate = array( '52_260','288_380_186','288_380_2_186','288_504_2_186','504_22_301_77','47_2_114','3_47_22_301','70_34_544_514','288_504_34_389',
  '70_34_389','34_389','288_380_389','369_546','288_52_260_531','70_537_160_479','288_380_34_474_491','288_504_34_474_491','70_537_34_474_491','288_504_70_537_560','288_504_443_444',
  '380_34_474_602');
if (in_array($_GET['cPath'], $duplicate)) {
  header("HTTP/1.1 404 Not Found");
  echo "<h1>404 Not Found</h1>";
  unset($duplicate);
  tep_exit();
}
</code>
#14
Posted 10 March 2012 - 05:55 PM
Regards
Jim
Banners Box 2.3.x Support
Categories Accordion Box 2.3.x Support
Categories Images Box 2.2x 2.3.x Support
Closest Shipper 2.2x Support
Document Manager 2.2x Support
Generic Box 2.3.x Support
Get 1 Free 2.2x Support
jQuery Banner Rotator 2.2x 2.3.x Support
Modular Front Page 2.3.x Support
Modular SEO Header Tags 2.3.x Support
MVS 2.2x Support
PDF Datasheet 2.3.x Support
Price Updater 2.2x
Products Specifications 2.2x 2.3.x Development Version Support Bugs/Suggestions
Request a Review 2.2x - 2.3.x Support
Similar Products Box 2.2x
Specials Image Overlay 2.3x Support
Theme Switcher 2.3.x Support
#15
Posted 12 March 2012 - 09:41 AM
Here is what I found in the Google Webmaster Central help:
- To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
User-agent: Googlebot
Disallow: /*?
#16
Posted 12 March 2012 - 07:55 PM
http://forums.oscommerce.com/topic/377890-how-to-get-full-cat-path/
function get_full_cat_from_cPath($zipote)
{
  $query_trabajo_1 = tep_db_query("SELECT `parent_id` FROM `categories` WHERE `categories_id` = '" . (int)$zipote . "'");
  $land = tep_db_fetch_array($query_trabajo_1);
  $cat_completa = $zipote;
  // prepend each parent while walking up to the top level (parent_id 0)
  while ($land && $land['parent_id'] != 0) {
    $cat_completa = $land['parent_id'] . '_' . $cat_completa;
    $query_ciclica = tep_db_query("SELECT `parent_id` FROM `categories` WHERE `categories_id` = '" . (int)$land['parent_id'] . "'");
    $land = tep_db_fetch_array($query_ciclica);
  }
  return $cat_completa;
}
I put this in general.php but I cannot make it work. If this function can return the full category path, it can be compared with the current URL, and if they don't match, the page can give a 301 redirect or a 404 error code.
Please let me know if this is a good place to start and how I can make this function work.
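To make the idea in the post above concrete, here is a minimal sketch of the comparison step: rebuild the full path from the last category ID in the requested cPath and compare it with what was requested. It is only an illustration under assumptions. The $parents array is a hypothetical in-memory stand-in for the categories table, and full_cpath_for() and check_cpath() are made-up names, not osCommerce functions.

```php
<?php
// Hypothetical stand-in for the categories table:
// categories_id => parent_id (0 means top level).
$parents = array(160 => 0, 479 => 160, 1 => 0);

// Build the full path for a category by walking up to the top level.
// Returns '' when the category (or one of its ancestors) is unknown.
function full_cpath_for($category_id, $parents) {
  $path = array();
  $seen = array();
  $id = (int)$category_id;
  while ($id != 0) {
    if (!isset($parents[$id]) || isset($seen[$id])) {
      return ''; // unknown category, or a loop in the data
    }
    $seen[$id] = true;
    array_unshift($path, $id);
    $id = (int)$parents[$id];
  }
  return implode('_', $path);
}

// Decide what to do with a requested cPath.
// Returns array(action, expected path); action is 'ok', 'redirect' or 'not_found'.
function check_cpath($requested, $parents) {
  $ids = explode('_', $requested);
  $expected = full_cpath_for(end($ids), $parents);
  if ($expected === '') {
    return array('not_found', '');
  }
  return array($requested === $expected ? 'ok' : 'redirect', $expected);
}

print_r(check_cpath('1_479', $parents));
```

In a real store the map would come from the categories table, 'redirect' would become a 301 to the expected path, and 'not_found' a 404.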
#17
Posted 12 March 2012 - 08:38 PM
The cPath in a URL is used to provide the navigation in the categories box. You can rewrite it to something else, but the category information must still be in the link somewhere for the navigation to work.
Since you quote Google on robots.txt, why don't you read this Google help page? I'll quote the relevant sentence:
Quote
I suggest you stop wasting time trying to fix a broken URL rewriter that won't do you any good, and start spending time on things that actually will help your search engine ranking.
Regards
Jim
Edited by kymation, 12 March 2012 - 08:39 PM.
#18
Posted 14 March 2012 - 12:08 AM
function get_full_cat_from_cPath($zipote)
{
  $query1 = tep_db_query("SELECT parent_id FROM categories WHERE categories_id = '" . (int)$zipote . "'");
  $land = tep_db_fetch_array($query1);
  $cat_completa = $zipote;
  // a category that has a parent is redirected to the default page
  while ($land['parent_id'] != 0) {
    tep_redirect(tep_href_link(FILENAME_DEFAULT));
    tep_exit();
    // not reached, because tep_exit() ends the request
    $cat_completa = $land['parent_id'] . '_' . $cat_completa;
  }
  return $cat_completa;
}
Calling this function in index.php redirects to the default page. Unfortunately, it only works for categories where the parent_id is not set.
For example, if the original cPath is 160_479, I go to the correct page; calling only cPath=479 redirects to the default page (removing, in my case, some hundreds of duplicate pages). But if I call 1_479 (1 is an existing parent_id), this code does not redirect.
P.S. In my Webmaster Tools I have duplicate content for URLs with 8 concatenated cPath values like 8_256_47_48_8_78_54_132, and the number is still growing!
#19
Posted 17 March 2012 - 10:50 AM
In my opinion it would be better to catch this in application_top.php. The tep_parse_category_path() function is a good place for it.
if (tep_not_null($cPath)) {
  $cPath_array = tep_parse_category_path($cPath);
So any check can be done inside tep_parse_category_path(), which is the main built-in function for this.
////
// Parse and secure the cPath parameter values
function tep_parse_category_path($cPath) {
  // make sure the category IDs are integers
  $cPath_array = array_map('tep_string_to_int', explode('_', $cPath));
  // make sure no duplicate category IDs exist which could lock the server in a loop
  $tmp_array = array();
  $n = sizeof($cPath_array);
  for ($i=0; $i<$n; $i++) {
    if (!in_array($cPath_array[$i], $tmp_array)) {
      $tmp_array[] = $cPath_array[$i];
    }
  }
  /*** Suggested place to validate the cPath string ***/
  return $tmp_array;
}
This problem probably persists on all pages that use cPath.
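As an illustration of the kind of check that could go at the marked place, here is a minimal sketch that tests whether the parsed IDs form an unbroken chain from a top-level category down. It is not the stock osCommerce code. The $parents array is a hypothetical stand-in for a categories_id => parent_id lookup, and cpath_chain_is_valid() is a made-up name.

```php
<?php
// Hypothetical stand-in for a lookup of categories_id => parent_id
// (0 means top level).
$parents = array(1 => 0, 17 => 1, 4 => 1, 2 => 0);

// True only if the IDs form an unbroken chain that starts at a
// top-level category and where each ID is a child of the one before it.
function cpath_chain_is_valid($cPath_array, $parents) {
  $expected_parent = 0;
  foreach ($cPath_array as $id) {
    if (!isset($parents[$id]) || (int)$parents[$id] !== $expected_parent) {
      return false;
    }
    $expected_parent = (int)$id;
  }
  return count($cPath_array) > 0;
}

var_dump(cpath_chain_is_valid(array(1, 17), $parents)); // bool(true)
var_dump(cpath_chain_is_valid(array(2, 17), $parents)); // bool(false)
```

A path such as 2_17 is rejected because category 17 is not a child of category 2, which is the pattern behind the random cPath combinations reported earlier in the thread.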
SCM
v3
and some rewrites :-)
#20
Posted 17 March 2012 - 07:04 PM
function getData() {
  global $categories_string, $tree, $languages_id, $cPath, $cPath_array;
  $categories_string = '';
  $tree = array();
  // load the top-level categories
  $categories_query = tep_db_query("select c.categories_id, cd.categories_name, c.parent_id from " . TABLE_CATEGORIES . " c, " . TABLE_CATEGORIES_DESCRIPTION . " cd where c.parent_id = '0' and c.categories_id = cd.categories_id and cd.language_id='" . (int)$languages_id ."' order by sort_order, cd.categories_name");
  while ($categories = tep_db_fetch_array($categories_query)) {
    $tree[$categories['categories_id']] = array('name' => $categories['categories_name'],
                                                'parent' => $categories['parent_id'],
                                                'level' => 0,
                                                'path' => $categories['categories_id'],
                                                'next_id' => false);
    if (isset($parent_id)) {
      $tree[$parent_id]['next_id'] = $categories['categories_id'];
    }
    $parent_id = $categories['categories_id'];
    if (!isset($first_element)) {
      $first_element = $categories['categories_id'];
    }
  }
  if (tep_not_null($cPath)) {
    $new_path = '';
    // load the children of every category named in the cPath
    foreach ($cPath_array as $key => $value) {
      unset($parent_id);
      unset($first_id);
      $categories_query = tep_db_query("select c.categories_id, cd.categories_name, c.parent_id from " . TABLE_CATEGORIES . " c, " . TABLE_CATEGORIES_DESCRIPTION . " cd where c.parent_id = '" . (int)$value . "' and c.categories_id = cd.categories_id and cd.language_id='" . (int)$languages_id ."' order by sort_order, cd.categories_name");
      if (tep_db_num_rows($categories_query)) {
        $new_path .= $value;
        while ($row = tep_db_fetch_array($categories_query)) {
          $tree[$row['categories_id']] = array('name' => $row['categories_name'],
                                               'parent' => $row['parent_id'],
                                               'level' => $key+1,
                                               'path' => $new_path . '_' . $row['categories_id'],
                                               'next_id' => false);
          if (isset($parent_id)) {
            $tree[$parent_id]['next_id'] = $row['categories_id'];
          }
          $parent_id = $row['categories_id'];
          if (!isset($first_id)) {
            $first_id = $row['categories_id'];
          }
          $last_id = $row['categories_id'];
        }
        $tree[$last_id]['next_id'] = $tree[$value]['next_id'];
        $tree[$value]['next_id'] = $first_id;
        $new_path .= '_';
      } else {
        break;
      }
    }
  }
This code outputs the array as follows:
Array (
[1] => Array ( [name] => Hardware [parent] => 0 [level] => 0 [path] => 1 [next_id] => 17 )
[2] => Array ( [name] => Software [parent] => 0 [level] => 0 [path] => 2 [next_id] => 3 )
[3] => Array ( [name] => DVD Movies [parent] => 0 [level] => 0 [path] => 3 [next_id] => )
[17] => Array ( [name] => CDROM Drives [parent] => 1 [level] => 1 [path] => 1_17 [next_id] => 4 )
[4] => Array ( [name] => Graphics Cards [parent] => 1 [level] => 1 [path] => 1_4 [next_id] => 8 )
[8] => Array ( [name] => Keyboards [parent] => 1 [level] => 1 [path] => 1_8 [next_id] => 16 )
[16] => Array ( [name] => Memory [parent] => 1 [level] => 1 [path] => 1_16 [next_id] => 9 )
[9] => Array ( [name] => Mice [parent] => 1 [level] => 1 [path] => 1_9 [next_id] => 6 )
[6] => Array ( [name] => Monitors [parent] => 1 [level] => 1 [path] => 1_6 [next_id] => 5 )
[5] => Array ( [name] => Printers [parent] => 1 [level] => 1 [path] => 1_5 [next_id] => 7 )
[7] => Array ( [name] => Speakers [parent] => 1 [level] => 1 [path] => 1_7 [next_id] => 2 )
)
Unfortunately, on the duplicated pages this array is still valid, with all the duplicate values, so I cannot find anything in it to check whether the array is the good one or the duplicated one.









