osCommerce

The e-commerce.

dahui

How to index other pages as well?

Hi,

I have Google XML Sitemap v1.3 installed, and the cron job works nicely.

Is there any way to have other pages indexed automatically as well? I know I could add them manually, but I would rather have that done automatically and have them submitted by the cron job too.

Thanks for your 2 cents,

dahui

My PHP skills are too limited: I am starting to be able to 'read' and understand code, but that's about it.

Basically, I would approach it like this, e.g.:

<url>
  <loc>http://elflein-kosmetik.de/chiral-a-anti-a...trate-p-31.html</loc>
  <lastmod>2005-09-13</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.0</priority>
</url>

This is part of sitemapproducts.xml, which is generated by the following code in sitemap.class.php:
function GenerateProductSitemap(){
    $sql = "SELECT products_id as pID, products_date_added as date_added, products_last_modified as last_mod, products_ordered
            FROM " . TABLE_PRODUCTS . "
            WHERE products_status='1'
            ORDER BY products_ordered DESC";
    if ( $products_query = $this->DB->Query($sql) ){
        $this->debug['QUERY']['PRODUCTS']['STATUS'] = 'success';
        $this->debug['QUERY']['PRODUCTS']['NUM_ROWS'] = $this->DB->NumRows($products_query);
        $container = array();
        $number = 0;
        $top = 0;
        while( $result = $this->DB->FetchArray($products_query) ){
            // rows arrive sorted by products_ordered DESC, so the first row sets $top
            $top = max($top, $result['products_ordered']);
            $location = $this->hrefLink(FILENAME_PRODUCT_INFO, 'products_id=' . $result['pID'], 'NONSSL', false);
            $lastmod = $this->NotNull($result['last_mod']) ? $result['last_mod'] : $result['date_added'];
            $changefreq = GOOGLE_SITEMAP_PROD_CHANGE_FREQ;
            // priority = order count relative to the best seller, one decimal, never below 0.1
            $ratio = $top > 0 ? $result['products_ordered']/$top : 0;
            $priority = $ratio < .1 ? .1 : number_format($ratio, 1, '.', '');

            $container[] = array('loc' => htmlspecialchars(utf8_encode($location)),
                                 'lastmod' => date ("Y-m-d", strtotime($lastmod)),
                                 'changefreq' => $changefreq,
                                 'priority' => $priority
                                );
            // at most 50,000 URLs per sitemap file: write this batch and start a new one
            if ( sizeof($container) >= 50000 ){
                $type = $number == 0 ? 'products' : 'products' . $number;
                $this->GenerateSitemap($container, $type);
                $container = array();
                $number++;
            }
        } # end while
        $this->DB->Free($products_query);
        // write the remaining entries, even if only one is left
        if ( sizeof($container) > 0 ) {
            $type = $number == 0 ? 'products' : 'products' . $number;
            return $this->GenerateSitemap($container, $type);
        } # end if
    } else {
        $this->debug['QUERY']['PRODUCTS']['STATUS'] = 'false';
        $this->debug['QUERY']['PRODUCTS']['NUM_ROWS'] = '0';
    }
} # end function

/**
* Function to generate category sitemap data
* @author Bobby Easland
* @version 1.1
* @return boolean
*/
function GenerateCategorySitemap(){
    $sql = "SELECT categories_id as cID, date_added, last_modified as last_mod
            FROM " . TABLE_CATEGORIES . "
            ORDER BY parent_id ASC, sort_order ASC, categories_id ASC";
    if ( $categories_query = $this->DB->Query($sql) ){
        $this->debug['QUERY']['CATEOGRY']['STATUS'] = 'success';
        $this->debug['QUERY']['CATEOGRY']['NUM_ROWS'] = $this->DB->NumRows($categories_query);
        $container = array();
        $number = 0;
        while( $result = $this->DB->FetchArray($categories_query) ){
            $location = $this->hrefLink(FILENAME_DEFAULT, 'cPath=' . $this->GetFullcPath($result['cID']), 'NONSSL', false);
            $lastmod = $this->NotNull($result['last_mod']) ? $result['last_mod'] : $result['date_added'];
            $changefreq = GOOGLE_SITEMAP_CAT_CHANGE_FREQ;
            $priority = .5;

            $container[] = array('loc' => htmlspecialchars(utf8_encode($location)),
                                 'lastmod' => date ("Y-m-d", strtotime($lastmod)),
                                 'changefreq' => $changefreq,
                                 'priority' => $priority
                                );
            // at most 50,000 URLs per sitemap file: write this batch and start a new one
            if ( sizeof($container) >= 50000 ){
                $type = $number == 0 ? 'categories' : 'categories' . $number;
                $this->GenerateSitemap($container, $type);
                $container = array();
                $number++;
            }
        } # end while
        $this->DB->Free($categories_query);
        // write the remaining entries, even if only one is left
        if ( sizeof($container) > 0 ) {
            $type = $number == 0 ? 'categories' : 'categories' . $number;
            return $this->GenerateSitemap($container, $type);
        } # end if
    } else {
        $this->debug['QUERY']['CATEOGRY']['STATUS'] = 'false';
        $this->debug['QUERY']['CATEOGRY']['NUM_ROWS'] = '0';
    }
} # end function

(Both functions are from sitemap.class.php.)
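The priority rule in GenerateProductSitemap() can be tried out on its own. A minimal sketch, assuming the same rule as the code above (a product's order count divided by the highest order count, rounded to one decimal, never below 0.1); the function name `productPriority()` is made up for illustration, and unlike the original it always returns a string:

```php
<?php
// Hypothetical stand-alone version of the priority rule in GenerateProductSitemap():
// the best seller gets 1.0, every other product gets its share of the best
// seller's order count, rounded to one decimal place, with 0.1 as the floor.
function productPriority($ordered, $top)
{
    $ratio = $top > 0 ? $ordered / $top : 0;
    if ($ratio < 0.1) {
        return '0.1';
    }
    return number_format($ratio, 1, '.', '');
}

echo productPriority(100, 100), "\n"; // 1.0 (best seller)
echo productPriority(50, 100), "\n";  // 0.5
echo productPriority(3, 100), "\n";   // 0.1 (below the floor)
echo productPriority(0, 0), "\n";     // 0.1 (nothing ordered yet)
```

Because the query sorts by products_ordered DESC, $top already holds the maximum after the first row, so every later ratio is relative to the best seller.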

 

After the while loop for all products or categories, I would like to add some code that appends my 'other pages to be indexed' to the end of the XML file being created, e.g.:

<url>
  <loc>http://elflein-kosmetik.de/information.php</loc>
  <lastmod>2005-09-13</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.0</priority>
</url>

 

The idea is to maintain that 'other pages' code manually in sitemap.class.php, adding or deleting entries when files change or are added, and still be able to use the cron job to index those 'other pages' automatically.

 

I know that an array can be extended (with $arrayname[] = ...; the .= operator only appends to strings), and that maybe this line

    return $this->GenerateSitemap($container, $type);

is where to start, but it's too heavy for me. Any input is highly appreciated.
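A minimal sketch of the array approach, assuming the extra pages are kept in a hand-maintained list: each page is pushed onto `$container` with `[]`, in the same shape the products/categories loops use, so it could run right after the while loop and before the `sizeof($container)` check. The helper `appendOtherPages()` and the `$otherPages` list are hypothetical, not part of the contribution:

```php
<?php
// Hypothetical helper: appends extra (non-product, non-category) pages to a
// sitemap container array, using the same keys as GenerateProductSitemap().
function appendOtherPages(array $container, array $otherPages)
{
    foreach ($otherPages as $page) {
        $container[] = array(
            // escape &, <, > the same way the contribution escapes its URLs
            'loc'        => htmlspecialchars($page['loc']),
            'lastmod'    => date('Y-m-d', strtotime($page['lastmod'])),
            'changefreq' => $page['changefreq'],
            'priority'   => $page['priority'],
        );
    }
    return $container;
}

// "Other pages", maintained by hand (URLs taken from this thread)
$otherPages = array(
    array('loc' => 'http://elflein-kosmetik.de/information.php', 'lastmod' => '2005-09-13',
          'changefreq' => 'weekly', 'priority' => '1.0'),
    array('loc' => 'http://elflein-kosmetik.de/impressum.php', 'lastmod' => '2005-09-13',
          'changefreq' => 'weekly', 'priority' => '1.0'),
);

$container = array(); // inside the class this would already hold the category entries
$container = appendOtherPages($container, $otherPages);
```

Inside GenerateCategorySitemap() the extra entries would then end up in sitemapcategories.xml together with the categories.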

 

dahui

Does nobody have an idea where and how to insert, e.g., the following code in sitemap.class.php so that it gets appended to either sitemapproducts.xml or sitemapcategories.xml?

 

        // the '-' signs shown by the browser are not part of the XML,
        // so the entries are written without them and without spaces around the URL
        $otherpages  = '<url>' . "\n" .
                       '  <loc>http://elflein-kosmetik.de/information.php</loc>' . "\n" .
                       '  <lastmod>2005-09-13</lastmod>' . "\n" .
                       '  <changefreq>weekly</changefreq>' . "\n" .
                       '  <priority>1.0</priority>' . "\n" .
                       '</url>' . "\n";
        $otherpages .= '<url>' . "\n" .
                       '  <loc>http://elflein-kosmetik.de/impressum.php</loc>' . "\n" .
                       '  <lastmod>2005-09-13</lastmod>' . "\n" .
                       '  <changefreq>weekly</changefreq>' . "\n" .
                       '  <priority>1.0</priority>' . "\n" .
                       '</url>' . "\n";
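Rather than concatenating each block by hand, the same `<url>` entries can be produced in a loop from a plain list of URLs. A minimal sketch; `buildOtherPagesXml()` is a hypothetical name, and note that such a string cannot simply be added to `$container` (which holds arrays): it would have to go before the closing `</urlset>` of the generated file.

```php
<?php
// Hypothetical sketch: builds one <url> entry per URL in a loop instead of
// concatenating each block by hand.
function buildOtherPagesXml(array $urls, $lastmod, $changefreq, $priority)
{
    $xml = '';
    foreach ($urls as $url) {
        $xml .= '<url>' . "\n"
              . '  <loc>' . htmlspecialchars($url) . '</loc>' . "\n"
              . '  <lastmod>' . $lastmod . '</lastmod>' . "\n"
              . '  <changefreq>' . $changefreq . '</changefreq>' . "\n"
              . '  <priority>' . $priority . '</priority>' . "\n"
              . '</url>' . "\n";
    }
    return $xml;
}

$otherpages = buildOtherPagesXml(
    array('http://elflein-kosmetik.de/information.php',
          'http://elflein-kosmetik.de/impressum.php'),
    '2005-09-13', 'weekly', '1.0'
);
echo $otherpages;
```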

 

dahui

I seem to be the only one working on this, so either

i) I have made a mistake and posted this in the wrong board,

ii) I am absolutely wrong, not in my coding ideas but in my intention in general, or

iii) this has already been solved some other way; if so, please point me in the right direction.

I tried to work out something that might work.

 

I manually created a file named sitemapothers.xml and placed it in the same directory as

sitemapindex.xml

sitemapcategories.xml

sitemapproducts.xml

in my case the root of mydomain.com.

For the moment the file has to be maintained manually and looks e.g. like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
 <loc>http://mydomain.com/anysite_to_be_indexed_1.php</loc>
 <lastmod>2005-08-07</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.5</priority>
</url>
<url>
 <loc>http://mydomain.com/anysite_to_be_indexed_2.php</loc>
 <lastmod>2005-08-07</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.5</priority>
</url>
<url>
 <loc>http://mydomain.com/anysite_to_be_indexed_3.php</loc>
 <lastmod>2005-08-07</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.5</priority>
</url>
</urlset>
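A `<loc>` has to hold an absolute URL starting with the protocol, so a typo such as a stray `/` in front of `http://` is easy to miss in a hand-maintained file. A minimal check, assuming PHP's SimpleXML extension is available; `invalidSitemapLocs()` is a hypothetical helper:

```php
<?php
// Hypothetical check for a hand-maintained sitemap file: returns every <loc>
// value that is not an absolute http:// or https:// URL.
// Malformed XML makes the SimpleXMLElement constructor throw an exception.
function invalidSitemapLocs($xmlString)
{
    $doc = new SimpleXMLElement($xmlString);
    $bad = array();
    // local-name() matches <loc> regardless of the namespace declared on <urlset>
    foreach ($doc->xpath('//*[local-name()="loc"]') as $loc) {
        $url = trim((string) $loc);
        if (!preg_match('#^https?://#i', $url)) {
            $bad[] = $url;
        }
    }
    return $bad;
}

$sample = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">'
        . '<url><loc>/http://mydomain.com/anysite_to_be_indexed_1.php</loc></url>'
        . '<url><loc>http://mydomain.com/anysite_to_be_indexed_2.php</loc></url>'
        . '</urlset>';
print_r(invalidSitemapLocs($sample));
```

For the real file, `invalidSitemapLocs(file_get_contents('sitemapothers.xml'))` should return an empty array.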

 

In sitemap.class.php, sitemapindex.xml (mentioned above) is created dynamically. I added some very basic code ;)

sitemapindex.xml is the file the cron job submits, and sitemapindex.xml points to

sitemapothers.xml

sitemapcategories.xml

sitemapproducts.xml

so that the content of those files gets indexed.

Therefore I modified sitemap.class.php as follows (as I said, very basic; bear with me please, any assistance here is highly appreciated):

/**
* Function to generate sitemap index file
* @author Bobby Easland
* @version 1.1
* @return boolean
*/
function GenerateSitemapIndex(){
    $content = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $content .= '<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    $pattern = defined('GOOGLE_SITEMAP_COMPRESS')
             ? ( GOOGLE_SITEMAP_COMPRESS == 'true' ? "{sitemap*.xml.gz}" : "{sitemap*.xml}" )
             : "{sitemap*.xml}";
    foreach ( glob($this->savepath . $pattern, GLOB_BRACE) as $filename ) {
        // skip the index file itself (case-insensitive, like eregi(), which PHP 7 removed)
        if ( stripos(basename($filename), 'index') !== false ) continue;
        $content .= "\t" . '<sitemap>' . "\n";
        $content .= "\t\t" . '<loc>'.$this->base_url . basename($filename).'</loc>' . "\n";
        $content .= "\t\t" . '<lastmod>'.date ("Y-m-d", filemtime($filename)).'</lastmod>' . "\n";
        $content .= "\t" . '</sitemap>' . "\n";
    } # end foreach

    // hand-added entry for the manually maintained sitemapothers.xml
    $content .= "\t" . '<sitemap>' . "\n";
    $content .= "\t\t" . '<loc>'.$this->base_url.'sitemapothers.xml</loc>' . "\n";
    $content .= "\t\t" . '<lastmod>2005-09-01</lastmod>' . "\n";
    $content .= "\t" . '</sitemap>' . "\n";

    $content .= '</sitemapindex>';
    return $this->SaveFile($content, 'index');
} # end function
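Note that the glob pattern `sitemap*.xml` already matches a file named sitemapothers.xml once it exists in the save path, so the foreach loop lists it by itself, with a `<lastmod>` taken from the file's modification time; the hard-coded block then adds a second entry for the same file. A minimal stand-alone sketch of the loop alone, assuming plain (uncompressed) files; `buildSitemapIndex()` and its parameters are hypothetical, and a single pattern needs no braces, so GLOB_BRACE (not available on every platform) is left out:

```php
<?php
// Hypothetical stand-alone version of the index loop: every sitemap*.xml file
// in $savePath (except the index itself) gets one <sitemap> entry whose
// <lastmod> is the file's modification time.
function buildSitemapIndex($savePath, $baseUrl)
{
    $content  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $content .= '<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    $files = glob($savePath . 'sitemap*.xml');
    foreach ($files ?: array() as $filename) {
        if (stripos(basename($filename), 'index') !== false) {
            continue; // skip sitemapindex.xml itself
        }
        $content .= "\t<sitemap>\n";
        $content .= "\t\t<loc>" . $baseUrl . basename($filename) . "</loc>\n";
        $content .= "\t\t<lastmod>" . date('Y-m-d', filemtime($filename)) . "</lastmod>\n";
        $content .= "\t</sitemap>\n";
    }
    $content .= '</sitemapindex>';
    return $content;
}
```

glob() returns the names sorted, which matches the categories / others / products order in the AFTER output.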

 

I am still working on the hard-coded lastmod line:

$content .= "\t\t" . '<lastmod>2005-09-01</lastmod>' . "\n";

The resulting sitemapindex.xml:

BEFORE:

<sitemapindex>
  <sitemap>
    <loc>http://mydomain.com/sitemapcategories.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mydomain.com/sitemapproducts.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mydomain.com/sitemapothers.xml</loc>
    <lastmod>2005-09-01</lastmod>
  </sitemap>
</sitemapindex>

AFTER:

<sitemapindex>
  <sitemap>
    <loc>http://mydomain.com/sitemapcategories.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mydomain.com/sitemapothers.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mydomain.com/sitemapproducts.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mydomain.com/sitemapothers.xml</loc>
    <lastmod>2005-09-01</lastmod>
  </sitemap>
</sitemapindex>

 

So what should happen now? The cron job will submit sitemapindex.xml, and instead of 2 files, 3 (sitemapothers.xml as well) should be processed by Google.

I would really appreciate it if a senior member could have a look at this and tell me whether it will work or not. I tested it and so far I can't see any errors, but who am I :P

 

dahui

Guest

Basically, I think many people will appreciate what you have done but don't have a clue how to help. If this works, I would certainly use the code!

I will try to test it today on my test server.


After going back and forth, I found out that it works fine just by doing this:

I manually created a file named sitemapothers.xml and placed it in the same directory as

sitemapindex.xml

sitemapcategories.xml

sitemapproducts.xml

The file has to be maintained manually and should look e.g. like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://mydomain.com/anysite_to_be_indexed_1.php</loc>
<lastmod>2005-08-07</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://mydomain.com/anysite_to_be_indexed_2.php</loc>
<lastmod>2005-08-07</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://mydomain.com/anysite_to_be_indexed_3.php</loc>
<lastmod>2005-08-07</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
</urlset>

 

To try it, open your catalog/sitemapindex.xml in the browser; by default the content should look like this:

<sitemapindex>
  <sitemap>
    <loc>http://yrdomain.com/sitemapcategories.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://yrdomain.com/sitemapproducts.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
</sitemapindex>

 

Now simply run catalog/googlesitemap/index.php in your browser and then open catalog/sitemapindex.xml again; it should now look like this:

<sitemapindex>
  <sitemap>
    <loc>http://yrdomain.com/sitemapcategories.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://yrdomain.com/sitemapothers.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://yrdomain.com/sitemapproducts.xml</loc>
    <lastmod>2005-09-14</lastmod>
  </sitemap>
</sitemapindex>

 

ERGO:

 

All you have to do is create sitemapothers.xml and maintain it manually; if you have set up a cron job, the rest is picked up automatically.

If you submit sitemapindex.xml to Google manually, make sure that you have called catalog/googlesitemap/index.php in your browser at least once, so that the new sitemapothers.xml gets included in sitemapindex.xml.

It can be as easy as that.

 

dahui

The only problem I have is that my cron job does not work :(

Mail message:

 

Warning: main(includes/configure.php): failed to open stream: No such file or directory in /home/httpd/vhosts/elflein-kosmetik.de/httpdocs/googlesitemap/index.php on line 38

 

Fatal error: main(): Failed opening required 'includes/configure.php' (include_path='.:/usr/share/php') in /home/httpd/vhosts/elflein-kosmetik.de/httpdocs/googlesitemap/index.php on line 38

 

The thing is, I searched the board and found one similar problem, but no solution. :( :( :(

I have tried everything: all kinds of permissions, paths, users and owners. I cannot sort out what it is.

Any idea? Is your cron job running? Which user? Which path? How, why, when... arghhhhhhhh :blush: I would love to have that job running.

 

dahui

Edited by dahui

Guest

No, the cron job isn't working for me either, but that's because I haven't asked for shell access... yet.

Here we go, everything is up and running!

Google and Froogle are fed automatically by cron jobs.

i) Indexing 'other' pages works brilliantly as described above. At the moment the pages have to be maintained manually, but that seems OK to me, as these pages will not change too often; the lastmod date of the files is also a minor issue for me for the moment. I have both on my to-do list.

ii) Cron jobs: that took me some time. I have my own VPS with Linux, and the problem wasn't the permissions of the files and directories. It is due to the fact that the cron job runs as a user whose base directory is the root of the VPS, not the httpdocs root of the virtual host, ergo:

In e.g. /googlesitemap/index.php or /catalog/admin/froogle.php, all references and includes relative to the vhost's httpdocs root do not work, which means none of the includes can be performed.

Solution: for the files that need to be run by a cron job on a virtual server with Plesk, change the includes as follows:

require_once(DIR_WS_INCLUDES . 'filenames.php');  

 

to:

require_once('/the/absolute/path/on/your/server/to/includes/filenames.php');
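An alternative to typing the absolute path by hand is to derive it from the script's own location with `__DIR__` (PHP 5.3 and later), which does not depend on the directory the cron job starts in. A minimal sketch, assuming the script lives one directory below the catalog root, as httpdocs/googlesitemap/index.php does here; `catalogFile()` is a hypothetical helper. Depending on how the script builds its other paths, a single `chdir(dirname(__DIR__));` at the top may also be enough to make the existing relative includes resolve.

```php
<?php
// Hypothetical helper: returns an absolute path to a file in the catalog root,
// computed from this script's location rather than from the current working
// directory (under cron typically the cron user's home directory).
// Assumes this script sits one level below the catalog root.
function catalogFile($relativePath)
{
    $catalogRoot = dirname(__DIR__); // parent of the directory holding this script
    return $catalogRoot . '/' . ltrim($relativePath, '/');
}

// Usage in place of a relative include:
// require_once(catalogFile('includes/configure.php'));
echo catalogFile('includes/configure.php'), "\n";
```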

 

I hope my whole day of investigation will help some others with their cron jobs on a VPS.

 

dahui

 

BTW, any input on how to solve the lastmod date or automatically create sitemapothers.xml is of course very welcome. It might in the end result in an addition to the contribution ;)
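One possible approach to both open points is a small script that writes sitemapothers.xml from a list of page files and takes each `<lastmod>` from the file's modification time via filemtime(). A minimal sketch; the function name, its parameters and the page list are hypothetical, and the paths in the usage comment are placeholders. Keep in mind that the modification time of a .php file only reflects edits to that file, not to text pulled in from the database or from other included files.

```php
<?php
// Hypothetical generator for sitemapothers.xml: one <url> entry per page in
// $pages (file name => options), with <lastmod> from the file's mtime.
function writeOtherPagesSitemap($docRoot, $baseUrl, array $pages, $target)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    foreach ($pages as $file => $opts) {
        $path = $docRoot . $file;
        if (!is_file($path)) {
            continue; // page was removed: leave it out of the sitemap
        }
        $xml .= "<url>\n";
        $xml .= ' <loc>' . htmlspecialchars($baseUrl . $file) . "</loc>\n";
        $xml .= ' <lastmod>' . date('Y-m-d', filemtime($path)) . "</lastmod>\n";
        $xml .= ' <changefreq>' . $opts['changefreq'] . "</changefreq>\n";
        $xml .= ' <priority>' . $opts['priority'] . "</priority>\n";
        $xml .= "</url>\n";
    }
    $xml .= "</urlset>\n";
    return file_put_contents($target, $xml) !== false;
}

// Example call (placeholder paths), run before GenerateSitemapIndex() so the
// file exists when the index is built:
// writeOtherPagesSitemap('/path/to/httpdocs/', 'http://mydomain.com/',
//     array('information.php' => array('changefreq' => 'weekly', 'priority' => '0.5'),
//           'impressum.php'   => array('changefreq' => 'monthly', 'priority' => '0.5')),
//     '/path/to/httpdocs/sitemapothers.xml');
```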

Guest

I am using jail shell and keep getting:

 

/usr/local/cpanel/bin/jailshell: line 1: PHP: command not found

As for how to set up cron in your environment, I cannot help, sorry; please contact your host.

Any questions concerning osC are welcome ;)

 

dahui
