puckett_jw

The Google sitemap


I have searched but I do not see a thread for this contribution; I am sorry if I missed it. Anyhow, I have installed the Google Sitemap contribution (http://www.oscommerce.com/community/contributions,3226/category,all/search,sitemap), and it runs perfectly. The problem is that I want to use a cron job to run the google_sitemap.php script every night, which I would like to do by executing

 

/usr/bin/php /html/root/path/google_sitemap.php | /bin/gzip > /html/root/path/sitemap.gz

This causes application_top to get included, which triggers the session, so I get...

 

Content-type: text/html
X-Powered-By: PHP/4.3.2
Set-Cookie: osCsid=7af8dd2b6e8b13c43522952465b06a34; path=/; domain=www.rangerjoes.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache

being added to the top of the generated file.

 

I have tried the -q switch, but that gives...

 

<br />
<b>Warning</b>:  session_start(): Cannot send session cookie - headers already sent in <b>/html/root/path/includes/functions/sessions.php</b> on line <b>67</b><br />
<br />
<b>Warning</b>:  session_start(): Cannot send session cache limiter - headers already sent in <b>/var/www/html/includes/functions/sessions.php</b> on line <b>67</b><br />

Does anyone have a suggestion?


Hi,

 

This happens if you have set Force Cookie Use = false and run the script with the CGI binary. Please use the CLI (command-line interface) binary or set Force Cookie Use = true.
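
A quick way to see which kind of binary cron would be calling is to look at the first line of `php -v`, which normally names the SAPI in parentheses. This is only a minimal sketch: the function name php_sapi_from_version is made up for illustration, and /usr/bin/php is simply the path from the first post.

```shell
# Classify a PHP binary from the first line of its `php -v` output.
# CLI builds normally report "(cli)"; CGI builds report "(cgi)" or "(cgi-fcgi)".
# php_sapi_from_version is a hypothetical helper name.
php_sapi_from_version() {
  case "$1" in
    *"(cli)"*) echo cli ;;
    *"(cgi"*)  echo cgi ;;
    *)         echo unknown ;;
  esac
}

# Usage (assumes the binary lives at /usr/bin/php, as in the first post):
#   php_sapi_from_version "$(/usr/bin/php -v 2>/dev/null | head -n 1)"
```

If this prints cgi, the session headers described above are to be expected in the output.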

 

Regards,

Raphael


Raphael,

 

Nice coding on the contribution, but it would be best to get away from the tep API and roll your own database abstraction layer. Then skip the usual includes for app top/bottom, which gets away from the sessions altogether.

 

I've been working on a Google sitemap class for a day or so now (great minds think alike) and will have something out there pretty soon. I plan on integrating the solution into the adminCP and having options such as sitemap index files (if more than one is needed - i.e. separate products versus categories to get around that 50K limit), automatic GZ compression format file save, and automatic HTTP request submission (webmasters/sitemaps/ping?sitemap=sitemap_url).

 

However, the overall point is that unless you get away from the tep API on this one the CRON generation will be fairly difficult to get around.

 

Bobby


Hi,

 

However, the overall point is that unless you get away from the tep API on this one the CRON generation will be fairly difficult to get around.

 

There is no problem with the sessions if you use the CLI SAPI for PHP. This is the only correct way to run PHP from the shell.

If you don't have shell access you could use the script over http.
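
Whichever way the script is run from cron, a small guard can keep a bad run (for example one with HTTP headers or warnings at the top) from overwriting the live sitemap. This is a rough sketch that assumes the script's output begins with an XML declaration; looks_like_sitemap and the /tmp file name are made up for illustration.

```shell
# Succeeds only if the first line of the given file starts with an XML declaration.
# looks_like_sitemap is a hypothetical helper; it assumes the generated sitemap
# begins with "<?xml", which is customary but not verified against the script.
looks_like_sitemap() {
  head -n 1 "$1" | grep -q '^<?xml'
}

# Usage sketch (script and target paths from the first post; /tmp file is an assumption):
#   /usr/bin/php /html/root/path/google_sitemap.php > /tmp/sitemap.xml.new \
#     && looks_like_sitemap /tmp/sitemap.xml.new \
#     && /bin/gzip -c /tmp/sitemap.xml.new > /html/root/path/sitemap.gz
```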

 

... having options such as sitemap index files (if more than one is needed - i.e. separate products versus categories to get around that 50K limit), automatic GZ compression format file save...

 

The new version supports a new option, auto.

 

Usage:

shell: php google_sitemap.php -za

OR

http://domain.com/google_sitemap.php?auto=true&gzip=true

 

This generates multiple compressed sitemaps (if necessary, i.e. more than 50,000 URLs or a file size over 10 MB) in the catalog folder, plus a corresponding index file.
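
To get a feel for when the split kicks in, the two limits can be turned into simple arithmetic. This is only an estimate under stated assumptions, not the contribution's actual logic: sitemap_parts_needed is a made-up name, 10 MB is taken as 10 * 1024 * 1024 bytes, and URLs are assumed to be roughly equal in size.

```shell
MAX_URLS=50000        # URL limit per sitemap file, as stated above
MAX_BYTES=10485760    # 10 MB, assuming 1 MB = 1024 * 1024 bytes

# Estimate how many sitemap files are needed for a given URL count and
# total uncompressed size in bytes. Hypothetical helper, integer ceiling division.
sitemap_parts_needed() {
  urls=$1
  bytes=$2
  by_urls=$(( (urls + MAX_URLS - 1) / MAX_URLS ))
  by_bytes=$(( (bytes + MAX_BYTES - 1) / MAX_BYTES ))
  if [ "$by_urls" -gt "$by_bytes" ]; then
    echo "$by_urls"
  else
    echo "$by_bytes"
  fi
}

# Usage: sitemap_parts_needed <url_count> <uncompressed_bytes>
```

Whichever limit is hit first decides the number of files, so a store with few but very long URLs can still need more than one sitemap.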

 

Regards,

Raphael


Hi,

 

I've added a new option, p, to notify Google about your new sitemap.

 

Usage:

shell: php google_sitemap.php -zap

OR

http://domain.com/google_sitemap.php?auto=true&gzip=true&ping=true
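
For anyone curious what the notification amounts to, it is a plain HTTP GET with the URL-encoded sitemap address in the query string. The sketch below only builds that URL and does not send anything; the ping address is the one quoted in this thread, while urlencode and google_ping_url are made-up helper names.

```shell
# Percent-encode a string, leaving only unreserved characters untouched.
# Hypothetical helper; intended for plain ASCII URLs.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Build the ping URL for a given sitemap (or sitemap index) address.
google_ping_url() {
  printf '%s\n' "http://www.google.com/webmasters/sitemaps/ping?sitemap=$(urlencode "$1")"
}

# Usage: google_ping_url "http://domain.com/sitemap.xml.gz"
```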

 

 

All Features:

- supports multilingual categories and products

- supports Search-Engine Safe URLs (osC default)

- can be accessed via HTTP or the command line

- writes to a file or standard output

- autogenerates multiple sitemaps for sites with over 50,000 URLs

- autogenerates multiple sitemaps if the file size exceeds 10 MB

- autogenerates an index file if necessary

- writes files compressed or uncompressed

- auto-notifies Google about the sitemap or index file

 

See readme.txt for details.

 

Regards,

Raphael

Edited by rvullriede


I'm getting the following error when the script tries to submit to Google:

 

Warning: fopen() expects at least 2 parameters, 1 given in /xxxxx/xxxxxxxx/public_html/google_sitemap.php on line 301

 

Any ideas?


Hi,

 

Thanks for your feedback.

 

Replace line 301

 

    fopen('http://www.google.com/webmasters/sitemaps/ping?sitemap='.urlencode($notify_url));

 

with

 

    fopen('http://www.google.com/webmasters/sitemaps/ping?sitemap='.urlencode($notify_url), 'r');

 

Regards,

Raphael


Thanks for this contribution, Raphael. It is clean, easy, and seems to work well so far.


The way of life, and the way we work for you.


WOW! It is great seeing just how active this community is. Anyhow, I was just curious if Google has actually requested anyone's sitemap? I have had mine in place for about 48 hours, but no request in my apache logs.


Hi,

 

Anyhow, I was just curious if Google has actually requested anyone's sitemap? I have had mine in place for about 48 hours, but no request in my apache logs.

 

I submitted my sitemap with my Google-Account (https://www.google.com/webmasters/sitemaps/login) and Google downloaded the file within 2 hours. A few hours later the sitemap was parsed and the status has changed from 'pending' to 'o.k.'

 

Regards,

Raphael

Hi,

I submitted my sitemap with my Google-Account (https://www.google.com/webmasters/sitemaps/login) and Google downloaded the file within 2 hours. A few hours later the sitemap was parsed and the status has changed from 'pending' to 'o.k.'

 

Regards,

Raphael

 

Same happened to me.

Now, should I expect to see Google deeply crawling ALL the URLs in the sitemap within a very short time?

 

Cheers

Franco




I was wondering if anyone has had problems with this contrib and "big" stores.

 

I have *almost* 26,000 products, so my sitemap generates to about 9MB uncompressed.

 

I also have a smaller store, with about 55 products.

 

I submitted the sitemap for the smaller store, and Google gave it the "OK". However, I've RE-submitted my LARGER store's sitemap multiple times with different options (gzipped, not gzipped, etc.) and I keep getting an "Invalid Date" error for that sitemap.

 

And of course, Google doesn't give a line number or anything, so I can't just go in and fix it if it's one page or something coming up wrong.

 

Any ideas, suggestions, or similar problems?
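
One way to narrow this down without a line number from Google is to scan the uncompressed sitemap for `<lastmod>` values that don't look like a W3C date (YYYY-MM-DD, optionally followed by a time part). This is a rough sketch and not a diagnosis: it assumes at most one `<lastmod>` element per line, find_bad_lastmod is a made-up name, and whether a bad date such as 0000-00-00 is actually the cause here is a guess.

```shell
# Print (with line numbers) every line whose <lastmod> value does not start
# with a plausible YYYY-MM-DD date. Hypothetical helper; assumes at most one
# <lastmod> element per line of the file given as $1.
find_bad_lastmod() {
  grep -n '<lastmod>' "$1" |
    grep -v -E '<lastmod>[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T[^<]*)?</lastmod>'
}

# Usage: find_bad_lastmod sitemap.xml
```

Any line it prints points at a product or category whose date is worth checking in the database.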

