
Summary of Contribs related to ALL the Google threads


Mark Russell


I've read all the threads - more than once - including the 14-pager. Here is what I gathered in terms of the contribs and settings. I'm not entirely confident I have all of this correct, but I hope that getting right down to the details helps those who are also still confused.

 

- spider catcher: this is the code that goes at the root (e.g. index.html) and either serves up allprods via readfile or redirects to the default page. Purpose: to detect a bot/spider and feed it product links via allprods, or to redirect a 'real user' to the default page.

 

- Ian's Kill SID code: this is the code to detect a bot/spider once it is in the site and kill SIDs. Purpose: to prevent bots from getting trapped in the site and to prevent product links getting listed in Google with SIDs appended? QUESTION: I saw code in Ian's thread that looked just like the spider catcher code (the one with bot footprints in an array and all the IP addresses) that goes into html_output. How is this code different from the spider catcher that goes into an index.html?

 

- all_prods: code to facilitate product link submission to the engines and to allow customers to view all products on one page. The link to all_prods should be visible in the header, main page, or footer - not in the left info box, which does not get crawled.

 

- meta-tags: a contrib that allows meta tags to be generated specific to pages/categories. Google might not use them, but they are good to have for other bots/spiders.

 

- search engine safe URLs: turn them off. Having them on prevents a user who has cookies disabled from buying, AND neither setting affects the SEs.
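For anyone unsure what that setting actually changes, here is the difference as I understand it in stock osCommerce (made-up IDs):

```
Search Engine Safe URLs off:  index.php?cPath=22&products_id=99
Search Engine Safe URLs on:   index.php/cPath/22/products_id/99
```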

 

Is that it? Can anyone shed additional light on the above?

 

Much thanks for everyone's ongoing efforts...

 

Mark


Mark,

 

That pretty much nails it down, with just a couple of deviations. Here are my comments on your summary.

 

 

 

- spider catcher: this is the code that goes at the root (e.g. index.html) and either serves up allprods via readfile or redirects to the default page. Purpose: to detect a bot/spider and feed it product links via allprods, or to redirect a 'real user' to the default page.

 

I wouldn't use this at all. There seems to be some dissent about whether or not Google frowns upon this practice. Ultimately, they have said that we should not direct a bot someplace other than where the general public goes.
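For readers who haven't seen it, the root-level check being debated boils down to something like this - a sketch only, with made-up footprints and file names, not the actual contribution:

```php
<?php
// Sketch of the "spider catcher" idea for the root index file.
// The footprint list and file names are illustrative assumptions.
$spider_footprints = array('googlebot', 'slurp', 'msnbot', 'scooter');
$agent = strtolower((string)getenv('HTTP_USER_AGENT'));

$is_spider = false;
foreach ($spider_footprints as $footprint) {
    if (strpos($agent, $footprint) !== false) {
        $is_spider = true;
        break;
    }
}

if ($is_spider) {
    // Spiders get a static copy of the all-products page to crawl.
    readfile('catalog/allprods.html');
} else {
    // Real visitors are sent to the normal storefront.
    header('Location: http://www.example.com/catalog/default.php');
}
exit;
?>
```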

 

Ian's Kill SID code: this is the code to detect a bot/spider once it is in the site and kill SIDs. Purpose: to prevent bots from getting trapped in the site and to prevent product links getting listed in Google with SIDs appended? QUESTION: I saw code in Ian's thread that looked just like the spider catcher code (the one with bot footprints in an array and all the IP addresses) that goes into html_output. How is this code different from the spider catcher that goes into an index.html?

 

Almost, but not quite right. As I'm sure you have read, Google has some trouble with the SID. I suggested that we use the aforementioned spider catcher to determine whether we assign an SID in html_output.php. To make a long story short, it turned out that this was a good idea. As long as the spider was in the IP address array, Google didn't get stuck indexing our site. However, it was brought to my attention by Burt (thanks, Burt) that the USER_AGENT array in this hack does not work properly, which is probably due to something in the substr() function. There has been a suggestion for a fix, but I just haven't had the time yet. Until it is fixed, you'll have to make sure to maintain your list of spider IP addresses.
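In html_output.php terms, the shape of the hack is roughly this. tep_spider_check() is my hypothetical shorthand for the IP/user-agent test, and matching with strpos() instead of a fixed-length substr() slice is one plausible form of the fix mentioned above, not necessarily the one from the thread:

```php
// Hypothetical stand-in for the IP/user-agent test discussed above.
// Looking for the footprint anywhere in the string with strpos() avoids
// the fixed-offset failure a substr() comparison is prone to.
function tep_spider_check() {
    $footprints = array('googlebot', 'slurp', 'msnbot');  // illustrative list
    $agent = strtolower((string)getenv('HTTP_USER_AGENT'));
    foreach ($footprints as $footprint) {
        if (strpos($agent, $footprint) !== false) return true;
    }
    return false;
}

// Sketch of the corresponding change inside tep_href_link() in
// html_output.php: only build the osCsid parameter for non-spiders.
$sid = '';
if (tep_spider_check() == false) {
    $sid = tep_session_name() . '=' . tep_session_id();
}
```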

 

Ian knew that this hack had some high maintenance involved, and devised a way to keep osC from assigning a SID until the user either tried to log in or added something to the cart, which Google would never try to do. Ian's code is designed to work in place of the spider catcher. It still has some bugs, but I would contact him directly for more information on this. His way is definitely better.
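As I understand it, the shape of Ian's approach is something like the following; the trigger list is my guess, not his actual code:

```php
// No session (and therefore no SID) is created until the visitor does
// something a spider never will, such as adding to the cart or logging in.
$session_triggers = array('add_product', 'buy_now', 'login');

if (isset($HTTP_GET_VARS['action']) && in_array($HTTP_GET_VARS['action'], $session_triggers)) {
    tep_session_start();  // osCommerce's wrapper around session_start()
}
// Until then, tep_href_link() has no session id to append, so every
// link a spider follows stays SID-free.
```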

 

all_prods: code to facilitate product link submission to the engines and to allow customers to view all products on one page. The link to all_prods should be visible in the header, main page, or footer - not in the left info box, which does not get crawled.

 

Exactly
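For anyone building it by hand, the heart of an all-products page is just a flat dump of plain product links, roughly like this (stock osCommerce table names and helpers; treat the details as a sketch, not the actual contribution):

```php
// One crawlable link per active product. Passing false as the fourth
// argument to tep_href_link() keeps the osCsid off these links.
$products_query = tep_db_query("select p.products_id, pd.products_name
                                from products p, products_description pd
                                where p.products_id = pd.products_id
                                and p.products_status = '1'");
while ($products = tep_db_fetch_array($products_query)) {
    echo '<a href="' . tep_href_link(FILENAME_PRODUCT_INFO,
        'products_id=' . $products['products_id'], 'NONSSL', false) . '">' .
        $products['products_name'] . '</a><br>';
}
```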

 

meta-tags: a contrib that allows meta tags to be generated specific to pages/categories. Google might not use them, but they are good to have for other bots/spiders.

 

- search engine safe URLs: turn them off. Having them on prevents a user who has cookies disabled from buying, AND neither setting affects the SEs.

 

You hit the nail on the head. In addition, I'd like to add that I moved the catalog up a level to the web root directory. I received an email from Google that said that sites with redirects would not be indexed by Google. They consider a redirect to be anything that cannot get back to Google with one click of the 'back' button on the browser. I originally had an index.html that would redirect to the catalog's default.php, and you have to click 'back' twice using that method. So, I moved my whole catalog up a level. I'm not sure what HPDL et al. think of this, but this is what I did, and I believe I am efficiently indexed on Google.
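For the record, this is the kind of root index I got rid of (path made up). However you write it - header(), meta refresh, or JavaScript - it costs the visitor that extra 'back' click:

```php
<?php
// The old one-line root index: bounce everyone into /catalog/.
// After following it, the browser needs two clicks of 'back' to reach Google.
header('Location: http://www.example.com/catalog/default.php');
exit;
?>
```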

-------------------------------------------------------------------------------------------------------------------------

NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit.

If you have a question about any of my posts here, your best bet is to contact me through either email or PM in my profile, and I'll be happy to help.


I believe the session-id killer is now unnecessary. It has been addressed in a November 18 commit.

 

Directly from CVS:

Quote from http://www.searchengineworld.com/spiders/g...faq.htm#item355

 

"Does Google index dynamic content?

A: It will in certain instances. What criteria is used to determine if a dynamic page is indexable is debatable. Most have found that

clean, high ranking (High page rank) sites can get dynamic content indexed"

 

This makes me think that Googlebot does indeed index dynamic pages, but not everybody's.

 

Another Quote:

 

"Sites that use session tracking urls to give each visitor a dynamic url. These sites can generate an infinite amount of pages for a

spider to visit. These types of pages, are usually blocked from being indexed by Google."

 

With this modification, the session id is back in the parameter area (even with Search Engine Friendly URLs turned on) so that no spiders get trapped in our infinite number of pages and we are not banned from Google.
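In other words, with Search Engine Friendly URLs on, the osCsid moves out of the path and back into the query string (illustrative IDs):

```
Before the commit:  index.php/cPath/22/osCsid/abc123
After the commit:   index.php/cPath/22?osCsid=abc123
```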

 

Any extra info on the subject is welcome.

 

I also have my entire storefront at root level and have noticed a mind-blowing difference. Search engines frown upon redirects from the domain level, no matter how this is accomplished.

 

Search engine safe URLs are going the way of Betamax. I've also read that Google prefers to serve dynamic pages pertaining to the most accurate content, which brings me to the meta tag controller.

 

The meta tag controller is probably the second most important ingredient next to the session_id fix. If it could be improved to a category level as well, the possibilities are endless.
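A rough sketch of what a category-level extension could look like: derive the page title from the current cPath. Table and column names follow the stock osCommerce schema; everything else here is an assumption, not the controller's actual code:

```php
// Build a category-specific <title> from the deepest category in cPath,
// falling back to the store name. (A real version would also filter
// categories_description by language_id.)
$page_title = STORE_NAME;

if (isset($HTTP_GET_VARS['cPath'])) {
    $path_ids = explode('_', $HTTP_GET_VARS['cPath']);
    $current_category_id = (int)end($path_ids);
    $category_query = tep_db_query("select categories_name
                                    from categories_description
                                    where categories_id = '" . $current_category_id . "'");
    if ($category = tep_db_fetch_array($category_query)) {
        $page_title = $category['categories_name'] . ' - ' . STORE_NAME;
    }
}
echo '<title>' . $page_title . '</title>';
```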

 

Some tips on meta tags:

 

Title: I prefer to capitalize my words, use no punctuation, and keep it within 6-10 words, keeping the keyword density high.

 

e.g. (Winter Clothing Jackets Coats Outerwear)

 

Description: Use capital letters again with no punctuation. I follow the same pattern as the title tag, except I use between 15-22 words.

 

e.g. (Clothing Coats Ski Jacket Down Winter Leather Down Outerwear [and so on])

 

Keywords: all lowercase, with commas separating keywords. 25-35 keyword phrases. (Probably the least important tag now used.)

 

e.g. (winter jacket, jacket, ski jacket, ski clothing, down winter jacket, winter coat, coat, [and so on])
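Put together in the page head, those three tips come out looking something like this (values lifted from the examples above, trimmed for length):

```html
<title>Winter Clothing Jackets Coats Outerwear</title>
<meta name="description" content="Clothing Coats Ski Jacket Down Winter Leather Outerwear">
<meta name="keywords" content="winter jacket, jacket, ski jacket, ski clothing, down winter jacket, winter coat, coat">
```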

 

I've found that following this system and making subtle changes to page content with a high ranking priority gets you listed just about anywhere you want.

 

I hope this might help some people, and I'd appreciate anyone else's ideas and suggestions. (It seems like sharing information is the only way to really get a grasp on it.)

 

Henry

Link to comment
Share on other sites

  • 1 year later...

The more I read about this, the more confused I get. I want to make sure that I am making my site as search-engine-friendly as possible.

 

I am using the Dynamic Product Meta Tags contribution. Is there any advantage to it over the Header Tags Controller contribution?

 

I have the following set:

Force Cookie Use: True
Check SSL Session ID: False
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: False

 

Use Search-Engine Safe URLs: True

 

Could someone look at my site and advise me on whether I am doing anything wrong in the BIG search engines' view?

 

http://aquatin.com

Edited by mrsym2
