
osCommerce

The e-commerce.

Google (Search Engines) revisited


Guest


Well, after reading the HUGE discussion on the old list, I made a plan of attack and used a mix of the suggestions.

 

I am using the index.php script seen here:

 

<?php
// Quick script to 'redirect' spiders/robots to 'search engine friendly' page.
// By Dan Zambonini [[email protected]], Sep 2000. V 1.0
// Please send alterations/comments to [email protected]
// Modified from someone else's script, can't remember which one...Sorry!
// Major search engines match either $spider_footprint or $spider_ip.

// Case-sensitive fragments looked for in the User-Agent header.
$spider_footprint = array( "rawler", "pider", "obot", "eek", "canner", "lurp",
"cooter", "rachnoidea", "KIT", "ulliver", "arvest");

 

$spider_ip = array( "216.239.49.", "204.123.", "204.74.103.", "203.108.10.", "195.4.183.", "195.242.46.", "198.3.97.", "204.62.245.", "193.189.227.", "209.1.12.", "204.162.96.", "204.162.98.", "194.121.108.", "128.182.72.", "207.77.91.", "206.79.171.", "207.77.90.", "208.213.76.", "194.124.202.", "193.114.89.", "193.131.74.", "131.84.1.", "208.219.77.", "206.64.113.", "195.186.1.", "195.3.97.", "194.191.121.", "139.175.250.", "209.73.233.", "194.191.121.", "198.49.220.", "204.62.245.", "198.3.99.", "198.2.101.", "204.192.112.", "206.181.238", "208.215.47.", "171.64.75.", "204.162.98.", "204.162.96.", "204.123.9.52", "204.123.2.44", "204.74.103.39", "204.123.9.53", "204.62.245.", "206.64.113.", "204.138.115.", "94.22.130.", "164.195.64.1", "205.181.75.169", "129.170.24.57", "204.162.96.", "204.162.96.", "204.162.98.", "204.162.96.", "207.77.90.", "207.77.91.", "208.200.146.", "204.123.9.20", "204.138.115.", "209.1.32.", "209.1.12.", "192.216.46.49", "192.216.46.31", "192.216.46.30", "203.9.252.2");

 

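$agent = (string) getenv('HTTP_USER_AGENT');
$host_ip = (string) getenv('REMOTE_ADDR');
$is_spider = 0;

// Is it a spider? Check the user agent first.
foreach ($spider_footprint as $footprint)
{
    if (strstr($agent, $footprint))
    {
        $is_spider = 1;
        break;
    }
}

// No user-agent match, so compare the client address against the known prefixes.
if (! $is_spider)
{
    foreach ($spider_ip as $prefix)
    {
        // Match at the start of the address only; a plain substring search
        // would also accept "94.22.130." inside "194.22.130.7".
        if (strpos($host_ip, $prefix) === 0)
        {
            $is_spider = 1;
            break;
        }
    }
}

// Re-direct to correct page
// Change the files below to your search-engine 'keyword' page and normal index page.
if ($is_spider)
{
    // Note: readfile() sends a local file's contents as-is; it does not run PHP in it.
    readfile("catalog/allprods.php");
}
else
{
    // header() prints nothing itself, so no echo is needed; stop once it is sent.
    header("Location: http://www.visionxtreme.net/default.php");
    exit;
}
?>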

-----------------------
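A side note on matching REMOTE_ADDR against entries like those in $spider_ip: searching for the entry anywhere in the address and requiring the address to start with it can give different answers. The sketch below is only an illustration; the helper names and the address 194.22.130.7 are made up for the example.

```php
<?php
// Returns true when $ip begins with one of the given prefixes.
function ip_has_prefix($ip, array $prefixes)
{
    foreach ($prefixes as $prefix) {
        if (strpos($ip, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Returns true when any entry occurs anywhere inside $ip (substring search).
function ip_contains($ip, array $prefixes)
{
    foreach ($prefixes as $needle) {
        if (strstr($ip, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Two entries taken from the $spider_ip list.
$prefixes = array("216.239.49.", "94.22.130.");

// 194.22.130.7 is a made-up address used only for illustration.
var_dump(ip_contains("194.22.130.7", $prefixes));    // substring search accepts it
var_dump(ip_has_prefix("194.22.130.7", $prefixes));  // prefix search rejects it
var_dump(ip_has_prefix("216.239.49.10", $prefixes)); // a genuine prefix match
```

With a list this long, the substring form may flag ordinary visitors whose address merely contains one of the entries.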

 

With the allprods.php seen here:

 

http://www.visionxtreme.net/allprods.php

 

I also have a robots.txt seen here:

 

User-agent: *
Disallow: /cgi-bin
Disallow: /ebay
Disallow: /temp
Disallow: /old

 

Only thing is, I have noticed that allprods.php has not been hit at all. I am also using the search engine safe URLs provided by OSC.

 

/me wonders what I am doing wrong.

Link to comment
Share on other sites


No, allprods won't get hit. Its contents are sent through readfile(), so the bot never requests it by name and it doesn't show up in the logs as a hit. This is a good thing. The bot thinks allprods is the homepage. 8)

 

If you were to redirect the bot it would know it has been redirected and that would be a "bad thing" in terms of getting your pages indexed.
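One thing worth checking when serving a page in place like this: as far as the PHP manual describes it, readfile() copies a file to the output, while include runs it. The sketch below uses a temporary file as a hypothetical stand-in for a page such as allprods.php; the helper names are made up for the example.

```php
<?php
// Hypothetical stand-in for a PHP page; written to a temp file for the demo.
$page = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($page, '<?php echo "product list"; ?>');

// Captures what readfile() would send to the client.
function capture_readfile($path)
{
    ob_start();
    readfile($path);
    return ob_get_clean();
}

// Captures what include would send to the client.
function capture_include($path)
{
    ob_start();
    include $path;
    return ob_get_clean();
}

$raw = capture_readfile($page); // the PHP source, byte for byte
$run = capture_include($page);  // the output produced by running the file

echo $raw, "\n";
echo $run, "\n";
unlink($page);
```

If the page has to be generated by PHP, include (or readfile() on an http:// URL, where the host allows URL wrappers) may be closer to what is wanted than readfile() on a local path.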

Link to comment
Share on other sites

So in my logs where it says:

 

crawler10.googlebot.com - - [10/Sep/2002:12:52:10 -0500] "GET /robots.txt HTTP/1.0" 200 105 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [10/Sep/2002:12:52:10 -0500] "GET / HTTP/1.0" 302 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [10/Sep/2002:12:52:11 -0500] "GET /default.php HTTP/1.0" 200 19636 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

 

the bot is really getting allprods?
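The 302 on "GET /" followed by a request for /default.php looks like the script's Location branch, which suggests this request was not classified as a spider. A quick way to check is to run the footprint list against the user agent from those log lines. This is only a sketch: matching_footprints() is a made-up helper, and "ooglebot" is a hypothetical extra footprint that is not in the original list.

```php
<?php
// Footprint list copied from the script earlier in the thread.
$spider_footprint = array("rawler", "pider", "obot", "eek", "canner", "lurp",
    "cooter", "rachnoidea", "KIT", "ulliver", "arvest");

// User agent copied from the log lines above.
$agent = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)";

// Returns the footprints found in $agent, using the same
// case-sensitive strstr() test as the script.
function matching_footprints($agent, array $footprints)
{
    $hits = array();
    foreach ($footprints as $footprint) {
        if (strstr($agent, $footprint) !== false) {
            $hits[] = $footprint;
        }
    }
    return $hits;
}

// With the original list nothing matches: "Googlebot" contains "ebot", not "obot".
print_r(matching_footprints($agent, $spider_footprint));

// Hypothetical extra footprint, not part of the original list.
$extended = array_merge($spider_footprint, array("ooglebot"));
print_r(matching_footprints($agent, $extended));
```

If the user agent does not match, the script falls back to the address list, so the bot's address would also need to be missing from $spider_ip for the redirect to happen.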

Link to comment
Share on other sites

Jason (or anyone)

 

I am just beginning to learn about how search engines operate, and how to get listed in them.

 

I have implemented nearly all of the same things you have, and I have a couple of quick questions.

 

First, I'm not familiar with the robots.txt. Where can I find information on how to implement this?

 

Also, what did you look at to see how many times a page has been hit?

 

Finally, what logs did you look at where you found the googlebot?

 

I appreciate your help.

-------------------------------------------------------------------------------------------------------------------------

NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit.

If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.

Link to comment
Share on other sites

wizardsandwars, I would do what Tim says and read that post.

 

I run my own server and have Webalizer running, so that's where I look at the logs.

 

Tim,

 

I am told that I am not "redirecting" because the script runs as a readfile(). Should I still be worried? And if so, what should I change?

 

Thanks.

Link to comment
Share on other sites

Gee, Tim,

 

Some would say that script is basically Cloaking. If Google catches you they will ban your site from the index.

 

Google doesn't really care if you are trying to HELP the end-user and the index engine rather than trying to stuff it. In fact, they even like it when people come up with a better way to help them index content.

 

It looks like I'm going to have to get a formal quote from Google to clarify this once and for all, but while we await the response to my e-mail to them, please explain why Google would ban someone who is helping them index a database-driven site more effectively. :bomb:

Link to comment
Share on other sites

Jason,

 

I've also been told that using this technique was ok, and was not considered re-directing. In fact, I was told that Google prefers this method.

As for the 'robots.txt', that link did not provide any further information.

What is it? Where do I put it? How does it work? I searched the forums, and I see a few references to it, but none that are particularly helpful.


Link to comment
Share on other sites

It looks like I'm going to have to get a formal quote from Google to clarify this once and for all, but while we await the response to my e-mail to them, please explain why Google would ban someone who is helping them index a database-driven site more effectively

 

That is a formal quote. It's from GoogleGuy who is one of the top engineers at Google ;)

 

Read it in full here -> http://www.webmasterworld.com/forum3/5288.htm

Link to comment
Share on other sites

Chris,

 

The robots.txt file tells the robot where it CANNOT go. For instance, with an OSC installation I would block the admin directory.

 

Put it in the root of the website you are having indexed.
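To make the "where it CANNOT go" idea concrete, here is a rough sketch of how a Disallow line works as a path prefix. It is deliberately simplified (no User-agent groups, Allow lines or wildcards), is_disallowed() is a made-up helper, and "/admin" and the file names are hypothetical; use whatever path your admin directory actually has.

```php
<?php
// Returns true when $path falls under one of the Disallow prefixes.
// Simplified: ignores User-agent groups, Allow lines and wildcards.
function is_disallowed($path, array $disallow)
{
    foreach ($disallow as $prefix) {
        // An empty Disallow value blocks nothing.
        if ($prefix !== "" && strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Rules from the robots.txt earlier in the thread, plus a
// hypothetical "/admin" entry for the admin directory.
$disallow = array("/cgi-bin", "/ebay", "/temp", "/old", "/admin");

var_dump(is_disallowed("/admin/orders.php", $disallow)); // blocked
var_dump(is_disallowed("/default.php", $disallow));      // allowed
```

Keep in mind that robots.txt is only a request to well-behaved bots; it does not protect the admin directory from anyone else.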

Link to comment
Share on other sites

Tim, I went to the URI and while it says that Google will handle dynamic URIs, it fails to support what you stated it said, that they will ban your site if you dump the database to them rather than present them the front page of the site.

The e-mail is off to Google. I will post both the e-mail and the response as soon as I hear from them.

Link to comment
Share on other sites

Jason,

 

I see. Great, thanks!

 

I discovered that my web host also runs Webalizer, and WOW! I had no idea that I had so many bots crawling my site. When I first submitted, though, I had a redirect in an index.html. I think that Google did not index me because of that.

 

Hopefully, when they come back around, they'll like my site a little better.

 

Anyone know when their spiders go out? I heard that it was a standard time each month.


Link to comment
Share on other sites

I've also employed the allprods.php contribution, but only as a link that a typical user would not find on my site (not hidden, just not glaringly evident). I've had Google crawl through every category and product on my site with just this simple implementation and without using search engine safe URLs.

 

I've noticed that Google likes to crawl my site every Sunday, although I'm not quite sure when the results are actually refreshed on the Google search engine. I also used two other contributions to change my title tag and my keyword and description tags, which should prove very helpful. This way I am able to keep the keyword density in my title tag very high; for example:

 

Instead of

 

hats, mittens, south carolina, winter clothes, ski jackets, etc.

(As the Title Tag)

 

I would use:

 

Winter Clothes Hats

 

In other words, the greater the density (meaning the fewer words that appear alongside the ones being searched for), the better your chances of ranking high.
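To put rough numbers on that, here is a small sketch that treats "density" the way it is described here: the share of title words that are among the words being searched for. It is not a known ranking formula; keyword_density() is a made-up helper and the search terms "winter" and "hats" are assumed for illustration.

```php
<?php
// Share of the words in $title that are among the $sought keywords.
function keyword_density($title, array $sought)
{
    // Split on anything that is not a letter or digit, ignoring case.
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return 0.0;
    }
    $sought = array_map('strtolower', $sought);
    $hits = 0;
    foreach ($words as $word) {
        if (in_array($word, $sought, true)) {
            $hits++;
        }
    }
    return $hits / count($words);
}

$sought = array("winter", "hats"); // assumed search terms, for illustration

$long  = keyword_density("hats, mittens, south carolina, winter clothes, ski jackets", $sought);
$short = keyword_density("Winter Clothes Hats", $sought);

printf("%.2f vs %.2f\n", $long, $short); // prints "0.25 vs 0.67"
```

By this measure the long title has 2 matching words out of 8, and the short one has 2 out of 3.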

 

I hope this helps and I am anxiously awaiting the google update.

 

H

Link to comment
Share on other sites

jasonsfa98:

 

Are you aware that on your allprods page some links are "search engine friendly" and some are not?

 

case in point..

Xtreme AMD Machine: http://www.visionxtreme.net/product_info.p.../products_id/33

 

RTCW Server:

http://www.visionxtreme.net/product_info.p...?products_id=46

 

Just FYI.

Link to comment
Share on other sites

Yo, Zilla,

 

What does 'search engine friendly' mean and how can I fix it?

Looking at the difference between the URLs, I can see that one of them is a static URL and one has a query string. But I'm not sure why or how to fix that on MY site.

 

www.wizardsandwars.com


Link to comment
Share on other sites

There is a setting in the Admin tool.

Configuration -> My store -> Set "Use Search-engine safe URLs" to true

 

It looks like you have already implemented this on your site though.

 

Ryan
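For anyone wondering what the setting changes, the general idea is that the query-string separators are turned into slashes so the link looks like a plain path. The sketch below only illustrates that idea and is not necessarily how osCommerce does it internally; search_engine_safe() is a made-up helper, and product_info.php with products_id=46 is assumed for the example.

```php
<?php
// Rewrites "page.php?a=1&b=2" as "page.php/a/1/b/2".
function search_engine_safe($link)
{
    return str_replace(array('?', '&', '='), '/', $link);
}

// product_info.php and the ids are assumed here for illustration.
$single = search_engine_safe("product_info.php?products_id=46");
$multi  = search_engine_safe("product_info.php?cPath=1&products_id=46");

echo $single, "\n"; // product_info.php/products_id/46
echo $multi, "\n";  // product_info.php/cPath/1/products_id/46
```

A link that is written by hand, or built by a contribution that assembles its own URL, would skip a rewrite like this and keep its query string.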

Link to comment
Share on other sites

ryatkins,

 

Yeah, I've got that enabled, but I have links on the homepage (under the two featured items) that aren't 'search engine friendly', whatever that means.


Link to comment
Share on other sites

I am also seeing a few items that are NOT search engine safe. But I have noticed that they are on contrib-enabled pages.

 

allprods.php

featured products admin contrib

 

Something in these contributions is not allowing the search engine safe system to work.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
