What is everyone's experience with Google Webmaster Tools, especially with getting the Google indexing bots to fully crawl and index your site?

For example:

Webmaster Tools tells me that Google Index Status shows 4100 total indexed and 0 blocked by robots.txt.

The Crawl > Sitemaps report tells me that I have submitted 2160 URLs and only 150 are indexed. No issues, it says.

I'm guessing that the 4100 includes images, and that only 150 of the URLs are indexed because of Structured Data.

Structured Data shows about 3000 products on 800 pages. No errors.

I build the sitemap.xml and resubmit it as my products change, about twice a week: once on Saturday and again on Wednesday, depending on how many product changes/additions my supplier provides.

This particular store has now been active for about a month. Is what I am seeing about normal, or should I see better results? Am I lucky to get this much so soon? Or am I reading the results incorrectly?

What's your experience?

Mark

Community Bootstrap Edition, Edge

When Google visits your site, its bots go through it, recording the links as they go. As long as a link is working and they don't have some objection to the page, they will list it. So a link to product A on the home page will count as one link, while a link to product A on the category listing page will count as another. But when the sitemap is created, it just looks at the products and creates one link for each. So such a discrepancy is common.

Also, the links in the sitemap files are not indexed until Google actually visits them. So you may submit a thousand links in a sitemap, but if Google only visits one of them, only one will be indexed. They will work on getting through the links. The more often they visit a site, the faster the links will be indexed, but there's no way to control that. The speed at which they work through the map files does increase as the site becomes more established. But the indexed and sitemap counts will never match. That is normal.

Structured data doesn't have any effect on what gets indexed.
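
To illustrate the first point, here is a minimal Python sketch. It assumes osCommerce-style product URLs in which a link from a category listing may carry an extra cPath parameter while a link from the home page does not; the domain and IDs are made up. Several distinct crawled URLs collapse to one product, which is the single entry a product-based sitemap would list.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical URLs a crawler might record for the same shop. Product 42 is
# reached from the home page and from two category listings (cPath varies).
crawled_urls = [
    "https://www.example.com/product_info.php?products_id=42",
    "https://www.example.com/product_info.php?cPath=3&products_id=42",
    "https://www.example.com/product_info.php?cPath=3_7&products_id=42",
    "https://www.example.com/product_info.php?products_id=43",
]


def product_id(url: str) -> Optional[str]:
    """Return the products_id a URL points at, ignoring any other parameters."""
    values = parse_qs(urlsplit(url).query).get("products_id")
    return values[0] if values else None


distinct_urls = len(set(crawled_urls))
distinct_products = len({product_id(u) for u in crawled_urls} - {None})
print(f"{distinct_urls} crawled URLs, {distinct_products} products in the sitemap")
# prints: 4 crawled URLs, 2 products in the sitemap
```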

It might be normal for a new store that has not been configured or has been incorrectly configured. You have 150 of your 2160 submitted URLs actually in the index. And I am willing to go further and say that those 150 URLs are not performing at their optimum.

A 2000-URL store is a small to medium-sized shop, and in my experience, if I were to start a shop of that size and did not see at least 90% of my submitted URLs indexed, I would think there was something wrong with my setup.

In short, what I am saying is that this shop is not optimized.
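
The numbers in this thread make the gap easy to quantify. A small Python sketch of the arithmetic, using the OP's figures (150 indexed of 2160 submitted) and the 90% rule of thumb given above, which is one poster's experience rather than a Google threshold:

```python
def index_coverage(indexed: int, submitted: int) -> float:
    """Fraction of submitted sitemap URLs that are reported as indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number of URLs")
    return indexed / submitted


TARGET = 0.90  # rule of thumb from the post above, not an official figure

coverage = index_coverage(150, 2160)
needed = round(TARGET * 2160)
print(f"coverage {coverage:.1%}, target {needed} URLs, met: {coverage >= TARGET}")
# prints: coverage 6.9%, target 1944 URLs, met: False
```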

But how do they compare to the count when you search Google for them using the site: command?

Roughly double the number in both cases, so approximately 500 and 100 URLs when doing a site: search.

In one of my previous posts on another thread, I said that, as a loose guide, the minimum number of URLs to expect is the total of the homepage, categories and products (plus manufacturers if you use them).

Now, firstly, you want to make sure that all your submitted URLs are in the index and part of the number returned by the site: operator search, which in the OP's case they are not. He only has 150 of 2160.

And secondly, you need to bear in mind that certain other URLs ought also to be included in the site: search number. These include paginated links, which, contrary to what some people might believe, are not duplicate pages (content) of the first page in the series. Product review pages also get indexed, but you need to be careful with these, because if there are no actual reviews they generate soft 404 errors in Webmaster Tools.

So, @Jack_mcs, I think a point you're making is that the site: search number will always be greater than the number of submitted URLs, which is correct; but if it's way off, that's an indication that something is wrong. Every site is different, and the only way to work things out is for the webmaster to do his own analysis of what's included in the index.
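
A rough Python sketch of that kind of analysis, combining the loose guide above (homepage + categories + products + manufacturers, one poster's rule of thumb rather than a Google rule) with the extra listing pages that pagination adds. It assumes a fixed number of products per listing page and that page 1 of each listing is the category URL already counted; all shop figures in the example are hypothetical.

```python
import math


def minimum_expected_urls(categories: int, products: int, manufacturers: int = 0) -> int:
    """Homepage + categories + products (+ manufacturers, if the shop uses them)."""
    return 1 + categories + products + manufacturers


def extra_listing_pages(products_per_category: list[int], per_page: int) -> int:
    """Listing pages beyond page 1, summed over all categories."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    extra = 0
    for count in products_per_category:
        pages = max(1, math.ceil(count / per_page))  # an empty category still has page 1
        extra += pages - 1
    return extra


# Hypothetical shop: 3 categories holding 45, 20 and 7 products,
# 2 manufacturers, 20 products per listing page.
category_sizes = [45, 20, 7]
base = minimum_expected_urls(len(category_sizes), sum(category_sizes), manufacturers=2)
paging = extra_listing_pages(category_sizes, per_page=20)
print(base, paging, base + paging)  # 78 2 80
```

Review pages and other indexed URLs would push the site: count higher again, so by the reasoning above a figure somewhat above this estimate is expected, while one far below it suggests pages are missing from the index.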

@Hotclutch I guess it depends on what you mean by way off. As you mentioned, links to listing pages, reviews and others will appear as indexed links, but only one of those will appear in a sitemap file. So on a large site, a product listing page can have a hundred links indexed for it due to the number of pages.

Also, how many links get indexed when a sitemap is updated depends on a number of factors, but probably the two most important are how old the site is and how well it ranks. Google doesn't send bots to all sites on the same schedule. I have one hosting member whose site is so busy that at least 10 Google search bots are on it every day. I've seen other sites that only get visited by one bot a few times a week. For the first site I mentioned, if a new sitemap is uploaded today, it will be fully processed in a few days. One of the slower sites may wait a month before all the links are looked at.

The OP said the site has only been active for a month and only a small number of the links are indexed. That fits with what I describe above. It is just how Google, and Bing to some extent, work.

That might be normal for a site where no optimization has been done, but as an experienced webmaster I could never accept that kind of performance. From experience, I could now build a 2000-URL site on a brand-new domain and have all submitted URLs shown as indexed within one week of publishing the site. That's not to say that you will absolutely rank number one for every keyword phrase; that depends on the competitiveness of your market and the quality of your content.

I'd have to see that to believe it, unless, of course, you are talking about paying Google with AdWords, but that is not what we are talking about. What you are saying, in effect, is that you can control what Google does. I've never heard of anyone being able to do that, but maybe you are the first.

But even if you could somehow manage that, the links would not be ranked in any way other than at the end of the list, unless it was for some very unique and useless keyword. If that were possible, it would wreak havoc with Google's rankings. All you would have to do is create a new site and have it rank above a site that has been active for 10 years. It just can't happen. Google's unwritten policy is to wait at least 6 months before seriously ranking a site. They do that to filter out the ones that just want a quick site. See this post by Ahrefs for some interesting results.

That article does not contradict what I am saying at all.

 

"...Actually, I shouldn’t be framing these pages as “lucky,” because the reason they got to the Top10 in less than a year is most likely hard work and great knowledge of SEO, not luck..."

In any case, readers can believe what they want.

How do shop owners and those with little knowledge of SEO have a hope in heck when even experts cannot agree on the way forward?

I am so glad that I don't have to perform any sort of SEO service. It's all snake oil.


Funny what this forum has degraded into. The post of least value gets liked on this forum.

Speaking of getting pages indexed, what is the best way to use an XML Sitemap? That is, what pages should be listed in it? My understanding, first of all, is that a sitemap is a suggestion to a Search Engine as to what pages I consider to be important and would like to see indexed. A Search Engine is free to ignore this suggestion and index whatever pages it pleases, but hopefully there will be a large overlap between your list (Sitemap) and what gets indexed.

So, what pages should go into the Sitemap? For a store, I would think that the individual product pages should be first in line, as they have the most detail and keywords on a given product. Similarly, for a forum or blog, the topics/threads with the actual content should be indexed. What comes next (above that)? Is there any point in listing, say, a product category page (mostly pictures of products, a little text, and links to the full product pages)? Possibly someone searching for a long list of items would want a page where all of them appear? Similarly, for a forum or blog, what about the boards/forums (terminology varies), which are often just a listing of topic titles and some information on the latest updates? And so on, up the chain/tree to more and more generic descriptions (higher-level categories for a store, lists of boards and categories for a forum)... at some point, it would seem that they would be useless or even harmful clutter in the eyes of a Search Engine. The entry points to your site, store, and forum (home pages) would be good to index, as they often have some sort of "welcome" text.

In a store, product reviews (and any other discussion) would probably be a good thing to index. Is there anything else? Listings of the manufacturers you carry? I would assume that at least some people search for stores by whose products they carry, rather than by generic terms... I'm looking for Nikes, not "sneakers". That information would be on the product pages, so is there any point in indexing a list of manufacturers? Sales, new products, or specials (unless that information is already available on a product page)? How useful is indexing very temporary information such as a sale or new products? How about terms and conditions, privacy policies, etc.: are they useful as indexed pages? Is it useful to get more internal links to a product or topic page, or does a Search Engine even care, once you've narrowed down the page list for it? The leaf nodes/pages (final text) should surely go into the Sitemap, but how many of the branch nodes (higher up) should go in? Finally, "utility" stuff (registration, login, contact, help, search, etc.) can usually be omitted; it's not going to be that useful for someone searching for a site. I would think that what you would want indexed are pages of useful information that would drive visitors to your site.

Are there any external sitemap generators that can make use of markup on page links saying "put this in a sitemap" versus "don't bother"? And you want to be sure to pick up the canonical URL or SEF form of a page address, rather than creating a generic URL from the store or forum database (likely not SEF without a lot of effort). Using an external generator (similar to a spider or scraper) to pick up marked links and generate an XML file, where the desired URL comes for free, might be an improvement over trying to build a generator within a store or forum.
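
A minimal Python sketch of the link-picking step such an external generator might use, built on the standard library's html.parser. The data-sitemap="include" attribute is a hypothetical marker that the shop or forum templates would have to emit; nothing reads it by default. Relative hrefs are resolved with urljoin against an assumed base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkedLinkCollector(HTMLParser):
    """Collect absolute hrefs of <a> tags marked data-sitemap="include" (hypothetical marker)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href and attributes.get("data-sitemap") == "include":
            self.links.append(urljoin(self.base_url, href))


page = """
<a href="/nike-air-p-42.html" data-sitemap="include">Nike Air</a>
<a href="/login.php">Log in</a>
<a href="/product_reviews.php?products_id=42" data-sitemap="include">Reviews</a>
"""

collector = MarkedLinkCollector("https://www.example.com/")
collector.feed(page)
collector.close()
print(collector.links)
```

Because it reads the rendered pages, a collector like this sees whatever URLs the templates actually output, SEF or not, which is the advantage the post describes.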

The sitemap, or sitemaps, provide a quick way for the search engines to get around the site. They will only spend so much time on a site, so providing direct links speeds up getting the pages listed and evaluated for SEO. The search engines will follow the links in the map file and evaluate each page. They will list it provided there isn't anything to stop them, like illegal content or a noindex tag.
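
For reference, a map file is plain XML in the sitemaps.org format: a urlset of url entries, each with a required loc and an optional lastmod. A minimal Python sketch using the standard library; the URLs and date are placeholders. ElementTree also escapes characters such as & in query strings, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # W3C date format, e.g. 2017-06-03
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("https://www.example.com/", None),
    ("https://www.example.com/index.php?cPath=3&page=2", "2017-06-03"),
])
print(sitemap_xml)
```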

As for which pages to add, any with content that will help the site should be listed. Generally, this means all popups should be excluded. Any page that requires a login, like the checkout, login and create account pages, should not be listed either. And finally, maybe exclude any that display results, like advanced_search_result.php. This last one is a judgment call; some such pages may help.
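
One way to apply these rules while building the list is to filter on the script name. A Python sketch; the file names and prefixes are assumed osCommerce-style names, so adjust them to match your own shop.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed osCommerce-style script names; adjust to match your shop.
EXCLUDED_PREFIXES = ("popup_", "checkout_")  # popups and checkout steps
EXCLUDED_SCRIPTS = {
    "login.php",
    "create_account.php",
    "account.php",
    "advanced_search_result.php",  # the judgment call mentioned above
}


def include_in_sitemap(url: str) -> bool:
    """Return False for popups, login-only pages and search-result pages."""
    script = posixpath.basename(urlsplit(url).path)
    if script in EXCLUDED_SCRIPTS:
        return False
    return not script.startswith(EXCLUDED_PREFIXES)


candidates = [
    "https://www.example.com/",
    "https://www.example.com/product_info.php?products_id=42",
    "https://www.example.com/popup_image.php?pID=42",
    "https://www.example.com/advanced_search_result.php?keywords=nike",
]
print([url for url in candidates if include_in_sitemap(url)])
```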

If you haven't looked in your Google Webmaster Tools lately, or haven't noticed it, you will probably see a message titled "Index coverage issue detected". In my experience, there is usually at least one page linking problem, and it often turns out to be a page like advanced_search_result.php. So you may want to take a look there for hints about which pages not to list.

If you (anyone) aren't using the Google sitemap add-on, that would be the best way to go, provided it works with the URL rewriter in your shop, if there is one. There are some sites that will create a map file for you, but there are some problems with that. For example, not all of them remove the session IDs, nor do they all work with some URL rewriters. And even if you find one that adds all of the links you want listed, the final problem is that you have to do the same thing over and over, since the map files need to be kept up to date for best results. So while external sites can get you by in the short run, they are generally not a good solution for dynamic sites.
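
Removing session IDs is the kind of clean-up a generator has to do before a URL goes into a map file. A Python sketch, assuming osCommerce's default session parameter name osCsid (check your own configuration); note that urlencode may normalize the encoding of the remaining parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url: str, param: str = "osCsid") -> str:
    """Drop the session-ID query parameter, keeping other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # urlunsplit omits the "?" entirely when no parameters remain
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("https://www.example.com/product_info.php?products_id=42&osCsid=abc123"))
```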

Well, I'm thinking that a Search Engine probably has a time limit to spend on indexing a site, so encouraging it to index (first) the most important pages for getting visitors to my site would probably be valuable. Would that be the "priority" setting for each page, or does that mean something else? Does it process pages in the order listed in the sitemap, or is it free to pick whatever pages it wants, in any order?

When I say "external" XML sitemap creator, I'm talking about something that would run on my site, kicked off at least weekly by cron. The problem with something similar to a spider or scraper is that it knows nothing about which links are of value to me, and which can be ignored. It would go off and explore everything like a small child or a puppy. That's why I'm wondering about the feasibility of modifying the osC code (and other site code) to flag links of the type that I want in the sitemap: product info pages, reviews, store home page, and what else? Rather than including everything under the sun, I would keep it down to the most useful pages, so that there's a better chance the indexer gets to them all. Removing session IDs, etc. should be trivial.

As for a built-in generator (add-on), it sounds good if it handles SEF links properly (the URL rewriter). Many don't, and generate generic (non-SEF) links directly from the database. An external generator would at least see the desired link URLs. I'm assuming that this is something that can be kicked off by a cron job and writes to a specific XML file (e.g., my shop's would be /sitemap_shop.xml). Many seem to just generate the XML file in the browser, which means extra manual steps, unless that's something Search Engines can be taught to use directly (i.e., I don't have to store the file on my site). There's also the issue of resource consumption if the XML sitemap is regenerated on the fly for every visiting spider, versus doing it once or twice a week.
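
If the file is written by a scheduled job rather than on every request, it helps to write it so that a visiting spider never reads a half-finished file. A minimal Python sketch: write to a temporary file in the same directory, then rename it over the old one with os.replace. The 0644 permissions are an assumption, so that a web server running as a different user than the cron job can still read the file.

```python
import os
import tempfile


def write_atomically(path: str, content: str) -> None:
    """Replace `path` with `content` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates files readable by the owner only
        os.replace(tmp_path, path)  # atomic rename within one filesystem on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise


# A cron job would call, e.g. (hypothetical document-root path):
# write_atomically("/path/to/docroot/sitemap_shop.xml", xml_text)
```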

I seem to recall seeing a post pleading for an all-in-one setup to do SEF URLs and sitemap generation using those URLs.

The search engines have two kinds of scans: quick and deep. The quick scan has a short amount of time to go through a site to gather links and check a few things. They know where they ended up, so they don't do the same thing on the next visit. The deep scan lasts much longer and will cover most, if not all, of the pages. Obviously, the results will vary depending on the site; compare a one-page HTML site with a site like Amazon. If you look at the raw access log for the site, or View Counter (if installed) and maybe Who's Online, you will see multiple bots from the various search engines at one time. I've seen as many as 6 Google bots, using different IPs, accessing sites at the same time. Same with Bing.
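
Counting the bots in a raw access log is easy to script. A Python sketch, assuming the common Apache/nginx "combined" log format, where the client IP is the first field and the user agent is the last quoted field; the sample lines are made up. User-agent strings can be faked, and Google recommends confirming genuine Googlebot traffic with a reverse DNS lookup, which this sketch does not do.

```python
import re
from collections import Counter

# Combined log format: client IP first, user agent in the last quoted field.
LOG_LINE = re.compile(r'^(\S+) .*"([^"]*)"$')


def bot_hits_by_ip(lines, bot_token="Googlebot"):
    """Count requests per client IP whose user agent contains bot_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and bot_token in match.group(2):
            hits[match.group(1)] += 1
    return hits


sample_log = [
    '66.249.66.1 - - [03/Jun/2017:10:00:01 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.2 - - [03/Jun/2017:10:00:02 +0000] "GET /product_info.php?products_id=42 HTTP/1.1" '
    '200 8410 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [03/Jun/2017:10:00:05 +0000] "GET /index.php?cPath=3 HTTP/1.1" 200 7302 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [03/Jun/2017:10:00:07 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

print(dict(bot_hits_by_ip(sample_log)))
print(dict(bot_hits_by_ip(sample_log, "bingbot")))
```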

The priority in the map is meant to tell the search engines the relative importance of a link. In my opinion it isn't worth bothering with, since the value can only be between 0 and 1, inclusive. You can read more about it here.
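
Per the sitemaps.org protocol, priority is an optional value from 0.0 to 1.0 with a default of 0.5, and it only ranks a site's own URLs relative to each other. For anyone who does set it, a small Python sketch of validating the value before writing it into a map file:

```python
from typing import Optional

DEFAULT_PRIORITY = 0.5  # value assumed by the protocol when <priority> is omitted


def priority_text(value: Optional[float] = None) -> str:
    """Format a sitemap <priority> value, rejecting anything outside 0.0-1.0."""
    if value is None:
        value = DEFAULT_PRIORITY
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value}")
    return str(float(value))


print(priority_text(), priority_text(1), priority_text(0.8))  # 0.5 1.0 0.8
```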

The Google sitemap add-on will do everything you need, so there's no reason to spend time writing something new. The exception is that it only works with the original URLs and URLs rewritten by Ultimate SEO. The script creates sitemap files that the search engines know to look for. And even on large sites, the files are created quickly, so there's no load problem.

Thanks for the info. I'll take a look at this, but it sounds like it will force my hand on which "SEO" package to use on a new shop. I have to write something anyway (in the way of an XML sitemap) for the custom-written part of my site, as well as using something for the forum and for the shop. The intent is to create sitemap_general.xml, sitemap_forum.xml, and sitemap_shop.xml, all under sitemap_index.xml. This keeps the major subsystems separate and postpones hitting the dreaded 50k-entry limit (which necessitates two or more sitemaps).
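
That layout maps onto a sitemaps.org sitemap index file, which lists the child sitemaps by URL. A Python sketch that generates it; the domain is a placeholder, and the three file names come from the post.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, filenames: list[str]) -> str:
    """Return a sitemap index pointing at each child sitemap under base_url."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in filenames:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url.rstrip('/')}/{name}"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


index_xml = build_sitemap_index(
    "https://www.example.com",
    ["sitemap_general.xml", "sitemap_forum.xml", "sitemap_shop.xml"],
)
print(index_xml)
```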

I wonder, in each case, if it would be worthwhile to update a database table of desired links each time a page is visited, and once or twice a week (or even daily) generate a new XML sitemap from that table. That would slow the display of a page somewhat, but generating the sitemap(s) would be fast. Has this been tried before?
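
A minimal Python sketch of the idea using SQLite; the table name, columns and URLs are hypothetical. Each tracked page view does one small upsert, and the scheduled job only has to read the table. Note that last_seen records when a page was viewed, not when its content changed, so it is not a substitute for a real lastmod date.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple


def open_link_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sitemap_links ("
        "url TEXT PRIMARY KEY, last_seen TEXT NOT NULL)"
    )
    return conn


def record_visit(conn: sqlite3.Connection, url: str, day: Optional[date] = None) -> None:
    """Called from the page code, only for page types you want in the sitemap."""
    seen = (day or date.today()).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO sitemap_links (url, last_seen) VALUES (?, ?)",
        (url, seen),
    )
    conn.commit()


def links_for_sitemap(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Read by the scheduled job that writes the XML file."""
    return conn.execute("SELECT url, last_seen FROM sitemap_links ORDER BY url").fetchall()


store = open_link_store()
record_visit(store, "https://www.example.com/product_info.php?products_id=42", date(2017, 6, 3))
record_visit(store, "https://www.example.com/product_info.php?products_id=42", date(2017, 6, 7))
record_visit(store, "https://www.example.com/product_reviews.php?products_id=42", date(2017, 6, 5))
print(links_for_sitemap(store))
```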

If you want to use a URL rewriter and the sitemap add-on, you are correct that there is only one choice. It is on my list of things to do to change the code so the sitemap will work with SEO 5, but there just isn't enough time. Regarding the size limit, the sitemap add-on creates additional sitemaps as needed.
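
The sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), which is why generators split large lists across several files. A Python sketch of the splitting step only; the URL list is generated for the example.

```python
from typing import Iterator, List, Sequence

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def split_for_sitemaps(urls: Sequence[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, one per sitemap file."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


urls = [f"https://www.example.com/product_info.php?products_id={n}" for n in range(120_001)]
print([len(chunk) for chunk in split_for_sitemaps(urls)])  # [50000, 50000, 20001]
```

Each chunk would then be written to its own file and listed in a sitemap index.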

I'm not aware of a sitemap add-on that uses tracked pages, but I don't think it would be worth the effort. The search engines will get to all of the pages eventually. You can't force them to look at something else by changing the order or priority in the sitemap file. If your goal is to improve the ranking of particular pages, the time would be better spent optimizing those pages, in my opinion.
