[Contribution] Ultimate SEO URLs - by Chemo

Guest
This is a tricky one, because of the way the searchbots work... one way you could do it would be to go into includes/application_top.php (approx. line 200) and find where the spider check is done and if passed, the session is started... at that point i would register $spider_flag... then, anywhere you don't want spiders to index options, such as sort options, check if $spider_flag is registered, and if it's not (most likely a spider) then don't print the href link for those options in the column headers, otherwise print them... that way a user can still sort, and a spider can still see the same page (default-sorted), but the clickable links to sort by various columns won't work for the spider to follow, and it can't index the page w/ those extra options...
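A minimal sketch of the idea above, assuming the spider check has already produced a simple true/false value. The function name `sort_heading` and its parameters are made up for illustration and are not part of osCommerce:

```php
<?php
// Hypothetical helper: print a column heading as a sort link for normal
// visitors, and as plain text when the visitor looks like a spider.
function sort_heading($label, $sort_url, $is_spider) {
    if ($is_spider) {
        // No href at all, so there is nothing for the spider to follow.
        return htmlspecialchars($label);
    }
    return '<a href="' . htmlspecialchars($sort_url) . '">'
         . htmlspecialchars($label) . '</a>';
}

// Same heading, two kinds of visitor.
echo sort_heading('Price', 'index.php?cPath=1&sort=3a', false), "\n";
echo sort_heading('Price', 'index.php?cPath=1&sort=3a', true), "\n";
```

Both visitors still get the same default-sorted page; only the clickable link differs.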

 

i don't think you can do it w/ .htaccess, even if you got a mod_rewrite snippet that would convert your arguments into directory appearance (like index.php/sort/2/page/1), because i believe .htaccess operates after mod_rewrite translates the url, so it would still be operating on the actual file (index.php or whatever) and wouldn't be able to deny access for spiders to say, index.php/*

 

Hope that helps...

 

Richard Lindsey.

Since Jon brought up the question I have been thinking about the answer. The only one I gave him is that it is not possible. I came to the same conclusion as you did after more thought...register the spider flag and remove the links from product listing based on true/false. However, this still does not solve his problem of the buy_now parameter. This will take a little more work but will most easily be solved by using POST params instead of GET.
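A rough sketch of what the POST approach could look like for a buy now button. The helper name `buy_now_form` is hypothetical, and this assumes the receiving script is changed to read products_id from the POST data rather than from the query string:

```php
<?php
// Hypothetical helper: render "buy now" as a small POST form so that
// the product id never appears in a crawlable URL.
function buy_now_form($action_url, $products_id) {
    return '<form method="post" action="' . htmlspecialchars($action_url) . '">'
         . '<input type="hidden" name="products_id" value="' . (int)$products_id . '">'
         . '<input type="submit" value="Buy Now">'
         . '</form>';
}

$html = buy_now_form('index.php?action=buy_now', 45);
echo $html, "\n";
```

Spiders generally follow links rather than submit forms, which is the reason this keeps the buy_now URL out of their reach.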

 

Bobby


Heya Chemo, any thoughts as to my problem with the apostrophes and quotation marks not stripping properly in my urls? It's still gnawing at me :D

 

Richard Lindsey.


Richard Lindsey


Bobby,

 

Thanks for the reply. I've temporarily disabled my product name/price sorts to avoid the linking. As for the manufacturer sorts, I don't believe the spiders will be able to follow the pull-down links, but I could easily be wrong.

With regard to the buy now buttons, couldn't these be changed to forms? I thought there was a mod for this listed along with the prevent spider sessions mod, which isn't really necessary, but the forms one is.


Use the link above,

it works great to change the buy now buttons to forms, to avoid spiders getting this url.

 

Now, if someone could code the same thing for the Product Name and Price sort links in the column header... <_<


sorry for all the posts,

i could not get the forms to work on the reviews pages, so i just removed the add to cart buttons there altogether, as they are not really needed anyway.

 

So all that needs to be done is fix this, and then we must come up with a way to convert the GETS to POSTS and fix the sorting by name/price category links...


When using with the oscommerce PayPal IPN v1.0

i get the emails titled "PayPal IPN Invalid Process", with the content

 

$_POST:

 

 

$_GET:

 

emailed to me. However, it seems the PayPal IPN is working fine. I don't know why changing the GETs to POSTs in the product listing is causing this error.

Any suggestions?


disregard my above post - all works great, except for what is mentioned 1 post above the last.

 

This message and the one directly before it may be removed by Admin, once again, sorry for all the posts!

Guest

OK I had to force cookie use but when I do that I cannot login to an account at all.

 

It tells me that my browser does not accept cookies, even though I set it to allow all cookies and reset the options in the internet browser to low.

 

Now I just need to figure out why it is doing this. It is choking off my sales.

Guest

Richard Lindsey, I tried your code and it did not work. I have not reset my cache, though. Would I need to do that to see it work?

Guest

That redirect code is handy :)

 

Bobby


Susan,

 

I also have the redirect add-on working. Works like a charm. I was just curious if there was an easy way to incorporate the .html urls instead (one less step).

 

Also, looks like you have the osCsid at the end of your html link. There is a way to remove that without having to force cookie use. It's a simple change in the configure.php file.

 

Actually, with the mods in .htaccess, the URLs will automatically convert when clicked. See for example Froogle Search.

 

Susan

Also, looks like you have the osCsid at the end of your html link. There is a way to remove that without having to force cookie use. It's a simple change in the configure.php file.

According to Chemo, this isn't a problem... the osCsid is okay for regular customers, just spiders shouldn't see it. You have a different take?

 

Susan

Also, looks like you have the osCsid at the end of your html link. There is a way to remove that without having to force cookie use. It's a simple change in the configure.php file.

 

Would you mind posting that simple change? We've been trying to get headyntl's SIDs out of his urls for the past few days, and it's still not working unless the site sets a cookie...

 

Richard Lindsey.


Richard Lindsey


Richard,

 

I'm curious :huh:... what SEO contribution are you using on your test site? Is it a modified version of Ultimate SEO?

 

It's very nice, and the Site Map actually displays beyond one sub-category level. :thumbsup:

 

Thanks,

EricK


I think Chemo posted this in a thread somewhere. Once I made the changes, the osCsid was gone. The only time I get the osCsid is the first time I choose a selection in the manufacturers' drop-down menu. After the first selection the ID is gone.

 

In catalog/includes/configure.php:

 

define('HTTP_COOKIE_DOMAIN', '.domain.com');

define('HTTPS_COOKIE_DOMAIN', '.domain.com');

define('HTTP_COOKIE_PATH', '/');

define('HTTPS_COOKIE_PATH', '/');
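To show how the four settings above pair up, here is a small sketch that picks the domain and path for the current request type. The helper name `pick_cookie_params` and the 'SSL'/'NONSSL' labels are assumptions for illustration, not a copy of the osCommerce code:

```php
<?php
// Hypothetical helper: choose the cookie domain/path pair that matches
// the request type ('SSL' for https pages, anything else for http).
function pick_cookie_params($request_type, $settings) {
    $prefix = ($request_type === 'SSL') ? 'HTTPS' : 'HTTP';
    return array(
        'domain' => $settings[$prefix . '_COOKIE_DOMAIN'],
        'path'   => $settings[$prefix . '_COOKIE_PATH'],
    );
}

// Same values as the configure.php lines above.
$settings = array(
    'HTTP_COOKIE_DOMAIN'  => '.domain.com',
    'HTTPS_COOKIE_DOMAIN' => '.domain.com',
    'HTTP_COOKIE_PATH'    => '/',
    'HTTPS_COOKIE_PATH'   => '/',
);
$params = pick_cookie_params('NONSSL', $settings);

// A store could then hand these to PHP before starting the session:
// session_set_cookie_params(0, $params['path'], $params['domain']);
```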

 

I have an SSL certificate, so it may be different if you don't.

 

Hope this works!

 

 

Would you mind posting that simple change? We've been trying to get headyntl's SIDs out of his urls for the past few days, and it's still not working unless the site sets a cookie...

 

Richard Lindsey.

Edited by Zima

Guest
you don't need the dots before your domain

eg .blahblah.com  should be blahblah.com

You can store sessions in the database by setting the STORE_SESSIONS to "mysql".

If blank, you will be using file-based sessions. Do you know where these are being stored? They should be stored in a directory below your web root for security

 

This setting is controlled by the session.save_path entry in php.ini, or the appropriate .htaccess file.

 

Also, shouldn't the cookie path entries be your store directory?

The 2 dots are not absolutely necessary, but they define the cookie SCOPE. Basically, the scope defines which parts of the site that cookie is valid for. The 2 dots indicate that the cookie is valid on ANY part of the store. It covers every area and is my preferred method for ensuring proper settings. It will not hurt anything and eliminates a lot of setup headaches.

 

Summary: use the 2 dots for the best performance and least number of problems.

 

Susan,

 

I also have the redirect add-on working. Works like a charm. I was just curious if there was an easy way to incorporate the .html urls instead (one less step).

 

Also, looks like you have the osCsid at the end of your html link. There is a way to remove that without having to force cookie use. It's a simple change in the configure.php file.

That long string of letters on the Froogle URLs is not an osCsid but rather the internal Froogle tracking number. If you copy the URL and then paste it somewhere (so you can see what it actually is), the URL ends with products_id=XX and the rest of the string is the internal Froogle identifier.

 

Hence, Susan's osCsid's are NOT being cached by Froogle.

 

According to Chemo, this isn't a problem... the osCsid is okay for regular customers, just spiders shouldn't see it. You have a different take?

 

Susan

Regular customers should get an osCsid on the first click, but it should go away after that IF they have cookies enabled. However, if the customer does not have cookies enabled, it will stay on every click and link generated by the cart. This is the proper function of the cookie/session system.

 

For development and/or testing I have several browsers on my computer with different settings. For example, I use Mozilla for regular browsing. I then have Internet Explorer setup to accept NO COOKIES. I then test a site with both browsers to ensure that it is properly configured.

 

Alternately, I use my spider simulator to test how the URLs are being formed on a site. The cool thing is that my simulator allows someone to choose which useragent to send. If you haven't used the tool lately I have upgraded it to be more robust and present more information. IMO, it's a kick ass free and fast tool to check your setup.
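Since the result depends on which user agent is sent, it may help to see the kind of check a store can run on that header. This is only a sketch: the function name `looks_like_spider` and the keyword list are made up for illustration, and a real store would use its own list.

```php
<?php
// Hypothetical helper: true when the user agent contains any of the
// given spider keywords (case-insensitive substring match).
function looks_like_spider($user_agent, $keywords) {
    $ua = strtolower($user_agent);
    foreach ($keywords as $keyword) {
        $keyword = strtolower(trim($keyword));
        if ($keyword !== '' && strpos($ua, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative keyword list only.
$keywords = array('googlebot', 'slurp', 'msnbot');

var_dump(looks_like_spider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $keywords));
var_dump(looks_like_spider('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0', $keywords));
```

Sending a spider's user agent from a test tool and then checking that the generated links carry no osCsid is the same test, seen from the other side.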

 

Would you mind posting that simple change? We've been trying to get headyntl's SIDs out of his urls for the past few days, and it's still not working unless the site sets a cookie...

 

Richard Lindsey.

Sometimes the problem is BKAC... :)

 

Bobby


Bobby,

 

Wondering if you had any luck figuring out how to convert the GETS to POSTS and fix the sorting by name/price category links (similar to how the buy now buttons are updated to forms)..

After this, there will be no more SE problems with oscommerce..for now

Guest

Sorry Jon...I have not had time. I've been dealing with other issues these past few days.

 

Bobby

The 2 dots are not absolutely necessary, but they define the cookie SCOPE. Basically, the scope defines which parts of the site that cookie is valid for. The 2 dots indicate that the cookie is valid on ANY part of the store. It covers every area and is my preferred method for ensuring proper settings. It will not hurt anything and eliminates a lot of setup headaches.

 

i think it should also be mentioned that the cookie scope is a security measure to keep other domains from reading your customers' cookies and finding out any personal data in it (i would also suggest base64_encoding the data you store in them)... if you set the cookie domain with preceding dots like .allthingschildren.com, that means the cookie is valid for any domain with that in the name, meaning www.allthingschildren.com, mail.allthingschildren.com, etc... if you were to simply set the cookie domain to "allthingschildren.com", then theoretically, a person could go out and register ballthingschildren.com, wallthingschildren.com, etc... and have access to the customers' cookie contents (if they could crack the encryption on them, which is one more reason to encrypt the data :D)... just some things to keep in mind :)
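For reference, here is a sketch of the domain-matching rule as written in RFC 6265 (sections 5.1.3 and 5.2.3). The function name `cookie_domain_matches` is made up for illustration. Under this rule a leading dot on the cookie domain is ignored, and a host only matches if it is identical to the domain or ends with a dot followed by the domain, so a look-alike name would not match. Older browsers may have behaved differently.

```php
<?php
// Sketch of RFC 6265 domain matching (illustrative, not osCommerce code).
function cookie_domain_matches($host, $cookie_domain) {
    $host = strtolower($host);
    $domain = strtolower($cookie_domain);
    // A single leading dot on the Domain attribute is ignored.
    if (substr($domain, 0, 1) === '.') {
        $domain = substr($domain, 1);
    }
    if ($domain === '') {
        return false;
    }
    if ($host === $domain) {
        return true;
    }
    // IP addresses may only match exactly, never as a suffix.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return false;
    }
    // Otherwise the host must end with "." followed by the domain.
    $suffix = '.' . $domain;
    return strlen($host) > strlen($suffix)
        && substr($host, -strlen($suffix)) === $suffix;
}

var_dump(cookie_domain_matches('www.allthingschildren.com', '.allthingschildren.com'));
var_dump(cookie_domain_matches('ballthingschildren.com', 'allthingschildren.com'));
```

Separately, note that base64_encode() only changes how the data is represented and can be reversed by anyone, so it is not a substitute for encrypting sensitive values.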

 

Richard Lindsey.


Richard Lindsey


oh and chemo, i finally nailed my own problem w/ apostrophes and quotations showing up in my urls after they'd been stripped :D the problem has to do with characters that appear to be apostrophes and quotations but aren't actually represented by the same ascii codes or something... like on some sites where (if this has ever happened to you) they may display some kind of sql tutorial, and you highlight, copy, and paste the text of the statement into your mysql console only to have it crap out on a quotation mark that was used... being a drop shipper (at least initially), i've been pulling most of my product descriptions and names right off of vendors' web sites via copy and paste, and it's been copying those strange characters over, and to the eye, they appear to be apostrophes and quotations, but to the machine, they're not, because they're not the right ascii values :D that's why i could put 2 statements back to back to show a stripped version of a name i manually fed to the function and a stripped version of the name pulled from the db, and i got different results :D i've since gone through all of my products and replaced those, and they strip fine now :D
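The look-alike characters described above are most likely typographic ("curly") quotes. A minimal sketch of normalizing them before the name goes through the URL stripping, assuming the text is stored as UTF-8 (data in another encoding, such as Windows-1252, would need different byte values). The function name `normalize_quotes` is made up for illustration:

```php
<?php
// Hypothetical helper: replace typographic quotes with plain ASCII ones
// so that later stripping rules for ' and " actually catch them.
function normalize_quotes($text) {
    $map = array(
        "\xE2\x80\x98" => "'",  // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'",  // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"',  // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"',  // U+201D right double quotation mark
    );
    return strtr($text, $map);
}

$pasted = "Children\xE2\x80\x99s \xE2\x80\x9CGames\xE2\x80\x9D";
echo normalize_quotes($pasted), "\n";
```

Running something like this once over the product names would save fixing each product by hand.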

 

Richard Lindsey.


Richard Lindsey

Richard,

 

I'm curious :huh:... what SEO contribution are you using on your test site?  Is it a modified version of Ultimate SEO? 

 

It's very nice, and the Site Map actually displays beyond one sub-category level. :thumbsup:

 

Thanks,

EricK

 

Thanks for the compliment :D Yes, my urls are a modified version of this contribution... i didn't track my changes as i did it, because it took me a few days and a bit of frustration to get it working right and get the regexes perfect, and then i actually had to rework all of my regexes because my web host is running apache 1.3, which uses an older regex engine, and i had to get rid of some of my features *snap*... but it all ended up working, and perhaps one of these days when i either go live or actually have some spare time, i'll take a vanilla version of MS2, slap in chemo's urls, modify em like mine, and use some software that'll snapshot the differences so i can repost it to the community...

in the meantime i can tell you the gist of what it does... it just uses a regex to find the next word after the domain basename, being product, category, or manufacturer, then goes past the next / to start where it records its data, and records all the way to the /index.html in the case of categories, or all the way to the .html in the case of products and manufacturers...

this data is then passed back to the code as the cPath (the entire chain of directories and subdirectories, split by slashes), manufacturer_id, or product_id. in the code, the product/manufacturer is searched for in the cached table, since it stores data in the same format, and the actual number value is reassigned to that $_GET variable. in the case of categories, the cPath is exploded by '/' into an array, the individual elements are searched for in the cached table, the number values are collected in another array, and that array is then re-imploded with '_' and reassigned to cPath's $_GET variable... hope this is enough to at least get you started :)
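A rough sketch of the lookup described above, not the actual modified contribution. The function name `resolve_seo_url`, the `$lookup` array (standing in for the cached table), and the parameter names products_id and manufacturers_id are assumptions for illustration:

```php
<?php
// Hypothetical sketch: turn a name-based URL path back into numeric
// GET parameters, using an array in place of the cached table.
function resolve_seo_url($path, $lookup) {
    // Categories: /category/<name>/<name>/index.html
    if (preg_match('#^/category/(.+)/index\.html$#', $path, $m)) {
        $ids = array();
        foreach (explode('/', $m[1]) as $name) {
            if (!isset($lookup['category'][$name])) {
                return array(); // unknown name, nothing to assign
            }
            $ids[] = $lookup['category'][$name];
        }
        // Numeric ids are joined with '_' to form the cPath value.
        return array('cPath' => implode('_', $ids));
    }
    // Products and manufacturers: /product/<name>.html
    if (preg_match('#^/(product|manufacturer)/([^/]+)\.html$#', $path, $m)) {
        if (!isset($lookup[$m[1]][$m[2]])) {
            return array();
        }
        $key = ($m[1] === 'product') ? 'products_id' : 'manufacturers_id';
        return array($key => $lookup[$m[1]][$m[2]]);
    }
    return array();
}

// Made-up sample data.
$lookup = array(
    'category'     => array('children' => 3, 'games' => 12),
    'product'      => array('chess-set' => 45),
    'manufacturer' => array('acme' => 7),
);
print_r(resolve_seo_url('/category/children/games/index.html', $lookup));
```

The returned values would then be assigned to the matching $_GET entries before the rest of the page code runs.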

 

Richard Lindsey.


Richard Lindsey


So how is this going to do with SE?

 

The old rule of thumb was the fewer directories the better...

 

so :

http://www.yourdomain.com/category/childre...ames/index.html

 

Looks like it is 3 folders deep... so if that 2 folder theory is still used (which I am not sure about), then you did more damage to the URL than the stock osC URLs had.

 

I am not saying that is the case as I am no SEO expert but something to think about?

 

Chris


osC Contributions I have published.

 

Note: Some I only provided minor changes, updates or additions!
