osCommerce

Robots.txt Config help!


dswofford

Thanks for any help in advance...

I am just starting to look into spidering and am writing the robots.txt for my website. Is there any point in allowing a spider to crawl my contact, shipping, privacy, and agreement pages, or should I disallow them (which is what I'm doing now)?

Also...

When I disallow a page, if I put

Disallow: /contact_us.php

and the actual link on the page is

http://www.ryford.com/contact_us.php

will it still work?

Does it also work for my links to other pages? I want to disallow spidering of my shared SSL, which is https://www.mysecuresite.com/ryford-com/contact_us.php

And last but not least, if I put

Disallow: http://www.ryford.com/product_info.php

will that also stop them from going to

http://www.ryford.com/product_info.php?products_id=96

Here is my current code. Any errors or problems you can see?

Thanks for the help.

# Sample robots.txt file (make sure the filename is ALL LOWERCASE on Linux/Unix systems)
# This file should go in your web site's ROOT directory
#  The root directory is where your site's main /index.html file would be found
#  It is usually found in /yourhomedir/public_html/ or /yourhomedir/httpdocs
#  Where "yourhomedir" is your user account's name
#
# We invite you to also check out our popular contribution: Simple Template System (STS)
#  It lets you layout or change your OSC look-and-feel by modifying a single HTML file
#  http://www.oscommerce.com/community/contributions,1524 or SimpleTemplateSystem.com
#  Enjoy!   - Brian Gallagher @ DiamondSea.com

# This says to apply these settings to ALL search engine spiders/crawlers
User-agent: *

# These settings will keep spiders from indexing your unwanted pages 
#   This assumes that your OSC install is in your web site's ROOT directory
#   ie: http://www.yoursite.com/index.php <- Use if this brings up your OSC main page
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /index.php?language=es
Disallow: /phponline/client.php
Disallow: /school.php
Disallow: /conditions.php
Disallow: /contact_us.php
Disallow: /shipping.php
Disallow: https://www.mysecuresite.com/ryford-com/contact_us.php
Disallow: https://www.mysecuresite.com/ryford-com/order_info.php
Disallow: https://www.mysecuresite.com/ryford-com/account.php

# IF YOU DO NOT WISH TO HAVE THE GOOGLE IMAGE BOT SCAN YOUR DOMAIN FOR IMAGES
# THEN YOU CAN INCLUDE THE FOLLOWING IN YOUR ROBOTS FILE.
# I FOUND THAT MY BANDWIDTH USAGE DROPPED BY A MASSIVE AMOUNT AFTER I GOT RID
# OF THE GOOGLE IMAGE BOT. ALL I HAD WAS IMAGE HUNTERS STEALING PRODUCT SHOTS
# AND NOT EVEN BROWSING THE SITE.

User-agent: Googlebot-Image
Disallow: /
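On the path-versus-URL questions above: a Disallow value is matched as a prefix of the URL path on the host that served the robots.txt, so it should start with "/" rather than "http://". The sketch below uses Python's standard urllib.robotparser to check this against the URLs from the question. The helper names allowed and robots_txt_for are made up for illustration, and real crawlers may differ in detail from this parser.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def robots_txt_for(url):
    """Return the robots.txt location that governs `url` (one file per scheme and host)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Rules written as paths relative to the site root.
path_rules = [
    "User-agent: *",
    "Disallow: /contact_us.php",
    "Disallow: /product_info.php",
]

# The same rules written as full URLs, as in the question.
url_rules = [
    "User-agent: *",
    "Disallow: http://www.ryford.com/contact_us.php",
    "Disallow: http://www.ryford.com/product_info.php",
]

contact = "http://www.ryford.com/contact_us.php"
product = "http://www.ryford.com/product_info.php?products_id=96"

print(allowed(path_rules, contact))  # False: the path rule blocks the page
print(allowed(path_rules, product))  # False: prefix match also covers ?products_id=96
print(allowed(url_rules, contact))   # True: the full-URL rule never matches a path

# The shared SSL pages live on a different host, so a different file applies.
print(robots_txt_for("https://www.mysecuresite.com/ryford-com/contact_us.php"))
# https://www.mysecuresite.com/robots.txt
```

If the crawlers you care about behave like this parser, the three https://www.mysecuresite.com lines have no effect in this file. Pages on the shared SSL host would be governed by the robots.txt at that host's root, which you may not control.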


Is there any reason that you particularly want to disallow spiders on these pages? I get the feeling that search engines may treat these higher-level pages as more important, but I may be wrong. They are extra pages that people could find your site from, and there may be useful phrases in your shipping page, for example. Personally I would just allow them because I can't see any reason not to.

 

Mike


Spiders can't send email since they cannot manipulate forms, although you do need to be sure to set Prevent Spider Sessions in admin. The more pages you have listed in the search engines, the better off your site will be. It is doubtful that anyone will find you by searching for Contact Us, but every page you have listed carries links to your other pages (via the footer and the left and right columns), which can raise your PR.

Also, depending on your site, you may want to change the titles of these pages to be more specific so that they are found. If you title your page Contact UK Resident, you may find that it attracts visitors from the UK. Or instead of Privacy, perhaps use Cell Phones and Privacy (if you sell cell phones); people searching for cell phones could find your site through this page. Be creative with these pages: place a paragraph on them about your product with keywords that fit that page. They are an excellent opportunity to improve your site's position that many people ignore.

 

Jack



Great advice there, Jack... I'm off to change a few titles :)

PS I'd delete that robots.txt file above if it were me. As Jack says, if you have "Prevent Spider Sessions" set in your admin settings, there's no need for all those disallows.


  • 3 weeks later...

My robots.txt just stops access to some directories so that the search engines don't need to spend so long crawling. Also, having images in Google Images is quite handy, but I'd rather they were linked to the product page than to the directory, so I use the following.

 

User-agent: *
Disallow: /images/
Disallow: /admin/
Disallow: /catalogues/
Disallow: /cgi-bin/
Disallow: /download/
Disallow: /includes/
Disallow: /temp/

 

Nice and simple, and it does the job.
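One difference between the two files in this thread is the trailing slash: "Disallow: /admin" in the first, "Disallow: /admin/" here. Because rules are prefix matches, the two are not equivalent. A rough sketch with Python's standard urllib.robotparser shows the difference; the helper blocked_paths and the candidate paths are made up for illustration and are not taken from either site.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow_value, paths):
    """Return the subset of `paths` that a single Disallow rule blocks for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    return [p for p in paths if not parser.can_fetch("*", p)]


# Hypothetical paths, chosen only to show the prefix behaviour.
candidates = [
    "/admin/login.php",
    "/admin.php",
    "/administrator/",
    "/images/logo.gif",
]

# Without the trailing slash, anything whose path starts with "/admin" is blocked.
print(blocked_paths("/admin", candidates))
# ['/admin/login.php', '/admin.php', '/administrator/']

# With the trailing slash, only the contents of the /admin/ directory are blocked.
print(blocked_paths("/admin/", candidates))
# ['/admin/login.php']
```

So the trailing-slash form is the narrower of the two, which is probably what you want when the intent is to exclude a directory and nothing else.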
