
osCommerce


Help on 'robots.txt'


mhormann


OK, here goes... questions from a newbie.

I do not have a catalog directory. Or, to put it better, I don't have a directory named catalog. Should I have one?

Re: robots.txt

I have installed my store on a subdomain of my site, shop.mydomain.com. Should my robots.txt file be in the root directory of the subdomain where osCommerce is installed? Should I have a separate robots.txt file for my main site that does not cover the subdomain?
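
A note on where crawlers look: robots.txt is fetched per host, from the root of that host, so a subdomain such as shop.mydomain.com is checked at its own /robots.txt and the main site at its own. A minimal Python sketch of that lookup follows. The function name robots_url and the www.mydomain.com host are made up for illustration; only shop.mydomain.com comes from the post above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL.

    The file always sits at the root of the same scheme and host;
    the page's own path and query string play no part.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The shop subdomain and the main site (hypothetical host name)
# resolve to two different files.
print(robots_url("http://shop.mydomain.com/product_info.php?products_id=1"))
print(robots_url("http://www.mydomain.com/about/index.html"))
```

So a store installed at the root of the subdomain would carry its own robots.txt there, and the main site's file would not need to mention the shop.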

Link to comment
Share on other sites

  • 2 weeks later...
QUOTE(PandA.nl @ Apr 5 2005, 12:32 PM)

I would not add the location of your admin (or any other non-public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read) makes your site less safe.

Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds or tries it anyway, for whatever reason, the .htaccess protection won't let it in, so the robots.txt file does not add anything useful there.

I have put the "admin" path in the robots.txt.

... and the honeypot is exactly there! :D

Link to comment
Share on other sites

I have read through all this and I am now completely confused.

Is there an idiot's guide to what you need to do for this sort of thing? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?

I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?

Is there a complete idiot's guide to what to do and what not to do to get a website up and running, safe and secure?

I appreciate that all this help is given in people's spare time, but any pointers on this would be greatly appreciated.

:)

Link to comment
Share on other sites


I agree with you. Is there a sample robots.txt that we can edit ourselves?

Link to comment
Share on other sites

  • 3 weeks later...

Thanks a lot for all the discussion. I understand this topic better now.

Though, I still have a couple of questions.

1. What does Google's new 'wildcard' exclusion system mean? The thread said Google now allows wildcards to be specified, like '*.cgi'. Also, what is the purpose of the 'cgi-bin' directory? What is it used for?

2. Will .htaccess block spiders from viewing our site? I ask because I see an .htaccess file in every single folder. I'm not using .htaccess to secure my folders myself; I just use my hosting provider's 'password protected directory' function. I'm afraid the .htaccess files will stop spiders from crawling my pages.

3. How can I get bots to index my 'view large image' popup? I heard it can easily appear in the result pages on Google (or other search engines). Do I need to add some extra code to popup.php to help my ranking?

4. A thread said, "Those who didn't include robots.txt files excluding those pages are now having to create hacks to redirect customers to the homepage of their stores". I'm not sure what that means. Is a robots.txt file NECESSARY for your pages?

5. A 'back link' is a link from another site to ours. What about 'anchor'? Is it <a href.....></a>? Is that the anchor they mean?

I know this thread is from long ago. Even so, I hope somebody will come by and give me some ideas on these questions.

Thank you

smith :thumbsup:
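
On question 1: in the wildcard syntax that Google documents, a `*` in a rule stands for any run of characters and a trailing `$` pins the rule to the end of the URL, so a rule like `/*.cgi$` covers every URL that ends in .cgi. The Python sketch below only illustrates that matching. The names rule_to_regex and rule_matches are made up for this example, and real crawlers may differ in details such as percent-encoding and how competing rules are ranked.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """True if the rule pattern applies to the URL path (prefix match)."""
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/*.cgi$", "/cgi-bin/search.cgi"))      # ends in .cgi
print(rule_matches("/*.cgi$", "/cgi-bin/search.cgi?x=1"))  # query string after .cgi
print(rule_matches("/cgi-bin/", "/cgi-bin/search.cgi"))    # plain prefix rule
```

Without wildcards a rule is a plain prefix match, which is why `Disallow: /cgi-bin/` already covers everything inside that directory.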


Link to comment
Share on other sites

  • 1 year later...

Hey guys, can anyone tell me if my robots.txt looks OK?

# robots.txt for Wheel of Time
# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/temp/
Disallow: /httpdocs/admin/
Disallow: /httpdocs/download/
Disallow: /httpdocs/pub/
Disallow: /httpdocs/account.php
Disallow: /httpdocs/advanced_search.php
Disallow: /httpdocs/checkout_shipping.php
Disallow: /httpdocs/create_account.php
Disallow: /httpdocs/login.php
Disallow: /httpdocs/password_forgotten.php
Disallow: /httpdocs/popup_image.php
Disallow: /httpdocs/shopping_cart.php
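
One way to sanity-check a file like this is Python's standard urllib.robotparser. The sketch below feeds it a shortened copy of the rules above and asks which URLs a crawler may fetch. The crawler name ExampleBot and the host shop.example.com are made up, and the sketch assumes that httpdocs is the server's document root folder and so never appears in public URLs, which may not be true for this site.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the posted rules. Note there are no blank lines
# inside a record: this parser treats a blank line as the end of a record.
RULES = """\
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/admin/
Disallow: /httpdocs/login.php
Disallow: /httpdocs/shopping_cart.php
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Parse the robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical crawler and host, for illustration only.
for path in ("/login.php", "/httpdocs/login.php", "/cgi-bin/test.cgi"):
    url = "http://shop.example.com" + path
    print(path, is_allowed(RULES, "ExampleBot", url))
```

If that assumption holds, the /httpdocs/... rules never match a real URL such as /login.php, and the paths would need to be written as they appear in the browser's address bar instead.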

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
