Just a thought on this... why isn't cookie_usage.php disallowed too? Doesn't it seem that this is what causes bots to hit a wall?
Help on 'robots.txt'
Started by mhormann, Dec 18 2004, 12:40
30 replies to this topic
#21
Posted 15 April 2006, 19:14
I find the fun in everything.
#22
Posted 29 April 2006, 13:43
mhormann, on Dec 18 2004, 03:40 PM, said:
# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /usage/
Disallow: /catalog/admin/
Disallow: /catalog/download/
Disallow: /catalog/elmar/
Disallow: /catalog/pub/
Disallow: /catalog/account.php
Disallow: /catalog/advanced_search.php
Disallow: /catalog/checkout_shipping.php
Disallow: /catalog/create_account.php
Disallow: /catalog/login.php
Disallow: /catalog/password_forgotten.php
Disallow: /catalog/popup_image.php
Disallow: /catalog/shopping_cart.php
Have fun! And happy 'spidering'...
Matthias
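For anyone who wants to check what a group like this actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule list is an abbreviated copy of the one above; the crawler name `ExampleBot` and the tested paths are made up for illustration. Matching is by path prefix, so a directory rule covers everything beneath it.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the catch-all group quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /catalog/admin/
Disallow: /catalog/login.php
Disallow: /catalog/shopping_cart.php
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules.
    for path in ("/catalog/index.php",
                 "/catalog/login.php",
                 "/catalog/admin/orders.php"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Pages that are not listed stay crawlable, which is the point of only disallowing the account, checkout and admin pages.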
Hi, just a simple newbie question. What should the rest of the file look like? I mean, how do you put the spiders at the end of the file? I got an error when I was using just a list of spiders. TIA.
-Jani
#23
Posted 29 April 2006, 22:15
the spiders use a different file. Look into your catalog\includes\spiders.txt
http://www.oscommerce.com/community/contributions,2455
Edited by enigma1, 29 April 2006, 22:16.
#24
Posted 02 May 2006, 13:07
enigma1, on Apr 30 2006, 01:15 AM, said:
the spiders use a different file. Look into your catalog\includes\spiders.txt
http://www.oscommerce.com/community/contributions,2455
I meant the robots.txt file, not spiders.txt.
#25
Posted 04 May 2006, 17:36
Via the user agent, for example:
User-agent: Googlebot-Image
Disallow: /
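A slightly fuller sketch of that idea, assuming the goal is to keep Google's image crawler out entirely while all other crawlers only skip a few pages. The file contents, the `ExampleBot` name and the image path are illustrative, not taken from a real shop. Each `User-agent` line starts its own group, the `*` group goes last, and a crawler that finds a group naming it follows that group instead of the `*` one. Python's standard `urllib.robotparser` can be used to check the result:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for a named crawler, then the catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /catalog/admin/
Disallow: /catalog/login.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under the file above."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed("Googlebot-Image", "/catalog/images/box.jpg"))
    print(allowed("ExampleBot", "/catalog/images/box.jpg"))
    print(allowed("ExampleBot", "/catalog/login.php"))
```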
#26
Posted 12 May 2006, 07:32
Ok here goes... questions from a newbie
I do not have a catalog directory. I guess better wording would be: I don't have a directory named catalog. Should I have one?
Re: robots.txt
I have installed my store in a subdomain of my site. The subdomain is shop.mydomain.com. Should my robots.txt file be in the root directory of the subdomain with the OSC install? Should I have a separate robots.txt file for my main site that does not include my subdomain?
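On where the file goes: crawlers request robots.txt from the root of each host name, so shop.mydomain.com and the main site are treated separately and each needs its own file. The paths inside are URL paths as seen from that root, so if the store sits directly at the root of the subdomain, the /catalog prefix used in the examples in this thread would simply be dropped. A rough sketch of both points (the function names are made up for this post):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a crawler would request for `page_url`."""
    parts = urlsplit(page_url)
    # Same scheme and host, fixed path, no query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def strip_prefix(rule_path, prefix="/catalog"):
    """Drop a directory prefix from a rule when the store sits at the host root."""
    if rule_path.startswith(prefix + "/"):
        return rule_path[len(prefix):]
    return rule_path


if __name__ == "__main__":
    print(robots_url("http://shop.mydomain.com/login.php"))
    print(robots_url("http://www.mydomain.com/about/index.html"))
    print(strip_prefix("/catalog/login.php"))
```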
#27
Posted 20 May 2006, 17:33
wheeloftime, on Apr 5 2005, 01:37 PM, said:
QUOTE(PandA.nl @ Apr 5 2005, 12:32 PM)
I would not add the location of your admin (or any other non-public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read) makes your site less safe.
Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds or tries it anyway, for whatever reason, the .htaccess protection won't let it in, so the robots.txt file does not add anything useful there.
I have put the "admin" path in the robots.txt.
... and the honeypot is exactly there!
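For what it's worth, a trap like that can be sketched in a few lines. The idea assumed here: the path is listed under Disallow, so a well-behaved robot never asks for it, and any client that does request it has ignored (or mined) robots.txt. The request-pair format, the addresses and the default prefix are all hypothetical; how the honeypot above is actually built is not stated in this thread.

```python
def flag_trap_hits(requests, trap_prefix="/catalog/admin/"):
    """Return the sorted client addresses that requested a path under `trap_prefix`.

    `requests` is an iterable of (client_address, url_path) pairs, for example
    parsed from an access log. The pair format and the default prefix are
    assumptions made for this sketch.
    """
    return sorted({client for client, path in requests
                   if path.startswith(trap_prefix)})


if __name__ == "__main__":
    # Made-up sample requests using documentation-only addresses.
    sample = [
        ("192.0.2.10", "/catalog/index.php"),
        ("192.0.2.77", "/catalog/admin/login.php"),
        ("192.0.2.77", "/catalog/admin/"),
        ("192.0.2.10", "/catalog/product_info.php"),
    ]
    print(flag_trap_hits(sample))
```

What happens to a flagged address (logging, blocking, nothing) is a separate decision that the sketch leaves open.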
#28
Posted 21 May 2006, 09:36
I have read through all this and I am now completely confused.
Is there an idiot's guide to what you need to do for this sort of thing? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?
I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?
Is there a complete idiot's guide to what to do and what not to do to get a website up and running, safe and secure?
I appreciate that all this help is given in people's spare time, but any pointers on this would be greatly appreciated.
#29
Posted 23 May 2006, 11:58
owl17sb, on May 21 2006, 10:36 AM, said:
I have read through all this and I am now completely confused.
Is there an idiot's guide to what you need to do for this sort of thing? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?
I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?
Is there a complete idiot's guide to what to do and what not to do to get a website up and running, safe and secure?
I appreciate that all this help is given in people's spare time, but any pointers on this would be greatly appreciated.
I agree with you. Is there a sample robots.txt that we can edit ourselves?
#30
Posted 08 June 2006, 05:02
Thanks a lot to everyone for the discussion. I understand this topic better now.
Though, I still have a few questions about things I don't understand.
1. What does Google's new 'wildcard' exclusion system mean? The thread said Google now allows wildcards to be specified, like '*.cgi'... Also, what is the purpose of the 'cgi-bin' directory? What is it used for?
2. Will .htaccess block the spiders from viewing our site? I ask because I saw an .htaccess file in every single folder. However, I'm not using .htaccess to secure my folders; I just use my hosting provider's 'password protected directory' function. I'm just afraid that the .htaccess files will stop the spiders from crawling my pages.
3. How can I force the bot to index my 'view large image' pop-up? I heard it can easily appear in the search results on Google (or other search sites)... Do I need to add some extra code to popup.php to help my ranking?
4. A thread said "Those who didn't include robots.txt files excluding those pages are now having to create hacks to redirect customers to the homepage of their stores"... I'm not sure what that means. Is a robots.txt file NECESSARY for your pages?
5. A 'back link' is a link from another site to ours. How about 'anchor', is it <a href.....></a>? Is this the anchor they mean?
I know this thread is from long ago. Even so, I hope somebody will come here and give me some ideas to answer all these questions...
Thank you
smith
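On question 1: in the original robots.txt convention a rule is a plain path prefix. The wildcard extension that Google documents adds `*` (any run of characters) and a trailing `$` (end of the URL), so `Disallow: /*.cgi$` would cover every URL ending in .cgi. Python's `urllib.robotparser` does not implement these, so the sketch below translates a rule into a regular expression by hand; the function names and test paths are made up. As for cgi-bin, it is simply the conventional directory for server-side CGI scripts.

```python
import re


def rule_to_regex(rule_path):
    """Compile a robots.txt path using '*' and a trailing '$' into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with "any run of characters".
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(rule_path, url_path):
    """True if `url_path` is covered by `rule_path`; matching starts at the first character."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    print(matches("/*.cgi$", "/scripts/form.cgi"))      # ends in .cgi
    print(matches("/*.cgi$", "/scripts/form.cgi?x=1"))  # does not end in .cgi
    print(matches("/cgi-bin/", "/cgi-bin/form.pl"))     # plain prefix rule
```

Crawlers that do not implement the extension may treat `*` in a path literally, so a plain prefix rule such as `/cgi-bin/` remains the most portable form.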
Hello World! ^.^ I'm a Internet naive. Browse my working profile
Malaysia Web Services - OPerion Website Marketing System
#31
Posted 21 May 2008, 19:23
Hey guys can anyone tell me if my robots.txt looks ok??
# robots.txt for Wheel of Time
# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/
# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/temp/
Disallow: /httpdocs/admin/
Disallow: /httpdocs/download/
Disallow: /httpdocs/pub/
Disallow: /httpdocs/account.php
Disallow: /httpdocs/advanced_search.php
Disallow: /httpdocs/checkout_shipping.php
Disallow: /httpdocs/create_account.php
Disallow: /httpdocs/login.php
Disallow: /httpdocs/password_forgotten.php
Disallow: /httpdocs/popup_image.php
Disallow: /httpdocs/shopping_cart.php
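One thing to check in a file like this: Disallow paths are matched against the URL path, not against the directory layout on the server. If httpdocs is the document root (an assumption here, as on typical Plesk setups), it never appears in a URL, so `Disallow: /httpdocs/login.php` would not cover a page served as `/login.php`. A small sketch with Python's standard `urllib.robotparser`; the helper name and `ExampleBot` are made up:

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path, url_path, agent="ExampleBot"):
    """True if a single `Disallow: rule_path` line blocks `url_path`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    # Rule written with the filesystem directory: the URL path does not match it.
    print(blocked("/httpdocs/login.php", "/login.php"))
    # Rule written as the URL path the visitor actually sees.
    print(blocked("/login.php", "/login.php"))
```

If the shop really is reachable under a URL that contains /httpdocs/, the rules as posted would apply unchanged.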