
Help on 'robots.txt'


30 replies to this topic

#21 FixItPete

  • Community Member
  • 318 posts
  • Real Name:Pete
  • Location:The Garden State

Posted 15 April 2006, 19:14

Just a thought on this... why isn't cookie_usage.php disallowed too? Doesn't it seem that this is what causes bots to hit a wall?
I find the fun in everything.

#22 jashnu

  • Community Member
  • 27 posts
  • Real Name:Jani

Posted 29 April 2006, 13:43

mhormann, on Dec 18 2004, 03:40 PM, said:

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /usage/
Disallow: /catalog/admin/
Disallow: /catalog/download/
Disallow: /catalog/elmar/
Disallow: /catalog/pub/
Disallow: /catalog/account.php
Disallow: /catalog/advanced_search.php
Disallow: /catalog/checkout_shipping.php
Disallow: /catalog/create_account.php
Disallow: /catalog/login.php
Disallow: /catalog/password_forgotten.php
Disallow: /catalog/popup_image.php
Disallow: /catalog/shopping_cart.php

Have fun! And happy 'spidering'...
Matthias

Hi, just a simple newbie question. What should the rest of the file look like? I mean, how do you put the spiders at the end of the file? I got an error when I used just a list of spiders. TIA.

-Jani

#23 enigma1

  • Community Member
  • 8,206 posts
  • Real Name:Mark Samios

Posted 29 April 2006, 22:15

the spiders use a different file. Look into your catalog\includes\spiders.txt
http://www.oscommerce.com/community/contributions,2455

Edited by enigma1, 29 April 2006, 22:16.


#24 jashnu

  • Community Member
  • 27 posts
  • Real Name:Jani

Posted 02 May 2006, 13:07

enigma1, on Apr 30 2006, 01:15 AM, said:

the spiders use a different file. Look into your catalog\includes\spiders.txt
http://www.oscommerce.com/community/contributions,2455

I meant the robots.txt file, not spiders.txt.

#25 enigma1

  • Community Member
  • 8,206 posts
  • Real Name:Mark Samios

Posted 04 May 2006, 17:36

Via the User-agent line, for example:

User-agent: Googlebot-Image
Disallow: /
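To see how a per-bot group and the catch-all `User-agent: *` group work together, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample file combines the Googlebot-Image group above with a few rules from the file quoted earlier in the thread; the bot name `SomeOtherBot` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample file: one group for a specific bot, then the catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /catalog/admin/
Disallow: /catalog/login.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot-Image matches its own group and is shut out of everything.
image_bot_index = parser.can_fetch("Googlebot-Image", "/catalog/index.php")

# Any other bot falls through to the "*" group.
other_bot_index = parser.can_fetch("SomeOtherBot", "/catalog/index.php")
other_bot_login = parser.can_fetch("SomeOtherBot", "/catalog/login.php")

print(image_bot_index, other_bot_index, other_bot_login)
```

A bot only reads the group that names it; if no group names it, it uses the `*` group. That is why the catch-all group is conventionally placed at the end of the file.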


#26 RC_Nut

  • Community Member
  • 7 posts
  • Real Name:Mark

Posted 12 May 2006, 07:32

Ok here goes... questions from a newbie

I do not have a catalog directory. I guess better wording would be: I don't have a directory named catalog. Should I have one?

RE:- Robots.txt

I have installed my store in a subdomain of my site. The subdomain is shop.mydomain.com. Should my robots.txt file be in the root directory of the subdomain with the osC install? Should I have a separate robots.txt file for my main site that does not include my subdomain?
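On the subdomain question: crawlers request robots.txt from the root of each host name separately, so shop.mydomain.com and the main site each need their own file at their own root. A small sketch of where a crawler would look for a given page; the helper name `robots_url`, the example page paths and the `www.mydomain.com` host are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


shop_robots = robots_url("http://shop.mydomain.com/product_info.php?products_id=1")
main_robots = robots_url("http://www.mydomain.com/about.html")

print(shop_robots)  # http://shop.mydomain.com/robots.txt
print(main_robots)  # http://www.mydomain.com/robots.txt
```

Since the store here sits at the root of the subdomain rather than in a /catalog/ folder, the Disallow paths in that file would start at / (for example /admin/ rather than /catalog/admin/).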

#27 rin67630

  • Community Member
  • 103 posts
  • Real Name:michel firholz
  • Location:Munich Germany

Posted 20 May 2006, 17:33

wheeloftime, on Apr 5 2005, 01:37 PM, said:

QUOTE(PandA.nl @ Apr 5 2005, 12:32 PM)
I would not add the location of your admin (or any other non-public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read) makes your site less safe.

Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds/tries it anyway, for whatever reason, the .htaccess protection won't allow it in, so the robots.txt file does not add anything useful to that.

I have put the "admin" path in the robots.txt.

... and the honeypot is exactly there! :D

#28 owl17sb

  • Community Member
  • 75 posts
  • Real Name:sue paterson

Posted 21 May 2006, 09:36

I have read through all this and I am now completely confused.

Is there an idiot's guide as to what you need to do on these sorts of things? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?

I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?

Is there a complete idiot's guide as to what to do and what not to do to get a website up and running and safe and secure?

I appreciate all this help is given in people's spare time, but any pointers on this would be greatly appreciated.
:)

#29 WatchPart

  • Community Member
  • 36 posts
  • Real Name:WatchPart

Posted 23 May 2006, 11:58

owl17sb, on May 21 2006, 10:36 AM, said:

I have read through all this and I am now completely confused.

Is there an idiot's guide as to what you need to do on these sorts of things? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?

I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?

Is there a complete idiot's guide as to what to do and what not to do to get a website up and running and safe and secure?

I appreciate all this help is given in people's spare time, but any pointers on this would be greatly appreciated.
:)

I agree with you. Is there a sample robots.txt that we can edit ourselves?

#30 smithveg

  • Community Member
  • 360 posts
  • Real Name:Smithveg
  • Gender:Male

Posted 08 June 2006, 05:02

Thanks a lot for all the discussion. I understand this topic better now.
Though, I still have a couple of questions.

1. What is meant by Google's new 'wildcard' exclusion system? The thread said Google now allows wildcards to be specified, like '*.cgi'. Also, what is the purpose of the 'cgi-bin' directory? What is it used for?

2. Will .htaccess block the spiders from viewing our site? I saw an .htaccess file in every single folder. However, I'm not using .htaccess to secure my folders; instead I just use the hosting site's 'password protected directory' function. I'm just afraid that the .htaccess files will block the spiders from surfing my pages.

3. How can I force the bots to index my 'view large image' pop-up? I heard it can easily appear in the search results on Google (or other search sites). Do I need to add some extra code to popup.php to help my ranking?

4. A thread said "Those who didn't include robots.txt files excluding those pages are now having to create hacks to redirect customers to the homepage of their stores". I'm not sure what that means. Is a robots.txt file NECESSARY for your pages?

5. A 'back link' is a link from another site to ours. How about 'anchor'? Is it <a href.....></a>? Is this the anchor they mean?

I know this thread is from long ago. Even so, I hope somebody will come here and give me some ideas on these questions...

Thank you
smith :thumbsup:
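On question 1, the wildcard extension adds two special characters to Disallow paths: `*` stands for any run of characters and a trailing `$` pins the pattern to the end of the URL. The following is only a simplified sketch of that matching idea, not Google's actual implementation; the function names `pattern_to_regex` and `matches` and the sample paths are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildcard Disallow pattern into (regex, anchored_at_end)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each "*" match any characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body), anchored


def matches(pattern, url_path):
    """Return True if url_path would be covered by the Disallow pattern."""
    regex, anchored = pattern_to_regex(pattern)
    if anchored:
        return regex.fullmatch(url_path) is not None
    # Without "$", a rule is a prefix match from the start of the path.
    return regex.match(url_path) is not None


print(matches("/*.cgi", "/scripts/form.cgi"))       # True
print(matches("/*.cgi$", "/scripts/form.cgi?x=1"))  # False
print(matches("/cgi-bin/", "/cgi-bin/mail.pl"))     # True
print(matches("/cgi-bin/", "/catalog/index.php"))   # False
```

A plain rule such as `Disallow: /cgi-bin/` needs no wildcard at all: it already covers everything under that directory by prefix.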

#31 spudevo

  • Community Member
  • 40 posts
  • Real Name:wayne

Posted 21 May 2008, 19:23

Hey guys, can anyone tell me if my robots.txt looks OK?


# robots.txt for Wheel of Time
# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/temp/
Disallow: /httpdocs/admin/
Disallow: /httpdocs/download/
Disallow: /httpdocs/pub/
Disallow: /httpdocs/account.php
Disallow: /httpdocs/advanced_search.php
Disallow: /httpdocs/checkout_shipping.php
Disallow: /httpdocs/create_account.php
Disallow: /httpdocs/login.php
Disallow: /httpdocs/password_forgotten.php
Disallow: /httpdocs/popup_image.php
Disallow: /httpdocs/shopping_cart.php
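One thing worth checking in the file above: Disallow paths are compared with the URL path a visitor sees, not with folder names on the server's disk. If httpdocs is only the hosting account's document root (an assumption here) and the shop is reached at /admin/, /login.php and so on, then the /httpdocs/... rules would not match any real URL. A minimal sketch with Python's standard `urllib.robotparser`; the helper `is_allowed` and the bot name `ExampleBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url_path, agent="ExampleBot"):
    """Parse the given robots.txt lines and test one URL path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url_path)


# Rules written with the filesystem folder name in front.
with_prefix = [
    "User-agent: *",
    "Disallow: /httpdocs/admin/",
    "Disallow: /httpdocs/login.php",
]

# The same rules written as URL paths.
without_prefix = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /login.php",
]

print(is_allowed(with_prefix, "/admin/index.php"))     # True: not blocked
print(is_allowed(without_prefix, "/admin/index.php"))  # False: blocked
```

The same applies to the Googlebot-Image group: `Disallow: /httpdocs/` would only keep the image bot out of URLs that literally begin with /httpdocs/.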