AWWWW.WAHWAH Posted June 3, 2005
FYI ... I am being hit by Become.com a lot. Its spider is constantly crawling my entire site. Can someone add it to the spiders.txt list?
stevel Posted June 3, 2005 (Author)
The purpose of spiders.txt is not to keep spiders out of your site; it's to keep spiders from creating sessions. Do you see become.com's spider getting sessions?
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
Guest Posted June 3, 2005
Quoting stevel: "The purpose of spiders.txt is not to keep spiders out of your site - it's to keep spiders from creating sessions. Do you see become.com's spider getting sessions?"
Deja vu?
Guest Posted July 7, 2005
Steve, can you tell me about "Inktomi"? It's not in your spiders.txt file; is there a reason? It crawls my site ALL OF THE TIME, with a different IP address for every single URL it crawls. It also "appears" to create sessions, but I am not sure how to see what it is REALLY doing. Should I look in the log files? Right now I am simply using my Visitor Stats contribution to see it. I'm curious what information you might have on this; I have searched the forum but found nothing pertinent. Thanks in advance! Regards, Siddall
Sid04 Posted July 7, 2005
What about msnbot? Isn't that one of the biggest spiders you have issues with? ...or did I just miss it in your most recent spiders.txt contribution?
stevel Posted July 7, 2005 (Author)
msnbot is detected by the string "nbot". As for Inktomi: in the past it has been associated with Yahoo Slurp, and would be detected by "slurp". In my own logs, I don't see any spiders with "Inktomi" in the user agent. Give me some sample lines from your access logs that show Inktomi. I believe you, as I've seen complaints from other webmasters about an Inktomi spider that has run wild, but I have not seen it myself and can't find references to a specific user agent.
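Both "nbot" catching msnbot and "slurp" catching Yahoo's spider come down to substring matching: each spiders.txt entry is looked for anywhere in the visitor's user agent. A minimal sketch of that idea, assuming the entries have already been read into an array (the four entries below are illustrative, not the full list, and is_spider() is a hypothetical name, not an osCommerce function):

```php
<?php
// Sketch of substring-based spider detection in the style of spiders.txt:
// each non-blank entry is looked for anywhere in the lowercased user agent,
// so a short entry like "nbot" matches "msnbot". is_spider() is hypothetical.
function is_spider(string $userAgent, array $entries): bool
{
    $ua = strtolower($userAgent);
    if ($ua === '') {
        return false; // no user agent at all: treat as an ordinary visitor
    }
    foreach ($entries as $entry) {
        $needle = strtolower(trim($entry));
        // Skip blank entries; an empty needle would match every visitor.
        if ($needle !== '' && strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$entries = ['nbot', 'slurp', 'crawl', 'ebot'];
var_dump(is_spider('msnbot/1.0 (+http://search.msn.com/msnbot.htm)', $entries)); // bool(true)
var_dump(is_spider('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', $entries)); // bool(false)
```

Short entries keep the list compact, at the cost of occasionally matching more agents than intended.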
stevel Posted July 9, 2005 (Author)
Is anyone going to offer access log lines for this supposed Inktomi spider? I'm puzzled that my own sites seem unaffected by this (yet are indexed by dozens of other spiders.)
Guest Posted July 9, 2005
Quoting stevel: "Is anyone going to offer access log lines for this supposed Inktomi spider? I'm puzzled that my own sites seem unaffected by this (yet are indexed by dozens of other spiders.)"
STEVE, you appear to be correct according to the log file (excerpt below for your review): the "Inktomi" spider is related to (or the same as) the Yahoo! Slurp stuff (that's a technical term) that you mentioned. [It may be easier to cut and paste the excerpt below into something that doesn't word-wrap.]
HOWEVER, the log excerpt below (which is Inktomi/Slurp-specific) shows that when it crawls, it uses a different IP for almost EVERY link it crawls. I'll get 10-20 hits from different IPs (Inktomi-specific) within an hour, a few times a day.
Side note: it doesn't seem to be creating nearly as many sessions as it used to, but there are a few (search on osCsid).
I don't know if anything can be done about this, but I thought it was worth noting, just in case there is something I am missing or not understanding.
68.142.251.148 - - [09/Jul/2005:00:05:11 -0500] "GET /robots.txt HTTP/1.0" 200 806 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.151 - - [09/Jul/2005:00:05:13 -0500] "GET /the-bad-sports-mlb-c-21_27_22_47.html HTTP/1.0" 200 3228 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.93 - - [09/Jul/2005:00:15:54 -0500] "GET /-c-23_35_45.html HTTP/1.0" 200 2878 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.189 - - [09/Jul/2005:00:16:01 -0500] "GET /stop-evil-internet-p-55.html?action=notify HTTP/1.0" 302 26 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.91 - - [09/Jul/2005:00:28:39 -0500] "GET /stop-lying-yourself-pr-84.html HTTP/1.0" 200 3661 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.32 - - [09/Jul/2005:00:49:58 -0500] "GET /the-good-c-22_38_23_36_21.html HTTP/1.0" 200 3934 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.48 - - [09/Jul/2005:00:51:49 -0500] "GET /good-animal-rights-c-22_38_23_36_21_48.html HTTP/1.0" 200 3886 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.249.72 - - [09/Jul/2005:00:53:13 -0500] "GET /the-sign-c-22_38_23_36_21_48_41.html HTTP/1.0" 200 3204 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.30 - - [09/Jul/2005:00:53:14 -0500] "GET /-c-22_38_23_36_21_48_45.html HTTP/1.0" 200 2949 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.153 - - [09/Jul/2005:00:55:46 -0500] "GET /-c-22_38_23_36_21_48_42.html HTTP/1.0" 200 2931 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.148 - - [09/Jul/2005:01:20:50 -0500] "GET /robots.txt HTTP/1.0" 200 806 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.68 - - [09/Jul/2005:01:20:51 -0500] "GET /the-good-c-23_39_21_26_21.html?sort=3a&page=1 HTTP/1.0" 200 4555 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.171 - - [09/Jul/2005:01:27:26 -0500] "GET /stop-supporting-murder-troops-p-124.html?action=notify HTTP/1.0" 302 26 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.249.148 - - [09/Jul/2005:01:33:37 -0500] "GET /-c-23_39_21_27_22_31_44.html HTTP/1.0" 200 2927 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.249.140 - - [09/Jul/2005:01:41:47 -0500] "GET /the-good-c-21_25_21.html?page=3&sort=2a HTTP/1.0" 200 3497 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.172 - - [09/Jul/2005:01:46:36 -0500] "GET /product_info.php?products_id=218&osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 301 3771 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.172 - - [09/Jul/2005:01:46:38 -0500] "GET /stop-take-time-recycle-graphic-p-218.html?osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 200 3765 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.110 - - [09/Jul/2005:01:52:45 -0500] "GET /-c-22_31_43.html HTTP/1.0" 200 2887 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.249.115 - - [09/Jul/2005:02:10:29 -0500] "GET /stop-puppy-mills-p-182.html HTTP/1.0" 200 3736 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.64 - - [09/Jul/2005:02:17:18 -0500] "GET /-c-22_38_23_33_42.html HTTP/1.0" 200 2922 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.171 - - [09/Jul/2005:02:17:47 -0500] "GET /the-bad-c-22_38_23_35_22.html?sort=2d&page=1 HTTP/1.0" 200 4694 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.148 - - [09/Jul/2005:02:23:07 -0500] "GET /robots.txt HTTP/1.0" 200 806 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.11 - - [09/Jul/2005:02:23:07 -0500] "GET /stop-spelled-backwards-pots-p-217.html?action=notify HTTP/1.0" 302 26 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.67 - - [09/Jul/2005:02:34:29 -0500] "GET /stop-supporting-republicans-pr-115.html HTTP/1.0" 200 3653 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.72 - - [09/Jul/2005:02:46:47 -0500] "GET /the-good-c-21_24_21.html?sort=3a&page=1 HTTP/1.0" 200 4331 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.64 - - [09/Jul/2005:02:46:54 -0500] "GET /catalog/product_info.php?products_id=220&osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 302 361 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.64 - - [09/Jul/2005:02:46:55 -0500] "GET /product_info.php?products_id=220&osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 301 3813 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.64 - - [09/Jul/2005:02:46:56 -0500] "GET /stop-voices-head-p-220.html?osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 200 3806 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.132 - - [09/Jul/2005:03:00:56 -0500] "GET /the-bad-anarchy-c-22_38.html?page=2&sort=2a HTTP/1.0" 200 3427 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.34 - - [09/Jul/2005:03:09:05 -0500] "GET /stop-worrying-enjoy-your-kitten-p-175.html?action=notify HTTP/1.0" 302 26 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.148 - - [09/Jul/2005:03:29:12 -0500] "GET /robots.txt HTTP/1.0" 200 806 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.73 - - [09/Jul/2005:03:29:13 -0500] "GET /the-bad-c-22_38_23_36_21_27_22.html HTTP/1.0" 200 4499 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.80 - - [09/Jul/2005:03:48:40 -0500] "GET /index.php?cPath=45&osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 301 2844 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.80 - - [09/Jul/2005:03:48:41 -0500] "GET /-c-45.html?osCsid=d56479bf7ae04c02366c3fc1b38a5d33 HTTP/1.0" 200 2876 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.206 - - [09/Jul/2005:03:57:26 -0500] "GET /product_info.php?products_id=228 HTTP/1.0" 301 3692 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.206 - - [09/Jul/2005:03:57:27 -0500] "GET /stop-those-packersgo-vikings-p-228.html HTTP/1.0" 200 3656 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.249.126 - - [09/Jul/2005:04:04:00 -0500] "GET /-c-23_39_21_27_44.html HTTP/1.0" 200 2917 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.122 - - [09/Jul/2005:04:07:52 -0500] "GET /stop-those-packersgo-vikings-p-228.html HTTP/1.0" 200 3656 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.250.110 - - [09/Jul/2005:04:15:14 -0500] "GET /stop-violence-pr-108.html HTTP/1.0" 200 3645 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
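Steve asked for log lines, and the question of how to see what a spider is really doing comes down to reading the access log. A minimal sketch that tallies one crawler's hits, distinct IPs, and requests carrying an osCsid, assuming Apache's combined log format as in the excerpt above; summarize_crawler() is a hypothetical helper:

```php
<?php
// Sketch: summarize one crawler's activity in an Apache "combined" access log.
// Assumes the client IP is the first field and the request line is the first
// quoted field. summarize_crawler() is a hypothetical helper.
function summarize_crawler(array $lines, string $uaNeedle): array
{
    $hits = 0;
    $ips = [];
    $withSid = 0;
    foreach ($lines as $line) {
        if (stripos($line, $uaNeedle) === false) {
            continue; // not this crawler
        }
        $hits++;
        $ip = strtok($line, ' '); // first space-delimited field
        $ips[$ip] = true;
        // The request path follows the method inside the first quoted field.
        if (preg_match('/"[A-Z]+ ([^ "]+)/', $line, $m) && strpos($m[1], 'osCsid=') !== false) {
            $withSid++;
        }
    }
    return ['hits' => $hits, 'distinct_ips' => count($ips), 'with_osCsid' => $withSid];
}
```

Usage would be along the lines of summarize_crawler(file('/path/to/access_log'), 'Slurp'), where the path is a placeholder for your own log file.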
stevel Posted July 9, 2005 (Author)
OK, that is Slurp, and it will be detected as such by spiders.txt. I suspect you'll find that the sessions it gets are actually old sessions (from URLs carrying an osCsid that it picked up earlier), but they'll be recreated when it tries again. The Spider Session Remover contribution will help with that.
A comment: you evidently have the sort headers enabled in the product listings (that's the default). The spiders will try every combination of sort, which means a lot of duplicate accesses. You can either disable the sort links entirely (which I did; I don't find them useful), or make them active only when a session is active. A programming exercise left to the reader.
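For the "exercise left to the reader", one possible shape is a sort header that is only a link when a session exists. A minimal sketch, assuming a $session_started-style boolean is available where the listing headers are built; sort_header() is hypothetical, and a real store would build the URL with tep_href_link():

```php
<?php
// Sketch: render a product-listing sort header. With a session it is a
// normal link; without one (e.g. a spider blocked via spiders.txt) it is
// plain text, so the sort=... URL variants are never offered for crawling.
// sort_header() is hypothetical; a store would build $sortUrl with tep_href_link().
function sort_header(string $label, string $sortUrl, bool $sessionStarted): string
{
    $text = htmlspecialchars($label, ENT_QUOTES);
    if (!$sessionStarted) {
        return $text; // nothing here for a spider to follow
    }
    return '<a href="' . htmlspecialchars($sortUrl, ENT_QUOTES) . '">' . $text . '</a>';
}

echo sort_header('Price', 'index.php?cPath=45&sort=3a', false), "\n"; // Price
echo sort_header('Price', 'index.php?cPath=45&sort=3a', true), "\n";
// <a href="index.php?cPath=45&amp;sort=3a">Price</a>
```

Customers with a session still get working sort links; only sessionless visitors see plain headings.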
Guest Posted July 15, 2005
Hi All, :'( I'm so glad this thread exists. I have a problem that I hope can be solved, and that can also serve as a reference for others on the forum with a similar issue. Since installing a few stores, our monthly server bandwidth has risen to amazing numbers: almost our limit, and in some cases well over. Looking at the stats, it appears the spiders are causing this, with MSNBOT alone totalling over 600MB in some months. I think I am having the same issue others are. I am not very technical and would like a step-by-step (1. 2. 3.) guide on how to stop this from happening. While I understand the spiders are important, I do not want to turn them off or restrict their access. Any ideas for a complete newbie? Which spiders.txt should I use, what code should I add to which files, or which Admin settings should I use? I need to apply this to 3 stores that I have set up, with over 1000 items added. Thanks!!!! And great work guys!
stevel Posted July 15, 2005 (Author)
Well, if you have a lot of pages in your store, the spiders are going to want to crawl them. The purpose of spiders.txt is to prevent spiders from getting sessions, so that they can't go on a wild shopping spree in your store, while still letting them index the pages. Just replace the stock spiders.txt with the one in this contribution, and watch the announcement thread for updates to keep it current. spiders.txt does not keep spiders out of your store.
If the spiders are simply crawling your store to index its content, that's a good thing. If you have a store with lots of products, you will probably need to move to a host that offers more bandwidth, unless you want to keep spiders out.
Guest Posted July 18, 2005
I'm not the "sharpest tool in the shed", so forgive me if this is already included in your list. I am including the whole string because I don't know what info you use: sv-crawlfw3.looksmart.com
By the way, thanks for keeping the list up to date. You do a terrific job, and I appreciate that you take the time to do this for everyone.
stevel Posted July 18, 2005 (Author)
"crawl" is in the entry that will catch this one. Thanks for the kind words. And if people are wondering why there haven't been any recent updates, it's because I haven't detected any new spiders on my sites. (I don't bother with those that just touch the home page and go no further.)
niknakgroup Posted July 27, 2005
Hi Steve - great contrib. Just a quick question to see if I can save a little more bandwidth. I get around 25,000 spider page requests per day, which is perfect, as it keeps my rankings nice and high in the engines' result pages and also shows my new content very quickly. However, I am using coolmenu on all my pages, and as I understand it, spiders don't read JavaScript, meaning each of my pages carries roughly 25 KB of JavaScript that is completely irrelevant to the spiders. Is there a way to check the user agent in my index.php file with an if statement, so that when a spider is detected as the browser, my coolmenu script is skipped and a default categories box is displayed instead? This little bit of coding could save me nearly a quarter of my bandwidth every month! From what I understand (very limited in this area, but here goes), something like this:
<?php if (strpos($user_agent, 'ooglebot') !== false) { ?>
<?php require(DIR_WS_BOXES . 'categories.php'); ?>
<?php } else { ?>
<?php require(DIR_WS_BOXES . 'coolmenu.php'); ?>
<?php } ?>
Would that work? Also, is there an easy way to make it work for every spider listed in your spiders.txt file instead of listing them individually, or if not, how do I add an OR condition? Sorry for the imbecilic questions; this area of PHP coding is very grey to me. Any and all help is very much appreciated.
Please note - if I have suggested a contrib above, it doesnt mean it will work! Most of the contribs are not ones I've used, but may be useful for your particular problem.... Have you tried a refined search? Chances are your problem has already been dealt with elsewhere on the forums..... if (stumped == true) { return(square_one($start_over) } else { $random_query = tep_fetch_answer($forum_query) }
stevel Posted July 27, 2005 (Author)
You just want to detect whether there is a session or not, in my opinion. I'm away from my sources now, so I can't suggest code, but it should be pretty simple.
boxtel Posted July 27, 2005
Quoting niknakgroup: "Is there a way to call the user_agent into my index.php file using an if statement to not refer to my coolmenu script if a spider is detected as the browser, but instead display a default categories box? [...] Also, is there an easy way to make it work for every spider listed in your spiders.txt file instead of listing them individually?"
Simply use:
if ($spider_flag) { /* do not output the JavaScript menu */ } else { /* output the JavaScript menu */ }
Treasurer MFC
niknakgroup Posted July 27, 2005
Quoting boxtel: "if ($spider_flag) {do not show js } else { show js }"
Hi Amanda, where would I put that? In my application_top.php?
stevel Posted July 28, 2005 (Author)
You would put that around wherever the JS code is written out.
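Putting boxtel's and Steve's answers together, the test goes where the menu box is included, not in application_top.php. A minimal sketch, assuming the template currently requires coolmenu.php from DIR_WS_BOXES as in niknakgroup's snippet; menu_box_file() is hypothetical, and because $spider_flag may not be set at all in some configurations, a missing value is treated as "not a spider":

```php
<?php
// Sketch: choose which menu box to include for this request.
// menu_box_file() is a hypothetical helper. $spider_flag (boxtel's
// suggestion) may be unset, so null is accepted and treated as
// "not a spider".
function menu_box_file(?bool $spiderFlag): string
{
    // Spiders get the plain categories box and skip the ~25 KB of menu
    // JavaScript; everyone else gets the coolmenu.
    return ($spiderFlag === true) ? 'categories.php' : 'coolmenu.php';
}

// Where the coolmenu box is currently required in the page template:
// require(DIR_WS_BOXES . menu_box_file($spider_flag ?? null));
```

Steve's alternative, testing for a session instead, would swap the argument for the session flag without changing the structure.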
willross Posted August 5, 2005
stevel, I would add "becomebot" to your next list. They can be found at: http://www.become.com/ When they hit my stores, they behave like MSN and do big crawls from single IPs, unlike Yahoo and Google, which use many.
· willross ··········
stevel Posted August 5, 2005 (Author)
Already there - detected by the string "ebot".
♥zelf Posted September 12, 2005
For Mac OS X hosted sites, your server may not be able to properly identify the line breaks in spiders.txt. This can be fixed by either editing auto_detect_line_endings in php.ini or adding an ini_set call to includes/application_top.php. For full details, see this thread: http://www.oscommerce.com/forums/index.php?showtopic=170026
Virtual Merchant a.k.a. Elavon, ViaKlix, Nova Payment Module Contribution
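An alternative to changing the ini setting is to split the file on any line-ending style yourself. A minimal sketch, assuming the whole of spiders.txt is read as one string; load_spider_entries() is a hypothetical helper, not part of the stock code:

```php
<?php
// Sketch: read spiders.txt entries regardless of whether the file uses
// Unix (\n), Windows (\r\n) or classic Mac (\r) line endings.
// load_spider_entries() is a hypothetical helper.
function load_spider_entries(string $contents): array
{
    // Try \r\n first so a Windows line break counts as one separator.
    $lines = preg_split('/\r\n|\r|\n/', $contents);
    $entries = [];
    foreach ($lines as $line) {
        $line = strtolower(trim($line));
        if ($line !== '') {
            $entries[] = $line; // skip blank lines
        }
    }
    return $entries;
}

// Usage (adjust the path to your store):
// $spiders = load_spider_entries(file_get_contents(DIR_WS_INCLUDES . 'spiders.txt'));
```

Dropping blank lines matters here: with substring matching, an empty entry would flag every visitor as a spider.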
n_e_w_s Posted October 14, 2005
Hi all! Steve, you've done nice work :D Simple question from a newbie: does setting Force Cookie Use = True prevent spiders from indexing my osC site entirely?
stevel Posted October 14, 2005 (Author)
No - it prevents spiders from successfully completing an action that requires a session, such as Buy Now or Notify, but it doesn't prevent spiders from indexing such links. This option will keep some customers from purchasing at your store and will break your store if your domain name for HTTPS is not the same as for HTTP.
Neither use of spiders.txt nor Force Cookie Use prevents spiders from indexing your store. You should do what you can to prevent spiders from following links you don't want to appear in a search engine index. A robots.txt is one tool, another is to not display links such as Buy Now if $session_started is false.
willross Posted October 14, 2005
You can also try using "nofollow":
<a href="http://shop.yoursite.com/" target="_blank" rel="nofollow">link text</a>
and/or
<meta name="robots" content="noarchive, nofollow, noindex">
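Applied to store links, the rel value has to go into the anchor tag around the URL. A minimal sketch, assuming the URL has already been built; nofollow_link() is a hypothetical helper, and in a store the URL would come from tep_href_link():

```php
<?php
// Sketch: build an anchor carrying rel="nofollow", as suggested above.
// nofollow_link() is a hypothetical helper; both values are HTML-escaped.
function nofollow_link(string $url, string $text): string
{
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" rel="nofollow">'
        . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

echo nofollow_link('product_info.php?products_id=218&action=notify', 'Notify me'), "\n";
// <a href="product_info.php?products_id=218&amp;action=notify" rel="nofollow">Notify me</a>
```

The link is still visible to spiders; rel="nofollow" only asks them not to follow it.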
stevel Posted October 14, 2005 (Author)
Typically you'd need to modify the code that builds each link with tep_href_link to add the rel= value. I prefer not to display the links at all. Where the code says:
echo '<a href="' . tep_href_link(...
I change it to:
if ($session_started) echo '<a href="' . tep_href_link(...
Some of the places I do that are: Buy Now, Notify (in fact, I have the whole notify box removed if there's no session), login, checkout, My Account, etc. One can also do this for the product listing sort links (very important, I feel).