Rishi001, posted November 8, 2008:

I can't find a robots.txt file on my server, and someone told me that I need one. Can someone please provide a link to download it?

Thanks,
Rishi Patel
Guest, posted November 8, 2008:

Quote: "I can't find a robots.txt file on my server, and someone told me that I need one. Can someone please provide a link to download it?"

Look in the contributions area. Search for "robots".
themilkman, posted November 12, 2008:

Quote: "Look in the contributions area. Search for robots."

Hi, I can see my robots.txt file, but I cannot see any spiders.txt file. How do I find it? I am looking in the public_html folder and it's not there. Please advise.

Thanks

Signature: What's the point of a signature?
Guest, posted November 12, 2008:

Quote: "Please advise.... Thanks"

Look in your catalog/includes/ folder for spiders.txt. robots.txt is in catalog/.
Guest, posted November 16, 2008 (edited):

I just spotted this one browsing as a guest:

msnbot-Products
Name: msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)
IP Address: 65.55.252.40

Isn't the nbot entry in spiders.txt supposed to catch these? Or, because -Products/1.0 is there, do I need to add a new entry?

Hmm. On second look, it IS being treated as a bot everywhere except the categories. It doesn't seem to like my SEO URL structure for categories. Has anyone else ever run into this? Google and Yahoo have no problem with my categories like this (as in, they are not shown as a customer/guest on Who's Online when indexing categories, the way msnbot-Products is).

Edited November 16, 2008 by eww
stevel (Author), posted December 26, 2008:

Quote: "Hmm. On second look, it IS being treated as a bot everywhere except the categories. It doesn't seem to like my SEO URL structure for categories. Has anyone else ever run into this? Google and Yahoo have no problem with my categories like this (as in, they are not shown as a customer/guest on Who's Online when indexing categories, the way msnbot-Products is)."

What behavior do you see that is a problem? Do you see it having a session created? A match in spiders.txt would prevent that.

Steve
Contributions: Country-State Selector; Login Page a la Amazon; Protection of Configuration; Updated spiders.txt; Embed Links with SID in Description
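Note on why a short entry such as nbot can catch msnbot-Products/1.0: the sketch below assumes the spider check is a plain, case-insensitive substring match of each spiders.txt line against the user agent. The function name is_spider_agent() and the three-entry list are hypothetical and only for illustration; this is not the osCommerce code itself.

```php
<?php
// Hypothetical helper, not part of osCommerce: returns true when any
// non-empty entry appears as a substring of the lowercased user agent.
function is_spider_agent($user_agent, array $entries) {
    $user_agent = strtolower($user_agent);
    foreach ($entries as $entry) {
        $entry = strtolower(trim($entry));
        if ($entry !== '' && strpos($user_agent, $entry) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative entries only; a real spiders.txt has many more lines.
$entries = array('googlebot', 'nbot', 'slurp');

// The agent reported above contains "nbot", so it is flagged.
var_dump(is_spider_agent('msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)', $entries));

// A browser-style agent contains none of the entries, so it is not flagged.
var_dump(is_spider_agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)', $entries));
```

Under that assumption, a visitor whose user agent looks like a normal browser can never be caught by spiders.txt, whatever its behavior.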
Irin, posted December 31, 2008:

It seems that a bot is crawling my site, creating sessions, and adding random products to the cart. Here are some lines from my access log:

208.99.195.54 - - [30/Dec/2008:19:20:31 -0700] "GET / HTTP/1.1" 200 6882
208.99.195.54 - - [30/Dec/2008:22:17:24 -0700] "GET /product_info.php?pName=ts&osCsid=f735ae1ed1e7085d43bece7f1bb19579 HTTP/1.0" 200 15237 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
208.99.195.54 - - [30/Dec/2008:22:18:22 -0700] "GET /account_history_info.php HTTP/1.0" 302 26 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
208.99.195.54 - - [30/Dec/2008:22:18:53 -0700] "GET /account.php?osCsid=f735ae1ed1e7085d43bece7f1bb19579 HTTP/1.0" 302 26 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
208.99.195.54 - - [30/Dec/2008:22:26:35 -0700] "GET /create_account.php?guest_account=true&osCsid=f735ae1ed1e7085d43bece7f1bb19579 HTTP/1.0" 200 15170
208.99.195.54 - - [30/Dec/2008:22:47:55 -0700] "GET /account_edit.php HTTP/1.0" 302 26
208.99.195.54 - - [30/Dec/2008:22:47:58 -0700] "GET /account_newsletters.php HTTP/1.0" 302 26
208.99.195.54 - - [30/Dec/2008:22:52:40 -0700] "GET /document.all. HTTP/1.0" 404 309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
208.99.195.54 - - [30/Dec/2008:22:52:43 -0700] "GET /checkout_shipping.php HTTP/1.0" 302 26
stevel (Author), posted December 31, 2008:

Why do you think this is a bot? I see no evidence of that. Everything, including the times of access, suggests a human. The user agent is that of MSIE 6. That can be forged, but either way it means this visitor cannot be filtered out with spiders.txt, which matches on the user agent string.
Irin, posted December 31, 2008:

Quote: "Why do you think this is a bot? I see no evidence of that. Everything, including the times of access, suggests a human. The user agent is that of MSIE 6; while this can be forged, it also makes it impossible to filter out based on spiders.txt."

Well, its behavior suggests to me that it's a bot. It creates a cart, leaves, then comes back, creates a new cart, and always adds many random products with no consistency. It jumps from page to page and never stays longer than two seconds on each page. It can reach as far as checkout_shipping.php without even logging in. The access times you see were just picked at random from the access log. As I said, its activity changes every second or two, without consistency. If I posted the whole activity log of this "human/bot", you would see that it's strange behavior for a human.
Irin, posted December 31, 2008:

Well, its behavior suggests to me that it's a bot. It creates a cart, leaves, then comes back, creates a new cart, and always adds many random products with no consistency. It jumps from page to page and never stays longer than two seconds on each page. It can reach as far as checkout_shipping.php without even logging in and getting a cart. The access times you see were just picked at random from the access log. As I said, its activity changes every second or two, without consistency. If I posted the whole activity log of this "human/bot", you would see that it's strange behavior for a human.
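Note on checking the "every second or two" claim against a log: one rough way is to pull the timestamps out of the access-log lines for a single IP and look at the gaps between consecutive requests. The sketch below assumes Apache-style timestamps in square brackets and PHP 5.3 or later (for DateTime::createFromFormat); the function name request_gaps() is hypothetical.

```php
<?php
// Hypothetical helper: returns the number of seconds between consecutive
// requests in an array of Apache access-log lines. Lines without a
// parsable [dd/Mon/yyyy:hh:mm:ss +zzzz] timestamp are skipped.
function request_gaps(array $lines) {
    $times = array();
    foreach ($lines as $line) {
        if (preg_match('/\[([^\]]+)\]/', $line, $m)) {
            $dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $m[1]);
            if ($dt !== false) {
                $times[] = $dt->getTimestamp();
            }
        }
    }
    $gaps = array();
    for ($i = 1; $i < count($times); $i++) {
        $gaps[] = $times[$i] - $times[$i - 1];
    }
    return $gaps;
}

// Two consecutive entries taken from the log excerpt posted above.
$sample = array(
    '208.99.195.54 - - [30/Dec/2008:22:47:55 -0700] "GET /account_edit.php HTTP/1.0" 302 26',
    '208.99.195.54 - - [30/Dec/2008:22:47:58 -0700] "GET /account_newsletters.php HTTP/1.0" 302 26',
);
print_r(request_gaps($sample));
```

For those two sample entries the gap is 3 seconds. Short gaps alone do not prove a bot, but a full log that shows them consistently is better evidence than a few hand-picked lines.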
stevel (Author), posted December 31, 2008:

OK, but there's nothing in the information available that one could use to automatically decide that it's a bot. Is the IP address always the same? If so, you could block it in .htaccess or just add a test for that IP in the spider test.
Irin, posted January 1, 2009:

Quote: "OK, but there's nothing in the information available that one could use to automatically decide that it's a bot. Is the IP address always the same? If so, you could block it in .htaccess or just add a test for that IP in the spider test."

What do you mean by adding a test for that IP in the spider test? How do I do that? Yes, the IP is always the same.
stevel (Author), posted January 2, 2009:

In includes/application_top.php, just after this code:

  } elseif (SESSION_BLOCK_SPIDERS == 'True') {
    $user_agent = strtolower(getenv('HTTP_USER_AGENT'));
    $spider_flag = false;

add this:

    if (tep_get_ip_address() == '208.99.195.54') $spider_flag = true;
knifeman, posted January 14, 2009:

I have a frequent visitor that is referred by the following URL:

http://search.live.com/results.aspx?q=cookie

They have a session and usually have one item in their cart. The last set of digits in their IP changes, and the IP always resolves to Microsoft Corp.:

http://www.showmyip.com/?ip=65.55.109.146

If this is a bot, why does it have a session, and why doesn't it show as a bot? I have the enhanced Who's Online added to the store.

Tim

Also, I updated my spiders.txt file, and Prevent Spider Sessions is set to true.
stevel (Author), posted January 14, 2009:

What is the user agent string from the access log? Does the first URL it tries contain an osCsid= session ID? I'd be rather astonished if any bot had a referral URL at all.
knifeman, posted January 14, 2009:

Steve,

Here are some entries from the access log:

65.55.110.167 - - [14/Jan/2009:06:34:04 -0500] "GET /immobilizer-900000-volt-cell-phone-stun-p-616.html?action=buy_now&page=1&sort=2d HTTP/1.0" 302 - "http://search.live.com/results.aspx?q=cookie" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.110.167 - - [14/Jan/2009:06:34:04 -0500] "GET /shopping_cart.php?cPath=28&page=1&sort=2d&osCsid=ef787039f88b28f4739d1789f3c2c213 HTTP/1.0" 200 34252 "http://search.live.com/results.aspx?q=cookie" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.110.167 - - [14/Jan/2009:06:34:05 -0500] "GET /stylesheet.css HTTP/1.0" 200 6637 "http://myknifestore.net/shopping_cart.php?cPath=28&page=1&sort=2d&osCsid=ef787039f88b28f4739d1789f3c2c213" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322
65.55.109.237 - - [14/Jan/2009:08:55:26 -0500] "GET /military-issue-swmi-p-364.html?action=buy_now&page=1&sort=2d HTTP/1.0" 302 - "http://search.live.com/results.aspx?q=cookie" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.109.237 - - [14/Jan/2009:08:55:27 -0500] "GET /shopping_cart.php?cPath=35&page=1&sort=2d&osCsid=13094c900428b9ca1531b764867defd3 HTTP/1.0" 200 34335 "http://search.live.com/results.aspx?q=cookie" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.109.237 - - [14/Jan/2009:08:55:27 -0500] "GET /stylesheet.css HTTP/1.0" 200 6637 "http://myknifestore.net/shopping_cart.php?cPath=35&page=1&sort=2d&osCsid=13094c900428b9ca1531b764867defd3" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322
stevel (Author), posted January 14, 2009:

As far as Prevent Spider Sessions is concerned, that's a human user. There's nothing one can use to say "this is a bot". In fact, what you show looks very much like a human clicking on a Buy Now button, complete with loading a stylesheet (which a bot would NEVER do). I will agree, though, that the IP is for msnbot. Very weird.
knifeman, posted January 14, 2009:

I realize there is no accounting for human behavior, but this is really weird. This user comes back numerous times with the same referrer, adds one item to the cart, and leaves. The last set of digits in the IP changes, but it resolves to MSN every time I check it.

Thanks for taking the time to look at this,
Tim

Quote: "As far as Prevent Spider Sessions is concerned, that's a human user. There's nothing one can use to say "this is a bot". In fact, what you show looks very much like a human clicking on a Buy Now button, complete with loading a stylesheet (which a bot would NEVER do). I will agree, though, that the IP is for msnbot. Very weird."
stevel (Author), posted January 14, 2009:

The best thing I can suggest is to replace your Buy Now link buttons with form buttons. There's a contrib to do this. Few if any bots submit forms.
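Note on what "form button" means here: a plain link is fetched with GET, which crawlers follow, while a form button is submitted with POST, which most crawlers leave alone. The sketch below shows the general idea only. The function name buy_now_form() and the products_id field are hypothetical, it is not taken from any contribution, and it assumes the store's cart code is changed to read the product ID from the POST data.

```php
<?php
// Hypothetical helper, not taken from any contribution: renders a POST
// form with a submit button in place of a plain "Buy Now" link.
function buy_now_form($action_url, $products_id) {
    return '<form method="post" action="' . htmlspecialchars($action_url, ENT_QUOTES) . '">'
         . '<input type="hidden" name="products_id" value="' . (int) $products_id . '">'
         . '<input type="submit" value="Buy Now">'
         . '</form>';
}

// Example: a button for product 616 that posts to a buy_now action.
echo buy_now_form('index.php?action=buy_now', 616);
```

The URL is escaped before it goes into the action attribute, and the product ID is cast to an integer so that nothing unexpected ends up in the markup.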
SteveDallas, posted February 17, 2009 (edited):

Quote: "As far as Prevent Spider Sessions is concerned, that's a human user. There's nothing one can use to say "this is a bot". In fact, what you show looks very much like a human clicking on a Buy Now button, complete with loading a stylesheet (which a bot would NEVER do). I will agree, though, that the IP is for msnbot. Very weird."

The activity that knifeman described comes from msnbot's cloaking detector, which has been running since the summer or fall of '07. See this blog post from Vanessa Fox on Search Engine Land that discusses the behavior of an early version of this non-bot bot.

It continues to crawl my site from time to time. I just live with it, but it is annoying. I may block the subnet(s) that it runs from in application_top.php, as suggested a few posts back for another bot that doesn't identify itself, though changing the Buy Now buttons to form buttons may be an even better idea.

--Glen

Edited February 17, 2009 by SteveDallas
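Note on blocking a subnet rather than a single address: the single-IP test suggested earlier in the thread compares one string, so it misses a visitor whose last octet keeps changing. A minimal sketch of a range test follows. The function name ip_in_subnet() is hypothetical, the 65.55.109.0/24 block is only an example built from addresses seen in this thread and is not a confirmed msnbot range, and the bit arithmetic assumes 64-bit PHP and IPv4 addresses.

```php
<?php
// Hypothetical helper: true when the IPv4 address $ip lies inside the
// CIDR block $cidr (for example '65.55.109.0/24'). Assumes 64-bit PHP.
function ip_in_subnet($ip, $cidr) {
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2) {
        return false;
    }
    $ip_long  = ip2long($ip);
    $net_long = ip2long($parts[0]);
    $bits     = (int) $parts[1];
    if ($ip_long === false || $net_long === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // Build the netmask; a /0 mask is 0 and matches every address.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ip_long & $mask) === ($net_long & $mask);
}

// Example block only, not a confirmed msnbot range.
var_dump(ip_in_subnet('65.55.109.146', '65.55.109.0/24'));
var_dump(ip_in_subnet('65.55.110.167', '65.55.109.0/24'));

// In application_top.php the check might then read (untested assumption):
// if (ip_in_subnet(tep_get_ip_address(), '65.55.109.0/24')) $spider_flag = true;
```

The first call reports a match and the second does not, because 65.55.110.167 sits in a different /24. Covering both addresses would need a wider block or a second test.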
SteveDallas, posted February 20, 2009:

Quote: "The best thing I can suggest is to replace your Buy Now link buttons with a form button. There's a contrib to do this. Few if any bots follow forms."

I searched for this contribution in the contrib library but couldn't find it. Do you remember the title or contrib number?

--Glen
Guest, posted February 20, 2009:

Quote: "I searched for this form in the contrib library, but couldn't find it. Do you remember the title or contrib number? --Glen"

How about this one? http://addons.oscommerce.com/info/952
SteveDallas, posted February 20, 2009:

Quote: "How about this one? http://addons.oscommerce.com/info/952"

That looks like the one I'm looking for. I don't think I need the SID killer part, just the change from Buy Now buttons to forms. Thanks very much.

--Glen
knifeman, posted February 25, 2009:

Thanks Glen,

That explains everything. It was bugging me because the bot crawls my site every day. I checked with MSN and I am not blocked.

Tim

Quote: "The activity that knifeman described comes from msnbot's cloaking detector, which has been running since the summer or fall of '07. See this blog post from Vanessa Fox on Search Engine Land that discusses the behavior of an early version of this non-bot bot. It continues to crawl my site from time to time. I just live with it, but it is annoying. I may block the subnet(s) that it runs from in application_top.php, as suggested a few posts back for another bot that doesn't identify itself, though changing the Buy Now buttons to form buttons may be an even better idea. --Glen"
SteveDallas, posted February 25, 2009:

Quote: "Thanks Glen, That explains everything. It was bugging me because the bot crawls my site every day. I checked with MSN and I am not blocked. Tim"

I ended up changing my "Buy Now" buttons to forms, as outlined in the SID Killer contribution. The MSN cloaking checker still crawls my site, usually a minute or two after msnbot has indexed the same page, but I haven't noticed it creating carts.

In making the change, I removed a mod I had installed earlier that enables the Buy Now button only if a session ID has been assigned. I figured I no longer needed it, since the whole point of making form buttons is that bots won't follow them.

If it's any consolation, the cloaking checker seems to visit less often over time.

--Glen