stevel 2 Posted July 2, 2010 spiders.txt does not block search engines from your site. What it does is prevent them from creating sessions, so that they cannot "add to cart" or go places only humans can go and, most importantly, so that the URLs in their index do not contain session IDs. When a bot visits your site, it (usually) supplies a user agent string that identifies it. Since many bots have the string "ebot" in their UA strings, that one entry detects all of them; Googlebot is just one. Similarly, "nbot" detects MSNbot and any other bot with "nbot" in its UA string. These bots are not bad; in fact they are good, because you want your site indexed. You just don't want them following "add to cart" links and leaving session IDs in URLs. If you actually want to block a bot, the first step is to add an entry to robots.txt. All well-behaved bots will honor it. See this Wikipedia article for more info. I don't know whether Yandex honors this, but it probably does. You may have to visit its web site to see what to put in robots.txt. Steve Contributions: Country-State Selector Login Page a la Amazon Protection of Configuration Updated spiders.txt Embed Links with SID in Description
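To make the "ebot"/"nbot" idea concrete, here is a minimal Python sketch of substring matching against a spiders.txt-style list. It assumes each non-blank line is one entry, compared case-insensitively as a plain substring of the user agent. osCommerce does its own check in PHP, so this illustrates the idea rather than reproducing that code, and the two-entry list is a hypothetical excerpt.

```python
from typing import List


def load_spider_entries(text: str) -> List[str]:
    """One lowercase substring per non-blank line of spiders.txt-style text."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent: str, entries: List[str]) -> bool:
    """True if any entry occurs anywhere in the lowercased user agent."""
    ua = user_agent.lower()
    return any(entry in ua for entry in entries)


# Hypothetical two-entry excerpt; the real spiders.txt is much longer.
entries = load_spider_entries("ebot\nnbot\n")

# "googlebot" contains "ebot", so Googlebot is caught without its own entry.
print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", entries))  # True
# "msnbot" contains "nbot".
print(is_spider("msnbot/2.0b (+http://search.msn.com/msnbot.htm)", entries))  # True
# An ordinary browser UA contains neither substring.
print(is_spider("Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0", entries))  # False
```

Because matching is by substring, one short entry covers every UA that contains it, which is also why an overly generic entry risks flagging real browsers.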
Androider 0 Posted July 2, 2010 (edited) Thank you for your kind reply. I also saw this one crawling my website: as13448.com. Do I just put as13448 somewhere in the spiders.txt file to stop this bot from creating sessions? Another question: by putting "yandex" in the spiders.txt file, do you stop them from creating sessions, which will reduce their bandwidth use? So it is OK for them to visit the front page of my website? I ask because whenever yandex.ru came to my website, it viewed most of my products one by one. Does this mean that I will still see them on my Who's Online page? Thank you. Edited July 2, 2010 by Androider
stevel 2 Posted July 2, 2010 (edited) "Do I just put as13448 somewhere in spiders.txt file to stop this bot from creating sessions?" No. You have to look at the user agent string in the server log and see what it contains. If it is not a well-behaved bot, it may not have anything you can use to identify it. Is it causing trouble for you? Yes, you will still see the bots on Who's Online. From experience, I'd say NOT to trust what it says about whether or not the visitor has a session. Edited July 2, 2010 by stevel
Androider 0 Posted July 2, 2010 "Is it causing trouble for you?" To be honest, I'm not sure whether bots are causing problems. I just became curious about who yandex.ru (which was on my website every day) was, did some searching, and found people complaining that it eats up to 1 GB of bandwidth a day. So is spiders.txt how you stop them from using bandwidth? I just want my site as clean as possible. So I should just remove as13448 from spiders.txt, as it's of no use?
stevel 2 Posted July 2, 2010 I would remove as13448 from spiders.txt. You can use robots.txt to slow down a spider; read the link I posted. AS13448.com is operated by Websense, a company that sells web filtering devices and services. Can you show me a line from your server log with an as13448.com IP address?
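A hedged sketch of the slowing-down approach: the robots.txt below (hypothetical, not taken from any real site) asks Yandex to pause between requests using the Crawl-delay directive, and Python's standard urllib.robotparser is used only to show how a parser reads it. Crawl-delay is a non-standard extension, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Yandex to wait 10 seconds between requests and
# leave every other crawler unrestricted (an empty Disallow allows everything).
ROBOTS_TXT = """\
User-agent: Yandex
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("YandexBot"))  # 10
print(parser.crawl_delay("Googlebot"))  # None (the catch-all group sets no delay)
# Example URL is hypothetical; Yandex is slowed down, not blocked.
print(parser.can_fetch("YandexBot", "http://www.example.com/index.php"))  # True
```

The stdlib parser matches a rule's User-agent name against the part of the crawler name before the first "/", so pass a bot name such as YandexBot rather than a full browser-style UA string.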
Androider 0 Posted July 2, 2010 I'm a newbie at technical stuff, but I was able to find this in cPanel AWStats: static-208-80-193-39.as13448.com
stevel 2 Posted July 2, 2010 That's not the user agent string. You want a line that looks something like this:
220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] "GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"
See the string that starts with "Baiduspider"? That's the user agent. If you're using AWStats, you should be able to locate the access log. If you want to block Yandex entirely (and posts I have read suggest that is a good idea), add this to your robots.txt:
User-agent: Yandex
Disallow: /
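If you want to pull the user agent out of raw log lines yourself rather than reading them by eye, a small Python sketch like the one below works on the example line above. It assumes the quoted fields appear in the order request, referrer, user agent (as in Apache's combined format and in Steve's example); other log layouts would need a different rule, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Double-quoted fields, allowing backslash-escaped quotes inside them.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def client_and_user_agent(log_line: str) -> Tuple[str, Optional[str]]:
    """Return (client IP, user agent) from one access-log line.

    Assumes the first token is the client address and the third quoted
    field is the user agent; any later quoted fields are ignored.
    """
    ip = log_line.split(" ", 1)[0]
    fields = QUOTED.findall(log_line)
    return ip, (fields[2] if len(fields) >= 3 else None)


line = ('220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] "GET /robots.txt HTTP/1.1" '
        '200 451 www.example.com "-" '
        '"Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"')
print(client_and_user_agent(line))
# ('220.181.7.44', 'Baiduspider+(+http://www.baidu.com/search/spider.htm)')
```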
Andreas2003 1 Posted July 2, 2010 Got something on my site that I'm not familiar with: Name: 0.83. The IP address keeps changing, but many come from different Comcast nodes like "c-66-41-29-213.hsd1.mn.comcast.net". No session, no referrer. I searched through my spiders.txt but did not find anything like "0.83". Does anyone know whether this is a real bot or someone too interested in my site? Thanks in advance, regards Andreas
Andreas2003 1 Posted July 2, 2010 (edited) "Name: 0.83" Got some more information: http://www.80legs.com/spider.html I blocked it through robots.txt:
User-agent: 008
Disallow: /
Hope that will work. Edited July 2, 2010 by Andreas2003
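One way to sanity-check a rule like this before relying on it is to feed it to Python's urllib.robotparser, as sketched below. This only shows how a standards-following parser reads the file; it cannot prove that 80legs itself obeys it, and it assumes the crawler identifies itself with the name 008, as the rule implies.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: 008",
    "Disallow: /",
])

# Example URL is hypothetical.
print(rules.can_fetch("008", "http://www.example.com/"))        # False: blocked
print(rules.can_fetch("Googlebot", "http://www.example.com/"))  # True: no rule applies
```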
surcie 0 Posted July 23, 2010 I have this unrecognized spider: msnbot-207-46-12-118.search.msn.com Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648) IP: 207.46.12.118
stevel 2 Posted July 23, 2010 Well, that's a bit odd. While the hostname has msnbot in it, the user agent string looks just like MSIE. There's no way to detect that as a bot.
surcie 0 Posted July 24, 2010 So it's a PC user from MSN services? Could this be considered a security risk? Thanks in advance
stevel 2 Posted July 24, 2010 A security risk? No more than any other PC. The thing to look at is whether this "user" went around your site adding items to a cart. How many pages did it visit at this time? Do you see a session ID in all the URLs, or in just one or two? Remember that the purpose of spiders.txt is NOT to prevent bots from visiting your site; it's to keep session IDs out of search engine indexes and to prevent bots from doing things that require a session.
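Steve's question about session IDs can be answered by counting how many of a visitor's requested URLs carry the session parameter. The Python sketch below assumes the default osCommerce parameter name osCsid (the name shown in a Who's Online dump later in this thread); the request paths and the function name are made up for illustration.

```python
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlsplit


def session_id_counts(urls: Iterable[str], param: str = "osCsid") -> Tuple[int, int]:
    """Return (number of URLs carrying the session parameter, total URLs)."""
    with_sid = total = 0
    for url in urls:
        total += 1
        # parse_qs maps each query parameter name to its list of values.
        if param in parse_qs(urlsplit(url).query):
            with_sid += 1
    return with_sid, total


# Hypothetical requests from one visitor, as they might appear in a log.
visits = [
    "/index.php",
    "/product_info.php?products_id=42",
    "/product_info.php?products_id=43&osCsid=e8cb6afc74dafb79a9b16df0a4b25da8",
]
print(session_id_counts(visits))  # (1, 3)
```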
spoofy 2 Posted February 7, 2011 Hey Steve, should we go ahead and add the new Bing/Yahoo bot called "bingbot"? My Contributions: Google XML Sitemap SEO compatible with Ultimate SEO URL by FWR Media ::: Accurate & Precise Bread Crumb Trail
Guest Posted April 18, 2011 We have installed a site search engine and would like to add our own site spider to the list. Does anyone know how this can be done?
etzeppy 0 Posted April 26, 2011 I am using spiders.txt dated 04-17-2010, which I believe is the most recent. It is not detecting the following bot: User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) I thought adding "bingbot" (without quotes) to spiders.txt would allow detection, but that did not seem to work. I actually thought one of the existing strings would catch it, but this bot is showing up in Who's Online as a customer. Can someone please tell me what string needs to be in spiders.txt for proper detection? Thanks
smiler99 0 Posted May 12, 2011 (edited) gbot picks up this spider: it's line 27 in spiders.txt (presuming you haven't changed the order of the bots from the original file). My Who's Online registers User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) as a bot. Smiler Edited May 12, 2011 by Jan Zonjee: spamming
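To see which entry is doing the detecting, a sketch like this reports the first spiders.txt line whose text occurs in the user agent. The four-line list is hypothetical and does not reflect the real file's order or contents; in this sketch, the order only decides which entry gets the credit, not whether the bot is detected.

```python
from typing import List, Optional, Tuple


def first_match(user_agent: str, lines: List[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, entry) of the first line whose text
    occurs in the lowercased user agent, or None if nothing matches."""
    ua = user_agent.lower()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip().lower()
        if entry and entry in ua:
            return number, entry
    return None


# Hypothetical excerpt, NOT the real spiders.txt order.
excerpt = ["ebot", "gbot", "nbot", "bingbot"]
bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

# "bingbot" contains "gbot", so the earlier "gbot" line matches first.
print(first_match(bing, excerpt))  # (2, 'gbot')
```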
stevel 2 Posted June 8, 2011 "We have installed a site search engine and would like to add our own site spider to the list. Anyone know how this can be done?" You need to know what user agent string the spider supplies when making the HTTP request. Ideally, some part of it can be used to identify it as a bot. If the UA string includes "bot/" or "/bot", that would do the trick. If it doesn't fit the pattern of any of the existing strings, figure out what would identify it (without a false positive on a legitimate browser) and add that string to the spiders.txt file. If your search engine supplies a generic UA, or one that matches that of a browser, you can't.
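Steve's rule (match the spider, but never a legitimate browser) can be checked mechanically before adding an entry. The Python sketch below tests candidate entries against the spider's UA and a small sample of browser UAs; the spider UA, the browser sample, and the function names are all hypothetical, and a real check would need a much larger browser sample.

```python
from typing import List


def false_positives(candidate: str, browser_uas: List[str]) -> List[str]:
    """Browser UAs that the candidate entry would wrongly flag as bots."""
    needle = candidate.strip().lower()
    return [ua for ua in browser_uas if needle in ua.lower()]


def is_usable(candidate: str, spider_ua: str, browser_uas: List[str]) -> bool:
    """True if the entry matches the spider's UA and none of the browser UAs."""
    needle = candidate.strip().lower()
    if not needle or needle not in spider_ua.lower():
        return False  # would not detect the spider at all
    return not false_positives(needle, browser_uas)


# Hypothetical UA for a site's own search spider.
OWN_SPIDER = "Mozilla/5.0 (compatible; MySiteSearch/1.0; +http://www.example.com/search)"

# Small, hypothetical sample of ordinary browser user agents.
BROWSERS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
]

for candidate in ["mysitesearch", "compatible", "mozilla", "bot/"]:
    print(candidate, is_usable(candidate, OWN_SPIDER, BROWSERS))
```

Here only "mysitesearch" comes out usable: "compatible" and "mozilla" also match browsers, and "bot/" does not occur in this particular spider's UA at all.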
smiler99 0 Posted August 14, 2011 Steve, I am getting lots of visits from users who have SIMBAR in their user agent. From what I have read, it appears that these users have some sort of malware/adware on their system. Should I be concerned in any way? Should I block any user with SIMBAR in their user agent?
germ 233 Posted August 14, 2011 Blocking people because of "this, that, or the other thing" is a never-ending endeavor, because "this, that, or the other thing" is constantly changing. Either your site is secure or it isn't. If it's secure, you don't have to worry. If it isn't, sooner or later someone will break in before you have the chance to block them because of "this, that, or the other thing". :blush: Just my 2 cents. Take it or leave it. :) If I suggest you edit any file(s) make a backup first - I'm not perfect and neither are you. "Given enough impetus a parallelogramatically shaped projectile can egress a circular orifice." - Me - "Headers already sent" - The definitive help "Cannot redeclare ..." - How to find/fix it SSL Implementation Help Like this post? "Like" it again over there >
♥DAVID3733 9 Posted August 31, 2011 Hi there, I too have an MSN bot that shows up in my Who's Online 3.5.4 as a customer rather than a bot, and I'm not sure why. I recently moved servers and have had to make many changes to get things right; this is one of them, but I can't work out the cause. I have downloaded the latest spiders.txt. Any clues would be appreciated. Below is the info from Who's Online:
00:00:00 Guest msnbot-207-46-13-95.search.msn.com 09:59:52 am 09:59:52 am HTC 35H00132-00M, 35H00132-05M, BA S410 , Battery (Product) Yes Not Found
Name: Guest
ID: 0
IP Address: 207.46.13.95
User Agent:
osCsid: e8cb6afc74dafb79a9b16df0a4b25da8
Thank you, David
sackling 14 Posted March 13, 2013 What happened to the updates to this add-on?