Updated spiders.txt Official Support Topic
#581
Posted 02 July 2010, 01:52
But I was able to find this from cpanel AWSTATS.
static-208-80-193-39.as13448.com
#582
Posted 02 July 2010, 10:35
220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] "GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"
See that string that starts "Baiduspider"? That's the user agent. If you're using awstats, you should be able to locate the access log.
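To pull the user agent out of a log line like the one above, here is a minimal Python sketch. It assumes the log layout of that sample, where the user agent is the third double-quoted field (after the request and the referrer). Other servers may order the fields differently, and the function name is only illustrative.

```python
import re

def extract_user_agent(log_line):
    """Return the third double-quoted field of an access-log line, or None.

    Assumption: the line follows the layout of the sample above
    ("request" ... "referrer" "user agent" ...).
    """
    quoted = re.findall(r'"([^"]*)"', log_line)
    return quoted[2] if len(quoted) >= 3 else None

sample = ('220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] '
          '"GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" '
          '"Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"')
print(extract_user_agent(sample))
```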
If you want to block Yandex entirely (and posts I have read suggest that is a good idea), add this to your robots.txt:
User-agent: Yandex
Disallow: /
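One way to check how a compliant crawler would read those two lines is Python's standard `urllib.robotparser`. This is only a sketch: the example.com URL is a placeholder, "Googlebot" is just a sample of some other crawler, and a bot that ignores robots.txt will not be stopped by it.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above.
rules = """\
User-agent: Yandex
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "http://www.example.com/index.php"
yandex_allowed = parser.can_fetch("Yandex", url)
other_allowed = parser.can_fetch("Googlebot", url)
print(yandex_allowed, other_allowed)
```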
Contributions:
Country-State Selector
Login Page a la Amazon
Protection of Configuration
Updated spiders.txt
Embed Links with SID in Description
#583
Posted 02 July 2010, 12:18
Name: 0.83
The IP address keeps changing, but many of the visits come from different Comcast nodes such as "c-66-41-29-213.hsd1.mn.comcast.net".
No session, no referrer.
I searched through my spiders.txt but did not find anything like "0.83".
Does anyone know whether this is a real "bot" or someone too interested in my site?
Thanks in advance,
regards
Andreas
#584
Posted 02 July 2010, 13:10
Andreas2003, on 02 July 2010, 12:18, said:
http://www.80legs.com/spider.html
I blocked it through robots.txt:
User-agent: 008
Disallow: /
Hope that will work.
Edited by Andreas2003, 02 July 2010, 13:12.
#585
Posted 23 July 2010, 00:25
msnbot-207-46-12-118.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
IP: 207.46.12.118
#586
Posted 23 July 2010, 01:26
#587
Posted 24 July 2010, 16:18
#588
Posted 24 July 2010, 20:42
Remember that the purpose of spiders.txt is NOT to prevent bots from visiting your site - it's to keep session IDs out of search engine indexes and to prevent them from doing things that require a session.
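The idea can be sketched in a few lines of Python. This is not the shop's actual code: the function names and the sample spider strings are assumptions for illustration, and `osCsid` is used only because it is the session parameter that shows up in Who's Online.

```python
# Sample entries only; the real spiders.txt is much longer.
SPIDER_STRINGS = ["googlebot", "baiduspider", "yandex"]

def is_spider(user_agent):
    """Assumed matching rule: case-insensitive substring test."""
    ua = user_agent.lower()
    return any(s in ua for s in SPIDER_STRINGS)

def build_link(url, session_id, user_agent):
    """Append the session ID only for visitors not recognised as spiders."""
    if is_spider(user_agent):
        return url  # spiders get clean URLs, so no SID ends up in an index
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}osCsid={session_id}"
```

The spider is still served the page; it just never receives a session ID to index.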
#589
Posted 07 February 2011, 01:23
Should we go ahead and add the new Bing/Yahoo bot called "bingbot"?
#590
Posted 18 April 2011, 10:14
#591
Posted 26 April 2011, 17:50
User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
I thought adding "bingbot" (without quotes) to spiders.txt would allow detection but that did not seem to work. I actually thought that one of the existing strings would catch it but this bot is showing up in Who's Online as a customer. Can someone please tell me what string needs to be in spiders.txt to allow proper detection?
Thanks
#592
Posted 12 May 2011, 18:49
etzeppy, on 26 April 2011, 17:50, said:
User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
I thought adding "bingbot" (without quotes) to spiders.txt would allow detection but that did not seem to work. I actually thought that one of the existing strings would catch it but this bot is showing up in Who's Online as a customer. Can someone please tell me what string needs to be in spiders.txt to allow proper detection?
Thanks
gbot picks up this spider: it is line 27 in spiders.txt (presuming you haven't changed the order of the bots from the original file).
My Who's Online registers User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) as a bot.
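A short Python sketch of why "gbot" catches this user agent. It assumes spiders.txt entries are compared as case-insensitive substrings of the user agent, which is an assumption about the matching rather than a copy of the shop code. Apart from "gbot", the entries in the list are made up for the example.

```python
def matching_entries(user_agent, spider_entries):
    """Return the entries that occur as substrings of the user agent."""
    ua = user_agent.lower()
    cleaned = (line.strip().lower() for line in spider_entries)
    return [entry for entry in cleaned if entry and entry in ua]

ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
entries = ["gbot", "msnbot", "slurp"]  # only "gbot" is taken from this thread
print(matching_entries(ua, entries))
```

"bingbot" contains "gbot", so the existing entry matches and a separate "bingbot" line should not be needed under that assumption.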
Smiler
Edited by Jan Zonjee, 12 May 2011, 22:31.
spamming
#593
Posted 08 June 2011, 23:50
odleyart, on 18 April 2011, 10:14, said:
You need to know what "user agent" string the spider supplies when making the http request. It would, ideally, have some part of it that can be used to identify it as a bot. If the UA string includes "bot/" or "/bot" that would do the trick. If it doesn't fit the pattern of any of the existing strings, then figure out what would identify it (without a false positive on a legitimate browser) and add the string to the spiders.txt file.
If your search engine supplies a generic UA or one that matches that of a browser, you can't.
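Before adding a new string, it can help to test it against user agents of real browsers. The sketch below assumes the same case-insensitive substring matching. The function name is hypothetical, and the single browser user agent is the MSIE string posted earlier in this topic; a real check would use a much larger sample.

```python
def false_positives(candidate, browser_user_agents):
    """Return the browser user agents that the candidate string would match."""
    needle = candidate.strip().lower()
    if not needle:
        return []
    return [ua for ua in browser_user_agents if needle in ua.lower()]

browsers = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; "
    ".NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)",
]

print(false_positives("bot/", browsers))        # no browser matched in this sample
print(false_positives("compatible", browsers))  # too generic: matches the browser
```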
#594
Posted 14 August 2011, 16:04
I am getting lots of visits from users who have SIMBAR in their user agent. From what I have read, it appears that these users have some sort of malware/adware on their system. Should I be concerned in any way? Should I block any user with SIMBAR in their user agent?
#595
Posted 14 August 2011, 16:26
Either your site is secure or it isn't.
If it's secure you don't have to worry.
If it isn't, sooner or later someone will break in before you have the chance to block them because of "this, that, or the other thing".
Just my 2 cents.
Take it or leave it.
"Headers already sent" - The definitive help
"Cannot redeclare ..." - How to find/fix it
SSL Implementation Help
#596
Posted 31 August 2011, 09:05
I too have an MSN bot that is showing in my Who's Online 3.5.4 as a customer rather than a bot.
I'm not sure why. I recently moved servers and had to make many changes to get things right, and this is one of the remaining problems, but I can't work out the cause. I have downloaded the latest spiders.txt. Any clues would be appreciated.
Below is the info from Who's Online:
00:00:00 Guest msnbot-207-46-13-95.search.msn.com 09:59:52 am 09:59:52 am HTC 35H00132-00M, 35H00132-05M, BA S410 , Battery (Product) Yes Not Found
Name: Guest
ID: 0
IP Address: 207.46.13.95
User Agent:
osCsid: e8cb6afc74dafb79a9b16df0a4b25da8
thank you
David














