Updated spiders.txt Official Support Topic


595 replies to this topic

#581 Androider

  • Community Member
  • 4 posts
  • Real Name:Kim

Posted 02 July 2010, 01:52

I'm a newbie at technical stuff..
But I was able to find this from cpanel AWSTATS.
static-208-80-193-39.as13448.com

#582 stevel

  • Community Member
  • 2,833 posts
  • Real Name:Steve Lionel
  • Location:New Hampshire, USA

Posted 02 July 2010, 10:35

That's not the user agent string. You want a line that looks something like this:

220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] "GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"

See that string that starts "Baiduspider"? That's the user agent. If you're using awstats, you should be able to locate the access log.
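A rough Python sketch of pulling the user agent out of a log line like the one above. It assumes the user agent is the third double-quoted field, which holds for this sample but depends on how your server's log format is configured; the function names are made up for illustration.

```python
import re


def quoted_fields(log_line):
    """Return the contents of every double-quoted field in the line."""
    return re.findall(r'"([^"]*)"', log_line)


def user_agent_from_line(log_line, position=2):
    """Pick the quoted field at `position` (0-based).

    position=2 is an assumption that matches the sample line:
    request, referrer, user agent, then one extra field.
    """
    fields = quoted_fields(log_line)
    return fields[position] if len(fields) > position else None


sample = (
    '220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] '
    '"GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"'
)
print(user_agent_from_line(sample))
```

If your log puts the fields in a different order, adjust `position` after checking one line by eye.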

If you want to block Yandex entirely (and posts I have read suggest that is a good idea), add this to your robots.txt:

User-agent: Yandex
Disallow: /
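
One way to sanity-check a rule like this before uploading it is Python's standard `urllib.robotparser`. This is only a sketch: the `allowed` helper and the example.com URL are placeholders, and it shows how a well-behaved crawler would read the rule, not whether any particular bot actually obeys robots.txt.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Yandex
Disallow: /
"""


def allowed(agent, url, rules=RULES):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Pass the short robot name, not the full user agent string.
print(allowed("Yandex", "http://www.example.com/index.php"))
print(allowed("Googlebot", "http://www.example.com/index.php"))
```

The first call should report that Yandex is shut out, while the second shows other robots are unaffected because no rule names them.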

#583 Andreas2003

  • Community Member
  • 312 posts
  • Real Name:Andreas

Posted 02 July 2010, 12:18

Got something on my site, which I'm not familiar with:

Name: 0.83
IP-address is changing, but a lot from different comcast-nodes like "c-66-41-29-213.hsd1.mn.comcast.net".

No session, no referrer.

I searched through my spiders.txt, but did not find anything like "0.83".

Does anyone know if this is a real "bot" or someone too interested in my site?

Thanks in advance,
regards
Andreas

#584 Andreas2003

  • Community Member
  • 312 posts
  • Real Name:Andreas

Posted 02 July 2010, 13:10

Andreas2003, on 02 July 2010, 12:18, said:

Name: 0.83
Got some more information:
http://www.80legs.com/spider.html

I blocked it through robots.txt:
User-agent: 008
Disallow: /
Hope that will work.

Edited by Andreas2003, 02 July 2010, 13:12.


#585 surcie

  • Community Member
  • 31 posts
  • Real Name:sf

Posted 23 July 2010, 00:25

I have this unrecognized spider:

msnbot-207-46-12-118.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
IP: 207.46.12.118

#586 stevel

  • Community Member
  • 2,833 posts
  • Real Name:Steve Lionel
  • Location:New Hampshire, USA

Posted 23 July 2010, 01:26

Well, that's a bit odd. While the hostname has msnbot in it, the user agent string just looks like MSIE. There's no way to detect that as a bot.

#587 surcie

  • Community Member
  • 31 posts
  • Real Name:sf

Posted 24 July 2010, 16:18

So it's a PC user from MSN services? Can this be considered a security risk? Thanks in advance.

#588 stevel

  • Community Member
  • 2,833 posts
  • Real Name:Steve Lionel
  • Location:New Hampshire, USA

Posted 24 July 2010, 20:42

A security risk? No more than any other PC. The thing to look at is if this "user" went around your site adding items to a cart. How many pages did it visit at this time? Do you see a session ID in all the URLs or maybe just one or two?

Remember that the purpose of spiders.txt is NOT to prevent bots from visiting your site - it's to keep session IDs out of search engine indexes and to prevent them from doing things that require a session.

#589 spoofy

  • Community Member
  • 181 posts
  • Real Name:Spoofy

Posted 07 February 2011, 01:23

Hey Steve,

Should we go ahead and add the new Bing/Yahoo bot called "bingbot"?

#590 odleyart

  • Community Member
  • 67 posts
  • Real Name:Marc Odley

Posted 18 April 2011, 10:14

We have installed a site search engine and would like to add our own site spider to the list. Anyone know how this can be done?

#591 etzeppy

  • Community Member
  • 101 posts
  • Real Name:Keith

Posted 26 April 2011, 17:50

I am using spiders.txt dated 04-17-2010, which I believe is the most recent. It is not detecting the following bot:

User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

I thought adding "bingbot" (without quotes) to spiders.txt would allow detection but that did not seem to work. I actually thought that one of the existing strings would catch it but this bot is showing up in Who's Online as a customer. Can someone please tell me what string needs to be in spiders.txt to allow proper detection?

Thanks

#592 smiler99

  • Community Member
  • 59 posts
  • Real Name:chris
  • Gender:Male

Posted 12 May 2011, 18:49

etzeppy, on 26 April 2011, 17:50, said:

I am using spiders.txt dated 04-17-2010, which I believe is the most recent. It is not detecting the following bot:

User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

I thought adding "bingbot" (without quotes) to spiders.txt would allow detection but that did not seem to work. I actually thought that one of the existing strings would catch it but this bot is showing up in Who's Online as a customer. Can someone please tell me what string needs to be in spiders.txt to allow proper detection?

Thanks

The "gbot" entry picks up this spider, since "bingbot" contains "gbot". It is line 27 in spiders.txt (presuming you haven't changed the order of the bots from the original file).

My Who's Online registers User Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) as a bot.

Smiler

Edited by Jan Zonjee, 12 May 2011, 22:31.
spamming


#593 stevel

  • Community Member
  • 2,833 posts
  • Real Name:Steve Lionel
  • Location:New Hampshire, USA

Posted 08 June 2011, 23:50

odleyart, on 18 April 2011, 10:14, said:

We have installed a site search engine and would like to add our own site spider to the list. Anyone know how this can be done?

You need to know what "user agent" string the spider supplies when making the http request. It would, ideally, have some part of it that can be used to identify it as a bot. If the UA string includes "bot/" or "/bot" that would do the trick. If it doesn't fit the pattern of any of the existing strings, then figure out what would identify it (without a false positive on a legitimate browser) and add the string to the spiders.txt file.

If your search engine supplies a generic UA or one that matches that of a browser, you can't.
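
To try a candidate string before editing the file, here is a small Python sketch. It assumes spiders.txt entries are matched as case-insensitive substrings of the user agent, which is how the description above reads; the shop's real check lives in its own code and may differ in detail. `mysitesearch` is a made-up entry standing in for your own spider.

```python
def load_patterns(text):
    """Return the non-empty lines of spiders.txt-style text, lowercased."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent, patterns):
    """True if any pattern occurs anywhere in the lowercased user agent."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in patterns)


# "mysitesearch" is a hypothetical entry for an in-house search spider.
patterns = load_patterns("bot/\n/bot\nmysitesearch\n")

bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322)"

print(is_spider(bing, patterns))  # matches via "bot/"
print(is_spider(msie, patterns))  # browser-like UA, nothing matches
print(is_spider("", patterns))    # an empty UA can never match
```

Run your candidate string against a few real browser user agents as well, to check for false positives.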

#594 smiler99

  • Community Member
  • 59 posts
  • Real Name:chris
  • Gender:Male

Posted 14 August 2011, 16:04

Steve,

I am getting lots of visits from users who have SIMBAR in their user agent. From what I have read, it appears that these users have some sort of malware/adware on their system. Should I be concerned in any way? Should I block any user with SIMBAR in their user agent?

#595 germ

  • Community Member
  • 13,559 posts
  • Real Name:Jim
  • Gender:Male
  • Location:USA (GMT-6)

Posted 14 August 2011, 16:26

Blocking people because of "this, that, or the other thing" is a never ending endeavor because "this, that, or the other thing" is constantly changing.

Either your site is secure or it isn't.

If it's secure you don't have to worry.

If it isn't, sooner or later someone will break in before you have the chance to block them because of "this, that, or the other thing".
:blush:

Just my 2 cents.

Take it or leave it.
:)

#596 DAVID3733

  • Community Member
  • 155 posts
  • Real Name:DAVID
  • Location:essex UK

Posted 31 August 2011, 09:05

Hi There

I too have an MSN bot that is showing in my Who's Online 3.5.4 as a customer rather than a bot.

Not sure why. I recently moved servers and have had to make many changes to get things right. This is one of them, but I can't work out why. I have downloaded the latest spiders.txt. Any clues would be appreciated.
Below is the info from Who's Online:

00:00:00 Guest msnbot-207-46-13-95.search.msn.com 09:59:52 am 09:59:52 am HTC 35H00132-00M, 35H00132-05M, BA S410 , Battery (Product) Yes Not Found
Name: Guest
ID: 0
IP Address: 207.46.13.95
User Agent:
osCsid: e8cb6afc74dafb79a9b16df0a4b25da8

thank you

David