AWWWW.WAHWAH Posted March 17, 2007
Quote: "That's an IP, not a user agent. What's the user agent string from the access log?"
That IP is not assigned to a specific domain. Is it PSI (Performance Systems International)? Within minutes it was pulling up every page, so I blocked it completely from my site. Then variants of that IP started crawling.

AWWWW.WAHWAH Posted March 17, 2007
http://www.showmyip.com/?ip=38.98.120.75

ruboo Posted March 26, 2007
Quote: "Fine. Adding that one to spiders.txt would not accomplish anything anyway. But if you do see spiders getting session IDs, then by all means let me know! I just posted an update to the contrib - the rate of new spiders has fallen off quite a bit - I had not seen a new one for a couple of months."
Hi Stevel,
There is one bot that gets a session ID on my site, has added almost all products to the shopping cart, and is always there: 217.106.233.192. It is a webmoney bot, I guess. Is it possible to add it to the spiders list to prevent it from getting sessions? I added "webmone" or "money" to the spiders.txt file, but without any luck.

stevel (Author) Posted March 26, 2007
What is the user agent from your access log? If you want to block a specific IP, you can do that in the .htaccess file with the line:
Deny from x.x.x.x
There's no reverse DNS (not even a domain name) associated with that IP, so I don't know what else to advise you.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description

ruboo Posted March 26, 2007
Quote: "What is the user agent from your access log? If you want to block a specific IP, you can do that in the .htaccess file with the line: Deny from x.x.x.x There's no reverse DNS (not even a domain name) associated with that IP, so I don't know what else to advise you."
It looks like this:
217.106.233.192 - - [26/Mar/2007:05:04:47 -0400] Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060824/1.7.12 (Mozilla; http://mozilla.org; [email protected])

stevel (Author) Posted March 26, 2007
I don't see anything there you can safely use in spiders.txt. You'll have to block by IP in .htaccess.

warrenerjm Posted March 29, 2007
Hi,
I have an IP address, 213.123.219.228, that has gone beyond a customer and that I suspect is a spider... with a massive basket and with a session :angry: This is the first time I have had this happen.
Name: Guest
ID: 0
IP Address: 213.123.219.228
User Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
I have tried to look at my logs (which I have no clue about :blink: ) and they only go up to Dec 06. I do not know how to look at the current logs. I downloaded the one from my host's control panel and that was the last date showing. Please can you tell me what to do? Urgent and totally confused.
Thanks
Julie

warrenerjm Posted March 29, 2007
Now it is creating loads of baskets! I don't know what is happening, but would appreciate any suggestions.
Thanks
Julie :thumbsup:

warrenerjm Posted March 29, 2007
Me again! :lol: It shows btopenworld.com, and this is possibly bt-yahoo.com etc. Does this help with identifying the spider?

stevel (Author) Posted March 29, 2007
No, it doesn't help. The user agent string looks like a generic web browser, but it's easy and common for badly behaved spiders to pretend to be a web browser. If a single IP is the problem, add a "Deny from" line in .htaccess to block it.

warrenerjm Posted March 29, 2007
Quote: "No, it doesn't help. The user agent string looks like a generic web browser, but it's easy and common for badly behaved spiders to pretend to be a web browser. If a single IP is the problem, add a Deny from line in .htaccess to block it."
Thanks Steve. If it is Yahoo (BT use Yahoo for their search engine) it would be preferable not to block it. Is BT being a naughty spider? Tut tut. I have the amendment in "some" file :blush: suggested by Boxtel for getting these sessions removed, and I have also asked my host if they can get my logs for me. When they do, I will try to give you the info needed. It was really weird seeing it and I panicked a bit too! Failing this I will block it, as it isn't doing me any good, I take it? Thanks for your help. :thumbsup:
Julie

stevel (Author) Posted March 29, 2007
It's not Yahoo - it's a BT customer who is running their own spider. Ask BT if they would ask the user of that IP to please stop spidering other sites. You should simply block the IP. Yahoo uses its own IPs and distinctive user agents for its spiders.

warrenerjm Posted March 29, 2007
Quote: "It's not Yahoo - it's a BT customer who is running their own spider. Ask BT if they would ask the user of that IP to please stop spidering other sites. You should simply block the IP. Yahoo uses its own IPs and distinctive user agents for its spiders."
:huh: :angry: Why would a BT customer want to use a spider to search other sites? I am so naive! Is there a special place or way to add "Deny from 213.123.219.228" to the .htaccess file, please? I have never had to do this before.
Thanks
Julie

stevel (Author) Posted March 29, 2007
Well, there are a number of possibilities. One is that they're looking for insecure sites to exploit. Another is that they're playing with spidering sites. A third, and perhaps more innocuous, is that they're using a tool to fetch an entire site, though these USUALLY have their own user agent string.
If you have a local copy of the .htaccess file (which you should), add that line to the bottom of the file, then upload it to the site, being sure to use ASCII mode. If you don't have a local copy, download the file in ASCII mode first. You may find it easier to rename the file to htaccess.txt on your local computer, upload it that way, delete the old .htaccess on the server and rename the new file to .htaccess. On Windows, files with no file name and just an extension sometimes cause problems.
Edited March 29, 2007 by stevel
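
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch of "add that line to the bottom of the file". It assumes the Apache 2.2-style spelling `Deny from <ip>` (newer Apache 2.4 setups use a different syntax), and the function name `add_deny_line` is made up for this illustration. It works on the text of a local copy only; uploading is still up to you.

```python
import ipaddress


def add_deny_line(htaccess_text: str, ip: str) -> str:
    """Return the .htaccess text with 'Deny from <ip>' appended at the bottom.

    Hypothetical helper for illustration; assumes Apache 2.2-style syntax.
    """
    # Raises ValueError if ip is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(ip)
    deny_line = f"Deny from {ip}"

    # Normalise CRLF line endings to LF, roughly what an ASCII-mode FTP
    # upload to a Unix host would do.
    lines = htaccess_text.replace("\r\n", "\n").split("\n")

    # Drop trailing empty lines so the new directive lands right at the end.
    while lines and lines[-1] == "":
        lines.pop()

    # Avoid adding the same block twice.
    if deny_line not in (line.strip() for line in lines):
        lines.append(deny_line)

    return "\n".join(lines) + "\n"
```

You could read your local htaccess.txt, pass its contents and the offending IP to `add_deny_line`, write the result back, and then upload and rename the file as described above.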

warrenerjm Posted March 29, 2007
Quote: "Well, there are a number of possibilities. One is that they're looking for insecure sites to exploit. Another is that they're playing with spidering sites. A third, and perhaps more innocuous, is that they're using a tool to fetch an entire site, though these USUALLY have their own user agent string. If you have a local copy of the .htaccess file (which you should), add that line to the bottom of the file, then upload it to the site, being sure to use ASCII mode. If you don't have a local copy, download the file in ASCII mode first. You may find it easier to rename the file to htaccess.txt on your local computer, upload it that way, delete the old .htaccess on the server and rename the new file to .htaccess. On Windows, files with no file name and just an extension sometimes cause problems."
OK, I am lost! :blush: Is it OK to show you my .htaccess file? I have looked at it for the first time and don't know where to put this line. :-" I'm not sure whether I'm using ASCII, although I have noticed it switch to binary when I FTP... or am I really lost!
Thanks
Julie

stevel (Author) Posted March 29, 2007
Yes, you can show the .htaccess. Many FTP programs look at the file type to decide the transfer mode, so you may have to explicitly tell yours to use ASCII.

kunal247 Posted April 16, 2007
Hi,
I have been using this contribution for 2 yrs now and all seems to be working well. Over the last few days I have noticed a search engine spider or a hacker on my site, but the IP address is shown as "unknown":
Online | Name | IP Address | Entry | Last Click | Last URL | Session? | Referer?
00:02:30 | Guest | 81.10.82.136 | 08:27:20 | 08:29:50 | /hp-nw9440-cd-t2700-233g-2gb-100g-ey616eaabu-p-49234.html | Yes | Yes
00:00:00 | Guest | 87.75.128.182 | 08:29:09 | 08:29:09 | (Product) | Yes | Not Found
00:00:00 | Guest | 87.75.128.182 | 08:28:12 | 08:28:12 | (Product) | Yes | Not Found
00:00:00 | Guest | unknown | 08:28:05 | 08:28:05 | Pioneer S-V40UK (Product) | Yes | Not Found
00:00:00 | Guest | unknown | 08:28:03 | 08:28:03 | Sandisk Memory Stick - SDMSH-64-E10 (Product) | Yes | Not Found
09:00:42 | Mozilla | 66.249.72.195 | 23:26:33 | 08:27:15 | /ctx-m-110.html?sort=4d&page=1 | No | Not Found
00:00:00 | Guest | 192.168.1.72 | 08:26:51 | 08:26:51 | LAMP FOR TOSHIBA TDP-MT700 PROJECTOR (Product) | Yes | Not Found
00:00:00 | Guest | unknown | 08:26:14 | 08:26:14 | Lenovo ThinkCentre A52 P4 2.8 512 80 DVD XPP - VSA72UK (Product) | Yes | Not Found
00:01:12 | Guest | 194.73.121.7 | 08:24:45 | 08:25:57 | /sharp-xvz3000-p-46493.html | Yes | Yes
00:00:00 | Mozilla | 74.6.67.158 | 08:25:33 | 08:25:33 | /samsung-ppm42m5s-42-silver-plasma-screen-p-8705.html | No | Not Found
00:01:14 | ShopWiki | 38.98.120.87 | 08:24:07 | 08:25:21 | /acer-al1722hs-etl0408073-p-7320.html | No | Not Found
00:00:00 | Guest | unknown | 08:24:31 | 08:24:31 | Sandisk 1GB Memory Stick - SDMSPD-1024-E10M (Product) | Yes | Not Found
00:00:00 | Mozilla | 74.6.74.83 | 08:23:32 | 08:23:32 | /sahara-s2000-p-26188.html | No | Not Found
00:00:00 | Mozilla | 193.47.80.42 | 08:22:18 | 08:22:18 | Audica Tower CS-T1 Silver (pair) (Product) | No | Not Found
How can I stop this, as it seems to be creating a session? Please help!!

stevel (Author) Posted April 16, 2007
I would say that you need to ask the author of the enhanced "Who's online" contrib you are using why "unknown" is shown. As for spiders.txt, I'd need to see the access log entries for these visits to see whether there is anything that can be blocked.

Herpaddict87 Posted April 20, 2007
First, thanks Stevel for the contrib. Looks like a lot of work.
Second, can you explain to me what the difference is between these:
! *******************Best Spiders List***********************!
! architext spider
! ask jeeves
! crawler
! crawle
etc...
and
! ****************knocker Spiders List!**********************!
.bot
/bot
/teoma
_bot
abcdatos
abot
accoona
acme
acoon
etc...
I'm sorry, I didn't really know what to do with this file other than replace the stock one in my /includes folder (and prevent spider sessions, of course), but am I seeing "! googlebot" commented out? (The ones on top are all within "!".)
Thanks!
HerpAddict 87
Edited April 20, 2007 by Herpaddict87

stevel (Author) Posted April 20, 2007
The text you show is not from my contrib, so I can't comment on it. I have seen some other lists which, in some cases, show that the authors don't understand how spiders.txt is processed. In my contrib, the string "ebot" catches googlebot. I'll add that all spider strings in spiders.txt must be lowercase and that extra comment lines slow down processing.
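
To make the "ebot catches googlebot" and "must be lowercase" remarks concrete, here is a rough Python sketch of that kind of matching. It is an illustration only, not the store's actual code: it assumes the user agent is lowercased and then checked for each spiders.txt line as a plain substring. The function names and the "slurp" entry are made up for the example; only "ebot" comes from the post above.

```python
from typing import List


def load_spider_strings(spiders_txt: str) -> List[str]:
    """Split spiders.txt content into trimmed, non-empty entries."""
    return [line.strip() for line in spiders_txt.splitlines() if line.strip()]


def is_spider(user_agent: str, spider_strings: List[str]) -> bool:
    """Assumed matching rule: lowercase the user agent, then substring-match."""
    ua = user_agent.lower()
    return any(entry in ua for entry in spider_strings)


# Hypothetical two-line spiders.txt; "slurp" is an illustrative entry only.
spiders = load_spider_strings("ebot\nslurp\n")

googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

print(is_spider(googlebot_ua, spiders))  # "ebot" is a substring of "googlebot"
print(is_spider(browser_ua, spiders))    # generic browser string, no entry matches
```

Under this assumption, an entry written as "Googlebot" with a capital G would never match, because the user agent has already been lowercased before the comparison. Every line in the file is also one more substring test per request, which is one way to read the remark about extra comment lines slowing things down.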

Herpaddict87 Posted April 21, 2007
Thanks Stevel! Now I understand my own confusion! I must have been looking at a different contrib. I now have your latest .txt uploaded. :)
HerpAddict87

kunal247 Posted May 17, 2007
Today I am getting a lot of spiders on my site adding random products from my site to the cart. The IP addresses are:
86.142.246.187
208.99.195.54
How can I stop them??

stevel (Author) Posted May 17, 2007
If you're interested in using spiders.txt to stop these, then you need the user agent string from the access logs. IPs aren't helpful.

kunal247 Posted May 17, 2007
Quote: "If you're interested in using spiders.txt to stop these, then you need the user agent string from the access logs. IPs aren't helpful."
What is the best way to show the access log? The file is about 3MB, which is too big to copy here. Also, how do we read the log to find out who to block using spiders.txt?
I am also getting a spider on my site which creates sessions, but its IP address comes up as "unknown" in Who's Online. Please help!!
Kunal

stevel (Author) Posted May 17, 2007
Just post the lines showing accesses by the IPs you are worried about. One line from each would be fine. You're looking for the User Agent string which, for normal users, shows the name of the browser. For well-behaved spiders, it will contain an identification such as Googlebot. There are also individuals who use generic software to create their own spiders, sometimes for not-so-nice reasons. Those may be hard to identify, as they sometimes pretend to be a regular web browser.
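
If the log is too big to read by eye, a short script can pull out just the user agents for the IPs in question. The Python sketch below assumes the log is in Apache's "combined" format, where the user agent is the last quoted field; your host's format may differ (the excerpt posted earlier in this thread, with its "Agent:" label, clearly uses another layout), in which case the pattern needs adjusting. The function name `user_agents_by_ip` is made up for this example.

```python
import re
from typing import Dict, Iterable, List

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def user_agents_by_ip(lines: Iterable[str], ips: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the distinct user agent strings seen for each requested IP."""
    wanted = set(ips)
    found: Dict[str, List[str]] = {ip: [] for ip in wanted}
    for line in lines:
        match = COMBINED.match(line)
        if not match or match.group("ip") not in wanted:
            continue  # unparseable line, or an IP we are not interested in
        agent = match.group("agent")
        if agent not in found[match.group("ip")]:
            found[match.group("ip")].append(agent)
    return found
```

To use it, open the downloaded log file and pass the file object as `lines` together with the list of IPs you are worried about; the result maps each IP to the user agent strings it sent, which is the part worth posting here.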