roadkill

Block bots and harvesters based on page request frequency

Hi

 

My PHP scripting skills are limited, so I thought it best to post this question (or request) here.

I don't know whether a contribution like the one I am about to propose already exists, but:

 

I occasionally see bots visiting my store and requesting a page or more per second.

I have a crawl delay of 5 seconds set in my robots.txt, but these malicious bots tend to ignore it.

 

These are mostly hackers or harvesters.

 

I would like to have a script in my application that blocks a visitor who requests 15 pages (or a configurable number set in admin) within 15 seconds (also configurable in admin). I already use the bot-trap add-on, which blocks nosy bots, but it does not block based on page request frequency.
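
To make the rule concrete, here is a minimal sketch of "more than 15 pages in 15 seconds" as a sliding-window check. It is written in Python only to show the logic and would need porting to PHP for a store. The names `MAX_REQUESTS`, `WINDOW_SECONDS` and `allow_request` are made up for illustration, and the in-memory dictionary stands in for whatever table or session storage the shop would really use.

```python
from collections import defaultdict, deque

MAX_REQUESTS = 15    # hypothetical admin setting: pages allowed per window
WINDOW_SECONDS = 15  # hypothetical admin setting: window length in seconds

_recent = defaultdict(deque)  # ip -> timestamps of that IP's recent requests
_blocked = set()              # IPs that have tripped the limit


def allow_request(ip, now):
    """Return False if this IP is blocked or has just exceeded the limit."""
    if ip in _blocked:
        return False
    hits = _recent[ip]
    hits.append(now)
    # Forget requests that are older than the window.
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) > MAX_REQUESTS:
        _blocked.add(ip)
        return False
    return True
```

With these assumed settings, a visitor clicking once every five seconds never has more than three requests in the window, while a bot fetching two pages per second is blocked on its sixteenth request.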

 

Anyone?

 

Thank you!


If you can recognize them by a consistent USER_AGENT string and/or IP address, you could simply ban them in your .htaccess file or add code to application_top.php to ban them. I presume that you know how to use whois and have confirmed that these are not desirable bots like Google and the other major search engines. If they falsify their USER_AGENT string or bounce around different IP addresses, the task will be much more difficult.

 

The whos_online table has a time_last_click field that might be of use here. It's meant to handle timed-out (expired) sessions in admin/whos_online.php (more than 900 seconds past the last activity), but it might be bent to also look for rapid-fire accesses. Just thinking out loud here. Presumably they're guests and not logged in. Maybe you could add some fields to either this table or sessions: last_interval_start_time and activity count, and you could count up accesses and after X accesses, see that at least Y seconds have elapsed (then reset to 0 accesses and the current time).
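
The counter idea above could look roughly like this. It is only a sketch in Python, not osCommerce code: `Visitor` stands in for a row in whos_online or sessions with the two extra fields, and `X_ACCESSES`, `Y_SECONDS` and `record_access` are hypothetical names with assumed values.

```python
from dataclasses import dataclass

X_ACCESSES = 15  # assumed: number of accesses that triggers a check
Y_SECONDS = 15   # assumed: minimum time those accesses should take


@dataclass
class Visitor:
    last_interval_start_time: float  # when the current counting interval began
    activity_count: int = 0          # accesses counted in this interval


def record_access(visitor, now):
    """Count one access; return True if the visitor looks rapid-fire."""
    visitor.activity_count += 1
    if visitor.activity_count < X_ACCESSES:
        return False
    elapsed = now - visitor.last_interval_start_time
    # Start a fresh interval whether or not the visitor was too fast.
    visitor.activity_count = 0
    visitor.last_interval_start_time = now
    return elapsed < Y_SECONDS
```

This only needs two extra columns per visitor, at the cost of being coarser than a true sliding window: a burst that straddles two intervals can slip through.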

 

Even if you succeed in kicking them off based on too many accesses in too short a time, what's to keep them from coming back a few seconds later, possibly under a different USER_AGENT and/or IP address? Maybe the high-order octets of their IP address won't change that fast, just the lowest octet? Maybe you would have to append their IP address to the end of your .htaccess (the ban by IP section at the end)? Of course, you want to be careful not to ban an excessive number of genuine customers!
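
If only the lowest octet changes, one option is to ban the enclosing /24 network rather than a single address. Here is a small Python sketch using the standard `ipaddress` module. The helper names `network_of` and `is_banned` are made up for illustration, the addresses are documentation examples, and the code handles IPv4 only.

```python
import ipaddress


def network_of(ip, prefix_len=24):
    """Return the network containing ip; /24 keeps the three high-order octets."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def is_banned(ip, banned_networks):
    """Return True if ip falls inside any banned network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in banned_networks)


# Ban the whole /24 that one offending address belongs to.
banned = {network_of("203.0.113.57")}
print(is_banned("203.0.113.200", banned))  # same /24 as the offender
print(is_banned("203.0.114.1", banned))    # different /24
```

A /24 covers 256 addresses, so every such ban may also shut out genuine customers who share that range.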

 

Do you see that Google and other well-behaved spiders are obeying your crawl-rate setting? That would at least reassure you that you have it correctly set up, and that these other guys are in fact rogue bots.


Thank you MRPhil. Banning by user agent does not do the whole trick, and the bots I would like to block do have an extremely high crawl rate. The main crawlers respect my crawl-rate. I would like to ban bad bots by single IP per incident, put that IP on some kind of blacklist, and then redirect them away from my shop.

 

Thank you Nick, I think this add-on is exactly what I am looking for. Maybe I could modify it to my exact liking, but basically this is the mod I thought would already exist. Too weird that I did not find it myself ;)
