We had a huge issue with rapid and intermittent attacks from some search bots, which battered a site into submission and caused no end of hair pulling.
We started off playing Mr Nice Guy, adding entries to the robots.txt file like:
User-agent: NaverBot
Crawl-Delay: 10
But that did no good (Crawl-Delay is a non-standard directive that many crawlers simply ignore), so next we tried:
User-agent: NaverBot
Disallow: /
Still nothing. At the time, whilst you could ban requests by IP address in Cloudflare, we were struggling to find a solution that would ban requests by user agent before they hit the application level. We played around with installing a number of open source solutions, including nginx-ultimate-bad-bot-blocker, but ended up using a simple, albeit brute force, technique.
We added this to the nginx site config, which did the trick nicely:
if ($http_user_agent ~* (Sogou|YoudaoBot|Baiduspider|360Spider|NaverBot|Yeti|ichiro|Yandex|SemrushBot)) {
    return 403;
}
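A quick way to sanity-check the rule is to spoof one of the banned user agents with curl (example.com here is just a stand-in for your own site):

curl -I -A "SemrushBot" https://example.com/

If the rule matched, nginx should answer with a 403 Forbidden rather than serving the page.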
Obviously this all assumes the bots identify themselves honestly in their User-Agent header.
Since then, Cloudflare have introduced the ability to block requests based on user agent - https://support.cloudflare.com/hc/en-us/articles/115001856951-How-do-I-block-malicious-User-Agents-with-Cloudflare- - which makes life a whole lot simpler to set up and maintain going forward.
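For reference, a blocking rule there is built from an expression in Cloudflare's rules language; something along these lines would match the same bots (a sketch, not the exact rule we run):

(http.user_agent contains "NaverBot") or (http.user_agent contains "SemrushBot")

with the rule's action set to Block, so the request is stopped at Cloudflare's edge before it ever reaches nginx.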