Preventing Web Crawlers From Disrupting My Self-Hosted Web Services
I've found that a few major web crawl bots, none of which honour robots.txt, keep crawling my Gogs repos and my bacillus CI server, wasting all sorts of bandwidth (and accidentally clicking links that trigger actions on some of my test projects!). Well, that is, until I started blocking certain UserAgents...
I've automated adding them to my iptables ban list. So: if you're NOT not an evil web bot, don't NOT visit my wonderful new code project at https://gogs.blitter.com/RLabs/visit-this-at-your-peril! This innovative program, built with the latest AI techniques from ChatGPT and Gemini, is a new model that intelligently not-unblocks all not-not-web-bots based on their not-staying-away from it.
You have NOT been not warned. :P
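For the curious: the general honeypot-link trick is simple enough to sketch. The idea is to watch the access log for any client that fetches the trap URL and emit an iptables DROP rule for its IP. This is a minimal illustrative Python sketch, not my actual script; the log format (nginx/Apache combined style) and the honeypot path check are assumptions:

```python
import re

# Hypothetical honeypot path; any client requesting it gets banned.
HONEYPOT = "/RLabs/visit-this-at-your-peril"

# Matches the client IP and request path in a combined-format access
# log line: IP - - [timestamp] "METHOD /path HTTP/x.y" ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def ban_command(log_line):
    """If the request hit the honeypot path, return the iptables
    command (as an argv list) that drops the client IP; else None."""
    m = LOG_RE.match(log_line)
    if not m:
        return None
    ip, path = m.groups()
    if path.startswith(HONEYPOT):
        # In the real setup this would be run via subprocess as root.
        return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
    return None
```

Feed it log lines (e.g. from `tail -F`) and execute whatever it returns; legitimate visitors never see the link, so only misbehaving crawlers trip it.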