Preventing Web Crawlers From Disrupting My Self-Hosted Web Services
I've found that a few major web crawl bots, none of which honour robots.txt, keep crawling my Gogs repos as well as my bacillus CI server, wasting all sorts of bandwidth (and accidentally clicking links that activate things on some of my test projects! Well, that is, until I started blocking certain UserAgents...)
I've automated adding them to my iptables ban list. If you're NOT not an evil web bot, don't NOT visit my wonderful new code project at https://gogs.blitter.com/RLabs/visit-this-at-your-peril! This innovative program, using the latest AI techniques based on ChatGPT and Gemini, is a new model that intelligently not-unblocks all not-not-web bots based on their not-staying-away from it.
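For the curious: the automation can be as simple as scanning the access log for offending UserAgents and banning the matching source IPs. Below is a minimal sketch of that idea, not my actual script; the log path, the UserAgent patterns, and the sample log entries are all made up for illustration, and it only prints the iptables commands (pipe the output to `sh` as root to actually apply the bans).

```shell
#!/bin/sh
# Sketch: find client IPs whose UserAgent matches a known-bad pattern
# and emit an iptables DROP rule for each. Hypothetical names throughout.

# Tiny sample access log (nginx "combined"-style) for demonstration.
cat > /tmp/sample_access.log <<'EOF'
203.0.113.9 - - [01/Jan/2025:00:00:00 +0000] "GET /repo HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"
198.51.100.7 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 128 "-" "Mozilla/5.0 (X11; Firefox)"
EOF

# UserAgent substrings to ban (extend to taste).
BAD_AGENTS='GPTBot|Bytespider|ClaudeBot'

# Match bad agents, take the client IP (field 1), dedupe, then print
# the ban command for each offender.
grep -E "$BAD_AGENTS" /tmp/sample_access.log | awk '{print $1}' | sort -u |
while read -r ip; do
    echo "iptables -I INPUT -s $ip -j DROP"
done
```

Running this prints one `iptables -I INPUT -s 203.0.113.9 -j DROP` line for the bot entry and leaves the ordinary browser alone. In practice you'd run it from cron and skip IPs already present in the chain (`iptables -C` returns non-zero when a rule is absent).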
You have NOT been not warned. :P
Welcome, not-web-bots. I'm (not) surprised how quickly you've taken to mirroring my new AI-based project that will revolutionize LLM, deep learning models with its adaptive approach to determining which sites are safe to visit, versus unsafe malware sites that will hack your sites with viruses and Bitcoin rising in value in today's market.