Dealing with dumb programs (or Why our site has been sluggish recently)
A while ago, we noticed a not-so-clever program which basically said:
while (true) {
get_public_file_from_openomy();
...
}
They didn't do this to be evil (we know because we were able to find the application and see what it was doing). We were unable to get them to fix it for a variety of reasons. Over time, more and more people downloaded this application, making it effectively a DDoS attack on Openomy. The funny thing is, because it's just a user's public file and therefore subject to bandwidth limitations, the file is virtually useless 99% of the time. We've tried a variety of strategies to deal with these requests, so I figured we'd talk about them.
Step 1: Use .htaccess to block some IPs
At the beginning, just a few people were using the program. Their IPs remained relatively static, so we would just add each offending IP to a block list in an .htaccess file. This was an extremely simple solution that we hoped would stop the proliferation of the program and make things back off.
Unfortunately, it didn't. People continued to download it, and the list of IPs grew. Adding IPs to the .htaccess file really only helped us for a week or so.
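To make the .htaccess approach concrete, here is a minimal sketch of generating such a block list. It is written in Python purely for illustration and is not the setup we actually ran; the function name and the sample addresses (taken from the reserved documentation ranges) are made up, and the directives assume Apache 2.2-style access control (`Order` / `Deny from`).

```python
import ipaddress


def build_htaccess_denylist(ips):
    """Return .htaccess text that denies each unique, valid IP address.

    Assumes Apache 2.2-style directives: with "Order Allow,Deny",
    a matching Deny line overrides "Allow from all".
    """
    lines = ["Order Allow,Deny", "Allow from all"]
    seen = set()
    for raw in ips:
        # ip_address() raises ValueError on anything that is not an IP,
        # so a typo never ends up in the generated file.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in seen:
            seen.add(ip)
            lines.append(f"Deny from {ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical offenders; the duplicate is emitted only once.
    print(build_htaccess_denylist(["192.0.2.10", "198.51.100.7", "192.0.2.10"]), end="")
```

Every blocked request still reaches the web server, which has to accept the connection and evaluate the list before refusing it.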
Step 2: 404ing the file
Since the file is virtually useless anyway (due to the bandwidth overages), we decided we would just make the URL return a 404. We could do this at the mod_rewrite level, so no further processing would need to be done.
This actually lasted for a couple months, until the increased load once again caused the site to perform sluggishly.
Step 3: Stopping KeepAlives
One thing we noticed about 404ing the file was that although the requests weren't taking long to process, many of our available sockets were being taken up for long periods of time by these requests due to HTTP Keep-Alives. So, we turned off all KeepAlives for the site. This could have hurt some performance for our other pages, but because our site is so lightweight, the net gain was actually very large. We were once again able to process many more connections.
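A rough back-of-the-envelope sketch shows why idle keep-alive connections matter more than the requests themselves. It applies Little's law (average connections held = arrival rate × time each one is held). All numbers are hypothetical, not measurements from our servers, and the model assumes every request arrives on a fresh connection that then sits idle until the keep-alive timeout expires.

```python
def busy_connections(requests_per_second, service_ms, keepalive_idle_ms):
    """Average number of connections held open at once (Little's law).

    Each connection is held for its service time plus however long it
    idles waiting for a follow-up request that never comes.
    """
    held_ms = service_ms + keepalive_idle_ms
    return requests_per_second * held_ms / 1000


if __name__ == "__main__":
    # Hypothetical load: 50 requests/s, 10 ms to answer a 404,
    # and a 15 s keep-alive idle timeout.
    print(busy_connections(50, 10, 15000))  # keep-alive on
    print(busy_connections(50, 10, 0))      # keep-alive off
```

Under these assumed numbers the idle time dominates completely, which matches what we saw: cheap requests that still tied up sockets.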
This, too, helped us for another couple months. That is, up until these past few days.
Step 4: (Now testing) iptables
The number of clients requesting this bogus, unavailable file is again becoming too large for our web servers to handle. Moreover, each client requests the file many times. However, it does seem as though many of the IPs stick around for a few days, which is nice.
So, we've decided to run a periodic cron job which parses our log files for the past week, finds the IP addresses that have requested our nonexistent file (above some threshold), and creates iptables rules to drop all packets from said IP addresses. This obviously won't cover everyone requesting this file, but hopefully it will keep the number of requests low enough that our servers can continue to serve requests to others with good performance.
This works better than the .htaccess file because our web server doesn't even begin to touch these requests.
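As an illustration of the last step, here is a minimal sketch that turns a list of offending IPs into iptables commands. It is a Python stand-in, not the script we run; the function name and sample addresses are hypothetical, and it only builds and prints the commands. Actually applying them would need root privileges, for example via `subprocess.run`.

```python
import ipaddress


def iptables_drop_commands(ips, chain="INPUT"):
    """Build one `iptables -A <chain> -s <ip> -j DROP` command per unique IPv4 address."""
    # IPv4Address() raises ValueError on malformed input, and the set
    # removes duplicates before the addresses are sorted numerically.
    unique = sorted({ipaddress.IPv4Address(raw.strip()) for raw in ips})
    return [["iptables", "-A", chain, "-s", str(ip), "-j", "DROP"] for ip in unique]


if __name__ == "__main__":
    # Hypothetical offenders from the reserved documentation ranges.
    for cmd in iptables_drop_commands(["198.51.100.7", "192.0.2.10", "198.51.100.7"]):
        print(" ".join(cmd))
```

Because `-A` appends a rule on every run, a periodic job built this way would also need to clear or replace its earlier rules so that duplicates and stale IPs don't pile up.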
I ended up writing a Ruby script to perform this for us, but I started out with just some simple shell-fu. For example, to count how many matching requests each IP has made in a given log file:
grep pattern filename.log | cut -d' ' -f1 | sort | uniq -c
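The same counting, plus the threshold, can be sketched in a few lines of Python. This is not the Ruby script mentioned above, just an illustration of the idea. Like the `cut` call, it assumes the client IP is the first space-separated field of each log line; unlike `grep`, it does a plain substring match rather than a regular expression match. The function name, the sample log lines, and the file path in them are made up.

```python
from collections import Counter


def offending_ips(log_lines, pattern, threshold):
    """Count lines containing `pattern` per client IP; keep IPs with at least `threshold` hits."""
    counts = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        if pattern in line:
            # First space-separated field, like `cut -d' ' -f1`.
            counts[line.split(" ", 1)[0]] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical access-log lines in common log format.
    sample = [
        '192.0.2.10 - - [01/Jan/2007:00:00:01 +0000] "GET /files/example.dat HTTP/1.1" 404 209',
        '192.0.2.10 - - [01/Jan/2007:00:00:02 +0000] "GET /files/example.dat HTTP/1.1" 404 209',
        '198.51.100.7 - - [01/Jan/2007:00:00:03 +0000] "GET /files/example.dat HTTP/1.1" 404 209',
        '192.0.2.10 - - [01/Jan/2007:00:00:04 +0000] "GET /index.html HTTP/1.1" 200 512',
    ]
    print(offending_ips(sample, "/files/example.dat", threshold=2))
```

The threshold keeps an address that hit the URL once or twice from being blocked along with the looping clients.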
The site seems to be running pretty well right now, but we've only had this out for a few hours. We'll need to watch this over time to see how it continues to perform and scale. One issue we currently see is a large number of connections in the TIME_WAIT state. I'm unsure how this will affect us, but we'll be keeping an eye on it.
Hopefully this gives some insight into what we're facing and helps some people who face this problem in the future. Perhaps we'll also learn something about how to make this even better going forward!
