Dealing with dumb programs (or Why our site has been sluggish recently)
Some people have noted the recent slowness of the Openomy web site. To be clear, the API has been fine, as has downloading files, but the web site has been extremely sluggish. The reason is that we've essentially been DDOS'd by a dumb program. I think we've begun to get it under control, but I wanted to shed some light on what happened and how we've controlled it. We're no experts in this field, so I'd love to hear what we did right or wrong, or what we could improve upon in these situations.
A while ago, we noticed a not-so-clever program which basically said:
while (true) {
    get_public_file_from_openomy();
    ...
}
They didn't do this to be evil (we know because we were able to find the application and see what it was doing). We were unable to get them to fix it for a variety of reasons.
Over time, more and more people downloaded this application, making it effectively a DDOS attack on Openomy. The funny thing is, because it's just a user's public file and therefore subject to bandwidth limitations, the download is virtually useless 99% of the time. We've tried a variety of strategies to deal with this file, so I figured we'd talk about them.
Step 1: Use .htaccess to block some IPs
At the beginning, just a few people were using the program. The IPs remained relatively static, so we simply added each offending IP to a deny list in an .htaccess file. This was an extremely simple solution that we hoped would stop the proliferation of the program and make things back off.
Unfortunately, it didn't. People continued to download it, and the list of IPs grew. Adding IPs to the .htaccess file really only helped us for a week or so.
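For reference, a deny list of this kind looks roughly like the fragment below. It is only a sketch in Apache 2.2-style access-control syntax: the addresses are placeholders from the documentation ranges, not the real offenders, and Apache 2.4 would express the same thing with Require directives instead.

```apache
# Hypothetical .htaccess fragment (Apache 2.2 access-control syntax).
# Everyone is allowed except the listed addresses, which are placeholders.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24
```

Every blocked request still reaches Apache and gets a 403, so the server continues to spend a connection on each one.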
Step 2: 404ing the file
Since the file is virtually useless anyway (due to the bandwidth overages), we decided we would just make the URL return a 404. We could do this at the mod_rewrite level, and then no further processing would need to be done.
This actually lasted for a couple of months, until the increased load once again caused the site to perform sluggishly.
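A rule along these lines is one way to do it. This is a sketch, not our actual config: the path is made up, the pattern is written for .htaccess context (no leading slash), and the R=404 form assumes a mod_rewrite version that accepts non-redirect status codes in the R flag.

```apache
# Hypothetical rule: the real path of the public file is not shown here.
RewriteEngine On
# "-" means no substitution; the request is answered with a 404 immediately.
RewriteRule ^files/example-public-file$ - [R=404,L]
```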
Step 3: Stopping KeepAlives
One thing we noticed about 404ing the file was that although the requests weren't taking long to process, many of our available sockets were being taken up for long periods of time by these requests due to HTTP Keep-Alives. So, we turned off all KeepAlives for the site. This could have hurt some performance for our other pages, but because our site is so lightweight, the net gain was actually very large. We were once again able to process many more connections.
This, too, helped us for another couple of months. That is, up until these past few days.
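In Apache terms the change is a single directive. The fragment below is a sketch that assumes it goes in the main server or virtual host config, since KeepAlive is not allowed in .htaccess.

```apache
# Server or virtual host config; not permitted in .htaccess.
# Each connection is closed after one response, freeing the socket sooner.
KeepAlive Off
```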
Step 4: (Now testing) iptables
The number of clients requesting this bogus, unavailable file is again becoming more than our web servers can handle. Moreover, the clients are all accessing the file many times. However, it does seem as though many of the IPs stick around for a few days, which is nice.
So, we've decided to run a periodic cron job which parses our log files for the past week, finds the IP addresses that have requested our nonexistent file (above some threshold), and creates an iptables filter to drop all requests from those IP addresses. This obviously won't cover everyone requesting this file, but hopefully it will keep the number of requests low enough that our servers can continue to serve requests to others with good performance.
This works better than the .htaccess file because our web server doesn't even begin to touch these requests.
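To make the idea concrete, here is a minimal shell sketch of the rule-generation half. It is not the script we run: the function name emit_drop_rules is made up, the INPUT chain and port 80 are assumptions, and it only prints the iptables commands so they can be reviewed before being run as root.

```shell
# Sketch only: prints iptables commands instead of running them.
# Reads one IPv4 address per line on stdin.
emit_drop_rules() {
    while read -r ip; do
        # Skip blank lines and anything containing characters other than
        # digits and dots, so stray log junk never ends up in a command.
        case "$ip" in
            ''|*[!0-9.]*) continue ;;
        esac
        # Assumed chain and port: INPUT, TCP port 80.
        printf 'iptables -A INPUT -s %s -p tcp --dport 80 -j DROP\n' "$ip"
    done
}
```

Given a file with one address per line, something like "emit_drop_rules < bad_ips.txt" prints one DROP rule per address.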
I ended up writing a Ruby script to perform this for us, but I started out with just some simple shell-foo. An example of how to find all the IPs whose requests match a certain pattern in a log file, along with a request count for each, would be:
grep pattern filename.log | cut -d' ' -f1 | sort | uniq -c
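The threshold can be bolted onto the same pipeline with awk. The sketch below assumes the client IP is the first space-separated field of each log line (as in Apache's common and combined log formats); the function name offending_ips and the default threshold of 100 are made up for illustration.

```shell
# Sketch: list client IPs with at least MIN_HITS log lines matching PATTERN.
# Usage: offending_ips PATTERN LOGFILE [MIN_HITS]
offending_ips() {
    pattern=$1
    logfile=$2
    min_hits=${3:-100}   # hypothetical default threshold
    grep -- "$pattern" "$logfile" \
        | cut -d' ' -f1 \
        | sort \
        | uniq -c \
        | awk -v min="$min_hits" '$1 >= min { print $2 }'
}
```

uniq -c puts the count in the first column and the IP in the second, so awk only has to compare the count against the threshold and print the address.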
The site seems to be running pretty well right now, but we've only had this out for a few hours. We'll need to watch this over time to see how it continues to perform and scale. One issue we currently see is a large number of connections in the TIME_WAIT state. I'm unsure how this will affect us, but we'll be keeping an eye on it.
Hopefully this gives some insight into what we're facing and helps some people who face this problem in the future. Perhaps we'll also learn something about how to make this even better going forward!

9 Comments:
With a bit of PHP you could easily write a script that logs the IP addresses of those who request this particular URL and, with some use of crontab, cycle this in to your list of ignored addresses. Could be done as often as every minute if you needed to.
It's a bit of a hack, really, compared to fixing the problem at the source, but seems better than being drowned.
If it's becoming a real problem and the party responsible for all of this traffic can't be bothered to fix it, you might try experimenting with what you can send back in the hopes of crashing or breaking the requesting process.
Care to share that Ruby script? I wouldn't mind checking it out.
Use a REJECT in iptables, not a DROP. This will help with the TIME_WAIT as well as doing away with 5 minutes of retries from the client trying to get through.
Echoing some of the above, if you replaced the mod_rewrite 404 block with a teensy script which communicated with your iptables-modifying process immediately, every client would get "one and only one" hit. You could write to a FIFO which has a ruby (or pick your language) daemon listening to it and manages the iptables set. (or a network connection to a port instead of a FIFO, but that increases the chances of a REALLY nasty DDOS if it's not nice and locked down.) For a bonus, as you scale sideways, this little daemon can handle notifying all the rest of the servers involved.
In web hosting we do something very similar for SSH/FTP brute forcers, bulk spammers, etc. It's a lifesaver. We actually store the 'bad' IP's in a database with ban_start_time, ban_end_time, number_of_bans, so you can automatically expire a ban after a bit, blow it out if you want to, scale up duration if people keep screwing up, and allow querying of the db so you can see when/who/why/how long if someone calls in wondering why they can't ssh. Then when you make a change you just have to notify the clients to pull a fresh config. That's a lot easier to do securely as all you have to say is 'go check the master now' instead of 'block this IP'.
Also yes, use the REJECT. Instead of just silently ignoring the person's packets you send them back a notify that the port is closed to them.
Also, you probably want to put these in their own chain if you haven't already. It makes the config a lot more manageable and when you reload you can just reload that chain, and not worry about mucking up your 'real' site. Just jump to it at the very end of the input chain.
I have no advice. I just wanted to say how glad I am you're tackling this problem. I really enjoy Openomy. Keep up the good work.
raretodd
If your server is involved in Interstate Commerce (sells ad space or anything involving money changing hands), and its functioning is being adversely impacted by a software program distributed by another, that is a Federal felony punishable by 5 years incarceration and a $250,000 fine per count. Call the FBI, it'd be low-hanging fruit for one of their cybersquads.
is that "public file" static content?
then you could use a proxy like nginx to handle that