
Topic: How my custom DDoS protection worked, and how it could've been improved (Read 702 times)

full member
Activity: 574
Merit: 152
>My recommendation (and keep in mind this is really crude): if you already have a good DDoS solution in place for the obvious stuff, catching the more devious stuff is going to require combining logs and data that could be compiled even from a simple awstats page. Sort IPs by their ratio of page count to hit count, and auto-ban the ones whose ratio is closest to 1 and whose bandwidth keeps increasing over time.

>You could use the time while under Cloudflare to build a good baseline for "normal" user behavior and then use that baseline as your detection method. You're right that it's difficult to script up a system for this, but it could maybe be done. Your solution already sounds pretty ingenious, but you're right that automating it almost becomes a full-time job just keeping ahead of everything.

>I don't fault you for going with Cloudflare, but even with their assurances and transparency, I still don't trust them. The government will inevitably use them for wiretapping again, and because of gag orders, they will comply, just as they have in the past.

This is just an ELK stack + webserver logs. It'd fail because of IPv4 exhaustion and NAT/Tor tunnels: many legitimate users can share a single IP address, so per-IP stats lump them together.

---

What sort of memory cache engine are you using? Are you doing any frontend caching (Varnish)? Have you considered offloading static content (see edit)? Are you separating the database server from the application server? What does the actual infrastructure look like for this SMF host? Are you doing master-master SQL clustering? Are you opposed to having CDNs serve static content (such as the content on ip.bitcointalk.org or the image proxy, which is probably the actual server's IP address)?

It looks as though ip.bitcointalk.org is actually hosted on DigitalOcean? Is the entire site hosted on DigitalOcean, or just the image proxy server? With more information about bitcointalk's internal infrastructure, we could actually help improve it rather than throwing out random-ass suggestions that usually involve fairly standard SIEM technologies or basic elastic computing features.

---

Edit: A bit more exploring of ip.bitcointalk.org shows a *.bitcointalk.org wildcard cert. But it doesn't seem to serve actual bitcointalk data when I set my Host header to bitcointalk.org =(
donator
Activity: 1419
Merit: 1015
My recommendation (and keep in mind this is really crude): if you already have a good DDoS solution in place for the obvious stuff, catching the more devious stuff is going to require combining logs and data that could be compiled even from a simple awstats page. Sort IPs by their ratio of page count to hit count, and auto-ban the ones whose ratio is closest to 1 and whose bandwidth keeps increasing over time.
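A rough Python sketch of that page/hit heuristic, assuming log lines have already been parsed into `(ip, path, bytes_sent, period)` tuples. The static-extension list, the 0.9 threshold, the `min_hits` cutoff, and "bandwidth strictly rising every period" are all hypothetical choices, not something from the post:

```python
from collections import defaultdict

# Hypothetical: extensions counted as "hits" but not as "pages".
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg")


def is_page(path):
    """Anything that isn't an obvious static asset counts as a page view."""
    return not path.split("?", 1)[0].lower().endswith(STATIC_EXTENSIONS)


def suspicious_ips(entries, ratio_threshold=0.9, min_hits=20):
    """entries: iterable of (ip, path, bytes_sent, period) tuples."""
    hits, pages = defaultdict(int), defaultdict(int)
    traffic = defaultdict(lambda: defaultdict(int))  # ip -> period -> bytes
    for ip, path, nbytes, period in entries:
        hits[ip] += 1
        if is_page(path):
            pages[ip] += 1
        traffic[ip][period] += nbytes
    flagged = []
    for ip, hit_count in hits.items():
        if hit_count < min_hits:
            continue  # not enough data to judge this IP
        ratio = pages[ip] / hit_count
        series = [traffic[ip][p] for p in sorted(traffic[ip])]
        rising = len(series) > 1 and all(b > a for a, b in zip(series, series[1:]))
        if ratio >= ratio_threshold and rising:
            flagged.append((ip, ratio))
    # Closest to a 1:1 page/hit ratio first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

An IP that only ever fetches pages (no CSS, JS, or images) gets a ratio of 1.0, while a normal browser loading a topic also pulls in assets and lands well below that. Shared IPs (NAT, Tor) still blend many users into one row, which this sketch does nothing about.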

You could use the time while under Cloudflare to build a good baseline for "normal" user behavior and then use that baseline as your detection method. You're right that it's difficult to script up a system for this, but it could maybe be done. Your solution already sounds pretty ingenious, but you're right that automating it almost becomes a full-time job just keeping ahead of everything.
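One simple way to turn a "normal behavior" baseline into a rule is a mean-plus-k-standard-deviations threshold. This is a sketch assuming the metric is something like requests per minute per IP, sampled during a calm period; `k=3` is an arbitrary choice:

```python
import statistics


def build_baseline(samples):
    """Summarize a calm period as (mean, population standard deviation)."""
    return statistics.mean(samples), statistics.pstdev(samples)


def is_abnormal(value, baseline, k=3.0):
    """True when value sits more than k standard deviations above the mean."""
    mean, sd = baseline
    return value > mean + k * sd


# Usage: requests/minute observed for ordinary visitors while under Cloudflare.
baseline = build_baseline([10, 12, 11, 9, 13, 10, 12, 11])
print(is_abnormal(40, baseline))  # True
print(is_abnormal(12, baseline))  # False
```

A single global baseline is the crudest version; per-hour or per-section baselines would track legitimate traffic patterns more closely.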

I don't fault you for going with Cloudflare, but even with their assurances and transparency, I still don't trust them. The government will inevitably use them for wiretapping again, and because of gag orders, they will comply, just as they have in the past.
full member
Activity: 574
Merit: 152
>><...>AWS is neat, but linode could be much more cost effective to mitigate a DDoS.<...>
>A bit off-topic, but Linode doesn't really have a good rep in the Bitcoin community: https://arstechnica.com/information-technology/2012/03/bitcoins-worth-228000-stolen-from-customers-of-hacked-webhost/

That's strange. I didn't know they were compromised in 2012. I'd bet their security measures have improved a lot after an attack like that. However, I was just pointing out the value of leveraging multiple providers instead of putting all your eggs in one basket. Sure, AWS works great now, but why not also leverage OVH, GCP, Azure, DigitalOcean, or any other provider? AWS is stupid expensive, IIRC; paying them to mitigate a DDoS attack might not really be cost-effective.
global moderator
Activity: 3794
Merit: 2612
In a world of peaches, don't ask for apple sauce
><...>AWS is neat, but linode could be much more cost effective to mitigate a DDoS.<...>
A bit off-topic, but Linode doesn't really have a good rep in the Bitcoin community: https://arstechnica.com/information-technology/2012/03/bitcoins-worth-228000-stolen-from-customers-of-hacked-webhost/
full member
Activity: 574
Merit: 152
>The first major flaw with my setup is that it wasn't easy to change. My setup would grab a few configuration details (eg. the origin server IP) from VPC-local DNS records that I would set, but if I wanted to make deeper changes, I'd have to modify one of the instances, convert that into a new AMI, terminate all of the other instances, and then start new instances again. If I wanted to change the number of gates, I'd have to start/stop them manually and change the DNS records myself. A good solution would never require this much manual work, and would use things like auto scaling groups and CloudFormation to simplify it. It should only take a couple of minutes to add a new iptables rule, for example.


Look into some automation tooling for this. I don't think it'd be that hard to roll out changes automatically. Locking the forum into an AWS-specific system like CloudFormation might not be the best idea, though.

Check out Terraform for deploying to different infrastructure providers. AWS is neat, but Linode could be much more cost-effective for mitigating a DDoS.

>Another point is that you could design the system such that it does not require looking into HTTPS traffic. It can just work at the TCP layer and pass the encrypted HTTPS traffic verbatim. I'm not sure how exactly you would tunnel the real data to the real server (I previously thought that GRE tunnels would work, but somebody told me that this might not be the appropriate tool), but it should definitely be possible. The upside to this is that you can use a very powerful service like AWS without trusting them too much. The downside is that you cannot use layer 7 data for IP classification, and you cannot insert a challenge; it's either block or allow. The ideal anti-DDoS solution would give you the option of whether you want to give the gates access to your HTTPS or not.


Like a standard firewall? Layer 4 blocking?
legendary
Activity: 916
Merit: 1003
That was actually pretty interesting.  At least you learned a lot from the experience!
I will say the site is very snappy now.
administrator
Activity: 5222
Merit: 13032
About a year ago I created my own homebrew DDoS protection. Here's how it worked:

Set up several smallish EC2 instances. Each one acts as a reverse proxy to the origin server:

Code:
client1-->gate1\
client2-->gate2->--->origin server
client3-->gate3/

It's just an nginx reverse proxy. Pass the real IP in the X-Real-IP header, etc. Easy.
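In nginx this is normally `proxy_set_header X-Real-IP $remote_addr;` (plus `proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;`). The Python sketch below only mimics that header logic, to show why the gate overwrites rather than trusts whatever the client sent; `upstream_headers` is a hypothetical helper, not part of any library:

```python
def upstream_headers(client_ip, incoming):
    """Headers a gate forwards to the origin for one request.

    client_ip is the TCP peer address the gate itself observed, which is
    the only trustworthy source for the client's real IP.
    """
    # Drop client-supplied copies; they are attacker-controlled.
    headers = {k: v for k, v in incoming.items()
               if k.lower() not in ("x-real-ip", "x-forwarded-for")}
    prior = next((v for k, v in incoming.items()
                  if k.lower() == "x-forwarded-for"), None)
    headers["X-Real-IP"] = client_ip
    # Like $proxy_add_x_forwarded_for: append the peer to any existing chain.
    headers["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return headers
```

On the origin side, nginx's realip module (`set_real_ip_from` listing the gate addresses, plus `real_ip_header X-Real-IP;`) is the usual way to make sure only the gates are trusted to set this header.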

Each gate will have iptables and nginx rules to detect easy attacks (eg. rate limiting). Importantly, they all must have SYNPROXY rules, a feature of modern Linux kernels. Having SYNPROXY iptables rules over several gateways like this completely defeats all SYN flood attacks.
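nginx's request rate limiting (`limit_req`) is based on a leaky bucket. Here is a minimal per-IP leaky bucket in Python to illustrate the idea; it is not nginx's implementation, and the `rate_per_sec` and `capacity` values would be tuning assumptions:

```python
import time


class LeakyBucketLimiter:
    """Per-IP leaky bucket: each request adds 1; the bucket drains at rate_per_sec."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.state = {}  # ip -> (current level, time of last update)

    def allow(self, ip):
        now = self.clock()
        level, last = self.state.get(ip, (0.0, now))
        # Drain whatever has leaked out since this IP's previous request.
        level = max(0.0, level - (now - last) * self.rate)
        if level + 1 <= self.capacity:
            self.state[ip] = (level + 1, now)
            return True
        self.state[ip] = (level, now)  # over the limit: reject, keep drained level
        return False
```

With `rate_per_sec=1` and `capacity=3`, an IP can burst 3 requests at once, after which it gets roughly one request per second. A real gate would also need to expire idle entries so the table doesn't grow without bound.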

The gateways need to be in an AWS VPC whose stateless traffic settings (network ACLs) block all UDP traffic. This completely blocks all UDP flood attacks. If you instead block UDP traffic in the gates' security groups, then very large UDP floods can still affect you. It has to be at the VPC level.
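To make the "stateless traffic settings" concrete: network ACL rules are evaluated in ascending rule-number order, the first matching rule wins, and anything unmatched falls through to a final deny. The Python below only models that evaluation with hypothetical rule numbers and ports; it is not the AWS API:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AclRule:
    number: int                        # evaluated in ascending order
    protocol: str                      # "tcp", "udp" or "all"
    ports: Optional[Tuple[int, int]]   # inclusive range; None = all ports
    action: str                        # "allow" or "deny"


def evaluate(rules, protocol, port):
    """Return the action for an inbound packet: first matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("all", protocol):
            continue
        if rule.ports is not None and not rule.ports[0] <= port <= rule.ports[1]:
            continue
        return rule.action
    return "deny"  # the catch-all rule every network ACL ends with


# Hypothetical inbound rules for the gates' subnet.
GATE_INBOUND = [
    AclRule(100, "udp", None, "deny"),            # drop every UDP packet
    AclRule(200, "tcp", (80, 80), "allow"),
    AclRule(210, "tcp", (443, 443), "allow"),
    AclRule(300, "tcp", (1024, 65535), "allow"),  # return traffic (stateless)
]
```

Because network ACLs are stateless, return traffic (for example responses coming back to the gates on ephemeral ports) has to be allowed explicitly, which is what rule 300 stands in for.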

I found that the best way to set up the DNS to distribute traffic was like this. Assume that you have 4 gates, g1 through g4. Then, using Route 53's weighted record feature, you would have the DNS return at random one of the following 4 pairs of IPs, each with a TTL of 5 minutes:
g1&g2
g2&g3
g3&g4
g4&g1
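The pairs form a ring: each gate appears in exactly two of the four answers, so with equal weights every gate shows up in about half of all responses, and if one gate dies, every answer that includes it still contains a working gate. A small Python sketch that builds the ring for any gate count and simulates equally weighted answers (TTL caching is ignored; the function names are illustrative):

```python
import random
from collections import Counter


def ring_pairs(gates):
    """Pair each gate with the next one, wrapping around (g4&g1 closes the ring)."""
    n = len(gates)
    return [(gates[i], gates[(i + 1) % n]) for i in range(n)]


def simulate(pairs, answers, rng):
    """Count how often each gate appears in equally weighted random DNS answers."""
    seen = Counter()
    for _ in range(answers):
        seen.update(rng.choice(pairs))
    return seen


pairs = ring_pairs(["g1", "g2", "g3", "g4"])
print(pairs)  # [('g1', 'g2'), ('g2', 'g3'), ('g3', 'g4'), ('g4', 'g1')]
print(simulate(pairs, 10000, random.Random(0)))  # each gate in roughly 5000 answers
```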

This seems to work better than just putting all of the gate IPs into one A record. I think the randomization that should happen in that case actually gets cached at some points along the way, so whichever record is returned first gets hit harder.

Additionally, I had a system of classifying and blocking malicious-looking IPs, but it failed to work well enough in the end, so I'm not going to describe it in detail.

So that's my homebrew DDoS protection that we were using for the last year or so. It worked impressively well against many attacks which you might think would require something like Cloudflare, but failed in the end against attackers with thousands of IPs, making full TCP connections, who can blend into the legit traffic too well. A more complete solution which could replace Cloudflare etc. in many ways would look more like this:

-----

The first major flaw with my setup is that it wasn't easy to change. My setup would grab a few configuration details (eg. the origin server IP) from VPC-local DNS records that I would set, but if I wanted to make deeper changes, I'd have to modify one of the instances, convert that into a new AMI, terminate all of the other instances, and then start new instances again. If I wanted to change the number of gates, I'd have to start/stop them manually and change the DNS records myself. A good solution would never require this much manual work, and would use things like auto scaling groups and CloudFormation to simplify it. It should only take a couple of minutes to add a new iptables rule, for example.
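The core of what auto scaling groups or CloudFormation/Terraform provide is desired-state reconciliation: you declare how many gates you want and the tooling works out what to start or stop. A toy Python version of just that diff step, with a hypothetical `g1`, `g2`, ... naming scheme (actually launching instances or updating Route 53 is left out):

```python
def plan_gates(running, desired_count):
    """Return (to_start, to_stop) so that exactly desired_count gates run.

    Gate names follow a hypothetical g1, g2, ... scheme. Surplus gates are
    stopped from the end of the sorted list; missing gates fill the lowest
    free names first.
    """
    running = sorted(running)
    if len(running) > desired_count:
        return [], running[desired_count:]
    existing = set(running)
    to_start, i = [], 1
    while len(running) + len(to_start) < desired_count:
        name = f"g{i}"
        if name not in existing:
            to_start.append(name)
        i += 1
    return to_start, []


print(plan_gates(["g1", "g2", "g3"], 5))  # (['g4', 'g5'], [])
print(plan_gates(["g1", "g3"], 3))        # (['g2'], [])
```

After applying a plan, the DNS records would be regenerated from the new gate list rather than edited by hand.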

The second major flaw with my setup is that it lacked a good, systematic way of classifying IPs as good/bad/neutral. All of the gates should collect long-term stats on every IP which connects to them and contribute it to a central database. Using some sort of model over the data in the central IP database, it should then be able to determine whether an IP address is probably good (because it's been acting like a normal person browsing the site for a long time), probably bad (because it eg. just started requesting tons of pages), or unknown/neutral. Then based on that classification plus an idea of how busy the site currently is, it can block an IP, allow an IP, or insert a Cloudflare-style captcha challenge for an IP. If you pass the challenge, the system sets a cookie on you which whitelists you for several days.
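A Python sketch of the decision flow described above. Everything concrete here is an assumption: the `IpStats` fields, the thresholds in `classify`, the 0.7 load cutoff, the 3-day whitelist, and binding the cookie to the client IP. A real system would fit a model to the central database instead of hand-picking numbers. The cookie is an HMAC-signed value so that any gate sharing the secret can verify it without a database lookup:

```python
import hashlib
import hmac
import time
from dataclasses import dataclass

SECRET = b"hypothetical-shared-secret"  # assumption: identical on every gate
WHITELIST_SECONDS = 3 * 24 * 3600       # "several days"; 3 is an assumption


@dataclass
class IpStats:
    days_seen: int                   # distinct days this IP has visited the site
    requests_last_min: int           # current request rate
    typical_requests_per_min: float  # long-term rate for this IP


def classify(s: IpStats) -> str:
    """Hand-picked thresholds standing in for a real model."""
    if s.days_seen >= 30 and s.requests_last_min <= 3 * max(s.typical_requests_per_min, 1):
        return "good"  # long history of normal browsing
    if s.days_seen == 0 and s.requests_last_min > 60:
        return "bad"   # brand new and suddenly requesting tons of pages
    return "neutral"


def decide(label: str, site_load: float) -> str:
    """site_load is a 0..1 estimate of how busy the site currently is."""
    busy = site_load > 0.7
    if label == "good":
        return "allow"
    if label == "bad":
        return "block" if busy else "challenge"
    return "challenge" if busy else "allow"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(ip: str, now=None) -> str:
    """Cookie value set after a passed challenge: ip|expiry|signature."""
    now = time.time() if now is None else now
    payload = f"{ip}|{int(now) + WHITELIST_SECONDS}"
    return f"{payload}|{_sign(payload)}"


def cookie_valid(cookie: str, ip: str, now=None) -> bool:
    now = time.time() if now is None else now
    parts = cookie.rsplit("|", 2)
    if len(parts) != 3:
        return False
    c_ip, expires, sig = parts
    # Constant-time comparison; compare bytes so odd input can't raise.
    if not hmac.compare_digest(sig.encode(), _sign(f"{c_ip}|{expires}").encode()):
        return False
    return c_ip == ip and int(expires) > now
```

Binding the cookie to the IP stops one solved challenge from being shared across a botnet, at the cost of re-challenging users whose IP changes.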

For the forum to go back from Cloudflare to a homebrew solution, the above two pieces would need to be very well satisfied.

Another point is that you could design the system such that it does not require looking into HTTPS traffic. It can just work at the TCP layer and pass the encrypted HTTPS traffic verbatim. I'm not sure how exactly you would tunnel the real data to the real server (I previously thought that GRE tunnels would work, but somebody told me that this might not be the appropriate tool), but it should definitely be possible. The upside to this is that you can use a very powerful service like AWS without trusting them too much. The downside is that you cannot use layer 7 data for IP classification, and you cannot insert a challenge; it's either block or allow. The ideal anti-DDoS solution would give you the option of whether you want to give the gates access to your HTTPS or not.
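A minimal asyncio sketch of a layer-4 gate: it never touches TLS, just copies bytes between the client and the origin over a second TCP connection, and its only decision is block or allow. Since the origin would otherwise only see the gate's address, it can optionally prepend a PROXY protocol v1 line (which nginx, for example, accepts via `listen ... proxy_protocol`). The blocklist, ports, and function names are hypothetical, and this doesn't settle the GRE question; it just uses plain TCP forwarding:

```python
import asyncio
import ipaddress

# Hypothetical layer-4 blocklist (TEST-NET-1 used as a placeholder).
BLOCKED = [ipaddress.ip_network("192.0.2.0/24")]


def proxy_v1_header(src_ip, src_port, dst_ip, dst_port):
    """Build a PROXY protocol v1 line so the origin can learn the client IP."""
    family = "TCP6" if ipaddress.ip_address(src_ip).version == 6 else "TCP4"
    return f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode()


async def pipe(reader, writer):
    """Copy bytes verbatim until EOF; TLS records are never decrypted."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # half-close so the other side sees end of stream
    except ConnectionError:
        pass


async def handle_client(client_reader, client_writer, origin, send_proxy_header):
    peer_ip, peer_port = client_writer.get_extra_info("peername")[:2]
    # The only decision available at this layer: block or allow.
    if any(ipaddress.ip_address(peer_ip) in net for net in BLOCKED):
        client_writer.close()
        return
    try:
        origin_reader, origin_writer = await asyncio.open_connection(*origin)
    except OSError:
        client_writer.close()
        return
    if send_proxy_header:
        local_ip, local_port = client_writer.get_extra_info("sockname")[:2]
        origin_writer.write(proxy_v1_header(peer_ip, peer_port, local_ip, local_port))
    await asyncio.gather(pipe(client_reader, origin_writer),
                         pipe(origin_reader, client_writer))
    for w in (origin_writer, client_writer):
        w.close()


async def serve(listen_port, origin, send_proxy_header=False, host="0.0.0.0"):
    """Start a gate that forwards every allowed TCP connection to origin."""
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, origin, send_proxy_header), host, listen_port)
```

If the PROXY line is used, the origin must accept it only from gate addresses, or anyone could spoof their IP by sending one directly.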