My former employer, @OCCRP, just went live with their new website, and it's pretty slick!
https://www.occrp.org/en
This is bitter-sweet for me.
On one hand, glad to see them have a new site, finally! The old one was a mess.
OTOH: I had designed and built the infra that hosted their site through Panama Papers (arguably OCCRP's big break). It did not rely on external CDNs or "DDoS-protection" providers.
That infra is no longer in use as of today. Replaced by Google.
For those curious, the infra was mostly:
- a pair of back-end servers (the main site was an ancient Joomla install…), in a production / warm standby configuration;
- a couple dozen very thin VPSes acting as (micro-)caching reverse proxies; we called them "fasadas" (from the Bosnian word for "façade");
- a bunch of scripts that tied it all together.
The stripped down and simplified nginx config for the fasadas lives as a FLOSS project here:
https://0xacab.org/rysiek/fasada
The production / warm standby back-end servers were automagically synced every hour. Yes, including the database.
This meant that:
1. we had a close-to-production testing server always available;
2. we had a way of quickly switching to an almost completely up-to-date backup back-end server in case anything went down with the production one.
The set-up on these back-ends included *two* nginx instances running in parallel on different ports, but with the same config, serving the same content.
Yes, on each.
Each fasada (i.e. reverse proxy on the edge) was configured to use *both* of these nginx instances on the currently-production back-end server.
Because everything was in docker containers, we could upgrade each nginx instance separately.
Whenever we were deploying nginx config changes or were upgrading nginx itself, we would do that one instance at a time. If it got b0rked, fasadas would just stop using the b0rked back-end nginx instance and switch to the other one.
No downtime. No stress.
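For the curious, on the fasada side this was just a standard nginx upstream with both back-end instances in it. A minimal sketch of the idea (made-up IPs and ports, not the actual OCCRP config):

```
# two nginx instances on the same back-end host, on different ports;
# if one is down or mid-upgrade, requests fail over to the other
upstream backend {
    server 192.0.2.10:8080;   # back-end nginx instance #1
    server 192.0.2.10:8081;   # back-end nginx instance #2
}

server {
    server_name example.com;
    # (listen/TLS directives stripped for brevity)

    location / {
        proxy_pass http://backend;
        # retry the other instance on connection errors and 5xx responses
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```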
IP addresses of active fasadas (that is, ones that were supposed to handle production traffic) were simply added as A records for `occrp.org`.
This was Good Enough™, as browsers were already smart about selecting an endpoint IP address and sticking to it across requests related to the same domain.
This also meant that if an active fasada went under for whatever reason, visitors would mostly not notice – their browsers would retry against one of the remaining IPs.
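In zone-file terms the active set looked more or less like this (illustrative only, made-up IPs; the 900-second TTL comes up further down the thread):

```
occrp.org.  900  IN  A  192.0.2.11
occrp.org.  900  IN  A  192.0.2.12
occrp.org.  900  IN  A  192.0.2.13
occrp.org.  900  IN  A  192.0.2.14
```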
We had about 2 dozen fasadas configured, deployed, and ready to serve production traffic at any given time.
But we only kept 4-6 actually active for `occrp.org` (and some others for other sites we hosted).
The other ones were an "emergency stash".
If an active fasada did go under, we'd swap its IP address out of occrp.org A records, and add one of the currently healthy standbys instead.
If we started getting way more traffic than the current active fasada set could handle, we'd add more.
From my experience, what brings a site down is really rarely an *actual* DDoS. Most of the time it's an organic traffic spike hitting a slow back-end.
Hence:
1. microcaching
2. my exasperation with CloudFlare calling everything a DDoS
But I digress!
We did get honest-to-Dog DDoSes, some pretty substantial. When that happened we just… swapped out *all* active fasadas.
DDoS would happily continue against the 4 to 6 old IP addresses… While new visitors would get served from other nodes.
See, when you're DDoSing someone, you don't want to waste your bandwidth on checking DNS records, now do you? You want to put everything you've got into these malicious packets.
And when you do, and the target just moves on to a different set of IP addresses, you're DDoSing something that does not matter. Have at it!
Now, I am not saying *all* DDoSes work this way.
I *am* saying that all the DDoSes I have seen against OCCRP's infra when I was there worked this way.
The time we really went down hard was when our dedi provider (which was otherwise great!) overeagerly blackholed DDoS traffic…
…also blackholing our production back-end server.
Took us 45min to deal with this, mainly because I was out at lunch and for *once* I did not take my phone with me. While a certain @smari happened to be on vacation literally on the other side of the globe.
Dealing with this meant pushing a quick config change to the fasadas to switch to the warm spare back-end.
What a blast from the past!
I should probably write this all up in a blogpost, with some more lessons-learned (for example: remember to microcache your 4xx/5xx errors and 3xx redirects as well).
Thanks for joining me for this ride down memory lane!
I will now take your questions.
/end
@rysiek what's the point of having redundant web servers if they are both on the same machine? Thanks in advance for the answer.
@gubi getting there
@rysiek what order of magnitude were the TTLs on those records, how soon would clients notice the changes?
@viq TTL was 900 seconds, so ~30min (roughly 2×TTL) for ~full propagation.
@rysiek I worked my way back up the thread because I read "microcache" as "microfiche" ... ahem!
Interesting stuff even if you stuck to current-century tech, thank you!
@DamonHD haha, thanks!
@rysiek Q: what TTL did you typically have on your A records?
@DamonHD 900 seconds, meaning in ~30min (about 2×TTL) we could expect ~full propagation.
@rysiek OK, thanks!
Many, many years ago, when the BBC was hosting live UK General Election results for the first time, I think they used 5 minutes, and it broke a lot of things.
(Ofc everything is faster and better tuned these days, from the DNS servers to the browsers...)
@robryk @DamonHD for some visitors it would be instantaneous, if their recursive resolvers had not cached the occrp.org A records yet.
For those whose resolvers had, the worst-case scenario is roughly 2×TTL if the request happens *just* before we push DNS changes.
There are nuances and caveats, but that's an effective enough way of thinking about it.
@robryk @DamonHD there are all sorts of small random delays that can push it over the edge and mean that a recursive resolver still serves the cached response even though technically the TTL should have *just* expired.
Or, a recursive resolver gets a request from user A *just* before DNS changes are pushed, and caches that. Then user B issues a request *just* before the TTL expires and gets the cached response from the recursive resolver.
@rysiek Thanks for sharing! I don't think I've heard about micro caching before, can you share how low you set the time in this case?
@Herover it's all in the code.
Default: 10s for dynamic content
https://0xacab.org/rysiek/fasada/-/blob/master/services/etc/nginx/sites/example.com.conf?ref_type=heads#L43
And then depending on context:
https://0xacab.org/rysiek/fasada/-/blob/master/services/etc/nginx/sites/example.com.conf?ref_type=heads#L86
https://0xacab.org/rysiek/fasada/-/blob/master/services/etc/nginx/sites/example.com.conf?ref_type=heads#L106
Basically, the idea is that "dynamic" content (i.e. generated by the CMS on the back-end) is cached just long enough to dramatically limit the number of requests hitting the back-end, but not long enough to annoy the people who write/edit/modify that content in the CMS.
10s-20s is pretty good for that.
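For context, a stripped-down sketch of what microcaching can look like in nginx (assumed names and paths, not a copy of the fasada config linked above):

```
proxy_cache_path /var/cache/nginx/microcache keys_zone=microcache:10m
                 max_size=1g inactive=1h;

server {
    location / {
        proxy_pass http://backend;
        proxy_cache microcache;

        # cache dynamic (CMS-generated) responses very briefly: long enough
        # to absorb a traffic spike, short enough that editors see changes fast
        proxy_cache_valid 200 10s;

        # serve a stale copy rather than hammering a struggling back-end
        proxy_cache_use_stale error timeout updating
                              http_500 http_502 http_503 http_504;

        # collapse concurrent cache misses into a single back-end request
        proxy_cache_lock on;
    }
}
```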
@rysiek why do you recommend caching 4xx/5xx responses?
@slimhazard I recommend *microcaching* these responses just like all other dynamic responses, if the back-end is dynamic.
Why? Because if you're running a database-backed back-end, generating a 404 page or a 301 redirect might take close to as much time and resources as generating a 200 response.
And if your back-end is overwhelmed and throwing 504 Gateway Timeouts after a long wait, throwing more requests its way is also a bad idea.
@slimhazard learned that the hard way when Apple's retina macs became a thing.
On one of the sites we started getting "@2x" requests for *every* image:
https://www.kylejlarson.com/blog/creating-retina-images-for-your-website/
Those "@2x" images did not exist, so 404s were being returned. But these 404s were not being (micro)cached at the edge at the time, and that drove the load on the back-end through the roof.
So yeah, microcache your 3xx/4xx/5xx responses if you're doing microcaching at all.
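In nginx terms that's just a couple more `proxy_cache_valid` lines in the microcaching location (a sketch; the exact times are illustrative):

```
# redirects and "not found" pages can be as expensive for the CMS to
# generate as regular pages, so microcache them too
proxy_cache_valid 301 302 404 10s;

# when the back-end is already choking, don't pile more requests onto it;
# even a few seconds of caching the errors takes the pressure off
proxy_cache_valid 500 502 503 504 5s;
```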
@rysiek got it. I work with Varnish and see things through that lens, hence my question. We would do similar things, but differently.
In that world: 301 and 404 responses are cacheable by default, so the load for that can be taken from the backends in any case. And yes, a caching proxy has to react appropriately to distressed backends. For example with health checks, taking unhealthy backends out of rotation for a time to give them a chance to recover.
Sounds like your way worked.
@slimhazard yup, worked fine for us.
We looked at Varnish back when we were setting this up, but it did not support HTTPS (which was a necessity for us). The official way of adding HTTPS was "put nginx in front of it". So we shrugged and just used nginx.
@rysiek Yes. This should definitely be a blog post. I learned stuff!
@rysiek reminds me of when I was running a server for a popular shooting game, a great way to stop a ddos attack was to figure out which previous player was doing it (easy, they always bragged loudly about it), and then block the person's IP. The ddos would almost always stop within minutes because most people thought they had permanently ddos'd the server out of existence, somehow
@rysiek: I use parts of your NGINX configs at Peekr. Alongside using Redis, blocking AI scrapers' user agents server-side and optimising static resources, the instance works blazing fast and can return results in as little as a few milliseconds. Fastly (thanks to its Fast Forward programme) also helps us handle static resources such as CSS/JS/WOFF2 fonts (as an auxiliary caching method).
Also, can you please explain how the `$redirect_fbclid` map works? Would be interested in implementing this.
@slavistapl ah, nice!
Sure.
At some point Facebook started adding the `fbclid` argument, busting the cache.
So, first I use the map directive and a regex to set `$redirect_fbclid` to the request URL *without* the `fbclid` stuff.
https://0xacab.org/rysiek/fasada/-/blob/master/services/etc/nginx/nginx.conf?ref_type=heads#L189
And then, in each site's config, I do a 301 redirect to that `fbclid`-less URL, if that var is set:
https://0xacab.org/rysiek/fasada/-/blob/master/services/etc/nginx/sites/example.com.conf?ref_type=heads#L117
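Roughly, the idea is this (a simplified sketch, not the exact config linked above; among other things it ignores the edge case where `fbclid` is the first of several query arguments):

```
# in the http {} block: if the request URI contains an fbclid argument,
# set $redirect_fbclid to the same URI with that argument stripped
map $request_uri $redirect_fbclid {
    default "";
    "~^(?<prefix>.*)[?&]fbclid=[^&]*(?<suffix>.*)$"  "$prefix$suffix";
}

# in each site's server {} block: permanently redirect to the clean URL,
# so the cache only ever stores one copy of the page
if ($redirect_fbclid != "") {
    return 301 $redirect_fbclid;
}
```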
@rysiek: thank you! I'll implement this shortly + will test it out on other commercial social media platforms and report back with my findings.
@rysiek: so far, Facebook seems to be the only culprit. Haven't checked the messaging services such as Messenger/WhatsApp since I don't have an account on either.
Instagram and Twitter seem to include nothing alongside the destination link.