Interesting! Let me guess, malloc() has some locking going on while updating its lists and the performance just goes in the toilet?
Close. We're actually seeing memory bloat (ultimately resulting in OOM kills), but it is lock-related. If malloc() encounters lock contention on all existing heap arenas, it creates a new arena, up to a default limit of 8x the number of CPU cores on 64-bit systems. Each arena holds on to its own free memory, so that's a lot of wasted RAM. Capping the arena count at a lower value prevents the bloat, but hurts performance.
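For anyone who wants to try the arena cap, here is a minimal sketch, assuming glibc. mallopt() and M_ARENA_MAX are real glibc interfaces; the helper name cap_malloc_arenas is made up for illustration, and the right cap depends on your workload.

```c
#include <malloc.h>   /* mallopt, M_ARENA_MAX (glibc-specific) */

/* Hypothetical helper: cap the number of malloc arenas for this process.
 * Call it early, before worker threads start allocating.
 * Returns 1 on success and 0 on failure, following mallopt's convention. */
int cap_malloc_arenas(int max_arenas)
{
    if (max_arenas < 1)
        return 0;   /* reject nonsensical caps ourselves */
    return mallopt(M_ARENA_MAX, max_arenas);
}
```

The same cap can be applied without recompiling by setting the MALLOC_ARENA_MAX environment variable before starting the process, e.g. MALLOC_ARENA_MAX=4.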
The weird thing is, the heap arena management logic doesn't seem to have changed much. So this appears to be an indirect effect of something else that makes us contend on the heap locks more than we did on Debian 8, causing the number of arenas to balloon. What we know is that the issue is definitely tied to the version of glibc being used, and the heap code is definitely a player in the mess.
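One way to watch the arena count balloon is to parse the XML that glibc's malloc_info() emits. A rough sketch, assuming glibc: the helper name count_malloc_arenas is hypothetical, and counting `<heap nr=` elements relies on the current output format, which is not a stable interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* for open_memstream under strict -std= modes */
#endif
#include <malloc.h>           /* malloc_info (glibc-specific) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count arenas by counting "<heap nr=" elements in
 * the malloc_info() XML dump. Returns -1 on failure. */
int count_malloc_arenas(void)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *fp = open_memstream(&buf, &len);
    if (fp == NULL)
        return -1;

    int rc = malloc_info(0, fp);          /* 0 means success */
    if (fclose(fp) != 0 || rc != 0) {
        free(buf);
        return -1;
    }

    int count = 0;
    for (const char *p = buf; (p = strstr(p, "<heap nr=")) != NULL; p += 9)
        count++;                          /* one element per arena */

    free(buf);
    return count;
}
```

Logging this value periodically from the affected process on both Debian versions would show whether the arena count really diverges under the same load.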
Looks like this version may also be less aggressive about trimming unused space from the end of arenas and returning that unused space to the OS, which would also tend to aggravate the issue. If I had to speculate, maybe this was done to reduce the number of syscalls, to mitigate the performance penalty of the Meltdown patches...
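If trimming is part of the problem, it can be tested directly. A minimal sketch, assuming glibc: malloc_trim() and M_TRIM_THRESHOLD are real glibc interfaces, while the helper names and the idea of calling them from a periodic maintenance hook are assumptions for illustration.

```c
#include <malloc.h>   /* malloc_trim, mallopt, M_TRIM_THRESHOLD (glibc-specific) */

/* Hypothetical helper: ask glibc to hand free heap memory back to the OS.
 * Returns 1 if some memory was released, 0 if nothing could be released. */
int release_free_heap(void)
{
    return malloc_trim(0);   /* 0 = keep no extra padding at the top of the heap */
}

/* Hypothetical helper: set the amount of free space at the top of the heap
 * that triggers automatic trimming on free(). Returns 1 on success. */
int set_trim_threshold(int bytes)
{
    return mallopt(M_TRIM_THRESHOLD, bytes);
}
```

If resident memory drops sharply after a call to release_free_heap(), that would support the theory that the newer glibc is holding on to free space it used to return.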