Welcome back, interrupt dodgers and latency prophets.
You've made it through time lies, NUMA betrayal, prefetch treachery, and cache line sabotage. What's left?
Now we talk about the fast path – that dreamy code path you think will save you. We talk about malloc() – the friendly allocator that turns into a trapdoor. We talk about perf lying to your face. And we talk about the rare phenomena you pray you'll never debug.
Let's go.
Also see Part I, Part II, Part III, Part IV, and Part V.
Lesson 26: Fast Paths Are Where the Worst Bugs Hide
"If your 'fast path' has one 'if' too many, it's not fast. It's a trap."
– Zed, kernel contributor, removed 800 lines from the so-called optimized path
Everyone builds fast paths. "If condition X, take the shortcut." Except fast paths rot. They're hard to test. Rarely taken. Then suddenly taken all the time under load – and everything breaks.
Fast paths attract dangerous optimizations: unchecked assumptions, unsafe reads, hidden race conditions. And because they "work in dev", they go unnoticed.
Protip: Every fast path must be the best-tested path. Otherwise it becomes the most dangerous one. And always benchmark your so-called shortcut against the path it replaces.
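One way to keep a shortcut honest is a differential test: run the fast path and the slow path on the same inputs and demand identical answers, especially on both sides of the fast-path condition. The sketch below is only an illustration. The bit-counting functions (popcount, popcount_slow) and the 256-entry table are hypothetical stand-ins, not code from any real fast path.

```python
import random

def popcount_slow(x: int) -> int:
    """Reference (slow) path: count set bits one at a time."""
    if x < 0:
        raise ValueError("non-negative integers only")
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count

# Hypothetical shortcut: a 256-entry lookup table for single-byte values.
_TABLE = [popcount_slow(i) for i in range(256)]

def popcount(x: int) -> int:
    """Take the table shortcut when x fits in a byte, else fall back."""
    if 0 <= x < 256:  # the fast-path condition
        return _TABLE[x]
    return popcount_slow(x)

def check_fast_path(trials: int = 10_000, seed: int = 0) -> int:
    """Compare both paths on boundary values plus random inputs."""
    rng = random.Random(seed)
    cases = [0, 1, 255, 256, 257]  # both sides of the fast-path condition
    cases += [rng.randrange(1 << 16) for _ in range(trials)]
    for x in cases:
        assert popcount(x) == popcount_slow(x), f"paths disagree on {x}"
    return len(cases)
```

Calling check_fast_path() returns the number of inputs compared. Timing popcount against popcount_slow with the standard timeit module on realistic inputs is the other half of the Protip: a shortcut that is not measurably faster is just extra code to break.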
Lesson 27: malloc() Is a Cliff
"Your 128-byte struct? Perfect. Your 129-byte struct? Page-aligned nightmare."
– Hal, allocator archaeologist, once found a 30% slowdown caused by one extra pointer
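Memory allocation is not linear. It's jagged. You get fast, cached blocks for certain sizes – the allocator's size classes, often powers of two or slab-aligned zones. Cross a threshold? The allocator goes to the kernel with mmap() or sbrk() (glibc, for example, switches to mmap() at 128 KiB by default). That means page faults, zeroing, TLB misses.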
Worse: malloc() may hide fragmentation. You think you're saving memory. You're not. You're hoarding metadata.
Protip: Use memory pools for fixed-size objects. Use custom allocators for high-frequency structures. And never trust that heap behavior is "constant time".
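The pool idea can be sketched in a few lines: reserve one slab up front, carve it into equal slots, and hand slots out from a free list so the hot path never asks the general-purpose allocator for anything. This is a minimal illustration in Python, not a production allocator. The FixedPool class and its method names are made up for this example, and a real pool for C structs would work on raw memory instead of a bytearray.

```python
class FixedPool:
    """Pre-allocates `count` slots of `slot_size` bytes in one slab."""

    def __init__(self, slot_size: int, count: int) -> None:
        self.slot_size = slot_size
        self._slab = bytearray(slot_size * count)    # one allocation, up front
        self._free = list(range(count - 1, -1, -1))  # stack of free slot indices

    def alloc(self):
        """Return (slot index, writable view of the slot)."""
        if not self._free:
            raise MemoryError("pool exhausted")
        idx = self._free.pop()
        start = idx * self.slot_size
        return idx, memoryview(self._slab)[start:start + self.slot_size]

    def free(self, idx: int) -> None:
        """Return a slot to the pool. Double frees are NOT detected here."""
        self._free.append(idx)
```

Both alloc() and free() are a single list operation, so their cost does not depend on the request size or on heap state. Note that a reused slot still holds its old bytes: skipping the zeroing is part of why pools are cheap, and it is the caller's job to initialize what it gets.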
Lesson 28: Kernel Queues Are Not Your Queues
"You wrote to the socket. The bytes went... nowhere. Because they’re still in the kernel."
– Rami, network throughput whisperer, once increased performance 3× by draining TX queues faster
You think write() to a socket sends data. It doesn't. It puts data in a socket send queue. If the queue is full? You block, or get EAGAIN on a non-blocking socket. If the queue is slow? You stall. Same with disk, same with pipes, same with everything that queues.
You're not writing to a device – you're negotiating with a finite ring buffer you don't control.
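You can watch the queue accept data that nobody has read. The sketch below, assuming a Unix-like system, opens a local socket pair, switches the sender to non-blocking mode, and keeps sending until the kernel refuses. The receiver never reads a byte. The function name fill_send_queue is invented for this example, and the numbers it returns depend on your OS and its buffer settings.

```python
import socket

def fill_send_queue(chunk: int = 4096):
    """Send on a non-blocking socket until the kernel queue is full.

    Returns (bytes the kernel accepted, SO_SNDBUF as reported by the OS).
    """
    a, b = socket.socketpair()  # b never reads; everything stays queued
    a.setblocking(False)
    sndbuf = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    payload = b"x" * chunk
    queued = 0
    try:
        while True:
            try:
                queued += a.send(payload)  # "succeeds" with no reader at all
            except BlockingIOError:        # EAGAIN: the queue is full
                break
        return queued, sndbuf
    finally:
        a.close()
        b.close()
```

Every successful send() here means only that the kernel took a copy. With a blocking socket the last call would simply hang instead of raising, which is the stall described above.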
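Protip: Monitor queue depth with netstat -an or ss -i, and check buffer sizes from code with getsockopt(). For disk, use iostat. For pipes, remember that only writes of up to PIPE_BUF bytes are guaranteed atomic: larger writes can interleave with other writers, and any write blocks once the pipe's buffer is full.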
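PIPE_BUF and a pipe's capacity are two different numbers, and it helps to see both. PIPE_BUF is the largest write POSIX guarantees to be atomic; the capacity is how much the kernel will queue before a writer has to wait. This sketch, assuming a Unix-like system, measures the capacity by filling a non-blocking pipe in PIPE_BUF-sized chunks. The function name pipe_capacity is made up for the example.

```python
import os
import select

def pipe_capacity() -> int:
    """Fill a non-blocking pipe and return how many bytes it queued."""
    r, w = os.pipe()
    os.set_blocking(w, False)
    chunk = b"x" * select.PIPE_BUF  # largest write guaranteed to be atomic
    total = 0
    try:
        while True:
            try:
                total += os.write(w, chunk)
            except BlockingIOError:  # pipe is full; a blocking writer would wait here
                break
        return total
    finally:
        os.close(r)
        os.close(w)
```

Comparing pipe_capacity() with select.PIPE_BUF on your own machine shows the gap between "atomic" and "will not block". The exact values are system-dependent, so treat any number you have memorized as a default, not a guarantee.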
Lesson 29: perf Can Lie to You
"You're optimizing the wrong function. Because perf couldn't sample the real culprit."
– Lila, flamegraph surgeon, once spent two weeks chasing a ghost in a kernel module
perf is amazing. It counts cycles, instructions, stalls. But sampling is imperfect. It misses short-lived functions. It hides work done in interrupts. It doesn't see kernel-assisted offload or driver-layer pain.
It's worse in containerized environments or under virtualization – where clocks are unstable, CPU pinning is unclear, and perf events are throttled.
Protip: Use perf record with high-frequency sampling and wide collection (a higher -F rate, -a for all CPUs, -g for call graphs). Always corroborate with strace, bpftrace, and /proc data. And never optimize based on the flamegraph alone.
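A cheap cross-check that needs no sampling at all is plain accounting: compare wall-clock time against the CPU time the kernel charged to your process. If the wall time is far larger than user plus system time, the work was mostly waiting, and a CPU-sampling profile will show little of it. The sketch below is one possible way to do this on Unix-like systems with the standard resource module. The helper name account is invented here, and this is not a technique the lesson itself prescribes.

```python
import resource
import time

def account(fn):
    """Run fn() and split its wall time into user, system, and off-CPU."""
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = time.perf_counter()
    fn()
    wall = time.perf_counter() - t0
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    user = r1.ru_utime - r0.ru_utime  # CPU seconds spent in user mode
    sys_ = r1.ru_stime - r0.ru_stime  # CPU seconds spent in the kernel
    # RUSAGE_SELF covers all threads, so user + sys can exceed wall time
    # in a multithreaded process; clamp the remainder at zero.
    off_cpu = max(0.0, wall - user - sys_)
    return {"wall": wall, "user": user, "sys": sys_, "off_cpu": off_cpu}
```

For example, account(lambda: time.sleep(0.2)) reports almost all of its wall time as off-CPU. When these numbers disagree with what the flamegraph suggests, believe neither until a third source breaks the tie.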
Lesson 30: IPI Storms – When Cores Turn on Each Other
"Why is my system frozen even though nothing is pegging CPU? Answer: interrupt storm from hell."
– Raj, systems necromancer, once traced 2 million IPIs per second in a misconfigured system
IPIs (Inter-Processor Interrupts) are how CPUs talk. One core wants another to flush a TLB? Send an IPI. But if misconfigured, or if your kernel has a bug, or if hardware is unstable? You get an IPI storm.
Suddenly, cores spend all their time handling each other's interrupts. No work gets done. The system "looks idle". But it's in full-on civil war.
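Protip: Monitor with mpstat -I ALL, or watch the RES, CAL, and TLB rows in /proc/interrupts. With perf stat, count IPI-related tracepoints: there is no event simply called IPI, so use perf list to see what your kernel offers (on x86, tracepoints such as irq_vectors:call_function_entry and tlb:tlb_flush). Pin workloads. Avoid bouncing memory. And don't trigger TLB shootdowns unnecessarily.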
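Reading /proc/interrupts by eye gets old quickly on a many-core box, so here is a rough sketch that sums the IPI-related rows across CPUs. It assumes the x86 Linux row labels RES (rescheduling interrupts), CAL (function call interrupts), and TLB (TLB shootdowns); other architectures label their IPI rows differently, so the defaults are an assumption. The function name ipi_counts is made up for this example.

```python
def ipi_counts(text: str, labels=("RES", "CAL", "TLB")) -> dict:
    """Sum per-CPU counters for selected rows of /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if label in labels:
            # The first ncpu fields are counters; the rest is a description.
            fields = rest.split()[:ncpu]
            totals[label] = sum(int(f) for f in fields if f.isdigit())
    return totals
```

On Linux, pass it the contents of /proc/interrupts, for example ipi_counts(open("/proc/interrupts").read()). The counters are cumulative since boot, so a single reading says little. Take two readings a second apart and subtract them to get a rate, which is the number that reveals a storm.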
Hacker Meditation: The Hot Path Is Haunted
The closer you get to performance, the more fragile things become.
Fast paths break easily. Kernel queues betray timing. malloc() morphs under load. perf shows you dreams. And interrupts, the ghosts of hardware, can paralyze your whole system without warning.
The only real protection is understanding. You can't debug what you don't believe can happen.
Coming Up in Part VII
- Scheduler heuristics: when fairness destroys latency
- Shared memory: faster and more dangerous than you think
- When atomic ops stall CPUs
- How /proc reveals your soul
- The curse of good-enough defaults
Update: Part VII is live!
Until next time. Think like a profiler. Allocate like a miser. And never forget: the machine does exactly what you asked, not what you meant.
Stay fast. Stay aligned. Stay hacker.
P.S. Want these printed as compiler errors when you mess up alignment? Let's do it.