Welcome back, syscall spelunkers.
In Part I, we talked about the cost of touching the kernel. In Part II, we tackled page faults, misaligned structs, and how ptrace() is the hacker's secret telescope.
Now it's time to go even deeper. Where caches lie, fork() whispers, and concurrency doesn't care about your feelings.
If you're not scared yet – you should be.
Lesson 11: Cache Lines Are Saboteurs
"Your program doesn't run on memory. It runs on cache lines pretending to be memory."
– Morgan, multicore sadist, wrote a lock-free stack that broke two CPUs
Forget malloc. Forget RAM. If your data structures don't fit cleanly inside cache lines, you're at war with your own hardware.
Two variables on the same line? One gets updated, the other gets flushed. Welcome to false sharing. Loop touching memory every 128 bytes? Miss, miss, miss – you're airlifting your bytes with a catapult.
Protip: Use __attribute__((aligned(64))) or pad structs manually. And know your L1D, L2, and L3 sizes better than your phone number.
Lesson 12: fork() Is a Lie
"When you fork(), the process doesn't duplicate. The illusion does."
– Niko, POSIX dissenter, hasn't used threads since 2012
Most think fork() duplicates your process. It doesn't. It copies the page table. Until you touch something, the child uses the same memory. One write? Bang! Copy-on-write. Now you're paying for pages you thought were free.
This is why pre-forking daemons go to great lengths to mmap() shared buffers before forking – a shared mapping set up after the fork belongs to that process alone. Otherwise, you're copying pages you thought were shared every time a log line gets written.
Protip: If you're forking in 2020, you better have a good reason. Otherwise, clone() with flags. Or use posix_spawn() and grow up.
Lesson 13: The TLB Hates You
"Your memory access is fine – until you walk across page boundaries like a drunk."
– Lee, memory model whisperer, reverse engineered a TLB eviction pattern with hand-rolled tests
TLBs (Translation Lookaside Buffers) are like cheat sheets for your CPU. Fast lookups for where virtual memory maps to physical. But they're small. Real small. Spill them, and you pay in full.
Touching memory across thousands of pages in tight loops? You just nuked the TLB. Suddenly you're paying page-table taxes for every access.
Protip: Use huge pages. Touch memory in stride. Preload mappings with mlock() or madvise(). And never write a tight loop across sparse memory without thinking.
Lesson 14: Event-Driven I/O Isn't About Events
"Everyone builds an event loop. Almost no one builds a good one."
– Ash, async heretic, rewrote epoll handlers for fun and terror
At some point, you'll write a server. It'll handle connections. It'll read. It'll write. You'll use epoll or kqueue or IOCP and call it event-driven.
Except it's not.
Unless you batch, defer, and reuse, your "event loop" is just a slow syscall dance. The kernel gives you events. Your job is to consume them fast and intelligently.
Protip: Use EPOLLEXCLUSIVE, recvmmsg(), and pooled buffers. Don't malloc() in the hot path. And don't re-arm file descriptors unless absolutely necessary.
Lesson 15: You Don't Need a Mutex – Until You Do
"Atomic operations are great until you forget the memory barrier. Then your app 'sometimes' works."
– Dani, concurrent guru, once debugged a race condition that only occurred at 2 AM
Threads are great. Locks are awful. Atomics are magic. But concurrency is evil.
You can use atomics to build lock-free queues, semaphores, spin locks. But every atomic op has a memory model, and unless you're fluent in the language of fences, you're building a time bomb.
Sometimes, a simple pthread_mutex_lock() is the correct answer. Especially when you're building something that must never go wrong.
Protip: Use atomics for fast paths, locks for correctness. And always document your invariants. Even future you won't remember why you thought that relaxed ordering was "fine".
Hacker Meditation: The Machine Never Forgets
Every decision you make in low-level programming echoes forever. Misaligned memory? Cache stalls. Unbatched I/O? Kernel wakeups. Misunderstood concurrency? Data corruption at scale.
The machine doesn't hate you. It simply remembers everything.
And it will make you pay.
Coming Up in Part IV
- How system time lies to you
- The myth of "CPU usage"
- Why signals are chaos
- When preemption breaks your mind
- The art of deterministic I/O
Update: Part IV is live!
If you're still reading, you're one of us now. Drop a comment with your war stories, syscall scars, or concurrency catastrophes. The machine is listening. So are we.
Stay sharp. Stay pinned. Stay hacker.
P.S. Want these as a shell-fortune command? Or printed on dot matrix? Just say the word.