Welcome back, kernel cowboys and memory mystics.

In Part I, we talked about the hidden costs of syscalls, the religion of zero-copy, and why the kernel doesn't love you. You came back, so now it's time to go deeper. Into the dark corners of systems programming where alignment matters, sockets lie, and performance is earned, not assumed.

Put your perf record hat on. We're going in.

Lesson 6: Alignment Isn't Optimization, It's Obligation

"Misaligned access isn't slower. It's catastrophic."
– Avi, low-latency trading dev, allergic to unaligned structs

In high-level land, you never worry about where your bytes live. In low-level land, you worry about exactly where your bytes live. A single access that straddles a cache line or page boundary? Boom. Extra fetches, pipeline stalls, and on strict-alignment architectures, SIGBUS.

Writing a struct that crosses a boundary is like building a bridge with mismatched bolts.

Protip: Use alignas() for structs, posix_memalign() for heap buffers, and remember that mmap() hands back page-aligned memory for free. Know your L1 cache-line size (64 bytes on most x86-64 parts). Align like a laser.

Lesson 7: You Are the Network Stack

"If you don't read the socket fast enough, the kernel will drop your packets."
– J, packet engineer, once lost 3 TB of UDP data in a firestorm of ENOBUFS

Here's the truth: when packets hit your NIC, they enter a race. Either your program handles them, or they vanish into /dev/ethervoid. The kernel buffers, sure, but only up to SO_RCVBUF. The socket is just a paper bag. You need to consume or lose.

If your recv() is slow, no amount of threading or async sugar will save you. Buffer tuning can delay the inevitable, but only efficient userspace loops can win.

Protip: Pre-allocate buffers. Use recvmmsg(). Bind threads to cores. And always measure your drop rate.

Lesson 8: Every Page Fault Is a Teleportation Event

"You didn't crash – you just got caught asking for memory you weren't prepared to deserve."
– Lina, ex-VM engineer, once trapped 14K page faults in a malloc() call

Virtual memory is an illusion held together by lies, TLBs, and silent agreements between your process and the MMU. You don't own memory until you touch it. And when you do, the kernel wakes up, frowns, and sends in the page allocator army.

This is fine, once. But walk linearly through gigabytes of untouched memory and every single page is a fault: enough to bring a latency budget to its knees.

Protip: Pre-fault memory by touching every page up front (or mmap() with MAP_POPULATE), then mlock() it so it stays resident. Cold faults are for amateurs.

Lesson 9: ptrace Teaches More Than Any Tutorial

"You want to learn how strcpy() works? ptrace it. Want to know what your shell is really doing? ptrace it."
– Mikhail, reverse engineer, hasn't read man pages since 2009

Forget the documentation. Follow the process. ptrace() lets you step into a running program's mind – instruction by instruction, register by register. It is the hacker microscope; gdb and strace are both built on top of it. A debugger is for figuring out why something broke. Raw ptrace() is for learning how it worked.

You'll see system calls in flight, stack frames mutate, signals delivered by angels. It's beautiful.

Protip: Wrap ptrace() into a script. Track your shell. Trace your compiler. Trace ls. You'll never write code the same way again.

Lesson 10: System Call Hoarding Is a Survival Skill

"You've got 100,000 clients? Better be using 10 syscalls total."
– Daria, kernel I/O hoarder, wrote a DNS server that ran with just epoll_wait()

Every syscall is a trapdoor. It could block. It could cost context switches. It could fail. Good hackers treat syscalls like gold – hoard them, batch them, avoid them.

Instead of thousands of read() calls, you batch with readv(). Instead of blocking on one fd at a time, you multiplex with epoll_wait(). Instead of copying through userspace, you splice().

The fewer syscalls you issue, the less the kernel touches you. That's how you scale. That's how you survive.

Protip: Profile with strace -c. If you see anything other than epoll_wait dominating, you've got work to do.

Bonus Insight: Systems Programming Is Time Travel

When you work at the syscall level, you're not writing instructions – you're writing timing contracts. You are dictating when memory is accessed, when context switches happen, when packets arrive, and when the CPU pauses.

Every line of code is a potential temporal distortion.

This is why system hackers are obsessed with time: nanoseconds, ticks, cycles. They don't write software – they compose latency symphonies.

Coming Up in Part III

  • How cache lines destroy performance silently
  • What really happens during fork()
  • Why the TLB is your worst-kept secret
  • Event-driven I/O done right
  • You don't need a mutex – until you do

Update: Part III is live!

If this was your kind of medicine, share it with someone still stuck in Python abstraction hell. And if you've got your own truths forged in bitrot and bootloaders, send them my way. Hacker Wisdom is a living archive – and you're all contributors.

Stay tuned for Part III.

Until then, stay aligned, stay paged in, and stay out of userland.

P.S. Want this printed as a TSC-timed manifesto, or compiled into a statically linked .txt? I got you.