"Every time I see someone using TCP_NODELAY, I ask: Have you met its misunderstood cousin TCP_CORK?"
– A Network Programmer Who Knows
In this post, I'm diving head-first into one of the lesser-known (but incredibly powerful) socket options in Linux: TCP_CORK. This is the kind of option you only learn about if you've stared at strace logs at 3 AM trying to shave milliseconds off a TCP stream, or if you've been burned by a response leaving the box as a string of tiny, half-empty segments because each write() went out on its own.
If you're the kind of person who enjoys memory-mapped files, splice(2), zero-copy TCP, or hacking the Linux TCP stack for fun – then oh boy, TCP_CORK is for you.
TL;DR
- TCP_CORK is a socket option specific to Linux.
- It holds back partially filled segments until you uncork the socket or enough data accumulates to fill a maximum-size segment (MSS); Linux also caps the wait at about 200 ms.
- It's a cousin of TCP_NODELAY, but with the opposite behavior.
- It's essential for high-performance network servers, static file servers, and anything needing write coalescing.
- Think of it as manual Nagle – but better.
Historical Context
To understand TCP_CORK, we must wind the clock back to the early days of TCP and Nagle's algorithm.
In the 1980s (RFC 896, 1984), John Nagle proposed an algorithm to reduce the number of small packets (tinygrams) on the network. He observed that many TCP applications were sending data a byte at a time (think a Telnet session sending every keystroke, or a one-byte write() in a loop), causing TCP to emit 41-byte packets carrying just 1 byte of data behind 40 bytes of TCP/IP headers.
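To put numbers on it: assuming 20-byte IPv4 and 20-byte TCP headers with no options (and ignoring link-layer framing), a single byte of payload travels in a 41-byte packet, while a full 1460-byte segment fills a 1500-byte Ethernet MTU:

```latex
\eta_{\text{tinygram}} = \frac{1}{1 + 20 + 20} = \frac{1}{41} \approx 2.4\%
\qquad
\eta_{\text{full}} = \frac{1460}{1460 + 20 + 20} = \frac{1460}{1500} \approx 97.3\%
```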
Enter: TCP_NODELAY
To combat that, TCP/IP stacks adopted Nagle's algorithm, which holds back small segments while earlier data is still unacknowledged, batching small writes until an ACK comes back. But sometimes you want to send data immediately: say, a game server or a trading app. So BSD sockets gave us TCP_NODELAY, which disables Nagle's buffering.
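Turning Nagle off is a one-liner per socket. Here's a minimal sketch with hypothetical helper names, set_nodelay() and get_nodelay(), that also reads the option back with getsockopt() so you can verify what the kernel actually has:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Disable Nagle's algorithm (on != 0) or re-enable it (on == 0).
   Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_nodelay(int fd, int on) {
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Call set_nodelay(fd, 1) right after accept() or connect(); the setting is per socket.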
Problem solved?
Not quite.
Now Enter: TCP_CORK (Linux-only)
The Linux kernel team saw a better way. What if you could intentionally delay sending packets until you knew the data was "done" – like when you're building an HTTP header + body?
That's the purpose of TCP_CORK.
It gives you manual control over when TCP pushes data out. Cork the socket, buffer as many writes as you like, and then uncork it to flush the data.
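In code, the cork/uncork cycle is just two setsockopt() calls on the same option. A small sketch, with hypothetical helper names tcp_cork(), tcp_uncork() and tcp_is_corked(); the getsockopt() query is handy when you're debugging who left a socket corked:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put the cork in: partial segments stay queued in the kernel. */
int tcp_cork(int fd) {
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Pull the cork out: everything queued so far is pushed to the network. */
int tcp_uncork(int fd) {
    int off = 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}

/* Returns 1 if the socket is corked, 0 if not, -1 on error. */
int tcp_is_corked(int fd) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The typical pattern is tcp_cork(fd), any number of write() calls, then tcp_uncork(fd) once the logical message is complete.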
Originally added in Linux 2.2 (around 1999), it's lived in obscurity ever since: a hidden gem for syscall nerds.
TCP_NODELAY vs TCP_CORK
Let's burn this into your brain.
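Option       | Effect                                                                  | Who decides when to flush?
------------ | ----------------------------------------------------------------------- | --------------------------
TCP_NODELAY  | Disables Nagle; each write goes out right away                          | The application does
TCP_CORK     | Holds partial segments until uncorked, an MSS fills up, or ~200 ms pass | The application does
Neither      | Nagle's algorithm batches small writes                                  | The kernel does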
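And no, they aren't strictly mutually exclusive in the Linux kernel: since Linux 2.5.71 you can set both. TCP_CORK overrides TCP_NODELAY while the cork is in, but setting TCP_NODELAY on a corked socket forces an immediate flush of pending output.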
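You can check how the two options interact on your own kernel. This sketch (hypothetical names tcp_flag() and set_both_and_report()) sets TCP_NODELAY and then TCP_CORK on a fresh TCP socket and reads both back; on a reasonably modern Linux kernel we'd expect both to report 1, meaning neither option silently clears the other:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read an int-valued IPPROTO_TCP option as 0/1; returns -1 on error. */
static int tcp_flag(int fd, int opt) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_TCP, opt, &val, &len) < 0)
        return -1;
    return val != 0;
}

/* Set TCP_NODELAY, then TCP_CORK, and report what the kernel says
   afterwards through *nodelay and *cork. Returns 0 on success. */
int set_both_and_report(int fd, int *nodelay, int *cork) {
    int on = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;
    *nodelay = tcp_flag(fd, TCP_NODELAY);
    *cork = tcp_flag(fd, TCP_CORK);
    return 0;
}
```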
Show Me the Code!
Let's get dirty.
Here's a simple server that uses TCP_CORK to batch multiple write() calls into a single TCP segment.
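#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
int main(void) {
    // Minimal listener on port 8080 that serves a single connection
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(listener, 16) < 0) {
        perror("bind/listen");
        return 1;
    }
    // TCP_CORK matters on the connected socket, so accept first
    int sockfd = accept(listener, NULL, NULL);
    if (sockfd < 0) {
        perror("accept");
        return 1;
    }
    // Enable TCP_CORK: partial segments now wait in the send queue
    int state = 1;
    setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
    // Write some data (lengths exclude the string literal's trailing '\0')
    write(sockfd, "HTTP/1.1 200 OK\r\n", 17);
    write(sockfd, "Content-Length: 12\r\n\r\n", 22);
    write(sockfd, "Hello world\n", 12);
    // Uncork - flush it out!
    state = 0;
    setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
    close(sockfd);
    close(listener);
    return 0;
}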
Without TCP_CORK, those write() calls could be sent as three separate TCP segments. With it, they're coalesced into one, assuming the total is under MSS (~1460 bytes typically).
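nginx does something similar: with the sendfile and tcp_nopush directives enabled, it sets TCP_CORK on Linux so the response header and the beginning of a file go out in one packet.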
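The three-write example generalizes into a helper that corks, queues any number of pieces, and always uncorks, even if a write fails. This is a sketch: send_corked(), write_all() and tcp_loopback_pair() are hypothetical names, and the loopback helper exists only so you can try it on 127.0.0.1 without a real client.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* write() may accept fewer bytes than asked; keep going until all are queued. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Cork, queue every part, uncork. The parts leave in as few segments as
   the MSS allows instead of potentially one segment per write(). */
int send_corked(int fd, const char *const parts[], size_t nparts) {
    int on = 1, off = 0, rc = 0;
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;
    for (size_t i = 0; i < nparts && rc == 0; i++)
        rc = write_all(fd, parts[i], strlen(parts[i]));
    /* Always uncork, even after a failed write, so nothing stays stuck. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical test helper: a connected TCP pair over 127.0.0.1 on an
   ephemeral port. Write on *client, read on *server. */
int tcp_loopback_pair(int *client, int *server) {
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) {
        close(lfd);
        return -1;
    }
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0 || connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (*client >= 0) close(*client);
        close(lfd);
        return -1;
    }
    *server = accept(lfd, NULL, NULL);
    close(lfd);
    return *server < 0 ? -1 : 0;
}
```

On an accepted socket you'd call send_corked(fd, parts, 3) with the header and body strings from the server example; writing through a small write_all() loop matters because a single write() on a socket is allowed to queue only part of the buffer.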
File Transfers, splice(), and Zero-Copy
Let's level up.
You can combine TCP_CORK with sendfile() or splice() to send large files without ever copying data to userspace. Like so:
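/* needs <fcntl.h>, <sys/stat.h>, <sys/sendfile.h>, <netinet/tcp.h> */
int in_fd = open("video.mp4", O_RDONLY);   /* the file is the input side */
struct stat st;
fstat(in_fd, &st);                         /* file size for sendfile() */
off_t offset = 0;
int cork = 1;
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
/* ...write() your response headers here; they wait in the corked queue... */
while (offset < st.st_size)                /* sendfile() may send less than asked */
    if (sendfile(sockfd, in_fd, &offset, st.st_size - offset) <= 0) break;
cork = 0;
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
close(in_fd);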
Boom. Your disk-to-TCP stack just got a turbo boost.
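Here's a slightly fuller version of the sendfile() pattern that also sends a response header first, which is where corking earns its keep: the header and the first file bytes can share a segment. It's a sketch under our own assumptions: send_file_corked() is a hypothetical name, and corking is treated as best effort, so a failed setsockopt() doesn't abort the transfer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>

/* Send `header`, then the whole file at `path`, over `sock`.
   Returns 0 on success, -1 on error. */
int send_file_corked(int sock, const char *header, const char *path) {
    int on = 1, off = 0, rc = 0;
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) return -1;
    struct stat st;
    if (fstat(in_fd, &st) < 0) { close(in_fd); return -1; }

    /* Best effort: corking is an optimization, so errors are ignored. */
    (void)setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));

    /* Header: ordinary write() calls, queued behind the cork. */
    const char *p = header;
    size_t left = strlen(header);
    while (rc == 0 && left > 0) {
        ssize_t n = write(sock, p, left);
        if (n > 0) { p += n; left -= (size_t)n; }
        else if (n < 0 && errno == EINTR) continue;
        else rc = -1;
    }

    /* Body: sendfile() moves the file without a userspace copy. */
    off_t offset = 0;
    while (rc == 0 && offset < st.st_size) {
        ssize_t n = sendfile(sock, in_fd, &offset, (size_t)(st.st_size - offset));
        if (n > 0) continue;                  /* sendfile() advanced offset */
        if (n < 0 && errno == EINTR) continue;
        rc = -1;                              /* error, or the file shrank */
    }

    /* Uncork so the final partial segment goes out now. */
    (void)setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
    close(in_fd);
    return rc;
}
```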
Kernel Dive: Why It Works
Under the hood, TCP_CORK sets the TCP_NAGLE_CORK bit in the nonagle field of the kernel's struct tcp_sock.
When corked:
1) TCP will not send partial segments.
2) TCP waits for either:
- Enough data to fill an MSS-sized segment
- Or you uncork
- Or the ~200 ms ceiling documented in tcp(7) expires
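This matters because write() boundaries mean nothing to TCP: without a cork, the stack may transmit whatever is queued right after each write(), even if that leaves a small, partially filled segment. TCP_CORK is how you tell the kernel that a logical message isn't finished yet.
Check out tcp_push_pending_frames() in the kernel source and follow it through tcp_write_xmit() down to tcp_nagle_check(), which literally tests nonagle & TCP_NAGLE_CORK.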
If you're serious, read net/ipv4/tcp_output.c.
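To make the rules concrete, here's a toy user-space model of the decision, not kernel code: it ignores Nagle, congestion and receive windows, and treats the ~200 ms ceiling from tcp(7) as a simple timer. The struct, field and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define CORK_CEILING_MS 200u  /* ceiling described in tcp(7) */

/* Simplified view of one socket's send side. */
struct cork_state {
    bool     corked;     /* TCP_CORK currently set */
    size_t   queued;     /* bytes waiting in the send queue */
    size_t   mss;        /* current maximum segment size (assumed > 0) */
    unsigned corked_ms;  /* how long the partial data has waited */
};

/* How many bytes may go out right now under the model's rules:
   - uncorked: everything queued;
   - corked: only whole MSS-sized segments...
   - ...unless the ceiling has been reached, then everything. */
size_t sendable_bytes(const struct cork_state *s) {
    if (!s->corked || s->corked_ms >= CORK_CEILING_MS)
        return s->queued;
    if (s->mss == 0)
        return 0;
    return (s->queued / s->mss) * s->mss;
}
```

For example, with 3000 bytes queued behind a cork and an MSS of 1460, the model lets two full segments (2920 bytes) leave and keeps the last 80 bytes waiting.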
Gotchas
- Corking only affects outgoing data.
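- Works only on TCP sockets (SOCK_STREAM over IPv4 or IPv6); UDP and Unix-domain sockets reject the option.
- Uncorking only flushes what's already queued; on an empty send queue it does nothing.
- Combining it with TCP_NODELAY is legal but subtle: the cork wins while it's in, yet setting TCP_NODELAY on a corked socket flushes pending output and can undo your batching. That leads to sadness.
- If you close() a corked socket, the queued data is still sent before the connection closes.
- The cork isn't indefinite: Linux sends corked data anyway after about 200 ms.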
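The "TCP sockets only" gotcha is easy to check for yourself. This hypothetical probe, cork_supported(), corks and immediately uncorks a fresh socket of the given kind; on Linux we'd expect a TCP socket to accept the option and UDP or Unix-domain sockets to reject it.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a new socket of this domain/type accepts TCP_CORK, else 0. */
int cork_supported(int domain, int type) {
    int fd = socket(domain, type, 0);
    if (fd < 0) return 0;
    int on = 1, off = 0;
    int ok = setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) == 0 &&
             setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) == 0;
    close(fd);
    return ok;
}
```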
Benchmarking TCP_CORK vs TCP_NODELAY
Want to see the performance difference?
Try this:
1) Set up a basic server.
2) Benchmark using ApacheBench or wrk.
3) Compare:
- multiple write()s with TCP_NODELAY
- same writes with TCP_CORK
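In many scenarios, you should see TCP_CORK give you:
- Fewer packets on the wire
- Less per-packet overhead (headers, ACKs, interrupts), even though corking itself costs two extra setsockopt() calls per response
- Better throughput for responses assembled from several small writes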
Especially noticeable on HTTP/1.0 servers or non-chunked HTTP/1.1.
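For step 3, here's a minimal sketch of the per-socket setup for each variant. The enum, function names and mode strings are our own hypothetical choices; the idea is that your test server reads the mode from the command line, calls configure_mode() on each accepted socket, and issues the same sequence of write()s in every run, so only the socket options differ.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* The three configurations the benchmark compares. */
enum send_mode { MODE_NAGLE, MODE_NODELAY, MODE_CORK };

/* Configure a freshly accepted socket for one benchmark run.
   In MODE_CORK the caller uncorks after the last write() of each
   response (and re-corks before the next one on keep-alive sockets). */
int configure_mode(int fd, enum send_mode mode) {
    int nodelay = (mode == MODE_NODELAY);
    int cork = (mode == MODE_CORK);
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
}

/* Parse a mode from a word such as argv[1]; returns -1 if unknown. */
int parse_mode(const char *s) {
    if (strcmp(s, "nagle") == 0)   return MODE_NAGLE;
    if (strcmp(s, "nodelay") == 0) return MODE_NODELAY;
    if (strcmp(s, "cork") == 0)    return MODE_CORK;
    return -1;
}
```

Capturing each run in Wireshark or tcpdump alongside the wrk numbers shows whether the packet count actually dropped.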
Use Cases
- High-performance HTTP servers (nginx, lighttpd)
- Streaming media servers
- Static file servers using sendfile()
- Any app where you construct protocol responses in chunks
- Custom protocol stacks in C/C++
Next-Level Thought
Here's something twisted: you can use TCP_CORK as a poor man's batching primitive.
Let's say you want to delay a TCP response for 10ms to coalesce more data:
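int on = 1, off = 0;
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
/* only batches writes made by other threads/handlers during this window */
usleep(10000); // 10ms delay (needs <unistd.h>), well under the ~200 ms cork ceiling
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));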
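Not elegant, and it deliberately adds up to 10 ms of latency per response, so keep it for throughput-bound paths rather than latency-critical ones. But it works, and every trick counts.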
Final Thoughts
TCP_CORK is one of those options that you never use until you need it – and then you realize it's essential. It's the surgical tool for when you care deeply about TCP performance and packet boundaries.
When you have time:
- Go grep your favorite web server for TCP_CORK.
- Write a benchmark.
- Open Wireshark and compare.
Once you learn to cork, you never go back.
Further Reading
- Linux man page: tcp(7)
- Linux Kernel source: net/ipv4/tcp_output.c
- Nginx source code (look for ngx_linux_sendfile_chain)
Closing
If you want a deep dive into the TCP stack, socket programming at the syscall level, or optimizing servers with Linux internals, follow me here or shoot me a message.
Would you like me to write a follow-up on TCP_NODELAY, SO_RCVLOWAT, or maybe TCP_QUICKACK?
Let's dig even deeper.