By Simon Karpen, Technical Consultant
Background
TCP, or Transmission Control Protocol, is the primary layer-4 communication protocol for file copies, database synchronization, and related tasks over the WAN.
Many of these TCP synchronization tasks, such as database replication or rsync over ssh, depend heavily on single-stream TCP throughput to maintain the replication consistency and latency required to meet business objectives such as RTO (recovery time objective) and RPO (recovery point objective).
Non-Optimized Performance
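Without specific performance optimizations for high-bandwidth long-haul networks, single-stream TCP throughput is limited by network latency: a sender can have at most one window of unacknowledged data in flight per round trip.
If we assume a standard TCP window size of 64 kilobytes and a cross-country round-trip latency of 80 milliseconds, the maximum possible single-stream throughput with no packet loss is:
64 kilobytes / 80 milliseconds = approximately 6.5 megabits per second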
When TCP was designed in the 1970s, the fastest long-haul networks available had far less than 1 megabit per second of available bandwidth for all traffic, let alone 6, so this ceiling was not a design consideration.
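The window-limited ceiling described above is simply the window size divided by the round-trip time. The following is a minimal Python sketch of that arithmetic, assuming the 80 milliseconds is a round-trip figure and that a kilobyte is 1024 bytes; the function name is illustrative only.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on loss-free single-stream throughput, in bits per second."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    # At most one full window can be in flight per round trip.
    return window_bytes * 8 / rtt_seconds


window = 64 * 1024   # 64 kilobytes (assumed to mean 65536 bytes)
rtt = 0.080          # 80 milliseconds, assumed to be round-trip latency
print(f"{max_throughput_bps(window, rtt) / 1e6:.2f} Mbit/s")  # prints 6.55 Mbit/s
```

Doubling the latency halves the ceiling, which is why the same host pair can look fast on a LAN and slow across the country.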
Optimizing Performance — Window Scaling
IETF RFC 1323 (since obsoleted by RFC 7323) defines TCP window scaling as an optional feature to improve performance over “Long, Fat Networks”. These are networks with high throughput (generally in excess of 10 megabits per second) but also high latency (due to geographic distance and speed-of-light delays).
TCP window scaling, if enabled, allows the two hosts to use windows larger than 64 kilobytes, so the window can grow to match what the connection can carry. The window needed to fill a link is its bandwidth-delay product. Working backward, let’s say we need 50 megabits per second of throughput over a link with 100ms latency.
50 megabits per second * 100 milliseconds = 5 megabits = 625 kilobytes (roughly 610 KiB) TCP window
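As a check on the bandwidth-delay arithmetic, here is a small Python sketch. It assumes a megabit is 10^6 bits and that the 100 ms figure is round-trip latency. The second function estimates the smallest window scale shift that could represent such a window, based on the 16-bit window field and the maximum shift of 14 defined by the window scale option; both function names are illustrative.

```python
def required_window_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_seconds / 8


def min_window_scale_shift(window_bytes: float, max_shift: int = 14) -> int:
    """Smallest shift s such that a 16-bit window scaled by 2**s covers window_bytes."""
    for shift in range(max_shift + 1):
        if (65535 << shift) >= window_bytes:
            return shift
    raise ValueError("window too large for the TCP window scale option")


bdp = required_window_bytes(50e6, 0.100)   # 50 Mbit/s over 100 ms
shift = min_window_scale_shift(bdp)
print(f"{bdp:.0f} bytes ({bdp / 1024:.0f} KiB), window scale shift {shift}")
# prints: 625000 bytes (610 KiB), window scale shift 4
```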
While it is possible to hand-optimize this value for specific situations, the best practice is to enable automatic window scaling. On Linux, this is accomplished by:
- sysctl -w net.ipv4.tcp_window_scaling=1
- add net.ipv4.tcp_window_scaling=1 to /etc/sysctl.conf
Optimizing Performance — Selective Acknowledgement
Without selective acknowledgement, a single lost packet can result in the retransmission of an entire window's worth of packets. On long, fat networks such as a typical WAN, this can be a substantial amount of data.
To resolve this problem, TCP Selective Acknowledgement, defined in RFC 2018, allows the receiving system to acknowledge all packets received even if an intermediate packet has not been received. This results in retransmission of only the lost packets, instead of an entire window worth of data.
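The difference can be illustrated with a deliberately simplified toy model in Python. It is not how a real TCP stack schedules retransmissions. It assumes that, without selective acknowledgement, the sender resends everything from the first lost segment to the end of the window, and that with it only the missing segments are resent. The function name and segment numbering are hypothetical.

```python
def segments_to_retransmit(window, lost, sack_enabled):
    """Return the segment numbers a sender would resend in this toy model."""
    lost_in_window = sorted(s for s in window if s in lost)
    if not lost_in_window:
        return []
    if sack_enabled:
        # The receiver reported exactly which segments arrived,
        # so only the gaps are resent.
        return lost_in_window
    # Only the cumulative acknowledgement is known, so everything from
    # the first gap onward is resent (worst case in this model).
    first_lost = lost_in_window[0]
    return [s for s in window if s >= first_lost]


window = list(range(1, 11))   # ten segments in flight
lost = {3}                    # one segment dropped
print(len(segments_to_retransmit(window, lost, sack_enabled=False)))  # prints 8
print(len(segments_to_retransmit(window, lost, sack_enabled=True)))   # prints 1
```

The larger the window, the larger the gap between the two cases, which is why the two optimizations are usually enabled together.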
To enable selective acknowledgments on a Linux host:
- sysctl -w net.ipv4.tcp_sack=1
- add net.ipv4.tcp_sack=1 to /etc/sysctl.conf
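To confirm what a host is currently using, the settings can be read back. The Python sketch below relies on the Linux convention that a sysctl name maps to a file under /proc/sys with dots replaced by slashes. The helper names are illustrative, and on a system without /proc/sys the script simply reports the values as unavailable.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl name such as net.ipv4.tcp_sack to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name: str, root: str = "/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


for name in ("net.ipv4.tcp_window_scaling", "net.ipv4.tcp_sack"):
    value = read_sysctl(name)
    if value is None:
        print(f"{name}: unavailable on this system")
    else:
        state = "enabled" if value == "1" else "disabled"
        print(f"{name} = {value} ({state})")
```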
Before and After
This is a real-world example from a specific business unit of a leading Silicon Valley-based software company. It involved replication traffic between two data centers (one east coast, one west coast) over a link with 90ms latency.
Throughput before optimization: 5 MB/sec
Throughput after optimization: 66 MB/sec
Risks
TCP window scaling can lead to increased host memory consumption due to the larger send and receive buffers. This is generally not an issue with modern hardware with multiple gigabytes of RAM.
TCP window scaling and selective acknowledgments may not be supported by some older firewall implementations. This is generally not an issue with current products from top-tier network vendors such as Cisco and Juniper.