Networking
TCP
TCP connections are established with a three-way handshake (SYN, SYN-ACK, ACK)
Performance
TCP can provide a high rate of throughput even on a high-latency network, by using buffering and a sliding window. TCP also employs congestion control and a congestion window set by the sender, so that it can maintain a high but also reliable rate of transmission across different and varying networks. Congestion control avoids sending too many packets, which could cause congestion and a performance breakdown.
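The throughput ceiling imposed by the window and latency can be sketched with the bandwidth-delay product. A minimal calculation, using an assumed 100 ms RTT and the classic 64 KB maximum window (no window scaling):

```python
# Sliding-window throughput cap: at most one window of data can be
# unacknowledged per round trip, so throughput <= window / RTT.
# The RTT and window figures below are illustrative assumptions.
rtt_s = 0.100          # round-trip time: 100 ms (assumed)
window = 65_535        # classic max TCP window without window scaling

max_throughput_bps = window * 8 / rtt_s
print(f"{max_throughput_bps / 1e6:.2f} Mbit/s")  # ~5.24 Mbit/s
```

This is why window scaling (larger windows) matters so much on high-latency paths: halving the RTT or doubling the window each doubles the achievable throughput.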
The following is a summary of TCP performance features:
- Sliding window: This allows multiple packets up to the size of the window to be sent on the network before acknowledgements are received, providing high throughput even on high-latency networks. The size of the window is advertised by the receiver to indicate how many packets it is willing to receive at that time.
- Congestion avoidance: Prevents sending too much data and causing saturation, which can cause packet drops and degrade performance.
- Slow-start: Part of TCP congestion control, this begins with a small congestion window and then increases it as acknowledgements are received within a certain time. When they are not, the congestion window is reduced.
- Selective acknowledgements (SACKs): Allow TCP to acknowledge discontinuous packets, reducing the number of retransmits required.
- Fast retransmit: Instead of waiting on a timer, TCP can retransmit dropped packets based on the arrival of duplicate acks. These are a function of round-trip time and not the typically much slower timer.
- Fast recovery: This recovers TCP performance after a fast retransmit: the congestion window is halved and the connection continues in congestion avoidance, rather than dropping all the way back to slow-start.
- TCP Fast Open: Allows a client to include data in the SYN packet, so that server request processing can begin early instead of waiting for the handshake to complete (RFC 7413). This uses a cryptographic cookie to authenticate the client.
- TCP timestamps: Include a timestamp for sent packets that is returned in the ACK, allowing more accurate round-trip time measurement.
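The interplay of slow-start, congestion avoidance, and fast recovery above can be illustrated with a toy simulation. This is a rough Reno-style sketch in units of whole segments; the threshold and loss round are invented for illustration, and real implementations are far more involved:

```python
# Toy sketch of TCP congestion window growth (units: segments per RTT).
# ssthresh and loss_at are illustrative assumptions, not real defaults.
def simulate(rounds, ssthresh=16, loss_at=None):
    cwnd, history = 1, []
    for r in range(rounds):
        history.append(cwnd)
        if loss_at is not None and r == loss_at:
            # Fast recovery: halve the window and continue in
            # congestion avoidance (no return to slow-start).
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2          # slow-start: exponential growth per RTT
        else:
            cwnd += 1          # congestion avoidance: +1 segment per RTT
    return history

print(simulate(8))              # [1, 2, 4, 8, 16, 17, 18, 19]
print(simulate(8, loss_at=5))   # a loss round halves the window mid-flight
```

The exponential-then-linear shape is the classic "sawtooth" seen in congestion window plots.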
Congestion avoidance
Routers, switches, and hosts may drop packets when overwhelmed. There are several mechanisms to avoid these problems:
- Ethernet: Pause Frames
- IP: Explicit Congestion Notification (ECN) field
- TCP: Congestion Window
Jumbo Frames
Two factors have interfered with the adoption of jumbo frames: older hardware and misconfigured firewalls. Older hardware that does not support jumbo frames may either fragment the packet at the IP layer (incurring a performance cost for packet reassembly) or respond with an ICMP "Can't Fragment" error. Misconfigured firewalls block all ICMP traffic, a policy some administrators adopted in response to the "ping of death" attack; this also blocks the "Can't Fragment" messages that path MTU discovery depends on.
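The fragmentation cost mentioned above can be made concrete with a small calculation. A sketch, with illustrative figures, of a 9000-byte jumbo payload crossing a 1500-byte MTU link:

```python
import math

# IP fragmentation cost sketch; MTU, header size, and payload size
# are illustrative assumptions.
mtu, ip_header = 1500, 20
# Fragment offsets are expressed in 8-byte units, so payload per
# fragment is rounded down to a multiple of 8.
payload_per_frag = (mtu - ip_header) // 8 * 8
jumbo = 9000
frags = math.ceil(jumbo / payload_per_frag)
print(frags)  # 7 fragments, each with its own header and reassembly work
```

One jumbo payload becomes seven packets, each carrying header overhead, plus reassembly state and CPU cost at the receiver.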
Latency
Latency can occur at various layers of the HTTP request pipeline:
- DNS lookup Latency
- Connection Latency
- First-byte latency
- Round-trip time (network latency)
- Connection life span (keepalives, or lack thereof)
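Connection latency from the list above can be measured directly by timing the blocking connect call. A rough sketch against an in-process local listener (a real measurement would target the remote service, and would repeat the measurement many times):

```python
import socket
import time

# Measure TCP connection latency: time how long the three-way
# handshake takes. The local listener here is only for demonstration.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))     # OS-assigned ephemeral port
srv.listen(1)
host, port = srv.getsockname()

t0 = time.monotonic()
cli = socket.create_connection((host, port))  # blocks until handshake completes
connect_latency = time.monotonic() - t0

print(f"connect latency: {connect_latency * 1e6:.0f} us")
cli.close()
srv.close()
```

Note the handshake completes in the kernel before the server process calls accept, so this measures network plus kernel latency, not application latency.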
Buffering
TCP employs buffering, along with a sliding send window, to improve throughput. Network sockets also have buffers, and applications may also employ their own, to aggregate data before sending.
Buffering can also be performed by external network components, such as switches and routers, in an effort to improve their own throughput. Unfortunately, the use of large buffers on these components can lead to bufferbloat, where packets are queued for long intervals. This triggers TCP congestion avoidance on the hosts, which throttles performance. Features have been added to Linux 3.x kernels to address this problem (including byte queue limits, the CoDel queueing discipline, and TCP small queues).
The function of buffering may be best served by the endpoints - the hosts - and not the intermediate network nodes.
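The host-side socket buffers mentioned above can be inspected and tuned per socket. A minimal sketch; the values printed are whatever the local kernel chose as defaults, and the 1 MB request is an arbitrary example (Linux may double the requested value, and clamps it to system limits):

```python
import socket

# Inspect a socket's kernel send/receive buffer sizes.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"send buffer: {sndbuf} bytes, receive buffer: {rcvbuf} bytes")

# Request a larger send buffer; the kernel may adjust or clamp it.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1 << 20)
s.close()
```

Larger buffers help high bandwidth-delay-product paths at the cost of memory per connection.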
Connection Backlog
Another type of buffering is for the initial connection requests. TCP implements a backlog, where SYN requests can queue in the kernel before being accepted by the user-land processes. When there are too many TCP connection requests for the process to accept in time, the backlog reaches a limit and SYN packets are dropped, to be later retransmitted by the client. The retransmission of these packets causes latency for the client connect time. The limit is tunable: it is a parameter of the listen syscall, and the kernel may also provide system-wide limits.
Backlog drops and retransmits are indicators of host overload.
Connection Queues in the Linux Kernel
The kernel employs two connection queues to handle bursts of inbound connections:
- One for incomplete connections (the SYN backlog)
- One for established connections (the listen backlog)
Only one queue was used in earlier versions of the kernel, and it was vulnerable to SYN floods.
The use of SYN cookies bypasses the first queue: connection state is encoded in the server's sequence number, so a client returning a valid ACK has proven it completed the handshake without the server queuing any state.
The length of these queues can be tuned independently. The listen queue can also be set by the application as the backlog argument to the listen(2) syscall.
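The backlog argument from the text above is visible in the standard sockets API. A minimal sketch; note that `socket.SOMAXCONN` reflects a compile-time default, and on Linux the kernel clamps the requested backlog to the live `net.core.somaxconn` sysctl, which may differ:

```python
import socket

# The listen() backlog caps the established-connection queue.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(socket.SOMAXCONN)   # backlog: max queued established connections
print("listening with backlog", socket.SOMAXCONN)
srv.close()
```

Applications that accept slowly under load should raise both the backlog argument and the system-wide limit, or connections will see SYN drops and retransmit latency.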
Segmentation Offload
Network devices and networks accept packet sizes up to a maximum segment size (MSS) that may be as small as 1500 bytes. To avoid the network stack overhead of sending many small packets, Linux uses generic segmentation offload (GSO) to send packets up to 64 Kbytes in size ("super packets"), which are split into MSS-sized segments just before delivery to the network device. If the NIC and driver support TCP segmentation offload (TSO), GSO leaves the splitting to the device, further improving network stack throughput.
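The saving is easy to quantify: with GSO, one super packet traverses the network stack where dozens of MSS-sized packets otherwise would. A sketch with an assumed typical MSS:

```python
import math

# Per-packet stack-traversal savings from GSO/TSO. The MSS figure is
# an assumption: 1500 MTU - 40 bytes TCP/IP headers - 12 bytes of
# TCP timestamp options.
mss = 1448
super_packet = 64 * 1024
segments = math.ceil(super_packet / mss)
print(f"{segments} stack traversals collapse into 1 with GSO")
```

Each avoided traversal saves per-packet costs (locks, queueing, driver calls), which is why GSO/TSO matters most at 10 GbE and above.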
Tools:
netstat
ping
ip
ss
nicstat
tcplife
tcptop
tcpdump / wireshark
perf