TCP Congestion Handling



TCP Congestion Handling

Congestion by definition means intermediate devices – routers in this case – are overloaded. Meaning that TCP segments would not be delivered as fast as possible or they can be even dropped – which leads to retransmission.

Now, let’s suppose congestion dramatically increased on the internetwork, and there was no mechanism in place to handle congestion. Segments would be delayed or dropped, which would cause them to time out and be retransmitted. This would increase the amount of traffic on the internetwork between client/server. Furthermore, there might be thousands of TCP connections behaving similarly. Each would keep retransmitting more and more segments, increasing congestion further. Performance of the entire internetwork would decrease dramatically, resulting in a condition called congestion collapse.

Now TCP uses a number of mechanisms to achieve high performance and avoid congestion collapse. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger the collapse. They also yield an approximately max-min fair allocation between flows.

Acknowledgments for data sent, or lack of acknowledgments, are used by senders to infer network conditions between TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as congestion control and/or congestion avoidance.

Modern implementations of TCP contain 3 intertwined algorithms:

  • Slow Start
  • Congestion Avoidance
  • Fast Retransmit
  • Fast Recovery

(NOTE: The book states these mechanisms to RFC 2001, but it has evolved way more than that and now these mechanisms are stated in RFC 5681, which obsoletes RFC 2581 and which in turn obsoleted RFC 2001)

TCP Congestion Window


Is one of the factors that determine the number of bytes that can be outstanding at any given time. The congestion window is maintained by the sender.

The congestion window is a  means of stopping a link between the sender and the receiver from getting overloaded with too much traffic. It is calculated by estimating how much congestion there is between the two places.

When a connection is set up, the congestion window, a value maintained independently at each host, is set to a small multiple of the maximum segment size (MSS) allowed for that connection. Further variance in the congestion window is dictated by an Additive Increase/Multiplicative Decrease approach (I believe this was the case before Slow-Start which follows a more multiplicative increase).

This means that is all segments are received and the acknowledgments reach the sender on time, some constant is added to the window size. The window keeps growing exponentially until a timeout occurs or the receiver reaches its limit (a threshold value “ssthresh”). After this, the congestion window increases linearly at the rate of 1/congestion-window on each new acknowledgment received.

On timeout:

  1. Congestion Window is reset to 1 MSS (not sure if is a complete MSS or just the initial fraction said before)
  2. “ssthresh” is set to half the congestion widow size before packet loss started
  3. “slow start” is initiated.

An administrator may adjust the maximum window size limit, or adjust the constant added during additive increase, as part of TCP tuning.

The flow of data over a TCP connection is also controlled by the use of the TCP receive window. By comparing its own congestion window with the receive window of the receiver, a sender can determine how much data it may send at any given time.

Slow Start (RFC 5681)

Is part of the congestion control strategy used in TCP – it is used in conjunction with other algorithms. It is also known as the exponential growth phase.

Slow-start begins initially with a congestion window size (cwnd) of 1, 2 or 10. The value of the congestion window (CWND) will be increased with each acknowledgment (ACK) received, effectively doubling the window size each round trip time (“although is not exactly exponential because the receiver may delay its ACKs, typically sending one ACK every two segments that it receives”). The transmission rate will be increased with slow-start algorithm until either a loss is detected, or the receiver’s advertised window (rwnd or RCV.WND) is the limiting factor, or the slow-start threshold (ssthresh) is reached. If a loss event occurs, TCP assumes that it is due to network congestion and takes steps to reduce the offered load on the network. These measurements depend on the used TCP congestion avoidance algorithm. Once ssthresh is reached, TCP changes from slow-start algorithm to the linear growth (congestion avoidance) algorithm. At this point the window is increased by 1 segment for each RTT.

Although the strategy is referred to as “Slow-Start”, its congestion window growth is quite aggressive, more aggressive than the congestion avoidance phase. Before slow-start was introduced in TCP, the initial pre-congestion avoidance phase was even faster. (To do more digging- I think initially it was a linear approach – what we used to call diente de sierra – and was a much slower approach to fill the link capacity with data)

The behavior upon packet loss depends on the TCP congestion avoidance algorithm that is used:

TCP Tahoe: In here, when a loss occurs, fast retransmit is sent, half of the current CWND is saved as a slow start threshold (ssthresh) and slow start begins again from its initial CWND. Once the CWND reaches the ssthresh, TCP changes to congestion avoidance algorithm where each new ACK increases the CWND by: ss + (ss/cwnd)

This results in a linear increase of the CWND.

TCP Reno: This implements an algorithm called fast recovery. A fast retransmit is sent, half of the current CWND is saved as ssthresh and as a new CWND, thus skipping slow-start and going directly to congestion avoidance algorithm

Transmission Control Protocol (TCP) uses a network congestion-avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start and congestion window to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet.[1][2][3][4] Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

From <>