TCP, Congestion Control, and Buffer Bloat

Cardwell, Neal, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. “BBR: Congestion-Based Congestion Control.” Queue 14, no. 5 (October 2016): 20–53. doi:10.1145/3012426.3022184.

Article available here
Slides available here

In the “old days,” packet loss was a major problem; so much so that just about every routing protocol includes a number of different mechanisms to ensure the reliable delivery of packets. For instance, in IS-IS, we have—

  1. Local reliability between peers using CSNPs and PSNPs
  2. On some links, a periodic check using CSNPs to ensure no packets were dropped
  3. Acknowledgements for packets on transmission
  4. Periodic timeouts and retransmissions of LSPs

It’s not that early protocol designers were dumb; it’s that packet loss really was this much of a problem. Congestion in the modern sense was not something you would even have thought of; memory was expensive, so buffers were necessarily small, and hence a packet would obviously be dropped rather than buffered for any amount of time. TCP’s retransmission mechanism, the parameters around the window size, and the slow start mechanism were all designed to react to packet drops. Further, it might seem obvious that any particular stream will get the most bandwidth if it uses the maximum available bandwidth, hence keeping the buffers full at every node along the path.
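The loss-driven behavior described here can be sketched in a few lines. This is an illustrative model of classic slow start plus additive-increase/multiplicative-decrease, not the code of any particular TCP implementation; the function name and the per-round-trip granularity are simplifications of mine:

```python
def step(cwnd, ssthresh, loss):
    """Advance a loss-based congestion window by one round trip.

    A drop is the ONLY congestion signal: the window grows until
    a loss is observed, then is cut in half.
    """
    if loss:
        # Multiplicative decrease: a drop is taken to mean congestion.
        ssthresh = max(cwnd / 2.0, 2.0)
        cwnd = ssthresh
    elif cwnd < ssthresh:
        # Slow start: the window doubles each round trip.
        cwnd *= 2.0
    else:
        # Congestion avoidance: add one segment per round trip.
        cwnd += 1.0
    return cwnd, ssthresh
```

Note what is missing from this model: there is no term for delay or for queue depth, so the sender has no way to notice congestion until a buffer somewhere has already overflowed.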

The problem is: all of these assumptions are wrong today. Buffers are cheap, and hence tend to be huge (probably not a good thing, actually), so packets tend to be buffered rather than dropped. Further, the idea that the best use of a link comes when a stream uses as much of it as possible has been proven wrong. So what is the solution?

One possible solution is to rebuild the TCP window size and slow start calculations around something other than packet drops. But what measures will produce the best results? The authors of this paper argue the correct measures are the round-trip delay across the entire path and the bandwidth of the smallest-bandwidth link in the path, which gives the mechanism its name: BBR, for Bottleneck Bandwidth and Round-trip propagation time. The thesis of the paper is that if the sender can estimate the delay across the path and the bandwidth of the bottleneck link, then it can transmit packets at a rate that just fills the lowest bandwidth link on the path, which is the actual maximum rate possible along this path. They illustrate the concept like this—

Mechanisms that focus on preventing packet drops assume the operational point with the highest throughput is at the right-hand vertical line, just where packets start dropping. The reality, however, is that the operational point with the highest throughput is at the left-hand vertical line, just where packets start being buffered. The authors have developed a new way of calculating not only the TCP window size, but also when to send each packet. Essentially, they use the round-trip time, along with an estimate of the bottleneck bandwidth along the path derived from the delivery rate of packets. The delivery rate is calculated from the rate at which packets are delivered, as witnessed by ACKs, over a particular time period.
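The two estimators the paper describes can be sketched roughly as follows: RTprop is the minimum RTT observed over a sliding window (samples with queueing delay can only be larger), and BtlBw is the maximum delivery rate over a sliding window (samples through a congested bottleneck can only be smaller). Their product is the bandwidth-delay product, the amount of data that just fills the pipe. The class name, window sizes, and units here are illustrative choices of mine, not the paper's exact constants:

```python
import collections

class BBREstimator:
    """A minimal sketch of BBR's two windowed-filter estimators."""

    def __init__(self, window=10):
        # Sliding windows of recent samples; BBR uses time-based
        # windows, simplified here to a fixed sample count.
        self.rtt_samples = collections.deque(maxlen=window)
        self.rate_samples = collections.deque(maxlen=window)

    def on_ack(self, rtt_s, bytes_acked, interval_s):
        """Record one ACK: the measured RTT, and the delivery rate
        (bytes acknowledged over the elapsed interval)."""
        self.rtt_samples.append(rtt_s)
        self.rate_samples.append(bytes_acked / interval_s)

    @property
    def rtprop(self):
        # Min RTT: the best estimate of pure propagation delay.
        return min(self.rtt_samples)

    @property
    def btlbw(self):
        # Max delivery rate: the best estimate of bottleneck bandwidth.
        return max(self.rate_samples)

    @property
    def bdp(self):
        # Bandwidth-delay product: the bytes that "fit in the pipe."
        return self.btlbw * self.rtprop
```

Given these two estimates, the sender paces its transmissions at `btlbw` and caps the data in flight near `bdp`, which is what keeps the operational point at the left-hand vertical line rather than the right.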

The result is a system of windowing and send rate that maximizes throughput while minimizing buffering along the path. The following figure, from the paper, shows the buffering along the path for the best known TCP windowing algorithm compared to BBR.

You can see that TCP adjusts its send rate so the queues on the slowest link fill, and then tries to overflow the link buffers every few seconds to ‘test’ for available buffer space. Since TCP is testing mainly for dropped packets, available buffer space on the slowest link appears to be available bandwidth. The green line represents the buffer utilization of BBR; the buffer is briefly filled in order to detect the available bandwidth, and then TCP with BBR drops back to using almost no buffer. As contrary as this might seem, the result is that TCP with BBR transfers data faster than even the most advanced windowing technique available today.
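BBR's periodic "fill then drain" behavior in the figure comes from cycling its pacing gain: most of the time the sender paces at its bottleneck estimate (gain 1.0), briefly paces above it (gain 1.25) to probe for newly available bandwidth, then below it (gain 0.75) to drain whatever queue the probe built. The 1.25/0.75 gains are the paper's; the cycle driver below is a simplification of mine:

```python
# Eight-phase pacing-gain cycle from the paper: one probe up,
# one drain down, six cruise phases; the average gain is 1.0,
# so over a full cycle no standing queue accumulates.
PACING_GAIN_CYCLE = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def pacing_rate(btlbw_bps, phase):
    """Pacing rate (bits/sec) for one phase of the gain cycle."""
    return btlbw_bps * PACING_GAIN_CYCLE[phase % len(PACING_GAIN_CYCLE)]
```

Contrast this with loss-based TCP's probing: BBR's probe lasts one phase and is immediately compensated by a drain phase, while a loss-based sender keeps pushing until the bottleneck buffer actually overflows.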

This research illustrates something that network engineers and application developers need to get used to—the network works better if we build networks and applications that work together. Rather than assuming the network’s job is simply not to drop packets, BBR takes a more intelligent direction, assuming that while the network needs to be able to handle microbursts, and not drop packets, it is the job of the application to properly figure out the network’s limits and try to live within them.

Interesting work, indeed; you should read the entire paper.

Reaction: Do we really need a new Internet?

The other day several of us were gathered in a conference room on the 17th floor of the LinkedIn building in San Francisco, looking out of the windows as we discussed various technical matters. All around us there were new buildings under construction, each with a tall tower crane anchored to the building in several places. We wondered how such a crane was built, and compared the apparent precision of the construction process to the complete mess that building a network seems to be.

And then, this week, I ran across a couple of articles arguing that we need a new Internet. For instance—

What we really have today is a Prototype Internet. It has shown us what is possible when we have a cheap and ubiquitous digital infrastructure. Everyone who uses it has had joyous moments when they have spoken to family far away, found a hot new lover, discovered their perfect house, or booked a wonderful holiday somewhere exotic. For this, we should be grateful and have no regrets. Yet we have not only learned about the possibilities, but also about the problems. The Prototype Internet is not fit for purpose for the safety-critical and socially sensitive types of uses we foresee in the future. It simply wasn’t designed with healthcare, transport or energy grids in mind, to the extent it was ‘designed’ at all. Every “circle of death” watching a video, or DDoS attack that takes a major website offline, is a reminder of this. What we have is an endless series of patches with ever growing unmanaged complexity, and this is not a stable foundation for the future. —CircleID

So the Internet is broken. Completely. We need a new one.


First, I’d like to point out that much of what people complain about in terms of the Internet, such as the lack of security, or the lack of privacy, are actually a matter of tradeoffs. You could choose a different set of tradeoffs, of course, but then you would get a different “Internet”—one that may not, in fact, support what we support today. Whether the things it would support would be better or worse, I cannot answer, but the entire concept of a “new Internet” that supports everything we want it to support in a way that has none of the flaws of the current one, and no new flaws we have not thought about before—this is simply impossible.

So let’s leave that idea aside, and think about some of the other complaints.

The Internet is not secure. Well, of course not. But that does not mean it needs to be this way. The reality is that security is a hot potato that application developers, network operators, and end users throw at one another, rather than something anyone tries to fix. Rather than considering each piece of the security puzzle, and thinking about how and where it might best be solved, application developers just build applications without security at all, and say “let the network fix it.” At the same time, network engineers say either “sure, I can give you perfect security, let me just install this firewall,” or “I don’t have anything to do with security, fix that in the application.” On the other end, users choose really horrible passwords, and blame the network for losing their credit card number, or say “just let me use my thumbprint,” without ever wondering where they are going to get a new one when their thumbprint has been compromised. Is this “fixable”? Sure, for some strong measure of security. But a “new Internet” isn’t going to fare any better than the current one unless people start talking to one another.

The Internet cannot scale. Well, that all depends on what you mean by “scale.” It seems pretty large to me, and it seems to be getting larger. The problem is that it is often harder to design in scaling than you might think. You often do not know what problems you are going to encounter until you actually encounter them. To think that we can just “apply some math,” and make the problem go away shows a complete lack of historical understanding. What you need to do is build in the flexibility that allows you to overcome scaling issues as they arise, rather than assuming you can “fix” the problem at the first point and not worry about it ever again. The “foundation” analogy does not really work here; when you are building a structure, you have a good idea of what it will be used for, and how large it will be. You do not build a building today and then say, “hey, let’s add a library on the 40th floor with a million books, and then three large swimming pools and a new eatery on those four new floors we decided to stick on the top.” The foundation limits scaling as well as ensures it; sometimes the foundation needs to be flexible, rather than fixed.

There have been too many protocol mistakes. Witness IPv6. Well, yes, there have been many protocol mistakes, IPv6 among them. But the problem with IPv6 is not that we didn’t need it, nor that there was no problem to solve, nor even that every decision made was a bad one. Rather, the problem with IPv6 is that the technical community became fixated on Network Address Translators, effectively designing an entire protocol around eliminating a single problem. Narrow fixations always result in bad engineering solutions; it’s just a fact of life. What IPv6 did get right includes eliminating in-network fragmentation, a larger address space, and a few other areas.

That IPv6 exists at all, and is actually being deployed, shows the entire problem with the “the Internet is broken” line of thinking. It shows that the foundations of the Internet are flexible enough to take on a new protocol, and to fix problems up in the higher layers. The original design worked, in fact: parts and pieces can be replaced if we get something wrong. This is more valuable than all the ironclad promises of a perfect future Internet anyone can make.

We are missing a layer. This argument is grounded in the RINA model, which I like; in fact, I use it in teaching networking far more than any other model, and consider the OSI model a historical curiosity, a byway that was probably useful for a time but is no longer. The RINA model, however, implies a fixed, even number of layers. The argument, boiled down to its essential point, is that since we have seven, we must be wrong.

The problem with the argument is twofold. First, sometimes six layers is right, and at other times eight might be. Second, we do have another layer in the Internet model; it’s just generally buried in the applications themselves. The network does not end with TCP, or even HTTP; it ends with the application. Applications often have their own flow control and error management embedded, if they need them. Some don’t, so exposing all those layers, and forcing every application to use them all, would actually be a waste of resources.

The Internet assumes a flawed model of end to end connectivity. Specifically, that the network will never drop packets. Well, TCP does assume this, but TCP isn’t the only transport protocol on the network. There is also something called UDP, and there are others out there as well (at least the last time I looked). It’s not that the network doesn’t provide lossier services; it’s that most application developers have availed themselves of the one reliable service, whether or not their specific application needs it.

The bottom line.

When I left San Francisco to fly home, 2nd Street was closed. Why? Because a piece of concrete had come loose on one of the buildings nearby, and seemed just about ready to fall to the street. On the way to the airport, the driver told me stories of several other buildings in the area that were problematic, some of which might need to be taken down and rebuilt. The image of the industrial building process, almost perfect every time, is an illusion. You can’t just “build a solid foundation” and then “build as high as you like.”

Sure, the Internet is broken. But anything we invent will, ultimately, be broken in some way or another. Sure, the IETF is broken, and so is open source, and so is… whatever we might invent next. We don’t need a new Internet; we need a little less ego, a lot less mudslinging, and a lot more communication. We don’t need the perfect fix; we need people who will seriously think about where the layers and lines are today, why they are there, and why and how we should change them. We don’t need grand designs; we need serious people who are seriously interested in working on fixing what we have, and people who are interested in being engineers, rather than console jockeys or system administrators.

Triangle Network Engineers

I’m speaking at the Triangle Network Engineers meeting on the 9th of March; click below to find out more.