Reaction; Do we really need a new Internet?

The other day several of us were gathered in a conference room on the 17th floor of the LinkedIn building in San Francisco, looking out of the windows as we discussed some various technical matters. All around us, there were new buildings under construction, with that tall towering crane anchored to the building in several places. We wondered how that crane was built, and considered how precise the building process seemed to be to the complete mess building a network seems to be.

And then, this week, I ran across a couple of articles arguing that we need a new Internet. For instance—

What we really have today is a Prototype Internet. It has shown us what is possible when we have a cheap and ubiquitous digital infrastructure. Everyone who uses it has had joyous moments when they have spoken to family far away, found a hot new lover, discovered their perfect house, or booked a wonderful holiday somewhere exotic. For this, we should be grateful and have no regrets. Yet we have not only learned about the possibilities, but also about the problems. The Prototype Internet is not fit for purpose for the safety-critical and socially sensitive types of uses we foresee in the future. It simply wasn’t designed with healthcare, transport or energy grids in mind, to the extent it was ‘designed’ at all. Every “circle of death” watching a video, or DDoS attack that takes a major website offline, is a reminder of this. What we have is an endless series of patches with ever growing unmanaged complexity, and this is not a stable foundation for the future. —CircleID

So the Internet is broken. Completely. We need a new one.


First, I’d like to point out that much of what people complain about in terms of the Internet, such as the lack of security, or the lack of privacy, are actually a matter of tradeoffs. You could choose a different set of tradeoffs, of course, but then you would get a different “Internet”—one that may not, in fact, support what we support today. Whether the things it would support would be better or worse, I cannot answer, but the entire concept of a “new Internet” that supports everything we want it to support in a way that has none of the flaws of the current one, and no new flaws we have not thought about before—this is simply impossible.

So lets leave that idea aside, and think about some of the other complaints.

The Internet is not secure. Well, of course not. But that does not mean it needs to be this way. The reality is that security is a hot potato that application developers, network operators, and end users like to throw at one another, rather than something anyone tries to fix. Rather than considering each piece of the security puzzle, and thinking about how and where it might be best solved, application developers just build applications without security at all, and say “let the network fix it.” At the same time, network engineers say either: “sure, I can give you perfect security, let me just install this firewall,” or “I don’t have anything to do with security, fix that in the application.” On the other end, users choose really horrible passwords, and blame the network for losing their credit card number, or say “just let my use my thumbprint,” without ever wondering where they are going to go to get a new one when their thumbprint has been compromised. Is this “fixable?” sure, for some strong measure of security—but a “new Internet” isn’t going to fare any better than the current one unless people start talking to one another.

The Internet cannot scale. Well, that all depends on what you mean by “scale.” It seems pretty large to me, and it seems to be getting larger. The problem is that it is often harder to design in scaling than you might think. You often do not know what problems you are going to encounter until you actually encounter them. To think that we can just “apply some math,” and make the problem go away shows a complete lack of historical understanding. What you need to do is build in the flexibility that allows you to overcome scaling issues as they arise, rather than assuming you can “fix” the problem at the first point and not worry about it ever again. The “foundation” analogy does not really work here; when you are building a structure, you have a good idea of what it will be used for, and how large it will be. You do not build a building today and then say, “hey, let’s add a library on the 40th floor with a million books, and then three large swimming pools and a new eatery on those four new floors we decided to stick on the top.” The foundation limits scaling as well as ensures it; sometimes the foundation needs to be flexible, rather than fixed.

There have been too many protocol mistakes. Witness IPv6. Well, yes, there have been many protocol mistakes. For instance, IPv6. But the problem with IPv6 is not that we didn’t need it, not that there was not a problem, nor even that all bad decisions were made. Rather, the problem with IPv6 is the technical community became fixated on Network Address Translators, effectively designing an entire protocol around eliminating a single problem. Narrow fixations always result in bad engineering solutions—it’s just a fact of life. What IPv6 did get right was eliminating fragmentation, a larger address space, and a few other areas.

That IPv6 exists at all, and is even being deployed at all, shows just the entire problem with “the Internet is broken” line of thinking. It shows that the foundations of the Internet are flexible enough to take on a new protocol, and to fix problems up in the higher layers. The original design worked, in fact—parts and pieces can be replaced if we get something wrong. This is more valuable than all the iron clad promises of a perfect future Internet you can ever make.

We are missing a layer. This is grounded in the RINA model, which I like, and I actually use in teaching networking a lot more than any other model. In fact, I consider the OSI model a historical curiousity, a byway that was probably useful for a time, but is no longer useful. But the RINA model implies a fixed number of layers, in even numbers. The argument, boiled down to its essential point, is that since we have seven, we must be wrong.

The problem with the argument is twofold. First, sometimes six layers is right, and at other times eight might be. Second, we do have another layer in the Internet model; it’s just generally buried in the applications themselves. The network does not end with TCP, or even HTTP; it ends with the application. Applications often have their own flow control and error management embedded, if they need them. Some don’t, so exposing all those layers, and forcing every application to use them all, would actually be a waste of resources.

The Internet assumes a flawed model of end to end connectivity. Specifically, that the network will never drop packets. Well, TCP does assume this, but TCP isn’t the only transport protocol on the network. There is also something called “UDP,” and there are others out there as well (at least the last time I looked). It’s not that the network doesn’t provide more lossy services, it’s that most application developers have availed themselves of the one available service, no matter whether or not it is needed for their specific application.

The bottom line.

When I left San Francisco to fly home, 2nd street was closed. Why? Because a piece of concrete had come lose on one of the buildings nearby, and seemed to be just about ready to fall to the street. On the way to the airport, the driver told me stories of several other buildings in the area that were problematic, some that might need to be taken down and rebuilt. The image of the industrial building process, almost perfect every time, is an illusion. You can’t just “build a solid foundation” and then “build as high as you like.”

Sure, the Internet is broken. But anything we invent will, ultimately, be broken in some way or another. Sure the IETF is broken, and so is open source, and so is… whatever we might invent next. We don’t need a new Internet, we need a little less ego, a lot less mud slinging, and a lot more communication. We don’t need the perfect fix, we need people who will seriously think about where the layers and lines are today, why they are there, and why and how we should change them. We don’t need grand designs, we need serious people who are seriously interested in working on fixing what we have, and people who are interested in being engineers, rather than console jockeys or system administrators.

RTGWG Interim Meeting on Data Center Challenges

Last week, the Routing Area Working Group (IETF) held an interim meeting on challenges and (potential) solutions to large scale data center fabric design. I’ve filed this here because I spoke for all of about 3 minutes out of the entire meeting—but I really wanted to highlight this meeting, as it will be of interest to just about every network engineer “out there” who deals with data center design at all.

There are three key URLs for the interim

The agenda
The session slides and links to drafts presented
A Webex recording of the entire proceedings

My reaction, in general, is that we are starting to really understand the challenges in a networking way, rather than just as a coding problem, or a “wow, that’s really big.” I’m not certain we are heading down the right path in all areas; I am becoming more convinced than ever that the true path to scale is to layer the control plane in ways we are not doing today. You can see this in the LinkedIn presentation, which Shawn and I shared. I tend to think the move towards sucking every bit of state possible out of the control plane is a bad idea in the long run.

Specifically, the complexity model tells us that complexity forces us to trade off between state, (de)optimization, and surfaces (interaction surfaces, specifically). Our natural reaction, when we face complexity, is to either move entirely towards solutions that remove state, and ignoring layering (creating new interaction surfaces) as a solution, or to move entirely towards solutions that layer in order to split state up. We rarely think about using intertwined solutions that carefully attempt to choose specific points at which layering is used to hide information versus hiding information by simply abstracting it away. Optimization often comes out the loser; we end up pushing more complexity back into the network to re-optimize, or simply living with the de-optimized network.

Overall, this was a very interesting set of presentations, from some of the largest data center fabric operators in the world. It’s well worth listening to the whole thing.

Can I2RS Keep Up? (I2RS Performance)

What about I2RS performance?

The first post in this series provides a basic overview of I2RS; there I used a simple diagram to illustrate how I2RS interacts with the RIB—


One question that comes to mind when looking at a data flow like this (or rather should come to mind!) is what kind of performance this setup will provide. Before diving into the answer to this question, though, perhaps it’s important to ask a different question—what kind of performance do you really need? There are (at least) two distinct performance profiles in routing—the time it takes to initially start up a routing peer, and the time it takes to converge on a single topology and/or route change. In reality, this second profile can be further broken down into multiple profiles (with or without an equal cost path, with or without a loop free alternate, etc.), but for our purposes I’ll just deal with the two broad categories here.

If your first instinct is to say that initial convergence time doesn’t matter, go back and review the recent Delta Airlines outage carefully. If you are still not convinced initial convergence time matters, go back and reread what you can find about that outage. And then read about how Facebook shuts down entire data centers to learn what happens, and think about it some more. Keep thinking about it until you are convinced that initial convergence time really matters. 🙂 It’s a matter of “if,” not “when,” where major outages like this are concerned; if you think your network taking on the order of tens of minutes (or hours) to perform initial convergence so applications can start spinning back up is okay, then you’re just flat wrong.

How fast for initial convergence is fast enough? Let’s assume we have a moderately sized data center fabric, or larger network, with something on the order of 50,000 routes in the table. If your solution can install routes on the order of 8,000 routes in ten seconds in a lab test (as a recently tested system did), then you’re looking at around a minute to converge on 50,000 routes in a lab. I don’t know what the actual ratio is, but I’d guess the “real world” has at least a doubling effect on route convergence times, so two minutes. Are you okay with that?

To be honest, I’m not. I’d want something more like ten seconds to converge on 50,000 routes in the real world (not in a lab). Let’s think about what it takes to get there. In the image just above, working from a routing protocol (not an I2RS object), we’d need to do—

  • Receive the routing information
  • Calculate the best path(s)
  • Install the route into the RIB
  • The RIB needs to arbitrate between multiple best paths supplied by protocols
  • The RIB then collects the layer 2 header rewrite information
  • The RIB then installs the information into the FIB
  • The FIB, using magic, pushes the entry to the forwarding ASIC

standardsWhat is the point of examining this process? To realize that a single route install is not, in fact, a single operation performed by the RIB. Rather, there are several operations here, including potential callbacks from the RIB to the protocol (what happens when BGP installs a route for which the next hop isn’t available, but then becomes available later on, for instance?). The RIB, and any API between the RIB and the protocol, needs to operate at about 3 to 4 times the speed at which you expect to be able to actually install routes.

What does this mean for I2RS? To install, say, 50,000 routes in 10 seconds, there needs to be around 200,000 transactions in that 10 seconds, or about 20,000 transactions per second. Now, consider the following illustration of the entire data path the I2RS controller needs to feed routing information through—


For any route to be installed in the RIB from the I2RS controller, it must be:

  • Calculated based on current information
  • Marshalled, which includes pouring it into the YANG format, potentially pushed to JSON, and placed into a packet
  • Transported, which includes serialization delay, queuing, and the like
  • Unmarshalled, or rather locally copied from the YANG format into a format that can be installed into the RIB
  • Route arbitration and layer 2 rewrite information calculation performed
  • Any response, such as an “install successful,” or “route overridden” returned through the same process to the I2RS controller

It is, of course, possible to do all of this 20,000 times per second—especially with a lot of heavy optimization, etc., in a well designed/operated network. But not all networks operate under ideal conditions all the time, so perhaps replacing the entire control plane with a remote controller isn’t the best idea in the world.

Luckily, I2RS wasn’t designed to replace the entire control plane, but rather to augment it. To explain, the next post will begin considering some use cases where I2RS can be useful.

Tags: , , |

DHCP Topology Customization Options

The Dynamic Host Configuration Protocol (DHCP) is widely used, and yet poorly understood. There are, in fact, a large number of options in DHCP—but before we get to these, let’s do a quick review of basic off-segment operation.


When the client, which has no IP address, sends out a request for configuration information, what happens? The Router, A, if it is configured to be a DHCP helper, will receive the packet and forward it to the DHCP server, which is presumably located someplace else in the network. The router generally knows what address to send the request to because of manual configuration—but how does the server know how to get the packet back to the original requester?

The helper—Router A in this case—inserts the IP address of the interface on which the request was received into the giaddr field of the DHCP packet. As boring as this might seem, this is where things actually get pretty interesting. It’s possible, of course, for a router to have an logical layer three interface that sits on a bridge group (or perhaps an IRB interface). The router obviously needs to be able to put more information in the DHCP request to handle this sort of situation. Or perhaps the DHCP server needs information beyond a simple IP address to assign an IP address correctly—something like the topological location of the requesting host.

A number of additional DHCP fields have been proposed, standardized, and implemented to cover these situations. Until recently, though, you’ve always needed to go dig up each case and extension individually, because there was no single, up to date, reference anyplace.

Now there is.

This last week, an IETF draft called Customizing DHCP Configuration on the Basis of Network Topology passed through last call on its way to informational status (so it will soon have an RFC number, rather than just a draft name). This draft describes the documentation for, and use of DHCP options, including a relay agent running on a host, cascaded relays, regional configurations, and multiple subnets on the same link.

It’s well worth reading if you want to round out your knowledge of DHCP.

What should IETF “standard track” actually mean?

This post is going to be a little off the beaten path, but it might yet be useful for folks interested in the process of standardization through the IETF.

Last week, at the IETF in Buenos Aires, a proposal was put forward to move the IPv4 specifications to historic status. Geoff Huston, in his ISP column, points out the problem with this sort of thing—

As one commenter in the Working Group session pointed out, declaring IPv4 “Historic” would likely backfire and serve no better purpose other than exposing the IETF to ridicule. And certainly there is some merit in wondering why a standards body would take a protocol specification used by over 3 billion people, and by some estimated 10 billion devices each and every day and declare it to be “Historic”. In any other context such adoption figures for a technology would conventionally be called “outstandingly successful”!

The idea to push IPv4 to historic is, apparently, an attempt to move the market, in a sense. If it’s historic, then the market won’t use it, or will at least move away from it.


reaction-02Another, similar, line of thinking came up at the mic during a discussion around whether to move a particular set of specifications to standards track or experimental track. To be precise, the only requirement for standards track is the existence of two interoperable implementations—a hurdle over which the set of specifications under consideration have. The discussion at the mic centered around the commercial viability of the solution, which often also plays into the designation of a document as either a standard or an experimental.

The argument offered was that if these documents were taken to experimental status no-one would be serious about deploying them. The reality, on the other side, is that no matter what status this set of documents obtains, no-one is going to deploy the solution described in any widespread commercial system. No amount of tweaking and messing around is ever going to make this particular solution deployable; there are structural issues that cannot be overcome in any meaningful way forever.

So, what’s the point? There are two, actually.

First, one of the problems the IETF faces is the thought that somehow, in some way, if you build standards, they will come. The IETF, like most standards organizations (and most open source projects), vastly overrates their importance and power in the face of real commercial pressures. Again, this isn’t just a problem the IETF faces, it’s just a reality of all standards bodies and open source projects across all time, and probably in just about every field of endeavor.

Making a standard isn’t the same thing as making a commercial product. Making a commercial product isn’t the same as making a standard. Refer to the entire article by Geoff Huston on this point if you want more information about the reality of historical and active standards.

Second, when you’re reading IETF documents, don’t take the status of the document as a guide to the usefulness or adoption of a protocol or technology. Document status often doesn’t relate to the usefulness of a technology in the real world, but is rather the result of internal political struggles within the IETF itself. Another instance of this same thing is the wild warnings sometimes attached to specific individual informational drafts for no reason other than internal political considerations.

In other words, when you read an IETF document, engage your brain, rather than counting exclusively on the brains of others. Or perhaps: think for yourself.

By the way—before you rant, or say how broken the IETF is, I once worked for an engineering standards body that operates in a completely different area, and I have experience with a number of other standards bodies. Every standards body I’ve worked with, and every open source project I’ve looked at, has this same sort of problem. If you’re honest, your own IT shop has this sort of problem, as well. This is a human level problem, not a problem in the IETF specifically. Standards are politics in a very real sense. Remember that, and you’ll find the documents produced by any standards body—and projects developed in any open source environment—a little more useful in your daily work.

How the Internet Really Works

Way back in April of 2014, I started a series over on Packet Pushers called “How the Internet Really Works.” This is a long series, but well worth reading if you want to try and get a handle around how the different companies and organizations that make up the ecosystem of the ‘net actually do what they do.

DNS Lookups
The Business Side of DNS (1)
The Business Side of DNS (2)
Reverse Lookups and Whois
DNS Security
Provider Peering Types
Provider Peering and Revenue Streams (1)
Provider Peering and Revenue Streams (2)
Standards Bodies
IETF Organizational Structure
The IETF Draft Process
Reality at the Mic (Inside the IETF, Part 1)
Reality at the Mic (Inside the IETF, Part 2)
Reality at the Mic (Inside the IETF, Part 3)
Internet Exchange Points
That Big Number Database in the Sky (IANA)
NOG World (Network Operator Groups)
The Internet Society

The slides that go with this set of posts are available on slideshare, as well. This set is in Ericsson format, but I have older sets in “vendor neutral” formatting, and even cisco formatting (imagine that!).