The Self Operating Network
by Derick Winkworth
Software Defined Networking (SDN) has passed through its first major cycle. Many associated ideas have been consigned to the trash bin of history, while many others have either evolved or sprung anew. As a network engineer I really hoped, back in 2011, that SDN would address many of my day-to-day challenges. With very few exceptions, this did not happen. In fact, within the SDN movement there was open disregard for network engineers, who were often referred to as the “mainframe” engineers of the day. Six years later, most network engineers are still doing their jobs and SDN has not had an enormous impact on their day-to-day work. Interfaces and methods may have changed, but the complexity and day-to-day firefighting remain. As a network engineer, I must admit that much of this problem is our own fault. Let’s talk about some of our challenges.
It’s All In Our Head
Networking and Perl have a long history together. Unsurprisingly, networks are a lot like Perl scripts: they are often ‘write-only.’ In many circumstances, it can be difficult to understand what a network engineer was trying to accomplish when they implemented some part of their network. This is true even when the same engineer comes back some time later to a part of the network they themselves were responsible for implementing. Making changes in this kind of environment is just like making changes in a Perl script. The results can be unpredictable. When trying to solve one problem, you may introduce another. Unlike a Perl script, you don’t get to throw the network away when it becomes confusing and start from scratch.
This stems from two major issues. First, there is no succinct, common parlance for describing the end-to-end behavior of a network. Describing the network as a system in this manner is difficult enough that most of the time there is no attempt to do so. As a result, the network is usually documented only at the device and link level. Second, the documentation is highly inconsistent. It exists in numerous formats including diagrams, spreadsheets, Word documents, configuration guidelines, and so on. Individual documents are often missing, incomplete, out-of-date, or were never written at all.
Exceptions Are The Rule
In my 17 years of experience designing, building, and troubleshooting networks, I have noticed one persistent and insane behavior in the network engineering field. We have been trained, as a discipline, to establish standards for the networks we are responsible for. Once these standards are in place, we tell ourselves, the network will grow and behave in a predictable manner. When pressure is applied to deviate from these standards, we’ll fight and say “no” and life will go on as expected.
This, of course, has never happened in my career. If the standards are established at Day 0, then by Day 30 there are already numerous forces applying pressure to deviate from those standards. Of course business and application requirements can change over time, but most variation in the network comes from other kinds of constraints. To name just a few: power issues, cable plant issues, network device firmware issues, time constraints, budgetary constraints, and many times just plain politics. In spite of our collective experience telling us these things always happen, we react with shock and indignation when they occur. Over time every network grows organically and becomes more and more a collection of exceptions. Consistency within one network is impossible to achieve, let alone across many networks.
When we consider the combined effects of poor documentation (at the “end-to-end” service level as well as the device level) and the inevitability of variations in the network, then we, as network engineers, can see the source of much of our day-to-day stress. The signs are clear as day. We can never know what the outcome of a network change will be in production, so every change must be carefully controlled. We work many, many weekend hours. Operations teams often simply escalate issues to second and third tier support, who are usually the same people who designed and implemented that part of the network to begin with. When we make changes we schedule them at times when there will be the least impact, but paradoxically this also means we may not know what the impact is until application traffic is flowing during business hours again. And on, and on, and on…
The Self Operating Network
SDN taught us a lot of things, but for many people it didn’t address the core issues that network engineers face. Going forward, there needs to be a new vision that addresses these challenges. First, we need a way to succinctly and programmatically talk about, and ultimately document, what the network should be doing as a system. By capturing the end-to-end network services the business requires, we can avoid the “write-only” problem that comes from trying to capture these requirements via device and link-level documentation. The device and link-level details then derive naturally from the end-to-end service descriptions. This gives us a way to embrace, in a sane manner, the organic growth of the network and the variation that comes with it.
Further, from the description of these end-to-end services, we can derive an expectation model that the network can be validated against on a continuous basis. With this approach, we can evolve network automation from a workflow, task-oriented paradigm to an intent-driven closed-loop feedback system, further aiding network engineers in their day-to-day operations.
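To make the idea of an expectation model a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names ServiceIntent, Expectation, derive_expectations, and validate are hypothetical, the tier names and port are invented, and the observed state is a hand-written dictionary standing in for whatever a real system would collect from the network. It shows only the shape of the loop: describe the service end to end, derive what should be true, and compare that against what is observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceIntent:
    """Hypothetical end-to-end description of one network service."""
    name: str
    source: str
    destination: str
    port: int
    reachable: bool  # True: traffic must flow. False: traffic must be blocked.


@dataclass(frozen=True)
class Expectation:
    """One checkable statement derived from an intent."""
    source: str
    destination: str
    port: int
    reachable: bool


def derive_expectations(intents):
    """Turn service-level intents into individual expectations."""
    return [Expectation(i.source, i.destination, i.port, i.reachable)
            for i in intents]


def validate(expectations, observed):
    """Compare expectations against observed reachability.

    `observed` maps (source, destination, port) to True or False.
    A missing entry means the state is unknown and counts as a violation.
    Returns a list of (expectation, observed_value) pairs that disagree.
    """
    violations = []
    for e in expectations:
        actual = observed.get((e.source, e.destination, e.port))
        if actual != e.reachable:
            violations.append((e, actual))
    return violations


# Illustrative data only; none of these names come from a real network.
intents = [
    ServiceIntent("web-to-db", "web-tier", "db-tier", 5432, True),
    ServiceIntent("guest-isolation", "guest-wifi", "db-tier", 5432, False),
]
observed = {
    ("web-tier", "db-tier", 5432): True,
    ("guest-wifi", "db-tier", 5432): True,  # drift: should be blocked
}

violations = validate(derive_expectations(intents), observed)
for expectation, actual in violations:
    print(f"{expectation.source} -> {expectation.destination}:"
          f"{expectation.port} expected reachable={expectation.reachable}, "
          f"observed {actual}")
```

Run on a schedule or on every change, a check like this is the feedback half of the closed loop. What to do about a violation (alert, roll back, or remediate) is deliberately left out of the sketch.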
This is the Self Operating Network. This is the future of networking.