On the Design of Application Protocols
Marshall T. Rose is Chief of Protocol at Invisible Worlds, Inc., where he is responsible both for the Blocks architecture and the server-side implementation.
Rose lives with internetworking technologies, as a theorist, implementor, and agent provocateur.
He formerly held the position of IETF Area Director for Network Management, one of a dozen individuals who oversee the Internet's standardization process.
A Problem 19 Years in the Making
SMTP[1] is close to being the perfect application protocol: it
solves a large, important problem in a minimalist way. It's simple
enough for an entry-level implementation to fit on one or two
screens of code, and flexible enough to form the basis of very
powerful product offerings in a robust and competitive market.
Modulo a few oddities (e.g., SAML), the design is well conceived and
the resulting specification is well-written and largely
self-contained. There is very little about good application protocol
design that you can't learn by reading the SMTP specification.
Unfortunately, there's one little problem: SMTP was originally
published in 1981 and since that time, a lot of application
protocols have been designed for the Internet, but there hasn't been
a lot of reuse going on. You might expect this if the application
protocols were all radically different, but this isn't the case:
most are surprisingly similar in their functional behavior, even
though the actual details vary considerably.
In late 1998, as Carl Malamud and I were sitting down to review the
Blocks architecture[2],
we realized that we needed to have a protocol for
exchanging Blocks. The conventional wisdom is that when you need an
application protocol, there are three ways to proceed:
- find an existing exchange protocol that (more or less) does what you want;
- define an exchange model on top of the world-wide web infrastructure that (more or less) does what you want; or,
- define a new protocol from scratch that does exactly what you want.
An engineer can make reasoned arguments about the merits of each of
the three approaches. Here's the process we followed...
The most appealing option is to find an existing protocol and use
that. (In other words, we'd rather "buy" than "make".) So, we did a
survey of many existing application protocols and found that none of
them were a good match for the semantics of the protocol we needed.
For example, most application protocols are oriented toward
client-server behavior, and emphasize the client pulling data from
the server; in contrast, with Blocks, a client usually pulls data
from the server, but it also may request the server to
asynchronously push (new) data to it. Clearly, we could mutate a
protocol such as FTP[3]
or SMTP into what we wanted, but by the time
we did all that, the base protocol and our protocol would have more
differences than similarities. In other words, the cost of modifying
an off-the-shelf implementation becomes comparable with starting
from scratch.
Another approach is to use HTTP[4] as the exchange protocol and
define the rules for data exchange over that. For example,
the Internet Printing Protocol (IPP)[5] uses this approach. The
basic idea is that HTTP defines the rules for exchanging data and
then you define the data's syntax and semantics. Because you inherit
the entire HTTP infrastructure (e.g., HTTP's authentication
mechanisms, caching proxies, and so on), there's less for you to
have to invent (and code!). Or, conversely, you might view the HTTP
infrastructure as too helpful. As an added bonus, if you decide that
your protocol runs over port 80, you may be able to sneak your
traffic past older firewalls, at the cost of port 80 saturation.
HTTP has a lot of strengths: for example, it uses MIME[6] for
encoding data and is ubiquitously implemented. Unfortunately for us,
even with HTTP 1.1[7],
there still wasn't a good fit. As a
consequence of the highly-desirable goal of maintaining compatibility
with the original HTTP, HTTP's framing mechanism isn't flexible
enough to support server-side asynchronous behavior and its
authentication model isn't similar to that of other Internet applications.
In addition, we weren't of a mind to play games with port 80.
So, this left us the final alternative: defining a protocol from
scratch. However, we figured that our requirements, while a little
more stringent than most, could fit inside a framework suitable for
a large number of future application protocols. The trick is to
avoid the kitchen-sink approach.
(Dave Clark has a saying: "One
of the roles of architecture is to tell you what you can't do.")
You can Solve Any Problem...
...if you're willing to make the problem small enough.
Our most important step is to limit the problem to application
protocols that exhibit certain features:
- they are connection-oriented;
- they use requests and responses to exchange messages; and,
- they allow for asynchronous message exchange.
Let's look at each,
in turn.
First, we're only going to consider connection-oriented application
protocols (those that work on top of TCP[8]).
Another branch in the taxonomy, connectionless, consists of those
protocols that don't want the delay or overhead of establishing and
maintaining a reliable stream. For
example, most DNS[9] traffic is characterized by a single request
and response, both of which fit within a single IP datagram. In this
case, it makes sense to implement a basic reliability service above
the transport layer in the application protocol itself.
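To make that concrete, here's a minimal sketch (in Python, against a hypothetical echo-style service rather than real DNS) of what "a basic reliability service above the transport layer" amounts to: one datagram out, one datagram back, with the application itself supplying the timeout and retransmission that a TCP connection would otherwise provide.

    import socket

    def reliable_query(server, port, request, retries=3, timeout=2.0):
        # Send one datagram and wait for one datagram in reply; if either
        # is lost, retransmit.  The reliability lives in the application,
        # not in the (connectionless) transport.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            for _ in range(retries):
                sock.sendto(request, (server, port))
                try:
                    reply, _addr = sock.recvfrom(512)  # fits in one datagram
                    return reply
                except socket.timeout:
                    continue                           # lost: try again
            raise TimeoutError("no reply after %d attempts" % retries)
        finally:
            sock.close()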
Second, we're only going to consider message-oriented application
protocols. A "message"
in our lexicon
is simply structured
data exchanged between loosely-coupled systems. Another branch in
the taxonomy, tightly-coupled systems, uses remote procedure calls
as the exchange paradigm. Unlike the
connection-oriented/connectionless dichotomy, the distinction between
loosely- and tightly-coupled systems is better viewed as a continuous
spectrum. Fortunately, the edges of that spectrum are fairly sharp.
For example,
NFS[10] is a tightly-coupled system using RPCs. When
running in a properly-configured LAN, a remote disk accessible via
NFS is virtually indistinguishable from a local disk. To achieve
this, tightly-coupled systems are highly concerned with issues of
latency. Hence, most (but not all) tightly-coupled systems use
connectionless RPC mechanisms; further, most tend to be implemented
as operating system functions rather than user-level programs. (In
some environments, the tightly-coupled systems are implemented as
single-purpose servers, on hardware specifically optimized for that
one function.)
Finally, we're going to consider the needs of application protocols
that exchange messages asynchronously. The classic client-server
model is that the client sends a request and the server sends a
response. If you think of requests as "questions" and responses as
"answers", then the server answers only those questions that it's
asked and it never asks any questions of its own. We'll need to
support a more general model, peer-to-peer. In this model, for a
given transaction one peer might be the "client" and the other the
"server", but for the next transaction, the two peers might switch
roles.
It turns out that the client-server model is a proper subset of the
peer-to-peer model: it's acceptable for a particular application
protocol to dictate that the peer that establishes the connection
always acts as the client (initiating requests), and that the peer
that listens for incoming connections always acts as the server
(issuing responses to requests).
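To illustrate the difference (this is only an illustration, not the protocol we ended up defining), the sketch below shows one way to structure a peer-to-peer exchange in Python: both peers run the same read loop over a connected socket, answering requests as they arrive and matching responses to requests they issued, so "client" and "server" are roles a peer plays for a single transaction rather than for the life of the connection. The line-delimited JSON framing and the field names are invented for the example.

    import json
    import threading

    class Peer:
        # Wraps an already-connected TCP socket.  Either side may issue a
        # request at any time; both sides run the same read loop.
        def __init__(self, conn):
            self.conn = conn
            self.next_id = 0
            self.waiting = {}   # request id -> Event set when the reply arrives
            self.replies = {}   # request id -> payload of the reply
            threading.Thread(target=self._read_loop, daemon=True).start()

        def request(self, payload):
            # Act as the "client" for this one transaction.
            self.next_id += 1
            msg_id = self.next_id
            done = threading.Event()
            self.waiting[msg_id] = done
            self._send({"type": "request", "id": msg_id, "payload": payload})
            done.wait()
            return self.replies.pop(msg_id)

        def handle(self, payload):
            # Act as the "server" for this one transaction; a real peer
            # would override this with its application semantics.
            return {"echo": payload}

        def _send(self, obj):
            self.conn.sendall((json.dumps(obj) + "\n").encode())

        def _read_loop(self):
            buf = b""
            while True:
                data = self.conn.recv(4096)
                if not data:
                    break
                buf += data
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    msg = json.loads(line)
                    if msg["type"] == "request":
                        reply = self.handle(msg["payload"])
                        self._send({"type": "response", "id": msg["id"],
                                    "payload": reply})
                    else:
                        self.replies[msg["id"]] = msg["payload"]
                        self.waiting.pop(msg["id"]).set()

The strict client-server model falls out as the degenerate case: the listening peer simply never calls request(), and the initiating peer's handle() is never exercised.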
There are quite a few existing application domains that don't fit
our requirements, e.g., nameservice (via the DNS), fileservice (via
NFS), multicast-enabled applications such as distributed video
conferencing, and so on. However, there are a lot of application
domains that do fit these requirements, e.g., electronic mail, file
transfer, remote shell, and the world-wide web. So, the bet we are
placing in going forward is that there will continue to be reasons
for defining protocols that fit within our framework.
Copyright © 1999, 2000 media.org.