Maps, Space, and Other Metaphors for Metadata, by Carl Malamud and Dr. Marshall T. Rose


Carl Malamud currently collaborates with webchick at media.org. He was the founder of the Internet Multicasting Service and is the author of eight books.

Marshall T. Rose is Chief of Protocol at Invisible Worlds, Inc., where he is responsible both for the Blocks architecture and the server-side implementation. Rose lives with internetworking technologies, as a theorist, implementor, and agent provocateur. He formerly held the position of IETF Area Director for Network Management, one of a dozen individuals who oversee the Internet's standardization process.


Space as Better Metaphor

In 1998 (after several unsuccessful attempts to spin up the efforts), we formed Invisible Worlds and met several times with our Protocol Advisory Board. The Protocol Advisory Board provides advice and direction on the core specifications for the Blocks Protocol Suite. The Protocol Advisory Board is consensus driven, which means its advice is not necessarily the product of, or agreed to by, any particular member. Note that the participation of these people on the Protocol Advisory Board does not constitute or imply an endorsement by the members' respective employers, nor does their participation constitute an endorsement in any of their various official capacities. The members of the PAB at the time included David Clark, David Crocker, Paul Vixie, Paul Mockapetris, and Steve Deering.

We were determined to proceed on the production of an Internet Atlas, a large-scale effort to map the Internet.

But what does mapping the Internet mean? A few people accepted the proposition that one could start with a critical mass of information from network topology and use it as a bootstrap mechanism, but not everybody was convinced. What became clear was that the map was the wrong metaphor. And because the map was the wrong metaphor, we were solving the wrong problem. A map of a network topology assumes there is something to map, and everybody is going to want to map something different. The map assumes a space to be mapped. Space is the proper metaphor, and the map is one possible visualization of that space. Our first architectural principle thus became the late binding of the collection of data to the means of visualizing it.
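One way to picture this late binding is a small Python sketch in which records are collected once and the choice of view is deferred until display time. Everything here is hypothetical and invented for illustration (the record fields, the sample resources, and the `collect`, `render`, and `VIEWS` names); it is not the Blocks design, only a sketch of the principle.

```python
def collect():
    """Gather resource records without deciding how they will be shown.

    The records below are made-up sample data, not real topology.
    """
    return [
        {"name": "router-a", "kind": "router", "links": ["router-b"]},
        {"name": "router-b", "kind": "router", "links": ["router-a", "doc-1"]},
        {"name": "doc-1", "kind": "document", "links": []},
    ]


def as_listing(records):
    """One possible view: a flat, alphabetical listing of resources."""
    ordered = sorted(records, key=lambda r: r["name"])
    return "\n".join(f"{r['name']} ({r['kind']})" for r in ordered)


def as_edges(records):
    """Another possible view: the link structure, as a map would draw it."""
    return sorted((r["name"], dst) for r in records for dst in r["links"])


# The binding between data and visualization happens here, at view time.
VIEWS = {"listing": as_listing, "edges": as_edges}


def render(records, view):
    """Apply the view chosen by the user to already-collected records."""
    return VIEWS[view](records)
```

The same collected records can be handed to either view, and a new view can be added to `VIEWS` without touching `collect`.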

Once we realized we were looking at a data flow architecture with resources being discovered and then visualized in a variety of ways, it became equally clear that the problem we were dealing with was a more general problem, the management of metadata. Metadata defines a space and is the raw material that one uses to navigate that space.

Space as a metaphor proved quite powerful, with immediate applications to maps (or other navigation means) not only of network topology, but of spaces such as the web, or a particular collection of related information.

Our job became one of providing the user with what David Clark calls the "up" button. Given a resource on the Internet, say a document or a router, our job became that of giving the user the ability to hit an up button, take a step above, look around, and "see" what resources are nearby. According to Clark, our system should allow the user to define what "near" means in any given context.

Our canonical example of a space became what we call "deep wells" of information. Take the SEC's EDGAR system as the beginning of a space.

EDGAR is a constant flow of filings by public corporations that accumulates over time. The dimensions of the space are pieces of metadata that are common to many of these filings, such as the name of the filing corporation, the company's state of incorporation, the form type, or the company's Standard Industrial Classification.

In this space, an annual report (known as a 10-K) by Cisco might be "near" other objects on a variety of dimensions. An earlier 10-K by Cisco, other filings by Cisco, a 10-K by another company in the same SIC code, or a 10-K by a company in which Cisco has invested are all objects that are near the annual report in question.
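The idea of "near" along user-chosen dimensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Filing` record, its field names, the `near` function, and all sample values are hypothetical and are not real EDGAR data or any part of Blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """A filing reduced to a handful of metadata dimensions (hypothetical)."""
    company: str
    form_type: str
    sic_code: str
    state: str
    year: int


def near(target, filings, dimensions):
    """Return the filings that match `target` on every chosen dimension.

    The caller defines what "near" means by choosing the dimensions.
    """
    return [
        f for f in filings
        if f is not target
        and all(getattr(f, d) == getattr(target, d) for d in dimensions)
    ]


# Illustrative placeholder records, not real filing data.
FILINGS = [
    Filing("Cisco", "10-K", "3576", "CA", 1998),
    Filing("Cisco", "10-K", "3576", "CA", 1997),
    Filing("Cisco", "10-Q", "3576", "CA", 1998),
    Filing("ExampleNet", "10-K", "3576", "DE", 1998),
    Filing("ExampleFoods", "10-K", "2000", "NY", 1998),
]
```

Calling `near(FILINGS[0], FILINGS, ["company"])` gathers the other Cisco filings, while `near(FILINGS[0], FILINGS, ["sic_code", "form_type"])` gathers annual reports from the same industry code. Exact matching is the simplest possible notion of nearness; a fuller treatment would presumably allow graded distances.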

The deep wells of information form the same bootstrap mechanism that we had hoped to achieve with network topology. The EDGAR database has several hundred thousand documents that are rich in metadata.

One of the things we realized, however, is that resource discovery and data mining are difficult processes. Many different algorithms can be used to discover things, ranging from simple transformations based on regular expressions to complex linguistic analysis that determines the presence of certain kinds of business events (e.g., "any evidence of insolvency disclosed in this document").

A requirement for our application (and hence for the underlying protocol supporting the application) was that many different methods of resource discovery had to be able to coexist. Spaces may be defined through a simple process (e.g., taking each EDGAR document and creating some metadata), but it must also be possible for the spaces to accumulate over time. In particular, we wanted many different resource discovery mechanisms to coexist peacefully with each other and with human beings.
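A minimal sketch of this coexistence requirement, in Python, might treat each discovery mechanism as an independent function that contributes metadata to a shared space, so that results accumulate rather than overwrite one another. The function names, the `FORM TYPE:` header format, and the keyword check standing in for linguistic analysis are all assumptions made for illustration; none of them come from EDGAR or from the Blocks specifications.

```python
import re
from collections import defaultdict


def discover_form_type(text):
    """Simple regular-expression discovery (assumes a 'FORM TYPE:' header)."""
    m = re.search(r"FORM TYPE:\s*(\S+)", text)
    return {"form_type": m.group(1)} if m else {}


def discover_insolvency(text):
    """Crude keyword check, a stand-in for real linguistic analysis."""
    hit = re.search(r"\b(insolven\w*|bankruptcy)\b", text, re.IGNORECASE)
    return {"insolvency_mentioned": "yes"} if hit else {}


def accumulate(space, doc_id, text, discoverers):
    """Run each discoverer and add its findings to the document's metadata.

    Values are kept in sets, so independent mechanisms (or people) can
    contribute to the same document at different times without clobbering
    one another.
    """
    for discover in discoverers:
        for key, value in discover(text).items():
            space[doc_id][key].add(value)
    return space


def new_space():
    """An empty space: document id -> metadata key -> set of values."""
    return defaultdict(lambda: defaultdict(set))
```

For example, one pass might run `accumulate(space, "doc-1", text, [discover_form_type])` when a document arrives, and a later pass might run `discover_insolvency` over the same document; both contributions end up under `space["doc-1"]`.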

It became clear that we needed to begin defining some architectural principles. To illustrate why this "location" and "navigation" protocol called Blocks was needed, we'd been giving the example of the pain caused by search engines that returned 30,000 results on a simple query. Why were the search engines not solving this problem of navigation?
