Distributing Search
The modern search engine (indeed, any portal, vortal, or other buzzword denoting a large amount of information served on-line) is in a sense the classic centralized service. Crawling the net for keywords, indexing and searching the keyword database, and preparing the results as a web page are all bundled together in a single proprietary, centralized solution (centralized in the sense of administrative boundaries, not in the number of computers needed to run any one service such as Yahoo).
While Google, Yahoo, and Altavista are each complete solutions, there is no interoperability among them. Only through cheap hacks can one ask a question of both Google and Altavista and then combine the results. Still, it is clear that each of these services has created a space, a view of the underlying network.
The lack of interoperability among search engines and portals was certainly one issue, but these solutions also missed some of the things we considered important, particularly programmatic access and late binding to visualization. While one can perform cheap hacks to access Google programmatically, this is hardly a satisfactory solution (indeed, if you send too many such programmatically constructed queries, the system is likely to start refusing them). And the results come back only in the format the portal decides is appropriate: a banner ad, a few results, some formatting the portal decided looks good.
We took the modern search engine and asked ourselves how we could chop the monolithic application into several pieces, each specializing in a specific task. We arrived at three pieces, each focusing on one part of the puzzle (a minimal sketch of the three roles follows the list):
- Finding things on the net, a process we call mixing
- Managing that metadata, the function of a traditional server
- Preparing that metadata for a particular application and user interface, a process we call building
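To make the decomposition concrete, here is a minimal sketch of the three roles and the narrow interfaces between them. The sketch is in Python, and every name in it is ours, chosen for illustration; the Architectural Precepts document referenced below defines the actual interfaces.

```python
# Hypothetical interface sketch of the mixer/server/builder decomposition.
# All names here are illustrative, not taken from the Architectural Precepts.
from typing import Iterable, List, Protocol


class Record(dict):
    """A bag of metadata about one resource (e.g. url, title, keywords)."""


class Mixer(Protocol):
    def mix(self) -> Iterable[Record]:
        """Find things on the net and emit metadata records."""


class Server(Protocol):
    def store(self, records: Iterable[Record]) -> None:
        """Index incoming metadata from any mixer."""

    def search(self, query: str) -> List[Record]:
        """Answer a query over the indexed metadata."""


class Builder(Protocol):
    def build(self, records: List[Record]) -> str:
        """Prepare search results for one application and user interface."""
```

The point of the sketch is the shape, not the signatures: mixers only produce records, builders only consume them, and the server is the one thing both sides must agree on.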
While the process of mixing can be achieved by a global web crawler (indeed, a global bot is a mixer), our philosophy, and hence the software we've built, focuses on more limited, specialized crawling inside deep wells and other specifically targeted resources. While leading-edge, all-inclusive algorithms that read every word on the Web are certainly an honorable pursuit, we also wanted to make sure that our architecture would leave room for more specialized agents under the control of domain experts. We wanted these specialized mixers to be easy to make.
A mixer, ideally, should coexist with many other mixers to create a space. Each mixer, focusing on a few tasks of limited scope, contributes a set of resources to the whole. Since mixers can extract metadata not only from the underlying network but also from the information produced by other mixers, the process becomes incremental. If today's big five search engines can each index only 20-30% of the network, our vision is of a million little mixers, each examining 1%.
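As a hypothetical illustration of that incremental composition, one mixer below crawls a single deep well (a stand-in RFC archive) while a second mixer enriches the first one's output without touching the network at all; every URL and function name here is invented for the example.

```python
# Hypothetical example of composing mixers: the second mixer feeds on
# the first one's metadata rather than re-crawling the net.
from typing import Iterable, Iterator


def rfc_mixer() -> Iterator[dict]:
    """A specialized mixer that knows about exactly one deep well."""
    for number in (821, 822):  # stand-in for a real crawl of an archive
        yield {"url": f"http://example.org/rfc{number}.txt",
               "title": f"RFC {number}"}


def keyword_mixer(upstream: Iterable[dict]) -> Iterator[dict]:
    """A mixer that mixes other mixers' output, adding keyword metadata."""
    for record in upstream:
        record["keywords"] = record["title"].lower().split()
        yield record


for record in keyword_mixer(rfc_mixer()):
    print(record)
```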
While the mixers are specialized modules, we deliberately left out any specification of how a mixer finds things; those are implementation details. The server, on the other hand, clearly needs to sit at the waist of an hourglass, with a very simple, fast, and well-defined interface to the mixers.
One of our goals for the server was to use indexing and search technology, such as SQL databases and full-text engines, that has become a commodity. Rather than writing our own, our goal was to have the server use these commodities as its engine and to hide the details of any one commodity from the mixer.
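Here is a sketch of that idea, with SQLite's FTS5 full-text module standing in for the commodity engine; FTS5 is our assumption for the example, and any SQL or full-text product could sit behind the same two methods.

```python
# Hypothetical server facade: the commodity engine (here SQLite FTS5,
# assuming a build that includes it) is hidden behind store/search.
import sqlite3


class MetadataServer:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS meta "
            "USING fts5(url, title, keywords)")

    def store(self, records) -> None:
        """The mixer-facing half: accept metadata, never expose SQL."""
        self.db.executemany(
            "INSERT INTO meta (url, title, keywords) VALUES (?, ?, ?)",
            [(r["url"], r["title"], " ".join(r.get("keywords", [])))
             for r in records])
        self.db.commit()

    def search(self, query: str) -> list:
        """The builder-facing half: a query in, plain records out."""
        cur = self.db.execute(
            "SELECT url, title FROM meta WHERE meta MATCH ?", (query,))
        return [{"url": url, "title": title} for url, title in cur]
```

Swapping FTS5 for a commercial full-text engine would change only the body of this class; neither mixers nor builders would notice.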
The mixer has a very loose definition on its side of the hourglass ("find things") and a very tight definition of its interface to the server. Likewise, the builder has a very clearly defined interface to the server and a very loose definition on its side of the hourglass. The job of the builder is to pipe data into any user interface (or other output).
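For example, the two hypothetical builders below bind the same server results to different presentations, which is what we mean by late binding to visualization; the helper names are ours.

```python
# Hypothetical builders: the same records, bound late to two interfaces.
from html import escape


def text_builder(records) -> str:
    """Pipe results to a plain-text interface."""
    return "\n".join(f'{r["title"]}  <{r["url"]}>' for r in records)


def html_builder(records) -> str:
    """Pipe the same results to a minimal web page, no banner ads."""
    items = "".join(
        f'<li><a href="{escape(r["url"])}">{escape(r["title"])}</a></li>'
        for r in records)
    return f"<ul>{items}</ul>"
```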
The architectural philosophy of builders, mixers, and servers has been expressed in an Architectural Precepts document, in a core protocol (BXXP), and in a metadata application called the Simple Exchange Profile. These building blocks form the framework, but they leave open the question of what to do with that framework.
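For a feel of the wire level, the sketch below assembles a single message frame in the style of the framing BXXP was later standardized into as BEEP (RFC 3080): a header naming the channel, message number, continuation flag, sequence number, and payload size, then exactly that many octets of payload, then an END trailer. This is our reconstruction for illustration, not a quotation of the BXXP draft of the time.

```python
# Hedged sketch of a BXXP/BEEP-style message frame (per the later
# RFC 3080 framing): header line, `size` octets of payload, END trailer.
def msg_frame(channel: int, msgno: int, seqno: int, payload: bytes) -> bytes:
    header = f"MSG {channel} {msgno} . {seqno} {len(payload)}\r\n"
    return header.encode("ascii") + payload + b"END\r\n"


print(msg_frame(1, 0, 0, b"<query>distributed search</query>").decode())
```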
Copyright © 1999, 2000 media.org.