Mappa.Mundi Magazine - Why search engines are clueless


	David Strom is a networking and communications consultant based in Port Washington, NY. Along with Marshall Rose, he co-authored Internet Messaging: From the Desktop to the Enterprise (Prentice Hall).





By David Strom, david@strom.com

Why search engines are clueless

Web Informant #169, 21 September 1999

Finding things on the Internet is still way too difficult, even for experienced users who spend a lot of time online. To fix this, Danny Goodman and his colleagues at Invisible Worlds, a San Francisco-based Internet startup, have developed a series of tools called the SpaceKit Viewer. While the tools are still new and unfinished, the idea is to make searching more effective by exposing the underlying relationships among the data. (Disclaimer: I am an advisor to IW.)

Why is searching so hard? Several reasons. Each search portal, such as Yahoo, Infoseek and AltaVista, covers a different universe of web sites. Each portal also uses a different syntax for composing your queries, especially for queries containing more than one keyword. For example, one site might require quotation marks around a series of words while another will require conjunctions such as “and” to refine your query.
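To make the syntax problem concrete, here is a minimal sketch of the same three keywords written in the two styles described above. Both styles, and the function names, are illustrative assumptions; they are not the actual rules of Yahoo, Infoseek, AltaVista or any other portal.

```python
# Hypothetical query styles for illustration only; these are not the real
# syntax rules of any particular search portal.

def phrase_style(keywords):
    """Wrap the whole series of words in one pair of quotation marks."""
    return '"' + " ".join(keywords) + '"'


def conjunction_style(keywords):
    """Join the words with an explicit 'and' between each pair."""
    return " and ".join(keywords)


keywords = ["web", "caching", "servers"]
print(phrase_style(keywords))
print(conjunction_style(keywords))
```

The user's intent is the same in both cases, but the text typed into the search box is not, so a query that works at one portal may need to be rewritten by hand at the next.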

	Self-promotions dep’t
	My analysis entitled “The Caching Question” appeared in Internet World this past week. I review the several different kinds of caching software, servers and services, and the various vendors involved.

Often you get either too much or too little information from your queries -- there is no easy way to narrow or expand your search without a great deal of trial and error. Many sites order their search results by some kind of relevance ranking -- I have never found these very useful myself. The number of keywords you enter also affects the accuracy of your search. If you aim for precision and enter too many keywords (three or more), you often miss relevant pages. But one keyword usually isn’t enough to uniquely identify a topic.
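The keyword trade-off can be shown with a toy example. The sketch below assumes a made-up collection of four pages and the simplest possible matching rule (a page matches only if it contains every keyword); real search engines are far more elaborate, so treat this as an illustration of the effect rather than of how any portal works.

```python
# Toy collection of pages; the names and text are made up for illustration.
PAGES = {
    "page1": "caching proxy servers for the web",
    "page2": "web caching services compared",
    "page3": "jaguar cars for sale",
    "page4": "the jaguar is a large cat",
}


def search(keywords):
    """Return the pages whose text contains every keyword (a simple AND match)."""
    return sorted(
        name
        for name, text in PAGES.items()
        if all(word in text.split() for word in keywords)
    )


# One keyword is ambiguous: it matches both the car page and the cat page.
print(search(["jaguar"]))

# Two keywords find both pages about web caching.
print(search(["web", "caching"]))

# A third keyword drops page2, even though it is still about web caching.
print(search(["web", "caching", "servers"]))
```

With one keyword the result set mixes unrelated topics; with three, a relevant page disappears because it happens to say “services” instead of “servers.”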

Finally, once you complete a search you can’t easily return to your result set after a period of days or weeks, unless you happen to remember the keywords and search portal you initially used.

The problems I am pointing out, which we’ve all experienced, are due to the fact that searching the Web requires a tool that does more than simply bookmark a few starting points. It also has to do more than string queries together with a series of qualifiers: it has to understand the relationships among the web pages being searched. Ideally, the tool should also be able to personalize and save our searches.

Enter Danny Goodman’s SpaceKit Viewer from IW. The viewer is designed to search specific databases or collections of web pages and to get around the problems with general search portals mentioned above. This makes it suitable for use in intranets or for eCommerce sites where good searching is important. [Danny Goodman's SpaceKit Viewer can be found at: http://edgar.space.invisible.net, ed.]

While the viewer is still new and far from finished, a demo of the product works with two specific databases: the several thousand Internet Requests for Comment (RFCs) and EDGAR, which is the Securities and Exchange Commission’s database of corporate filings.

You can search for particular keywords in these documents, and view which items all documents in your result set have in common, such as a company director or the particular form used by the company to file with the SEC. Let’s say you want to see all the filings by a particular company. You bring up the viewer and type in the company name or stock ticker, and press return and get a bunch of results. Each result is linked to the actual document itself, so you can click on it and get the full text of the particular filing. This seems pretty ordinary so far -- most search sites don’t operate any differently.
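The idea of seeing what every document in a result set has in common can be sketched as a set intersection. The records, field names and the `common_items` helper below are all hypothetical; the article does not describe how the viewer itself computes this.

```python
# Made-up filing records; the company, forms and field names are
# assumptions for this sketch, not real EDGAR data.
results = [
    {"company": "Acme Corp", "form": "10-Q", "director": "J. Smith"},
    {"company": "Acme Corp", "form": "10-K", "director": "J. Smith"},
    {"company": "Acme Corp", "form": "8-K", "director": "J. Smith"},
]


def common_items(records):
    """Return the field/value pairs that are shared by every record."""
    if not records:
        return {}
    shared = set(records[0].items())
    for record in records[1:]:
        # Keep only the pairs that also appear in this record.
        shared &= set(record.items())
    return dict(shared)


print(common_items(results))
```

Here the company and the director survive the intersection, while the form type does not, because the three filings use different forms.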

But now you want to organize your results into something more meaningful, such as grouping all the 10-Q forms together or sorting all the filings by filing date. And you’d like to save your query so you can come back to it later. These are the viewer’s most impressive features. All of your search results can be sorted in ways that make them more useful for your own circumstances. This is what David Clark, Senior Research Scientist at MIT’s Laboratory for Computer Science, means when he says, “when you wander around on the Web, you would like to get above it and look around.”
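As a rough sketch of what grouping, sorting and saving might look like, the example below works on a few made-up filing records. The field names, the dates and the JSON format for the saved query are assumptions chosen for illustration; none of this reflects IW's actual protocols or storage format.

```python
import json
from collections import defaultdict

# Made-up filings; field names and dates are assumptions for this sketch.
filings = [
    {"form": "10-Q", "filed": "1999-05-14"},
    {"form": "10-K", "filed": "1999-03-30"},
    {"form": "10-Q", "filed": "1999-08-13"},
]


def group_by_form(records):
    """Collect the records under their form type, e.g. all the 10-Q forms."""
    groups = defaultdict(list)
    for record in records:
        groups[record["form"]].append(record)
    return dict(groups)


def sort_by_date(records):
    """Order the records by filing date (ISO dates sort correctly as text)."""
    return sorted(records, key=lambda record: record["filed"])


def save_query(keywords, group_field, sort_field):
    """Serialize a query and its view settings so it can be reopened later."""
    return json.dumps(
        {"keywords": keywords, "group_by": group_field, "sort_by": sort_field}
    )


def load_query(text):
    """Restore a query saved by save_query()."""
    return json.loads(text)


print(group_by_form(filings))
print(sort_by_date(filings))
saved = save_query(["acme"], "form", "filed")
print(load_query(saved))
```

Saving the keywords together with the grouping and sorting choices is what lets you return weeks later to the same view of the data, rather than just the same search terms.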

This is all still very new stuff. Eventually, IW will have a commercial product that you can implement with your own datasets and on your own servers, along with documentation for its protocols. But in the meantime, you can try out the viewer on EDGAR and read more about what IW is doing at their site.

Web Informant copyright 1999 by David Strom, Inc., reprinted by permission
Web Informant is a ® registered trademark with the U.S. Patent and Trademark Office.
ISSN #1524-6353 registered with U.S. Library of Congress.
