Web Informant
Mappa.Mundi Magazine
David Strom is a networking and communications consultant based in Port Washington, NY. Along with Marshall Rose, he co-authored
Internet Messaging: From the Desktop to the Enterprise (Prentice Hall).



Related Links

Links that are related to the article:

» CMP Media Inc.

» IDG.com

» Ziff-Davis

» December Communications, Inc.

» NewsLibrary.com

» Computer Magazine Archive

» San Jose Mercury News

» Boston Globe

» Washington Post
By David Strom, david@strom.com

Preserving online archives

Web Informant #172, 18 October 1999

      Years from now, researchers looking back at the dawn of the web era may have only a small fraction of the web to use for their research. Why? Because many of today’s publishers, including the computer trade press, do a poor job of archiving old content. And as more publications fall by the wayside, their online archives disappear from view as well.

      Even those publishers who have methods in place for obtaining back issues haven’t consistently carried that policy forward into the web era. This is ironic, since we had more complete archives before the web, when back issues were easily available on microfiche. There are some notable efforts outside the publishing industry to preserve web content, including Brewster Kahle’s Internet Archive project.

Self-promotions dep’t


My latest article for Computerworld reviews two long-time contenders, Laplink and PCAnywhere. It is entitled “Remote Control and File Transfer: Comparing the Two Champs.”

I also want to let you know about a multi-city tour by one of the companies I advise, Delano Technology. The tour will focus on how to use Delano’s eBusiness Interaction Suite of email and web application products.



      But for the most part, all of our current digital technology doesn’t much matter. Preserving archives isn’t really a technical challenge; it is mostly a matter of politics. Someone from On High has decreed that All Old Stuff Must Go.

      It bothers me on several levels that this information is slipping from sight, and from sites. We’re losing our rich techno-cultural history. I want to see what the pundits, reporters, and experts were saying years ago so I can learn from their mistakes. (Or maybe poke fun at them in a subsequent column.) A few years ago I wrote a column looking back over ten years of our industry. The only way I could do that research was to look through the printed archives that I gathered from friends and pack rats still working at the publications.

      And storage is cheap these days, so the cost of keeping old content can’t be much. Granted, it can be a nuisance for web site operators to back up the files and maintain the old links, especially if they change the file structure of the site and all those pages now point to broken links.

      Also, as the writer of a few of these words of past wisdom, I want to see my stuff preserved for all to browse. After all, that is one of the reasons I got started in this business 13 years ago: to make a mark, however small, upon the world. So when ZDNet (not to pick on them in particular) eliminates my old Windows Sources articles because they no longer produce the publication, it bugs me. Not to mention that all MY links to those pages from my web site are now broken too.

      A few years ago we had ZD’s Computer Library, an expensive monthly CD subscription that was the industry bible. It had tons of full-text articles from many publications (even non-ZD ones), although it went back only a year. Then, when CMP, ZD, and IDG started their web efforts, it was a joy to be able to search their sites and come up with articles. Well, maybe joy is too strong a word. But it was certainly easier than finding the current CD or digging up an old paper issue. Lately, however, computer publishers have begun to change their archives. The dead-tree trade publishers are more interested in what is happening today than in supporting what they said yesterday, let alone several years ago.

      There are two particular items I want to cover here. The first is being able to go to a single archive page for your magazine, with links to issues going back several years. The second is indexing this content so that ordinary humans can easily refine a search through these archives and come up with useful results.

      The big three publishers fail on both counts. Only a few of the individual magazines have clear links to any archive pages. (Byte.com is a good example here, but its archives go back only to 1994.) All three publishers’ home pages offer a “search” box that is all but useless in my opinion. You type in a few words and hit return, and what you get is usually too much, too little, or too unfocused to really help you find what you are looking for.

Best of a Bad Lot

      CMP is the best of a bad lot. It offers full-text archives of some publications going back to 1994: not far enough for me, but it is a start. You can refine your search by publication, date, and other parameters from a screen that isn’t too many clicks from the home page. You can even search defunct publications.

      ZDNet has taken the CD Computer Library and turned it into the Computer Magazine Archive, which goes back three years and covers articles from several hundred publications (some are abstracts, not full text). Access costs a few dollars per month. For free, ZDNet offers limited searches of its current publications. But these searches can be painful: you can’t easily limit a search to particular publications unless you first do a simple search and then refine it.

      With IDG, you can go to individual publication web sites and use the search functions you’ll find there. (Computerworld’s archives, for example, go back to 1994.) IDG.net has a search function that will scour all its publications’ web sites, but to use it you have to learn both the domain name used by each publication and the underlying Infoseek syntax. I would guess about ten people in the world could figure this out, even if they do come across the page explaining it all in rather gruesome detail.

      Many Web-only publications don’t fare much better when it comes to archiving content. The best example I found is John December’s Computer-Mediated Communication magazine. Its archive page is a great example of how to put everything you might need about a publication together in one simple place. Too bad he stopped publishing the magazine last January. Still, all the old issues remain available online.

      And if you are willing to pay for general-press content, NewsLibrary.com offers many years of archives from newspapers such as the San Jose Mercury News, the Boston Globe, the Washington Post, and many others. The past week’s archive and headline results are both free.

      Of course, you might be wondering where I am going with this. I have a page of links to all of my back issues, and another search page that covers all the content I’ve published on the Web, not just Web Informant issues. I admit this could be better, and one of these days I’ll get around to improving it. But at least I’ll leave the old stuff around for you to enjoy (and poke fun at, too).



 Copyright © 1999, 2000 media.org.

      Web Informant copyright 1999 by David Strom, Inc., reprinted by permission
Web Informant is a ® registered trademark with the U.S. Patent and Trademark Office.
ISSN #1524-6353 registered with U.S. Library of Congress.


