Preserving online archives
Web Informant #172, 18 October 1999
Years from now, researchers looking back at the dawn of the web era may have only a small fraction of the web to use for their research. Why? Because many of today's publishers, including the computer trade press, do a poor job of archiving old content. And as more publications fall by the wayside, their online archives disappear from view as well.
Even those publishers who have methods in place for obtaining back issues haven't consistently carried that policy forward into the web era. This seems ironic, since we used to have more complete archives in the days before the web, when back issues were easily available on microfiche. There are some notable efforts to preserve web content from outside the publishing industry, including Brewster Kahle's Internet Archive project.
But for the most part, all of our current digital technology doesn't much matter. Actually, what is involved in preserving archives isn't really a technical challenge -- it is mostly politics. Someone from On High has decreed that All Old Stuff Must Go.
It bothers me on several levels that this information is slipping from sight, and from sites. We're losing our rich techno-cultural history. I want to see what the pundits, reporters, and experts were saying years ago to learn from their mistakes. (Or maybe to poke fun at them in a subsequent column.) A few years ago I did a column looking back ten years in our industry. The only way I could do that research was to look through the printed paper archives that I gathered from friends and pack rats still working at the publications.
And storage is cheap these days. The cost to keep the old content can't be much. Granted, it can be a nuisance for web site operators to back up the files and remember the old links, especially if they change the file structure of the site and all those pages refer to now-broken links.
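One way to soften the broken-link nuisance described above is to keep a table of old paths and their new homes whenever a site is reorganized. What follows is only a minimal sketch in Python: the paths and the `resolve` helper are hypothetical, made up for illustration, and do not describe how any publisher mentioned here actually runs its site.

```python
# Hypothetical table mapping old article paths to where they live now.
# A page may have moved more than once, so entries can chain.
REDIRECTS = {
    "/wsources/1997/03/column.html": "/archive/windows-sources/1997-03/column.html",
    "/archive/windows-sources/1997-03/column.html": "/archive/ws/1997/03/column.html",
}


def resolve(path, redirects=REDIRECTS, max_hops=10):
    """Follow the table until we reach a path that has not moved."""
    seen = set()
    while path in redirects:
        # Guard against a table that points back at itself.
        if path in seen or len(seen) >= max_hops:
            raise ValueError("redirect loop or chain too long: " + path)
        seen.add(path)
        path = redirects[path]
    return path
```

A web server could consult such a table before returning "not found", so that old links from other sites keep working after a reorganization.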
Also, as a writer of a few of these words of previous wisdom, I want to see my stuff preserved for all to browse. I mean, that is one of the reasons I got started in this business 13 years ago -- to make a mark, however small, upon the world. And when ZDNet (not to pick on them in particular) eliminates my old Windows Sources articles because they no longer produce the publication, it bugs me. Not to mention that now all MY links to these pages from my web site are broken too.
A few years ago we had ZD's Computer Library, an expensive monthly CD subscription that was the industry bible. It had tons of full-text articles from many (even non-ZD) pubs, although going back only for the past year. Then when CMP, ZD and IDG started their web efforts, it was a joy to be able to search their sites and come up with articles. Well, maybe joy is too strong a word. But it was certainly easier than finding the current CD or digging up an old paper issue. But computer publishers have begun to change their archives lately. The dead tree trade publishers are more interested in what is happening today than supporting what they said yesterday, let alone several years ago.
There are two particular items I want to cover here. First is being able to go to a page for your magazine's archive and find links to various issues going back several years. Second is indexing this content in such a way that most ordinary humans will be able to easily refine a search through these archives and come up with useful results.
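To make the second item concrete, here is a rough sketch in Python of what a refinable archive search might look like: keywords first, then optional narrowing by publication and date. Everything in it is an assumption for illustration. The `Article` record, the `search` function, and the sample entries are invented, and it uses plain substring matching rather than a real index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    publication: str
    published: date
    text: str


def search(articles, words, publication=None, since=None, until=None):
    """Return articles containing every word, narrowed by optional filters."""
    wanted = [w.lower() for w in words]
    hits = []
    for a in articles:
        if publication and a.publication != publication:
            continue
        if since and a.published < since:
            continue
        if until and a.published > until:
            continue
        # Simple case-insensitive substring match on title plus body.
        haystack = (a.title + " " + a.text).lower()
        if all(w in haystack for w in wanted):
            hits.append(a)
    # Newest articles first.
    return sorted(hits, key=lambda a: a.published, reverse=True)


# Made-up sample archive; the publication names are placeholders.
SAMPLE = [
    Article("Firewalls compared", "Example Weekly", date(1996, 5, 6),
            "We tested five firewall products."),
    Article("Choosing a firewall", "Example Monthly", date(1998, 2, 1),
            "A buyer's guide to firewall features."),
    Article("Modem roundup", "Example Weekly", date(1998, 9, 14),
            "Eight 56K modems reviewed."),
]

for hit in search(SAMPLE, ["firewall"], publication="Example Weekly"):
    print(hit.published, hit.publication, hit.title)
```

The point of the design is that the filters are optional and can be combined up front, so a reader does not have to run a broad search first and then wade through the results to narrow them.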
The big three publishers fail on both scores. Only a few of the individual magazines have clear links to any archive pages. (Byte.com is a good example here, but their archives only go back to 1994.) All three publishers' home pages offer a search box that is all but useless in my opinion. You type in a few words and hit return, and what you get is usually too much, too little, or too unfocused to really help you find what you are looking for.
Best of a Bad Lot
CMP is the best of a bad lot. They offer full-text archives of some publications going back to 1994. That's not far enough for me, but it is a start. You can refine your search by publication, date and other parameters from a screen that isn't too many clicks from the home page either. You can even search defunct publications.
ZDNet has taken the CD Computer Library and turned it into the Computer Magazine Archive, going back three years with articles from several hundred publications (some are abstracts, not full text). It costs a few dollars per month for access. For free, ZDNet offers limited searches of their current publications. But these searches can be painful. You can't easily limit your search to particular publications unless you first do a simple search and then refine it.
With IDG, you can go to individual publication web sites and then use the search functions you'll find there. (Computerworld's archives, for example, go back to 1994.) IDG.net has a search function that will scour all its publications' web sites, but to use it you have to learn both the domain name used by the publication and the underlying Infoseek syntax. I would guess about ten people in the world could figure this out, even if they do come across the page explaining it all in rather gruesome detail.
Many Web-only publications don't fare much better when it comes to archiving content. The best example I found is John December's Computer-Mediated Communication magazine. Its archive page is a great example of how to place everything you might need about a publication together in one simple, single place. Too bad he stopped publishing the magazine last January. Still, all the old issues remain available here.
And if we want to pay for content from the general press, NewsLibrary.com offers many years of archives of newspapers such as the San Jose Mercury News, the Boston Globe, the Washington Post and many others. The past week's archive and headline results are both free.
Of course, you might be wondering where I am going here. I have a page of links to all of my back issues, and another search page that covers all content I've published on the Web, not just Web Informant issues. I admit this could be better, and one of these days I'll get around to improving it. But at least I'll leave the old stuff around for you to enjoy (and poke fun at, too).
Copyright © 1999, 2000 media.org.
Web Informant copyright 1999 by David Strom, Inc., reprinted by permission
Web Informant is a ® registered trademark with the U.S. Patent and Trademark Office.
ISSN #1524-6353 registered with U.S. Library of Congress.