Flowing From Site to Site
Brian Reid's USENET Traffic Flow Maps (circa 1986-95)
Knowledge of how much data traffic flows between different points on the globe is almost non-existent today. There are very few publicly available traffic maps, and those that exist tend to be country-level aggregations [1] or limited to a single company’s network. In fact, it is likely that we knew more about data flows in cyberspace ten years ago than we do today. One of the most notable traffic mapping projects, from the era before the Internet went mainstream in the mid-1990s, was undertaken by Brian Reid. Starting in 1986, he produced very detailed maps of traffic flows on the USENET network for nearly a decade.
Reid’s geographic flow maps were produced monthly from data gathered by a distributed USENET measurement system known as Arbitron. This system collected both detailed statistics on data flows (the number of news articles) between sites across the world and readership statistics for newsgroups. Reid started the project in 1986 while a faculty member at Stanford University. "Like most university projects, it was one person (me, the professor) with various graduate student assistants," said Reid when Map of the Month asked him about his flow maps in a recent email interview. Reid’s background is in computer science, with a doctorate from Carnegie Mellon University, where he made significant contributions to the development of Internet email protocols. From the late 1980s through the 1990s Reid worked at Digital Equipment Corp.’s Western Research Lab (WRL) [2] in Palo Alto, California, and he was a key figure in both the technical and social development of USENET [3], with WRL acting as a major backbone node in the USENET network for many years. Just recently he moved back to CMU as Professor of the Practice of Computer Systems in the department of computer science [4].
According to Reid, the hardest part of the Arbitron USENET measurement project was "devising a technique that could be used to collect usable data, and finding something that had mathematical validity in the middle of all of the noise. Ignoring all of the people who, upon discovering that their own computer was not ranked in the top 10, started complaining about the methodology." In this article we focus on the flow maps produced, rather than the underlying measurement methodology; as Reid noted, "I think that instinctively I did not try to collect data that I did not know how to map, so the usual ‘figuring out how to map this data is the hardest part’ problem did not arise."
Although Reid ended the mapping project in 1995, the results are now of real historical significance, providing one of the best continuous series of maps of this portion of the Net [5]. Their value as a census of the USENET network is enhanced by their level of detail, both in spatial scale (showing flows between hundreds of individual sites on the network) and temporal scale (monthly snapshots for nearly a decade). As Reid succinctly commented to Map of the Month, "I did these maps for about 10 years. During that time they were the world's primary source of information about which USENET nodes were 'important' and which nodes were not."
The map below is an example of one of Reid's flow maps, from May 1993, showing traffic for the worldwide USENET network. A simple symbology of black dots and smoothly curving lines represents the USENET sites and the flows between them. Line thickness is proportional to the volume of flow, and the small arrows indicate the direction of flow. This map represented several thousand sites in approximately fifty different countries, although it is clear that the large majority were in North America and Western Europe. (The Arbitron USENET measurements for May 1993 covered some 35,920 sites. Reid estimated the total size of the USENET network at that time to be on the order of 87,000 sites and some 2.6 million people connected.) Indeed, the most striking feature of the network apparent from the map is the sheer density of sites on the east and west coasts of the US, which effectively blanket the map. Given the concentration of sites in the USA and Europe, it is hardly surprising that a great deal of traffic was flowing across the North Atlantic. (This has not changed today: it remains the busiest route for intercontinental data flows.) The peak volume of traffic shown on this map was around 57 megabytes per day.
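The proportional line-width symbology described above is easy to sketch in code. The snippet below is a hypothetical modern reconstruction, not Reid's actual netmap tool: the flow records, site names and the 4-point maximum stroke width are all assumptions for illustration, with the 57 MB/day peak taken from the May 1993 map.

```python
# Hypothetical sketch of Reid-style flow symbology: scale each
# site-to-site flow (in megabytes/day) to a stroke width, so the
# busiest link (57 MB/day on the May 1993 map) is drawn thickest.

MAX_WIDTH_PT = 4.0  # assumed maximum line width, in points


def flow_widths(flows, max_width=MAX_WIDTH_PT):
    """Map {(src, dst): mb_per_day} to stroke widths, linear in volume."""
    peak = max(flows.values())
    return {pair: max_width * vol / peak for pair, vol in flows.items()}


# Illustrative (made-up) flow records; (src, dst) pairs carry direction,
# which on the printed maps became the small arrowheads.
flows = {
    ("decwrl", "uunet"): 57.0,    # the peak flow on the May 1993 map
    ("decwrl", "mcvax"): 28.5,
    ("uunet", "rutgers"): 14.25,
}
widths = flow_widths(flows)
# The peak flow gets the full width; all others scale linearly.
```

A real renderer would then draw each pair as a smooth curve of the computed width between the two site coordinates; the scaling step is the part the maps make visible.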
Map of complete aggregate news flow, worldwide, from 13 May 1993.
(Courtesy of Brian Reid)
At this global scale of representation, the more local detail and patterns in North America and Europe are obscured by dense over-plotting of data in a relatively small area of the map. One obvious solution is to produce maps at larger scales showing particular regions, countries and cities. This is what Reid did, and below are two further examples, also from May 1993, showing Europe and the San Francisco Bay-Silicon Valley area. (In addition, maps of various US regions, along with metropolitan maps of Boston, New York City and Washington DC, were produced.)
Map of backbone news flows for Europe, from 13 May 1993.
Map of complete aggregate news flows for the San Francisco Bay area, from 13 May 1993.
(Courtesy of Brian Reid)
Another way to reduce the data volumes cluttering the display is to select only the key features. Reid also used this strategy to show the most significant flows at the core of the network. As can be seen in the European map, only the flows between backbone sites are drawn. (Backbone sites are identified with labelled circles, while normal sites on the USENET network are simple black dots.)
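This backbone-only selection amounts to a simple filter over the flow records. The sketch below is illustrative only: the site names and the backbone set are made up, and Reid's own criteria for designating backbone sites are not reproduced here.

```python
# Hypothetical sketch of "show only the core": keep a flow only when
# both endpoints are designated backbone sites, dropping leaf traffic.


def backbone_flows(flows, backbone):
    """Filter {(src, dst): volume} down to flows between backbone sites."""
    return {(s, d): v for (s, d), v in flows.items()
            if s in backbone and d in backbone}


backbone = {"decwrl", "uunet", "mcvax"}  # assumed backbone set
flows = {
    ("decwrl", "uunet"): 57.0,
    ("uunet", "smallsite"): 0.3,   # leaf flow: dropped from the core map
    ("mcvax", "decwrl"): 12.0,
}
core = backbone_flows(flows, backbone)
# Only the decwrl-uunet and mcvax-decwrl flows survive the filter.
```

On the printed maps the same distinction was carried by the symbology: labelled circles for backbone sites, plain dots for everything else.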
The maps were produced with a custom-written cartographic tool called netmap. They were relatively simple in cartographic design, limited in part by the need for easy distribution and printing on the black and white laser printers of the time, with simple country boundaries providing the necessary geographic context. Clearly, the USENET network followed many of the same patterns of geographical dispersion as the Internet. The uneven distribution of Internet infrastructure across the globe is well illustrated in other maps - for example, Matrix.Net’s host maps or UUNET’s backbone network maps [6].
There have been other limited attempts to map USENET, particularly looking at the logical connectivity of the network [7]. More recently, the Netscan project, directed by Marc Smith at Microsoft Research, has been analysing the social structures of the actual articles carried on the network [8]. Reid himself is interested in more detailed mapping of network infrastructure. As he says, "I am very interested in producing maps that show physical-to-logical mapping of network reach into residential areas. But I’m not going to spend a lot of time on it now because the raw data aren’t available." Taking this further, Reid described his ‘dream’ map of the Net thus:
No one map. Depends on what I'm after. I like maps that show actual flows and not connectivity. I like maps that show the global importance of local phenomena.
If it were possible to determine the physical location of an IP address, and if it were possible to get a statistically meaningful sample of end-to-end flows, then I could make all of the maps I want. But the information isn't there, so I'm off doing other things.
Whoever can crack the problem of getting realistic measurements of global Internet traffic will be able to take flow mapping to the next level and fill in a major gap in our current knowledge.
Copyright © 1999-2001 media.org. ISSN: 1530-3314