The inspiration for undertaking this project in Digital Preservation
came from reading an article by Carol Casey, as well as Rothenberg’s popular
one in Scientific American.
I was instantly intrigued by the fact that it was an ancient problem in a
modern situation. I was also infected with a great fear that my children may
have to live in the next Dark Ages if the word was not spread, and the problems
not thought about.
Because the topic of Digital Preservation is relatively new, I was met with a few problems. First off, any type of research or study was out of my scope, not only because I had neither time nor resources, but because there is barely anything to study! Most institutions and organizations that should be worried about these issues have no idea about them whatsoever. The next idea was the possibility of drawing up some policy guidelines for an organization with digital preservation needs. The trouble with that plan is that nobody is quite sure how to go about preserving digital information, and so there are not many examples to create policy from.
However, some ideas can be gleaned from what this project did turn into, which is 1) a paper on the current status of Digital Preservation, and 2) a stepping stone for further research. From reading this paper, an organization looking to create policy would find this:
There are many paths to Digital Preservation, but all have problems. In the end, a good preservation policy will take into consideration all avenues, picking and choosing what is best (while keeping in close contact with others in the field to see how they are handling similar issues).
Be certain not to discard original analog copies of information if they are the only remaining copies. Be certain to examine the long-term costs of any digitization. Always ask yourself: will this be worth it in 5 years? The key in all of this is an old Latin phrase: Make Haste Slowly.
Here at the beginning of a new millennium, human beings have found themselves alive at a truly exciting and frightening period of transition. And it’s not just Y2K fever, or the cyclic jitters of the fin de siècle – it indeed seems the world is in a birthing process. At the forefront of the change – influencing not only fads, politics, and the economy, but the deep currents of society, culture, civilization, and nature itself – is what we call technology. More specifically, it is digital technology that is acting as both plow and seeder in this field. It is the ability to store patterns of information in binary form – 10111101010100000100010 – on, off, on, off, off, off, on – it is this combination of electricity and logic that has vaulted us into the next communication revolution.
The world’s information institutions – libraries, archives, and museums – whose duty it is to store, organize, preserve, and provide access to the people’s collective cultural knowledge are feeling this change directly and often painfully. For nearly the entire history of libraries, roughly 5,000 years, information has been analog. Even with the relatively recent invention of the camera and phonograph, library collections remained analog. Then, in the last half century, digital was born – and in the last few decades it exploded. Storing, organizing, and providing access to the information on these new media are full of challenges and difficult choices, but it is with preservation that the truly frightening troubles arise. It is with the preservation of digital information that this paper is concerned. It will present an overview of the issues and thinking surrounding this timely problem. Also included will be a comprehensive bibliography. Throughout the paper, there is hypertext linked to the World Wide Web – if you are reading this document on a screen, it will appear in underlined blue. A web site with links to many of the people, organizations, and projects mentioned can be found at: http://www.geocities.com/azitiz/Bibliographi.html
This project can be considered multi-purpose, depending on the reader. If you are new to digital preservation, this paper will hopefully explain some issues and, more importantly, alarm you into further research and action. If you are already involved with thinking about these problems, the paper itself may not offer new insights (though it may), but perhaps the links to more specific areas, projects, and people will be of help. And for those to whom the paper, and all it contains, offers nothing new under the sun, maybe it can be passed on to someone who may find it relevant. At the very least it can be used to help spread the word.
A final warning: Anything that is not in quotations and/or cited is ultimately just my opinion. This is not a scientific paper. Nor do I believe in such a thing as straight non-fiction. All non-fiction writing is biased, though hopefully not prejudiced. This is an informal piece of writing and thinking. Any asides – whether parenthetical, set off by commas in an opinionated clause, or subtly included in the text in any other way – should be taken with this in mind.
The Problem
Assigning blame is easy, and usually a reactionary way out of a complex situation. The problem we speak of when we speak of the problem of digital preservation, is an ancient one drowning in a modern condition. The problem of preservation has always been, no matter what age or tools available, providing continual access to information. This has always been the focus and challenge in the preservation field. The modern condition that threatens this important role of libraries, archives, and museums, is the relatively sudden explosion in our technology’s rate of change.
The world seems to be moving faster and faster every year. Stewart Brand (of the Long Now Foundation) writes in his new book The Clock of the Long Now: “Civilization is revving itself into a pathologically short attention span. The trend might be coming from the acceleration of technology, the short-horizon perspective of market-driven economies, the next-election perspective of democracies, or the distractions of personal multitasking. All are on the increase.”(:02) Though these trends are all interconnected, it is the acceleration of digital technology that worries the preservation field. Certainly one of the major forces driving that acceleration is the “short-horizon perspective of market-driven economies” – and it may be this trend that is at the root of the digital preservation issue of media decay. Writer James Gleick also takes on this new culture of quick change in his book Faster: The Acceleration of Just About Everything. In it he quotes Kafka: “There are two cardinal sins, from which all the others spring: impatience and laziness.”(247) Impatience and laziness are certainly two sins we are guilty of here at the end of the 20th century. What this all adds up to – the myopia, the acceleration, the impatience – are the two main problems of digital preservation: technological obsolescence and media decay.
Media decay is an issue in all areas of preservation. It is the problem of fighting against the second law of thermodynamics: Chaos. Entropy. Simply put, all things fall apart and the preservation specialist must figure out how to slow down this process in order to continue providing access. Over the centuries, this has mostly been an issue of preserving paper. Then photographs, microfilm, sound recordings, and a host of other media entered the picture to complicate matters. Digital information is stored in media too – magnetic tape reels, floppy disks, hard drives, CD-ROMs – and these containers decay as well. What we have recently discovered, however, is that these storage containers are often much more fragile than their analog storage counterparts. Carvings in stone can last forever, animal skins – thousands of years, acid free paper – over 500 years, photographs and sound recordings if taken care of have lasted over 100 years so far. Yet CDs begin to lose information in around 15 years, videos and floppy disks have a shelf life of less than 10 years before decay begins. 10 years!
It appears that there is a pattern here. Each new medium provides more access, better distribution, and easier portability, but becomes increasingly more fragile. One could make the argument that this is all part of a balance (and necessary shift) that must be kept as cultural memory is distributed more and more widely (and shallowly) through a networked world, where media decay will eventually be moot because of a massive extended expanded nervous system – but until that possible future, preservationists must deal with the very real fact that storage mediums for digital information are extremely short lived and must be specially cared for.
Like all media, digital containers are most stable under certain environmental controls. Temperature, relative humidity, and light are all important factors to regulate. Each medium has specific ranges for these factors, and should be stored accordingly.
The libraries, archives, and museums of the world (despite their dreaming otherwise) hold very little power in economically driven societies. No matter how loud those in the information field scream, corporations that create CDs, videos, and other popular modern information containers (that make up much of our library shelves) are not concerned with longevity. In fact, they probably count on the opposite. It is like the modern legend of the fellow who designs a car that will hardly ever break down, gets wonderful gas mileage, and costs next to nothing. He is, of course, disappeared rather quickly by the combined efforts of the U.S. auto industry. There are, however, companies that are developing CDs and other media that will supposedly last for long periods of time, but in the end media decay is a minor problem compared to technological obsolescence.
Technological obsolescence is a fancy term to describe the situation that even those without computers know of: as soon as you buy a computer, it is out of date and worth a fraction of what you bought it for. Even worse, the programs you run on it probably won’t run on the new computer you decide you need. And unless you keep downloading later versions of software, the programs may not even run on the computer that created them in the first place. Now imagine this on a worldwide level. What is this madness? Certain industry leaders, politicians, and tunnel-vision futurists call it Progress. They are unfortunately mistaking for Progress what is really just change and growth. As Edward Abbey writes: “Growth for the sake of growth is the ideology of the cancer cell.”[1]
Whether the cause of, and reason for, this rapid change is actual progress or just a headlong fling into the future (or more likely a little of both) the results are the same. Information cannot be retrieved from outdated systems – and by outdated, we are talking sometimes only three years!
Two years ago my parents moved from their house into a small apartment. Not having room for all of the “clutter” from the basement and attic in their new place, they asked me to come and claim any belongings I didn’t want sold. Going through a closet, I came upon a box containing my old Smith Corona Word Processor. Here was the machine I wrote on throughout junior-high and high-school. One by one I inserted floppy disks and was transported back. Here was essay, and poem, and story - priceless treasures from my past. These writings were certainly something I wanted to preserve.
I was then stuck with a problem. These Smith Corona formatted disks were not readable on any other machine. I could not copy them onto my monochrome 286 that was only a year or two older, nor my new Gateway Pentium III. They were stuck in a limbo. No problem, I thought, I’ll just print them all out on this machine. This worked fine until I ran out of ink. Not a single store in the area had the type of ink cartridge it needed. Panic. My head began reeling – what if the screen malfunctioned, an internal part broke, the drive went screwy? If just getting ink was this hard... I had just had my first introduction...
Technological obsolescence is not just an annoyance for the family PC – it is already, whether realized or not, an enormous problem for all areas of society. Governments (on all levels), militaries, corporations (of all sizes), banks, universities, publishers, broadcasters, organizations of all types, scientists, researchers, writers, artists, musicians, not to mention millions of individuals, are ALL storing important information in digital form. And there already have been some scares, as well as losses.
Buried within a 1965 paper by Gordon Moore called “Cramming More Components onto Integrated Circuits” was a prediction that the number of transistors that could be fit on a computer chip would double every year. Although this was later adjusted to a doubling every 18 months, the prediction is now called Moore’s Law and has so far proved correct. This means that every year and a half, a system’s storage and speed increase, and its cost decreases. It is no wonder those in the information fields are nervous. How are we supposed to confidently store information for later access when suddenly media disappear from the market, particular drives are no longer produced, device drivers are no longer written, version 2.0 can’t read 1.2, parts are no longer available, or there no longer remain professionals who know how to fix your hardware when it crashes?
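To make the arithmetic concrete, here is a small Python sketch of what a fixed doubling period implies. The function name and the sample time spans are my own illustration, not anything from Moore's paper; the sketch simply compounds one doubling every 18 months.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Return how many times capacity multiplies after `years`,
    assuming it doubles once every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Time spans chosen only for illustration.
    for years in (3, 6, 15):
        print(f"{years:>2} years -> about {growth_factor(years):,.0f}x")
```

Under that assumption, a system is outpaced by a factor of four in three years and by roughly a thousand in fifteen, which is why a storage format can look ancient before its media have even begun to decay.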
Is there an end in sight? Not likely. Ray Kurzweil, in his new book The Age of Spiritual Machines, writes: “The chip companies have expressed confidence in another fifteen to twenty years of Moore’s Law by continuing their practice of using increasingly higher resolutions of optical lithography (an electronic process similar to photographic printing) to reduce the feature size – measured today in millionths of a meter – of transistors and other key components. But then – after almost sixty years – this paradigm will break down. The transistor insulators will then be just a few atoms thick, and the conventional approach of shrinking them won’t work.”(21) And then, most certainly, we will have discovered some other leap. As Kurzweil and others point out, Moore’s Law seems to exist as a section of a much larger, more than century long, exponential growth curve of computing – one that has no foreseeable end.
Solutions
There are no easy solutions. In fact, there are – as of now – no solutions. There are however theories, discussions, arguments, and serious research circulating among the information professionals. One of the major problems of starting on the road towards solutions, is making people aware that there is an actual problem. Up until very recently, this has been a major obstacle. Why is this? It may be the seemingly inherent conservative (not political) nature of libraries, archives, and museums, it may be because the problem is relatively new, or perhaps it is because these professions have jumped on the digital bandwagon with what money and resources they have in tow and are either blinded to – or terrified of – the fact that they may be heading for a disaster. Or it simply could be that they don’t believe it.
There are, however, people who have become concerned with the problems of digital preservation. These people come from every corner of the information field, which is a positive step for professions which have historically worked in isolation. Whereas previously libraries, archives, and museums quibbled over their differences and purposes, now, because they are reducing much of their collections to ones and zeros, members of these institutions must work together. And more impressively, the library, archive, and museum folk are having to go outside their own exclusive “cultural” institutions to work with computer scientists, information technologists, business and industry, and government groups. Leibniz, who believed (in the late 17th century) that a universal character set (characteristica universalis) could help solve all the world’s possible problems, would be overjoyed at such co-operation brought about by a binary code.
With the Australians leading the way, as they always seem to do in the library and archive fields, research and studies in the area of digital preservation have moved off in several directions. The most common idea put forth is migration. This is followed by emulation and the creation of standards. Others have also suggested creating hard copies of all digital information, as well as storing hardware/software in computer museums. Most likely, it will be combinations of several of these techniques and ideas that the preservation field adopts.
Migration of information is a technique at least as old as human life, and the case certainly could be made that it has been around since the beginning of the universe. Simply put, migration is the moving of information from one storage medium to another, usually because the former is outdated. The classic image of migration from our (relatively) recent past is that of the squinting monk, hunched over a table, writing out passages of scripture. The introduction of microfilming spurred perhaps the largest migration project to date in the modern world, and although the bane of many researchers, remains one of the preservation field’s most important technologies.
When those currently involved with digital preservation speak of migration, they are referring to the same basic process. The Task Force on Archiving of Digital Information’s report in 1996 defined migration as: “periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation.”[2] Digital migration covers three similar, yet separate areas. The first is digital information that was originally analog, such as a photograph that has been scanned, or a novel that has been digitized. The second is information that has been “born” digital, like scientific data, a web site, Netart, or even this paper. The third is the migration from digital to a more stable non-digital medium, such as microfilming digital information, or printing out documents.
For many organizations and individuals, migration is the only technique currently used. In fact, many in the information field see the act of migrating analog to digital as an actual preservation method. While digitizing collections of photographs, text, or other information provides exponentially more access, this should not be thought of as a long-term preservation strategy. Just this year, a state librarian and a librarian in charge of the databases for a large university each responded to digital preservation policy questions with similar answers. The state librarian told me that: ‘sure they were practicing digital preservation. They were transferring much of their information onto computers.’ The database specialist in charge of the university-wide library system told me that there was no particular preservation policy, but that they burned copies of CD-ROMs for backup [in case of media decay or accident, I assume he meant]. In both cases, what is not being considered is the long-term preservation of the information. Migration, even when used by organizations that view it as a long-term method to deal with technological obsolescence, is at best only a Band-Aid solution.
The problems with migration for long term preservation are many. To start with, it is labor intensive, time consuming, and expensive. Because it is expensive, organizations often must make decisions of what to migrate and what to leave behind. There is also the issue of losing and/or corrupting information during migration from one medium to the next, as well as the extremely crucial aspect of “context”. By transferring a document from one medium to another, it is vitally important that the context is preserved! Over our history, these have always been problems inherent in migration. The labor involved (and all that follows with that), the possibility of information lost or changed, maintaining context for fullness of future interpretation, and the agonizing choices involved with selection – these have always been issues, the difference now is how often it happens. These cycles of migration are occurring more and more often as the technology changes come faster and faster. Another modern variation on the migration problem is that of multi-media – since this type of document is made up of different parts with different formats, different sections may have to be migrated at different times. A nightmare. And to make matters worse, because each following migration is completely different, with a technology we can in no way predict, each migration must be approached as entirely new with no past experience to guide us. This is a frustrating aspect for professions that base themselves on collecting knowledge to prevent just such a problem.
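The risk of losing or corrupting information during a migration is the one problem in this list that can be checked mechanically. Below is a minimal Python sketch, assuming the simplest kind of migration: copying a file unchanged from one storage location to another. The function names and the shape of the returned record are hypothetical, my own illustration rather than any institution's actual procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_check(source: Path, destination: Path) -> dict:
    """Copy `source` to `destination` and confirm the bits are unchanged.

    Returns a small record that could be kept as migration history.
    Raises IOError if the copy does not match the original.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps where it can
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"checksum mismatch migrating {source} -> {destination}")
    return {"source": str(source), "destination": str(destination), "sha256": after}
```

A check like this only proves that the bits survived the move. It says nothing about a migration that converts one file format into another, where the bits change by design, and it does nothing at all to preserve context.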
So although migration is widely used, and is perhaps the best temporary method of ensuring digital preservation, it should not be blindly thought of as the solution. Libraries, archives, and museums that digitize their collections for access should realize the problems of preserving information this way, and NOT abandon their analog originals (at least not before checking to see if other institutions have them in their collections). Organizations that assume migration of their already digital information to new formats is the best method should take a closer look at the costs, both financial and in terms of long-term preservation. Migration, if properly and intelligently done, is an important aspect of digital preservation, but because of all the troubles listed above, it has sent those in the field searching for other solutions.
Emulation is an idea born out of the reality of technological obsolescence. The term emulation is a broad one. Bob Rasmussen, President of Rasmussen Software, Inc., writes: “[there is] ...terminal emulation, that is, making a PC emulate the behavior of another type of hardware. There is also in-circuit emulation (creating software analogs of digital hardware), operating system emulation (emulating Windows on Mac), hardware emulation (emulating an Intel chip on an Alpha), you can emulate a piano with a synthesizer, and on and on.”[3] The Australian National Library’s PADI (Preserving Access to Digital Information) describes emulation as: “the process of mimicking, in software, a piece of hardware or software so that other processors think the original equipment/function is still available in its original form.”[4] The idea is that if you have information that was created on an obsolete technology, you can access this information in its original form by fooling the computer into thinking it is that obsolete technology.
The idea itself is a relatively simple one, and one that has been successful in other areas of the computer field. One of the most popular examples of emulation can be found all over the Web within the subculture of classic video gaming. For instance, it was nearly impossible to play classic arcade games like Q*bert and Space Invaders without transporting yourself back to the 1980s. However, thanks to a generation of hackers who grew up on a diet of such games and wanted to preserve them (and don’t they deserve preservation just as much as the games of jump-rope or jacks do?), one can download an emulator that makes your Windows 2000 OS think it’s an arcade machine.
This is the type of idea that someone like Jeff Rothenberg believes can be translated into the digital preservation field well beyond the scope of video games.
Rothenberg’s 1998 paper, Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation, set forth some ideas (as well as questions) on how this was to actually be done. His plan is constructed around three types of information, encapsulated within each digital document. The first would be the document itself, the original software, and the OS (as well as any other files that made up the software environment). The second grouping would be the “emulator specification”. This would allow the creation of the emulator on any conceivable computer. This is no doubt challenging given, as he points out, that the only assumption we can make about computers in the future is that “they will be able to perform any computable function and (optionally) that they will be faster and/or cheaper to use than current computers.”(section 7 web site note) The third part would contain “explanatory material”, including labeling information, annotations, human-readable “boot-strap” directions, documentation, and metadata.
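The three groupings can be pictured as a single container. The Python sketch below is only my own illustration of that structure: the class name, field names, and the `missing_parts` helper are hypothetical and are not taken from Rothenberg's paper, which describes the idea rather than any particular data format.

```python
from dataclasses import dataclass, field


@dataclass
class EncapsulatedDocument:
    """Hypothetical container mirroring the three groupings described above."""

    # Group 1: the document plus the software environment it needs.
    document: bytes
    original_software: bytes
    operating_system: bytes
    # Group 2: a specification from which a future emulator could be built.
    emulator_specification: str
    # Group 3: human-readable explanatory material and metadata.
    explanatory_material: dict = field(default_factory=dict)

    def missing_parts(self) -> list:
        """Name any part left empty, since each is needed for later access."""
        parts = {
            "document": self.document,
            "original_software": self.original_software,
            "operating_system": self.operating_system,
            "emulator_specification": self.emulator_specification,
            "explanatory_material": self.explanatory_material,
        }
        return [name for name, value in parts.items() if not value]
```

The point of the sketch is that the document never travels alone: a capsule holding only the document, with the other parts empty, would be exactly the kind of orphaned file this paper worries about.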
One of the largest advantages of the emulation solution is the preservation of context. Migration and other methods often destroy this important, and often overlooked, aspect of preservation. Viewing a historical document or object out of context can completely twist its meaning. I am reminded of a wonderful book, Motel of the Mysteries by David Macaulay, in which an explorer of the distant future (4000 AD) discovers a motel room preserved from the 1980s. He is convinced that the T.V. is an altar to the Gods, the toilet seat an elaborate piece of jewelry, and the plunger a musical instrument. Another advantage of emulation is that, unlike migration, once an emulator for a particular system is created, it can be used for all like documents. The encapsulated metadata can also make cataloging and other organizational duties easier.
Emulation seems like an idea that, if it could be implemented, would work wonders. There are a few in the information field who, like Rothenberg, believe that it is worth seriously exploring, while others see it as much too impractical and difficult. In an attempt to get an outside (outside the library/archive/museum world, that is) opinion on the matter, I wrote to several randomly chosen (from a Yahoo search of “emulators”) companies who created emulators, as well as a handful of arcade game emulator hackers. The two responses to my idealistic questioning of emulators’ role in saving our digital cultural heritage were full of skepticism. However, it is certainly an area for further research and thinking, because its possibilities are much more promising than many of our other choices.
The creation of universal standards would seem to be an obvious solution to the digital preservation issue. While standards for media formats like audio, video, and image would be of great help, it seems an unlikely goal anytime soon. Not only is the arena too crowded, but it is always changing. Especially now, so early in the development of the Web and other multi-media avenues, it would be foolish to attempt standardizing file formats – not necessarily because it would retard creative development, or stall certain economic pockets, but because it could be arresting development in one of the greatest social and intellectual experiments of all time.
The most probable use of standards in the digital preservation puzzle is in the realm of metadata. Metadata, one of the key buzzwords of the last few years, is literally data about data. It is embedded in the document, usually out of sight from the end user. It provides management information for the information professional. This could include description of content, location information, migration history, formats, dates, cataloging numbers, etc. An example of one of the major movements in standardizing metadata is the Dublin Core Metadata Initiative. The Dublin Core offers a set of 15 metadata elements, including title, creator, keywords, format, relation, date, resource identifier (URL), and others. Meant to be extensible and simple, Dublin Core hopes to create metadata standards for electronic resources all over the world. Eventually, the hope is that the burden of metadata will not be a retrospective task, but something done during the creation of a digital document.
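As a rough picture of what such embedded metadata looks like, the Python sketch below lists the fifteen Dublin Core element names (what I call keywords above is the element named subject) and renders a record as HTML meta tags using the common "DC." prefix convention. The function name and the sample values are illustrative assumptions of mine; in particular the title is a placeholder.

```python
from html import escape

# The fifteen element names of the simple Dublin Core set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_meta_tags(record: dict) -> str:
    """Render a record as HTML <meta> tags, rejecting unknown element names."""
    unknown = sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    lines = []
    for element in DUBLIN_CORE_ELEMENTS:
        if element in record:
            value = escape(str(record[element]), quote=True)
            lines.append(f'<meta name="DC.{element}" content="{value}">')
    return "\n".join(lines)


# Illustrative values only; the title here is a placeholder.
example = {
    "title": "Digital Preservation",
    "creator": "Alexander Zimmerman",
    "format": "text/html",
    "language": "en",
}
print(to_meta_tags(example))
```

Because the element list is so short, a record like this can be filled in by the author at the moment of creation, which is the hope expressed above, rather than reconstructed by a cataloger years later.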
Some of the other ideas floating around in the digital preservation field are creating hard copies, and creating computer museums. The idea of museums just isn’t very practical. Not only would this greatly limit access, but time would wear down parts - even ones taken care of in environmentally controlled areas. Chips and systems decay over time, despite our efforts to prevent them from doing so. Then there are the financial costs, which to keep such a place (places...) functioning for continual use would no doubt be staggering. This is not to say museums of obsolete digital technology should not be built (in fact they already exist). They could perhaps be of limited help in certain situations, and more importantly serve the purpose of any cultural institution: to preserve and provide access to our culture’s accumulated memory.
Creating hard copies of digital information is another solution sometimes suggested for preservation purposes. The largest problem with this idea is the loss of the information’s context. Even legally, this issue was decided in favor of maintaining context, when the Reagan White House told the courts (during the Iran-Contra fiasco) it would supply them with printed copies of executive e-mails. The court decided that these hard copies were not the same as the electronic versions, and demanded the e-mail in its original context. The context issue is especially crucial with Web documents that would lose their connectivity and non-linear nature if printed out.
There is also the idea of printing out the actual bit streams of information onto paper, or engraving them in metal. Not only does this daunting task seem extremely labor intensive, and expensive, but it sacrifices their original machine readability. Though a good idea for your personal e-mails, creating hard copies is not a large scale digital preservation solution.
Conclusion
If anything can be drawn from this broad look at the pressing problem of digital preservation, it is that more work must be done, more papers written, more research funded, more people made aware. A solution to these issues probably includes aspects of all ideas mentioned. Migration is certainly necessary, if not for the long haul (which it may be) then for now. Emulation is a promising idea, but one that must incorporate migration and standards. Even more important than the joining of these ideas is the joining of different fields in and outside of the information professions. This is not just a library issue; this is a worldwide, discipline-crossing issue. Thankfully, this is underway and will hopefully increase and expand.
The FEAR is expressed by documentary filmmaker Terry Sanders: “I have a fantasy that preservationists of the digital information age are like passengers who have wandered up on the deck of the Titanic and see the approaching iceberg. They run down to the grand ballroom to warn everyone. There’s a great party raging, and none of the passengers really wants to hear about icebergs. They prefer to keep dancing and simply refuse to heed the warnings. Finally, some in the festive crowd grudgingly concede the possibility of an iceberg ahead, but they confidently assure everyone that there’s nothing to worry about. The Titanic’s officers and crew have everything under control. And, in any case, the ship is unsinkable.”[5]
The HOPE is here in a message from Colin Webb: “....there is a huge gap in awareness that needs to be filled. We shouldn't be too pessimistic: the increase in awareness has been pretty impressive over the past 10 years or so! And there has been an even more impressive effort in looking for answers - a lot of people are actually working on this. So don't be too dismayed.”[6]
[1] Abbey, Edward. One Life at a Time, Please. Henry Holt and Co., New York. 1988. pg. 21
[2] Preserving Digital Information: Final Report and Recommendations. Task Force on Archiving of Digital Information; (05/1996)
[3] Rasmussen, Bob. E-mail to author. Mon, 6 Dec 1999 11:31:34 -0800 (PST) [see Appendix for full letter]
[5] Sanders, Terry. “Escaping the Digital Dark Age”. Library Journal. February 1999, Vol.124, Is.2
[6] Webb, Colin. E-mail to author. Mon, 29 Nov 1999 08:46:35 +1100
Alexander Zimmerman Copyleft 02000