We read it in the papers all the time. Some fragment of an ancient text is found, and scholars invest countless hours interpreting and validating it, even when most of the information is lost. In the digital age, however, all should be well, right?
I posted previously about how successive upgrades of computers and operating systems, and the archiving of old files, had led me to lose soft copies of huge numbers of files from research into the effects of privatization in Mongolia in the 1990s. It was only a hard copy that saved me.
Now I find myself encountering a similar problem, with a twist. When I search online for a copy of another old study from the 1990s on Mongolia’s informal sector, I find a nice light PDF. I vaguely recall producing that version, my rationale for it, and the problems I encountered in doing so. I had written the report in 1997 and early 1998 using WordPerfect, with graphics pasted from Quattro Pro and statistical analysis by a batch version of TSP. My printed hard copies were used to print the nice Policy Research Working Paper in early 1998, and our document management system took the official PRWP and scanned it for posting online. I remember thinking at the time that they should be able to simply make a PDF directly, but our document retention folks had explained their reasons for scanning hard copies, reasons which I now appreciate.
As best as I can recall, I found that original scanned version of the paper to be unnecessarily large and therefore difficult to download. After all, it was nearly 5 megabytes. Five Megabytes! Huge! It would take an hour to download and wouldn’t even fit on a diskette!
So, at some point I set out to make a lighter version. The problem was that by the time I got around to it, we had migrated to Word. I took my WordPerfect version and created a Word version. I guess I shrugged off the loss of formatting, and even the fact that I could not include the questionnaires, and charged ahead in the name of making a lighter version of the PDF. And I guess I succeeded, as the light version is one-tenth the size of the scanned version.
This afternoon I had occasion to revisit my old paper. I had been reading some colleagues’ work on Ulaanbaatar’s labor market and wanted to look at some of my numbers from the 1990s. I found the light version of the paper on the internet, but was dismayed by the formatting and loss of information. Although it took a while, I eventually found the original scanned version online. Comparing the two, I saw that the light version not only had superficial errors in formatting (e.g., alignments in the table of contents), but some of its charts had become difficult to read, letters had been dropped here and there, etc. There were other changes that I had been aware of, or had even made myself, at the time (e.g., equations or charts I just gave up on), leaving me wondering why I thought 5 MB (Five Megabytes!) was such a big deal.
In the end, however, our document retention experts were proven right. Scanning the printed copy gives the true depiction of what was published at the time. I am just glad that they still have that version online, even if the search engines point to the inferior “light” version. And while the march of technology helped cause the loss of information in the first place, the stubborn adherence to a hard-copy-centered approach to archiving helped ensure that the information was preserved after all.