Around 90-95% of all the digital data produced in public science is lost over time due to a lack of standardized archiving practices in the scientific community. Many factors contribute to this staggering figure; the most elementary is the absence of best practices for archivists and institutions concerning the preservation of scientific data. Purely digital artifacts come in a seemingly endless number of forms and can often be read or used on only a limited number of platforms. When these artifacts are simplified in an attempt to make them work across all platforms, they often lose the behavioral qualities that made them usable and important in the first place. Another hindrance to the preservation of digital data is a lack of funding. Most resources (particularly grants) go to the creation of new data and research, which are short-term projects with immediate results. Conversely, the preservation of data requires long-term (potentially indefinite) funding.
Luckily, according to Gregory Goth, the author of ‘Preserving Digital Data’, the US and UK governments are beginning to warm up to the idea of creating a preservation infrastructure by funding organizations that will help to create discrete data management requirements. One organization that has already received funding from the NSF is iRODS (integrated Rule-Oriented Data System). iRODS, in fact, has already been adopted by data centers around the globe, including in the US, Canada, France and the UK.
Personally, I have no background in the scientific research community, and prior to reading Goth’s article I had assumed that individual researchers used version control on their projects and research, which in turn led to more drafts and data being saved. Evidently this is not the case, and I am excited by the prospect that the use of iRODS might become standard practice in the field.
Gregory Goth, ‘Preserving Digital Data’, Communications of the ACM, vol. 55, no. 4, April 2012.