We should all be concerned about the preservation of valuable artifacts. IDC has stated that 98% of all new data being generated is digital.
Last week I had the privilege of touring one of the Regional National Archives facilities. Impressive as it was, it was also enlightening. This particular facility (1 of 15 regional sites) had just implemented a barcode system. They had moved to a new facility, which required them to handle approximately a million storage boxes. It was the perfect opportunity to upgrade to barcodes (a 30-year-old technology).
* Picture a standard-sized office moving box for paper… White, about 9 x 12 x 18 inches.
* Now picture a set of shelves. 14 shelves high. 4 boxes deep to a shelf.
* An aisle about 2 1/2 feet wide. Enough for a ladder.
* Picture a row of that configuration about 150 feet long…
* Picture a set of rows about 100 feet wide.
That is one vault.
This facility had 5 vaults. There are 2 other local vaults.
That is a total of 7 vaults. 1.4 million sq ft. 14 shelves high. Filled to 85% capacity.
This facility managed the data for the 100 Federal Agencies associated with four states.
Now picture 15 such facilities.
1.4 million sq ft each x 15 = 20+ million sq ft of storage space… 14 racks high. ALL PAPER!
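For anyone who wants to check my back-of-the-envelope math, here is a minimal sketch in Python. It uses only the rounded figures from the tour. Treating all 15 facilities as the same size, and applying the 85% fill rate to every one of them, are my own assumptions and not numbers from the National Archives.

```python
# Rough figures from the tour; all of them are rounded.
SQ_FT_PER_FACILITY = 1_400_000  # stated total for the one facility I visited
REGIONAL_FACILITIES = 15        # number of regional sites
FILL_RATIO = 0.85               # assumption: the 85% fill rate holds at every site


def total_storage_sq_ft(per_facility: int, facilities: int) -> int:
    """Total storage space, assuming every facility is the same size."""
    return per_facility * facilities


total = total_storage_sq_ft(SQ_FT_PER_FACILITY, REGIONAL_FACILITIES)
occupied = total * FILL_RATIO  # rough estimate of the space actually in use

print(f"{total:,} sq ft total, roughly {occupied:,.0f} sq ft in use")
```

The exact product is 21 million sq ft, which is where the "20+ million" above comes from.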
NOTE: These are single copies of paper files.
This does not count the Federal Archives outside of Washington, DC, which are massive in comparison. Nor does it count the data maintained at the agencies' own sites.
All of this data is at risk. Single copy. Single location. Hard copy only.
My point: this is a small fraction of the data being stored and generated within the Federal Government. These are cultural documents that need to be preserved, many in perpetuity.
There was a worker at a station processing birth records by hand. Pull a file. Take a picture of one side of a page. Flip the page. Take another picture. Re-file the hard copy. Run the images through an OCR process to digitize and record the data. A very manual process. The good news: two copies. One is the original hard copy. The other is digital.
In conclusion, if 98% of all new data is being generated in a digital format, and there are massive hard-copy data farms like the ones described above, we as a culture and society need to be very concerned with data preservation. It is much easier to maintain hard-copy paper artifacts than it is to maintain digital files over a long period of time.
The tour was a very enlightening experience. I was very impressed with the National Archives staff. What they have been able to do with very limited resources was, and is, very impressive.
By the way, I have been told that the amount of data associated with the genealogical research and documentation efforts underway runs to many petabytes. That is the equivalent of several of the sites described above.
See the link to the Ancestral archaeologist: http://scribbler714.wordpress.com/2010/08/02/dark-ages-maybe-a-little-gray/