DH – 804 – Digitization and Crowdsourcing

During the junior year of my undergraduate studies, I wrote a paper entitled “American Revolutionary Sermons: Commonalities and Trending Rhetoric.” I found two collections of sermons, spanning 1750 to 1783, and compared them through close reading. Originally, I found the sermons on microfilm and microfiche at the library. During my first year in the PhD program, I revisited this paper in a digital project. I had discovered that these same two collections were available online through the ever-glorious Internet Archive. It provided OCRed .txt files, which I read through quickly to fix any blatant OCR errors (one collection suffered from some distortion caused by the scanner's feeder). I was then able to text-mine these sermons, compare the results to my paper, and discover new avenues of inquiry (this project can be found here). This anecdote serves to show that DH projects rest on a base of digitized sources. Excluding the born-digital sources that begin around the mid-twentieth century (at the earliest), DH is at the beck and call of digitization efforts.

Digitization efforts seem to revolve around two camps: those that are access-oriented and those that champion preservation. Ricky Erway and Jennifer Schaffner, in their publication “Shifting Gears: Gearing Up to Get Into the Flow,” declare that access wins. The focus of all cultural heritage institutions should be to provide as much access to their collections as possible. Erway and Schaffner argue that, by nature, these institutions will preserve the original to the best of their abilities. The digitization process, then, should be about access.

At this point, I paused to wonder what they meant by access. The term is thrown around very easily, without much time devoted to what kind of access is being discussed. Through my pondering, I arrived at two different types of access: obtaining accessibility and awareness accessibility. The first, obtaining accessibility, is what most people think of when they discuss accessibility. This is the idea that a researcher can obtain a document using the internet and a computer. Erway and Schaffner use access in this way. Yet the second, awareness accessibility, or knowing that the document is available in a digital format and where it can be found, is often overlooked.

Being aware of what a digital archive contains is extremely important and valuable. Andrew Torget and Jon Christensen explore this conundrum in their article “Building New Windows into Digitized Newspapers.” They develop tools to illustrate the contents of a digital newspaper repository across space and time. This visualization leads to an acknowledgement of “areas of emphasis” within that archive. There are time periods and subjects that receive more attention, they state, because those things receive more funding. Highlighting the archive's contents in this way makes it more transparent and greatly increases its awareness accessibility. It is surprising that cultural heritage institutions do not give this more attention. The sometimes blindly accepted methods of searching digital archives are neither set in stone nor necessarily the best available. Erway and Schaffner argue that digitization needs to be adopted as part of an institution's program instead of being labeled as projects. I would argue that a better understanding of accessibility should also be included in that program.

Erway and Schaffner state that new technology brings updated or improved copies of the sources. Based on my brief exposure to the field thus far, I would have to disagree with that sentiment. I have explored countless digital copies of sources that have been remediated multiple times. They originate as a hard copy, are photographed onto microfilm or microfiche, and later that microfilm or microfiche is digitized to create the digital copy. Roy Rosenzweig and Dan Cohen articulate in the “Becoming Digital” chapter of their book Digital History that the move from analog to digital entails a loss of information. This can include marginalia, texture, possibly color (in a black-and-white scan), and sometimes context and location on the page. I would say this loss occurs with any transition a source goes through from one medium to another. Thus the transfer of a source from hard copy to microfilm, and then from microfilm to digital, means information is lost at each step. The “new” technology comes to rely on the previous technology instead of the original document. Let us not, in our rush to access, limit the depth and extent of that access.

