Project Report
For my final practicum project, I spent approximately 50 hours processing a series of 17 digital oral history interviews about the Nature Center at Shaker Lakes. The series is a subset of a large and diverse collection held by the History Department at Cleveland State University. Each interview was researched, indexed, abstracted, and cataloged for publication on the OhioLink Digital Resource Commons (DRC), an online repository open to Ohio academic libraries. This brief report provides background information, details some of the processes and challenges encountered in the course of the project, and offers a model for continued processing of the complete collection, which consisted of approximately 500 interviews as of late 2008.
The Cleveland Regional Oral History Collection was initiated by faculty at the Cleveland State University Department of History in 2002 as a component of Dr. Mark Tebeau’s research on the Cleveland Cultural Gardens. As part of their required coursework in local history seminars, students were asked to conduct oral history interviews with individuals from the city’s many ethnic communities. As the project expanded to include students in the department’s urban history and public history courses, the collection grew steadily, branching out into personal, neighborhood, and institutional histories. By 2005, the project directors had created a collaborative arrangement with the Greater Cleveland Regional Transit Authority (RTA), Cleveland Public Art, and Ideastream to expand the project and bring the interviews to the public via multimedia touchscreen kiosks placed along Euclid Avenue as part of the Euclid Corridor Transportation Project. The kiosks enabled the public to listen to selected excerpts from the interviews as they pertained to historic sites along the avenue, but otherwise the collection remained unpublished and largely inaccessible to students and researchers outside of the CSU history community.
Having worked extensively with the collection, first as a student and later as a consultant on the Euclid Corridor Project, I was always bothered that such a unique resource was not easily available to the public. Additionally, having accepted a job as Project Coordinator for CSU’s Center for Public History and Digital Humanities, I knew that I would soon be charged with remedying this problem. I therefore decided to tackle it as part of my practicum, taking a week off work before starting my new position in order to come up with a reasonable and repeatable solution. Because I knew I would be working on this over time, in a position that includes many other duties, and relying on the labor of non-MLIS graduate assistants in the History Department, a major concern was creating a system that was streamlined, well-documented, and easily implemented, but which still employed sound cataloging standards.
In consultation with Kiffany Francis at the CSU University Library, it was decided that the DRC would be a fine place to begin the long process of publishing the collection. Although the CONTENTdm-driven Cleveland Memory Project (CMP) includes a few oral histories and documents many of the same topics in Cleveland history, that collection consists overwhelmingly of archival images. Furthermore, CMP was quickly approaching its item limit under its existing CONTENTdm license with OCLC; thus, initiating another large-scale project at that point could have complicated matters for my project and for CMP. By using the DSpace-driven DRC as an alternative, we leave open the option of migrating the collection to another platform in the future should CMP expand its license, or should the History Department wish to create alternative or additional oral history exhibits in Omeka, a more flexible and customizable platform that is being developed by the open source community and the Center for History and New Media (but which is currently still in beta). In a sense, then, the DRC space will serve as “smart storage” for the collection: a short-term solution that will facilitate organized online public access, but will not necessarily become the official “home” of the collection.
The first step was to investigate DSpace and the DRC. DSpace uses a qualified version of Dublin Core metadata, which is customizable within the DRC installation. After surveying the metadata used in some other digital oral history projects, and comparing that with what had been collected at CSU, I decided upon a core baseline of metadata fields to use across the collection. I would have liked to make this core a bit more robust; however, because the collection had been created in large part by students with varying degrees of commitment and attention to detail, there was not sufficient documentation to do so consistently. The resulting core metadata looks like this (scope and usage guidelines may be viewed here):
- title
- identifier:filename
- identifier:other
- contributor:author
- date:created
- contributor:interviewer
- contributor:facilitator
- format:extent
- format
- format:mimetype
- relation:uri
- relation:isformatof
- relation:ispartof
- type
- language:iso
- rights
- description:abstract
- description:sponsorship
- description:tableofcontents
- subject:local
- subject:lcsh
- coverage:spatial
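As a rough illustration, the field list above can be treated as a small schema for checking records before submission. This is only a sketch: the `element:qualifier` spelling follows the list as written, the `none` placeholder for unqualified fields is a convention borrowed from DSpace, and the function names and the sample record are hypothetical, not part of the actual project files.

```python
# Core metadata fields from the list above, written as "element:qualifier".
CORE_FIELDS = [
    "title", "identifier:filename", "identifier:other", "contributor:author",
    "date:created", "contributor:interviewer", "contributor:facilitator",
    "format:extent", "format", "format:mimetype", "relation:uri",
    "relation:isformatof", "relation:ispartof", "type", "language:iso",
    "rights", "description:abstract", "description:sponsorship",
    "description:tableofcontents", "subject:local", "subject:lcsh",
    "coverage:spatial",
]


def split_field(name):
    """Split 'element:qualifier' into a pair; unqualified fields get 'none'."""
    element, _, qualifier = name.partition(":")
    return element, qualifier or "none"


def missing_fields(record):
    """Return the core fields that are absent or empty in a record dict."""
    return [field for field in CORE_FIELDS if not record.get(field)]


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete record used only for illustration.
    sample = {"title": "Example interview", "language:iso": "en"}
    print(split_field("identifier:filename"))
    print(len(missing_fields(sample)), "core fields still need values")
```

A check like this could help flag interviews whose documentation is too thin to fill the baseline fields consistently.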
Once the preliminary metadata fields were chosen, I began processing each item. Due in part to the sparse documentation for the collection, but also due to the nature of oral histories, this process posed some unique challenges. To begin with, you cannot browse an unprocessed oral history or check the title verso page for publication details; the interview must be listened to and analyzed. For each interview (ranging in length from 30 minutes to 2.5 hours), I created a minute-by-minute log to gain as much information as possible, noting topics of discussion and gleaning information about the item that was not included in the existing documentation (for example, the full name of the interviewer, facilitator, or even the subject was often missing). These audio logs formed the basis for the abstract, the subject terms, and the table of contents for each item. Listening to the audio alone accounted for roughly half of the project hours. That is just for 17 items, or a little over 3% of the total collection (providing a clear illustration of why these interviews, most completed in 2006, have yet to be published in any form). Using the audio logs, I abstracted and classified the items, which necessarily involved doing some basic research on the interesting history of the Nature Center at Shaker Lakes. Although I included a table of contents field in my metadata plan, I was unable to find a suitable formatting solution by my deadline. As such, I will continue to investigate this problem in the future, perhaps creating multi-bitstream records that contain both the audio file and a separate file containing the log sheet.
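To put those figures in perspective, here is a back-of-the-envelope projection. It is a sketch only: it assumes the pilot's pace (about 50 hours for 17 interviews, roughly half of it spent listening) would hold across all of the approximately 500 interviews, which may not be true for longer or better-documented items.

```python
PROJECT_HOURS = 50        # approximate hours spent on the pilot series
INTERVIEWS_DONE = 17      # interviews processed in the pilot
COLLECTION_SIZE = 500     # approximate size of the full collection
LISTENING_SHARE = 0.5     # "roughly half" of the hours went to listening


def share_processed():
    """Fraction of the full collection covered by the pilot series."""
    return INTERVIEWS_DONE / COLLECTION_SIZE


def projected_hours():
    """Linear extrapolation of total and listening-only hours for the collection."""
    hours_per_item = PROJECT_HOURS / INTERVIEWS_DONE
    total = hours_per_item * COLLECTION_SIZE
    return total, total * LISTENING_SHARE


if __name__ == "__main__":
    total, listening = projected_hours()
    print(f"Share processed: {share_processed():.1%}")
    print(f"Projected total hours: {total:.0f}")
    print(f"Projected listening hours: {listening:.0f}")
```

Under these assumptions the pilot works out to just under three hours per interview, which is why the workflow needs to be something graduate assistants can share.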
The next step was to organize my files for submission. Established DRC users can upload items and metadata directly through an online browser interface; however, at this time, custom metadata fields are not available to first-time users. In my case, the custom fields had to be submitted to DRC administrators at OhioLink along with the initial package of items. This was made easy by OhioLink’s bulk submission process, which involves using a macro-enabled spreadsheet to distribute metadata records into item folders within the project directory. After I ran the macro, each folder in my project directory contained the original audio file, an XML file containing the metadata, and a contents manifest. The final step was to burn the entire project directory to a DVD for delivery to OhioLink.
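The folder layout described above can be sketched in a few lines of Python. This is not OhioLink's macro, whose internals I have not reproduced here; it is a minimal illustration that assumes a layout like the DSpace Simple Archive Format, where each item folder holds a `dublin_core.xml` file and a plain-text `contents` manifest. The function name, the item folder name, and the sample values are all hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item_folder(project_dir, item_name, record, audio_filename):
    """Create one item folder with a metadata XML file and a contents manifest.

    The audio file is not copied here; it is only listed in the manifest.
    """
    folder = Path(project_dir) / item_name
    folder.mkdir(parents=True, exist_ok=True)

    # One <dcvalue> per field; "element:qualifier" keys are split on the colon.
    root = ET.Element("dublin_core")
    for field, value in record.items():
        element, _, qualifier = field.partition(":")
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    ET.ElementTree(root).write(
        str(folder / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The manifest lists the files that belong to the item, one per line.
    (folder / "contents").write_text(audio_filename + "\n", encoding="utf-8")
    return folder


if __name__ == "__main__":
    # Hypothetical record written to a throwaway directory for illustration.
    record = {"title": "Example interview", "format:mimetype": "audio/mpeg"}
    with tempfile.TemporaryDirectory() as project_dir:
        folder = write_item_folder(project_dir, "item_001", record, "example.mp3")
        print(sorted(path.name for path in folder.iterdir()))
```

In practice the audio file would also be copied into each item folder before the project directory is delivered.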
A link to the records (and perhaps further discussion) will be posted here when the items are published. Unfortunately, the timeline for this is out of my hands at this point.
UPDATE: A portion of the series may be viewed on the OhioLink DRC Development space here: http://drcdev.ohiolink.edu/handle/123456789/3071. At the moment, only three records are appearing, but they may serve as examples for the larger collection. The formatting for the page is less than ideal, but this is a temporary home for these files. Be sure to click on “Full Record” to see all metadata fields.
Although this process was rather complicated considering my relative lack of experience in the field, I feel that I have established a system by which the remainder of the Cleveland Regional Oral History Collection may be processed and published to OhioLink’s Digital Resource Commons. In the future, it is likely that graduate assistants in CSU’s History Department will do much of the work toward publishing the complete collection. Their main responsibilities, as I envision them, will be to listen to interviews; collect all relevant information about the interview, speaker, and topic; create audio logs; choose appropriate keywords relating to Cleveland history; and compose abstracts of the interviews. Of these, only the last is a new responsibility, and it is a relatively simple affair that follows naturally from the rest of the process. Topical keywords, previously assigned to facilitate searches in the offline project catalog (a multi-tab spreadsheet), could easily serve as a basis for choosing basic subject terms from the Cleveland Memory Project controlled vocabulary. It remains to be seen whether further cataloging duties may be entrusted to non-MLIS students. Nevertheless, the fact that they will be able to do the bulk of the processing work is promising in terms of publication goals, and it also gives the students valuable experience and insight into the library and archival practices employed to their benefit as researchers and historians.