Last Tuesday I attended the first Maryland Institute for Technology in the Humanities (MITH) Digital Dialogues talk of the fall semester. The talk, titled "Large Scale Text Analysis in the Digital Humanities: Methods and Challenges," featured Aditi Muralidharan, who discussed her research and her experience creating a linguistic research tool called WordSeer (http://bebop.berkeley.edu/wordseer).
Aditi focused first on WordSeer, which includes discovery, annotation, and visualization tools, and second on some recommendations for others engaged in digital humanities collaborations. WordSeer is built on a database of slave narratives and features a semantic search that lets the researcher explore the relationships between words (e.g., "God" described as "good"). This kind of relationship-based exploration builds on the searching that can be done in other, similar tools and hints at the possibility of creating semantically rich maps of full-text content.
Aditi demonstrated some of the visualization tools built into the system, including a heat map that shows search terms in the context of the entire corpus and a word/sentence tree that shows a word in relation to all of the sentences it occurs in. During the question and answer session, several people observed that tools like this are enabling new research agendas around literary and historical texts, in addition to helping to support or challenge previously completed research and thinking in humanities fields. I found this an intriguing idea, particularly in the context of the kind of ‘resource discovery’ research that happens in library settings, where the research systems libraries support are built on a very different type of data: largely bibliographic metadata rather than full text.
As an example of integrating this sort of discovery tool into traditional library discovery services, the inclusion of Google Books and HathiTrust search results in catalog searches is an interesting start. I wonder what impact full-text analysis tools and semantic searching would have on these discovery systems. In particular, I was left wondering whether traditional library metadata would still play a valuable role in these systems, or whether syntactic and semantic analysis of the full text of a library's collections would render traditional access points like subject headings irrelevant. On the surface, at least, it seems possible that existing metadata would give the researcher more context for topical analysis, and that administrative and technical metadata could support other useful tools.
As the discussion turned to how to develop computational research skills in humanities scholars, I wondered how library research skills might need to be updated to work with these systems. On one hand, a firm grasp of metadata structures might make the results of these systems easier to interpret, and the ability to carry search strategies across multiple systems might also add value to the research experience. On the other hand, I get the sense that these research tools are not yet part of most librarians' general familiarity: while librarians could be well suited to working with these systems and helping users find and use them, they are not generally aware of them.