Code4Lib10 Tuesday notes

The morning started with a nice 12-mile run pieced together from a few different routes.  A closed road near the Biltmore House had me walking (terrified) across a very short railroad trestle before I noticed a nice pedestrian bridge (way to go Asheville!). Code4Lib is a very connected conference, which leads to in-depth online note taking and an active IRC channel.

People, their digital stuff, and time: Opportunities, challenges, and life-long challenges

Code4Lib 2010 kicked off with a keynote by Cathy Marshall.  She mentioned a few interesting stats (a webpage weighs 80 micrograms, per Brewster Kahle; there are 4.5 billion personal photos in Flickr) and went on to talk about how people preserve, use, and manage their digital artifacts.  BTW – her talk was titled – People, their digital stuff, and time:  Opportunities, challenges, and life-long challenges.

For me a compelling question she asked was – “digital lets us keep everything – should we?”  She had a good point – “you never get rewarded for deletion, but sometimes you get rewarded for saving” (à la Gordon Bell).

She talked about a study she did on massed data where she typed (that is, categorized) the metadata (e.g. place, artifact, context) that was assigned to digital objects.  What makes this interesting is that people don’t approach digital object description or archiving in a deliberate and systematic way, so metadata assignment has to be thought about in that context.  Her final message was that “new opportunities lie in the aggregation of individual archives and efforts.”

cloud4lib – Jeremy Frumkin, Univ Ariz., Terry Reese, OSU – An open library platform.

They are working on building this. They have set up a collaborative workspace, which they are running out of OSU Libraries using a research account on Amazon EC2. They are having a breakout session at code4lib today to talk about how to structure this idea.

Ross Singer – Linked data

Ross talked about how he took MARC data & built a linked data service. It is too complicated to talk about in depth here, but you can find his presentation on the code4lib site. The really neat thing he did was take a VuFind instance & embed some RDFa into the template to make the linked data discoverable.
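To make the RDFa idea a little more concrete, here is a minimal sketch of what embedding RDFa in a record page could look like. This is not Ross's code or VuFind's actual template: the function name, the example URI, and the choice of Dublin Core terms with the RDFa 1.1 `prefix` attribute are all assumptions made for illustration.

```python
from html import escape

# Dublin Core terms namespace (a real vocabulary; using it here is an assumption).
DCTERMS = "http://purl.org/dc/terms/"


def record_to_rdfa(uri, fields):
    """Render a record as an HTML fragment annotated with RDFa attributes.

    `uri` identifies the thing being described; `fields` maps Dublin Core
    term names (e.g. "title") to literal values. Both are hypothetical inputs.
    """
    # `about` names the subject; `prefix` declares the vocabulary shorthand.
    lines = [
        '<div prefix="dcterms: %s" about="%s">'
        % (DCTERMS, escape(uri, quote=True))
    ]
    for term, value in fields.items():
        # Each `property` attribute turns the visible text into a statement
        # about the subject that a linked data crawler can extract.
        lines.append(
            '  <span property="dcterms:%s">%s</span>' % (term, escape(value))
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(record_to_rdfa(
        "http://example.org/record/123",  # placeholder URI
        {"title": "Moby Dick", "creator": "Melville, Herman"},
    ))
```

The point is that the page a person reads and the data a machine harvests are the same HTML, so the catalog template is the only thing that has to change.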

Harrison Decker – DIY cloud computing with Apache & R (rapache)

Data librarian, UC Berkeley. Like many things, data analysis in libraries is getting more complicated. Libraries have an opportunity to help people learn how to analyze data, and we can use data analysis in our own work. His definition of the cloud: “a replacement for the desktop, when it makes us work smarter, extendable.” He asserted that decision makers are not statisticians but rather need processed charts & data. rApache is an Apache module with the R interpreter compiled into it; it lets you embed an R script in web pages and provides an interface to GET/POST data (he showed a baseball example). Some ideas he had for use – interactive web/e-journal use visualizations, real-time survey results, data visualization for instruction, network analysis...

Public datasets in the cloud – Rosalyn Metz, Michael B. Klein

Their cloud definition: on-demand self-service, broad network access, resource pooling, rapid elasticity, IaaS, PaaS, SaaS. Rosalyn demoed launching an EC2 instance & mounting datasets hosted on EC2 for analysis. She mentioned Socrata and Google Fusion Tables, in which you can create tables and visualizations of your data. Michael talked about data access issues, wondered where and how libraries might get data, & asked what the role of IRBs might be in data analysis.

Karen Coombs – OCLC – 7++ ways to improve library UIs with OCLC web services

Karen had a laundry list of ideas for using OCLC web services to extend & embed data in library discovery services. They are presented briefly here, mostly because she flew through them!

  • Cross listing print & electronic stuff – Use the WorldCat Search API to see if the library has a print version & link it to the full record of the ebook
  • Item availability localization – Take a zip code and an OCLC number, bring up a Google map, link to the library catalog
  • Journal TOC feeds in the MARC record – Use xISSN to see if a feed of recent content is available
  • Peer review indicator – Use data from xISSN to add peer-reviewed status to appropriate screens
  • Information about authors – Use WorldCat Identities and the Wikipedia API to get author information, link them together
  • Linking to free full text – Use xOCLCNUM to check for free full text from OCA, Hathi Trust, etc.
  • Similar items – Use Dewey Decimal classification and the WorldCat Search API to retrieve similar items
  • Bonus – build a mobile catalog – Use the WorldCat Search API to get to your library catalog data, build in images from Open Library, ratings from LibraryThing

Extensible catalog – Jennifer Bowen

The afternoon started off with an overview of the eXtensible Catalog. They have developed a suite of tools including user interface, metadata, and connectivity (NCIP, OAI) to serve as a holistic replacement for discovery & management services. XC includes FRBRized metadata, shows “why you got that record” in the index display, works as a Drupal module, and includes some nice complex staff metadata management tools so that staff have flexibility in defining how the metadata is displayed. The toolkits (Bowen indicated that there were lots) allow you to automate data loading and processing! It appears that right now XC is still in a semi-release phase, but all software is available.

Conference fatigue has set in – The rest of the sessions are discussed in abstract here

Ok, I take it back. Jeff Sherwood talked about using Levenshtein string distance as a method for matching records – very neat. They also used the Jaro-Winkler algorithm for string comparison. Related code: SecondString for Java, and MARCXimiL – a MARC deduplication package.
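For anyone who hasn't met these two measures, here is a rough sketch of both. It is not the code from the talk, SecondString, or MARCXimiL; the `normalize` step, the `likely_duplicates` helper, and the 0.9 threshold are assumptions picked for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaro(s1, s2):
    """Jaro similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def normalize(title):
    """Hypothetical cleanup: lowercase and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def likely_duplicates(title_a, title_b, threshold=0.9):
    """Flag two titles as probable duplicates; the threshold is a guess."""
    return jaro_winkler(normalize(title_a), normalize(title_b)) >= threshold


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
    print(likely_duplicates("The Art of  Computer Programming",
                            "the art of computer programing"))
```

Levenshtein gives a raw edit count, so it is easiest to reason about on short fields. Jaro-Winkler returns a score between 0 and 1 and favors strings that agree at the start, which tends to suit titles and names. For real record matching you would want to compare several fields and tune the threshold against known duplicates.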
