Highlights of the BIBFRAME Update Forum, June 2014 (1)

Mr. Kevin Ford tells the story of BIBFRAME, why and how, in this 8-minute video clip:

Transcript:

I wanted to begin with a very quick recap of what we're doing, why we're doing it, and what we hope to achieve with the BIBFRAME initiative in general. One of the BIBFRAME scenarios is to reimagine the entire bibliographic ecosystem for a post-MARC world. If we were to look at our systems now, it looks a little bit like this, in a very rough schematic drawing. You have your ILS database, you have your OPAC, you might have a Z39.50 service sitting on top of that, you might have an SRU service sitting on top of that. There is a separate cataloging interface, which is, of course, what the cataloger works with. A developer works with Z39.50 and SRU. A patron has to go to your OPAC, and the patron mostly understands your OPAC, but when a machine goes to your OPAC it basically gets nothing out of it. The data is not described very well; the text is unclear. Semantically, what a machine sees (and by machine I mean Google, Yahoo, and the search engines) is a bunch of nothing.

And so one of the things we hope to do with BIBFRAME in the long term is to greatly simplify this scenario. There will always be a database of some kind or another in the background, of course, and there will always be a web layer, but that web layer will become the locus of all activity. Catalogers will work at the web layer, developers will work at the web layer, and patrons will hopefully understand the web layer equally well, if not better; in fact, the objective is to make it better, to exploit the links you find in RDA. And, finally, to make it so that machines can understand it better. One of the reasons this is so important is that, with the Semantic Web and the current state of technology, a lot of things are done under the hood with machines talking to each other. Google and Yahoo come to your website, trying to crawl it and extract information from it. They extract very little from library websites right now, from OPACs and from catalogues, and we hope to improve that; it's one of the main objectives of BIBFRAME. We want to make it so that wherever our users are starting their searches, we're there.

So, like I said, the structured data is exposed at the ILS web layer. HTTP becomes the common transport protocol, and the ILS web layer itself becomes a service layer. It is, of course, the customary OPAC layer, but it also provides a technological layer for search access and for machine-to-machine data access, and we're looking at various web technologies to receive and send data for that.

But we are here right now: this is your standard MARC record, and we know that we can already break this MARC record into smaller parts and link them together. It's very easy to step through this MARC record and identify the information that would align most closely with a BIBFRAME work or an RDA work. This record happens to be a translation, so we can also identify the information that is, again, part of the BIBFRAME work but would be identified in RDA as information pertaining to the expression.
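Stepping through the record like this, deciding which fields belong to the work, which to the instance, and which to the item, can be sketched in a few lines of Python. This is a hedged illustration: the tag-to-entity alignment and the sample field values below are simplified for demonstration and are not the official MARC-to-BIBFRAME conversion rules.

```python
# A simplified flat MARC record: tag -> value (illustrative values only).
marc_record = {
    "100": "Twain, Mark, 1835-1910",            # main entry (author)
    "245": "Adventures of Huckleberry Finn",    # title statement
    "041": "ger",                               # language: expression-level in RDA
    "260": "Berlin : Beispiel Verlag, 1960",    # publication (instance level)
    "050": "PS1305 .A1 $b copy 2",              # call number; the $b carries item info
}

# Illustrative alignment of tags with BIBFRAME entities, not the
# official conversion specification.
ALIGNMENT = {
    "100": "work",
    "245": "work",
    "041": "work",       # RDA expression info, folded into the BIBFRAME work
    "260": "instance",
    "050": "item",
}

def split_record(record):
    """Group a flat MARC record into BIBFRAME-entity buckets."""
    buckets = {"work": {}, "instance": {}, "item": {}}
    for tag, value in record.items():
        buckets[ALIGNMENT[tag]][tag] = value
    return buckets
```

The point is only that the grouping is mechanical once the alignment is decided; the hard work in a real conversion is the alignment table itself.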
And, of course, there's BIBFRAME instance data, or RDA manifestation data, and even in this little record there's a little bit of item information when you get to the $b in the 050. Like I said, this is where we are. Machines don't understand this very well. You and I, librarians, know how to look at this output and make sense of it: we can see what pertains to the work, what pertains to an expression, what pertains to a manifestation, and what is item information. But we need to help the machines that are active today better understand these things.

Machines tend to break these things into smaller parts; instead of creating monolithic composite records, they break them into smaller parts and relate them. We saw this slide, in fact, years and years ago, borrowed from Eric Miller, and it more or less represents the preceding MARC record. You have a book, Mark Twain's Huckleberry Finn, written by Mark Twain. The record we were looking at, in fact, was a translation, so this translation is a translation of Mark Twain's original. They both have the same author. The translation, of course, also has a translator. One was issued in paperback, one was issued in hardback, and the two had different publishers. Breaking this information up into these types of bubbles and relating them with relationships is how machines understand this today and how many, many systems are built; they're no longer built to handle the monolithic record we were accustomed to.

And so this is a little bit of RDF right here. It's, again, very textual, and it looks a lot like the MARC in that respect, but you'll see the spacing indicating the three separate things. The top one is the BIBFRAME work; the middle one is the BIBFRAME instance, which closely aligns with an RDA manifestation; and the bottom entry is a holding resource. And it's possible to use identifiers to link all of these things together.
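Those three linked resources, a work, an instance, and a holding, can be sketched as plain subject-predicate-object triples. A minimal sketch, assuming abbreviated `bf:` property names as stand-ins for BIBFRAME vocabulary terms; the URIs and values are hypothetical placeholders, not real identifiers.

```python
# Hypothetical URIs: example.org placeholders, not real identifiers.
WORK = "http://example.org/works/huckleberry-finn"
INSTANCE = "http://example.org/instances/huckleberry-finn-pbk"
ITEM = "http://example.org/items/copy-1"

triples = [
    # The BIBFRAME work: the conceptual resource (title, creator).
    (WORK, "bf:title", "Adventures of Huckleberry Finn"),
    (WORK, "bf:creator", "Twain, Mark, 1835-1910"),
    # The BIBFRAME instance (close to an RDA manifestation),
    # linked back to the work by its identifier.
    (INSTANCE, "bf:instanceOf", WORK),
    (INSTANCE, "bf:format", "paperback"),
    # The holding resource, linked back to the instance.
    (ITEM, "bf:itemOf", INSTANCE),
    (ITEM, "bf:heldBy", "Example Library"),
]

def describe(subject):
    """Return every predicate/object pair asserted about one resource."""
    return [(p, o) for s, p, o in triples if s == subject]
```

Because each resource has its own identifier, a second instance (the hardback, with a different publisher) could link to the same work simply by reusing the work's URI, which is exactly the "bubbles and relationships" picture from the slide.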
And this is what a machine understands today, and when a machine understands this it can do special things with it. One of the reasons for migrating from MARC to this newer data model is to help machines better understand what we have to offer.

So right now, if you go to Google and do a search for this book, Predictably Irrational, this is more or less the type of entry you will see in a Google search result. Of course it has the title, but you can also see the store location, you can see the price of the item being offered by that store, and of course it has a description. Google is able to understand this information mostly because Barnes & Noble has described these aspects of this particular product under the hood, at the data level, so that when Google goes to the web page it can figure out from the code that this is the title, this is the price, this is the author, this is the description. You and I, humans, can look at that page and figure that out automatically: we know that when we see a dollar sign followed by a number, a decimal point, and a couple more digits, that's a price. Google needs help knowing that that's a price. Barnes & Noble has gone out of its way to describe its data in such a way that Google can make sense of it, and one of the things we hope to do with BIBFRAME more generally is to start describing our data at the web level in such a way that the search engines can better understand it.

So here is a complete mockup of what we might see in the future. Again, the title of the book is there. It has location information, in this case the D.C. Public Library. It might even have a specific location identifying that a copy of this book can be found in the Southeast Library, the Northeast Library, or the MLK Branch.
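The kind of under-the-hood description attributed to the bookstore page here is typically done with schema.org structured data. A minimal sketch in JSON-LD form: Book, Person, and Offer are real schema.org types, but the concrete values below are illustrative, not taken from any actual retailer page.

```python
import json

# schema.org-style structured data for a product page. Each fact a human
# infers from the display ("$15.99" is a price) is stated explicitly so a
# crawler does not have to guess.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Predictably Irrational",
    "author": {"@type": "Person", "name": "Dan Ariely"},
    "offers": {
        "@type": "Offer",
        "price": "15.99",        # machine-readable: no parsing "$15.99" from text
        "priceCurrency": "USD",
        "seller": {"@type": "Organization", "name": "Barnes & Noble"},
    },
    "description": "A book about the hidden forces that shape our decisions.",
}

markup = json.dumps(book, indent=2)
```

Embedded in a page (for example in a `script type="application/ld+json"` element), this is the code-level description that lets a search engine label the title, author, and price in its result snippet.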
If we even encode the call number, Google might be able to extract it and display it in the search results. And, finally, of course, we can provide a description. When so many of our users are starting at search engines, this could be a massive boon for our users, and it's one of the main reasons we're going through all this effort.
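The library mockup, with branch locations and a call number exposed for the search engine, could take a similar structured-data form. A sketch under stated assumptions: Library and Place are real schema.org types, but the overall holdings shape, and the call-number field in particular, are hypothetical; schema.org defines no standard call-number property, and the values are invented for illustration.

```python
import json

# Hypothetical holdings markup for the library mockup described above.
holding = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Predictably Irrational",
    "offers": {
        "@type": "Offer",
        "offeredBy": {"@type": "Library", "name": "D.C. Public Library"},
        # Branches holding a copy, as in the mockup.
        "availableAtOrFrom": [
            {"@type": "Place", "name": "Southeast Library"},
            {"@type": "Place", "name": "Northeast Library"},
            {"@type": "Place", "name": "MLK Branch"},
        ],
    },
    # Hypothetical field with an illustrative value: if the call number is
    # encoded like this, a crawler can extract and display it.
    "callNumber": "BF000 .X1 2008",
}

holdings_markup = json.dumps(holding, indent=2)
```

The design point is the same as on the retailer page: every element of the mockup that should appear in a search result (library, branch, call number) is a named field rather than display text.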