Monthly Archives: April 2013

Linked Data for Dummy

From 

“Imagine you’re in a huge building with several storeys, each with an incredible large amount of rooms. Each room has tons of things in it. It’s utterly dark in that building, all you can do is walk down a hallway till you bang into a door or a wall. All the rooms in the buildings are somehow connected but you don’t know how. Now, I tell you that in some rooms there is a treasure hidden and you’ve got one hour to find it.

Here comes the good news: you’re not left to your own resources. You have a jinn, let’s call him Goog, who will help you. Goog is able to take you instantaneously to any room once you tell him a magic word. Let’s imagine the treasure you’re after is a chocolate bar, and you tell Goog: “I want Twox”. Goog tells you now that there are 3579 rooms where there is something with “Twox” in there. So you start with the first room Goog suggests to you, and as a good jinn he of course takes you there immediately; you don’t need to walk there. Now you’re in the room you put everything you can grab into your rucksack and get back outside (remember, you can’t see anything, in there). Once you are outside the building again and can finally see what you’ve gathered you find out that what is in your rucksack is not really what you wanted. So, you have to get back into the building again and try the second room. Again, and again till you eventually find the Twox you want (and you are really hungry now, right?).

Now, imagine the same building but all the rooms and stairs are marked with fluorescent stripes in different colours, for example a hallway that leads you to some food is marked with a green stripe. Furthermore, the things in the rooms have also fluorescent markers in different shapes. For example, Twox chocolate bars are marked with green circles. And there is another jinn now as well- say hello to LinD. You ask LinD the same thing as Goog before: “I want Twox” and LinD asks you: do you mean Twox the chocolate bar or Twox the car? Well, the chocolate bar of course, you say and LinD tells you: I know about 23 rooms that contain Twox chocolate bars, I will get one for you in a moment.

How can LinD do this? Is LinD so much more clever than Goog?

Well, not really. LinD does not understand what a chocolate bar is, pretty much the same as Goog does not know. However, LinD knows how to use the fluorescent stripes and markers in the building, and can thus get you directly what you want.

You see. It’s the same building and the same things in there, but with a bit of a help in forms of markers we can find and gather things much quicker and with less disappointments involved.

In the Linked Data Web we mark the things and hallways in the building, enabling jinns such as LinD to help you to find and use your treasures. As quick and comfortable as possible and no matter where they are.”
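The difference between the two jinns can be sketched in a few lines of Python. This is only a toy under stated assumptions: the room identifiers, the labels, the type names (ChocolateBar, Car, Leaflet) and the three function names are invented for illustration, and real Linked Data uses URIs and RDF triples rather than Python tuples.

```python
# Toy "building": each thing is described by (subject, predicate, object) triples.
# All identifiers and values here are made up for this sketch.
TRIPLES = [
    ("room12/item1", "label", "Twox"),
    ("room12/item1", "type", "ChocolateBar"),
    ("room40/item7", "label", "Twox"),
    ("room40/item7", "type", "Car"),
    ("room77/item3", "label", "Twox wrapper recycling guide"),
    ("room77/item3", "type", "Leaflet"),
]


def goog_search(word):
    """Keyword matching only: every thing whose label contains the word."""
    return sorted({s for s, p, o in TRIPLES if p == "label" and word in o})


def lind_types(word):
    """The types LinD can offer when asking 'do you mean ... ?'."""
    labelled = {s for s, p, o in TRIPLES if p == "label" and o == word}
    return sorted({o for s, p, o in TRIPLES if p == "type" and s in labelled})


def lind_search(word, wanted_type):
    """Use the markers: the label must match and the type must match too."""
    labelled = {s for s, p, o in TRIPLES if p == "label" and o == word}
    typed = {s for s, p, o in TRIPLES if p == "type" and o == wanted_type}
    return sorted(labelled & typed)


print(goog_search("Twox"))
print(lind_types("Twox"))
print(lind_search("Twox", "ChocolateBar"))
```

Neither function "understands" chocolate. The second one simply has typed markers to follow, which is the point of the analogy.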

Access points and entries

The phrase ‘access points’ may be a modern term, but the concept behind it is not new.

“.. the phrase is indicating the concept of providing structured headings that the catalogue user can predict and, therefore, use to form successful search strategies – search strategies that retrieve the information they are seeking (and ideally only the information they are seeking).”

In other words, “an access point is a specific piece of data that catalogue users can and should expect to provide them with a way into the bibliographic record.”

Cutter thinks a library catalogue should provide these types of access:

– Author-entry with the necessary references
– Title-entry or title-reference
– Subject-entry, cross references and classed subject-table.

An entry, in the pre-computer period, meant a card catalogue record providing the bibliographical information a user needed to find the information resource. It would have been a time-consuming process to put the full information in every entry, that is, every access point, in the card catalogue. The ‘work-around’ was to provide the full information only in the one entry (the access point, the card record) that users were most likely to think of. That gave rise to the concepts of “main entry” and “added entries”.
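As a rough illustration of that work-around, here is a minimal Python sketch. The Card class, the make_cards function, the pointer wording and the sample subject heading are all assumptions made for this example, not a description of any real card-production practice.

```python
from dataclasses import dataclass


@dataclass
class Card:
    heading: str  # the access point the card is filed under
    kind: str     # "main" or "added"
    body: str     # full description, or a pointer to the main entry


def make_cards(author, title, subjects, description):
    """One full main entry under the author; brief added entries elsewhere."""
    main = Card(author, "main", description)
    pointer = f"See main entry under: {author}"
    added = [Card(title, "added", pointer)]
    added += [Card(subject, "added", pointer) for subject in subjects]
    return [main] + added


cards = make_cards(
    author="Shakespeare, William",
    title="Macbeth",
    subjects=["Tragedy"],
    description="Macbeth / William Shakespeare. (full description goes here)",
)

# File the cards alphabetically by heading, as in a card drawer.
for card in sorted(cards, key=lambda c: c.heading):
    print(card.kind, "|", card.heading, "|", card.body)
```

Only one card carries the full description; the others save the cataloguer's time by pointing to it.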

 

FRBR is an end point, not the end point

William Denton said FRBR “is an end point of almost 175 years of thinking about what catalogs are for and how they should work—an end point, not the end point. There is no the end point to how libraries should make their collections available to people. That changes all the time, and lately it’s been changing quickly. That’s one of the reasons we have FRBR.”

In tracing the history of FRBR, he followed these “four ideas through modern Anglo-American library history and see how they lead up to FRBR:
– the use of axioms to explain the purpose of catalogs,
– the importance of user needs,
– the idea of the “work,” and
– standardization and internationalization.

The last three ideas are fairly simple. Library users are important people and wherever they are, whatever they want, serving them is the basis of what we do. “Work” has quotes around it to make it clear that under discussion is the abstract notion of a work, not the FRBR entity. (The idea goes beyond just FRBR—different people have different definitions of what a “work” is, but they’re all generally the same.) As a librarian, you know all about standards and the international sharing of information.”

By axioms explaining the purpose of catalogs, he meant “a core set of simple, fundamental principles that form the basis for complete cataloging codes such as Anglo-American Cataloguing Rules.”

All four ideas show up strongly in FRBR. It is a starting point, not the end point.

[Source: Denton, William. FRBR and the History of Cataloging]

Evolution of Catalogue

In his Rules for the Compilation of the Catalog, Sir Anthony Panizzi of the British Museum argued that a catalogue should not function merely as a finding tool, that is, a simple list designed to help one find a particular book in the library and nothing else. His ideology of cataloguing was based on this perception: “that the book sought by a person is really, most frequently, not the object of his/her interest, but the work contained in it is; that that work may be found also in other editions, translations, or versions, published under different names of the author and/or different titles, some or all of which may be of equal or greater interest to that person; and that, consequently, to serve well the user of the library, the catalog must be designed not merely to tell him/her whether or not a particular book is in the library, but also to reveal to him/her at the same time what other editions, translations, or versions of the work, as well as other genetically related works, the library has.” …

Panizzi had to defend his separation of ‘Work’ and ‘Book’, and his cataloguing ideology, in many hearings before the Royal Commission between 1847 and 1849. Now we have the FRBR report fully reflecting his ideology. Panizzi argued that a perfect catalogue should be a “system” that reflected the related entities of the books (in FRBR terms: work, expression, manifestation) and their authors, supported by a “cross-reference” mechanism taking care of the various forms of titles and author names. “This catalog would then present, not a list of separate books, but a bibliographically integrated picture of the resources of the library, one much more revealing, helpful, and responsive to the actual needs of the library user than the finding catalog.”

[Source: Lubetzky, Seymour (1979) Ideology of Bibliographic Cataloging: Progress and Retrogression. In The Nature and Future of the Catalog: Proceedings of the ALA’s Information Science and Automation Division’s 1975 and 1977 Institutes on Cataloging.]
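The contrast between a finding list and the integrated catalogue described above can be sketched in Python. Everything in this example is illustrative: the variant name forms, the sample titles, and the names SEE_REFERENCES, heading_for and integrated_catalogue are assumptions, not taken from Panizzi's rules.

```python
from collections import defaultdict

# Cross-references: variant forms of a name point to one chosen heading.
SEE_REFERENCES = {
    "Shakspere, William": "Shakespeare, William",
    "Shakespear, William": "Shakespeare, William",
}

# Each book: (author name as printed, title as printed, work it contains)
BOOKS = [
    ("Shakespeare, William", "Macbeth", "Macbeth"),
    ("Shakspere, William", "The Tragedie of Macbeth", "Macbeth"),
    ("Shakespeare, William", "Macbeth : traduction française", "Macbeth"),
    ("Shakespeare, William", "Hamlet", "Hamlet"),
]


def heading_for(name):
    """Follow a 'see' reference if there is one, else keep the name."""
    return SEE_REFERENCES.get(name, name)


def integrated_catalogue(books):
    """Group books by (author heading, work) instead of listing them separately."""
    grouped = defaultdict(list)
    for name, title, work in books:
        grouped[(heading_for(name), work)].append(title)
    return dict(grouped)


catalogue = integrated_catalogue(BOOKS)
for (author, work), titles in catalogue.items():
    print(author, "/", work, "->", titles)
```

A plain finding list would show four unrelated lines. Here a reader who asks for one Macbeth also sees the other editions and the translation gathered under a single author heading.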

Cataloguing standards

There are many cataloguing standards for libraries: RDA, AACR, MARC21, Dublin Core, LCSH, LCNA, etc. Some are rules derived from principles. Some are data standards. Some are encoding standards in systems. Each answers one or more of these questions:

How to describe a resource (e.g. book)?
Where to find the information from the resource, and what is the best practice?
What information do you need to put in the description?
What are the standard terms you can use in describing certain data elements (e.g. subjects, names)?

All these rules and standards are the result of collective wisdom. The standards facilitate the exchange of information and records. This makes our work easier: we do not need to think hard about every decision. That is why sharing work and decisions is the first value in cataloguing.

With so many standards, it takes years for individuals to become seasoned cataloguers. Sometimes we even wonder whether it is necessary to put so much information in the records. But there is always this ‘just in case’ worry in our minds. We have to prepare the data to satisfy the information needs of both internal and external customers. Much of this data is for our internal customers, for collection development and for management reports. Can we afford not to be comprehensive enough, not to be thorough enough? Who can answer this question, and where do we draw the line?

We may not have the answers to these questions, but at least we should maintain local standards for consistency.

Bibliographic Entities

An entity is the identity of something. When a user comes to the library to find Macbeth, you do not know whether the user is looking for something for bedtime reading, prefers a French version, or just wants to know more about Shakespeare. If we can categorise all bibliographic information into different entities and record them accordingly, we can organise the information better to serve the needs of our users. In other words, simply organising the catalogue by Author, Title, and Subject is not enough.

FRBR provides us with a framework to see the bibliographic universe differently. By recording and describing the different entities from the information resources, we are creating meaningful ‘information’ from information resources. We are building networks of relationships between entities that can help our users discover what they need and expand the horizon of their information-seeking journey.

FRBR in one page

1. The authors of FRBR determined that there are four primary tasks performed by users in the course of using bibliographic records/catalog records: Find, Identify, Select, Obtain. The specifics of these tasks are beyond a cursory treatment but are reasonably accessible in the original FRBR text, subsequent textbooks, or in-depth training. (Some notes: Some may disagree with those tasks, especially as computers and the internet continue to blur the lines between record and resource and consequently between finding and accessing resources. I will confess, I’m not always entirely clear on the distinctions, even in ‘traditional’ library settings and resources, between the first three. Personally, I think they work in aggregate, even if the attempt at making these tasks discrete might be considered of questionable success.)

2.  The authors of FRBR selected the Entity-Relationship model as the underpinning of their analysis of records.  The E-R model comes from the computer science field and is used there for examining database structures.  The pertinent components of the E-R model are: a) Entities — the primary objects of concern; b) Relationships — the connections between those entities; and c) Attributes — the characteristics that describe specific exemplars in the entities and relationships.

3. Using the E-R model, the authors of FRBR identified 3 primary groups of Entities: a) Group 1 — the products of intellectual or artistic endeavor; b) Group 2 — the parties responsible for such endeavors; and c) Group 3 — the subject matter of such endeavors.

4. Much of FRBR focuses on the Group 1 entities and the relationships between them. The authors of FRBR identified four categories within Group 1: Work, Expression, Manifestation, Item. They further identified the following set of primary relationships: a work is Realized through an expression, which is Embodied in a manifestation, which is Exemplified by an item; reciprocally, the item Exemplifies a manifestation, which Embodies an expression, which Realizes a work. (Some notes: The progression from Item to Work is from concrete to abstract: the Item is the thing you can hold in your hand; the Manifestation is the object of traditional cataloging and bibliographic catalog records, a publication/edition; the Expression and Work are decidedly in the realm of abstraction and have traditionally been addressed in title and author-title authority records, although Work is a level of abstraction that is almost beyond anything previously conceived.)

5. The authors of FRBR identified key attributes for each of the four categories within Group 1. They roughly correspond to the typical information recorded in a catalog record. The specific breakdowns for each Group 1 category can be lengthy, with detailed descriptions, and are best explored within the FRBR text itself, subsequent textbooks, or in-depth training.

6. The authors of FRBR identified two categories within Group 2: Person and Corporate Body. Within FRBR, Group 2 is explored primarily with respect to the relationships between Group 2 entities and Group 1: creating, realizing, producing, owning. Further E-R assessment of Group 2, specifically the attributes of Group 2 entities, is being explored in Functional Requirements for Authority Data (FRAD), formerly FRANAR. (A note: The current cataloging code under development with FRBR principles (RDA) has added a third category to accommodate the practices of the archival community: Family.)

7. The authors of FRBR identified four categories within Group 3: Concept, Object, Event, Place. Any of the Group 1 or Group 2 entities may also serve as a subject. FRBR concerns itself primarily with the relationship (subject) between this group and Group 1. Further assessment of Group 3 is being explored in Functional Requirements for Subject Authority Records (FRSAR). (A note: It will be interesting to see how the current practices with respect to places as jurisdictions (Group 2) and as subjects (Group 3) play out with respect to FRAD, FRSAR, and RDA.)

8. FRBR addresses other relationships across the Group 1 entity categories (work-to-work, manifestation-to-manifestation, expression-to-work, etc.), which in practical terms covers things like aggregates, derivatives, adaptations, reproductions, etc. Again, the specifics of these relationships are beyond a cursory treatment and are best reviewed in the original FRBR text, subsequent textbooks, or in-depth training. (A note: Reconciling these relationships to current cataloging practice can be complicated, but also affords the best opportunity to explore the application and implications of FRBR, especially when they are considered across the various ‘traditional’ divisions applied in the library world.)

Hope this helps,

John Myers, Catalog Librarian

Schaffer Library, Union College

Schenectady NY 12308
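The Group 1 chain from point 4 above (a work is realized through an expression, embodied in a manifestation, exemplified by an item) can be sketched as nested Python dataclasses. This is a minimal illustration, not the FRBR data model itself: the attribute names (barcode, publisher, year, language), the sample publishers and the items_of helper are assumptions, and the creator field is only a stand-in for a Group 2 relationship.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # hypothetical attribute: one physical copy


@dataclass
class Manifestation:
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)  # exemplified by


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)  # embodied in


@dataclass
class Work:
    title: str
    creator: str  # stands in for a Group 2 "created by" relationship
    expressions: List[Expression] = field(default_factory=list)  # realized through


def items_of(work):
    """Walk Work -> Expression -> Manifestation -> Item, abstract to concrete."""
    for expression in work.expressions:
        for manifestation in expression.manifestations:
            for item in manifestation.items:
                yield expression.language, manifestation.year, item.barcode


macbeth = Work(
    title="Macbeth",
    creator="Shakespeare, William",
    expressions=[
        Expression("English", [Manifestation("Publisher A", 1990, [Item("0001"), Item("0002")])]),
        Expression("French", [Manifestation("Publisher B", 2005, [Item("0003")])]),
    ],
)

for language, year, barcode in items_of(macbeth):
    print(language, year, barcode)
```

Nesting is the simplest way to show the chain. A real system would more likely store the four entity types separately and link them by identifiers, so that the reciprocal relationships can be followed in both directions.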

Statement of International Cataloguing Principles (2)

From the previous post, you will notice that ICP holds that any cataloguing code should use the conceptual models of entities, attributes, and relationships found in FRBR, FRAD, and FRSAD. This is reflected in Section 3 of the statement. The statement also accepts the four user tasks of FRBR as the objectives and functions of catalogues (in Section 4). They are:

-Find
-Identify
-Select
-Obtain or Acquire.

However, the statement adds a fifth one, which is

“4.5. to navigate within a catalogue and beyond (that is, through the logical arrangement of bibliographic and authority data and presentation of clear ways to move about, including presentation of relationships among works, expressions, manifestations, items, persons, families, corporate bodies, concepts, objects, events, and places)”

Besides the principles and objectives, the statement also includes guiding rules on search and retrieval capabilities in catalogues.  This is reflected in Section 7:

“7. Foundations for Search Capabilities

7.1. Searching
Access points are the elements of bibliographic and authority records that 1) provide reliable retrieval of bibliographic and authority records and their associated bibliographic resources and 2) limit search results.

7.1.1. Searching Devices
Names, titles, and subjects should be searchable and retrievable by means of any device available in the given library catalogue or bibliographic file (by full forms of names, by key words, by phrases, by truncation, by identifiers, etc.).

7.1.2. Essential Access Points
Essential access points are those based on the main attributes and relationships of each entity in the bibliographic or authority record.

7.1.2.1. Essential access points in bibliographic records include:
authorized access point for the name of the creator or first named creator of the work when more than one is named
authorized access point for the work/expression (this may include the authorized access point for the creator)
title proper or supplied title for the manifestation
year(s) of publication or issuance of the manifestation
controlled subject terms and/or classification numbers for the work
standard numbers, identifiers, and ‘key titles’ for the described entity.

7.1.2.2. Essential access points in authority records include:
authorized name or title of the entity
identifiers for the entity
variant names and variant forms of name or title for the entity.

7.1.3. Additional Access Points
Attributes from other areas of the bibliographic description or the authority record may serve as optional access points or as filtering or limiting devices for a search.

7.1.3.1. Such attributes in bibliographic records include, but are not limited to:
names of creators beyond the first
names of persons, families, or corporate bodies in roles other than creator (e.g., performers)
variant titles (e.g., parallel titles, caption titles)
authorized access point for the series
bibliographic record identifiers
language of the expression embodied in the manifestation
place of publication
content type
carrier type.

7.1.3.2. Such attributes in authority records include, but are not limited to:
names or titles of related entities
authority record identifiers.

7.2. Retrieval
When searching retrieves several records with the same access point, records should be displayed in some logical order convenient to the catalogue user, preferably according to a standard relevant to the language and script of the access point.”

Section 7 should be used as a basic requirement for online catalogues.
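To make Section 7 more concrete, here is a small Python sketch of a search that matches a keyword against essential access points, uses an additional attribute as a limit, and returns the hits in a fixed order. The record fields, the ESSENTIAL tuple, the search function and the choice of sorting by title and year are all assumptions made for this illustration; ICP does not prescribe any of them.

```python
# Toy records; field names loosely follow the essential access points in 7.1.2.1
# and are assumptions of this sketch, not a real catalogue schema.
RECORDS = [
    {"id": "b1", "creator": "Shakespeare, William", "title": "Macbeth",
     "year": 1990, "subjects": ["Tragedy"], "language": "eng"},
    {"id": "b2", "creator": "Shakespeare, William", "title": "Macbeth",
     "year": 2005, "subjects": ["Tragedy"], "language": "fre"},
    {"id": "b3", "creator": "Shakespeare, William", "title": "Hamlet",
     "year": 1987, "subjects": ["Tragedy"], "language": "eng"},
]

ESSENTIAL = ("creator", "title", "year", "subjects")


def search(records, term, filters=None):
    """Match a keyword against essential access points, then apply optional limits."""
    term = term.lower()
    hits = []
    for record in records:
        values = []
        for name in ESSENTIAL:
            value = record[name]
            values.extend(value if isinstance(value, list) else [value])
        if any(term in str(v).lower() for v in values):
            hits.append(record)
    # 7.1.3: additional attributes (here, language) act as limiting devices.
    for name, wanted in (filters or {}).items():
        hits = [r for r in hits if r.get(name) == wanted]
    # 7.2: display in a logical order; here, by title and then year.
    return sorted(hits, key=lambda r: (r["title"], r["year"]))


print([r["id"] for r in search(RECORDS, "macbeth")])
print([r["id"] for r in search(RECORDS, "shakespeare", {"language": "eng"})])
```

An online catalogue that cannot do at least this much (search by name, title and subject, limit by another attribute, and sort the results predictably) falls short of the baseline the statement describes.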

Statement of International Cataloguing Principles (ICP)

The Statement of International Cataloguing Principles (ICP) was published in February 2009 by IFLA. ICP replaces the Paris Principles (also known as the Statement of Principles). It is an effort to make the principles more relevant to an online environment. It includes not only the principles and objectives of the catalogue, but also guiding rules for cataloguing codes.

This statement covers:
1. Scope
2. General Principles
3. Entities, Attributes, and Relationships
4. Objectives and Functions of the Catalogue
5. Bibliographic Description
6. Access Points
7. Foundations for Search Capabilities

Here are some excerpts:

2. General Principles
Several principles direct the construction of cataloguing codes.
The highest is the convenience of the user.
2.1. Convenience of the user. Decisions taken in the making of descriptions and controlled forms of names for access should be made with the user in mind.
2.2. Common usage. Vocabulary used in descriptions and access should be in accord with that of the majority of users.
2.3. Representation. Descriptions and controlled forms of names should be based on the way an entity describes itself.
2.4. Accuracy. The entity described should be faithfully portrayed.
2.5. Sufficiency and necessity. Only those data elements in descriptions and controlled forms of names for access that are required to fulfil user tasks and are essential to uniquely identify an entity should be included.
2.6. Significance. Data elements should be bibliographically significant.
2.7. Economy. When alternative ways exist to achieve a goal, preference should be given to the way that best furthers overall economy (i.e., the least cost or the simplest approach).
2.8. Consistency and standardization. Descriptions and construction of access points should be standardized as far as possible. This enables greater consistency, which in turn increases the ability to share bibliographic and authority data.
2.9. Integration. The descriptions for all types of materials and controlled forms of names of all types of entities should be based on a common set of rules, insofar as it is relevant.
The rules in a cataloguing code should be defensible and not arbitrary. It is recognized that these principles may contradict each other in specific situations and a defensible, practical solution should be taken.

3. Entities, Attributes, and Relationships
A cataloguing code should take into account the entities, attributes, and relationships as defined in conceptual models of the bibliographic universe.
3.1. Entities
The following entities may be represented by bibliographic and authority data:
Work
Expression
Manifestation
Item
Person
Family
Corporate Body
Concept
Object
Event
Place.
3.2. Attributes
The attributes that identify each entity should be used as data elements.
3.3. Relationships
Bibliographically significant relationships among the entities should be identified.

 

Lubetzky’s great ideas

Lubetzky is regarded as one of the greatest cataloguers of the twentieth century. When he joined the Library of Congress, he noticed a large cataloguing backlog. He was determined to simplify the cataloguing rules. His ‘back to basics’ approach was to ground the rules in the objectives and principles of cataloguing, so that cataloguers in future generations would know how to deal with new formats and a new information ecosystem.

In his “Cataloging Rules and Principles”, he said there were two objectives of cataloguing. “The first objective of cataloguing is to enable the user of the catalog [to determine] whether or not the library has the book he wants … The second objective is to reveal to the user of the catalog, under one form of the author’s name, what works the library has by a given author and what editions or translations of a given work.” [1]

There are two significant new ideas in these two objectives:
1. A catalogue is not only for helping users find what they are looking for. We also have to present the choices and alternatives to users.
2. The reference to editions or translations of a given work marks the distinction between work and expression.
Both groundbreaking ideas laid the foundation for the concepts we see in FRBR.

Reference:
[1] Denton, William (2007) “FRBR and the History of Cataloging”. Chapter 4 of “Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools”, edited by Arlene G. Taylor.