Author Archives: nlsxy

Application of Linked Data at the BBC

The BBC is one of the pioneers in using Semantic Web technology to enhance the user experience on its website. This video shows how the BBC’s /programmes and /music sites make use of rich RDF data, as well as giving a demo of a simple mashup of user reviews using SPARQL and the Talis Platform.

W3C: Semantic Web

We have been hearing that the Semantic Web is going to be the next phase of the World Wide Web, Web 3.0. What is the Semantic Web? The standard answer is that it is the state of a ‘Web of data’.

The W3C (World Wide Web Consortium) is an organisation leading the Web to its full potential by developing protocols and guidelines that ensure the long-term growth of the web. On its Semantic Web page, it says:

“The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to:
– create data stores on the Web,
– build vocabularies,
– and write rules for handling data.”

It is easier to understand if we look at a traditional library collection. We build our collection – that is the data store. We use standard vocabularies to describe the items in our collections, and we follow the established metadata standards (AACR, RDA, MARC21, Dublin Core). These are the agreed rules for handling data.

In order to build a web of data, there are four areas to look at:

“1. Linked Data: The Semantic Web is a Web of data — of dates and titles and part numbers and chemical properties and any other data one might conceive of. RDF provides the foundation for publishing and linking your data. Various technologies allow you to embed data in documents (RDFa, GRDDL) or expose what you have in SQL databases, or make it available as RDF files.

2. Vocabularies: At times it may be important or valuable to organize data. Using OWL (to build vocabularies, or “ontologies”) and SKOS (for designing knowledge organization systems) it is possible to enrich data with additional meaning, which allows more people (and more machines) to do more with the data.

3. Query: Query languages go hand-in-hand with databases. If the Semantic Web is viewed as a global database, then it is easy to understand why one would need a query language for that data. SPARQL is the query language for the Semantic Web.

4. Inference: Near the top of the Semantic Web stack one finds inference — reasoning over data through rules. W3C work on rules, primarily through RIF and OWL, is focused on translating between rule languages and exchanging rules among different systems.”
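The four building blocks above can be illustrated with a toy example. The sketch below stores facts as RDF-style (subject, predicate, object) triples and answers a SPARQL-like pattern query by simple matching. The identifiers are made up for illustration; real systems would use an RDF library and a SPARQL endpoint rather than plain tuples.

```python
# Toy illustration of the "Web of data" idea: facts stored as
# RDF-style (subject, predicate, object) triples, queried by
# pattern matching much as SPARQL does. Identifiers are invented.

triples = [
    ("bbc:programme/b006q2x0", "dc:title", "Doctor Who"),
    ("bbc:programme/b006q2x0", "rdf:type", "po:Brand"),
    ("bbc:artist/some-artist", "foaf:name", "Example Artist"),
]

def match(pattern):
    """Return triples matching a pattern; None acts as a wildcard,
    playing the role of a SPARQL variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly: SELECT ?title WHERE { bbc:programme/b006q2x0 dc:title ?title }
titles = [o for (_, _, o) in match(("bbc:programme/b006q2x0", "dc:title", None))]
print(titles)  # ['Doctor Who']
```

The point of the exercise is that once data is expressed as triples with shared vocabularies, the same generic pattern-matching mechanism works across any data source.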


WorldCat Identities

Searches in standard OPACs are always about the resources or documents. WorldCat Identities is an alternative search service that provides details about the identities in the WorldCat database. The identities include persons and corporate bodies, real or fictitious. When you search for a name, the results pull data from the WorldCat database and other resources into one page. The summary page lists an overview, a publications timeline, works about the person, works by the person, related identities, subjects associated with the person, and the various forms of the name in different languages.

WorldCat Identities is also incorporated in the OPAC of WorldCat. You can get to it from the Details section of a detailed record display: there is a pull-down in the ‘Find more information about’ box under the ‘All creators/contributors’ field.

There is always a creative way to present the data from our catalogue and associated resources. The WorldCat Identities service is one good example.


WorldCat Identities search page
Example: Search for ‘Mark Twain’

Keyword Search vs Subject Search by LCSH

In response to the notion that “Google makes books easily accessible”, Dr. Thomas Mann, in his paper “What is Distinctive about the Library of Congress in Both its Collections and its Means of Access to Them…” (2009), compared keyword search and subject search based on LCSH (p. 11-14):

“Tens of thousands of examples are possible here; for the present we will have to let one suffice: the subject cataloging access to books on “Afghanistan” that is infinitely more efficient in providing an overview of the whole scope of our relevant collections than could be provided by either Google or Amazon search mechanisms. And I need not emphasize how important it is to our national interest that Congress, and scholars generally, have access to as much knowledge on this subject as we can possibly provide.

A researcher using LC’s online catalog can easily call up a browse-display such as the following:

Afghanistan—Constitutional history
Afghanistan—Defenses—History—20th Century—Sources
Afghanistan—Description and travel
Afghanistan—Economic conditions
Afghanistan—Economic Policy
Afghanistan—Emigration and immigration

……. <snipped>

Such “road map” arrays in our OPAC enable scholars who are entering a new subject area to recognize what they cannot specify in advance. They enable scholars to see “the shape of the elephant” of the book literature on their topic early in their research.

Neither Google nor Amazon makes such systematic overviews of subjects accessible at all, let alone “easily accessible.”

Subject cataloging in our OPAC accomplishes the goal of extending the scope of scholars’ inquiries by showing them more of the full range of what is available than they know how to ask for before they are exposed to it. LCSH cataloging enables them both to recognize a much broader range of topical options within their subjects that would not occur to them otherwise; and it also enables them to pick those aspects of interest in a way that separates them from other aspects that would only be in the way, as clutter, without this roster of conceptual distinctions to choose from.

…… <snipped>

… LC now has a new way (the Subject Keyword option in our OPAC’s Basic Search menu) to bring up, systematically, a browse-menu of all other headings in which Afghanistan is itself a subdivision of another topic, for example:

Abandoned children—Afghanistan
Administrative law—Afghanistan


Buddhist antiquities—Afghanistan
Cabinet officers—Afghanistan—Biography


Muslim women—Afghanistan
Muslim women—Education—Afghanistan—Bibliography
Rural women—Afghanistan—Social conditions
Sex discrimination against women—Afghanistan
Single women—Legal status, laws, etc.—Afghanistan
Women—Afghanistan—Social conditions

…. <snipped>”
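The “road map” effect Mann describes comes from pre-coordinated heading strings filing together when sorted alphabetically: every subdivision of a topic lands adjacent to the others, so the researcher can scan the whole shape of the literature at once. A minimal sketch, using a few headings from the excerpt above (with `--` standing in for the em-dash subdivision separator):

```python
# A browse display is, at bottom, a sorted list of full pre-coordinated
# heading strings. Sorting files all subdivisions of "Afghanistan"
# together, producing the "road map" array Mann describes.

headings = [
    "Afghanistan--Economic conditions",
    "Afghanistan--Description and travel",
    "Afghanistan--Constitutional history",
    "Afghanistan--Emigration and immigration",
    "Afghanistan--Economic Policy",
]

for h in sorted(headings, key=str.lower):
    print(h)
```

A keyword index, by contrast, scatters these works across unrelated result sets, because nothing ties "economic conditions" and "description and travel" together as aspects of one topic.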

LCSH: Pre- vs. Post-Coordination

In 2007, Library of Congress released a report: Library of Congress Subject Headings: Pre- vs. Post-Coordination and Related Issues.

The report lists the pros and cons of the two approaches. For those interested in the comparison, the report is a ‘must read’.

In summary, LC concluded that the pre-coordinated system is still the best way to build a subject heading system on such a wide scale. A pre-coordinated system gives the best view of the context of the subject matter of the work being described. To quote an example cited in Appendix II of the report:

‘The work “Evitas Geheimnis” has the following subject headings:

600 10 Peron, Eva, ‡d 1919-1952 ‡x Travel ‡z Switzerland.
600 10 Peron, Juan Domingo, ‡d 1895-1974.
651 _0 Argentina ‡x Ethnic relations.
650 _0 Voyages and travels.
651 _0 Argentina ‡x Foreign relations ‡z Switzerland.
651 _0 Switzerland ‡x Foreign relations ‡z Argentina.
650 _0 Immigrants ‡z Argentina ‡x History ‡y 20th century.
650 _0 Bank accounts ‡z Switzerland.
650 _0 Antisemitism ‡z Europe ‡x History.
651 _0 Argentina ‡x Emigration and immigration.
651 _0 Germany ‡x Emigration and immigration.
650 _0 War criminals ‡z Germany ‡x History ‡y 20th century.
650 _0 War criminals ‡z Argentina ‡x History ‡y 20th century.
650 _0 Nazis ‡z Argentina ‡x History.

After removing the duplicate headings, there are 20 separate concepts represented.

20th century
Bank accounts
Emigration and immigration
Ethnic relations
Foreign relations
Peron, Eva
Peron, Juan Domingo
Voyages and travels
War criminals

This means there are 2,432,902,008,176,640,000 possible combinations of these terms. There are quite a few chances to get some false drops in this set of terms.’
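The figure quoted is 20! (20 factorial): the number of possible orderings of the 20 distinct concepts. A pre-coordinated heading fixes one meaningful order for each string of terms, while uncoordinated keywords leave every ordering equally plausible, hence the risk of false drops. A quick arithmetic check:

```python
import math

# The report's figure, 2,432,902,008,176,640,000, is exactly 20
# factorial: the number of ways to order 20 distinct concepts.
combinations = math.factorial(20)
print(combinations)  # 2432902008176640000
```

In other words, post-coordinated keyword matching cannot distinguish "War criminals—Germany—History" from any other arrangement of the same words; only the pre-coordinated string carries that context.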

Authority Control

Authority control here means the work involved in creating unique terms or identifiers to be used as access points in bibliographic records. The access points serve two functions: the finding function and the gathering function. Charles Cutter said a catalogue should

1) enable a user to find a book when any one of these is known: the author, the title, or the subject;
2) show what the library has by a given author, on a given subject, or in a given kind of literature.

Authority records contain data such as: a unique control number, the preferred terms, the non-preferred terms, variant forms of the terms, broader/narrower/related terms, scope notes, and the reasons supporting the creation of the terms.
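The fields just listed can be sketched as a simple record structure. The field names below are illustrative only, not an actual authority format such as MARC 21 Authority; the example values follow the LC name authority for Mark Twain.

```python
from dataclasses import dataclass, field

# A sketch of the data an authority record carries, following the
# fields listed above. Field names are invented for illustration.

@dataclass
class AuthorityRecord:
    control_number: str                              # unique identifier
    preferred_term: str                              # the authorised access point
    variant_terms: list = field(default_factory=list)   # non-preferred / variant forms
    broader_terms: list = field(default_factory=list)
    narrower_terms: list = field(default_factory=list)
    related_terms: list = field(default_factory=list)
    scope_note: str = ""                             # intended scope of use
    source: str = ""                                 # justification for the term

twain = AuthorityRecord(
    control_number="n79021164",
    preferred_term="Twain, Mark, 1835-1910",
    variant_terms=["Clemens, Samuel Langhorne, 1835-1910"],
)
print(twain.preferred_term)
```

The variant terms are what make the gathering function work: a search under "Clemens, Samuel" can be redirected to everything filed under the preferred form.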

Library OPACs provide searches by author, subject, title, etc., in addition to keyword search. In FRBR terminology, they provide searches by types of entities (Work, Expression, Person, Corporate Body, Family, Concept, Event, Place, Object). Nowadays, OPACs even provide faceted search by these entities and a few key parameters derived from the fixed fields and other fields of MARC.

The success of the FINDING and GATHERING functions in cataloguing relies on two things. First is authority control: the work of building the authority files that house the authority records. Second is bibliographic control, which includes the process of assigning appropriate terms (access points) from the authority files to the bibliographic records (or the metadata). Taking the example of the ‘Haze’ collection from the previous post, the success of the catalogue requires some authority control work to reorganise the index terms.

The Library of Congress Name and Subject Authority files are the most widely used controlled vocabularies in libraries worldwide, although they are known to have biases. The two files represent the contributions of many great librarians serving the library community and its customers over more than 100 years. LC Subject Headings are sometimes difficult to use, and there are calls to simplify them. FAST is a project trying to tackle this problem.

Playing with the Index Terms

Index terms are the terms that we use to describe the subject matter of the document in hand. These terms are the index keys assigned to the document; the keys facilitate the search and retrieval of documents. Using the terms listed in the previous post, we can draw out several interesting points:

1. By looking at the list of index terms shown in the surrogate records (catalogue records), we can gain a better understanding of the subject matter of the document being described. For example, if we see the terms ‘Haze’ and ‘Health advice’ appear in the record, we know that the document is about health tips for when the haze comes, not the economic impact of haze. That is something free-text search cannot offer.

2. If our imaginary collection grows, we may need to control the words and phrases used as index terms. We may need to create notes to indicate the scope of use of the terms. We may start to build a thesaurus.

3. We can try to create a map of all these terms. Draw a big circle at the centre and put the word ‘Haze’ in it: this becomes the central topic. Draw bubbles to contain each index term and draw lines to connect them. You will start to see branches linking the central topic and the subtopics, and some hierarchical relationships between the terms. In some cases, we may need to create new terms to connect the nodes and branches. Instead of putting the central topic at the centre, we can put it at the top of an organisation chart. This is a good exercise; you will benefit from it.
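The mapping exercise in point 3 can be sketched as a simple adjacency structure: the central topic with branches to subtopics, each branch fanning out to index terms. The grouping of the haze terms below into branches is one possible arrangement, invented for illustration; the exercise is precisely about discovering such groupings yourself.

```python
# The term map as an adjacency dict: 'Haze' is the central topic,
# invented intermediate nodes ('Causes', 'Health', ...) connect it to
# index terms from the collection. Printed as an indented tree.

term_map = {
    "Haze": ["Causes", "Health", "Measurement", "Responses"],
    "Causes": ["forest fire", "slash-and-burn", "land use"],
    "Health": ["health advice", "N95 masks", "respiratory"],
    "Measurement": ["PSI", "Pollutant Standards Index"],
    "Responses": ["cloud seeding", "fire-fighting", "international cooperation"],
}

def print_tree(node, depth=0):
    print("  " * depth + node)
    for child in term_map.get(node, []):   # leaf terms have no entry
        print_tree(child, depth + 1)

print_tree("Haze")
```

Notice that the intermediate nodes did not exist in the original term list; they are exactly the "new terms to connect the nodes and branches" that the exercise asks you to create.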

Index Terms for our Haze Attack 2013 collection

In response to the previous post, here are the index terms proposed for our imaginary collection.

fire; haze; 2013
June 2013
Singapore; Malaysia; PSI;
Public Health; Human-made disaster; International cooperation
Air quality; Air pollution; smoke; haze; smoke haze; burning; farming; land use; monsoon season
Air cleaners; health; health advice; medicine; outdoor activities; loss; fire-fighting; water; cloud seeding; rain; dry; schools; children; air-conditioning; food; agriculture; tourism; tourists; complaints; compensation;
NEA; Air quality; forest; forest fire; risks; health risks; face masks; Pollutant Standards Index; N95 masks; pollutants; measurements; herbal tea; herbs; health impacts; pharmacies; elderly; pregnant; children; slash-and-burn; Air purifiers; Global warming; Sumatra; Smog; Weather; Respiratory; Rain; Dry season.

There should be different aspects of information about the haze. Use your imagination. Once you find the angle, you should be able to think of more different sets of index terms.

Please continue to submit your terms in the form.



Haze attack

Singapore & Malaysia – Haze Attack 20 June 2013

Published on Jun 19, 2013


Amazingly, a related article was created on Wikipedia: 2013 Southeast Asian haze. The page was created at 6:34 on 19 June 2013.

If you were to compile a collection of information resources on this current event, what index terms do you think you would assign to it?

ALCTS eForum on eBook Cataloguing

The June 2013 issue of the ALCTS Newsletter also includes a summary of the discussion about applying the Provider-Neutral Guidelines in the cataloguing of ebooks. The title of the eForum is:

MARC Records in the Age of AACR2, Provider-Neutral Guidelines, and now RDA

Moderated by Amy Bailey, ProQuest, and Becky Culbertson, California Digital Library

“This e-forum, held April 23–24, 2013, focused on creating MARC records for ebooks using standards such as AACR2, Provider-Neutral Guidelines, and RDA, as well as issues related to managing those records. The intention was to engage library catalogers, consortiums, authors of standards, and vendors in a productive dialog in order to understand how each approaches the characteristics of ebooks and resolves cataloging issues they may generate.

The forum began with a question about the use of a single record (following the provider-neutral guidelines) or separate records when cataloging ebooks. The question asked if it is difficult to determine when the provider-neutral guidelines should be applied. One respondent stated that she leaves the original paging and illustration information from the print when deriving from a print record. Sometimes the front matter is missing from the ebook but she does not usually verify this information and accepts what was in the print record. Nonrelevant ISBNs left in the record can cause problems with overlaying.

A participant introduced a question about how others are capturing provider/platform information in their local ebook records if they chose to do so. At first the discussion centered around where and what fields to use for this purpose—a stunning array! This included the following suggestions: 856 $3; 856 $z; combination of 793 field and 856 $3; 710 field; 740 field; 773 field; 830 field; combination of 590 and 856 $3; 9XX. The person who mentioned using the 9XX fields felt that since the package names would be only useful for internal record management purposes; that having the 9XX fields be only available on the staff side of the system would be fine. This statement then led to a query by someone else about whether package information was indeed useful to our users. One cataloger felt that while it might not be useful directly to some patrons, that it would indeed be useful to the public service librarians. They have greater knowledge of the “warts” of some providers and would like to be able to quickly steer patrons in other directions.

A new thread addressed the many duplicate records for ebook titles in OCLC and what their merging algorithm was. The questions asked how a record is selected from among so many, if anyone reports duplicates to OCLC, and what issues may arise when records are merged. A problem with multiple records and batch loading was noted as well as ebook and print records that merge because of incorrect use of ISBNs. While separate records for each vendor would help maintain information unique to each provider (format, pagination, multiple versions, links), it makes batch loading difficult.

A participant pointed out that the Provider-Neutral Guidelines are incompatible with RDA and asked if the principles of RDA should trump the problem of duplicate records. A reply noted that P-N was also in contradiction to AACR2 as well. RDA is problematic for electronic reproductions and also microform reproductions because it emphasizes the reproduction information over the original publication information which is likely more important to patrons/researchers. A P-N approach to microforms may be discussed at a PCC meeting in May. Print-on-Demand is another area that might present issues with provider information in the record. (Update: At the PCC Operations Meeting at the Library of Congress on May 3 and 4, the intent was to set up a Task Group to document Best Practices for describing all kinds of reproductions under RDA.)

RDA offers some advantages over AACR2 but the FRBR model does not work well with some current systems. One response noted that converting AACR2 records to RDA would be expensive, so records derived from another will retain the standard used in the original. New access points would be created following RDA. Some aspects of RDA are seen as carryovers from AACR2 and are not FRBR compatible. Relationship designators in RDA records have been inconsistent, in that they may not be there at all, may be in code ($4) or terms ($e). Participants felt that using terms is clearer for users, although codes can be displayed as any term if the system is set up that way. With RDA, the use of $e seems to be preferred among many catalogers. Linking relationships such as “Reproduction of (manifestation)”, while supporting FRBR principles, do not work in most current systems. These relationships are valuable but textual displays are more useful to patrons.

Day 2 began with a posting that addressed call numbers and genre terms. Many participants said they do not use call numbers for ebooks, and remove them from ebook and e-audiobook records when importing or deriving records. It was suggested the call numbers could be moved to a 099 field instead of deleting them. Several participants noted that the classification number is useful for collection development and statistical purposes, so they are retained but suppressed from the public view. Those who display the call numbers often do so because of their virtual call number browse—this way the print and ebook versions are together. Several respondents said they append “eb” or “EBOOK” or some other ebook designation to the end of the call number, so that patrons won’t expect to find the item on the shelf. Some also distinguish between ebooks read remotely and those that are downloadable in a call number that the public sees. One response said that including the vendor name in the call number is useful to find items from a particular vendor if you want to remove or make changes to those records, to manage duplicate ebooks from different packages. A vendor noted that her clients have a wider range of preferences for ebook call numbers than for print records.

While some catalogers say they leave 655s for genre terms in records they import, many libraries now delete these genre/form terms because they feel they are no longer useful. After all, catalogers never have supplied “print books” as a genre term, so why should we do this for ebooks? Streaming video or Internet video might still be a useful term to include. It was suggested that the 072 could be used instead (e.g. 072 _7 ART $x 057000 $2 bisacsh). One participant noted that in OCLC-merged records there could be multiple 655s if there isn’t an exact character string match. Some catalogers add the term when creating an original record but do not add it when copy cataloging. The term could be added automatically by a program. Possibly, the term could be put in a 590 for staff use. A few catalogers mentioned they have an ebook search template in the OPAC or a discovery layer that can be used to find ebooks. One participant pointed out that the 655 rules have changed a lot recently.

A question about ISBNs asked where various ISBNs were recorded on ebook records. Often libraries take great pains to make sure that the ISBNs for the print version are labeled as $z on ebook records and ISBNs for e- are in $z on print records. It is useful to have the other ISBN to prevent the ordering of the other format if the print is already owned (or vice-versa). Not that the other version wouldn’t be purchased, but it is a good practice to flag staff if the title is already owned in a different format. One participant mentioned an excellent PowerPoint by Brian Green, the former Executive Director of the International ISBN Agency. This turned out to be a most sought after item by the e-forum participants!

Although normally demand-driven acquisition (DDA) is thought of as an acquisitions-based activity, the question was posited as to whether there were any procedures or issues related to DDA that were relevant for catalogers. One cataloger said that all his institution did was change the public note from “Read this MyLibrary ebook” to “Read this electronic book” once the book is officially purchased. He felt that the whole process was simple and required little manipulation on their part.

Regarding 856s, libraries generally remove any URLs from the record that are not relevant for them; often MarcEdit is the method of choice to remove them. Practices differ in other subfields in the 856 field. It would appear that the understanding and use of the subfield $3 varies from cataloger to cataloger, but most use this subfield to indicate vendor names. One cataloger said that they prepend their proxy information to the URL string, except in the case of open access journals. Some libraries ignore the $z note field; others use it to indicate “VIEW EBOOK” or “VIEW VIDEO.””
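The 856-field housekeeping described in the last paragraph (dropping URLs that are not relevant for the library, prepending proxy information) can be sketched in a few lines. The record structure, vendor list, and proxy URL below are all hypothetical; in practice this work is typically done in MarcEdit or with a MARC library such as pymarc.

```python
# Sketch of 856 cleanup: keep only links to licensed platforms and
# prepend the library's proxy prefix. Everything here is illustrative:
# each 856 field is modelled as a dict of subfields ($u = URL,
# $3 = materials-specified note, often used for the vendor name).

PROXY = "https://proxy.example.edu/login?url="    # hypothetical proxy prefix
OUR_VENDORS = ("ebscohost.com", "proquest.com")   # hypothetical licensed platforms

fields_856 = [
    {"u": "https://ebookcentral.proquest.com/lib/x/detail?docID=1", "3": "ProQuest"},
    {"u": "https://unlicensed.example.org/book/1", "3": "Other vendor"},
]

def clean(fields):
    """Drop 856s for unlicensed platforms; proxy the rest."""
    kept = []
    for f in fields:
        if any(v in f["u"] for v in OUR_VENDORS):
            kept.append(dict(f, u=PROXY + f["u"]))  # copy with proxied URL
    return kept

for f in clean(fields_856):
    print(f["3"], f["u"])
```

Batch tools do essentially this over thousands of vendor-supplied records, which is why the forum participants reach for MarcEdit rather than editing 856s by hand.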