What do the search engines do?

On the Web, of the Web
LITA 2011 Keynote, October 1, 2011
by Karen Coyle

“Why does Google (Yahoo, Bing) use keyword searching? Because it’s easy. It is mechanical. It is a match between a string in a query and a string in a database (even with all of its enhancements, that’s the bottom line). It requires no knowledge of the topic, no human intervention, no experts. Keyword searching is NOT knowledge organization. With keyword searching there are no relationships between things. You can’t go broader or narrower; you can’t get “things like this”; it doesn’t even have facets. I said before that users are accustomed to the single search box, and many see it as representing freedom – wide open, anything goes. It’s not a freedom, it’s a constraint. It basically constrains the user to try to guess what words will bring up the information you are seeking – which is a bit unfair since the assumption is that there is something the user doesn’t know which is why she is doing a search. The user has to translate what might be a complex information need to a couple of words. And as Elaine Svenonius notes:

“At the same time, it is known that users in their attempts to search by subject sometimes find themselves at a loss for words.” (Svenonius, The Intellectual Foundation of Information Organization, p. 135)

What works for keyword searching?

  • nouns, especially proper nouns
    – places
    – organizations
  • named things
    – programming languages (Python, Ruby) (Note that you don’t retrieve much about snakes or gems with these searches, showing a particular bias in the content of the Web itself)
    – titles of books or essays (Moby Dick)

What doesn’t work?

    • searching for concepts
    • searching for things with common terms in their names (library, catalog) (Often when I’m searching for topics relating to libraries I find myself in github.)
    • you can’t ask a specific question: When did Melville write Moby Dick? You can only put in those terms and hope that a retrieved web page contains the answer. (Wolfram Alpha is trying to address this problem)
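
As a rough illustration of the two lists above (this is not part of the keynote), the sketch below contrasts a bare keyword match with a lookup that follows "broader term" links. The tiny corpus, the subject labels, and the function names keyword_search and subject_search are all invented for this example, and real search engines layer ranking and other enhancements on top of the basic match.

```python
# Hypothetical toy corpus, invented purely for illustration.
DOCS = {
    "d1": "Herman Melville published Moby Dick in 1851",
    "d2": "Sperm whales are the largest toothed whales",
    "d3": "Python is a programming language",
    "d4": "The python is a large constricting snake",
}

# Hypothetical subject headings assigned by a human cataloger.
SUBJECTS = {
    "d1": "american fiction",
    "d2": "sperm whales",
    "d3": "programming languages",
    "d4": "snakes",
}

# Hypothetical "broader term" links, as a thesaurus might record them.
BROADER = {
    "sperm whales": "cetaceans",
    "snakes": "reptiles",
}


def keyword_search(query, docs):
    """Return ids of documents that contain every query word."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits.append(doc_id)
    return sorted(hits)


def subject_search(concept, subjects, broader):
    """Return ids of documents whose subject is the concept or falls under it."""
    hits = []
    for doc_id, subject in subjects.items():
        term = subject
        # Walk up the chain of broader terms until it runs out.
        while term is not None:
            if term == concept:
                hits.append(doc_id)
                break
            term = broader.get(term)
    return sorted(hits)


# A title made of distinctive words is found.
print(keyword_search("Moby Dick", DOCS))        # ['d1']
# A common word returns both the language and the snake, with no way to tell them apart.
print(keyword_search("python", DOCS))           # ['d3', 'd4']
# A concept that is never spelled out in the text returns nothing.
print(keyword_search("cetaceans", DOCS))        # []
# A question is just a bag of words; the extra words block the match.
print(keyword_search("when did Melville write Moby Dick", DOCS))  # []
# With a broader-term link, the concept search finds the whale document.
print(subject_search("cetaceans", SUBJECTS, BROADER))  # ['d2']
```

The keyword function knows nothing about what the words mean, which is why the concept query comes back empty. The subject lookup succeeds only because someone recorded the relationship between "sperm whales" and "cetaceans" ahead of time.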

Google has all of the knowledge basis of a phone book. You name it, you retrieve it.

Did you ever wonder why so many searches turn up Wikipedia in the first few hits? Wikipedia is ORGANIZED INFORMATION. To me it is the proof that organized information is needed, works, and helps people find and learn. Wikipedia does have pages for concepts, it does have links between related subjects, it IS organized knowledge.

How well does keyword searching work? Some analogies:

    • it’s like dumpster diving for information; you dig through a lot of garbage but you might find a clean, wrapped sandwich
    • it’s like dynamite fishing; you throw dynamite into a lake and see what gets thrown up in the air
    • it’s like your grandmother’s button box; you need a button and you can spend ages digging through trying to find one that matches on size and color. Or you can go to the store where they have the buttons in order by size and color, and pay a couple of bucks.

We tend to ignore the false hits and zoom in on the successes. But the main thing is that this imprecise retrieval puts a huge burden on the user, who has to essentially game the system to get retrievals and then has to dig through what comes back to sort wheat from chaff. In his book Everything Is Miscellaneous, David Weinberger talks about tagging, and says that a search on Flickr for “San Francisco” will bring up photos of a number of different places named San Francisco, but what does that matter? I think it matters, and it matters especially for the least experienced users who find such things confusing. Everything might be miscellaneous but it is also time-consuming and annoying.”