Thursday, May 17, 2012

Search by misconception: Should search engines support information needs that are ill conceived?

The real Mozartkugeln? (Flickr: davidroethler)
Should our search engines make information findable based on misconceptions? Well, it's complicated. This post gives some examples to highlight the relationship of misconceptions to search.

Let's start with an example. When I refer to "Mozartkugeln" I mean the ones pictured here. They are gold and show Mozart in his red jacket. Their shape is round. I differentiate these from the ones with flat bottoms, which are for me "fake" Mozartkugeln.

My idea of Mozartkugeln can be considered a misconception. The original Mozartkugeln are apparently produced by a company called "Fürst" and are silver with a blue Mozart. Additionally, the producer of the flat-bottom ones apparently has the right to call their product "Real Reber Mozartkugeln". Digging on Wikipedia and on other websites supplied me with this information.

But what should an image search engine return in response to the query "Mozartkugeln"? Is it obliged to make an effort to resolve the question of which is the "real" Mozartkugel? Or is it fine if it just returns images that users have uploaded and tagged with "Mozartkugel"?

Effectively, simply returning images tagged with "Mozartkugel" allows users to search by misconception. The search engine returns images that have been tagged by people like me, who have a certain view on the matter (based on conversations with an Austrian roommate now a couple of decades old and several subsequent trips to Austria, none including Salzburg), which is not necessarily universal. I am not immediately convinced that I can be satisfied with such a search engine as a source of information. Still, it seems reasonable to assume that if enough voices are combined, a consensus will emerge. I noticed that if you search for "champagne" on Google Images, the top hits (at least the ones that depict identifiable bottles) clearly hail from the Champagne region in France and don't include the large range of other bubbly wines from other corners of the world that are widely enjoyed under the name "champagne".

In short, allowing search by misconception seems relatively innocuous. But we should be careful about assuming that the Mozartkugeln example is the end of the story. What is unique about this example is that the search engine is relatively transparent in the way that it works. The images are returned by seeking exact matches in their user-assigned tagsets; without such a match, the image is not relevant. Users of the search engine have a chance of being at least vaguely aware of the reason for the match, and they can carry over what they know about the reliability of the taggers to judge the reliability of the results.
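To make the transparency point concrete, here is a minimal sketch of what exact-match tag search amounts to. The image ids, the tags, and the function name `search_by_tag` are all invented for illustration; this is not how any particular search engine is implemented.

```python
from typing import Dict, List, Set

# Hypothetical toy collection: image id -> set of user-assigned tags.
IMAGES: Dict[str, Set[str]] = {
    "img_round_gold": {"mozartkugeln", "salzburg", "chocolate"},
    "img_flat_bottom": {"mozartkugeln", "reber"},
    "img_silver_blue": {"fürst", "chocolate"},
}


def search_by_tag(query: str, images: Dict[str, Set[str]]) -> List[str]:
    """Return the ids of images whose tagset contains the query exactly."""
    needle = query.strip().lower()
    # An image is relevant only if some tagger typed exactly this tag.
    return sorted(image_id for image_id, tags in images.items() if needle in tags)


print(search_by_tag("Mozartkugeln", IMAGES))
```

In this toy collection the silver-and-blue image is never returned for "Mozartkugeln", simply because nobody tagged it that way. The result list reflects what the taggers believe, and a user who knows that can weigh the results accordingly.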

However, when the search engine becomes more sophisticated, the situation quickly gets quite murky. For example, suppose I had a visual concept detector that was trained to detect Mozartkugeln in images and assign them the appropriate tags. The design of the detector would require collecting examples of Mozartkugeln, which means that whoever trains the detector holds the ultimate control over deciding what a Mozartkugel actually is.
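A toy sketch can show how the choice of training examples decides the outcome. It uses a nearest-centroid classifier as a stand-in for a real visual concept detector, and the two features (roundness, goldness), the example values, and the two "curators" are all assumptions made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float]  # hypothetical features: (roundness, goldness)


def centroid(points: Sequence[Vector]) -> Vector:
    """Mean of a non-empty list of 2-D feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def train(examples: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """'Training' here is just computing one centroid per label."""
    return {label: centroid(points) for label, points in examples.items()}


def predict(model: Dict[str, Vector], image: Vector) -> str:
    """Assign the label whose centroid is closest to the image."""
    def sq_dist(c: Vector) -> float:
        return (c[0] - image[0]) ** 2 + (c[1] - image[1]) ** 2
    return min(model, key=lambda label: sq_dist(model[label]))


# Curator A collects round, gold examples as Mozartkugeln.
curator_a = {"mozartkugel": [(0.9, 0.9), (1.0, 0.8)],
             "other": [(0.5, 0.1), (0.4, 0.2)]}
# Curator B collects round, silver examples and files the gold ones under "other".
curator_b = {"mozartkugel": [(0.9, 0.1), (1.0, 0.2)],
             "other": [(0.9, 0.9), (0.5, 0.8)]}

round_gold_image: Vector = (0.95, 0.85)
print(predict(train(curator_a), round_gold_image))
print(predict(train(curator_b), round_gold_image))
```

The same image gets a different tag depending on who collected the examples, and unlike the tag-matching case, a user looking at the results has no easy way to see why.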

The example of Mozartkugeln is interesting. In some cases, one could argue that common sense knowledge will tell you what an object is, for example, a helicopter or a pram. Everyone can identify these objects, right? But in the case of the Mozartkugeln, there is no right answer. It depends on your perspective. A long discussion will arrive at the conclusion "It's complicated". (And you may already find yourself with the same issue for the pram, if not actually the helicopter.)

It seems like a good idea to do away with the central authority that collects the examples used to train the detectors. After all, no one really likes the guy who walks around the party reminding people, "Yes, but it's not real champagne".

But do we really want to admit search by misconception? I had quite an unsettling experience with Google's query suggestion. On 17 May 2012, I was looking for a news story about one of the Facebook founders having renounced his US citizenship. No sooner had I typed in "facebook founder" than Google presented me with the following list of suggested queries:

facebook founder mark
facebook founder saverin
facebook founder buys new republic
facebook founder college
facebook founder gay
facebook founder bios
facebook founder dead
facebook founder movie

Did I really need to know about the existence of the circulating rumors? Do I go on to passively "believe" a query, or do I dig deeper to find out if it is true? Do we really want our search engines to let us flow so easily down the same information paths worn by searchers before us who mis-received a rumor?

In the case of "facebook founder dead" I did dig deeper. That query led to a Fox News article on the death of Ilya Zhitomirskiy, one of the co-founders of Diaspora*, an alternative to Facebook. I was left wondering at how query suggestions have taken on an information dissemination (news broadcast, if you will) role of their own.

From the fun of searching for pictures of bonbons (...and wondering if round vs. flat-bottom Mozartkugel relates to a real misconception or rather an alternate interpretation) we hit on a matter of true importance (Diaspora* upends the Facebook model because it is based on the idea that every member of the network should “own” his personal information). Suddenly, it gets extremely serious. In light of this seriousness, it looks as if we really do not want search engines to admit search by misconception at all.

This whole line of thought was started while I was at a symposium entitled "Cultural Heritage Gets Social" of the SEALINCMedia project (Socially-enriched access to linked cultural media). Alice Warley (Public Catalogue Foundation, UK) gave a talk entitled "Your Paintings Tagger: Crowd-sourcing, art history and the UK's national oil painting collection" about a website where visitors collaborate to tag paintings.

Apparently, general public users tend to tag older paintings "formal wear" when what the people pictured in the paintings are wearing is not formal wear at all, but rather daily clothing.
The reason for this misconception is that today's formal wear evolved from what was worn on a daily basis in certain social circles in past eras. Wikipedia is rather silent on the history of formal wear. The "misconceptions" of the taggers are actually a source of information about something that is not widely known but is in fact a real historical connection.

So we're back to the Mozartkugeln, considering whether Mozart is dressed for a concert, or is in his everyday work clothing that he uses for composing. It seems like misconceptions help us to uncover new and interesting information.

However, if our engines incorporate misconceptions, maybe we should call them 'exploration engines'. A 'search' engine should find answers or else gently reveal to us that our initial information need was ill conceived.