Monday, October 31, 2011

Visual concepts and Wittgenstein's language games (Halloween II)

Wittgenstein conceives of human language as an activity consisting of language games that are related but different. One of these games is the game that we play when we read picture books to kids. We point at images and name them. The kids are then supposed to gradually acquire this pointing and naming behavior. We generally happily consider the children to be acquiring human language during these sessions. However, if we apply our Wittgenstein, what we are doing is teaching kids how to play the "naming game". We notice this because two minutes later the young child is furiously indicating that it doesn't want to do something, thereby actively using the concept "no". The concept of "no" or "no, I don't want" (we recognize while delicately shoving small, flailing hands into sweater arms) is not depictable as a nameable entity in a picture book. We're still using language of some sort, but we've switched to another, possibly more important game.

As multimedia retrieval researchers we generally fall into the same trap when developing multimedia retrieval indexing systems. We get the systems to annotate depictable visual concepts and somehow forget that this is only one "language game" in the whole gamut of different games that humans play when they use language. The point is an important one. Visual content-based retrieval systems are in their infancy. We, as, well, a species, are currently negotiating a system of conventions, of game moves as it were, that determine how we interact with these systems.

The danger is: if we start out by making very narrow assumptions about what people could possibly be looking for when they look for images and video, the conventions of interacting with video search engines will become calcified into a very simplistic game. We'll be stuck in the picture book phase of multimedia retrieval childhood forever.

Actually, this Halloween I encountered a picture book that suggests that even picture books are trying to pop out of the "naming game". This one has a page with a picture of kids making jack-o-lanterns and an orange box asking the questions: "How many orange pumpkins can you count?" and "How many are jack-o-lanterns?"

Well, ahem. When does something stop being a pumpkin and become a jack-o-lantern? When you cut off the top? When you've fully emptied the inside? When you cut the first eye, or when you have popped out the final piece around the teeth to complete the grin?

How about those jack-o-lanterns that have been drawn on the chalkboard? Are those jack-o-lanterns or are they pictures of jack-o-lanterns? And maybe a jack-o-lantern actually still counts as a pumpkin if it was made from a pumpkin in the first place?

In short, it is impossible to give a unique answer to the questions that this book is asking. We can either think that the people at Fisher-Price are corrupting our youth, or we can realize: kids don't need to have books that depict things that are uniquely identifiable. There is simply a huge ambiguity as to what exactly is a pumpkin and what is a jack-o-lantern. We can extend the "naming game" with this ambiguity and it is still truly a part of our human language. We don't need to (and generally do not) resolve ambiguity in order to use language effectively. The page of this book is not some sort of obscure philosophical exception: it shows a kind of situation that is frequent and highly characteristic of what we deal with on a daily basis.

Fisher-Price apparently now thinks that kids' books should no longer protect them against ambiguity in language. We shouldn't "baby" our multimedia systems either. Rather, we should let them play as large and complex a language game as they can possibly handle: as large as is technically possible and as users find helpful and interesting.

The next post makes another related point about this picture book...