A post that mixes the magic and delight of the holiday season with multimedia information retrieval? Let's try it and see what happens.
The past couple of weeks, holiday cards have been dropping through the mail slot in the front door -- but emails have also been entering my inbox: greetings, photos, and yes, also videos. This morning it was an email with a greeting and a link to a music video, "Peaceable Kingdom".
I watched the video for a while and pondered its relationship with Christmas: The music is melodious, soothing and the lyrics take the listener to the manger to make the connection with the adoring state of mind of those who gathered there the first Christmas Eve. Unexpected minor cadences highlight that this is no usual Christmas carol and invite consideration of the multiplicity of the Christmas experience -- how the holiday itself integrates traditions preceding Christianity and how, as each new group and generation reinvents it for their own spirit and needs, it will continue to develop into some future Christmas. From the perspective of the here and now, that future Christmas could seem full of sweetness, hope and light, but also distorted and distinctly pagan.
Of course, the strongest signal I get from the video is that of Margaret Atwood's dystopic visions. I haven't read The Year of the Flood, but what has been written and said about the book has so fascinated and disturbed me that the existence of the book itself as a text seems somehow less important to me -- the setting is already so palpable that what it tells is, in a way, no longer left to be said.
In the end, maybe my personal Christmas feeling associated with the video is that it gives me a chance to spend some time feeling close to the person who sent it to me. The strength of this feeling of connection goes beyond -- indeed exists in a completely different dimension of life from -- my reflections on meta-text usurping text, or on how long it has been since I last sat down and read a worthwhile book not related to work.
Where is the multimedia information retrieval tie-in? Well, first, as a result of this video it has occurred to me for the umpteenth time that we need a verb other than "watch" to describe this kind of interaction with this video. It's a music video, so I am mainly listening to it and then looking at the visual stimuli. There could potentially be rather large changes in the visuals -- different pictures, different editing -- and these changes could possibly leave my watching experience largely untouched. I would argue, if I were only "watching", these elements would necessarily have a major defining impact on my experience. They don't. Here, I am rather "watch/listening", which I suppose could give us the new concept of "wistening".
There's a second tie-in as well: There is a little snowflake in the player bar, which I discovered after "wistening" for a while. I usually find snowflake icons ambiguous, especially on climate control units in strange hotel rooms -- do I turn the setting to "snowflake" if it's cold outside, or is the "snowflake" setting going to cause the system to start blowing cold air? I've encountered both. So I've learned just to click on the snowflake and see what happens...
And lo and behold, it started snowing. Right into the Peaceable Kingdom -- flakes floating down slowly -- different sorts of flakes at different speeds -- and accumulating at the bottom of the frame. I felt the smile spread on my face -- and grow wider as I realized that I was witnessing one little bit of a sort of world-wide holiday miracle, as people in front of screens around the planet discover that you can make it snow on YouTube. I thought about people watching this on their laptops and tablets, using the mouse to play a bit in the snow and then gathering their friends, colleagues, and family around their screens in one big Christmas "You gotta check this out!"
Apparently, you can't do this to every video: and this is where it really starts getting interesting to me. How did YouTube decide which videos to add this feature to? There must have been some multimedia classification algorithm that maybe looked for keywords in the title and description and something like music in the audio channel or colors in the visual channel and combined this with the upload date -- and then enabled "snow" for this video.
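To make the speculation concrete, here is a minimal sketch of the kind of feature-combination rule imagined above. Everything in it is an assumption: the signal names, the keyword list, and the thresholds are invented for illustration and have nothing to do with how YouTube actually made the decision.

```python
# Hypothetical snow-eligibility heuristic -- all signals, keywords,
# and thresholds here are invented, not YouTube's actual logic.
from datetime import date

def snow_eligible(title, description, has_music_audio, dominant_hue_cool, upload_date):
    """Guess whether a video should get the 'snow' feature.

    title / description: free-text metadata
    has_music_audio:     bool, audio channel classified as music
    dominant_hue_cool:   bool, visual channel dominated by cool colors
    upload_date:         datetime.date of upload
    """
    keywords = {"christmas", "holiday", "winter", "snow", "carol", "noel"}
    text = (title + " " + description).lower()
    keyword_hit = any(k in text for k in keywords)
    seasonal = upload_date.month in (11, 12, 1)  # around the holidays
    # Require textual evidence plus at least one content-based signal.
    return keyword_hit and seasonal and (has_music_audio or dominant_hue_cool)

print(snow_eligible("Peaceable Kingdom", "a holiday music video",
                    True, True, date(2011, 12, 20)))  # True
```

Even a toy rule like this shows why the problem is interesting: the metadata, audio, and visual channels each contribute weak evidence, and the real question is how to combine them.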
I want to make these kinds of algorithms! How do we put everything that we know how to do in terms of multimodal video processing and machine learning and figure out for which videos it needs to be able to snow?
And it's not just snow. There are other ways in which this could go -- and should go -- it has potential to cause so much joy. I am sitting here "wistening" and thinking about friends and family and playing in the snow, but it's clear that we need to go beyond "wistening": we need a verb for watching+listening+reflecting+playing. It's also clear that we need the technologies that support these activities. Imagine a search engine that can find videos that are appropriate for 'snow': that goes so far beyond user information needs as they are currently conceptualized for multimedia that it sort of takes your breath away.
How to enable the multimedia community to work at these new (from the perspective of this moment, utterly fantastic) frontiers?
The key to doing work in this direction is evaluating it. How do we know if we were right in presenting the snow option for a given video? YouTube is probably analyzing its interaction logs at this very moment. But I hate to think that I need to go to work for YouTube in order to ever be able to do the evaluation necessary to write a paper on this topic. Everyone loves the snow, so everyone should be able to work on making it better.
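The log analysis hinted at here could be as simple as measuring, per video, how often viewers who were shown the snowflake actually clicked it. A toy sketch, with an invented log format (the event tuples and field names are assumptions, not any real logging schema):

```python
# Toy interaction-log analysis: per-video click-through rate on the
# snowflake icon. The (video_id, event_type) log format is invented.
from collections import defaultdict

def snowflake_ctr(events):
    """events: iterable of (video_id, event_type) tuples, where
    event_type is 'impression' (snowflake shown) or 'click'."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for video_id, event_type in events:
        if event_type == "impression":
            shown[video_id] += 1
        elif event_type == "click":
            clicked[video_id] += 1
    return {v: clicked[v] / shown[v] for v in shown}

log = [("pk", "impression"), ("pk", "click"),
       ("pk", "impression"), ("other", "impression")]
print(snowflake_ctr(log))  # {'pk': 0.5, 'other': 0.0}
```

A high click rate would suggest the video was a good candidate for snow; a low one would suggest the classifier got it wrong -- exactly the kind of ground truth a researcher outside YouTube cannot currently get at.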
Note to self qua New Year's resolution: Keep up commitment to evaluation -- we need it to push ourselves forward into the unknown in a meaningful way. Maybe it's what actually makes the difference between what we call computer science and what we call art. But I'll leave that thought to another day.
In the meantime, the overall conclusion is that holidays and multimedia information retrieval do indeed mix well in a blog post. So happy holidays (and enjoy the video):
Currently, I'm working to put together the survey for MediaEval 2012. This survey will be used to decide on the tasks that will run in 2012 and also will help to gather information that we will use to further refine the MediaEval model: the set of values and organizational principles that we use to run the benchmark.
At the workshop, someone came up to me and mentioned that he had made use of the model in a different setting, 'I hope you don't mind', he said. Mind? No. Quite to the contrary. We hope that other benchmarking initiatives pick up the MediaEval model and elements of it and put them to use.
I have resolved to be more articulate about exactly what the MediaEval model is. There's no secret sauce or anything -- it's just a set of points that are important to us and that we try to observe as a community.
The MediaEval Model

The MediaEval model for benchmarking evaluation is an evolution of the classic model for an information retrieval benchmarking initiative (used by TREC, CLEF, TRECVid). It runs on a yearly cycle and culminates with a workshop where participants gather to discuss their results and plan future work.
MediaEval attempts to push beyond existing practice by maximizing community involvement in the benchmark. Practically, we do this by emphasizing the following points:
Tasks are chosen using a survey, which gathers the opinion of the largest possible number of community members and potential MediaEval participants for the upcoming year.
Tasks follow the same overall schedule, but individual tasks are otherwise very autonomous and are managed by the Task Organizers.
The Task Organizers are encouraged to submit runs to their own tasks, but these runs do not count in the official ranking.
The Task Organizers are supported by a group of five core participants who pledge to complete the task "come hell or high water".
Each task has an official quantitative evaluation metric, which is used to rank the algorithms of the participating teams. The task also, however, promotes qualitative measures of algorithm goodness: i.e., the extent to which an algorithm embodies a creative and promising extension of the state of the art. These qualitative measures are recognized informally by awarding a set of prizes.
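The official ranking described in these points can be sketched in a few lines: order participating teams by the task's quantitative metric, while keeping organizer runs visible but outside the ranking. The team names and scores below are made up for illustration.

```python
# Minimal sketch of a MediaEval-style official ranking: organizer runs
# are reported but excluded from the ranking. Names/scores are invented.

def official_ranking(runs, higher_is_better=True):
    """runs: list of (team, score, is_organizer) tuples.

    Returns (ranked participant runs, unranked organizer runs)."""
    ranked = [(team, score) for team, score, org in runs if not org]
    ranked.sort(key=lambda ts: ts[1], reverse=higher_is_better)
    unofficial = [(team, score) for team, score, org in runs if org]
    return ranked, unofficial

runs = [("TeamA", 0.71, False), ("Organizers", 0.80, True),
        ("TeamB", 0.65, False)]
print(official_ranking(runs))
```

The `higher_is_better` flag matters because tasks differ in their official metric: some rank by scores where higher is better (e.g., F-measure), others by error rates where lower is better.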
In interview footage from the MediaEval 2011 workshop, I discuss the challenge of forging consensus within the community.
One of the important parts of consensus building is collecting detailed and high-coverage information at the beginning of the year about what everyone in the community (and also potential members of the community) thinks. And so I am working here, going through not only the tasks proposals, but also other forms of feedback we've gotten from last year (questionnaires, emails) in order to make sure that we get the appropriate questions on the survey.
It always takes so much longer than I predict -- but it's definitely worth the effort.
Tonight I have something like meta-purchase fatigue. My train back from Brussels was canceled and I went to the news stand and bought an International Herald Tribune as a consolation prize: it cost me three Euros. It contained a very interesting article entitled "Disruptions: Privacy Fades in Facebook Era". When I finally came home, I decided to re-read this online and send it to a friend.
But aaargh. The IHT site informs me that I have hit my 20 article limit for the month.
Hey! What's this 20-article limit thing about anyway? I just laid down my 3 Euros -- why can't I see the digital copy of this article?
OK. That's a bad attitude -- that's purchase fatigue, I can overcome that. I care about the content in the IHT -- it's worth something to me so maybe it's time to get an actual subscription. A few fantasies of having a paper delivered to my door in the morning (...and having the time to read it). Really, yes, let's do it. I need to support the news -- creating good news takes money.
Alas, the website is not going to let me do that: My attempts to make an impulse buy of home delivery are met with an error message "Unknown SOA error". There's the meta-purchase fatigue. You try to do the right thing -- spend your money to get something you value -- and somehow that doesn't work either.
The purchase fatigue that faces us in the future will be caused by Facebook. My prediction: In about 10 years, Facebook will start selling us back our historical posts.
Remember those pictures from that college party? Weren't they all gone? Now for a mere $29.99 Facebook will dig them out of its archive and present them to you, labeled with the names of your friends that you have forgotten and festooned with their comments.
Maybe that is the rant of a tired blogger, but it's also a darn good long-term business strategy for Facebook -- if they can somehow fight the "purchase fatigue" that will arise from trying to sell people back their own stuff.
At least they should circumvent meta-purchase fatigue and get the subscription service right: when I decide to shell out the cash and sign up for a subscription to get my own past delivered back to me, it would be nice if I didn't get an "Unknown SOA error".
And the whole thing distracted me from actually blogging about social multimedia sharing and privacy...
....or about the fact that Google doesn't love me and doesn't return anything useful for the query "purchase fatigue". "Purchase" is a modifier and not part of my search intent in this context, Google.
I know that not interpreting my query as an intent to purchase something is less likely to lead to ad clicks -- but please, really I'm tired of paying for stuff, humor me, really...
I divide my time between Radboud University Nijmegen and Delft University of Technology in the Netherlands. My research focuses on multimedia retrieval techniques that exploit speech and language and focus on human interpretations of meaning. I am particularly interested in internet video, in networked communities, and crowdsourcing techniques. Lately, I've been noticing how difficult it is to imagine life without search.