I got an email this morning from someone close to me, S., whose colleague, C., had sent them a message with a Dutch translation of these directions on how to turn the social advertising off. S. declared happily, "The community is really strong". LinkedIn pulled a now-classic social network move and the community moved to push back against it. If there wasn't a name for it already, we can now conveniently refer to it as a SlippedIn.
SlippedIn or slipped up? The fact that this changed behind my back really makes me angry at LinkedIn: Are they going to lose their community?
Well, no. Because actually in sending this mail C. is engaging, probably without her conscious knowledge, in the ultimate form of social advertising. By alerting us to the problem and letting us know how to fix it, C. is mediating between LinkedIn and the community that uses the LinkedIn platform. She is making it possible for all of us to be really p.o.ed at LinkedIn, but still not leave the LinkedIn network because we have the feeling that our community itself has created the solution that keeps us in control of our personal information.
S.'s attitude, "The community is really strong", is natural. Because C. caught this feature being slipped in and let us know how to fight it, we now have the impression that we somehow have the power to band together and resist the erosion of the functionality that we signed up for when we joined LinkedIn. C.'s actions give us the impression that although what LinkedIn did is not ok, LinkedIn is still a tolerable place to social network, because we have friends there and because we are in control and can work it out together.
C. has really been used. She is unwittingly broadcasting in her social circle a sense of security that everything will be all right. We completely overlook the point that we have no idea what goes on behind the scenes that might pass unnoticed by C. or the other C.-like people in the network. We are given the false impression that whatever LinkedIn does that we find intolerable, we will be able to notice it and work together to fix it.
We cannot forget that LinkedIn is a monolithic entity: they write the software, they control the servers. Whatever feeling we have that we can influence what is going on is supported only by our own human nature to simply trust that our friends will take care of us. LinkedIn is exploiting that trust to create a force of advocacy for their platform as they pursue a policy aimed at eroding our individual privacy.
Last week I spent a great deal of time writing a proposal called "XNets". Basically, we're looking for a million Euros to help develop robust and productive networking technology that will help ensure that social networking unfolds to meet its full potential. Our vision is distributed social networking: let users build a social network platform where there is no central entity calling the shots.
However, it's not just the distributed system that we need; it is the consciousness. I turned the social advertising functionality off and have, for the moment, the feeling that it is "fixed". But getting this fixed was not C.'s job. C. is not all-seeing, nor can she help her friends protect themselves against all possible future SlippedIns. C. should not be doing damage control for LinkedIn. We the community are strong, but we are not omnipotent. The ultimate responsibility for safeguarding our personal data lies with LinkedIn itself.
What I termed "Human computational relevance" in my previous blog post is probably more appropriately termed "Human computational semantics". The model in the figure in that post can be extended in a straightforward manner to accommodate "Human computational semantics". The model involves comparing multimedia items (again within a specific functional context and a specific demographic) and assigning them a pair-wise similarity value according to the proportion of human subjects that agree that they are similar.
Fig. 1: The similarity between two multimedia items is measured in terms of the proportion of human subjects, within a real-world functional context and drawn from a well-defined demographic, that agree that they are similar. I claim that this is the only notion of semantic similarity that we need.
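To make the model concrete, here is a minimal sketch of how such pairwise similarity values could be computed from collected judgements. The item names and vote counts are hypothetical; the only point is that similarity is nothing more than the proportion of subjects who agree.

```python
def pairwise_similarity(votes):
    """Estimate the similarity of each pair of multimedia items as the
    proportion of human subjects who judged the pair similar.

    `votes` maps an (item_a, item_b) pair to a list of yes/no judgements
    (True = "these two items are similar"), all collected from subjects
    drawn from one demographic within one functional context.
    """
    return {pair: sum(judgements) / len(judgements)
            for pair, judgements in votes.items()}

# Toy judgements from five hypothetical subjects:
votes = {
    ("beach_photo", "sunset_video"): [True, True, True, False, True],
    ("beach_photo", "cat_clip"):     [False, False, True, False, False],
}
sims = pairwise_similarity(votes)
print(sims[("beach_photo", "sunset_video")])  # 0.8
```

Note that the demographic and the functional context live entirely in how the judgements were collected; the arithmetic itself is trivial.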
I hit the ceiling when I hear people describe multimedia items as "obviously related" or "clearly semantically similar". The notion of "obvious" is necessarily defined with respect to a perceiver. If you want to say "obvious", you must necessarily specify the assumption you make about "obvious to whom". Likewise, there is no ultimate notion of "similarity" that is floating around out there for everyone to access. If you want to say "similar", you must specify the assumption that you make about "similar in what context."
If you don't make these specifications, then you are sweeping an implicit assumption right under the rug, and it's sure to give you trouble later. It's dangerous to let ourselves lose sight of our unconscious assumptions about who our users are and what the functional context actually is in which we expect our algorithms to operate. Even if it is difficult to come up with a formal definition, at least we can remind ourselves how slippery these notions can be. It seems that we naturally as humans like to emphasize universality and our own commonality, and that in most situations it's difficult to really convince people that "obvious to everyone" and "always similar" are not sufficiently formalized characterizations to be useful in multimedia research. However, in the case of multimedia content analysis the risks are too great and I feel obliged to at least try.
A common objection to the proposed model runs as follows: "So then you have a semantic system that consists of pairwise comparisons between elements; what about the global system?" My answer is: The model gives you local, example-based semantics. The global properties emerge from local interactions in the system. We do not require the system to be globally consistent; instead, we gather pairwise comparisons until a useful level of consistency emerges.
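A small sketch can make the "local, example-based" point tangible. In the hypothetical store below, only pairwise values are recorded, and nothing enforces global properties such as transitivity; judgements that would be contradictory under a global model coexist without any problem.

```python
class LocalSemantics:
    """Example-based semantics: only pairwise similarity values are stored;
    no global consistency (e.g. transitivity) is imposed on the system."""

    def __init__(self):
        self._sim = {}

    def record(self, a, b, value):
        # Pairs are unordered, so store them under a frozenset key.
        self._sim[frozenset((a, b))] = value

    def similarity(self, a, b):
        return self._sim.get(frozenset((a, b)))

sem = LocalSemantics()
# These three judgements violate transitivity, and that is fine:
sem.record("guitar_solo", "rock_concert", 0.9)
sem.record("rock_concert", "stadium_crowd", 0.9)
sem.record("guitar_solo", "stadium_crowd", 0.1)
print(sem.similarity("guitar_solo", "stadium_crowd"))  # 0.1
```

Any global structure a search engine needs can then be computed on demand from whatever pairs have accumulated, rather than being designed in up front.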
Our insistence on a global semantics, I maintain, is a throwback to the days when we only had conventional books to store knowledge. Paper books are necessarily linear, necessarily of restricted length, and have no random-access function. So we began abstracting and organizing and ordering to pack human understanding of the world into an encyclopedic or dictionary form. It's a fun and rewarding activity to construct compendiums of what we know. However, there is no a priori reason why a semantic system based on a global semantic model must necessarily be chosen for use by a search engine.
Language itself is quite naturally defined as a set of conventions that arise and are maintained via highly local acts of communication within a human population. Under this view, we can ask about Fig. 1 why I didn't draw in connections between the human subjects in order to indicate that the basis of their judgements rests in a common understanding -- a language pact, as it were. This understanding is negotiated over years of interaction in a world that exists beyond the immediate moment at which they are asked to answer the question. Our impression that we need an a priori global semantics arises from the fact that there is no practical way to integrate models of language evolution or personal language variation into our system. Again, it's sort of comforting to see that when people think about these issues, their first response is to emphasize universality and our human commonality.
It's going to hurt us a little inside to work with systems that represent meaning in a distributed, pairwise fashion. It goes against our feeling, perhaps, that everyone should listen to and understand everything we say. We might not want to think too hard about how our web search engines have actually already been using a form of ad hoc distributed semantics for years.
In closing: The model is there. The wider implications of its existence are that we should direct our efforts to solving the engineering and design problems necessary to be able to efficiently and economically generate estimations of human computational relevance, and also of the reliability of these estimates. If we accomplish this task, we are in a position to create better algorithms for our systems. Because we are using crowdsourcing -- computation carried out by individual humans -- we also need to address the ethics question: Can we generate such models without tipping the equilibrium of the crowdsourcing universe so that it disadvantages (or fails to advantage) already fragile human populations?
This post is dedicated to my colleague David Tax: One of the perks of my job is an office on the floor with the guys from the Pattern Recognition Lab -- and one of the downsides is a low-level, but nagging sense of regret that we don't meet at the coffee machine and talk more often. This post articulates the larger story that I'd like to tell you.
In the field of multimedia, we spend so much time in discussions about semantic annotations (such as tags, or concept labels used for automatic concept detection) and whether they are objective or subjective. Usually the discourse runs along the lines of "Objective metadata is worth our effort, subjective metadata is too personal to either predict or be useful." Somehow the underlying assumption in these discussions is that we all have access to an a priori understanding of the distinction between "subjective" and "objective" and that this distinction is of some specific relevance to our field of research.
My position is that, as engineers building multimedia search engines, if we want to distinguish between subjective and objective, we should do so using a model. We should avoid listening to our individual gut feelings on the issue (or wasting time talking about them). Instead, we should adopt the more modern notion of "human computational relevance" which, since the rise of crowdsourcing, has entered into conceivable reach.
The underlying model is simple: Given a definition of a demographic that can be used to select a set of human subjects, and a definition of a functional context in the real world inhabited by those subjects, the level of subjectivity or objectivity of an individual label is defined as the percentage of human subjects who would say "yes, that label belongs with that multimedia item". The model can be visualized as follows:
Fig. 1: The relevance of a tag to an object is defined as the proportion of human subjects (pictured as circles) within a real-world functional context and drawn from a well-defined demographic that agree on a tag. I claim that this is the only notion of the objective/subjective distinction relevant for our work in developing multimedia search engines.
Under this view of the world, the distinction between subjective and objective reduces to inter-annotator agreement under controlled conditions. I maintain that the level of inter-annotator agreement will also reflect the usefulness that the tag will have when deployed within a multimedia search engine designed for use by the people in the demographic, within the domain defined by the functional context. If we want to assimilate personalized multimedia search into this picture, we can define it within a functional context for a demographic consisting of only one person.
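As a minimal sketch of this reduction (tag names and vote counts hypothetical): a label's "objectivity" is simply the proportion of subjects who agree that it belongs with the item, and personalization falls out as the degenerate case of a one-person demographic.

```python
def label_relevance(judgements):
    """Relevance of a tag to a multimedia item: the proportion of subjects
    (drawn from one demographic, within one functional context) who say
    "yes, that label belongs with that multimedia item"."""
    return sum(judgements) / len(judgements)

# Ten hypothetical subjects judging whether the tag "sunset" belongs
# with a given photo: 9 yes, 1 no -> a fairly "objective" label.
print(label_relevance([True] * 9 + [False]))  # 0.9

# A demographic consisting of one person gives personalized relevance:
print(label_relevance([True]))  # 1.0
```

A low score does not mean the tag is "wrong", only that it is subjective for this demographic and context, and correspondingly less useful to a search engine built for them.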
This model reduces the subjective/objective difference to an estimation of the utility of a particular annotation within the system. The discussions we should be spending our time on are the ones about how to tackle the daunting task of implementing this model so as to generate reliable estimates of human computational relevance.
As mentioned above, the model is intended to be implemented on a crowdsourcing platform that will produce an estimate of the relevance of each label for each multimedia item. I am as deeply involved as I am with crowdsourcing HIT design because I am trying to find a principled manner of constraining worker pools with regard to demographic specifications and with regard to the specifications of a real-world function for multimedia objects. At the same time, we need useful estimators of the extent to which the worker pool deviates from the idealized conditions.
These are daunting tasks and will, without doubt, require well-motivated simplifications of the model. It should be clear that I don't claim that the model makes things suddenly 'easy'. However, it is clearly a more principled manner of moving forward than debating the subjectivity vs. objectivity difference.
I was just amazed at the people involved in this contest: in their ability to develop their own idea and distinguish themselves, but at the same time support each other and collaborate as a community. It's nice to talk about crowdsourced innovation, but it's breathtaking to experience it in action.
The results are reflected in how far LikeLines has come since I first posted on it at the beginning of June. Raynor looked at me one day and said, "It's an API"...and we realized that this is not just an intelligent video player; it is a whole new paradigm for collecting user feedback that can be applied in an entire range of use cases.
From one day to the next we started talking about time-code-specific video popularity, which we quickly shortened to "heatmap metadata".
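To give a rough idea of what "heatmap metadata" could look like in practice, here is a sketch, not the actual LikeLines implementation: timestamped user interactions with a video (plays, seeks, likes at a given time code) are binned into per-segment counts and normalized, yielding a popularity profile over the video's timeline. All names and numbers are illustrative.

```python
def heatmap(interaction_times, duration, n_bins=10):
    """Bin the time codes (in seconds) of user interactions with a video
    into `n_bins` equal segments and normalize by the busiest segment,
    giving a per-segment popularity value in [0, 1]."""
    counts = [0] * n_bins
    for t in interaction_times:
        if 0 <= t < duration:
            counts[int(t * n_bins / duration)] += 1
    peak = max(counts) or 1  # avoid division by zero for an empty heatmap
    return [c / peak for c in counts]

# Interactions clustered around second 45 of a 100-second video:
print(heatmap([44, 45, 45, 46, 10, 90], duration=100))
# [0.0, 0.25, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.25]
```

The appeal of treating this as an API is that any player or application can feed interactions in and read the heatmap out, independent of how the video itself is hosted.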
Whatever happens next, whether Raynor proceeds to the next round, I already have an overpowering sense of having "won" at MoJo. It really solidified my belief in the power of collaborative competition as a source of innovation -- and a force for good.
I am an organizer in the MediaEval benchmark and this is the sort of effect that we aspire to: bringing people together to pull towards a common goal simultaneously as individuals and as a community.
There needs to be a multiplicity of such efforts: they should support and learn from each other. I can only encourage the students in our lab to get out there and get involved, both as participants and as organizers.
One day last week we were in the elevator heading down to lunch and Yue Shi turned to me and said, "Do you realize that of the people standing in the elevator, there are five PhD students submitting entries in five different competitions?"
True to usual style, my first reaction was, "Hey people, what happened to TRECVID?" We did make an honest effort to submit to TRECVID this year. I watched that happen...and then not happen.
But then I gave myself permission, there in the elevator, to turn off the bookkeeping/managing mechanism in my head -- and just go with my underlying feeling of what we were doing as a lab. It's the feeling of wow. Everybody doing their own thing, but at the same time being part of this amazing collaborative competitive community.
The elevator doors opened and as we passed through I thought, it seems like the normal daily ride that we're taking, but when you look a bit deeper you can see the world changing and how the people in my lab pool efforts to change it.
I divide my time between Radboud University Nijmegen and Delft University of Technology in the Netherlands. My research focuses on multimedia retrieval techniques that exploit speech and language and focus on human interpretations of meaning. I am particularly interested in internet video, in networked communities, and crowdsourcing techniques. Lately, I've been noticing how difficult it is to imagine life without search.