Saturday, March 10, 2012

Querying the Collective: Why search engines should (responsibly!) support analytics

My thoughts today turned towards the social responsibility of search engines. As search improves, it seems that this question is not moving towards resolution, but rather is becoming more important. Simply put, as search engines become more helpful in our lives, their power over us increases as well. In this post, I'd like to attempt to unpack that thought a bit and build a case for the importance of the social responsibility of search engines.

Let's consider the question of the identity of the entity with which knowledge resides. In other words, let's have a look at "who knows what" in some basic search scenarios. User information needs, I claim, differ along the who-knows-what dimension:
  • Known-item search: "I know what I want and my information need will be fulfilled when I can get my hands on this item." (Knowledge effectively already with me.)
  • Ad hoc search: "I don't know exactly what I want. But I know that there are other people who know. My information need will be fulfilled when I can get my hands on information sources created by people who do know." (Knowledge with other people, soon to be with me.)

As long as we stick to these two variants, the world is relatively well behaved. Social responsibility arguably remains with the searcher and with the individuals making the information sources available.

However, with the next step in this typology, things definitely become different.

  • Analytics search: "I don't know exactly what I want. In fact, I know that there is no individual person who knows. My information need will be fulfilled if I can get my hands on information that is created using analytics over a larger number of information sources." (Knowledge with no one, not yet.)

We do analytics search all the time, even without realizing it. For me, it's often for small things while writing research papers, for example, to find out which usage is more common, "crowd-sourcing" or "crowdsourcing", or to see if "internet" has finally overtaken "Internet". I actually just this moment carried out a search for "analytics", which is being red-underlined as a spelling mistake as I write. In response to my query, Google tells me "About 117,000,000 results (0.24 seconds)". I decide that there are a lot of other people using this word -- many in the same way that I am using it now, so I ignore the spellchecker and move on.

The point is that this 117,000,000 is information that was derived on the spot by analyzing and aggregating a huge number of data sources. As a result, the responsibility for this information has shifted and now lies elsewhere. It is not so clear that it lies only with me, the person asking the question, or with the information sources that are being aggregated. Rather, the responsibility for creating this information lies with the algorithm that made the calculation. If we think that a non-human entity such as an algorithm cannot be responsible, then the conclusion must be that the responsibility lies with the people who created and control the algorithm that made the calculation, i.e., the minds and masters behind the search engine.

Of course, many times that responsibility is not a particularly heavy weight and the answers to analytics search queries can often be wildly off and still not be harmful. Give or take a million, I still will see that there are a lot of people using the word "analytics" and that answers my question.

However, I would argue that there are enough cases in which the results of analytics search queries have a large enough impact that they should force us to think carefully about the social responsibility that is borne directly by our search algorithms, their creators and the providers of our search services.

Recently, I spent some time in a house by a lake in the forest. The local news covered an incident in which a hunter reported a man in the forest, carrying a gun, but not wearing the blaze orange of a hunter. The man had fired at him, and the hunter returned fire. After hearing that news report, I basically made my decision about when to go out of the house by using a search engine to monitor real-time media (Twitter and local news). My queries were analytics queries because they relied on the entire collection of available information being scanned. My conclusion about the situation relied on a relatively subtle difference between no one mentioning it and it being mentioned by a handful of people (the forest being rather sparsely populated). I continued periodic query sessions, and the story died out relatively quickly. I walked out of the house with confidence that the incident was a fluke and not a rampage and that no one was going to take a pot shot at me.

One could argue that I was irresponsible for potentially putting my safety in the hands of a search engine. But one could also argue that I was irresponsible for being there in the hunting season. There are also those that would claim that the place is a bit weird anyhow, and should be completely avoided. The problem is, that place is where I'm from. I'm probably not about to stop going back and also I'm probably not about to stop looking for information by carrying out analytics search with a search engine.

It seems inevitable: People use analytics search to form opinions and make assessments that influence their behavior and lead them to make important decisions.

We have little choice but to admit that we would like search engines to offer us as users the possibility to satisfy our information needs using analytics search. It gives us a lens to view the world around us. It takes us a step in the direction pointed to recently by Doug Oard, who was quoted on Twitter as wanting an information retrieval system that is an exoskeleton for the mind. Personally (and to the bemusement of my colleagues), I tend to talk about search, especially in the context of social networks, as providing us with a prosthesis. In the end, all the metaphors boil down to analytics search being just plain important to us and to what we want to do in our lives.

We are left with the conclusion: Search engines should support analytics search, but they should do so with careful regard for social responsibility.

Why am I thinking of analytics search today? Probably because I've come up against another problem where it is useful. This problem involves no guns, so it's not particularly life-threatening -- at most it threatens the productivity of our lab.

At the beginning of the year, there was a high-level decision to restrict access to our building on the weekends. The cited reasons were that ICT no longer requires 24-hour building access and that weekend closure would save energy and security costs. The net result has been that on the weekends our lab is completely empty, whereas there used to be at least one or two PhD students working there whenever I'd go in.

I became curious about exactly how much electricity is being saved and realized that we can actually make a rough estimate using social media to calculate the number of weekends in which the building is actually powered down. For example, today (a Saturday) it wasn't. Today, I took a picture (above) in the cafeteria which reveals the fact that it wasn't. Other people in the group in the picture were themselves taking pictures. If a search engine will allow me to find other pictures (e.g., on Flickr) of weekend events in the building, it will be possible to make an estimate of the total number of days of electrical consumption actually saved by keeping the PhD students out of the lab.
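To make the arithmetic concrete, here is a minimal sketch in Python of how such an estimate might be computed once the pictures have been found. Everything in it is an assumption made for illustration: the function names `weekend_days` and `estimate_saved_days` are hypothetical, the photo dates are made up, and the step of actually querying a photo-sharing site by date and location is left out entirely. A weekend day counts as "not powered down" if at least one picture was taken in the building on that day.

```python
from datetime import date, timedelta


def weekend_days(start, end):
    """Return every Saturday and Sunday from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        if current.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            days.append(current)
        current += timedelta(days=1)
    return days


def estimate_saved_days(photo_dates, start, end):
    """Estimate the weekend days on which the building was powered down.

    photo_dates is a list of dates on which a picture was taken in the
    building (hypothetical input, e.g. gathered from search results).
    Several pictures on the same day count once; weekday pictures are
    ignored. Returns (saved_day_count, sorted_active_weekend_days).
    """
    all_weekend = set(weekend_days(start, end))
    active = {d for d in photo_dates if d in all_weekend}
    return len(all_weekend) - len(active), sorted(active)


# Made-up example: pictures found for the period since the policy began.
photos = [
    date(2012, 3, 10),  # Saturday, first photographer
    date(2012, 3, 10),  # same Saturday, second photographer
    date(2012, 2, 4),   # another Saturday event
    date(2012, 2, 8),   # a Wednesday, ignored
]
saved, active = estimate_saved_days(photos, date(2012, 1, 1), date(2012, 3, 10))
print(saved, "weekend days with no visible activity;", len(active), "with activity")
```

Since pictures can only reveal activity and never prove its absence, the count of saved days from a sketch like this is an upper bound: a weekend with an event that nobody photographed (or that the search engine failed to return) would wrongly be counted as powered down.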

None of the individual picture takers know this information, but if the information can be aggregated with the support of a search engine, then we can know -- calling the information into being, as it were, using a couple of queries linked to dates and locations. I'll leave it to another day to examine the question of whether I actually have a right to know how much energy is being saved by the building closure policy. Here, I draw a different conclusion: if I am going to rely on a search engine to formulate an impression of what happens in the building on the weekends, then I would like that search engine to have assumed the responsibility of giving the best answer it possibly can.