Sunday, July 19, 2015

Teaching the First Steps in Data Science: Don't Simplify Out the Essentials

Teachers of Data Science are faced with the challenge of initiating students into a new way of thinking about the world. In my case, I teach multimedia analysis, which combines elements of speech and language technology, information retrieval and computer vision. Students of Data Science learn that data mining and analysis techniques can lead to knowledge and understanding that could not be gained from conventional observation, which is limited in its scope and ability to yield unanticipated insights.

When you stand in front of an audience that is being introduced to data science for the first time, it is very tempting to play the magician. You set up expectations of what "should be" possible, and then blow them away with a cool algorithm that does the seemingly impossible. Your audience will go home feeling that they got a lot of bang for their buck: they have witnessed a rabbit being pulled from a hat.

However: will they be better data scientists as a result?

In fact, if you produce a rabbit from a hat, your audience has not been educated at all; they have been entertained. In the worst case, they have been un-educated, since the success of the rabbit trick depends on directing attention away from the essentials.

My position is that when teaching the first steps in data science, it is important not to simplify out the essentials. Here, two points are key:

First, students must learn to judge the worth of algorithms in terms of the real-world applications that they enable. By this I do not mean that all science must be applied science. Rather, the point is that data science does not exist in a vacuum. The data originally came from somewhere: it is connected to something that happened in the real world. Ultimately, the data scientist's analysis must be relevant to that "somewhere", be it a physical phenomenon or a group of people.

Second, students must learn the limitations of the algorithms. Understanding an algorithm means also understanding what it cannot be used for, where it necessarily breaks down.

At a magic show, it would be ridiculous for a magician to announce that his trick is oriented towards the real-world application of producing a rabbit for rabbit soup. And no magician would display alternative hats from which no rabbit could possibly be pulled. And yet, as data science teachers, we need to do precisely this. It is essential that our students know exactly what an algorithm is attempting to accomplish, and under which conditions it fails.

Yesterday was the final day of the Multimedia Information Retrieval Workshop at CCRMA at Stanford, and Steve Tjoa gave a live demo of a simple music identification algorithm. It struck me as a great example of how to teach data science. As workshop participants, we saw that the algorithm is tightly connected to reality: it was identifying excerpts of pieces that he played on his violin right there in the classroom. His demo also showed its limitations: it did not always work.

This exposition did not simplify out the essentials. Students who experience such a live demo learn not only the algorithm, but also how to use it and how to extend it.

We were blown away not so much by the cool algorithm, but by the fact that we really grasped what was going on.

Experiences like this are solid first steps for data science students, and will lead to great places.


Postscript:

That evening, one of my colleagues asked me whether I still wrote on my blog. No, I said, I had a bit of writer's block. I had been trying to write a post on Jeff Dean's keynote at ACM RecSys 2014, "Large Scale Machine Learning for Predictive Tasks", and failing miserably. The keynote troubled me, and I was trying to write a post that could constructively explain why. Ten months passed.

Steve's live demo made it clear what my main problem with the keynote was. It contained nothing that I could demonstrate was literally wrong. It was simply a huge missed opportunity.

Since ACM RecSys is a recommender systems conference, many people in the room were thinking about natural language processing and computer vision problems for the first time. The keynote did not connect its algorithms to the sources of the data or to possible applications. Afterwards, the audience was none the wiser about the limitations of the algorithms it discussed.

I suppose some would try to convince me that when listening to a keynote (as opposed to a lecture), I need to stop being a teacher and switch into magic-watching mode, meaning that I would suspend my disbelief. "That sort of makes sense, it looks pretty good," Dean said to wrap up his exposition of paragraph vectors of Wikipedia articles.

https://www.youtube.com/watch?v=Zuwf6WXgffQ&feature=youtu.be&t=4m20s

If you watch from the deep link, you will see that he would like to convince us that we should be happy because the algorithm has landed music articles far away from computer science articles.

In the end, I can only hope that the YouTube video of the keynote is no one's first steps in data science.

Independently of a particular application, landing music far away from computer science is also just not my kind of magic trick.