Temptation to let computers do it all, sans human element, can lead to Big Data trouble

Big Data does not necessarily mean Good Data. And that, as a growing number of experts insist, means Big Data does not automatically yield good analytics.

If the data is incomplete, out of context or otherwise contaminated, it can lead to decisions that could undermine the competitiveness of an enterprise or damage the personal lives of individuals.

According to Security Leadership, one of the classic stories of how data taken out of context can lead to distorted conclusions comes from Harvard University professor Gary King, director of the Institute for Quantitative Social Science. A Big Data project was attempting to use Twitter feeds and other social media posts to predict the U.S. unemployment rate by monitoring keywords such as “jobs,” “unemployment” and “classifieds.” Using an analytics technique called sentiment analysis, the group collected tweets and other social media posts that included these words to see whether increases or decreases in their frequency correlated with the monthly unemployment rate.

While monitoring them, the researchers noticed a huge spike in the number of tweets containing one of those key words. But, as King noted, they later discovered it had nothing to do with unemployment. “What they hadn’t noticed was Steve Jobs died,” he said.

iHLS – Israel Homeland Security

In the telling, it’s a somewhat humorous story, apart from the tragedy of Jobs’ untimely death. But the lesson is a deadly serious one for anyone hoping to rely on the magic of Big Data to guide their decisions.

King said the mix-up over the dual meanings of “jobs” is “just one of many similar anecdotes. Anyone working in this area has had similar experiences.” He added: “Lists of keywords, curated by human beings, work OK for the short run, but tend to fail catastrophically over the long run. You can fix it up by adding exceptions, but there’s a lot of human labor involved.”
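King's point about hand-curated keyword lists, and the exceptions needed to patch them, can be illustrated with a short Python sketch. It is hypothetical throughout: the `EXCEPTIONS` list, the phrase-blanking rule and the `mentions_keyword` function are assumptions made for illustration, not the method used in the project he describes.

```python
import re

KEYWORDS = ("jobs", "unemployment", "classifieds")
# Hypothetical hand-curated exceptions: phrases in which a keyword is not about employment.
# Each new ambiguity found in the data needs a human to add another entry here.
EXCEPTIONS = ("steve jobs",)

def mentions_keyword(post: str, exceptions: tuple[str, ...] = ()) -> bool:
    """True if a keyword appears as a whole word after exception phrases are blanked out."""
    text = post.lower()
    for phrase in exceptions:
        text = text.replace(phrase, " ")  # remove the exempted phrase before matching
    words = re.findall(r"[a-z]+", text)
    return any(keyword in words for keyword in KEYWORDS)

for post in ["RIP Steve Jobs", "Steve Jobs died and I still need jobs", "Weekly jobs report"]:
    # Compare the naive match with the exception-patched match.
    print(post, "->", mentions_keyword(post), mentions_keyword(post, EXCEPTIONS))
```

The patch fixes the known false positive while still counting genuine mentions in the same post, but it only covers ambiguities someone has already noticed, which is the ongoing human labor King warns about.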

He said it is easy for anyone to create their own example just by entering a keyword into the Bing Social page. “You’ll see some relevant things and some irrelevant. If you don’t change the query and watch over time, you will often find the conversation veering away in some way — sometimes a little, sometimes not at all for a while, and sometimes dramatically,” he said.

But King noted that, overall, there are many examples of Big Data analytics producing useful results, “so failures tend not to appear in the literature.” Kim Jones, senior vice president and CSO of Vantiv, said this is not a new problem, but one that can be magnified if people assume that massive amounts of data will magically produce good analytics.