
The definitions may vary, but they all come down to one question: how do you extract useful, real-time information from the huge quantities of bits collected every minute by the sensors and the collection and storage tools around us?

This mission keeps Dr. Aya Soffer, director of Big Data Analytics at IBM, very busy. The problem grows more complicated each day, and answers are needed.

Talking with i-HLS, she first defined the magnitude of the problem with some facts prepared by IBM, a company that masters the handling of data.

i-HLS Israel Homeland Security

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.

Big data spans four dimensions: Volume, Velocity, Variety, and Veracity.

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

  • Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
  • Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

  • Scrutinize 5 million trade events created each day to identify potential fraud
  • Analyze 500 million daily call detail records in real-time to predict customer churn faster
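To make the Velocity point concrete, here is a minimal Python sketch of checking events one at a time as they stream in, rather than in a later batch. It is an illustration only and not IBM's method: the `BurstDetector` class, the two-minute window, the threshold of three events, and the account names are all assumptions made for the example.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags an account that produces too many events inside a short time window."""

    def __init__(self, window_seconds=120, max_events=3):
        self.window_seconds = window_seconds  # assumed window: 2 minutes
        self.max_events = max_events          # assumed threshold, purely illustrative
        self._recent = defaultdict(deque)     # account -> timestamps still inside the window

    def observe(self, account, timestamp):
        """Process one event as it arrives; return True if the account looks suspicious."""
        recent = self._recent[account]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


detector = BurstDetector(window_seconds=120, max_events=3)
# Hypothetical stream of (account, timestamp in seconds) pairs.
events = [
    ("acct-1", 0), ("acct-2", 10), ("acct-1", 20),
    ("acct-1", 40), ("acct-1", 60), ("acct-1", 400),
]
flags = [(account, ts) for account, ts in events if detector.observe(account, ts)]
print(flags)
```

Each event is judged the moment it is observed, so a burst is flagged while it is happening. A real fraud system would use far richer signals than a simple count.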

Variety: Big data is any type of data – structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

  • Monitor hundreds of live video feeds from surveillance cameras to target points of interest
  • Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Dr. Aya Soffer, i-HLS Big Data Conference

Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach. Until now, there was no practical way to harvest this opportunity. Today, IBM’s platform for big data uses state-of-the-art technologies, including patented advanced analytics, to open the door to a world of possibilities.

That is the problem, and this is where Dr. Soffer’s team comes into the picture.

“The first problem we face is how to take pieces of data that are seemingly not connected in any way and build a complete picture with them. We use an index that enables us, or the special software-based systems we developed, to perform the aggregation: the adding of all the smallest pieces into one coherent picture”.

This index can vary according to need: one day it has to identify a person on the web, and a little later it has to identify a pattern of behavior.
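The idea of an index that ties seemingly unconnected pieces into one picture can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system Dr. Soffer's team built: the record names, field names, and the helper functions `build_index` and `connected_records` are all hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each (field, value) pair to the ids of the records that contain it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, value in fields.items():
            index[(field, value)].add(record_id)
    return index


def connected_records(records, start_id):
    """Collect every record reachable from start_id through shared field values."""
    index = build_index(records)
    seen = {start_id}
    frontier = [start_id]
    while frontier:
        current = frontier.pop()
        for field, value in records[current].items():
            for neighbor in index[(field, value)]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
    return seen


# Hypothetical records from different sources; none of them alone tells the whole story.
records = {
    "forum_post": {"handle": "blue_heron", "email": "bh@example.com"},
    "purchase": {"email": "bh@example.com", "phone": "555-0100"},
    "call_log": {"phone": "555-0100"},
    "unrelated": {"handle": "grey_owl"},
}
picture = connected_records(records, "forum_post")
print(sorted(picture))
```

Here the forum post and the call log share no field directly, yet the index links them through the purchase record. Changing which fields are indexed changes what kind of picture is assembled, which is one way to read the remark that the index varies with the need.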


The recent revelations about the U.S. administration’s effort to “dig” information out of cell phone surveillance are one example of the need of governments and organizations to find the “needle in the haystack”.

“We are developing the tools to identify anomalies, tie names together so that the connection makes sense, and create the patterns that will allow an analyst of any sort to use the findings in building the needed profile”.

Can that be achieved by computers alone?

Dr. Soffer says that the man in the loop is still needed and will probably remain so for many years. “Our systems extract the relevant data and present it in a way that is understandable to a professional analyst. The final understanding of what the different bits of data mean is still in the human brain”.