Open Source: “The New Normal” For Big Data

Big Data is going open. Open-source, that is. Hadoop and Apache Spark are among the biggest, hottest technologies in Big Data right now, but despite all the noise, not everyone knows or talks about the fact that they’re both open source.

Mike Tuchen, CEO of Big Data purveyor Talend and former Microsoft executive, thinks this paradigm is about to change. “We’re seeing a changing of the guard,” he said. “We expect the entire next-generation data platform will be open source.”

Tuchen is talking about the expanded Hadoop ecosystem, which is entirely open source. “It’s the new normal,” he said.

Tuchen’s remarks shouldn’t surprise anyone, given that Talend provides integration technologies for Hadoop. The company didn’t hedge its bet when it went the open source route, and the bet seems to be paying off. All of Talend’s offerings are open source and focused on Big Data, and its customers include GE, Lufthansa, Citi, and Orange. The company will soon celebrate its 10th anniversary, which will coincide with a major expansion.

Talend was selling in five countries in 2015 and will be selling in 15 by the end of 2016, according to Tuchen. To do that, the company will need to hire 200 new people, bringing its headcount to 750 employees.

The great advantage of open source is that it allows for faster technology growth than the traditional, single-vendor model.

“The whole Hadoop ecosystem is moving faster than it could if it were just one vendor,” Tuchen said. “When you look at it that way, it’s hard to see how the world would ever change back.”

The big players are starting to take notice, too. Google recently announced that it’s open sourcing a major chunk of code through the Apache Incubator. The company will contribute its Cloud Dataflow to the Apache Software Foundation in what is a major move for the Mountain View giant.

“Cloud Dataflow is a platform for processing large amounts of data in the cloud. It features an open source, Java-based SDK, which makes it easy to integrate with other cloud-centric analytics and Big Data tools,” The Var Guy writes.

The platform allows for greater compatibility with new, emerging technologies while integrating into existing workflows. “That saves organizations from having to revamp their analytics infrastructure or code each time a new data processing framework appears,” The Var Guy writes.

Will open source kill off the closed-source competition? Probably not yet, but the giants are taking notice, and some seem to be worried.
