
There are about 40,000 sensors on an offshore drilling rig, and each collects information about how the rig’s many machines are operating. But sensors can fail or be miscalibrated and, in the relay between the rig and data scientists, data can pick up errors. 

A new development can automate data quality checks for oil and gas companies, speeding up the process. The technology, which started in the energy industry, has now expanded to fields like defense and healthcare, which also generate hundreds of thousands of data points that need to be checked.

Pandata Tech uses machine learning to review data generated by drilling rigs, and its algorithms determine how likely it is that the data can be trusted.

Their software reduces the amount of time data scientists have to spend validating their data — from 80 percent of their time down to just 20 percent. It works by using models to generate a data quality score.

For example, a sensor that monitors pressure levels is paired with a computer model of what those levels should be. The software checks for missing or incorrect data, then uses statistics to determine how likely it is that the sensor is picking up correct data. It creates a quality score for that data between 0 and 100 in the short and long term, according to Gustavo Sanchez, co-founder and CEO.
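To make the idea concrete, here is a minimal Python sketch of a 0 to 100 quality score. It is an illustration only and not Pandata Tech's actual method: the function names, the fixed `tolerance` threshold, and the `window` used for the short-term score are all assumptions, and a simple "share of good readings" stands in for whatever statistics the company really uses.

```python
import math


def quality_score(readings, expected, tolerance):
    """Return a 0-100 score: the percentage of readings that are present
    and within `tolerance` of the model's expected value.

    Hypothetical helper; the real scoring method is not described in the article.
    """
    if not readings:
        return 0.0
    good = 0
    for observed, predicted in zip(readings, expected):
        # A missing value (None or NaN) counts against the score.
        if observed is None or math.isnan(observed):
            continue
        # A value far from the model's prediction is treated as incorrect.
        if abs(observed - predicted) <= tolerance:
            good += 1
    return 100.0 * good / len(readings)


def short_and_long_term(readings, expected, tolerance, window):
    """Score the most recent `window` readings and the full history."""
    if window < 1:
        raise ValueError("window must be at least 1")
    long_term = quality_score(readings, expected, tolerance)
    short_term = quality_score(readings[-window:], expected[-window:], tolerance)
    return short_term, long_term


# Made-up pressure readings: one is missing and one is far from the model.
readings = [100.0, 101.0, None, 150.0, 99.5, 100.2]
expected = [100.0] * 6
short, long_ = short_and_long_term(readings, expected, tolerance=5.0, window=2)
print(round(short, 1), round(long_, 1))  # prints: 100.0 66.7
```

In this made-up example the two most recent readings are clean, so the short-term score is high, while the missing value and the outlier earlier in the history pull the long-term score down.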

The unique challenges of working with large drilling rigs have translated well to working with aircraft. The healthcare field is similar: Houston, home to the Texas Medical Center, has medical research centers that can benefit from faster data validation, as reported by houston.innovationmap.com.