Big Data In Science

A retail store needs big data to track its merchandise. Another company leverages big data to streamline its manufacturing process, or mines its data to come up with better marketing strategies. Underneath it all lies one imperative: profit. So how is science using big data?

Well, science needs big data for a lot of things, but the underlying thread is the search for truth. Getting there, though, can be a tricky business, because precision matters, and precision is the outcome of many, many, MANY observations. Take CERN, for instance.

The Large Hadron Collider at CERN produces 500 exabytes daily. That’s 500 billion gigabytes of data every single day. But 99.99% of this behemoth of information is discarded as noise, and the fraction that remains is not consolidated in large traditional storage arrays but distributed across many parallel processing nodes using technologies like Hadoop. This arrangement means that teams have to share resources. In a contemporary enterprise, departments rarely cooperate when it comes to crunching data. At CERN, the standard procedure is for everyone around the globe to “pitch in”.
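
To make the idea of splitting work across nodes a little more concrete, here is a minimal Python sketch of the MapReduce pattern that Hadoop popularized. It is not how CERN actually processes its data, and it rests on loud assumptions: the “events” are synthetic integers, NOISE_THRESHOLD is a made-up stand-in for a real trigger, threads stand in for separate machines, and the helper names (split_into_chunks, map_filter, reduce_counts, process) are hypothetical. Each worker filters its own slice locally, and only small summaries are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Illustrative assumption: a synthetic "event" is just an integer reading, and
# anything below this hypothetical threshold counts as noise.
NOISE_THRESHOLD = 99_990


def split_into_chunks(events, n_chunks):
    """Divide the events into roughly equal slices, one per worker "node"."""
    size = max(1, -(-len(events) // n_chunks))  # ceiling division, at least 1
    return [events[i:i + size] for i in range(0, len(events), size)]


def map_filter(chunk):
    """Map step: each node discards noise locally and summarizes what it kept."""
    kept = [e for e in chunk if e >= NOISE_THRESHOLD]
    return {"seen": len(chunk), "kept": len(kept)}


def reduce_counts(a, b):
    """Reduce step: merge two partial summaries into one."""
    return {"seen": a["seen"] + b["seen"], "kept": a["kept"] + b["kept"]}


def process(events, n_nodes=4):
    """Run the map step in parallel, then reduce the partial summaries."""
    chunks = split_into_chunks(events, n_nodes)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_filter, chunks))
    return reduce(reduce_counts, partials, {"seen": 0, "kept": 0})


if __name__ == "__main__":
    print(process(list(range(100_000))))  # {'seen': 100000, 'kept': 10}
```

With 100,000 synthetic events and that threshold, only 10 survive, which is 0.01% and mirrors the 99.99% discard rate. The point of the design is that the raw data never has to be gathered in one place; only the per-node summaries travel.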

Making use of computing power and sharing the findings are two different things. CERN may be exceptional in doing both, but across science as a whole, pooling the data collected in a field is still a long way from becoming reality. In molecular biology and chemistry, for instance, it largely does not happen. Competition breeds secrecy, and at the heart of competition lies, again, the search for profit. Yet a sense of community among scientists would do the rest of the world a lot of good. In Timo Hannay’s words,

“If institutions and funders were to give more credit to open sharing of research data, scientific progress would accelerate and we would all benefit.”

Another problem is the size of the addressable market. While business enterprises enjoy polished, dedicated software, scientists represent a tiny niche. A big software company doesn’t see the point in investing much effort to build tools for such a small and specialized market, so many scientists still build their own software.

“There are around 7 million researchers in the world, making them about 0.1% of the human population.”

On top of that, science in the age of big data requires constant scrutiny from peers to evaluate the quality of the research. Christie Aschwanden has an interesting story about how difficult it is to keep in check the deliberate frauds that give scientists a bad name. It’s easy to skew your data even when you don’t mean to, so it’s no wonder that some less scrupulous people distort their results, or even make them up as they go, to serve their own agendas. Double-checking every paper before it is published requires an insane amount of work.

“If we’re going to rely on science as a means for reaching the truth — and it’s still the best tool we have — it’s important that we understand and respect just how difficult it is to get a rigorous result.”