Timandra Harkness, writer
Kat - Big data is the big thing in science right now, with researchers around the world generating and trawling through ever-bigger data sets in search of answers to ever-bigger questions. And, unsurprisingly, advances in gene sequencing technology have produced some huge piles of data for the number-crunchers to play with. So what are they doing? Timandra Harkness is the author of the new book “Big Data: Does size matter?” and I started by asking her, what exactly do we mean by ‘big’, when it comes to data?
Timandra - Part of the idea of it being big is obviously that there is a lot of it. That’s one of the reasons we’re able to do so much with the human genome: we’re able to work through masses and masses of bits of information, and computers are able to process it really fast and make sense of it. And that’s why we’ve gone from the human genome taking years and costing billions of dollars to doing it within weeks for a thousand dollars. So that’s one thing. But the other thing, which a neuroscientist called Paul Matthews described to me as the difference between large data and big data, is that large data is just lots and lots of the same kind of data, but big data is different types of data that you can put together to get a more rounded picture. So for example, scientists like him might take genetic data but put it together with brain scans and even with the post codes of where people have lived, and weather reports from those areas, so that you could get a picture of somebody’s health that takes into account their genetics but also the environment they've been in and what illnesses they developed, and really start to see how those factors interact.
Kat - What sort of information can we get out then? How is this useful, being able to make all these networks of links rather than just going, “Okay, that links to that”?
Timandra - Well obviously, there are some cases where there's one gene, you can identify it and say, “That will cause you to develop this disease.” But most of genetics doesn’t actually work like that. What it does is give you a propensity for something or a risk of something. And then you're starting really to look statistically and saying, “Okay, you're more at risk of this because of this gene” or “Well, we’ve noticed that a lot of people with this combination of genes are going to develop this, but not all of them. So, maybe we need to look at other factors and see if perhaps there's some lifestyle thing that you could avoid and that would cut down your genetic risk, or whether there's something else going on that we haven't found yet in combination with the genes.” Because it’s a nice idea that genes are just a digital thing and if the gene is there, something will happen, and if it’s not, it won't. But that's really not at all how it works.
Kat - Is there a problem with too much data in genetics? I talked to scientists and they say, “Oh God! We’re just getting so much sequencing data or so much data from our experiments that we don’t have time or the computing power to get through it.”
Timandra - I think that is a problem because it’s almost as if the ease of gathering data and the ability to store loads of data that we didn’t have before makes it the default. You think, well, I might as well collect this because I can. But then it doesn’t necessarily get you closer to the answer. So yes, there is the problem of actually being able to meaningfully process it. But the other thing is, a few years ago, I think there was a first flush of excitement about big data and people started saying, “Oh well! Theory is dead. We don’t need theories now because we’ll basically just take all our data into a massive computer and the computer will do all the work and spit out the answers. We won't even have to ask the questions.”
Kat - Sequence all the things!
Timandra - Exactly! A correlation will just give us all the answers. We won't care why things happen because all we need to see is which things happen together and that will enable us to prevent things. I think people are calming down a bit now and going, “Well, when we said theory is dead, we just meant it was having a bit of a lie down. And obviously, we’ll still need causality and actually, we will sometimes still need a hypothesis. But the two can work together.” So, you might look at some data, and what's essentially an artificial-intelligence-driven computer, which has been looking at it for you, might say, “Here are some interesting patterns. You might want to look at these. There might be something here.” And then you go and use your human judgment and think, “Oh, well. Now actually, I can see why it’s flagged that up, but there's a perfectly simple real-world explanation why people who live in cold climates might have low vitamin D: there's less sunshine, and when the sun is out, it’s too cold to expose your skin to it.” Or there might be something genuinely new that you would never have spotted before, and then you can form a hypothesis and go and investigate it. And then vice versa, you might think, “I have a feeling from my other research that there might be a link here. Now, let’s go back and look at the masses of data and let the computer do the trawling to see if there really is something there.”
Kat - Are we going to start to get closer to answering some of those really big questions like, how do you go from the code in DNA into building a baby?
Timandra - I think it's certainly got massive potential and I think scientific research is one of the really big areas of potential for big data. And certainly, the people I've spoken to are very excited by it, being able to deal with a lot of information and also to combine things that you could never practically have combined before. But as I say, I think the people who, a couple of years ago maybe, were getting overexcited and saying, “This will transform everything and you won't even really need scientists because the computers will just tell us everything” are now starting to say, “Actually, no. You still need the human minds to make sense of it. But now, we have some amazing new tools to get us there a lot quicker.”
Kat - Timandra Harkness, and her book “Big Data: Does size matter?” is out now, published by Bloomsbury Sigma. And a quick reminder that my own book, Herding Hemingway’s Cats: understanding how our genes work, is also out now, available in hardback, e-book and audiobook. According to the journal Nature it’s “A witty, clued-up report from the front lines of genetics”, while one of my heroes, Radiolab presenter Robert Krulwich, described it as “a gorgeously written, surprisingly gripping introduction to everything we've learned about genes since the famous Human Genome Project several years ago”, so why not give it a read?