The 'Big Data' Revolution: How Number Crunchers Can Predict Our Lives

Mar 7, 2013
Originally published on April 13, 2013 11:52 am

When the streaming video service Netflix decided to begin producing its own TV content, it chose House of Cards as its first big project. Based on a BBC series, the show stars Kevin Spacey and is directed by David Fincher, and it has quickly become the most watched series ever on Netflix.

The success of House of Cards is no accident. Netflix executives knew exactly what their millions of customers were watching; they knew precisely how popular the works of Fincher were, and how many of their customers were fans of Kevin Spacey, and how many people were streaming the British House of Cards. Sifting through that mountain of data, Netflix executives were able to predict that House of Cards would be just what Netflix viewers would want to watch.

That kind of decision-making is an example of Big Data: the decade-long explosion of digital information, much of it personal, that has become available to companies and governments. This trend in predictions and decisions is the topic of a new book, Big Data: A Revolution That Will Transform How We Live, Work and Think.

One of the book's authors, Kenneth Cukier, joins NPR's Steve Inskeep to talk about how Big Data helps Target detect pregnancies, the police track potential criminals — and has even changed the way he talks to his kids.


Interview Highlights

On how Target identifies pregnant customers

"The example comes from Charles Duhigg, who's a reporter at The New York Times, and he's the one who uncovered the story. What Target was doing was they were trying to find out what customers were likely to be pregnant or not. So what they were able to do was to look at all the different things that couples were buying prior to the pregnancy — such as vitamins at one point, unscented lotion at another point, lots of hand towels at another point — and with that, make a prediction, score the likelihood that this person was pregnant, so that they could then send coupons to the people involved... there might be a coupon for a stroller or for diapers ...

"There was an example of a father coming in to a store and complaining that the teenage daughter was receiving fliers in the mail for coupons for baby products. And he said, 'What are you trying to do? Trying to get my teenage daughter pregnant?' And of course the way it ends is that he comes back later and apologizes, and says, 'It turns out there were things in my house that I wasn't aware of.'

"But the fact is, this is the sort of universe that we are now going to be in — we're all going to be in — because of Big Data. And all stores will be doing this, and all governments will be doing this. Your doctor will do this. Your employer will do this. This is the new norm."

On why Big Data doesn't care about causes, just correlation

"They crunched the numbers, and they found out that cars that were orange tended to not have breakdowns compared to other colors of cars ... So why might this be? Well, we can sort of concoct different scenarios. One is that orange tends to be a custom color, and if you order an orange car, perhaps the rest of the car was made in a custom way, a little bit more care was taken into it. We don't know why, and it's frankly, it's not that important. It might just bring us down a rabbit hole for us to try to find out why. But, again, if you just want to buy a car that's not going to break down, go with the correlation."

On how Google tracks the flu

"Google stores all of its searches. What they were able to do was go through the database of previous searches to identify what was the likely predictor that there was going to be a flu outbreak in certain regions of America. Now, keep in mind, we pay for the [Centers for Disease Control and Prevention] to look at the United States and find out where flu outbreaks are taking place for the seasonal flu. But the difference is that it takes the CDC about two weeks to report the data. Google does it in real time simply on search queries."

On when Big Data crosses the line

"It goes too far when we start making predictions for things that we have not yet done but we have the propensity to do — for example, commit a crime. If Big Data correlations identify me as a 44-year-old male who's a journalist and who has grand eyes for things I can't afford, it may think that I'm going to be susceptible to embezzlement, and maybe I will get a knock on the door by the police, who say, 'We have reason to believe that you're about to commit a crime.' This is sort of like pre-crime in [the film] Minority Report.

"... Now, will that be the case, will we go down that route? Frankly, we already are. There's a whole branch of criminology called algorithmic criminology, and a dimension called predictive policing ... police forces in many cities in America are crunching the numbers and looking at where the likelihood of a crime is going to be, and when, based on the past patterns of crime. And now we can say not just that a crime is going to exist in an area, but that these people have a, say, 80 percent likelihood to be a felon.

"We have judicial systems that presume you have acted and therefore we are going to penalize. We've never had a system whereby we're making predictions about your likelihood, your propensity to do something, before you've actually acted, and therefore we're going to take remedial steps against you."

On how Big Data has changed how he talks to his kids ...

"I think about what it means to educate my children. I have a 5-year-old and a 2-year-old, and I talk to them about ... thinking in terms of how to understand the world and act in the world with imperfect information. This is going to sound very strange, but I ask my child questions ... which I know she doesn't have enough information to answer. And I absolutely reward her when she takes an educated guess, when she makes a decision based on imperfect information. And I find ... a way of explaining to her that that's great. You're living in a world in which you're never going to have enough information, but you're going to have to come to answers and conclusions and make decisions based on it. So it's really important that you take in as much information and come up, using your judgment and wisdom ... come up with a decision based on that."

... and how his daughter responds

"She says that she's writing a book — it's called Big Data for Ponies."

Copyright 2013 NPR. To see more, visit http://www.npr.org/.

Transcript

STEVE INSKEEP, HOST:

Let's explore some of the business implications of big data.

RENEE MONTAGNE, HOST:

That popular buzz phrase means analyzing the masses of information that are now available about your every purchase, your every online search, your every preference.

INSKEEP: And quite a bit more, along with information about millions of other people. The company that finds a pattern in that data may also find a profit. And in today's business bottom line, we talk through the implications with Kenneth Cukier. He's one of the authors of a new book called "Big Data: A Revolution That Will Transform How We Live, Work and Think." You write about, I believe it is, the store chain Target and pregnant women.

KENNETH CUKIER: Well, the example comes from Charles Duhigg, who's a reporter at The New York Times. He's the one who uncovered the story. And what Target was doing was they were trying to find out what customers were likely to be pregnant or not. So what they were able to do was to look at all the different things that couples were buying, such as vitamins at one point, unscented lotion at another point, lots of hand towels at another point. And with that make a prediction, score the likelihood that this person was pregnant, so that they could then send coupons to the people involved. So that you could say there might be a coupon for a stroller or for diapers.

INSKEEP: Wait a minute. You're saying that if someone's credit card shows that she bought unscented lotions on a particular date, it suggests that she might be a certain number of weeks pregnant?

CUKIER: You know, it's not going to be just one factor. It's probably going to be many factors and it's also probably going to be over time. So what we may find that at the outset of a pregnancy, a woman who discovers she's pregnant is going to start taking certain vitamins. Later on in the pregnancy, she's going to want to buy certain things like new lotions. Her body's getting bigger and so she wants to make sure her skin is smoother and doesn't crack as the tummy expands. Those sorts of thing we can now identify are predictors of pregnancy.

INSKEEP: And you even give an example in the book where Target seemed to know things that even the family did not know.

CUKIER: Well, that's exactly right. There was an example of a father coming in to a store and complaining that the teenage daughter was receiving fliers in the mail for coupons for baby products. And he said, what are you trying to do? Trying to get my teenage daughter pregnant? And of course the way it ends is that he comes back later and apologizes and says it turns out there were things in my house that I wasn't aware of. Now, the story may be - it may not be real. But the fact is, this is the sort of universe that we are now going to be in - we're all going to be in - because of big data. And all stores will be doing this, and all governments will be doing this. Your doctor will do this. Your employer will do this. This is the new norm.

INSKEEP: Now, I want to get at something that you underline in the book, which is that a lot of people who are mining data do not care why one fact means that another fact is likely to be true. They just care that it is. You give the example of a study that attempts to determine which used cars are more likely to run well.

CUKIER: Yes. So they crunched the numbers, and they found out that cars that were orange tended to not have breakdowns compared to other colors of cars.

INSKEEP: You're just talking about the paint, right? It's an orange car.

CUKIER: That's exactly right. So why might this be? Well, we can sort of concoct different scenarios. One is that orange tends to be a custom color, and if you order an orange car, perhaps the rest of the car was made in a custom way, a little bit more care was taken into it. We don't know why, and it's frankly, it's not that important. It might just bring us down a rabbit hole for us to try to find out why. But again, if you just want to buy a car that's not going to break down, go with the correlation.

INSKEEP: When I read this book, what I read is companies finding masses of information that was previously thought to be useless and finding new uses for it.

CUKIER: That's right. Take search queries, for example. Google stores all of its searches. What they were able to do was go through the database of previous searches to identify what was the likely predictor that there was going to be a flu outbreak in certain regions in America. Now, keep in mind, we pay for the [Centers for Disease Control and Prevention] to look at the United States and find out where flu outbreaks are taking place for the seasonal flu. But the difference is that it takes the CDC about two weeks to report the data. Google does it in real time simply on search queries.

INSKEEP: We've reported on this in the program in the past. You're saying that certain search queries are a predictor. They correlate with the spread of the flu. And it might be something obvious like people doing searches for flu remedies or flu vaccines. But whatever it is, there are search terms that can help you predict that the flu is about to spread in a certain area.

CUKIER: That's exactly right.

INSKEEP: When does all this go too far?

CUKIER: It goes too far when we start making predictions for things that we have not yet done but we have the propensity to do. So, for example, commit a crime. If big data correlations identify me as a 44-year-old male who's a journalist and has grand eyes for things I can't afford, it may think that I'm going to be susceptible to embezzlement, and maybe I'll get a knock on the door by the police saying we have reason to believe that you're about to commit a crime. This is sort of like pre-crime in "Minority Report."

INSKEEP: Yeah, the movie - the Tom Cruise movie from some years ago, right.

CUKIER: You know, will that be the case, will we go down that route? Frankly, we already are. There's a whole branch of criminology called algorithmic criminology, and a dimension called predictive policing. Police forces in many cities in America are crunching the numbers and looking at where the likelihood of a crime is going to be, and when, based on the past patterns of crime. And now we can say not just that a crime is going to exist in an area, but that these people have a, say, 80 percent likelihood to be a felon. We have judicial systems that presume you've acted and therefore we're going to penalize. We've never had a system whereby we're making predictions about your likelihood, your propensity to do something, before you've actually acted and therefore we're going to take remedial steps against you.

INSKEEP: Did the knowledge that you gained over the course of writing this book change the way that you lead your life?

CUKIER: Yeah, absolutely. I think about what it means to educate my children. I have a five-year-old and a two-year-old, and I talk to them about - in strategic language and thinking - in terms of how to understand the world and interact in the world with imperfect information. This is going to sound very strange, but I ask my child questions with which I know she doesn't have enough answers - enough information to answer. And I absolutely reward her when she takes an educated guess, when she makes a decision based on imperfect information. And I find, even though in the five-year-old language, a way of explaining to her that that's great. You're living in a world in which you're never going to have enough information, but you're going to have to come to answers and conclusions and make decisions based on it. So it's really important that you take in as much information and come up, using your judgment and wisdom - I don't use those words - but come up with a decision based on that.

INSKEEP: And that's what big data is, as you define it. It's not knowing everything. It's even though you have an enormous amount of information, you're figuring out just enough to act, even if you don't fully understand the conclusions that you're drawing.

CUKIER: That's exactly right.

INSKEEP: So does your five-year-old look at you like you're kind of weird when you do this?

CUKIER: No. She says that she's writing a book. It's called "Big Data for Ponies."

INSKEEP: Well, I never could have predicted that the interview would end this way, but it has. Kenneth Cukier is the co-author of "Big Data: A Revolution That Will Transform How We Live, Work and Think." Thanks very much.

CUKIER: Thank you.

Transcript provided by NPR, Copyright NPR.