Who's in the Video
Seth Stephens-Davidowitz has used data from the internet, particularly Google searches, to get new insights into the human psyche.  A book summarizing his research, Everybody Lies, was published in May 2017[…]

SETH STEPHENS-DAVIDOWITZ: So there's a methodology called k-Nearest Neighbor in big data analysis where you can find a person who looks similar to another person. Who's the most similar on a number of traits?

But I kind of renamed the search a doppelganger search because I think that's a cooler name for it and also accurate. So you basically look in a huge data set, you take a person and say, "Who is the person who looks most similar to that person?" So one way you might use this is if Amazon's looking for what books to recommend. They may find your book-reading doppelganger. So across the whole universe of Amazon customers, who's the person who tends to buy books like the ones you have bought? And then what books has that person recently read and enjoyed that you haven't read yet? And that's sort of how they recommend books to you. And this can be used in a lot of other areas. People are just starting to use this in health, where you can say, across the entire universe of patients, who has symptoms very similar to your symptoms, and what has worked for those people? Those are your health doppelgangers. So it's a very powerful methodology and it gets more powerful the more data you have. Because the more data you have, the more likely you are to find someone in that data set who's really, really similar to you.
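As a rough illustration of the book-reading doppelganger idea, here is a minimal Python sketch of a nearest-neighbor lookup. Everything in it is an assumption made for illustration: the reader names, the purchase sets, the function names, and the choice of Jaccard overlap as the similarity measure. It only looks at what people bought, not at whether they enjoyed it, and it is not a description of how Amazon actually builds recommendations.

```python
def jaccard(a, b):
    """Share of books two readers have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(target, others, k=1):
    """Return the k (name, books) pairs most similar to the target reader."""
    ranked = sorted(others.items(),
                    key=lambda item: jaccard(target, item[1]),
                    reverse=True)
    return ranked[:k]


def recommend(target, others, k=1):
    """Suggest books the k nearest 'doppelgangers' own that the target does not."""
    suggestions = []
    for _name, books in nearest_neighbors(target, others, k):
        for book in sorted(books - target):
            if book not in suggestions:
                suggestions.append(book)
    return suggestions


# Hypothetical purchase histories, invented for this example.
purchases = {
    "reader_a": {"Dune", "Neuromancer", "Foundation", "Hyperion"},
    "reader_b": {"Emma", "Persuasion", "Middlemarch"},
    "reader_c": {"Dune", "Emma"},
}
you = {"Dune", "Neuromancer", "Foundation"}

print(nearest_neighbors(you, purchases)[0][0])  # the closest reader
print(recommend(you, purchases))                # books that reader has and you lack
```

With more readers in the data set, the closest match tends to overlap more with the target, which is the sense in which the approach improves as the data grows.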

Some of this stuff, some of the big data analysis, is something we have always kind of done. That's kind of what doctors try to do. They try to say, "Who are you similar to? Of all the patients I've seen, which ones remind me of your case, and what worked for them?" But they've been doing this on a small number of patients, namely the ones they've seen. Whereas the potential for big data is you can do it over the entire universe of patients and get people who are, really, much, much more similar to you. Really zoom in on the tiny subset of people who have a very similar path to yours. Instead of saying "You have the condition depression," which might remind a doctor of a hundred depressed patients that he's seen over the past couple of years, you can say maybe that "You have a particular type of depression." So you maybe sleep all the time whereas other depressed patients don't sleep all the time, and you feel guilty whereas other depressed patients don't feel guilty, and then really find these people who are really, really similar, whose depression has taken a much more similar path to yours than have other people's depressions.
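The same lookup can be sketched for the health case: instead of matching on a single diagnosis label, match on a profile of specific symptoms and then see what worked for the closest patients. The symptom names, patient records, and treatment labels below are all invented placeholders, and Hamming distance is just one simple way to compare yes/no symptom profiles. This is a toy illustration, not a clinical method.

```python
from collections import Counter

# Hypothetical yes/no symptom profile; 1 = present, 0 = absent.
SYMPTOMS = ("sleeps_excessively", "feels_guilty", "loss_of_appetite")


def hamming(a, b):
    """Number of symptoms on which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def what_worked(profile, records, k=2):
    """Tally the treatments that helped the k patients closest to `profile`."""
    closest = sorted(records, key=lambda rec: hamming(profile, rec[0]))[:k]
    return Counter(treatment for _symptoms, treatment in closest).most_common()


# Made-up records: (symptom profile, treatment that worked for that patient).
records = [
    ((1, 1, 0), "treatment_x"),
    ((1, 1, 1), "treatment_x"),
    ((0, 0, 1), "treatment_y"),
    ((0, 0, 0), "treatment_y"),
]

# A patient who sleeps all the time and feels guilty, with normal appetite.
profile = (1, 1, 0)
print(what_worked(profile, records))
```

Matching on the full profile narrows the comparison to the two patients who share the sleep and guilt pattern, rather than to every patient carrying the same broad diagnosis.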

