How Google News was born, how it’s grown up, and what it’s still learning.
Question: What is Google News, and how and when did it come about?
Josh Cohen: Yeah. So, actually, it's an interesting story about Google News. It started really as an idea from one of the early, early engineers at Google, an engineer named Krishna Bharat. Krishna was one of the first hundred or so employees at Google, and people have probably heard that Google has 20 percent time, where engineers are really encouraged to do things that aren't necessarily their day jobs. And Krishna had worked on clustering technology in his graduate work, and he started thinking about how to apply that towards news. And it actually sort of came about shortly after September 11th. Krishna went to school in the U.S. but is originally from India, and when the September 11th attacks hit -- I mean, obviously this was a story for New York, it was a national story, it was an international story. And Krishna had his sort of regular sources that he would go to to sort of check out the news of the day.
But this was this huge global story, and he really wanted to understand how the rest of the world was responding to it. I mean, what was the different coverage like? So beyond just the sources that he knew and would go to on a regular basis, what was the rest of the world saying about this? So he really just in his spare time put together this demo where you sort of crawled the Web looking for news information and clustered it by story topic as opposed to by source. And so that way you could get all these different perspectives on a given story, whether it's a different political perspective, a different geographic perspective, in some cases different languages as well, and that was sort of the germ of the idea behind Google News. And so this was sort of -- he had a working demo in late 2001, and then it really launched in beta in the beginning of 2002. And obviously, it's sort of changed significantly since then, and now Google News is available in about 30 different languages, and upwards of 50 different editions or domains.
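The idea of grouping articles by story topic rather than by source can be illustrated with a small sketch. This is not how Google News works; the interview does not describe the actual clustering method. The sketch swaps in a simple stand-in, greedy grouping by word overlap (Jaccard similarity) between headlines. The function names, the 0.3 threshold, and the sample headlines and source names are all assumptions made up for illustration.

```python
import re


def tokens(headline):
    """Lowercase a headline and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", headline.lower()))


def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_topic(articles, threshold=0.3):
    """Greedily group (source, headline) pairs whose headlines overlap.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    clusters = []  # each entry: {"words": set of words, "articles": list}
    for source, headline in articles:
        words = tokens(headline)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(words, cluster["words"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            # No existing story is similar enough, so start a new one.
            clusters.append({"words": set(words), "articles": [(source, headline)]})
        else:
            best["words"] |= words
            best["articles"].append((source, headline))
    return [c["articles"] for c in clusters]


# Hypothetical headlines from hypothetical sources.
articles = [
    ("Source A", "Central bank raises interest rates again"),
    ("Source B", "Interest rates raised by central bank"),
    ("Source C", "Local team wins championship final"),
    ("Source D", "Championship final won by local team"),
]
story_clusters = cluster_by_topic(articles)
for cluster in story_clusters:
    print([source for source, _ in cluster])
```

Each resulting cluster holds one story as covered by several sources, which is the shape of result the demo described above was after: many perspectives on a single story, side by side.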
Question: How has Google News improved or gotten smarter over the years?
Josh Cohen: Yeah, so I think one of the things I just mentioned is actually a pretty good example of that, which is user behavior. If you look at sort of a story cluster, it's probably not a surprise that the first link in that story cluster that has the headline and the actual snippet gets far more clicks than the second and third and fourth and so on. That's not a huge shock. That's pretty consistent with what you see on Web search results as well. But if you think about user behavior, they're supposed to go and click on that first link. When a user comes in and doesn't click on that first link, and instead clicks on that third or fourth link -- maybe it's just the source name -- they weren't supposed to do that. You know, they weren't supposed to click on that link. Over time, as you aggregate that information and normalize it for click position, it can become a really, really strong signal for us to try and determine what a user thinks a trusted source is.
And just, you know, giving an example, if you look at a business story, and you've got a cluster of stories, and maybe you've got The Wall Street Journal or Reuters or Bloomberg, and they're ranked in a third or fourth position. A user may come in and say, I don't care that Google is telling me that this is the third most important link; this is a business story, and that's the source that I want to go to, and they'll click on that. You flip that around to, let's say it's a sports story, and maybe we've got The Wall Street Journal ranked first, and a user may say, I don't care that this is ranked first; I want ESPN or Sports Illustrated. And they'll bypass that first link from the Journal. So you can really begin, edition by edition and section by section, to understand a user's trust of a given source. And that becomes a really good signal for us to use, again, separate from the story variable rankings, but just all things being equal, how do you make some sort of distinctions between the sources? So that's one that we've really made much better use of in the last year or so.
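One way to picture "aggregate that information and normalize it for click position" is the sketch below. It is only an illustration under stated assumptions, not Google's method: the interview gives no formula. The sketch assumes a log of impressions, estimates a baseline click rate for each position, and then compares each source's actual clicks in a section with the clicks its positions alone would predict. The function name, the log format, the source "GenericWire", and all the counts are made up.

```python
from collections import defaultdict


def position_normalized_trust(impressions):
    """Score each (section, source) by observed clicks / expected clicks.

    impressions: iterable of (section, source, position, clicked) tuples.
    A score above 1 means users click the source more often than its
    ranking position alone would predict; below 1 means less often.
    """
    impressions = list(impressions)

    # Step 1: baseline click-through rate for each position, over all sources.
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for _section, _source, position, clicked in impressions:
        shown[position] += 1
        clicks[position] += int(clicked)
    baseline_ctr = {p: clicks[p] / shown[p] for p in shown}

    # Step 2: per section and source, compare real clicks with the baseline.
    observed = defaultdict(int)
    expected = defaultdict(float)
    for section, source, position, clicked in impressions:
        key = (section, source)
        observed[key] += int(clicked)
        expected[key] += baseline_ctr[position]

    return {k: observed[k] / expected[k] for k in expected if expected[k] > 0}


def make_rows(section, source, position, n_clicked, n_shown):
    """Build a made-up log: n_shown impressions, n_clicked of them clicked."""
    return [(section, source, position, i < n_clicked) for i in range(n_shown)]


# Entirely hypothetical click counts, for illustration only.
log = (
    make_rows("business", "GenericWire", 1, 7, 10)
    + make_rows("business", "The Wall Street Journal", 3, 3, 10)
    + make_rows("sports", "The Wall Street Journal", 1, 3, 10)
    + make_rows("sports", "ESPN", 3, 1, 10)
)
scores = position_normalized_trust(log)
for key, score in sorted(scores.items()):
    print(key, round(score, 2))
```

In this toy log the same source scores above 1 in the business section, where it gets more clicks than third position would predict, and below 1 in the sports section, where users bypass it even in first position. That mirrors the section-by-section distinction described above.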