The revolutionary idea of building an open-source software search engine.
Question: What is Wikia Search?
Jimmy Wales: Yeah, so at Wikia Search, what we’re really trying to do is build a freely licensed, or open source software search engine, a general search engine for the web. We’re building on efforts that have been going on for several years in the Nutch project. But we’re committing a lot of infrastructure to it and programmer time. And the basic idea here is to say that, you know, search is the one key piece of infrastructure of the internet that is still very much proprietary, closed, secretive, and we want to try to change that. We want to make it open. So we’re publishing all the algorithms, publishing all the software, so people can see it. But additionally, what we’re trying to do is bring in the element of mass participation, user participation. The idea is to take the normal editorial decisions that are made within the search engine company, and push those out into the community as much as possible, so that, you know, the search results reflect whatever this community thinks should be ranked in certain ways. So we just launched a very preliminary alpha version a couple months ago. We’re currently very hard at work at Version 0.2, which we’ll be releasing within the next month or two. We haven’t really fixed a date yet. That’s actually something I need to be working on right now, my release schedule. But yeah, we’re really looking forward to getting that out, because there’s a lot of cool new tools and things.
Question: What would that mean for the non-coding user?
Jimmy Wales: So in terms of the user experience, we expect it to be really quite a bit similar, I mean, in other words you come and you type something you’re looking for, and you hit “Search,” and you get some stuff back. The one major difference at that point is that you’ll be able to edit the results, in essence. So you’ll be able to delete things, add things. And those are going to be public actions, just like when you edit Wikipedia, that’s a public action. And there’ll be a whole community of people monitoring and overseeing all that. That’s the main difference from the end user point of view, but for me, I think the deeper implication here is not about sort of just the user experience, but about the open source nature, the free software nature. The way I look at that for the average user, obviously, you’re not going to go and download all our source code and read it, that’s really not that important, but what is important-- I view this as being very similar to-- in a free society, in an open society we insist that our court systems operate in a public fashion. So you can go down to the courthouse today and go in and watch a trial unfold, and that’s done in public. And that’s a valuable safeguard for human rights, democracy, even though most of us never actually bother going down to the courthouse. But it’s important that there’re people who can, and people who do. And they’ll raise the alarm if something untoward is going on, and that’s really the way it works with open source software. There’s also a lot of practical benefits to open source software which I think we can realize here. You know, right now, hands down the best web browser is Firefox, which is an open source project. And you know, that development model for software has proven to be highly effective. One of the things we’re looking at is that there’re a lot of parts of the search business that are duplicated efforts.
Lots of different smaller competitors are wasting a lot of money duplicating a lot of effort doing some very basic infrastructure things that are really not necessary to do multiple times. I mean, crawling the web, it’s a big job, but what it means is you have to go and fetch, you know, lots and lots and lots of pages from all over the internet, and it’s a bit of a commodity item. I mean, anyone can crawl the web, it just takes, well, a lot of engineers and money and time. And you know, the techniques for doing it are pretty robust and well-known. So what we’re all hoping to do is use the social model of free licensing, which has already allowed lots of different companies to come together. You know, IBM contributes a lot to free software, Red Hat, you know, all these different companies are working in this space, and they’re able to do it because the free licensing model creates a level playing field for that. So we’re hoping to see the same kinds of things start to develop with search data. In other words, having different players share their crawl results with others freely. Well, we’re working on that.
Another part of it is, one of the really interesting things that’s going on right now in search is that all of the research and development, the vast majority of it, is going on inside companies in proprietary research, behind closed doors, very secretive. At Google, Yahoo!, Ask, and Microsoft and so forth. And right now if you are a PhD student in computer science and you’re really interested in search algorithms, it’s kind of hard for you to really get involved and to work on things, because you don’t have access to the kind of resources that you would like to have. And so a big part of what I want to do is say, “Well, look, if we make all this source code open source, any of those people can download it, they can run it on a university cluster. We’re gonna try to make available research machines for people to run things, just to do experimentation.” Just to sort of say, “Let’s try to use the time-honored and tested methods of academic research, where things are done publicly under peer review, and really sort of let’s provide some infrastructure for that.” So that’s kind of the idea there as well.
Recorded on: 4/30/08