Alexa WebSearch launches

The Alexa WebSearch Platform announcement (everybody keeps pointing here, but I just happened to read about it in the Wall Street Journal late at night) is one of those deals that might cause the next revolution in the field of Web search, or might turn out to be not a big deal at all. Om Malik apparently thinks it’s a big deal, and Phil Wainewright doesn’t think it will be.

Historically all components of the search engine

  • index
  • ranking algorithms
  • interface to present results

have been controlled by the search engines themselves: the index was used for bragging rights, the ranking algorithms defined the search engine's efficiency and usefulness for the user, and the interface was a way to serve nice ads to pay the bills.

All of which made it really hard to build a really good search engine. First you had to come up with a way of efficiently spidering the millions of Web sites out there, then it was time for a trip to the store to get more hard drives, then it was a matter of deciding what goes and what doesn’t go into the index, then the database server for this huge chunk of data had to be built, then the power bill arrived… That’s why there’s a bunch of startups building blogging tools and streaming wiki via API mashups, but not a whole lot building yet another search engine. Even if something decent is built, it is hard to scale.

Alexa, it seems, wants to commoditize the index and build an open corpus out of crawled Web sites. Computer scientists enjoy having large collections of text available to them, since they encourage a whole bunch of interesting projects, like counting how many times a given phrase is used in everyday English. When I was in grad school (old fart alert!) all we had access to was the Reuters Corpus and similar collections, but nothing to the extent of the 100 TB that Alexa is offering today. One can simply SSH into a server and start compiling code with gcc. For academia the WebSearch platform will be a pretty good playground to experiment with.
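
To make the phrase-counting idea concrete, here is a minimal Python sketch. It has nothing to do with Alexa's actual API: the function names and the tiny in-memory sample_corpus are made up for illustration, and a real corpus of this size would be streamed from disk rather than held in a list.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_phrase(documents, phrase):
    """Count occurrences of `phrase` as a run of consecutive tokens."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    total = 0
    for doc in documents:
        tokens = tokenize(doc)
        # Slide a window of n tokens over the document.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                total += 1
    return total


# Hypothetical stand-in for a crawled corpus.
sample_corpus = [
    "It is not a big deal. Really, not a big deal at all.",
    "Some think it is a Big Deal; others do not.",
]

print(count_phrase(sample_corpus, "a big deal"))
```

Matching on tokens rather than raw substrings means case and punctuation are ignored, which is usually what you want when asking how often a phrase shows up in ordinary text.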

To attract commercial developers, Alexa will have to put in some more effort. Current problems with building a startup on top of the WebSearch platform include:

  • relatively small index, compared to what Google, Yahoo! and others have to offer. That is, if you’re building a search pure-play on top of Alexa, you have already lost the competition
  • lack of information about frequency of updates – can I build a blog search, for example, that would refresh certain sites in the index several times a day?
  • once the project becomes popular, it’s not clear how well it can scale on an Alexa-managed back end. They will probably add more boxes if paying customers ask, but where exactly you will be on their priority list remains to be seen

Nevertheless, the API is pretty interesting and deserves some time. However, the only early commercial applications that I can see arriving in a short time frame are Alexa-based content generators for spam blogs and keyword-stuffed Web sites displaying contextual ads from a third-party network.

Posted Tuesday, December 13th, 2005 under Technology.
