Exit Powerset, Enter Deep Web


shyam - Posted on 23 February 2009

Now that we are done with the Powerset hype (and the Cuil hype), The New York Times has taken it upon itself to find another avenue to generate hype in the search domain. In all honesty, the avenue is not all that new; we had been hearing about it even before the Powerset mania took over the world, but I thought we were over and done with it. At least that was the case till the NYT decided to bring out the old skeleton.

Deep web searching, apparently, is the ability to query pages and databases that sit in the web's "hidden corners." The article does nothing to explain what exactly a "hidden corner" of the web is. So I tried exploring the deep web using Kosmix, which is one of the examples cited in the article. I searched for "flights to Delhi" and, guess what, the web search results come from Google. The rest is a mashup of various crawled structured data sources organized on the same page. In fact, Kosmix does not even say they are a search engine; they claim only to organize the web for you.

The article itself goes on to demonstrate that Kosmix has nothing to do with the "hidden corners" of the internet:

“Most search engines try to help you find a needle in a haystack,” Mr. Rajaraman said, “but what we’re trying to do is help you explore the haystack.” (Anand Rajaraman, co-founder of Kosmix)

The article also brings up the example of DeepPeep, a research project that promises to help you "discover the hidden web" by searching through web forms. If searching through web forms is the next generation, then it is with much sadness that I point towards the dime-a-dozen screen-scraping websites that have been indexing airline, hotel and other data sources for a while now. We can thus safely assume that the newfangled Deep Web is mostly bunk.

The writer also manages to drag the semantic web into the equation, ensuring complete buzzword compliance in the article.

Deep Web technologies hold the promise of achieving similar benefits at a much lower cost, by automating the process of analyzing database structures and cross-referencing the results.

No kidding!

Edit: Thanks to Mark in the comments below for pointing out the error in the headline.
