Bright Planet, Deep Web

By: Sam Vaknin, Ph.D.

www.allwatchers.com and www.allreaders.com are web sites in the sense that a file is downloaded to the user's browser when he or she surfs to these addresses. But that is where the similarity to ordinary web sites ends. These web pages are front-ends, gates to underlying databases. The databases contain records regarding the plots, themes, characters and other features of, respectively, movies and books. Every user query generates a unique web page whose contents are determined by the query parameters. The number of distinct pages that can be generated in this way is mind-boggling. Search engines operate on the same principle - vary the search parameters slightly and totally new pages are generated. It is a dynamic, user-responsive and chimerical sort of web.
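
The principle of a page generated from query parameters can be illustrated with a minimal sketch. It does not describe how the sites named above are actually built: the records, the `render_page` function and the `/search` URL are hypothetical stand-ins chosen for illustration.

```python
from urllib.parse import urlencode

# Hypothetical records standing in for a movie database (not real site data).
RECORDS = [
    {"title": "Movie A", "theme": "revenge", "setting": "city"},
    {"title": "Movie B", "theme": "revenge", "setting": "desert"},
    {"title": "Movie C", "theme": "friendship", "setting": "city"},
]


def render_page(**params):
    """Build a page whose content depends entirely on the query parameters."""
    # Keep only the records that match every parameter in the query.
    matches = [r for r in RECORDS
               if all(r.get(key) == value for key, value in params.items())]
    # The URL is assembled from the same parameters (hypothetical path).
    url = "/search?" + urlencode(sorted(params.items()))
    lines = [f"Results for {url}"] + [f"- {r['title']}" for r in matches]
    return "\n".join(lines)


# Two slightly different queries yield two different pages.
page_one = render_page(theme="revenge")
page_two = render_page(theme="revenge", setting="city")
print(page_one)
print(page_two)
```

No page exists until a query arrives, so the number of possible pages grows with every combination of parameters the database supports.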

These are good examples of what www.brightplanet.com calls the "Deep Web" (previously inaccurately described as the "Unknown or Invisible Internet"). BrightPlanet believes that the Deep Web is 500 times the size of the "Surface Internet" (a portion of which is spidered by traditional search engines). This translates to c. 7,500 terabytes of data (versus 19 terabytes in the whole known web, excluding the databases of the search engines themselves) - or 550 billion documents organized in 100,000 deep web sites. By comparison, Google, the most comprehensive search engine to date, stores 1.4 billion documents in its immense caches at www.google.com.
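
The figures quoted above can be put side by side with simple arithmetic. The sketch below uses only the numbers given in the paragraph; the variable names are illustrative.

```python
# All inputs are the figures quoted in the text above.
deep_web_tb = 7_500                 # terabytes in the Deep Web
surface_web_tb = 19                 # terabytes in the known surface web
deep_documents = 550_000_000_000    # documents in the Deep Web
deep_sites = 100_000                # deep web sites
google_documents = 1_400_000_000    # documents stored by Google

size_ratio = deep_web_tb / surface_web_tb            # deep vs. surface, by bytes
docs_per_site = deep_documents / deep_sites          # average documents per deep site
docs_vs_google = deep_documents / google_documents   # deep documents per Google document

print(f"Size ratio by bytes: {size_ratio:.0f}")
print(f"Documents per deep web site: {docs_per_site:,.0f}")
print(f"Deep Web documents per Google document: {docs_vs_google:.0f}")
```

On these inputs the byte ratio comes to roughly 395 and the average deep web site holds 5.5 million documents. The byte ratio is somewhat below the quoted factor of 500, which may rest on a different measure.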

The natural inclination to dismiss these pages of data as mere re-arrangements of the same information is wrong. Actually, this underground ocean of covert intelligence is often more valuable than the information freely available or easily accessible on the surface. Hence the ability of c. 5% of these databases to charge their users subscription and membership fees. The average deep web site receives 50% more traffic than a typical surface site and attracts far more links from other sites. Yet it is invisible to classic search engines and little known to the surfing public.

It was only a question of time before someone came up with a search technology to tap these depths (www.completeplanet.com).

LexiBot, in the words of its inventors, is...

"...the first and only search technology capable of identifying, retrieving, qualifying, classifying and organizing "deep" and "surface" content from the World Wide Web. The LexiBot allows searchers to dive deep and explore hidden data from multiple sources simultaneously using directed queries. Businesses, researchers and consumers now have access to the most valuable and hard-to-find information on the Web and can retrieve it with pinpoint accuracy."

It places dozens of queries in dozens of simultaneous threads and spiders the results (rather as a "first generation" search engine would do). This could prove very useful with massive databases such as the human genome, weather patterns, simulations of nuclear explosions, thematic, multi-featured databases, intelligent agents (e.g., shopping bots) and third generation search engines. It could also have implications for the wireless internet (for instance, in analysing and generating location-specific advertising) and for e-commerce (which amounts to the dynamic serving of web documents).
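
The idea of placing many queries in parallel threads and then collecting the results can be sketched as follows. This is not LexiBot's implementation, which is not described here. The in-memory `SOURCES`, the `query_source` stand-in and the `directed_search` function are all hypothetical; a real tool would submit each query to a database's search form over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "deep web" sources (assumption, for illustration only).
SOURCES = {
    "genome_db": ["gene map", "protein folding data"],
    "weather_db": ["storm tracks", "rainfall data"],
    "books_db": ["plot summaries", "character data"],
}


def query_source(name, term):
    """Stand-in for submitting one directed query to one database."""
    return [(name, doc) for doc in SOURCES[name] if term in doc]


def directed_search(term, max_threads=8):
    """Send the same query to every source at once and merge the results."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # One task per source; the tasks run in separate threads.
        futures = [pool.submit(query_source, name, term) for name in SOURCES]
        results = []
        for future in futures:
            results.extend(future.result())
    # Sort so the merged output does not depend on thread timing.
    return sorted(results)


hits = directed_search("data")
print(hits)
```

Threads suit this kind of work because each query spends most of its time waiting for a remote database to answer, so many queries can be in flight at once.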

This transition from the static to the dynamic, from the given to the generated, from the one-dimensionally linked to the multi-dimensionally hyperlinked, from deterministic content to contingent, heuristically-created and uncertain content, is the real revolution and the future of the web. Search engines have lost their efficacy as gateways. Portals have taken over, but most people now use internal links (within the same web site) to get from one place to another. This is where the deep web comes in. Databases are about internal links. Hitherto they existed in splendid isolation, universes closed to all but the most persistent and knowledgeable. This may be about to change. The flood of relevant, high-quality information this will unleash will dwarf anything that preceded it.
