These software robots used by the search engines are called crawler programs. They don't wait for Web site owners to submit their sites for listing. Instead, the crawlers automatically browse the Web, retrieving and analyzing the full text of every Web page they can find, whether it belongs to a Web site with just a few pages or to one with thousands of pages. Typically, the crawler indexes the words in the order in which they appear on the page, so that the search engine can later find phrases or whole sentences as well as single words.
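To make the idea concrete, here is a minimal sketch in Python of a crawler that follows links and builds a word index that remembers word order. It is only an illustration, not how any particular search engine works: the tiny in-memory "Web" (`PAGES`), its URLs, and the function names `crawl`, `build_index`, and `phrase_search` are all invented for this example. A real crawler would fetch pages over the network and extract links from the HTML.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would download each page and parse the links out of it.
PAGES = {
    "a.example/": ("Welcome to the search engine guide",
                   ["a.example/robots", "b.example/"]),
    "a.example/robots": ("A search robot reads the full text of every page",
                         ["a.example/"]),
    "b.example/": ("Every page has words and the robot indexes every word", []),
}

def crawl(start_url, pages):
    """Follow links breadth-first from start_url, visiting each page once."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        yield url, text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(crawled):
    """Map each word to {url: [positions]} so that word order is preserved."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in crawled:
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][url].append(position)
    return index

def phrase_search(index, phrase):
    """Return the sorted URLs in which the words of phrase occur consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words or any(w not in index for w in words):
        return []
    hits = []
    for url, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit exactly one position further on.
            if all(start + i in index[w].get(url, []) for i, w in enumerate(words)):
                hits.append(url)
                break
    return sorted(hits)

index = build_index(crawl("a.example/", PAGES))
print(phrase_search(index, "every page"))
print(phrase_search(index, "page every"))
```

Because the index stores the position of each word, the phrase "every page" matches only pages where the two words appear next to each other in that order, while "page every" matches nothing in this sample. Note also that no page had to be submitted: the crawler found the other two pages simply by following links from the first one.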
These search engine crawlers are essentially an attempt to include every word from every page of the entire WWW in the search engines' databases. Given the enormous number of Web pages and their daily growth, this is an impossible task.
Indeed, research by the NEC Research Institute in 1999 showed that the eleven top search robots together covered only about 42% of the Web, or 335 million pages. Thus, more than half the Web was not explored by search robots.
The Northern Light robot achieved the best results: it covered 16% of all Web pages. Snap and Alta Vista came second. It seems fair to say that even the best search robot manages to cover only a small section of the Web: the tip of the iceberg! Nevertheless, search robots remain the only instruments that allow us to retrieve 'direct' information from the Web by means of search terms.
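The percentages above can be turned into rough page counts with a little arithmetic. The sketch below uses only the three figures quoted in the text. The derived totals are estimates implied by those figures, not numbers reported here, and the variable names are chosen just for this illustration.

```python
# Figures quoted in the text (NEC Research Institute, 1999).
combined_coverage = 0.42        # share of the Web covered by the eleven robots together
combined_pages_millions = 335   # number of pages that share represents, in millions
best_single_coverage = 0.16     # share of all Web pages covered by Northern Light

# Derived estimates (assumptions implied by the figures, not stated in the text).
total_pages_millions = combined_pages_millions / combined_coverage
best_single_pages_millions = best_single_coverage * total_pages_millions
uncovered_pages_millions = total_pages_millions - combined_pages_millions

print(f"Implied size of the Web: about {total_pages_millions:.0f} million pages")
print(f"Northern Light alone: about {best_single_pages_millions:.0f} million pages")
print(f"Not covered by any robot: about {uncovered_pages_millions:.0f} million pages")
```

In other words, if 335 million pages are 42% of the Web, the whole Web at that time would have held roughly 800 million pages, of which the best single robot reached about 128 million.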
This file was prepared and is presented as an aid to help students understand the Web. Send questions or comments to Royce Shook.