How to Search the World Wide Web: An Introduction

Introduction
Indices or catalogs
Spider or 'Bot-Based Search Engines
Web Page Content considerations
Metaindices
Meta Search Engines
Web Portals
How to start your search...
More on these Topics 


Introduction
The sites that let you search for information on the World Wide Web fall into two major types: sites that index or catalog the Web (indices), and sites that search the Web or an exhaustive internal index of the Web (search engines). Other sites that give you access to the same types of services include metaindices and portals.

Indices or catalogs
Indices or catalogs are topically organized collections of URLs. While they may be searchable, they are not "search engines"; they list only sites that have been submitted by users or collected by their staff. (Many search engine sites, in a move to become Web portals, are adding indices, which is beginning to blur the distinction; see more on Web portals below.) In most indices, sites are reviewed and there is a "quality cut"; poor and weak sites are normally not listed.

While all of the catalogs are searchable, their intended use is for browsing. Browsable indices are normally organized hierarchically, starting with very broad topics and working down to specific ones.
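To make the hierarchy concrete, here is a minimal Python sketch of browsing a catalog from broad topics down to specific ones. The catalog contents, the URLs, and the browse function are all made up for illustration; no real index is organized exactly this way.

```python
# Hypothetical slice of a catalog: nested topics, with lists of URLs at the leaves.
CATALOG = {
    "Science": {
        "Biology": {
            "Zoology": ["zoo.example/llamas", "zoo.example/alpacas"],
        },
        "Physics": ["physics.example/"],
    },
    "Government": {
        "Budget": ["budget.example/"],
    },
}

def browse(catalog, path):
    """Walk down the hierarchy one topic at a time and return what is found there."""
    node = catalog
    for topic in path:
        node = node[topic]
    return node

top_level = sorted(browse(CATALOG, []))            # broadest topics
subtopics = sorted(browse(CATALOG, ["Science"]))   # one level down
sites = browse(CATALOG, ["Science", "Biology", "Zoology"])  # listed sites

print(top_level)
print(subtopics)
print(sites)
```

Each step down the path narrows the topic, which is the same motion you make when clicking through a browsable index.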

Examples of indices/catalogs

Using indices/catalogs
These may be a little more work for the user, as they require you to start at or near the top and work your way down. This is a very close cognate to traditional library and book-based research. It is the best type of search if your area is ill-defined or broad, as the browsable pages give you a good opportunity to start with a general overview and narrow your effort down from there. Some indices are maintained by specific subject-matter experts (e.g. the World Wide Web Virtual Library) and may have content with a very high level of relevancy; others may be a little broader and may not contain as much that is relevant to your quest.

Spider or 'Bot-Based Search Engines
This type of site sends out programs to travel the Web, indexing pages; these programs are called 'bots (short for robots) or Web spiders. Spiders/'bots frequently index every word on a Web page (AltaVista does, for example); after indexing, the 'bot follows every link on the page and does the same there. These sites tend to be all-inclusive and can return very large numbers of sites in response to queries. They normally have NO quality cut and may have no "browsing" capability, although many now offer indices as well as search engines. Some offer additional search features such as image or audio file searches.
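The index-then-follow-links loop described above can be sketched in a few lines of Python. This is only an illustration: TOY_WEB is a hypothetical in-memory stand-in for the Web (a real spider would fetch pages over HTTP), and crawl is an invented name, not any engine's actual code.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "Web": URL -> (page text, list of outgoing links).
TOY_WEB = {
    "a.example/home": ("Welcome to the llama farm", ["a.example/wool", "b.example/"]),
    "a.example/wool": ("Llama wool for sale", ["a.example/home"]),
    "b.example/": ("Farm equipment and supplies", []),
}

def crawl(start_url, web):
    """Breadth-first crawl that indexes every word and follows every link."""
    index = defaultdict(set)   # word -> set of URLs containing that word
    seen = {start_url}         # pages already queued, so none is visited twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web.get(url, ("", []))
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example/home", TOY_WEB)
print(sorted(index["llama"]))
print(sorted(index["farm"]))
```

Because every word on every reachable page goes into the index, a query for a common word matches many pages, which is why these engines can return such large result sets.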

Examples:

Using search engines
Search engines are ideal if you are looking for a very specific item or a narrowly defined topic. If you are not exactly sure what you are looking for, they can be very frustrating. Posing good queries takes knowledge, practice, and experience.

Web Page Content considerations
Search engines tend to classify pages based on a fixed set of factors; this strongly affects which pages rise to the top in search results.

First place: the page's title
Frequently overlooked, this is what appears in the blue bar at the top of your browser. The title needs to clearly, quickly, and succinctly summarize what the page is about. If you search for a business and find lots of pages that mention it, but don't find its own pages until way down in your results, odds are the business's name isn't even in the page title (it is probably just "Welcome").

Second place: META tags
Two tags that are invisible to those viewing the Web page strongly affect a page's position in search results: the Keyword and the Description META tags. Keywords are used by many search engines to help them classify the relevancy of a page to a particular search. The description is what appears as the description of the page in most search engines (otherwise, the first 25 words or so of the page appear).

Third place: the first twenty to thirty words of the page
This can really leave out pages that are principally graphical in nature.
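As a rough illustration of the three factors above, the following Python sketch pulls the title, the Keyword and Description META tags, and the leading words out of a page using the standard library's html.parser. The PageSummary class, the summarize function, and the sample page are hypothetical; how any particular engine actually weighs these fields is not shown here.

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    """Collects the title, named META tags, and the visible words in the body."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.words = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.words.extend(data.split())

def summarize(html, lead_words=25):
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "keywords": parser.meta.get("keywords", ""),
        "description": parser.meta.get("description", ""),
        "lead": " ".join(parser.words[:lead_words]),
    }

# Hypothetical sample page with an uninformative title.
SAMPLE = """<html><head><title>Welcome</title>
<meta name="keywords" content="llamas, wool, farm">
<meta name="description" content="A family llama farm.">
</head><body><h1>Prairie Llama Farm</h1><p>We raise llamas for wool.</p></body></html>"""

info = summarize(SAMPLE)
# Fall back to the leading words when no description is supplied.
snippet = info["description"] or info["lead"]
print(info)
print(snippet)
```

Note that the sample's title is just "Welcome", so the business name never reaches the field described above as most important. A page made up only of images would also produce an empty "lead".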

Other Page Ranking methods
Google ranks pages based on the number of pages that link to that page, treating each link as a "vote" for that page. GoTo.com (now defunct) ranked results based on advertising fees paid for the ranking (they even told you how much was paid for the ranking...).
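The "links as votes" idea can be sketched as a plain count of inbound links. This is a deliberate simplification and not Google's actual algorithm, which also weighs each vote by the importance of the page casting it; the link graph and the count_votes function below are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "a"],
}

def count_votes(links):
    """Count the distinct pages linking to each page (self-links are ignored)."""
    votes = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                votes[target] += 1
    return votes

votes = count_votes(LINKS)
# Most-linked pages first; ties are broken alphabetically.
ranking = sorted(votes, key=lambda page: (-votes[page], page))
print(ranking)
```

Here page "c" comes first because three other pages link to it, while "d" comes last because nothing links to it.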

Metaindices
Some pages on the Web serve as a meta index: an index of indices. These are particularly valuable as starting points for Web exploration, i.e. "surfing".

Examples:

Meta Search Engines
Meta search engines are aggregators; they send your query to multiple search engines and indices, then combine the results. The value of the results depends on which engines are searched, how many engines are searched, and the ability of the meta engine to rank results for relevancy.
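A minimal sketch of that aggregation step might look like the following. The three engine functions are hypothetical stand-ins that return fixed lists, and merging by summed reciprocal rank is just one simple choice made for this example, not the method of any particular meta search engine.

```python
from collections import defaultdict

# Hypothetical stand-ins for real engines: each takes a query and
# returns a list of URLs, best match first.
def engine_one(query):
    return ["x.example", "y.example", "z.example"]

def engine_two(query):
    return ["y.example", "x.example"]

def engine_three(query):
    return ["y.example", "w.example"]

def meta_search(query, engines):
    """Send the query to every engine and merge results by summed reciprocal rank."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank   # a higher position earns a larger share
    # Best combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = meta_search("llama wool", [engine_one, engine_two, engine_three])
print(merged)
```

A page that several engines rank highly rises to the top of the merged list, which shows why the choice and number of engines matters so much to the quality of the result.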

Examples:

Web Portals
Portals are intended to be "one stop" entry points to the Web. They were built to function as replacements for online services such as CompuServe and Prodigy. They're typically advertiser-supported, with all services provided at no cost to users, but indications are that some may start charging for some services.

Portals normally include at least some of a standard set of elements; some have them all:

  • Ability to personalize content
  • Web index
  • Search engine (may search only the site's own index, but in many cases relies on strategic agreements with major search engines, e.g. Yahoo and AltaVista)
  • Free email
  • Chat rooms
  • Free Web page space
  • News, weather, and stock reports
  • Games

Blurring the distinction between indices and search engines, many index and search engine sites are becoming portals and now offer both. Existing index or search engine sites striving to be major portal sites include:

Other portal sites are conversions of existing high-traffic sites; examples are:

Some are just new (built as portals from the ground up...)

How to start your search...
Is your topic very specific?

Go to AltaVista, Google, or Fast

Why? These index the largest number of pages, which increases your chances of coming up with an exact match.

Is your topic very broad (e.g. children)?

Start at the metaindices; find an applicable topic and see if there is a specific catalog or index that fits the direction your search is taking.

Is your topic general (e.g. government spending)?

Try starting at a catalog-type site; just as an example, Wired Cybrarian has a good government section.

As you narrow your topic...

  • look for lists of relevant links on content pages
  • look for key words and phrases on useful content pages that might help define a good search query (i.e. build your own "keyword" list)
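One way to build your own "keyword" list from useful pages is simply to count which words recur across them. The Python sketch below does that under some loud assumptions: the stop-word list is tiny and hypothetical, the sample texts are invented, and raw word frequency is only a crude hint at what makes a good query term.

```python
from collections import Counter
import re

# Small hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "is", "on", "by"}

def keyword_candidates(pages, top_n=5):
    """Return the most frequent words across the given page texts, skipping stop words."""
    counts = Counter()
    for text in pages:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]

# Invented snippets standing in for text copied from useful content pages.
USEFUL_PAGES = [
    "Federal spending on education rose in the budget.",
    "The budget office reports federal spending by agency.",
    "Education spending and the federal budget.",
]

candidates = keyword_candidates(USEFUL_PAGES, top_n=3)
print(candidates)
```

The words that survive are candidates to try, alone or in combination, in your next search engine query.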

Copyright 1998, 1999, 2000, 2001, 2002, 2004, 2006 Webmaster Sources LLC, Naperville, Illinois: Used by permission

More on these Topics 
Search engines, portals and other such stuff.
* Powersearching the Web
http://itwebmaster.iit.edu/webmaster/resources/powersearch.html
How to do a power search of the World Wide Web and more resources.
* Search Engine Showdown
http://www.notess.com/search/
Up-to-the-minute comparisons of the search engines.
* 260 Search Engines from refdesk.com
http://www.refdesk.com/newsrch.html
Organized by topic, by pages indexed and all sorts of other ways.


Last Updated by Ray Trygstad on 09/20/04 | Copyright 2004 Illinois Institute of Technology