Metadata & search engine


Search Engine

The Internet is the main stage on which Metadata plays its part. There are more than one billion Web pages on the Internet, and more than one million new pages are created every day. This growth also makes it difficult for users to retrieve high-quality information. Although hundreds of Internet search engines exist to help users, everyone has had the experience of facing a large number of irrelevant hits. The situation becomes worse as multi-format, multi-language resources are created and made available.

The World Wide Web Consortium (W3C) has developed dozens of specifications for the Web's infrastructure, including HTML, RDF and XML. Starting from HTML 3.2, Meta tags are recommended for use in the header of an HTML document. They let authors provide information about a document, such as author, description of content, and keywords. The role of the Meta tag is to provide a method for describing the site, so that search engines can index the contents of Meta tags and deliver accurate and reliable search results to users.
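To make the indexing step concrete, here is a minimal sketch in Python of how an indexer might read the Meta tags from a document header. The sample page, the class name MetaTagParser and the function extract_meta are illustrative assumptions, not part of any particular search engine.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of Meta tag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


# Assumed sample page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<meta name="author" content="Jane Doe">
<meta name="description" content="An introduction to metadata.">
<meta name="keywords" content="metadata, search engine, Dublin Core">
</head><body><p>Hello</p></body></html>"""

meta = extract_meta(SAMPLE_PAGE)
print(meta["author"])
print(meta["keywords"])
```

A real indexer would also have to decide what to do with Meta tags that are missing, duplicated or misleading, which is the problem discussed in the following paragraphs.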

Currently a variety of Metadata standards and Metadata schemas exist, which creates a huge barrier for users who want to access different systems simultaneously and share their resources. At the same time, the number of players in this field continues to increase. To solve this problem and improve the interoperability of Metadata, setting guidelines for newcomers has become important and urgent. Many factors are involved, including technology and policy.

Everything has two sides, and the Meta tag is no exception. The Internet is open to everybody, with limited control. Anyone can contribute any information. You may not get what you expect, because not everyone follows the rules.

All the Meta tags used in HTML are optional, and their use depends on authors' knowledge of coding and their willingness to add them. We cannot force authors to use them, or expect everyone to obey W3C's rules. Currently only a few search engines index Meta elements, such as Ultraseek and Microsoft's Index Server. None of them is very popular or known to most users. It is actually not hard for a search engine to index those few Meta tags. The major problem is how to prevent spamming by badly designed Web pages that embed inappropriate words, or repeat the same word dozens of times in the same position, to increase their ranking. There is no good solution yet, and most search engines either refuse to index Meta elements from the Web or treat Meta elements as normal words, which defeats the designed function of Meta tags.
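The repeated-word form of spamming described above can at least be detected with a simple count. The following Python sketch is only an illustration under stated assumptions: the function names and the threshold of three repeats are hypothetical, and it does not address the harder case of inappropriate words that appear only once.

```python
import re
from collections import Counter


def keyword_repetition(keywords_content):
    """Count how often each word occurs in a keywords Meta tag."""
    terms = re.findall(r"[a-z0-9]+", keywords_content.lower())
    return Counter(terms)


def looks_stuffed(keywords_content, max_repeats=3):
    """Flag content in which any single word appears more than max_repeats times.

    max_repeats is an assumed threshold chosen for illustration.
    """
    counts = keyword_repetition(keywords_content)
    return any(n > max_repeats for n in counts.values())


print(looks_stuffed("metadata, search engine, Dublin Core"))
print(looks_stuffed("cheap cheap cheap cheap cheap tickets"))
```

A check like this lets an indexer discard or down-weight suspicious Meta tags instead of ignoring Meta elements altogether, although any fixed threshold will misjudge some honest pages.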

Things get more complicated if we expand the scope of Meta elements to include Metadata schemas such as Dublin Core. The challenges are how a search engine can harvest the metadata automatically, and how to map different Metadata schemas to a central repository and store them there. As Andrew Wood pointed out, "The lack of precision in automatically generated metadata makes the Internet too imprecise for rigorous use, not just by people wishing to find information, but for people publishing information."
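The mapping step can be sketched as a lookup table that translates schema-specific names into the fields of one common record. In the Python sketch below, the table FIELD_MAP, the function to_common_record and the choice of target fields are assumptions made for illustration. The "DC." prefix follows the usual convention for embedding Dublin Core in HTML Meta tags, while equating "author" with creator and "keywords" with subject is a judgment call that a real repository would need to settle as policy.

```python
# Hypothetical crosswalk from harvested Meta tag names to common fields.
FIELD_MAP = {
    "dc.title": "title",
    "dc.creator": "creator",
    "dc.subject": "subject",
    "dc.description": "description",
    "author": "creator",        # assumed equivalence
    "keywords": "subject",      # assumed equivalence
    "description": "description",
}


def to_common_record(harvested):
    """Map harvested name/value pairs onto the common record.

    Names that are not in FIELD_MAP are skipped; values that map to the
    same field are kept together in a list.
    """
    record = {}
    for name, value in harvested.items():
        field = FIELD_MAP.get(name.strip().lower())
        if field is None:
            continue
        record.setdefault(field, []).append(value)
    return record


harvested = {
    "DC.Title": "Metadata & search engine",
    "author": "Jane Doe",
    "keywords": "metadata, Dublin Core",
    "generator": "SomeEditor 1.0",
}
print(to_common_record(harvested))
```

Keeping every value in a list avoids silently discarding information when two schemas describe the same property in different ways.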

If all producers of Web-based resources created metadata according to one rich and widely accepted standard and embedded this metadata in the HTML header of their documents, the "automatic" creation of metadata would become a reality, just as MARC is well accepted by library communities around the world. Because librarians are trained to organize all types of information materials with well-designed systems, traditional cataloging methods are still the mainstream way of organizing Internet resources and ensuring that the information catalogued is accurate and fair. The OCLC Cooperative Online Resource Catalog (CORC) is a good example of organizing, transferring and sharing catalog records of Internet resources in MARC or DC format. But this type of metadata generation is very labor-intensive, and requires tremendous contributions of time and money. The collection of records is not freely available to library users unless the participating libraries import the records into their local library OPAC.

For most resources on the Internet, the information creators are normally better candidates for generating metadata than anyone else. Most of them, however, are not information professionals and lack knowledge of cataloguing rules. A simple, general-purpose Metadata standard with a small element set that is easy to create and harvest is therefore highly recommended for the Internet. Dublin Core may become the best solution for this purpose.
