Search Engine
The Internet is the major stage on which Metadata
plays. There are more than one billion Web pages on
the Internet, and every day more than one million new pages
are created. These impressive numbers also make it difficult
for users to retrieve high-quality information. Although
hundreds of Internet search engines are available
to help users, everyone has had the experience of facing
a large number of irrelevant hits. The situation becomes
worse as multi-format, multi-language resources are
created and made available there.
The World Wide Web Consortium
(W3C) has developed dozens of specifications for the
Web's infrastructure, including HTML, RDF and XML. Starting
with HTML 3.2, Meta tags are recommended for use in
the header of an HTML document; they let authors provide
information about a document such as its author, a
description of its content, and keywords. The role of the
Meta tag is to provide a method for describing the site, so
that search engines can index the contents of the Meta tags
and deliver accurate and reliable search results to users.
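As a minimal sketch of how an indexer might read these tags,
the following Python example collects the name/content pairs
from the header of a page using only the standard library.
The sample page, the author name in it, and the names
MetaTagParser and extract_meta are assumptions made for
illustration; they do not describe any particular search engine.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags such as <meta http-equiv=...> that carry no name.
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict mapping Meta tag names to their content values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """<html>
<head>
<title>Metadata and Search Engines</title>
<meta name="author" content="Jane Example">
<meta name="description" content="How search engines use Meta tags">
<meta name="keywords" content="metadata, search engine, Dublin Core">
</head>
<body><p>Page text.</p></body>
</html>"""

meta = extract_meta(SAMPLE_PAGE)
```

For the sample page, meta holds three entries (author,
description and keywords), which an indexer could store
alongside the page text.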
Currently a variety of Metadata standards
and Metadata schemas exist, which creates a huge barrier
for users who want to access different systems
simultaneously and share their resources. At the same time,
the number of players in this field continues to increase.
To solve this problem and improve the interoperability of
Metadata, setting guidelines for newcomers becomes very
important and urgent. Many factors are involved,
including technology and policy.
Everything is a double-edged sword, and the Meta tag
is no exception. The Internet is open to everybody,
with limited control. Anyone can contribute any information.
You may not get what you expect because not everyone follows
the rules.
All the Meta tags used in HTML are optional;
their use depends on authors' knowledge of coding and their
willingness to add them. We cannot force authors, or expect
everyone, to obey the W3C's rules. Currently only a few
search engines index Meta elements, such as Ultraseek and
Microsoft's Index Server. None of them is very popular
or well known to most users. Actually, it is not hard for a
search engine to index those few Meta tags. The major problem
is how to prevent spamming by badly designed
Web pages that embed inappropriate words
or repeat the same word dozens of times in the same position
to increase their ranking. There is no good solution yet,
and most search engines either refuse to index Meta elements
from the Web or treat Meta elements as ordinary words, which
defeats the intended function of Meta tags.
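The repeated-word form of spamming described above can be
illustrated with a simple counting heuristic. This is only a
sketch under stated assumptions: the function names and the
max_repeats threshold of 3 are hypothetical, and no search
engine is claimed to work this way.

```python
from collections import Counter


def keyword_repetition(keywords_content):
    """Count how often each word appears in a keywords Meta value."""
    # Treat commas as separators and compare words case-insensitively.
    # Multi-word keywords are split into single words for simplicity.
    words = keywords_content.replace(",", " ").lower().split()
    return Counter(words)


def looks_stuffed(keywords_content, max_repeats=3):
    """Return True if any single word is repeated more than max_repeats times.

    max_repeats is an arbitrary, illustrative threshold.
    """
    counts = keyword_repetition(keywords_content)
    return any(n > max_repeats for n in counts.values())
```

A page whose keywords value repeats one word dozens of times
would be flagged, while an ordinary list of distinct terms
would pass. A heuristic like this does nothing about the other
problem mentioned above, words that are simply inappropriate
for the page.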
Things get more complicated if we
expand the scope of Meta elements to include Metadata
schemas such as Dublin Core. The challenges are how to
have search engines harvest the metadata automatically, and
how to map and store different Metadata schemas in a central
repository.
As Andrew Wood pointed out,
"The lack of precision in automatically generated
metadata makes the Internet too imprecise for rigorous
use, not just by people wishing to find information, but
for people publishing information."
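The mapping challenge mentioned above can be sketched as a
small crosswalk from harvested Meta names to Dublin Core
elements. The HTML_TO_DC table, the to_dublin_core function
and the sample values are illustrative assumptions, not an
official mapping; the only conventions relied on are the
Dublin Core element names themselves and the common practice
of embedding them in HTML with a "DC." prefix.

```python
# Hypothetical crosswalk from common HTML Meta names to
# Dublin Core element names (illustrative, not normative).
HTML_TO_DC = {
    "author": "creator",
    "keywords": "subject",
    "description": "description",
}


def to_dublin_core(meta):
    """Map harvested Meta name/content pairs onto Dublin Core elements.

    Returns (record, unmapped): record maps each Dublin Core element
    to a list of values; unmapped keeps the pairs with no known mapping.
    """
    record = {}
    unmapped = {}
    for name, value in meta.items():
        key = name.lower()
        if key.startswith("dc."):
            # Already Dublin Core, e.g. "DC.title" -> "title".
            element = key[3:]
        else:
            element = HTML_TO_DC.get(key)
        if element is None:
            unmapped[name] = value
        else:
            record.setdefault(element, []).append(value)
    return record, unmapped


# Sample harvested values, invented for the example.
record, unmapped = to_dublin_core({
    "author": "Jane Example",
    "DC.title": "A Page",
    "generator": "SomeTool",
})
```

Keeping the unmapped pairs separate, rather than discarding
them, leaves room for a central repository to add further
schemas later without harvesting the pages again.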
If all producers of Web-based resources
created metadata according to one rich and widely accepted
standard and embedded this metadata in the HTML header
of their documents, the "automatic" creation
of metadata would become a reality, just as MARC is well
accepted by library communities around the world. Because
librarians are trained to organize all types of information
materials with well-designed systems, traditional cataloging
methods are still the mainstream way of organizing Internet
resources to ensure the accuracy and fairness of the
information catalogued. The OCLC Cooperative
Online Resource Catalog (CORC) is a good example
of organizing, transferring and sharing catalog records
of Internet resources in MARC or DC format. But this
type of metadata generation is very labor-intensive and
requires tremendous contributions of time and money. The
collection of records is not freely available to library
users unless the participating libraries import the records
into their local library OPAC.
For most resources on the Internet,
the information creators are normally better candidates
for generating metadata than anyone else. Most of them,
however, are not information professionals and lack knowledge
of cataloguing rules. A simple, general-purpose Metadata
standard with a small element set that is easy to create
and harvest is therefore highly recommended for the Internet.
Dublin Core may become the best solution for this purpose.