AUTOMATIC SYSTEM FOR DOCUMENTARY
RESEARCH AND LIBRARY MANAGEMENT
Berar Sanda, University of Cluj-Napoca
1. INTRODUCTION
The problem of documentary research is one of the main problems that influenced
the development of database systems. It can be stated as follows: a person goes
to a documentary research center in order to find all the publications that
satisfy a certain condition (have the same author, refer to a certain problem,
etc.). The list of these publications must be complete, accurate and generated
as fast as possible.
There are two types of documentary research: punctual and systematic. Punctual
documentary research answers the questions of each individual reader.
Systematic documentary research is the periodic compilation of catalogues and
information publications on different themes. This paper addresses the problem
of punctual documentary research.
In order to solve this problem, publications must be coded according to their
main field. The classic codification system is the decimal universal
codification (DUC), known in English as the Universal Decimal Classification.
In this system, each field is regarded as a part of a larger field. The whole
universe is field zero, each theme being a fraction [2]. The code of a
subdomain is obtained from the code of the main domain followed by a specific
code, generally formed of three digits. This codification system is widespread
because it is universally recognized and does not depend on the language of the
document or its origin. Its disadvantage appears when the subdomains of a
certain field proliferate and the assigned codes become insufficient. In order
to eliminate this disadvantage, a new coding system has been defined:
codification through key words. Key words [2] express the content of a
document. The extraction of these words from the document's text is called
indexing and can be done automatically or manually. Indexing appeared as a
result of the proliferation of documents from new scientific fields or from
fields situated at the border of several others, and it eliminates the
disadvantages of DUC. Its major disadvantage is its dependence on the language
in which the document is written. Documentary centers and libraries around the
world usually use both codification systems.
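The way a subdomain code extends the code of its main domain can be illustrated
with a minimal sketch. It assumes, purely for illustration, that a code is a
string of digits and that a subdomain lies under a domain whenever the domain
code is a prefix of the subdomain code. The table entries and the function
names are hypothetical and are not taken from an official DUC table.

```python
# Hypothetical DUC-like table: codes and names are illustrative only.
DUC_TABLE = {
    "5": "natural sciences",
    "51": "mathematics",
    "514": "geometry",
    "6": "applied sciences",
    "61": "medicine",
}


def subdomain_code(parent_code: str, specific_code: str) -> str:
    """Build a subdomain code from the parent code and a specific suffix."""
    return parent_code + specific_code


def is_subdomain(code: str, ancestor: str) -> bool:
    """A code lies under an ancestor when the ancestor is a proper prefix."""
    return code != ancestor and code.startswith(ancestor)


def ancestors(code: str) -> list[str]:
    """List the known enclosing domains of a code, broadest first."""
    return [code[:i] for i in range(1, len(code)) if code[:i] in DUC_TABLE]
```

With this representation, a search for a whole domain reduces to a prefix test
on the stored codes, which is one reason such codes are independent of the
language of the document.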
2. GENERAL PRESENTATION
This paper presents an automatic library management system that covers, besides
documentary research, the other two fields of library activity: updating the
publication holdings and keeping reader records.
This system allows a double codification of the documents, through DUC and
through key words, because DUC is used in most Romanian libraries and key words
are more efficient in documentary research. The main purpose of the package has
been to ensure both the efficient processing of a large volume of information
and a friendly interface for the user, who usually has no computer knowledge.
The codification uses a thesaurus of key words containing two types of
relationship: hierarchy (domain-subdomain, or parent-son) and synonymy.
Synonymy ensures flexible communication with the user, since several terms are
generally used for the same notion. Usually, the librarian or the documentalist
extracts key words from the document and then looks up the corresponding DUC
codes in a compendium. In order to eliminate this work, the thesaurus also
contains the correspondence between key words (domain names) and the associated
DUC codes. This allows the automatic generation of the DUC code of the document
from its key words, so the only operation performed by the librarian is the
indexing. The hierarchy and synonymy relationships generate the phenomena of
silence and noise. Silence is the absence from the response of documents that
satisfy the query condition. Noise is the opposite of silence: the appearance
in the response of documents that do not satisfy the condition. For example, a
query with the key word "sinus" (which in Romanian also denotes the sine
function) returns documents that use the mathematical notion, but also
documents about the anatomical part of the body with the same name. Thus, the
query is ambiguous. This phenomenon can be eliminated through a strict syntax,
but this increases silence, which is often more dangerous than noise. It can be
shown theoretically that, in a system using this type of thesaurus and
processing a large amount of information, noise and silence appear in equal
proportions, approximately 10% each.
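Noise and silence can be made concrete with a small sketch. It assumes that
both are measured as proportions: noise as the share of returned documents that
do not satisfy the condition, and silence as the share of satisfying documents
that are not returned. These ratio definitions, the function name and the
sample document identifiers are assumptions made for illustration only.

```python
def noise_and_silence(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (noise, silence) as fractions between 0 and 1.

    noise   = share of retrieved documents that are not relevant
    silence = share of relevant documents that were not retrieved
    """
    noise = len(retrieved - relevant) / len(retrieved) if retrieved else 0.0
    silence = len(relevant - retrieved) / len(relevant) if relevant else 0.0
    return noise, silence


# Hypothetical response to the key word "sinus" asked by a mathematician.
retrieved = {"trig-tables", "fourier-series", "nasal-anatomy", "ent-surgery"}
relevant = {"trig-tables", "fourier-series", "sine-approximation"}
noise, silence = noise_and_silence(retrieved, relevant)
```

In this invented example two of the four returned documents concern anatomy,
so the noise is one half, while one of the three relevant documents is missing,
so the silence is one third.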
Besides the automatic generation of the DUC code, the librarian is offered
other facilities for managing publications, such as the automatic generation of
inventory numbers for new documents, the removal of documents and the printing
of library files in different formats.
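The automatic generation of the DUC code from key words can be sketched as
follows. This is a minimal illustration, not the FoxPro2 implementation: the
class `Thesaurus`, its three dictionaries, and the sample terms and codes are
all hypothetical, and the sketch assumes that each synonym maps to a single
preferred term and that the hierarchy contains no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Toy thesaurus with synonymy, hierarchy and key word -> DUC code links."""
    synonyms: dict[str, str] = field(default_factory=dict)  # synonym -> preferred term
    parent: dict[str, str] = field(default_factory=dict)    # term -> parent term
    duc: dict[str, str] = field(default_factory=dict)       # preferred term -> DUC code

    def preferred(self, word: str) -> str:
        """Normalise a key word and replace it by its preferred term, if any."""
        word = word.strip().lower()
        return self.synonyms.get(word, word)

    def duc_codes(self, key_words: list[str]) -> list[str]:
        """Return the sorted, de-duplicated DUC codes of the given key words."""
        codes = set()
        for word in key_words:
            term = self.preferred(word)
            if term in self.duc:
                codes.add(self.duc[term])
        return sorted(codes)

    def broader_terms(self, word: str) -> list[str]:
        """Walk up the hierarchy (assumed acyclic) to the top-level domain."""
        term, chain = self.preferred(word), []
        while term in self.parent:
            term = self.parent[term]
            chain.append(term)
        return chain


# Hypothetical entries; the codes are illustrative values only.
thesaurus = Thesaurus(
    synonyms={"data bank": "database"},
    parent={"database": "computer science"},
    duc={"database": "004.65", "computer science": "004"},
)
```

Calling `thesaurus.duc_codes(["Data Bank"])` resolves the synonym first and
then looks up the code, so the librarian would only need to supply the key
words.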
The system assists in the management of borrowers through a friendly interface
for recording loans, for obtaining information about the readers and the books
they request, for automatically calculating late fees, etc.
A third module provides the interface with the readers, answering their queries
(solving the documentary research problem). A query can use several criteria:
theme, author, title, publisher, year of publication, and combinations of
these.
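A query that combines several of these criteria can be sketched as a filter in
which every criterion left empty is ignored. The record layout, the function
`search` and the matching rules (case-insensitive substring match for text
fields, exact match for the year, key words stored in lower case) are
assumptions made for this illustration, and the key words of the sample records
are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    title: str
    author: str
    publisher: str
    year: int
    key_words: frozenset[str]  # assumed to be stored in lower case


def search(catalogue, *, theme=None, author=None, title=None,
           publisher=None, year=None):
    """Return the publications matching every criterion that was filled in."""
    def matches(pub: Publication) -> bool:
        return (
            (theme is None or theme.lower() in pub.key_words)
            and (author is None or author.lower() in pub.author.lower())
            and (title is None or title.lower() in pub.title.lower())
            and (publisher is None or publisher.lower() in pub.publisher.lower())
            and (year is None or year == pub.year)
        )
    return [pub for pub in catalogue if matches(pub)]


catalogue = [
    Publication("FoxPro2 Made Easy", "Edward Jones", "Osborne/McGraw-Hill",
                1991, frozenset({"foxpro", "database"})),
    Publication("FoxPro Programming", "Les Pinter", "Windcrest/McGraw-Hill",
                1992, frozenset({"foxpro", "programming"})),
]
```

For instance, `search(catalogue, theme="foxpro", year=1992)` keeps only the
second record, because both filled-in criteria must hold at once.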
All three modules query databases. It is well known that the query problem is,
in general, NP-complete, so inverted files (indexes on the searched fields) are
used in order to ensure efficiency. The structure of the information stored in
the database was established after a study carried out at the Library of the
Technical University of Cluj-Napoca, so it conforms to the classical files used
in this library.
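The idea of an inverted file can be sketched in a few lines: instead of
scanning every document for a value, each value points directly to the
documents that carry it. The class below is a hypothetical in-memory
illustration and does not reproduce the FoxPro2 index files; the pressmark
strings used with it are likewise invented.

```python
from collections import defaultdict


class InvertedFile:
    """Maps each field value (key word, author, ...) to document pressmarks."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, pressmark: str, values: list[str]) -> None:
        """Register a document under each of its values."""
        for value in values:
            self._postings[value.lower()].add(pressmark)

    def lookup(self, value: str) -> set[str]:
        """Documents carrying the value; .get avoids creating empty entries."""
        return set(self._postings.get(value.lower(), set()))

    def lookup_all(self, values: list[str]) -> set[str]:
        """Documents carrying every one of the given values (AND query)."""
        sets = [self.lookup(v) for v in values]
        return set.intersection(*sets) if sets else set()
```

A lookup then costs one dictionary access per value plus a set intersection,
rather than a pass over the whole collection.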
3. IMPLEMENTATION CONSIDERATIONS
The implementation uses the relational database management system FoxPro2,
which provides facilities for querying the database and for designing the menus
and screens of the user interface.
The information used to distinguish the documents is the pressmark. In this
implementation, it is a code indicating the place of the document in the
library (room, row, position). The information distinguishing the readers is
the number of the library card.
The system has three different modules.
a. The reader interface module
It allows the reader to obtain information about the existing publications. The
reader fills in a screen with the information known about the document and, as
a result, obtains all the documents matching that information. These documents
can be inspected one by one, and the reader can stop upon finding the one
sought. In this way, the reader can obtain the pressmark of the document, which
can be used later for borrowing it, or can consult the library's complete file
of items in order to see the connections between the different domains and
subdomains of the library's holdings.
b. The loan records module
This module includes the management of the readers database, the recording of
loans and returns, and several statistics. It allows both fixed-term and
reading-room loans. Usually, readers are looked up in a library by card number
or by name. An additional facility of this program is the query by other
criteria: town, profession, workplace, or combinations of these. Several
readers are selected this way, and the right one is picked afterwards. After a
reader has been selected, the librarian can obtain information about the books
borrowed, the type of each loan, the expiry date, etc. Similar information can
be obtained about books: the readers borrowing them and the type of the loan.
The system also updates daily the fee due for exceeding the expiry date of the
loan.
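The daily update of the late fee can be sketched as a simple date computation.
The paper does not state the fee rule, so the sketch assumes a flat rate per
day past the expiry date; the function name and the rate parameter are
hypothetical.

```python
from datetime import date


def late_fee(expiry: date, today: date, fee_per_day: float) -> float:
    """Fee owed for a loan: days past the expiry date times a daily rate."""
    days_late = max(0, (today - expiry).days)
    return days_late * fee_per_day
```

Recomputing the fee from the two dates each day, rather than incrementing a
stored amount, keeps the result correct even if an update is skipped.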
c. The module for updating the library holdings
This module handles the databases containing book information, the printing of
the usual library cards and the management of the key word thesaurus. The
thesaurus is created by adding the domains and subdomains covered by the
publications of the library. When a new document is inserted, its DUC code is
computed from its key words, and an inverted file is created in order to allow
documents to be searched by domain. Authors are also stored in an inverted
file. Inventory numbers for new documents are automatically allocated in
increasing order. In order to save memory space, only the interval of inventory
numbers of each document is retained, through its end points. The removal of a
document is also provided, either for a single copy, in case of loss, or for
all the copies, in case of donation or inter-library exchange.
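The allocation of inventory numbers and the storage of an interval through its
end points can be sketched as follows. The class names and the starting number
are hypothetical, and the sketch assumes that all copies of a document receive
consecutive numbers at insertion time.

```python
from dataclasses import dataclass


@dataclass
class InventoryRange:
    first: int  # lowest inventory number among the document's copies
    last: int   # highest inventory number among the document's copies

    def __contains__(self, number: int) -> bool:
        return self.first <= number <= self.last

    def __len__(self) -> int:
        return self.last - self.first + 1


class InventoryAllocator:
    """Hands out consecutive inventory numbers in increasing order."""

    def __init__(self, next_number: int = 1) -> None:
        self.next_number = next_number

    def allocate(self, copies: int) -> InventoryRange:
        """Reserve one number per copy and return only the two end points."""
        if copies < 1:
            raise ValueError("a document needs at least one copy")
        first = self.next_number
        self.next_number += copies
        return InventoryRange(first, self.next_number - 1)
```

Two integers are stored per document regardless of the number of copies.
Removing a single copy from the middle of an interval would need extra
bookkeeping, which this sketch leaves out.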
The search process uses both classical methods for indexed databases and the
"Query" facility of FoxPro. The whole implementation is based on the screens
generated automatically by FoxPro2 [3,4], which trigger most of the actions.
4. CONCLUSIONS
The system presented provides quick and efficient management of library
activities and is flexible and easy to use. Efficiency has been increased,
where possible, by reducing search time and memory space. The main direction
for further development concerns the pressmarking system: some libraries use
both the position in the library and the DUC index of the document in order to
compute the pressmark. Another interesting possibility is the automation of the
indexing process, extracting key words from the title or the text of the
document according to certain criteria [1].
REFERENCES
1. Kris K. Abel, Daniel Berry, A System for Generating Indexes for Ditroff
Documents, Software: Practice and Experience, January 1989
2. The collection of publications for information INID
3. Edward Jones, FoxPro2 Made Easy, Osborne/McGraw-Hill, 1991
4. Les Pinter, FoxPro Programming, Windcrest/McGraw-Hill, 1992