AUTOMATIC SYSTEM FOR DOCUMENTARY RESEARCH AND LIBRARY MANAGEMENT

 

                        Berar Sanda,

                 University of Cluj-Napoca

 

 

 

     1. INTRODUCTION

     The problem of documentary research is one of the main problems that influenced the development of data base systems. It can be stated as follows: a person goes to a documentary research center in order to find all the publications that satisfy a certain condition (they have the same author, they deal with a certain problem, etc.). The list of these publications must be complete and accurate, and it must be produced as fast as possible.

     There are two types of documentary research: punctual and systematic. Punctual documentary research consists of answering the questions of each individual reader. Systematic documentary research is the periodic compilation of catalogues and information publications on different themes. This paper addresses punctual documentary research.

     To solve this problem, publications must be coded according to their main field. The classic codification system is the decimal universal codification (DUC). In this system, each field is regarded as part of a larger field: the whole universe of knowledge is field zero, and each theme is a fraction of it [2]. The code of a subdomain is obtained from the code of its main domain followed by a specific code, generally formed of three digits. This codification system is widespread because it is universally recognized and does not depend on the language or the origin of the document. Its disadvantage appears when the subdomains of a field proliferate: the assigned codes then become insufficient.

     To eliminate this disadvantage, a new coding system has been defined: codification through key words. Key words [2] express the content of a document. Extracting these words from the document's text is called indexing, and it can be done automatically or manually. Indexing appeared as a result of the proliferation of documents from new scientific fields or from the border between several fields, and it eliminates the disadvantages of DUC. Its major disadvantage is its dependence on the language in which the document is written. Documentary centers and libraries around the world usually use both codification systems.
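     The hierarchical construction of DUC codes can be illustrated with a short Python sketch. It assumes, for simplicity, that codes are plain digit strings (the dots and other auxiliary signs of real DUC notation are ignored) and that a subdomain code is simply its parent's code followed by a specific suffix; the function names and the sample codes are hypothetical. Under these assumptions, deciding whether a document belongs to a field reduces to a prefix test.

```python
def subdomain_code(parent: str, suffix: str) -> str:
    """Build a subdomain code by appending a specific suffix to the parent's code."""
    if not (parent.isdigit() and suffix.isdigit()):
        raise ValueError("codes are assumed to be plain digit strings")
    return parent + suffix


def belongs_to(document_code: str, field_code: str) -> bool:
    """A document coded under a subdomain also belongs to every enclosing field,
    because each enclosing field's code is a prefix of the document's code."""
    return document_code.startswith(field_code)


# Illustrative codes only.
main_field = "51"
sub_field = subdomain_code(main_field, "7")
print(sub_field)                          # 517
print(belongs_to("5172", main_field))     # True
print(belongs_to("5172", "52"))           # False
```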

 

     2. GENERAL PRESENTATION

     This paper presents an automatic library management system that covers, besides documentary research, the other two fields of library activity: updating the publication fund (the library's collection) and keeping reader records.

     The system allows a double codification of documents, through DUC and through key words, because DUC is used in most Romanian libraries, while key words are more efficient in documentary research. The main purpose of the package has been to ensure both the efficient processing of a large volume of information and a friendly interface for users, who usually have no computer knowledge.

     The codification uses a thesaurus of key words containing two types of relationship: hierarchy (domain-subdomain, or parent-son) and synonymy. Synonymy makes communication with the user more flexible, since several terms are generally used for the same notion. Usually, the librarian or documentalist extracts the key words from a document and then looks up the corresponding DUC codes in a compendium. To eliminate this work, the thesaurus also stores the correspondence between key words (domain names) and the associated DUC codes. This allows the DUC code of a document to be generated automatically from its key words, so that the only operation performed by the librarian is indexing.

     The hierarchy and synonymy relationships give rise to the phenomena of silence and noise. Silence is the failure to return documents that satisfy the query condition. Noise is the opposite of silence: the appearance, in the response to a query, of documents that do not satisfy the condition. For example, a query with the key word "sinus" returns documents that use the mathematical notion (the sine), but also documents about the anatomical part of the body with the same name. The query is thus ambiguous. This phenomenon can be eliminated through a strict syntax, but this increases silence, which is often more dangerous than noise. It can be shown theoretically that in a system that uses this type of thesaurus and processes a large amount of information, noise and silence appear in equal proportions, approximately 10% each.
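     The thesaurus described above can be sketched as three mappings: synonym to preferred term, subdomain to domain, and preferred term to DUC code. The Python below is only an illustration under these assumptions; the class, method and field names are hypothetical and are not those of the FoxPro implementation. It shows how the DUC codes of a document follow from its key words, and it measures silence and noise as the proportion of relevant documents missed and the proportion of retrieved documents that are irrelevant, which is one possible way of quantifying the two phenomena.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Key-word thesaurus with synonymy, hierarchy and a key word -> DUC mapping."""
    synonyms: dict[str, str] = field(default_factory=dict)  # variant -> preferred term
    parent: dict[str, str] = field(default_factory=dict)    # subdomain -> domain
    duc: dict[str, str] = field(default_factory=dict)       # preferred term -> DUC code

    def canonical(self, word: str) -> str:
        """Map a key word to its preferred term (synonymy relation)."""
        w = word.lower()
        return self.synonyms.get(w, w)

    def ancestors(self, term: str) -> list[str]:
        """Walk the parent-son relation up to the root domain (assumed acyclic)."""
        chain = []
        while term in self.parent:
            term = self.parent[term]
            chain.append(term)
        return chain

    def duc_codes(self, keywords: list[str]) -> list[str]:
        """Generate a document's DUC codes from its key words, with no manual lookup."""
        codes = []
        for kw in keywords:
            code = self.duc.get(self.canonical(kw))
            if code is not None and code not in codes:
                codes.append(code)
        return codes


def silence_and_noise(retrieved: set, relevant: set) -> tuple[float, float]:
    """Silence: share of relevant documents that were not returned.
    Noise: share of returned documents that are not relevant."""
    silence = len(relevant - retrieved) / len(relevant) if relevant else 0.0
    noise = len(retrieved - relevant) / len(retrieved) if retrieved else 0.0
    return silence, noise


# Hypothetical thesaurus content, for illustration only.
t = Thesaurus(
    synonyms={"computing": "informatics"},
    parent={"informatics": "science"},
    duc={"informatics": "004", "science": "5"},
)
print(t.duc_codes(["Computing", "science"]))            # ['004', '5']
print(silence_and_noise({1, 2, 3, 4}, {2, 3, 4, 5}))    # (0.25, 0.25)
```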

     Besides the automatic generation of the DUC code, the librarian is offered other facilities for publication management, such as the automatic generation of inventory numbers for new documents, the removal of documents and the printing of library cards in different formats.

     The system supports the management of borrowers through a friendly interface for recording loans, retrieving information about readers and the books they request, automatically calculating late fees, etc.

     A third module provides the interface with the readers, answering their queries (that is, solving the documentary research problem). A query can use several criteria: theme, author, title, publisher, year of publication, and combinations of these.

     All three modules perform data base queries. It is well known that the query problem is NP-complete in general, which makes inverted files necessary in order to ensure efficiency. The structure of the information stored in the data base was established after a study of the library of the Technical University of Cluj-Napoca, so it conforms to the classical files used in that library.
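     The "inverse fields" mentioned above correspond to what are usually called inverted files: for each value of a field (an author, a key word), the file lists the documents that contain it, so a query becomes a direct lookup instead of a scan of the whole data base. The Python sketch below builds such a file in memory; it does not model FoxPro's own index files, and the record layout, author names and pressmarks are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(records: list[dict], field_name: str) -> dict[str, set[str]]:
    """Map each value of `field_name` to the pressmarks of the records that have it."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        values = rec.get(field_name, [])
        if isinstance(values, str):  # allow single-valued fields too
            values = [values]
        for value in values:
            index[value.lower()].add(rec["pressmark"])
    return dict(index)


# Hypothetical catalogue records.
books = [
    {"pressmark": "1-04-017", "authors": ["Popescu"], "keywords": ["algorithms", "sorting"]},
    {"pressmark": "2-11-003", "authors": ["Ionescu"], "keywords": ["algorithms", "compilers"]},
]
by_keyword = build_inverted_file(books, "keywords")
# One dictionary lookup replaces a scan of every record:
print(sorted(by_keyword["algorithms"]))  # ['1-04-017', '2-11-003']
```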

 

     3. IMPLEMENTATION CONSIDERATIONS

     The implementation uses the relational data base management system FoxPro2, which provides facilities for querying the data base and for designing the menus and screens of the user interface.

     Documents are identified by their pressmark. In this implementation, the pressmark is a code indicating the location of the document in the library (room, row, position). Readers are identified by their library license number.
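     A pressmark made of room, row and position can be represented as a small record type. In this Python sketch the printed form "room-row-position" with zero padding is an assumption for illustration, not the format used in the library.

```python
from typing import NamedTuple


class Pressmark(NamedTuple):
    """Location of a document in the library: room, row and position."""
    room: int
    row: int
    position: int

    def __str__(self) -> str:
        # Hypothetical printed form "room-row-position", e.g. "1-04-017".
        return f"{self.room}-{self.row:02d}-{self.position:03d}"

    @classmethod
    def parse(cls, text: str) -> "Pressmark":
        """Read a pressmark back from its printed form."""
        room, row, position = map(int, text.split("-"))
        return cls(room, row, position)


p = Pressmark(1, 4, 17)
print(p)                                  # 1-04-017
print(Pressmark.parse("1-04-017") == p)   # True
```

     Because tuple fields are compared in order, sorting pressmarks also sorts documents by their physical location, room first.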

     The system has three different modules.

     a. The reader interface module

     This module allows the reader to obtain information about the existing publications. The reader fills in a screen with whatever he knows about the document and obtains, as a result, all the documents that match this information. These documents can be inspected one by one, and the reader can stop when he has found what he was looking for. In this way he obtains the pressmark of the document, which can later be used to borrow it, or he can consult the complete catalogue of the library in order to see the connections between the domains and subdomains of the library's fund.
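     The query-by-form behaviour of the reader module can be sketched as a filter in which empty fields are ignored and every filled-in field must match. The field names, the case-insensitive substring matching for text and the sample catalogue are assumptions made for this Python illustration; returning a generator mirrors the possibility of inspecting the results one by one and stopping early.

```python
from typing import Iterator


def matches(record: dict, form: dict) -> bool:
    """True if the record agrees with every field the reader filled in.
    Empty fields are ignored; text fields match case-insensitively as substrings."""
    for name, wanted in form.items():
        if wanted in (None, ""):
            continue
        value = record.get(name)
        if isinstance(wanted, str):
            if value is None or wanted.lower() not in str(value).lower():
                return False
        elif value != wanted:
            return False
    return True


def search(records: list[dict], form: dict) -> Iterator[dict]:
    """Yield matching documents one at a time so the reader can stop early."""
    return (rec for rec in records if matches(rec, form))


# Hypothetical catalogue.
catalogue = [
    {"pressmark": "1-04-017", "author": "Popescu", "title": "Data Base Systems",
     "publisher": "Alpha", "year": 1990},
    {"pressmark": "2-11-003", "author": "Ionescu", "title": "Compilers",
     "publisher": "Alpha", "year": 1992},
]
results = search(catalogue, {"publisher": "alpha", "year": 1992, "title": ""})
print(next(results)["pressmark"])  # 2-11-003
```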

     b. The borrowing records module

     This module manages the readers data base, records loans and returns, and produces several statistics. It supports both term loans and reading-room (hall) loans. Readers are usually looked up by license number or by name. An additional facility of this program is searching by other criteria: town, profession, workplace, or combinations of these. Several readers may be selected this way, and the right one is then picked from the list. After selecting a reader, the librarian can obtain information about the books he has borrowed, the type of each loan, the expiry date, etc. Similar information can be obtained about books: the readers who borrowed them and the type of loan. The system also updates daily the fee due for exceeding the expiry date of a loan.
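     The daily update of late fees amounts to recomputing, for each open loan, the number of days past the expiry date. The Python sketch below assumes a flat rate per overdue day; the rate, the loan record layout and the sample data are hypothetical, since the paper does not specify the fee rule.

```python
from datetime import date


def late_fee(due: date, today: date, daily_rate: float) -> float:
    """Fee for a loan past its expiry date: a flat (assumed) rate per overdue day."""
    overdue_days = (today - due).days
    return max(overdue_days, 0) * daily_rate


# Hypothetical open loans, keyed by the reader's license number.
loans = [
    {"license": 1024, "pressmark": "1-04-017", "due": date(1994, 3, 1)},
    {"license": 2048, "pressmark": "2-11-003", "due": date(1994, 3, 20)},
]
today = date(1994, 3, 11)
fees = {loan["license"]: late_fee(loan["due"], today, daily_rate=0.5) for loan in loans}
print(fees)  # {1024: 5.0, 2048: 0.0}
```

     Recomputing the fee from the expiry date, rather than adding one day's fee to a stored total, gives the same result even if the daily update is skipped on some day.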

     c. The module for updating the library fund

     This module handles the data bases containing book information, the printing of standard library cards, and the management of the key words thesaurus. The thesaurus is built by adding the domains and subdomains covered by the library's publications. When a new document is inserted, its DUC code is computed from its key words, and an inverted file is created in order to support searching documents by domain. Authors are also stored in an inverted file. Inventory numbers for new documents are allocated automatically in increasing order. To save memory, only the interval of a document's inventory numbers is stored, through its two endpoints. Removing documents is also supported, both for a single copy (in case of loss) and for all copies (in case of donation or inter-library exchange).
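     Inventory number allocation and interval storage can be sketched as follows. Numbers are handed out in increasing order, and each document keeps only the first and last number of its copies. How the loss of a single copy is recorded is not specified in the paper; in this Python illustration the withdrawn numbers are kept in a separate set, which is a hypothetical choice. All class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryAllocator:
    """Hands out inventory numbers in increasing order."""
    next_number: int = 1

    def allocate(self, copies: int) -> tuple[int, int]:
        """Reserve `copies` consecutive numbers and return only the interval ends."""
        if copies < 1:
            raise ValueError("a document has at least one copy")
        first = self.next_number
        self.next_number += copies
        return first, self.next_number - 1


@dataclass
class Holding:
    interval: tuple[int, int]                          # (first, last) inventory number
    withdrawn: set[int] = field(default_factory=set)   # hypothetical: lost copies

    def copies_available(self) -> int:
        first, last = self.interval
        return (last - first + 1) - len(self.withdrawn)

    def withdraw_copy(self, number: int) -> None:
        """Eliminate one copy (e.g. after a loss); the interval itself is kept."""
        first, last = self.interval
        if not first <= number <= last:
            raise ValueError("inventory number does not belong to this document")
        self.withdrawn.add(number)


alloc = InventoryAllocator(next_number=1001)
h1 = Holding(alloc.allocate(3))   # (1001, 1003)
h2 = Holding(alloc.allocate(2))   # (1004, 1005)
h1.withdraw_copy(1002)
print(h1.copies_available(), h2.interval)  # 2 (1004, 1005)
```

     Removing all copies of a document (donation or inter-library exchange) would simply delete its `Holding` record.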

     The search process uses both classical methods for indexed data bases and the "Query" facility of FoxPro. The whole implementation is built around screens generated automatically by FoxPro2 [3,4], which trigger most of the actions.

 

     4. CONCLUSIONS

     The system presented provides quick and efficient management of library activities and is flexible and easy to use. Efficiency has been increased, where possible, by reducing search time and memory space. The main direction for further development concerns the pressmark system: some libraries compute the pressmark from both the position of the document in the library and its DUC index. Another interesting development is the automation of the indexing process, by extracting key words from the title or the text of the document according to certain criteria [1].
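     As an illustration of the second development direction, a very simple title-based indexer might drop stop words and keep the remaining words that the thesaurus recognizes, directly or through a synonym. The criteria used in this Python sketch (a small hypothetical English stop-word list and exact thesaurus matching) are assumptions and are not the criteria of [1].

```python
import re

# Hypothetical stop-word list; a real system would need a fuller, language-specific one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def keywords_from_title(title: str, thesaurus_terms: set[str],
                        synonyms: dict[str, str]) -> list[str]:
    """Propose key words: title words that are not stop words and that the
    thesaurus knows, either directly or through a synonym."""
    proposed = []
    # [^\W\d_]+ matches runs of letters only, including accented letters.
    for word in re.findall(r"[^\W\d_]+", title.lower()):
        if word in STOP_WORDS:
            continue
        term = synonyms.get(word, word)
        if term in thesaurus_terms and term not in proposed:
            proposed.append(term)
    return proposed


terms = {"databases", "indexing", "libraries"}
syn = {"library": "libraries", "index": "indexing"}
print(keywords_from_title("Indexing in the Library", terms, syn))  # ['indexing', 'libraries']
```

     The librarian would still review the proposed key words, since a word such as "sinus" can map to more than one domain.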

 

 

 

 

             REFERENCES

1. Kris K. Abel, Daniel Berry, A System for Generating Indexes for Ditroff Documents, Software: Practice and Experience, January 1989

2. The collection of publications for information INID

3. Edward Jones, FoxPro2 Made Easy, Osborne/McGraw-Hill, 1991

4. Les Pinter, FoxPro Programming, Windcrest/McGraw-Hill, 1992
