CHAPTER 5

Pre-Coordinate Indexing Systems

5.1          Introduction

          Nowadays most documents deal with complex and compound subjects, each comprising a number of components or concepts. The coordination of these component terms is done either at the input stage or at the output stage. An index in which the components (index terms) are coordinated at the input stage is known as a pre-coordinate index. Coordination at the input stage means that the indexer coordinates the index terms at the time the index is prepared. In pre-coordinate indexing, a number of selected terms or keywords are coordinated by the indexer, and the cards are prepared for display to the users.

          Examples: Ranganathan’s Chain Indexing, G. Bhattacharya’s POPSI, Derek Austin’s PRECIS, COMPASS, etc.

          Pre-coordinate indexing systems are conventional systems found mostly in printed indexes. In this type of system, a document is represented in the index by a heading or headings comprising a chain or string of terms. These terms, taken together, are expected to define the subject content of the document. The leading term determines the position of the entry in the catalogue or index, while the other (qualifying) terms are subordinated to it. Since the coordination of terms in the index description is decided before any particular request is made, the index is known as a pre-coordinate index. For example, the indexes to abstracting and indexing journals, national bibliographies and subject indexes to library catalogues apply the principles of pre-coordinate indexing in varying measure. Such indexes are compiled both manually and with the help of a computer.

          Thus, a pre-coordinate index is a collection of index entries in which concepts from documents are coordinated according to a plan, in a linear sequence, at the time the index headings are prepared. These concepts are represented either by symbols (when a scheme of classification is used) or by words of the indexing language in use. The next step is to synthesize the components, that is, to put them in the order recommended by the rules of the language. The concepts are thus pre-coordinated, and the index file consists of a collection of such pre-coordinated concepts representing the documents available in the library’s collection. Pre-coordinate indexes arranged alphabetically are known as alphabetical subject indexes or alphabetical subject catalogues. Those arranged according to a scheme of classification are known as classified indexes or classified catalogues.

5.2   Chain Indexing

          Chain Indexing  or chain procedure is a mechanical method to derive subject index entries or subject headings from the Class Number of the  document. It was developed by Dr. S.R. Ranganathan. He first mentioned this in his book “Theory of Library Catalogue” in 1938.

        In Chain Procedure the indexer or cataloguer is supposed to start from where the classifier has left off, so that no work is duplicated. He or she derives subject headings or class index entries by interpreting the class number of the document digit by digit in the reverse direction, to provide an alphabetical approach to the subject of the document.

          Ranganathan designed this new method of deriving verbal subject headings in 1934 to provide a subject approach to documents through the alphabetical part of a classified catalogue. The method was distinctly different from enumerated subject heading systems like LCSH or SLSH. He discerned that classification and subject indexing were two sides of the same coin. Classifying a document is the translation of its specific subject into an artificial language of ordinal numbers, which results in a class number linking together all the isolate ideas in the form of a chain. This chain of class numbers is retranslated into its verbal equivalent to formulate a subject heading that represents the subject content of the document. The class number itself is the result of analysing the subject of a document into its facet ideas, which are linked together by a set of indicator digits, particularly when a classification system like Colon Classification is used for the purpose. As this chain is used for deriving subject entries on the basis of a set of rules and procedures, the new system was called ‘Chain Procedure’. This approach inspired many other models of subject indexing developed afterwards, based upon classificatory principles and postulates.

          Chain Indexing was originally intended for use with Colon Classification. However, it may be applied to any scheme of classification whose notation follows hierarchical pattern.

5.2.1  Steps in Chain Indexing

According to Bhattacharya, there are eleven steps involved in Chain Procedure:

1.                Determination of the specific subject of the document.

2.                Expressive name of the subject

3.                Kernel terms

4.                Analysed name of subject

5.                Transformed  name of subject

6.                Standard terms

7.                Determination of links and construction of chain.

8.                Determination of different kinds of links

9.                Derivation of subject headings

10.           Preparation of cross reference entries

11.           Arrangement.

1.    Determination of specific subject of the document

       It is done with the help of the title of the document, its table of contents and by a careful perusal of the text. By analysing the subject contents of a document one arrives at its specific subject.

2. Expressive name of the subject

          Naming the specific subject of the document expressively in the natural language.

3.  Kernel terms

          Representation of the name of the specific subject in Kernel terms (fundamental components). It is done by removing all the auxiliary words from the title.

4. Analyzed name of subject

          Determination of the category of each fundamental component according to a set of postulates and principles formulated for this purpose.

5. Transformed name of subject

          Transforming of the analysed name of subject by rearranging, if necessary, the fundamental components, according to a few additional postulates and principles formulated for the purpose of governing the syntax.

6.     Standard terms

          Standardization of each term, in the transformed name of the subject, in accordance with the standard terms used in the preferred scheme of classification.

7.    Determination of links and construction of chain

          Representation of the class number in the form of a chain in which each link consists of two parts: the class number and its translation in natural language.

          The class number and its translation are joined by an “=” sign, and successive links are joined by downward arrows.

8.    Determination of the different  kinds of links

          Determination of different kinds of links such as Sought Link (SL), False Link (FL), Unsought Link (USL) and Missing Link (ML).

FL:    A link is a false link if it ends with a connecting symbol, a relation device, etc.

USL:  A link by which a user is not likely to approach a document.

ML:   A link in a chain-with-gap, corresponding to the missing isolate in the chain.

SL:    A link by which a user is likely to approach a document.

9.    Derivation of subject heading

          Derivation of the subject heading from each of the sought links in the chain in a reverse rendering process.

10.          Preparation of cross reference entries

          In this step, subject reference entries are prepared for the specific subject entries.

11.          Arrangement

          In this last step all entries are merged and arranged in a single alphabetical sequence.

Example: The document entitled ‘Macbeth’ by William Shakespeare, having class number O111,2J64,M will generate the following chain.

          O                          = Literature (SL)

          O1                        = Indo European literature (USL)

          O11                      = Teutonic literature (USL)

          O111                    = English literature (SL)

          O111,                  = (FL)

          O111,2                = English drama (SL)

          O111,2J64                   = Shakespeare (SL)

          O111,2J64,         = (FL)

          O111,2J64,M      = Macbeth (SL)

Corresponding to these five sought links, the following subject headings or class index entries will be generated from the above chain:

DRAMA, ENGLISH                                                    O111,2

ENGLISH, LITERATURE                                           O111

LITERATURE                                                             O

MACBETH, SHAKESPEARE (William) (1564)          O111,2J64,M

SHAKESPEARE (William) (1564)                              O111,2J64
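
The derivation above can be sketched in Python. This is only a minimal illustration under stated assumptions, not Ranganathan's full rule set: the names Link and derive_entries are hypothetical, the kind of each link (SL, USL, FL) is supplied by hand rather than decided by the program, and every heading is rendered with all of its upper sought links, whereas the printed entries above keep only as many qualifying terms as are needed.

```python
from typing import NamedTuple


class Link(NamedTuple):
    class_number: str
    term: str   # isolate term contributed by this link ("" for a false link)
    kind: str   # "SL", "USL", "FL" or "ML", assigned by the indexer


# The chain for class number O111,2J64,M, as analysed in the example above.
CHAIN = [
    Link("O", "Literature", "SL"),
    Link("O1", "Indo-European", "USL"),
    Link("O11", "Teutonic", "USL"),
    Link("O111", "English", "SL"),
    Link("O111,", "", "FL"),
    Link("O111,2", "Drama", "SL"),
    Link("O111,2J64", "Shakespeare", "SL"),
    Link("O111,2J64,", "", "FL"),
    Link("O111,2J64,M", "Macbeth", "SL"),
]


def derive_entries(chain):
    """Return (heading, class number) pairs, one per sought link."""
    sought = [link for link in chain if link.kind == "SL"]
    entries = []
    for i, link in enumerate(sought):
        # Reverse rendering: the link's own term leads, followed by the
        # upper sought links from the nearest to the most general.
        terms = [link.term] + [upper.term for upper in reversed(sought[:i])]
        entries.append((", ".join(terms).upper(), link.class_number))
    # Arrangement: one alphabetical sequence.
    return sorted(entries)


for heading, class_number in derive_entries(CHAIN):
    print(f"{heading:50} {class_number}")
```

False and unsought links are simply skipped, so only five entries result, one for each sought link.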

5.2.2  Merits of Chain Indexing

1.    Chain indexing can be applied with ease to any classification scheme whose notational symbols indicate the subordination of each step of division, e.g. CC, DDC, etc.

2.    Chain indexing saves the time of the indexer, as he or she makes use of the class number provided by the classifier, thus avoiding duplication of work in analysing the document and formulating the class number.

3.    Chain indexing provides alternative approaches through reverse rendering to its classified file.

4.    As chain procedure is based on the structure of the classification scheme and on the terminology found in the schedules, its operation is speedy and semi-mechanical.

5.    Chain procedure is economical, as it drops each term after it has been indexed, thus, avoiding the permutation of component terms.

6.    In chain indexing, only one index heading with the complete subject formulation is prepared for a specific document. The other entries, prepared by successively dropping terms, serve successively larger numbers of specific subjects. This provides the facility for generic as well as specific searches.

7.    Chain procedure is amenable to computerization. Programmes are being successfully written to generate subject headings both from class numbers and feature headings following the reverse rendering method.

8.    Chain procedure may be used to derive indexes to classification schemes and books. Similarly, it may be used in formulating headings necessary for guide cards on catalogue, stock room guides, shelf guides, etc., in a systematic way.

5.2.3    Demerits of Chain Indexing

1.                It is totally dependent on a scheme of classification; as a result, it automatically tends to suffer from the demerits of that scheme.

2.                Of the entries prepared through chain indexing, only one is a specific entry; all the others are broad entries.

3.                In chain indexing, a step of division may sometimes go unrepresented by a further digit of the class number. This creates the problem of the missing link.

4.                Reverse rendering of terms while preparing the entries is confusing to the user.

5.2.4    Conclusion

          Chain indexing was first used by the Madras University Library in 1936. It has been widely accepted: BNB used it from 1950 to 1970, LISA is based on chain indexing, and INB has been practising chain indexing since 1958.

          DRTC has found that chain procedure is fully amenable to computerization. Programmes are being written to generate subject headings from class numbers following the reverse rendering method.

5.3          POPSI (Postulate-Based Permuted Subject Indexing)

          The inherent weakness of chain indexing has been its dependence on a scheme of classification. Another weakness was its disappearing chain. In view of this situation, the information scientists at the Documentation Research and Training Centre (DRTC), Bangalore, directed their efforts towards overcoming these limitations. The Postulate-Based Permuted Subject Indexing (POPSI) is the result of these efforts. It was developed by Ganesh Bhattacharya.

          POPSI does not depend on the Class Number but is based on Ranganathan’s postulates and principles of general theory of classification.

POPSI  is specifically based on:

(a)              a set of postulated Elementary Categories (ECs) of the elements fit to form components of a subject proposition.

Elementary Categories are:

Discipline (D)      - It covers conventional field of study, e.g. Chemistry, Physics, etc.,

Entity (E)    - e.g. Plant, Lens, Eye, Book, etc.,

Action (A)   - e.g. Treatment, Migration, etc; and

Property (P)         - It includes ideas denoting the concept of ‘attribute’ – qualitative or quantitative. e.g. Power, Capacity, Property, etc.

(b)     a set of rules of syntax with reference to ECs

The syntax is based on Ranganathan’s general theory of classification.

(c)               a set of indicator digits or notations to denote the ECs and their subdivisions.

          These are obtained from the POPSI table.

(d)  a vocabulary control device designated as ‘classaurus’.

5.3.1    Format

          If A, B, C and D are the sought terms of a subject heading, POPSI will generate the following subject entries, one under each sought term.

A

  ABCD

 

B

  ABCD

 

C

  ABCD

 

D

  ABCD

 

          The above format is exactly like that of a KWOC index, in which the user is required to read the entire chain every time to get the correct context.

5.3.2    Steps in POPSI

          The index entries according to this system are generated in a systematic manner through the following steps of operation.

1.                Analysis

2.                Formalisation

3.                Modulation

4.                Standardisation

5.                Preparation of EOC

6.                Decision about TA

7.                Preparation of EAC

8.                Alphabetisation

Let us examine these stages with the help of a sample title, ‘Chemical treatment of tuberculosis of lungs’.

1.                Analysis

          Subject indicative expression, the starting point of index generation, may be the title of a paper, a book or any other document. According to the first stage of operation, the expression is analysed to identify the facets in terms of concepts and modifiers. Analysis of the above mentioned example will lead to the following:

D       -        Medicine

E       -        Lungs

A       -        Chemical Treatment

P       -        Tuberculosis

2.     Formalisation

          In the stage of formalisation the sequence of components derived by analysis has to be decided. It involves the arrangement of component terms according to the principles of sequence of components indicating the status of each component term. Applying this principle, the components are sequenced in the following manner to obtain the basic chain:

Medicine (D), Lungs (E), Tuberculosis (P of E), Chemical treatment (A on P)

3.     Modulation

          Where necessary, further terms are added to each of the component terms in the analysed and formalised subject heading to make its meaning clearer. The above chain after modulation will be:

Medicine (D), Man. Respiratory System. Lungs (E), Disease. Tuberculosis (P of E), Chemical treatment (A on P)

4.  Standardisation

          This step is concerned with semantics. It helps in deciding the standard term among synonyms and the terms for which references are to be generated. It is done through vocabulary control. The use of the classaurus has been suggested for steps 3 and 4. The above chain after this step will be:

Medicine (D), Man. Respiratory System. Lungs (E), Disease. Tuberculosis (P of E), Chemotherapy (=Chemical treatment) (A on P)

5.                Preparation of the EOC (Entry for Organising Classification)

          This step consists of preparing the entry for generating the organising classification by inserting the appropriate notations from the POPSI table. The above chain after this step will take the following shape.

7 Medicine, 6 Man. Respiratory System. Lungs, 6.2 Disease. Tuberculosis, 6.2.1 Chemotherapy (=Chemical treatment)

6.  Decision about TA (terms of approach)

          This step is concerned with the decision regarding terms of approach for generating successive index entries and references.

          In this step ‘Lungs’, ‘Tuberculosis’ and ‘Chemotherapy’ are selected as terms of approach and a cross reference entry is decided to be made for ‘Chemotherapy’.

7. Preparation of EAC (Entries for Associative Classification)

          This step consists of preparing the entries under each approach term, together with the references. It will result in the following entries.

Lungs

          7 Medicine, 6 Man. Respiratory  System. Lungs,

          6.2 Disease. Tuberculosis, 6.2.1 Chemotherapy

 

Tuberculosis

          7 Medicine, 6 Man. Respiratory  System. Lungs,

          6.2 Disease. Tuberculosis, 6.2.1 Chemotherapy

 

Chemotherapy

          7 Medicine, 6 Man. Respiratory  System. Lungs,

          6.2 Disease. Tuberculosis, 6.2.1 Chemotherapy

 

Chemical treatment

          See Chemotherapy

 

8.   Alphabetisation

          In this step all the index entries, including references, are arranged in a word-by-word sequence:

(i)     Chemical treatment

                   See Chemotherapy

(ii) Chemotherapy

                   7 Medicine…      

(iii)            Lungs

          7 Medicine        

(iv)  Tuberculosis

                   7 Medicine        
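
Steps 5 to 8 of the worked example can be sketched in Python. This is a minimal sketch under stated assumptions: the notations, approach terms and reference are copied by hand from the example above rather than looked up in a POPSI table or classaurus, the function names build_eoc and build_index are hypothetical, and word-by-word filing is approximated by comparing headings as lists of lower-cased words.

```python
# Components after standardisation, each with its notation
# (copied from the worked example; not looked up in a POPSI table).
COMPONENTS = [
    ("7", "Medicine"),
    ("6", "Man. Respiratory System. Lungs"),
    ("6.2", "Disease. Tuberculosis"),
    ("6.2.1", "Chemotherapy"),
]

# Step 6: terms of approach and the reference decided by the indexer.
APPROACH_TERMS = ["Lungs", "Tuberculosis", "Chemotherapy"]
SEE_REFERENCES = {"Chemical treatment": "Chemotherapy"}


def build_eoc(components):
    """Step 5: join notation and term for every component of the chain."""
    return ", ".join(f"{notation} {term}" for notation, term in components)


def build_index(components, approach_terms, see_references):
    """Steps 7 and 8: one entry per approach term, plus references, filed."""
    eoc = build_eoc(components)
    entries = [(term, eoc) for term in approach_terms]
    entries += [(term, f"See {target}")
                for term, target in see_references.items()]
    # Word-by-word filing: compare the headings word by word.
    return sorted(entries, key=lambda entry: entry[0].lower().split())


for heading, body in build_index(COMPONENTS, APPROACH_TERMS, SEE_REFERENCES):
    print(heading)
    print("    " + body)
```

Every approach term carries the same full chain beneath it, which is why the user must read the whole chain to recover the context.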

5.3.3   Conclusion

          POPSI is certainly an extension of Chain Indexing, though the two differ from each other. POPSI has successfully solved the problem of the disappearing chain, which was a major criticism of chain indexing. POPSI also freed the indexing system from dependence on a classification scheme, because it is based on the general theory of classification and is not tied to any particular scheme.

5.4   PRECIS (Preserved Context Indexing System)

          Preserved Context Indexing System (PRECIS) was developed by Derek Austin in 1968 as a result of the long research which the Classification Research Group (CRG) undertook to devise a new general classification for information control. This system is considered the most important development in the alphabetical approach to subject specification in recent years.

          The system aims at providing an alphabetical subject index which caters to the variant approaches of the users while preserving the context of each term. To achieve this objective, the system arranges the components of the subject of a document into a significant sequence, so that all the important components in the string can be used as approach points. At the same time, the terms are displayed in such a fashion that every term is related to the next term in a context-dependent way. Moreover, the system is amenable to computer operation, which adds to its advantage, as the entries are prepared and arranged automatically by the computer.

5.4.1.  Essential Features of PRECIS

          PRECIS has the following important features:

1.                The system derives headings that are co-extensive with the subject at all access points.

2.                It is not bound to any classification scheme.

3.                The terms are context dependent in nature, which enables the users to identify the entries correctly.

4.                The entries are generated automatically by the computer.

5.                It also provides adequate arrangement of references between semantically related terms.

6.                It is a flexible system, as it is able to incorporate newly emerging terms accordingly.

7.                It has introduced the PRECIS  table which puts forth a set pattern for the preparation of entries, thus bringing about  consistency in work.

5.4.2   Concept of PRECIS

The concept of PRECIS deals with terms, strings and role operators.

Term: A term is a verbal representation of a concept. It may consist of one or more words.

String:       An ordered sequence of component terms, excluding articles, connectives, prepositions, etc., each preceded by a role operator, is called a string. The string represents the subject of the document.

Role Operators: The role operators are code symbols which show the function of each component term and fix its position in the string. They are meant for the guidance of the indexers only and do not appear in the index entry.

(a)      Preparation of String

          The most important activity in PRECIS indexing is the formation of the string. The preparation of the string involves the following points:

(i)                 Context dependence

(ii)              One-to-one relationship

(iii)            Provision of role operators

          The component terms are arranged in such a way that they are context dependent and, at the same time, interrelated with each other.

5.4.3   Format of Entry

          There are three formats for making index entries through PRECIS:

1.                Standard format

2.                Predicate transformation format

3.                Inverted format

1.  Standard Format

        In order to achieve the goal of context dependency and one-to-one relation, PRECIS has adopted a display format, which constitutes three parts:

(i)                 Lead:  ‘Lead’ position serves as the users’ approach term, by which a user may search the index.

(ii)              Qualifier: It represents the term or set of terms which qualifies the lead term to bring it into its proper context. It provides the wider context of the lead term.

(iii)            Display: It is the remaining part of the string which helps to preserve the context.

          All the terms in the string are prepared using the PRECIS table and are then rotated according to a process known as ‘Shunting’. The structure adopted for the process is as follows:

          Lead Term                                                 Qualifier

                                           Display

          Each approach term is placed in turn in the lead position, with the preceding terms (if any), which give the wider context, as the qualifier and the succeeding terms (if any) in the display position, so that the context of the term is preserved.

Example: Computerisation of libraries in India

(0)   India

(1)  Libraries

(2)  Computerisation

1.                INDIA

                   Libraries. Computerisation

2.         LIBRARIES                    India

                   Computerisation

3.      COMPUTERISATION             Libraries. India
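
The shunting of the standard format in the example above can be sketched in Python. This is a simplified sketch under stated assumptions: the function name shunt is hypothetical, every term in the string is treated as a lead, and the role operators, the predicate transformation format and the inverted format are ignored.

```python
def shunt(terms):
    """Return one (lead, qualifier, display) triple per term of the string."""
    entries = []
    for i, term in enumerate(terms):
        lead = term.upper()
        # Qualifier: the preceding (wider-context) terms, nearest first.
        qualifier = ". ".join(reversed(terms[:i]))
        # Display: the succeeding (narrower-context) terms, in string order.
        display = ". ".join(terms[i + 1:])
        entries.append((lead, qualifier, display))
    return entries


STRING = ["India", "Libraries", "Computerisation"]

for lead, qualifier, display in shunt(STRING):
    # Two-line format: lead and qualifier on the first line, display below.
    print(f"{lead}  {qualifier}".rstrip())
    if display:
        print(f"    {display}")
```

For the string India, Libraries, Computerisation this yields the same three entries as shown above, including the qualifier "Libraries. India" under COMPUTERISATION.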

2.  Predicate Transformation Format

          The Predicate Transformation Format is used when the term representing an agent (3) appears as the lead term and is preceded by a term coded with one of the operators (2), (s) or (t). When such a situation arises, the term coded (2), (s) or (t) is shifted from the Qualifier position to the Display position.

3.  Inverted Format

          PRECIS makes use of the inverted format when a term coded with the role operator (4), (5) or (6) appears as the lead term. In that case, the dependent elements are presented in italics (or underlined if handwritten) after a hyphen, and the terms in the Qualifier position are printed in the Display position.

5.4.4    Filing Order

          PRECIS follows a two-line format for the display of its entries; as a result, it follows a distinct filing order within the broad alphabetisation. When a number of entries appear under the same lead term, they are further arranged by their qualifiers, as follows.

                    LIBRARIES                    Bangladesh

                             Personnel.  Recruitment

                   LIBRARIES                    India

                             Inter-Library Loans
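
The filing order can be sketched in Python as a sort on the lead term followed by the qualifier. This is only an illustrative sketch: the name filing_key is hypothetical, and using the display line as a final tie-breaker is an assumption that the text above does not state.

```python
# (lead, qualifier, display) triples taken from the example above,
# deliberately listed out of order.
ENTRIES = [
    ("LIBRARIES", "India", "Inter-Library Loans"),
    ("LIBRARIES", "Bangladesh", "Personnel. Recruitment"),
]


def filing_key(entry):
    """File by lead term, then qualifier; display is an assumed tie-breaker."""
    lead, qualifier, display = entry
    return (lead.casefold(), qualifier.casefold(), display.casefold())


filed = sorted(ENTRIES, key=filing_key)

for lead, qualifier, display in filed:
    print(f"{lead}  {qualifier}")
    print(f"    {display}")
```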

5.4.5    Conclusion

          PRECIS was first adopted by BNB; later, a number of other agencies accepted the system. Among the other national bibliographies that adopted PRECIS are those of Australia, Malaysia and South Africa. Besides these, a number of libraries in Britain are practising it. A number of pilot projects are also using it for creating indexes to statistical, public and other records.

5.5          COMPASS (Computer Aided Subject System)

          PRECIS was intended to be a complete subject statement in a form suitable for a printed bibliography, and this was not necessarily the best format for online searching. Its complex system of coding and role operators served to produce the output strings for printing, which appears to be unnecessary in an online system. For online searching, it did not appear to make any difference whether a concept was coded with the role operator (1) or (2). Place names were treated in several ways, with the role operators (0), (1), (5) and occasionally (3), as part of the subject string. The use of role operators in such a manner was not of much help for online searching. In 1990, it was decided to revise UKMARC and to replace PRECIS by a more simplified system of subject indexing in order to reduce the unit cost of cataloguing at the British Library. As a result, the Computer Aided Subject System (COMPASS) was introduced for BNB in 1991 and PRECIS was dropped.

          COMPASS is a simplified restructuring of PRECIS. The index string is organised by the PRECIS principles of context dependency and role operators. In order to minimize the complexity of the PRECIS role operators, the primary role operators (0), (4), (5) and (6) are not used. Dates as difference (coded with $d) are not used in all cases, as they were in PRECIS. The indexer who writes the COMPASS input string also assigns the appropriate DDC number in field 082 of the worksheet meant for BNB. The initial step of subject analysis is done only once, while preparing the COMPASS input string for a document, and this input string is taken as the basis for all later decisions relating to the document and their incorporation in the relevant fields of the worksheet.

          The DDC number is also used as a source of the feature heading. Prior to the introduction of COMPASS, the PRECIS strings were used to generate the DDC numbers and also the feature headings for the BNB classified sequence. The methods associated with the generation of COMPASS index entries are the same as those for PRECIS index entries. The index entry drawn according to COMPASS appears in italics at the end of the entry for the bibliographic record of a document in the classified/main part of BNB. DDC numbers are now linked directly to the bibliographic records rather than through the subject strings. The subject index of BNB refers to a class number in the following manner:

Library Operations

          Classification compared with indexing 025

          In the classified part of BNB a number of entries or bibliographic records are arranged under the class number 025. The above-mentioned subject index entry directs the user to scan the entries under the class number 025 in the classified/main part of BNB in order to find the one which has at its end the subject heading “classification compared with indexing”.

5.5.1   Merits and Deficiencies

          With the introduction of COMPASS, the printed subject index of BNB appears to be much shorter than the earlier one. The codes and role operators used in COMPASS are very simple in comparison to those of PRECIS. COMPASS is used not only for the generation of printed indexes for BNB; it is also amenable to online searching.

          For generating feature headings in the BNB classified sequence, up to five levels of headings from the DDC numbers are given. This system of producing feature headings has been reported to be unsatisfactory from the users’ point of view. Feature headings constructed from the terms in the PRECIS string, prior to the introduction of COMPASS, appeared to be more user-friendly.

          Any system needs time for its testing and development. With the introduction of COMPASS, BNB stopped including LCSH headings, until protests from the users finally led to their reintroduction in 1995. With the substitution of LCSH for COMPASS in 1995, the classified arrangement has no index at all. As a result, BNB no longer shows any direct translation of the notations. Further development in the application of the British Library subject system in online searching might be possible once the necessary preconditions in the field of data and retrieval technology are created.

 
