Problem statement: read through opportunity_embedded_linkdictionary.txt and
                   opportunity_enterprise_knowledge_management.txt


Opportunity: document 'searching and indexing' is a feature used by various classes of independent software,
              ranging from the operating system (the host OS on which the documents reside) to knowledge management
              systems, databases and web portals.

              Given this context, the opportunity lies in re-usability: a standardized, generic API (a re-usable shared
              library) for implementing a 'data search' feature on a document store (any of the pieces of software
              envisaged above).
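A minimal Python sketch of how such a shared library might separate the host-specific part (enumerating documents) from the reusable part (indexing and searching). `DocumentStore`, `InMemoryStore` and `SearchIndex` are hypothetical names chosen for illustration. A real host (file system, KM, database, portal) would supply its own store adapter, and the tokenizer here is a naive whitespace split.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol, Set, Tuple


class DocumentStore(Protocol):
    """Host-specific part: each host (OS, KM, database, portal) supplies one."""

    def documents(self) -> Iterable[Tuple[str, str]]:
        """Yield (document_id, text) pairs."""
        ...


@dataclass
class InMemoryStore:
    """Toy store used only to exercise the shared index."""
    docs: Dict[str, str]

    def documents(self) -> Iterable[Tuple[str, str]]:
        return self.docs.items()


@dataclass
class SearchIndex:
    """Reusable part: the same code indexes and searches any DocumentStore."""
    postings: Dict[str, Set[str]] = field(default_factory=dict)

    def build(self, store: DocumentStore) -> None:
        for doc_id, text in store.documents():
            for word in text.lower().split():  # naive tokenizer, for illustration
                self.postings.setdefault(word, set()).add(doc_id)

    def search(self, term: str) -> List[str]:
        return sorted(self.postings.get(term.lower(), set()))


index = SearchIndex()
index.build(InMemoryStore({"a.txt": "Quarterly sales report", "b.txt": "Sales forecast"}))
print(index.search("Sales"))  # ['a.txt', 'b.txt']
```

The design choice is that only `documents()` changes from host to host. The indexing and query code stays identical, which is the re-usability the opportunity describes.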


              Given that indexing needs to be done on all such documents, i.e. in turn on all such data stores, the
              software implementing the indexing (to enable the search feature) should be able to leverage a bundled
              or embedded dictionary: either a customized compilation for the purpose, i.e. one that eliminates
              ordinary language elements (words, phrases),

              or

              the converse thereof, i.e. a dictionary or file containing a compilation of what can be termed
              recurring or common language elements that need to be excluded.
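The two dictionary modes can be sketched as a single filter. This assumes the dictionary file holds one term per line. `load_dictionary` and `filter_terms` are hypothetical helpers, and `io.StringIO` stands in for the bundled dictionary file so the example is self-contained.

```python
import io
import re
from typing import List, Set, TextIO

WORD = re.compile(r"[a-z0-9]+")


def load_dictionary(fh: TextIO) -> Set[str]:
    """Read one term per line; blank lines and '#' comment lines are skipped."""
    terms = set()
    for line in fh:
        term = line.strip().lower()
        if term and not term.startswith("#"):
            terms.add(term)
    return terms


def filter_terms(text: str, dictionary: Set[str], mode: str) -> List[str]:
    """mode='include': keep only words found in a domain dictionary.
    mode='exclude': drop words found in a stop-list of common words."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    words = WORD.findall(text.lower())
    if mode == "include":
        return [w for w in words if w in dictionary]
    return [w for w in words if w not in dictionary]


# io.StringIO stands in for the bundled dictionary files.
domain = load_dictionary(io.StringIO("invoice\ncrm\nlicence\n"))
stop = load_dictionary(io.StringIO("# common words\nthe\nfor\nis\n"))
text = "The invoice for the CRM licence is overdue"
print(filter_terms(text, domain, "include"))  # ['invoice', 'crm', 'licence']
print(filter_terms(text, stop, "exclude"))    # ['invoice', 'crm', 'licence', 'overdue']
```

The include mode only ever yields terms someone has curated. The exclude mode also surfaces unforeseen words such as "overdue", at the cost of keeping more noise.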

 
              An understanding of the find, diff and grep utilities, and of various content types (documents, content
              packaging), would make the task easier.
              ================================================================================================================================

 
              A modified version of 'grep' that takes the list of words or patterns from a file and outputs those
              lines, viz. words, that are not contained in the input file (the custom-compiled common language
              elements or words) would serve for implementing an efficient search API, i.e. one that, apart from
              the keywords defined in the documents' meta-tags, can generate its own keywords for indexing the
              documents.
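Standard grep can already approximate the exclusion step once the text is split one word per line. For example: `tr -cs '[:alnum:]' '\n' < doc.txt | grep -v -x -F -i -f stopwords.txt | sort | uniq -c | sort -rn`. Here `-f` reads the patterns from a file, `-F` treats them as fixed strings, `-x` requires a whole-line match and `-v` inverts the match. The Python sketch below does the same filtering, then ranks the remaining words by frequency and puts author-supplied meta-tag keywords first. `generate_keywords`, `top_n` and `min_len` are illustrative assumptions, not part of any existing API.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

WORD = re.compile(r"[A-Za-z0-9]+")


def generate_keywords(text: str, stopwords: Set[str],
                      meta_keywords: Iterable[str] = (),
                      top_n: int = 5, min_len: int = 3) -> List[str]:
    """Meta-tag keywords first, then the top_n most frequent remaining words.
    stopwords are assumed to be lower-case."""
    # Split into words and drop stop-words: the `grep -v -x -F -i -f` step.
    words = (w.lower() for w in WORD.findall(text))
    counts = Counter(w for w in words if w not in stopwords and len(w) >= min_len)

    # Keep the author's meta-tag keywords, de-duplicated, in their given order.
    result: List[str] = []
    for k in meta_keywords:
        k = k.lower()
        if k not in result:
            result.append(k)

    # most_common() orders by count; ties keep first-seen order.
    generated = [w for w, _ in counts.most_common() if w not in result]
    return result + generated[:top_n]


stopwords = {"the", "and", "to", "a", "of"}
text = ("Search engines index documents. The index maps words to documents, "
        "and search uses the index.")
print(generate_keywords(text, stopwords, meta_keywords=["Search"], top_n=3))
# ['search', 'index', 'documents', 'engines']
```

Pure frequency ranking is the simplest choice here. A weighting that also considers how common a word is across the whole collection would be a natural refinement, but the problem statement does not prescribe one.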


             The keyword dictionary, together with references to the occurrences of each keyword (viz. file, page, ...),
             i.e. the results of grep dumped into a Perl anonymous hash, makes for compact storage and retrieval of the data.
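In Python, the natural counterpart of the Perl anonymous hash is a nested dict. This minimal sketch assumes grep output in the `file:line:keyword` form produced by GNU `grep -H -n -o -w -F -f keywords.txt`. It builds a map of keyword -> {file: [line numbers]}. Line numbers stand in for the 'page' reference, which grep cannot see. `build_index` is a hypothetical name, and JSON is used here for compact storage in place of a Perl serializer.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(grep_lines: Iterable[str]) -> Dict[str, Dict[str, List[int]]]:
    """Turn 'file:line:keyword' records into keyword -> {file: [line, ...]}."""
    index = defaultdict(lambda: defaultdict(list))
    for record in grep_lines:
        # rsplit so that a ':' inside the file name does not break parsing
        path, line_no, keyword = record.rstrip("\n").rsplit(":", 2)
        index[keyword.lower()][path].append(int(line_no))
    # convert to plain dicts so the result serialises cleanly
    return {k: dict(v) for k, v in index.items()}


records = [
    "a.txt:3:invoice",
    "a.txt:7:invoice",
    "b.txt:1:Invoice",
    "b.txt:2:licence",
]
index = build_index(records)
stored = json.dumps(index, separators=(",", ":"))  # compact on-disk form
print(index["invoice"])  # {'a.txt': [3, 7], 'b.txt': [1]}
```

A lookup is then a single dictionary access per keyword. `json.loads(stored)` restores the same structure when the index is read back.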


             An efficient API that powers free-text search on databases (including indexing of binary data) calls for
             judicious partial indexing, viz. generating keywords for the meta-tags of the data (files) on the client side
             (2-tier architecture where possible) before the data (files) are archived or stored in the repositories of
             application software such as CRM and KM systems.
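Here is a sketch of the client-side step, assuming the client tags each file before upload. Alphanumeric runs are pulled out of the raw bytes, similar in spirit to the Unix `strings` utility, so the step also works on binary files. Common words are then dropped, and the most frequent remaining words become keywords in a metadata record. The function `tag_before_upload` and the record's field names are hypothetical. How the record is attached (sidecar file, repository metadata field) depends on the target CRM or KM system.

```python
import hashlib
import json
import re
from collections import Counter
from typing import Dict, List, Set, Union

# Runs of 3+ ASCII letters/digits, similar in spirit to the Unix `strings` tool.
TEXT_RUN = re.compile(rb"[A-Za-z0-9]{3,}")


def tag_before_upload(name: str, data: bytes, stopwords: Set[str],
                      top_n: int = 5) -> Dict[str, Union[str, int, List[str]]]:
    """Build a metadata record on the client, before the file is archived."""
    words = [run.decode("ascii").lower() for run in TEXT_RUN.findall(data)]
    counts = Counter(w for w in words if w not in stopwords)
    return {
        "name": name,
        "sha256": hashlib.sha256(data).hexdigest(),  # identifies the archived file
        "size": len(data),
        "keywords": [w for w, _ in counts.most_common(top_n)],
    }


data = b"\x00\x01PK\x03\x04quarterly\x00report\xffquarterly sales the report\x00"
record = tag_before_upload("q3.bin", data, stopwords={"the"})
sidecar = json.dumps(record)  # e.g. stored next to the file or sent as metadata
print(record["keywords"])  # ['quarterly', 'report', 'sales']
```

Only ASCII letters and digits are picked up. Text held in compressed or UTF-16 form would need a format-specific extractor first, which is where knowledge of content types and content packaging comes in.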
             
           
-----------------------------------------------------------------------------------------------------------------------------------
Note: The above problem statement has been encountered in various scenarios
      and is detailed in various 'proofs of concept', as mentioned in

       http://uk.geocities.com/ravivenkatus/projects.pdf
       http://ravishankarkv.tripod.com/projects.pdf

      ...Apply appropriate 'use-case' modeling, rationalize, and arrive at a workable
      and feasible solution that is both commercially and technically viable.