PhD Thesis

Dr. Andrew Broad

Here is my PhD thesis, available for download as a gzipped PostScript file (decompress it with gunzip or WinAce) and as LaTeX source-files.

PostScript files can be viewed using Ghostscript and GSview.


Comparative Code Understanding of Information Models

This thesis concerns comparative code understanding, applied to information models (written in the EXPRESS modelling language) as a particular type of code. Information models are similar to database schemata (they contain entities, attributes, integrity constraints, etc.), but are intended for communicating the semantics of a domain, rather than the efficient physical representation of data.

A method is presented for comparing pairs of EXPRESS information models, with particular emphasis on the constraints specified in those models. This method is embodied in an experimental comparative code-understanding system (CCUS). Four aspects of the CCUS's behaviour constitute contributions to knowledge. Firstly, it infers correspondences between the two models (using name equivalence and semantic equivalence correspondence heuristics). Secondly, it assesses the semantic equivalence and relative semantic strength between the constraints in the models. Thirdly, it combines comparison with the extraction of higher-level knowledge about the constraints, in order to assess semantic equivalence at a higher level of abstraction when it fails to do so at the code level. Fourthly, this extraction is selective in that the CCUS only extracts those higher-level constraints (HLCs) that it needs, rather than extracting all the HLCs that it knows how to extract.
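
To make the notions of semantic equivalence and relative semantic strength concrete, here is a minimal Python sketch. It is not the CCUS's algorithm: it is an illustration that assumes constraints can be written as Boolean-valued functions over a small finite domain, and it simply checks every possible instance. The names `compare_constraints`, `Constraint` and the `age` attribute are hypothetical. A result of "incomparable" means that neither constraint logically implies the other.

```python
from itertools import product
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical representation: a constraint is a Boolean-valued function
# over an instance, where an instance maps attribute names to values.
Constraint = Callable[[Dict[str, int]], bool]


def compare_constraints(
    a: Constraint,
    b: Constraint,
    domains: Mapping[str, Sequence[int]],
) -> str:
    """Classify constraint a relative to constraint b by brute force.

    Returns "equivalent", "stronger" (a implies b, but not vice versa),
    "weaker" (b implies a, but not vice versa) or "incomparable"
    (neither implies the other). The verdict only holds for the finite
    domains supplied; this is an assumption of the sketch.
    """
    names = list(domains)
    a_implies_b = True
    b_implies_a = True
    for values in product(*(domains[n] for n in names)):
        instance = dict(zip(names, values))
        holds_a, holds_b = a(instance), b(instance)
        if holds_a and not holds_b:
            a_implies_b = False  # counterexample to "a implies b"
        if holds_b and not holds_a:
            b_implies_a = False  # counterexample to "b implies a"
    if a_implies_b and b_implies_a:
        return "equivalent"
    if a_implies_b:
        return "stronger"
    if b_implies_a:
        return "weaker"
    return "incomparable"


# Hypothetical constraints on a single integer attribute "age".
DOMAINS = {"age": range(0, 200)}


def working_age(inst: Dict[str, int]) -> bool:
    return 18 <= inst["age"] <= 65


def working_age_alt(inst: Dict[str, int]) -> bool:
    return inst["age"] > 17 and inst["age"] < 66


def adult(inst: Dict[str, int]) -> bool:
    return inst["age"] >= 18


def young(inst: Dict[str, int]) -> bool:
    return inst["age"] <= 30
```

Under these assumptions, `compare_constraints(working_age, working_age_alt, DOMAINS)` reports the two differently written constraints as equivalent, `working_age` is stronger than `adult`, and `young` and `adult` are incomparable because each admits ages that the other rejects.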

The CCUS is evaluated with respect to a specific application context: reasoning about potential violations of constraints when instances are mapped from one data repository to another, where the two repositories conform to different EXPRESS models. Comparative code understanding could also have applications in such areas as versioning, integration, and transformation.

Although the implemented CCUS is limited to EXPRESS information models, the four contributions listed above should generalise to all kinds of code (including program code in particular). By bringing these ideas together, this work aspires to establish comparative code understanding as a field in its own right.


  • The Thesis (Special Edition) as a PostScript file (gzipped)
  • The Thesis (Special Edition) as LaTeX source-files (zipped - see Makefile)
  • The slides I used in my Viva (gzipped)
  • N.B. The date on the front page is March 2003 in the hard-bound copies of the thesis, but June 2003 in the final PostScript file. June 2003 was when I made the final minor corrections (which do not constitute a resubmission), and LaTeX generates the date automatically.


    The Special Edition

    The Special Edition of my thesis differs from the version for which I was awarded my PhD in the following ways:

    1. It corrects a number of typographical errors, corrects a technical error in Section 7.2, and adds various bits of text (click here for precise details).
    2. It retains the term "semantically incomparable", which I had to replace at every occurrence in the thesis with the phrase "overlapping or disjoint" in order to satisfy the examiners, who disagreed with my use of the word "incomparable" to mean that neither of two Boolean-valued expressions logically implies the other. This is not a standard dictionary-meaning of "incomparable", but I feel strongly that it is more elegant to have a single word for the concept, and that it is the researcher's prerogative to help language evolve.
    3. It completes Appendix C (the pseudocode-algorithms).

    It is my firm intention that the Special Edition be considered the definitive version of my thesis.

