Here is my PhD thesis, available for download as a gzipped PostScript file (decompress it with gunzip or WinAce) and as LaTeX source files.
PostScript files can be viewed using Ghostscript and GSview.
This thesis concerns comparative code understanding, applied to information models (written in the EXPRESS modelling language) as a particular type of code. Information models are similar to database schemata (they contain entities, attributes, integrity constraints, etc.), but are intended for communicating the semantics of a domain, rather than the efficient physical representation of data.
A method is presented for comparing pairs of EXPRESS information models, with particular emphasis on the constraints specified in those models. This method is embodied in an experimental comparative code-understanding system (CCUS). Four aspects of the CCUS's behaviour constitute contributions to knowledge. Firstly, it infers correspondences between the two models (using name equivalence and semantic equivalence correspondence heuristics). Secondly, it assesses the semantic equivalence and relative semantic strength between the constraints in the models. Thirdly, it combines comparison with the extraction of higher-level knowledge about the constraints, in order to assess semantic equivalence at a higher level of abstraction when it fails to do so at the code level. Fourthly, this extraction is selective in that the CCUS only extracts those higher-level constraints (HLCs) that it needs, rather than extracting all the HLCs that it knows how to extract.
The CCUS is evaluated with respect to a specific application context: reasoning about potential violations of constraints when instances are mapped from one data repository to another, where the two repositories conform to different EXPRESS models. Comparative code understanding could also have applications in such areas as versioning, integration, and transformation.
Although the implemented CCUS is limited to EXPRESS information models, the four contributions listed above should generalise to all kinds of code (including program code in particular). By bringing these ideas together, this work aspires to establish comparative code understanding as a field in its own right.
N.B. The date on the front page is March 2003 in the hard-bound copies of the thesis but June 2003 in the final PostScript file. This is because LaTeX generates the date automatically, and June 2003 is when I made the final minor corrections; those corrections do not constitute a resubmission.
The Special Edition of my thesis differs from the version for which I was awarded my PhD in the following ways:
It is my firm intention that the Special Edition be considered the definitive version of my thesis.