CS710 Summary (again)

ANDREW BROAD'S CS710 SUMMARY (AGAIN)


Week 1 (7th February 2001)

CS710 is for MPhil students and PhD students who are writing up their theses this year. I am taking this course for the second time, as a PhD student, having taken it as an MPhil student in 1998. I am taking it again because I need all the help I can get with my PhD, I feel that doing CS710 will motivate me to write my thesis, and I think it is a very experimental, evolutionary course: I expect to learn much more by taking it twice - closer to 2.0 than to 1.0 times as much as from taking it once, I hope! :-)

I am keeping my 1998 summary of CS710 online, because I'm sure that the union of CS710 1998 and CS710 2001 will be considerably greater than either on its own, so I suggest that you read the 1998 summary too.

CS710 is a very interactive course, in which we each have to present the work we've done on our thesis so that we can get feedback on it. Don't think of CS710 as a course to be passed (even though it does carry credits) - the aim of CS710 is to help us with our theses rather than to be a hurdle in itself.

CS710 aims to teach us what we want out of the course, rather than just teaching us what David Brée thinks we need to learn, so he is very receptive to suggestions.

The course has two major parts:

Teaching sessions
Thesis presentations

The Teaching Sessions

The first six weeks or so (four, according to the plan below) of CS710 are for discussing various aspects of thesis-writing. They are driven by David Brée, but are more interactive than most taught courses, given that CS710 is about us trying to take what we want out of it.

The plan for the teaching sessions of this year's CS710 is as follows:

1. The criteria by which examiners judge a thesis. (Today's seminar)
2. Issues at the tactical level (word usage, word order, joining sentences, etc.).
3. Issues at the strategic level (composition: putting things together).
   - What should be in an abstract?
   - The table of contents
   - The first chapter
4. Someone who did an outstanding thesis will come in to give a talk. Nobody from Manchester has ever won a Distinguished Dissertation award, and so I understand that the talk will be given by Jacques Fleuriot.

Impressing the examiner/reader: The first things (and in many cases, the only things ;-)) that a reader tends to look at are the abstract, the table of contents, and the first and last chapters (i.e. the introduction and the conclusion), and also - and this surprised me - the references (they may prejudge you according to whether you cite reputable people).

Who are you writing for? You need to consider your intended audience as you write your thesis. It may be someone like you, starting in the field. So be explicit (don't assume they are already an expert in your field).

The Thesis Presentations

After the teaching sessions come the thesis presentation sessions. I've now moved this section to a page of its own.

What do we want out of CS710?

We discussed this in the seminar today, and the following suggestions were made as to what issues should be covered in CS710:

How to impress the examiner. This will be covered in the section about abstracts (in Seminar 3).
How to reduce the time spent on writing. The rule of thumb is that writing a thesis takes a page a day, on average - this includes proof-reading and editing. How to write quicker? Do not underestimate the time it will take, especially for proof-reading and rewriting! A PhD thesis usually takes six to eight months to write; my MPhil thesis, which was exceptionally long, took five months.
How many pages?
- As many as it takes?
- Hilary Kahn expects no less than 200 pages of a PhD thesis (her area is CAD and information modelling, which entails a lot of standards, which need a lot of description).
- Cliff Jones used to expect no more than 100 pages (his area was theoretical, and theoretical theses should be neat and relatively short).
- Most theses are 100-200 pages long.
- Most theses are too long! The longer a thesis is, the more difficult it is to keep hold of the complete picture and how everything fits into it. Like a piece of music, you should know instantly where each bit fits into the whole.
What about transfer reports? A transfer report is equivalent to an MPhil thesis, but it is not examined externally, and of course you don't get a degree for it. So transfer reports are generally not up to the standard of an MPhil. They are intended only for students who are well on their way to a PhD, to minimise the distraction of having to write an MPhil thesis. However, my supervisor and I are strongly opposed to transfer reports, because at least by doing an MPhil you get a degree out of your research, rather than leaving with nothing if the PhD doesn't work out or you decide (or are advised) to leave after the first year because independent research (research done on your own, which is the nature of a PhD) isn't your scene. Anyone with a first or upper-second class BSc should be able to get an MPhil, but it is easier for a camel to pass through the eye of a needle than to get a PhD.

What I want out of CS710 this year is much more focus on the strategic level than in 1998, when there was too much emphasis on the tactical level. It degenerated into a high-school English class at times! ;-) I want CS710 to be about how to structure the thesis in terms of the contributions (and how to identify what the contributions are so that you can do this effectively), how to write it up coherently in a contribution-oriented (as opposed to implementation-oriented) style, and so on.

I also want a week on evaluation, which is a very important and very difficult aspect of doing a successful PhD.

Advice on the PhD viva (oral exam where you defend your thesis - there usually isn't one for an MPhil, although the examiners could call you up - most unlikely) should be included somewhere, too - templates of typical nasty viva questions and how to tackle them.

Examiners' Criteria for an MPhil

David Brée gave us the examiners' criteria for an MPhil on an OHP slide. He said that the form was from an MPhil thesis which he had examined, and that the candidate was actually sitting in the room. I wondered who it was, and got quite a shock when I looked up and saw ANDREW P. BROAD on the slide! (David Brée was the internal examiner for my MPhil - for the record, I passed without needing to resubmit, or even having to make minor corrections, which is unusual! ;-))

The examiners' criteria for an MPhil thesis are:

1. Does the thesis give satisfactory evidence of experience of methods of research such as can normally be gained by a student in one year's work following initial graduation?
2. Is there evidence that the candidate has carried out the experimental and/or theoretical work in a satisfactory manner?
3. Is there satisfactory discussion of the purpose of the investigation, its significance, and of any relevant previous work?
4. Is the thesis clearly written and presented in a satisfactory form, with the necessary diagrams, formulae, bibliography, etc.?
   It's important to tell the story well, bringing the strands together, like a detective novel!

Literature Review

(This pertains to MPhil Examiners' Criterion 3.)

There are two extreme ways to do a literature review:

Start with a clean slate.
Make it oriented to your problem, only using the literature to underpin your opinion.

The literature is there to help you, but you are there to help the literature!

What is the significance of your research in the general field? How does it link in with everything else?

Examiners' Criteria for a PhD

Examiners are asked to provide a full report on the thesis, including:

a statement of how successful the candidate has been in achieving his or her aims and objectives;
It's very important to state your aims and objectives clearly in the thesis, otherwise you will be judged by the wrong criteria! (For example, you don't want to be judged by `engineering' criteria if yours is a theoretical thesis). In particular, it's important to state clearly what the contributions of the thesis are.
a statement that the results of the research, as reported in the thesis, show evidence of originality and independent critical judgement;
Originality: They can't be someone else's results! You can't just reproduce someone else's experiments! You can for an MPhil, although this is rare - reproducing existing experiments is much more of a problem for a subject like physics than for computer science.
Independent critical judgement: You can interpret your results, and link their significance to the general field. It's even possible for a PhD to have a negative result: if your approach fails, then this serves as a warning post to other researchers - "abandon hope all ye who enter here".
a statement that the research which is reported in the thesis constitutes an addition to knowledge.
"Addition to knowledge" is a vague term. For a PhD, it means a useful, significant contribution, not just some minor addition to knowledge. It obviously implies that someone could derive something from reading your thesis that they couldn't get from reading all the other literature in your field. It must be very difficult for the examiners to judge this, of course (it might help if you published a paper, and someone cited you in their paper, before you submit your thesis!).

Next Week: Tactical-Level Issues

By next week's seminar, we are to read two chapters of William Strunk's The Elements of Style, which are available online:

Write down any questions that are not answered in Strunk, so that we can discuss them in the seminar.

Week 2 (14th February 2001)

This week's seminar covered tactical considerations (at the sentence level and below). I must admit that I didn't find this session terribly useful (English is my first language, most of the advice in this seminar wasn't specific to English anyway, the tactical level is bread-and-butter to me, and it's the strategic level that I really need help with).

"I'd love not to be able to give this session!" (David Brée)

Its/It's

These two are commonly confused:

`Its' is a possessive pronoun, e.g. `its best result'. Writing `it's best result' is as stupid as writing `hi's best result'.
`It's' is an abbreviation of `it is'.

Parentheses and Punctuation

If the sentence ends in parentheses, put the full stop after the right parenthesis.

`John likes fast algorithms (provided they are easy to write).'

If an entire sentence is put in parentheses (i.e. the left parenthesis follows a full stop), then the full stop goes before the right parenthesis.

`John likes fast algorithms. (He also likes easy-to-write algorithms.)'

Put Book and Journal Titles in Italics

`Have you read Which?'

However, if the title were enclosed in quotation marks, it would require two question marks:

`Have you read "Which?"?'

If you don't know something like this, look it up in:

Kate L. Turabian. A manual for writers of term papers, theses and dissertations. 5th edition. University of Chicago Press, 1987.
Judith Butcher. Copy editing: the Cambridge handbook for editors, authors and publishers. 3rd edition. Cambridge: Cambridge University Press, 1992.

Use a Style-Sheet

A style-sheet contains all the decisions you have made when there are choices (e.g. whether to put titles in italics). If you use LaTeX, it's easy to incorporate a style-sheet - put the decisions in your header file so that they're up front.
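As a rough sketch of what such a header might look like, the preamble below records two decisions as macros, so that changing a decision later means editing one line. The macro names \booktitle and \cooc are made up for illustration; they are not standard LaTeX commands.

```latex
\documentclass{article}

% --- Style-sheet: one macro per decision (all macro names are illustrative) ---
\newcommand{\booktitle}[1]{\emph{#1}}   % decision: book and journal titles in italics
\newcommand{\cooc}{co-occurrence}       % decision: hyphenate rather than concatenate

\begin{document}
Have you read \booktitle{Which?}
The \cooc{} of two terms is counted once per sentence.
\end{document}
```

If you later decide that titles should go in quotation marks instead, only the definition of \booktitle needs to change.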

Abbreviations

Modern editors use `eg', but `e.g.' is better. Some journals will accept the latter but not the former, and some will accept neither - they insist on `for example' instead.

In LaTeX, it's easy to do replacements:
\newcommand{\eg}{e.g.}
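One wrinkle with the definition above is spacing: LaTeX ignores the spaces after a command name, and it may treat the final full stop of `e.g.' as the end of a sentence. A possible variant, assuming the xspace package is available (the \ie macro is an extra I have added for illustration):

```latex
\documentclass{article}
\usepackage{xspace} % puts back the space that LaTeX drops after a command name

% \@ resets the space factor, so the final full stop is not
% followed by the wider end-of-sentence space.
\newcommand{\eg}{e.g.\@\xspace}
\newcommand{\ie}{i.e.\@\xspace} % illustrative companion macro

\begin{document}
Some journals insist on the long form, \eg `for example'.
\end{document}
```

Without xspace, the original definition still works if you write `\eg\ ' (backslash-space) each time you use it.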

Another abbreviation is `cf', which means `compare' (Latin); more modern editors use `cp'.

Whether to Hyphenate or Concatenate Words

`brain wave'
`brainwave'
`brain-wave'

There's no hard-and-fast rule - you have to make a decision and stick to it consistently.

For example, `co-occurrence' looks better than `cooccurrence' or `co occurrence', doesn't it?

`Occurrence' is a commonly-misspelled word, by the way.

Commas

`birds, bees and fleas'

`birds, bees, and fleas'

In UK English, the second comma is optional; in US English it is mandatory.

The golden rule is clarity. (Clarity takes precedence over consistency.) Remember that the function of a comma is to group related words together, and each comma goes either `in' or `out':

`birds, bees and wasps, and other flying insects'

Lists

If the items in the list are short (one clause), then the list may be treated as a sentence, separated by semicolons and terminated by a full stop.

`The algorithm is as follows:
1. set i = 0;
2. while i < n do ...;
3. print j.'

The alternative, especially if each list item contains more than one sentence, is to treat each list item as a paragraph of sentences, ending in full stops.

`The algorithm is as follows.
1. Set i = 0. Then ... .
2. While i < n do ... .
3. Print j.'

Strictly speaking, it should be `as follows.' in the second case, but `as follows:' looks better IMO.

For guidance on such issues in mathematical and scientific writing, see:

Nicholas J. Higham. Handbook of writing for the mathematical sciences. Philadelphia: SIAM, 1993.
Chapter 13 of Butcher.

Brackets

The preferred order is ||<{[(...)]}>|| - from inner to outer: parentheses, square brackets, braces, angled brackets, and fancy double-line brackets that I don't know how to get in HTML ;-)
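In LaTeX, the nesting might be typeset as below. This is only a sketch: I am assuming that the `fancy double-line brackets' are the double vertical bars produced by \| in maths mode, and the x is a placeholder.

```latex
\documentclass{article}
\begin{document}
% Innermost to outermost: ( ), [ ], \{ \}, \langle \rangle, \| \|
\[
  \bigl\| \bigl\langle \bigl\{ \bigl[ ( x ) \bigr] \bigr\} \bigr\rangle \bigr\|
\]
\end{document}
```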

Hyphens and Dashes

The one in `pre-cache' is a hyphen.

The one in `pp. 2-5' is an en-dash, which is slightly longer than a hyphen.

The one in `Mr. T-' is an em-dash, which is longer still.
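In LaTeX the three marks are typed as one, two and three hyphens. The sketch below assumes the usual convention that the en-dash marks a range and the longer em-dash marks an omission:

```latex
\documentclass{article}
\begin{document}
pre-cache   % hyphen: one hyphen character

pp.~2--5    % en-dash: two hyphens

Mr.\ T---   % em-dash: three hyphens
\end{document}
```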

Apostrophes

`John's book'

The issue of whether to use 's or just ' at the end of a name ending in s is resolved by how it is pronounced:

`Charles's book'
`James's book'

`Moses' Law'
`In Jesus' Name'

Chapter/Section Headings

`1 Introduction

Do you know that NP = P?

Well, neither do I. That's because...'

The first paragraph of a section is never indented. Second and subsequent paragraphs may either:

be indented, with no blank lines between paragraphs;
have a blank line before each paragraph, with no indentation.

I prefer the latter, perhaps using the former for nested paragraphs (I wouldn't recommend using nested paragraphs because they are just a construct that I have invented! ;-)).
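For LaTeX users, the two layouts correspond to two length settings. The standard classes indent paragraphs by default and leave the first paragraph after a heading unindented; the sketch below switches to the blank-line style instead (the exact amount of \parskip is a matter of taste, and one line's worth is just my assumption):

```latex
\documentclass{article}

% Blank line before each paragraph, no indentation.
% Delete these two lines to get the default indented style.
\setlength{\parindent}{0pt}
\setlength{\parskip}{\baselineskip}

\begin{document}
\section{Introduction}
Do you know that NP = P?

Well, neither do I.
\end{document}
```

Either way, the decision lives in the preamble, so it is made once for the whole thesis.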

Splitting Infinitives

As a general rule, it is a heinous practice to ever split an infinitive. ;-) In many languages other than English, it is morphologically impossible to split infinitives because each infinitive is a single word.

However, there are occasions on which it is appropriate to break this rule. For example, rewriting the Star Trek catch-phrase `to boldly go where no man has gone before' to avoid the split infinitive would rob it of intensity.

For a discussion of rules that can be broken sometimes, see Chapter 11 of:

Steven Pinker. The Language Instinct. Penguin, 1994.

I read this book in the summer of 1997, BTW, and thoroughly enjoyed it. I have no hesitation in recommending it, especially if you are interested in linguistics.

Sentences Shouldn't End with a Preposition

This is another example of a rule which should be broken on occasions, especially in spoken rather than written English (which is more conservative): `Which town did you go to?' `To which town did you go?'

In General

Be consistent, unless it affects clarity!
Make decisions and stick to them!
If you're not sure, look it up in one of the books cited above!

Next Week

We will look at how to compose sentences, how to order words in a sentence, how to build paragraphs out of sentences, and how to build sections out of paragraphs. So we're moving towards the strategic level, although I consider these issues to be at the tactical level, still. In my book, the strategic level is about how you plan the thesis structure in terms of the contributions, the points which add up to those contributions, and the dependencies (pre+postconditions) between those points, which determine the order.

CS710 is thus taking a bottom-up approach to teaching thesis-writing, whereas I am taking a top-down approach to writing my thesis, planning it at the strategic level before writing the actual text of the thesis. Some people prefer to just dive into writing the text, with perhaps only a list of the chapters and sections of the thesis, but the pitfall with this is that it tends to lead to a badly-structured, unfocused thesis with the points in the wrong order (because the dependencies haven't been planned out), which may have to be radically rewritten.

Week 3 (21st February 2001)

This seminar covered what I would regard as the upper tactical level - a level up from last week, but below the strategic level. This seminar concerns sentences and paragraphs.

The sentence level is especially difficult for non-native speakers of English, when they try to translate thoughts from their mother tongue to English. For example, the German expression "seit fünf Jahren" translates literally to "since five years", but we say "for five years" (it's the same for Dutch).

Useful books for sentence-level and paragraph-level considerations are:

Justin Zobel. Writing for Computer Science. Singapore: Springer, 1997.
Linda Flower. Problem-solving strategies for writing. 4th edition. Fort Worth: Harcourt Brace Jovanovich, 1993. (This used to be the set book for CS710 - I read it in 1998, but I don't think it affected my writing in any way! ;-))

The Sentence Level

There are no hard-and-fast rules for constructing sentences - only guidelines/heuristics/rules of thumb. (Actually, there are some hard-and-fast rules - for example, a sentence must have a verb - but these are at the grammatical level rather than the upper tactical level.)

In general, you should do what your supervisor tells you, as it's much harder for a supervisor to support someone who rejects their advice. Bear in mind, however, that times have moved on since your supervisor was a student!

Rules of thumb for constructing sentences:

Keep it simple! The longer a sentence is, the harder it is to keep control of - I must admit I have a habit of writing very long sentences; in fact, I can go on and on, not giving the reader a chance to pause for breath (although I suppose they can pause for breath because they're reading rather than speaking) - especially for non-native English speakers. The trouble with long sentences is that they tend to make the reader wait for a long time, and have to put a lot on their mental stack - the previous sentence is a prime example of this. If you write a long sentence. Try breaking it into two sentences, and see what happens (it won't always work! ;-)). I think that the pitfall with writing short sentences. Is that you lose the connections that intra-sentence punctuation provides.
The active voice (e.g. "The code-understanding system extracts higher-level knowledge") is preferred to the passive voice ("Higher-level knowledge is extracted (by the code-understanding system)"). So the passive voice either loses impact by being too wordy, or it avoids assigning responsibility (e.g. "Mistakes were made"). See Strunk for more on the issue of active v passive voice.
`I' versus `we'. I never use `I' in serious academic writing, because for some reason I don't understand, a lot of people consider it `not kosher' (and some journals won't accept it). I try to avoid `we' for the same reason, but `we' seems to be more acceptable than `I'. Anyway, `we' in academic writing means you and the reader, and is used for actions which can include the reader (e.g. "We can now conclude..."), whereas `I' is for when you personally did something (e.g. "I ran these five experiments"). In that case I would use "the Author" instead. Of course, avoiding the use of `I' and `we' often tends to force you to use the passive voice (e.g. "It can be concluded...").
The Given before the New (Given>New). Start a sentence with information related to what the reader knows already, then give new information. This helps to build up your argument in the mind of your reader. For example, in an article entitled "Buying a Turkey", Flower writes, "When buying a turkey, you get more value for money if you buy a whole bird than if you buy a boned, rolled bird. If you buy a whole bird, ...". The given>new rule is not something that comes naturally because you already know both the given and the new. So you should put yourself in the mind of your reader (this relates to the need to write explicit preconditions when planning your thesis at the strategic level).

A further issue is where to place inserted phrases:

At the beginning: "After he left, we went for a walk"
At the end: "We went for a walk after he left"
In the middle: "We, after he left, went for a walk" (unorthodox, but can be effective)

The Paragraph Level

One idea per paragraph is the basic guideline. I personally don't like very long paragraphs - if I find I have written a long paragraph, I look for a way to split it. But don't use anaphora ("he", "she", "it", "this", "they", etc.) when referring to something in another paragraph, because a paragraph should be a self-contained chunk that the reader can break off from, and understand the next paragraph based on their understanding of the previous paragraphs rather than the syntactic details of those paragraphs.

There are three types of paragraph that readers are used to, expect to see, and find easier to understand:

Topic-Refine-Illustrate (TRI): Introduce a topic (in the first sentence of the paragraph), refine that topic, then illustrate it. Optionally, such a paragraph may end with a Conclusion.
Statement-Consequences: Make a statement, then list the consequences.
List. e.g. By revealing what is on his mind, Soros has placed himself in triple jeopardy. His profits depended on him "staying ahead of the curve" (i.e., the crowd). Now, he admits, "the glory days are over: Too many people have read my books, and I lost my edge." The Quantum Fund has been turned into a "more conservative vehicle." His economic theorizing exposed him to academic attack. His policies for world improvement were dismissed as naive or utopian by critics who saw themselves as hardheaded realists. [The World on a String, Robert Skidelsky - on today's handout]
I find this paragraph very difficult to read because it's hard to identify the three points. It's much clearer to write sentences beginning "Firstly, ...", "Secondly, ..." and "Finally, ...". By far the best option is to rewrite list paragraphs using actual bullet points, which makes them much more readable.

Criteria for a Distinguished Dissertation

[from http://www.cphc.org.uk/dissertations.html]

To be considered, a dissertation should:

make a noteworthy contribution to the subject,
reach a high standard of exposition,
place its results clearly in the context of computer science as a whole, and
enable a computer scientist with significantly different interests to grasp its essentials.

The Abstract of a PhD Thesis

The abstract of Georgios Paliouras's PhD thesis, Refinement of Temporal Constraints in an Event Recognition System using Small Data Sets (1997, supervised by David Brée), was given as a handout today.

We were asked to read this abstract in class before discussing it with respect to the sentence and paragraph levels. The main thing I noticed as I did so was that the paragraphs seemed to be of type Topic-Refinement (there was nothing I could see that appeared to be Illustration).

We analysed the first paragraph in class:

> The central aim of this thesis is to develop novel approaches to the representation and the
> refinement of event recognition models.

Who are you writing for? In general, you are writing for a scientific audience. Your thesis should be understandable by a computer scientist who is not a specialist in your area.

Note that this thesis is concerned with recognising events, not with classifying events (cf face recognition/classification).

In this context, refinement means learning the parameters of the model, but a general computer scientist wouldn't know this!

He writes "novel approaches", as in two of them! In fact, one subsumes the other, but this is not stated here.

There's no mention of evaluation, or of small data sets (which is a key feature of this work - it's in the thesis title).

> The event recognition system is viewed as a temporal expert system, which searches for
> interesting patterns in a stream of temporally indexed data.

This sentence uses the passive voice.

Given>New: "event recognition system" is given in the title.

"viewed as a temporal expert system": it isn't a temporal expert system, so why not just drop this phrase?

The word "interesting" is redundant.

This sentence should be the first sentence, and so the existing first sentence - the claim - should become the second sentence.

i.e. The event recognition system searches for patterns in a stream of temporally indexed data. A novel approach to the representation and refinement of event recognition models is presented.

> The format of the input stream is unusual in comparison to standard work on event recognition,
> such as speech and sound recognition. It consists of time-stamped events, rather than a set of
> signal properties measured at fixed time intervals. This format has only recently been studied
> in the area of temporal event recognition.

Given>new: it should say what the standard usually is (analogue), and then what the new format is.

The World on a String

The second side of today's handout was a review by Robert Skidelsky of a book called Open Society: Reforming Global Capitalism by George Soros. We studied the first paragraph as an example which makes a big impact, reads wonderfully, is exciting and catchy, is written by an academic, and makes good use of the given>new rule:

> George Soros is the best-known financial speculator of our time, godfather of hedge funds,
> those fast-moving and largely unregulated raiders in the corporate jungle that make their killings
> from fluctuations in the prices of stocks, commodities, currencies.

Apart from the fact that I don't like the subject matter, I think this sentence is too long. It also needs to have the word "and" before "currencies".

The good points about it are:

it doesn't assume the reader knows who George Soros is;
it uses `exciting' words such as "godfather", "raiders", "jungle" and "killings";
the use of "those" rather than "which are" is unorthodox and extremely effective.

> When he writes books readers might reasonably expect tips on how to make money.
> They will be disappointed.

This is the killer sentence. There's a lot to be said for making the key sentence short and strong like that.

> Soros's ambitions are altogether more exalted. Having become a billionaire, he has set himself
> up as the philosopher and statesman of global capitalism, tirelessly telling the world that it now
> needs to remove the ladder by which he himself climbed to fame and fortune. On January 2 he
> was reported predicting a "hard and bouncy" landing for the US economy.

How do we start writing like that? (Well, I wouldn't want to write like that, but P.G. Wodehouse springs to mind as an author whose writing style I admire.) We should read good literature (novels, as well as (of course!) scientific literature), so that the style rubs off. "If you read better, you'll write better" [David Brée]

Next Week: Strategic Considerations (at last!)

How to construct your whole thesis. David Brée has just uploaded some strategic-level hints by Aaron Sloman, which are very AI-like. We should read them by next week's seminar.

Week 4 (28th February 2001)

The mailing list for this course has at last been set up: email sent to the list address should reach everyone in the class. Let's get some added value out of this course by discussing it by email between seminars! :-) :-)

Those who are writing PhD as opposed to MPhil theses, note that the CS710 thesis presentations begin in two weeks (i.e. Week 5 will be the last teaching session). All of the PhD students will be expected to present before the MPhil students (in both senses of the word `before'! ;-)).

Ian Horrocks's PhD Thesis

We went over the abstract and table of contents of this thesis in today's seminar, thereby covering his thesis at a level of detail commensurate with one hour.

Ian Horrocks's PhD thesis:

won the departmental Best Thesis prize in 1998;
was nominated for a CPHC Distinguished Dissertation, getting as far as the shortlist;
combines theory and practice (he implemented a system, which entails engineering as well as theoretical considerations);
is only 170 pages long (which is good!).

I've had dealings with Ian Horrocks before, as he gave guest lectures in CS2422 and CS3411 in 1996, when I was a BSc student and he was a PhD student here - he's now a lecturer in this department!

I also selected his PhD thesis as an example when I was trying to derive my thesis structure: I took three or four PhD theses, read their first and last chapters to analyse what their contributions were, and analysed how their contributions were reflected in the structure of their thesis (as given by the table of contents). I concluded from this exercise that most successful PhD theses tend to have an introduction chapter, one or two background chapters, a chapter for each contribution, and finally a conclusion chapter. None of them adopted an implementation-oriented thesis structure (my first attempt at my PhD thesis plan was organised in terms of the implemented systems; I had to radically rewrite it to make it contribution-oriented).

Ian Horrocks's Thesis Title

Optimising Tableaux Decision Procedures for Description Logics

This sets up an expectation that the abstract will explain what tableaux decision procedures and description logics are, for the benefit of non-formal-methods computer scientists.

Ian Horrocks's Abstract

See the BCTCS page (theoretical computer science) for a great example of a readable abstract.

A Distinguished Dissertation must be readable outside its own area (an ordinary thesis need only be readable within its own area, except presumably for my thesis, as comparative code understanding isn't an established area! ;-)).

Anyway, let's analyse Ian Horrocks's abstract, paragraph by paragraph (bold indicates the bits that David Brée highlighted in class, and the subheadings are his):

Explanation of area

> {Paragraph 1} Description Logics form a family of formalisms closely related to semantic
> networks but with the distinguishing characteristic that the semantics of the concept description
> language is formally defined, so that the subsumption relationship between two concept
> descriptions can be computed by a suitable algorithm.

This says what DLs are, assuming that the reader knows about semantic networks. It would have been nice if it had also said what subsumption is (it's the DL community's term for the a-kind-of, or superclass/subclass, relationship, e.g. person subsumes lecturer).

Problem

> Description logics have proved useful in a range of applications but their wider acceptance has
> been hindered by their limited expressiveness and the intractability of their subsumption
> algorithms.

This tells you what the problem is that he's going to tackle.

Claim + Method used

> {Paragraph 2} This thesis investigates the practicability of providing sound, complete and
> empirically tractable subsumption reasoning for a Description Logic with an expressive
> concept description language. It suggests that, while subsumption reasoning in such languages is
> known to be intractable in the worst case, a suitably optimised algorithm can provide acceptable
> performance with a realistic knowledge base. This claim is supported by the implementation and
> testing of the FaCT system

This paragraph claims to have solved the problem specified in the first paragraph, and then supports that claim by saying what the method of solution is.

Method + Proof

> {Paragraph 3} FaCT is a Description Logic classifier for an expressive concept description
> language which includes support for both transitive roles and a role hierarchy.

This defines what FaCT is.

> A tableaux calculus style algorithm for subsumption reasoning in this language is presented
> along with a proof of its soundness and completeness. The wide range of novel and adapted
> optimisation techniques employed by FaCT is also described and their effectiveness is evaluated
> by extensive empirical testing using both a large realistic knowledge base (from the GALEN
> project) and randomly generated satisfiability problems.

Now we're into the proof of the claim (evaluation).

> These tests demonstrate that the optimisation techniques improve FaCT's performance by
> at least three orders of magnitude, and that as a result, FaCT provides acceptable performance
> when used with the GALEN knowledge base.

This is the nitty-gritty part of the claim. The paragraph tells you how the claim is substantiated.

Usefulness (the hard sell)

> {Paragraph 4} The work presented in this thesis should be of value to both users of Description
> Logics, to whom the FaCT system has been made available, and to implementors of Description
> Logic systems, who will be able to incorporate some or all of the optimisation techniques in their
> algorithms. The optimisation techniques may also be of interest to a wider audience of Automated
> Deduction and Artificial Intelligence researchers.

This paragraph says for whom this work is useful (very important for a PhD), including the wider audience. The paragraph also contains a hidden claim, namely that the method will be good for other DLs.

An idea of my own that I should mention here is that there are clearly two types of evaluation that you need to do:

Intrinsic evaluation: Does your system work? How do you measure its performance?
Extrinsic evaluation: Who are your (potential) users? To whom would this work be useful?

This is an excellent abstract. Abstracts don't have to be like this, but it's a very good framework to follow!

Notice how, just as the thesis title set up expectations of the abstract, the abstract sets up expectations for the structure of the thesis. It tells you what chapters you can expect to find in the thesis, namely:

a theory chapter (for the soundness and completeness proofs);
description of the FaCT system;
experimental results.

In fact, a good abstract tells you almost everything that's going to be in the thesis!

I think this was a very useful strategic-level exercise, but it lacked emphasis on the contributions of the thesis. Ian Horrocks hasn't explicitly stated that the contributions are x, y and z - you've got to dig them out of the abstract and the table of contents (I also read Chapter 1 and the last one in order to identify the contributions). Be very clear about listing your contributions, because that's all the examiner cares about at the end of the day! It makes the examiner's job harder, and makes them more likely to bounce the thesis, if they have to dig deep for the contributions.

Ian Horrocks's Table of Contents

First, we looked at just the chapter titles:

Introduction
Formal Foundations of Description Logics
Tableaux Algorithms
The ALCHR+ Description Logic
Optimising Tableaux Algorithms
The FaCT Classifier
Test Methodology and Data
Discussion

The table of contents should make clear the distinction between background material and your original work. In this case, the boundary is between Chapters 3 and 4.

Never call a chapter "Literature Review" - it's very boring! ;-)

It's naughty to use acronyms in the table of contents, but at least he says that ALCHR+ (sorry, no fancy fonts! ;-)) is a description logic! (Chapter 4 title).

You don't get a thesis through by making it longer. You get a thesis through by making it clear.

Nowadays, you can deposit stuff on the Web rather than putting it in appendices (Ian Horrocks's thesis has one appendix, "FaCT Reference Manual"). For my MPhil thesis, for example, I put my program code and other gubbins that wouldn't have physically fitted in my thesis, but which I wanted to include for completeness, on a website which I referred to in the thesis. I'm not sure if I can do that for my PhD if I'll be leaving, though.

We then looked at each chapter in detail (only in the table of contents, not the chapters themselves!).

Chapter 1: Introduction

1.1 Description Logics
1.2 DL Applications
- 1.2.1 GALEN and GRAIL
1.3 Subsumption Reasoning in DLs
1.4 ALCHfR+ and the FaCT System
1.5 Thesis Outline

Chapter 1 must describe the area, state the problems to be solved, and give applications (for DLs, the problems are in their application). Bear in mind that Chapter 1 is crucial for extrinsic evaluation, i.e. evaluating your work in the context of a specific application.

The contents for Chapter 1 don't say what ALCHfR+ is - a table of contents should be understandable as a stand-alone document.

It's also not clear to me that the aims and objectives are in there - I assume they're in Section 1.5 (I would have one section for declaring the aims and objectives, and another to outline the following chapters in the thesis).

It's very important to clearly and correctly state your aims, objectives, and contributions to knowledge in Chapter 1, and to link back to them in the final chapter to discuss how well they have been achieved. Your claims must be strong enough to get a PhD (or MPhil) of course, but don't claim more than you have achieved or you will fail.

It's particularly important to plan Chapter 1 properly before you write it up, more so than the other chapters. Chapter 1 is where you lay down the philosophy of your thesis, and it's important to get the philosophy absolutely correct, with no contradictions (this is also crucial in the PhD viva, BTW, or they'll run rings around you).

Chapter 2: Formal Foundations of Description Logics

2.1 Syntax
- 2.1.1 Formal Syntax
2.2 Model Theoretic Semantics
- 2.2.1 Concept Expressions
- 2.2.2 Role and Attribute Expressions
- 2.2.3 Introduction Axioms
- 2.2.4 General Terminological Axioms
- 2.2.5 Functional and Transitive Role Axioms
2.3 Subsumption and Classification
- 2.3.1 Subsumption in General Terminologies
- 2.3.2 Subsumption in Unfoldable Terminologies
2.4 Theoretical and Implemented DLs
- 2.4.1 The ALC Family of DLs
- 2.4.2 Implemented DLKRSs

Chapter 3: Tableaux Algorithms

3.1 Tableaux Subsumption Testing
3.2 General Method
- 3.2.1 A Tableaux Algorithm for ALC
3.3 Dealing with General Terminologies
- 3.3.1 Meta Constraints
- 3.3.2 Blocking
- 3.3.3 Semi-unfoldable Terminologies

Chapters 2 and 3 are background (literature review) chapters. Where do your ideas originally come from? What are your intellectual roots? It's important to recognise your roots in others' work and highlight them - it's a major part of deriving your contributions. It's crucial to set your work in relation to others'. A literature review must critique the state of the art, but must show how each reference is relevant to you, rather than just discussing a list of references for their own sake.

Ian Horrocks's thesis combines two ideas to improve performance, hence the two background chapters above. Such cross-fertilisations are great for deriving novel contributions, but you must explain each area to people in the other area.

Chapter 4: The ALCHR+ Description Logic

4.1 Transitive Extensions to ALC
4.2 A Tableaux Algorithm for ALCHR+
- 4.2.1 Constructing an ALCHR+ Tableau
- 4.2.2 Soundness and Completeness
- 4.2.3 Worked Examples
4.3 Internalising GCIs
4.4 ALCHR+ Extended with Attributes

Chapter 5: Optimising Tableaux Algorithms

5.1 Reducing Storage Requirements
5.2 Lazy Unfolding
5.3 Normalisation and Encoding
5.4 GCI Absorption
5.5 Semantic Branching Search
- 5.5.1 Boolean Constraint Propagation
- 5.5.2 Heuristic Guided Search
- 5.5.3 The Optimised Search Algorithm
5.6 Dependency Directed Backtracking
5.7 Caching
- 5.7.1 Using Caching in Sub-problems
5.8 Interactions Between Optimisations

Use lots and lots of examples. Examples are terribly important to illustrate your work, and make it much easier to understand algorithms. You want to minimise the strain on your reader. Don't expect the reader to just read the algorithm and understand it; expect them to use the algorithm to check the examples.

You can put the examples first and then generalise, or generalise first and then give examples. I think it's almost always better to put the examples first and then generalise (Ian Pratt is a great example of this), although it can be done the other way round, particularly for abstract concepts.

Put boring, tedious algorithms/proofs in appendices, and only put exciting algorithms/proofs that you are proud of in the main text.

Chapter 6: The FaCT Classifier

6.1 FaCT Overview
- 6.1.1 Concept Description Syntax
- 6.1.2 Knowledge Base Syntax
6.2 Pre-processing Terminological Axioms
- 6.2.1 Terminological Cycles
- 6.2.2 Pre-processing Roles and Attributes
- 6.2.3 Absorbing GCIs
- 6.2.4 Normalisation and Encoding
6.3 Classifying the Knowledge Base
6.4 Configuring FaCT
- 6.4.1 Configuring Optimisations
- 6.4.2 Configuring Reasoning Power

FaCT is his implemented system. Chapters about your implemented system are very dangerous, because you know too much about your system for the good of the reader, and there's a tendency to want to give too much detail about the implementation. You should just give the essence, emphasising contribution over implementation. If you write too much detail, you may have to move it to appendices (I had to consign copious chunks of my MPhil thesis to appendices, because the main text was originally 240 pages long, which is ridiculous for an MPhil - I have a compulsion to `brain-dump' everything I know into a thesis, which is something I'll have to avoid in my PhD thesis).

The purpose of describing your implementation in the thesis is to help the reader, not to defend your program (if you are challenged to do so, you can refer to appendices or on-line code).

Chapter 7: Test Methodology and Data

7.1 The GALEN Ontology
- 7.1.1 Translating GRAIL into ALCHfR+
- 7.1.2 Test Knowledge Bases
7.2 Pre-processing Terminological Axioms
- 7.2.1 Percentile Plots
- 7.2.2 Data Gathering
- 7.2.3 System Specification
7.3 FaCT Overview
- 7.3.1 Testing Optimisation Techniques
7.4 Comparing FaCT and KRIS
7.5 Solving Satisfiability Problems
- 7.5.1 Using the Giunchiglia and Sebastiani Generator
- 7.5.2 Using the Hustadt and Schmidt Generator
7.6 Testing for Correctness
- 7.6.1 Hard Problems
- 7.6.2 Classification
- 7.6.3 Satisfiability Testing
7.7 Summary
- 7.7.1 GCIs and Absorption
- 7.7.2 Backjumping and MOMS Heuristic
- 7.7.3 Other Optimisations

Choose appropriate test data - if possible, use data that other people have tested on, so that you can compare your results with theirs. Many domains have standard sets of test data.

Don't include all test cases in your thesis - select the important points.

Chapter 8: Discussion

8.1 Thesis Overview
8.2 Significance of Major Results
- 8.2.1 Satisfiability Testing Algorithms
- 8.2.2 Optimisation Techniques
- 8.2.3 Empirical Evaluation
- 8.2.4 The FaCT System
8.3 Outstanding Issues
- 8.3.1 The Spectre of Intractability
- 8.3.2 GALEN and GRAIL
- 8.3.3 Tools and Environments
8.4 Future Work

The conclusion is not just a summary - you need much more than that for a PhD! Section 8.1 is only one page long (if you've written a summary of each chapter in that chapter, you can just refer back). Section 8.2 is the most important section of the thesis: the significance of your results. Future work is a must, too, and it's useful to point out the outstanding issues to help future researchers in the field.

General Points

Ian Horrocks's PhD thesis is an excellent example - everything unfolds naturally, and it has an elegant simplicity about it.

In the thesis, try not to refer forwards, only backwards - except in the overview of the thesis at the end of Chapter 1.

David Brée says you don't need to go beyond the third level of subsections in the table of contents (e.g. 3.4.2), but I think every numbered subsection should be included.

Read your examiners' writing, if only to see what their style is - they are bound to accept writing that is done the way they do it themselves.

Next Week

John Bainbridge will give a presentation on how he won the Best Thesis prize last year. It will be the last teaching session, and then the thesis presentations will begin on 14th March.

Week 5 (7th March 2001)

John Bainbridge of the AMULET Group won the departmental Best Thesis prize for 2000, and his thesis has been nominated for a Distinguished Dissertation. Today in CS710 he gave a talk on how he did it.

His thesis is about asynchronous design, and connecting modules together in particular. I understand that he invented a bus called MARBLE to do this.

He didn't expect to win Best Thesis, let alone be nominated for a Distinguished Dissertation. "You go in wanting a degree, and that's all you go in wanting!"

Writing my PhD Thesis (John Bainbridge)

Know your tools and choose one! You don't want to change to a different one in the middle of writing your thesis, because converting what you have written so far would be very tedious! ;-)

LaTeX: An unorthodox and extremely effective text preparation system which produces printable documents from plain-text source files with embedded formatting commands. It doesn't have a GUI, but being able to edit the text files anywhere is a huge advantage IMO. There's even a special University of Manchester thesis class! :-)
Lout: Like LaTeX, but generates PostScript files directly, whereas LaTeX generates DVI files which have to be converted.
FrameMaker: An advanced word processing/DTP package with an almost WYSIWYG GUI. Has a wide range of features, but can be quite painful and fiddly to use in my experience, and you can't just go away and edit the files using a text editor. I used it for my MPhil thesis, but have decided to use LaTeX for my PhD.
Micro$oft Word: ugh!

Time:

Don't be surprised if you're still trying to get practical results while you're writing up the thesis, but bear in mind it's more difficult to keep the thesis coherent if you interleave practical work with writing up.
The pressure builds towards the end, but stick with it!
Make the most of it and take your time, even if you overrun. Don't rush the thesis just to get it finished within three years (John Bainbridge had a six-month overrun).

Keep it short and concise!

Don't use any more words than you have to!
Figures help (remember the old saying, "A picture is worth a thousand words")
John Bainbridge's thesis has 183 pages of which 155 constitute the main text, which is divided into ten chapters (which is a lot for a thesis, but you have to divide it into chapters in the way that fits best), with 67 figures and 14 tables.

Tell a coherent story. Keep the reader interested. Start with what the story is, and why you did it. David Brée suggests thinking of your thesis as a detective story, with suspense to be resolved later. If you confront the reader with such tension, it's much more interesting to read than pure slog-work! :-)

Make sure your chapters relate to one another, so that the reader can see the big picture throughout the thesis (e.g. John Bainbridge adopted a layered model). Find a nice way of presenting and summarising your story. It's a good idea to include an actual diagram in Chapter 1 to show how the chapters stand in relation to each other.

Keep it structured, make the chapters fit together nicely. John Bainbridge structured his thesis as follows:

Chapter 1: Introduction
Chapters 2,3: background/literature review
Chapters 4,5,6,7: the layers (theory; discuss alternatives; design decisions)
Chapter 8: the exemplar he actually implemented, that demonstrates the principles (MARBLE)
Chapter 9: Evaluation
Chapter 10: Conclusion

Have an introduction at the start of each chapter, situating it within the thesis, and a conclusion at the end of each chapter which links to the next chapter.

Literature review: Don't cover everything, just what you need, building up to why you did it your way.

Evaluation is hard:

You need to convince the examiner that it works.
John Bainbridge resorted to annotating tables.
Compare your work to others'. Does your system do better than other systems? Compare your work to industry standards if applicable. If there's no system that does the same thing as yours (I'm in this position myself), then you have a novel contribution, but you also have the problem of what systems to compare your work to. Look for systems that do similar things, or consider what would happen if you adapted your system to do what they do or vice versa. You have to convince the examiner that it's a fair comparison.
Criticise yourself! How could you do better?
If you have the luxury to try out different designs and see which is best, then do so.
Say in Chapter 1 how you're going to evaluate it. It's important that the examiners have your evaluation criteria in their mind, otherwise they will tend to evaluate it by their own, probably inappropriate criteria.

Conclusion:

Not just a summary! Also a criticism of your work (more evaluation). John Bainbridge's conclusion chapter is about eight pages - short, but very difficult to write!
I presume that his Chapter 9 corresponds to what I call intrinsic evaluation, and that he covered extrinsic evaluation in Chapter 10. He did say he had problems with what to put in which of these two chapters.
Bring the strands of your thesis together.
Sell your argument! (one last time)
Don't introduce new material here.
Think of our subject as a big tree, with Computer Science at the top, and your work (e.g. MARBLE) as one of the many leaf nodes. Try to go up the tree a level at a time, discussing your work at that level, until you reach a level that's too general for you to say anything. John Bainbridge went as far up as Computer Architecture. The higher you can go (legitimately - don't force it), the better. A key feature of a Distinguished Dissertation is that the contributions are very general, whereas for ordinary PhDs they can be more specific, and even more specific for MPhils.

Don't be surprised if your final structure is dissimilar to your initial plan. Therefore, don't be too rigid initially; feel free to change.

It's a good idea to get others to proofread your thesis, but make sure you thoroughly proofread it yourself! You understand your own work better than anyone, so in one way you're in a better position to see the weaknesses, but on the other hand, you're the worst judge of how intelligible your thesis is to someone reading it `cold'. You should show each chapter to your supervisor as you write it. If you work with a peer, you should proofread each other's writing.

If you have time, it's helpful to build:

A glossary of terms, with references to where each term is formally defined. This is best done as you go along.
An index. This is best done at the end, and usually takes a week.

Next Week

The CS710 presentations should begin next week, but if everyone refuses to go first, we'll either look at another thesis like we looked at Ian Horrocks's in Week 4, or we'll look at Ian Horrocks's Chapter 1 in detail.

Week 6 (14th March 2001)

Today, we looked in detail at Chapter 1 of Ian Horrocks's PhD thesis, having read this chapter before the seminar. You really need to have a copy to hand as you read these notes, which you can download here.

"It's by criticising other people's work that you learn to see how other people look at your work." (David Brée)

In the seminar, we discussed:

the structure of the chapter as a whole;
the construction of each section.

(We were also going to look at the use of English in some paragraphs - how to make them more `punchy' - but we didn't have time.)

BTW, it's a common convention to use capital letters for software systems, e.g. FaCT.

Contents of Chapter 1

1.1 Description Logics
1.2 DL Applications
- 1.2.1 GALEN and GRAIL
1.3 Subsumption Reasoning in DLs
1.4 ALCHfR+ and the FaCT System
1.5 Thesis Outline

The first chapter of a thesis should contain the following:

The problem you're going to solve;
How you're going to solve it;
How you're going to evaluate (test) your solution (Ian Horrocks's Chapter 1 says nothing about the experiments he did, at least not on the surface).

It's time to get out your copy of Ian Horrocks's Chapter 1! :-)

Section 1.0 (the text after the chapter heading and before Section 1.1)

The introductory text sets up the reader's expectations of what's going to be in the chapter.

> {Paragraph 1} Description Logics form a family of formalisms...

This paragraph sets up the problem to be solved. It starts by giving some background about DLs, and then says what the problem is.

> {Paragraph 2} This thesis investigates...

This paragraph states the major claim of the thesis. In fact, in my view it continues saying what the problem is from the previous paragraph, and then states the claim: "a suitably optimised algorithm can provide acceptable performance with a realistic knowledge base."

> {Paragraph 3} This claim has been strongly supported by...

This paragraph gives the support for the claim, and lists the contributions of the thesis up front. It mentions evaluation.

Section 1.1: Description Logics

This section gives the background to DLs in some detail, defining terms, establishing preconditions for later sections.

Look at the last paragraph of each section for the `take-home' message.

The last paragraph of Section 1.1 (This thesis addresses...) is a statement of the thesis claim, but it doesn't really belong in Section 1.1! It's the penultimate paragraph (The expressiveness of the concept description language...) that contains the take-home message, and really should be the last paragraph itself! It gives the conclusion of the section (problems with existing systems) - the whole section should substantiate that claim.

We reverse-engineered the following plan from the text of Section 1.1:

{Paragraph 1} Many computer applications... - what DLs are, why they are needed
{Paragraph 2} Description Logics (DLs)... - where they come from/characteristics
{Paragraph 3} The use of DLs... - history of the field - lists past DLs
{Paragraph 4} DL Knowledge Representation Systems (DLKRSs)... - example of what they do
{Paragraph 5} A typical DLKRS will provide a range of reasoning services... - what they actually do (TBoxes, ABoxes)
{Paragraph 6} The expressiveness of the concept description language... - shortcomings/problems
{Paragraph 7} This thesis addresses... - thesis claim

Reverse-engineering a plan from the text is a very valuable skill to have, as it's much easier to appreciate the strategic level from a plan than from the text itself.

For example, we can see from the plan that Paragraphs 3 and 7 feel really out of place. What's Paragraph 3 doing here? It's just a list of DLs, with no comments. It should either be moved, expanded (e.g. Why all these different DLs? How do they differ?), or just deleted. Paragraph 3 disrupts the flow of Section 1.1; it doesn't fit in with what DLs are. A couple of them are mentioned later in the chapter, but the reader isn't likely to remember them from here!

You should read your own thesis like this, and other people's theses - extracting structure from the text. What does each paragraph say, why is it there, what are the connections between paragraphs? The last sentence should be a culmination.

Section 1.2: DL Applications

This is actually the weakest section of the whole thesis. Section 1.2.0 is just a list, with no conclusions of any kind. It doesn't say which systems are toy systems and which are actually fielded. The whole Section 1.2 is very minimal, with no criticism at all. It's not bad enough to fail the thesis, but it could be improved! ;-)

You need to impose your authority on your area - show that you're the expert! This section fails to do that.

Anyway, it's important to establish your application context in Chapter 1 - it's a vital part of extrinsic evaluation.

Section 1.2.1: GALEN and GRAIL

{Paragraph 1} A particularly promising application domain... - application area; benefits (in the particular application area)
{Paragraph 2} However, the usefulness... - shortcomings - expressiveness
{Paragraph 3} To satisfy these requirements - shortcomings - incompleteness

The final paragraph is about the limitation of GRAIL - not of DL applications in general. There's no take-home message - in fact, the last paragraph of Section 1.1 would be much better at the end of Section 1.2! You have to sum up the state-of-the-art in the application area. There's a danger that you know more about local implementations than you do about the rest of the world!

It's also not obvious from the text whether Ian Horrocks wrote GRAIL himself (he didn't), although the fact that the reference [GBS+94] dates from three years before he submitted his thesis is a hint. However, he did show GRAIL's subsumption reasoning to be incomplete [Hor95].

Section 1.3: Subsumption Reasoning in DLs

David Brée just skipped over this section in class, saying it was excellent and preferring to focus on the weaker parts, but I have some comments of my own to make!

> {Paragraph 2} The advantage with these algorithms...

This paragraph discusses the limitations of previous approaches, which is what opens up room for you to make a contribution.

> {Paragraph 3} An alternative approach is to transpose...

Discuss the alternative approaches. There will often be tradeoffs (in the case of DLs, there are tradeoffs between expressiveness, soundness, completeness, and tractability). A good PhD contribution can often be made by combining the best features/properties of these approaches. For example, Ian Horrocks's FaCT system combines an expressive DL with proven sound and complete subsumption testing, which is computationally tractable in practice (although there's always the theoretical possibility of combinatorial explosion, because sound and complete subsumption testing for such an expressive DL is intractable in the worst case, being at least NP-hard).

You must compare your system to others, with respect to various capabilities/properties. It helps to summarise this as a table - I have derived the following table from Section 1.3 to illustrate the idea:

Method                 expressive?  sound?  complete?  tractable?
structural algorithms  yes          yes     no         yes
satisfiability         no           yes     yes        no
FaCT                   yes          yes     yes        yes

Section 1.4: ALCHfR+ and the FaCT System

This is another weak and very minimalistic section - it says nothing about how he's going to do the evaluation! It merely says that the ALCHfR+ description logic is used as a test-bed for the subsumption-testing algorithm (intrinsic evaluation, I think).

It does mention the application context (GALEN) again, which is good (IMO).

Section 1.5: Thesis Outline

Every Chapter 1 ends with a section outlining each of the following chapters, but merely listing the chapters, with a potted summary of each, is the most boring way to do it!

Try to make a story out of it (fleshed out with references to the chapters). Include the plot, and the rationale behind the thesis structure.

You should include the interdependencies, especially if you're going for a Distinguished Dissertation! You could even literally draw a dependency graph of the chapters!
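As a rough illustration of what a chapter dependency graph buys you, here is a small Python sketch. The chapter numbers and the `BUILDS_ON` table are entirely hypothetical (they are not taken from any thesis discussed here); the sketch uses the standard-library `graphlib` module (Python 3.9 or later) to derive an order in which every chapter comes after the chapters it builds on, and a second helper flags any chapter that depends on a later one.

```python
from graphlib import TopologicalSorter

# Hypothetical thesis: chapter number -> chapters it builds on.
BUILDS_ON = {
    1: [],
    2: [1],
    3: [1],
    4: [2, 3],
    5: [4],
    6: [4, 5],
    7: [6],
}


def reading_order(builds_on):
    """Return the chapters in an order where prerequisites always come first."""
    # TopologicalSorter treats the mapped values as predecessors of each key;
    # it raises graphlib.CycleError if the chapters depend on each other in a loop.
    return list(TopologicalSorter(builds_on).static_order())


def forward_dependencies(builds_on):
    """List (chapter, prerequisite) pairs where the prerequisite is not an earlier chapter."""
    return [
        (chapter, dep)
        for chapter, deps in builds_on.items()
        for dep in deps
        if dep >= chapter
    ]
```

If `forward_dependencies` returns an empty list, the numbered chapter order already respects the graph, so a reader going front to back never needs material from a later chapter.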

When I read Chapter 1 of Ian Horrocks's thesis, I tried to focus on what his contributions are, and looked for a one-to-one correspondence from the contributions to chapters (to see if he had done a contribution-oriented thesis structure). I identified four contributions:

provably sound and complete subsumption testing for an expressive description logic (Chapter 4);
optimisation techniques to make the subsumption-testing algorithm tractable (Chapter 5);
the FaCT system, as (a) a DL knowledge-representation tool, and (b) a propositional modal logic theorem-prover (Chapter 6);
evaluation through empirical testing, including detailed comparisons with other systems (Chapter 7).

Next Time

There will be no CS710 seminar next week (21st March), because David Brée is away at a conference. Thesis presentations will therefore begin on 28th March. I'm not going to embarrass students by writing up summaries of their thesis presentations! ;-) I will merely transfer any useful general points that come out of these sessions to my advice page or whatever.

Week 9 (25th April 2001)

Dr. Jacques Fleuriot, who won a Distinguished Dissertation in 2000 for his PhD thesis, A Combination of Geometry Theorem Proving and Nonstandard Analysis with Application to Newton's Principia, gave a departmental seminar on his research today, and David Brée got him to give a talk to CS710 before that, about how he wrote his thesis.

His research is about arithmetic decision procedures, temporal logic, automated reasoning, mechanical theorem proving, and Newton's Principia. His contributions are mathematical theories, and some implementation of tools. His supervisor was Larry Paulson.

On Writing My PhD Thesis (Jacques Fleuriot)

Writing a thesis is a very personal experience, so all Jacques Fleuriot could do was to share his experience with us.

It's important to keep making progress, but don't compare your progress to that of other students in your year, because everyone's different, and every research project is different.

You should gather as much material about your field as possible, to get as broad a perspective as possible. It's incredible what you can find on the Internet in a short time! Never stop reading until after your PhD viva. As your expertise grows, you can see links to more areas, you make new connections, and more literature can become relevant (even if you thought it was irrelevant earlier in your research).

Fleuriot's Rules of Thesis Writing

It always takes longer than expected!
Never put off until tomorrow what you can write today. You might lose your train of thought, as well as making less progress today than you could do. On the other hand, if you're not in a good position to write something up yet but you force it out anyway, it's hard to throw it away if it's so bad that it ought to be rewritten. In that case, it may be better to jot down your ideas in point form, and write it up properly later.
If someone doesn't understand, it's your fault, not theirs. It's very important to be clear, and to think about which things a reader might not know but you take for granted. Model the reader, and question your assumptions.
You should enjoy reading what you've written.

Fleuriot's thesis is 130 pages long, plus a 5-page bibliography, with 7 chapters, 30 figures, and a short glossary of mathematical terms. It was typeset using LaTeX, and was printed double-sided, with a font size of 10pt - double-sided and 10pt are allowed by Cambridge University, but not Manchester!

You do have to shut yourself away while you're writing a thesis. It's hard to mix practical work with writing up, because implementation is an excuse to put off thesis writing. Don't rush the thesis. Don't take a job while you're writing the thesis, and don't write papers while you're writing your thesis.

Thesis Structure

Fleuriot's thesis is written in a `detective' style, from problem to solution.

The first chapter is a non-technical introduction, which sets the scene (history of the field, motivating examples). The first chapter should ease the reader into the thesis, not scare them off! Say what the challenges are, the objectives, how the goals will be achieved. Highlight your own contributions and main research components.

You should write a thesis message, as suggested in Aaron Sloman's Notes on Presenting Theses {8.1}, which should be reflected in the title, abstract, the introduction and conclusion chapters, and the overall structure of the thesis.

The introduction and conclusion chapters both summarise the thesis, but they are very different kinds of summaries!

Give a diagrammatic overview of the thesis if possible (it doesn't necessarily have to reflect the chapters, but I think it definitely should).

        +-----+
        |     |
        +-----+
           |
        +-----+
        |     |
        +-----+
       /       \
+-----+         +-----+     +-----+
|     |         |     |-----|     |
+-----+         +-----+     +-----+
       \       /               |
        +-----+                |
        |     |----------------/
        +-----+

Fleuriot's chapters:

Introduction (10 pages)
Geometry theorem proving (intro, history, review of existing methods)
Constructing the hyperreals (Isabelle system, overview of tools/mathematical)
Infinitesimal and analytic geometry (builds on Chapters 2 and 3)
Mechanizing Newton's Principia (how Newton's terminology is translated)
Nonstandard Real Analysis (builds on Chapter 3, as a combined related work+conclusion section)
Conclusion (5 pages)

The Structure of a Chapter

Each chapter starts with some motivation, and a brief summary (not detailed contents with section numbers), to let the reader know what's ahead.

In the body of the chapter, keep things concise and clear! Explain things (especially equations) intuitively. Use diagrams and examples wherever possible. Have a smooth transition between sections to keep the flow.

Each chapter should have a conclusion at the end, in which you briefly review what's been achieved in that chapter, put the work in context, and highlight your contributions and any other important points. Is there anything you might have done differently?

Each chapter tells its own story.

The Conclusion Chapter

This should be short (Fleuriot's is five pages), and needs to be written carefully.

You need to pull the contributions of the thesis together, draw general lessons, and outline your main achievements.

You need to criticise your whole thesis, taking a step back from your work. Bear in mind that your criticism here is likely to give the examiners questions for the viva! Make sure you anticipate these questions and know the answers. Know what the weaknesses of your work are, and how they could be improved.

You need to include a short section on further work - no need to describe it in detail, otherwise it would not be further work! (In my MPhil thesis, I actually described future work in detail in an appendix, with a short Future Work section in my conclusion chapter, with references to that appendix).

Literature

Fleuriot took a `dynamic approach' to literature review, using the literature as and when it was needed, rather than doing a literature review chapter. I agree with this approach - I think it's much better to do a distributed, contribution-oriented literature review, showing its relevance to your work, rather than keeping the literature separate from your work, which tends to make the literature review unfocused and the presentation of your own work too inward-looking. Use references to support the points you make.

Fleuriot did, however, discuss previous work beforehand in one chapter. Since his work was based on several existing methods, this warranted a historical and technical discussion of previous work. It makes it easier to justify your choices.

Quotations can also lend strong support to your arguments (especially if they're from someone famous! ;-)), but do not overuse them - you need to say things in your own words.

Consider whether you have any particular audience in mind for your thesis. A PhD is expected to make a contribution to a specific area, but it also has to be accessible to computer scientists in general, especially for a Distinguished Dissertation!

The Process of Thesis Writing

Prepare yourself before you start writing up - relevant papers, notes you have made, your thesis message.

Different people need to plan at different levels of detail before they are ready to actually write something up. At the minimum, you should plan the overall organisation of the thesis (what the chapters are).

Writing and rewriting should be constant activities.

Fleuriot's supervisor read his thesis only once, when it was almost completed, and Fleuriot then incorporated his supervisor's comments and suggestions. I find this shocking - I think you need to at least show each chapter to your supervisor once it's written up.

In a typical day of thesis-writing, you should:

start by making the corrections from what you proof-read yesterday (a nice, gentle introduction :-));
continue writing from where you left off (if you get bored, you could write another bit elsewhere in the thesis, but I never do that if it's not planned);
proof-read what you've written at the end of the day.

I have just added a section to my advice page, giving my detailed thoughts about the process of thesis-writing.
