Some of the students are MPhils coming off CS700, while others are in their final year, writing up their PhDs. Some of them even did CS710 in their Masters year, and are coming back for a second dose - "It'll force me to write my thesis!" (anonymous).
The deliverables for CS710 will be the abstract of each student's thesis, the table of contents, and at least one chapter (usually the first or the last). Each student will also have to give a presentation of their thesis plan.
It seemed to me that the PhDs have already started writing their theses, whereas the MPhils (at least the September starters) haven't started yet (I know I certainly haven't - I want to get clear on the big picture of my thesis before I start actually writing it up, and I don't think I'm 100% clear yet).
As usual for a Level 7 module, the emphasis is on learning by doing - to have a go at writing our theses, and to read others' theses and try to see what's wrong with them! We should consider where we are now with respect to writing our thesis, think about where we want to be by Easter, and try to reduce the difference, rather like the General Problem Solver (GPS), an old AI system.
Rather than following a set syllabus, David Brée will try to teach what we the students want to learn. Today that question was put to the group, and discussed for the rest of the seminar.
It seems that what we want to learn about thesis writing basically falls into two categories: motivation and the architecture of the thesis.
Equally, some thesis writers are apt to write the first chapter or two and then spend months perfecting it when they should be forging ahead with the rest of the thesis! (I felt that a little bit when writing my CS3900 project report last year.)
Multi-disciplinary theses are more challenging to write than theses which concern one field, in certain ways. For example, my primary research interest is Case-Based Reasoning. I'm looking to apply that to Automatic Programming - a fairly novel cross-fertilization. Case-Based Automatic Programming will be applied in turn to a simplified, concrete exemplar: transforming constraints in information models for schema-to-schema mapping. So I guess I'll need a chapter to introduce Automatic Programming (which is the big picture of the thesis, though not of my research interests), one to introduce CBR and one to introduce the constraints problem before I get to the heart of the thesis.
Of course the University does have some regulations for theses - these are on the Web at http://www.cs.man.ac.uk/rgd/David/CS710/theses.html, but these are the rubric rather than the rhetoric.
Where the research falls on the science v engineering continuum (as discussed in CS700) also comes into play in thesis writing. Are you doing experiments? pure theory? engineering? If you are doing an `engineering' type project (i.e. building a system to do something) then you need to try to get as much science out of it as possible, by generalizing it. Research Associates in particular suffer from this problem.
People who have written a big, dirty system need to write it up coherently, as they have a tendency to keep repeating themselves.
An important issue is how much background knowledge to put in the thesis and how much to assume the reader already has. This depends on who you are writing for - in particular, who is going to be your internal examiner, so you should find out who that is going to be and find out about them (e.g. are they an expert in the field? are they a bit surreal?).
So, one issue in thesis writing is what to leave out! Some things (e.g. certain background for the benefit of the reader, as they won't want to bother going to read it from another source) could be put in appendices (a case that springs to mind is Ian Pratt's fabulous Artificial Intelligence book, which had a chapter on predicate calculus consigned to an appendix as he didn't want to disrupt the flow of his book). As for myself, I tend to put certain sordid details in appendices - things that aren't necessary to understand the thesis as a whole but I want to include for completeness (see my CS3900 project report for example).
Incidentally, the choice of internal/external examiners is an important one, as the wrong choice could be a big mistake if they are going to bounce the thesis! You don't have the right to make that choice yourself, but you can certainly influence the decision!
Another thing PhD candidates worry about is how to reassure themselves that they've got all the relevant references: that there isn't some paper out there in which someone has done basically the same work without their knowing about it, or something important that should be taken into account in their research. The short answer is: you can't. There's no way you can be sure, and you just have to hope that the examiners wouldn't find it either!
Finally, bear in mind that, although writing a thesis is an open-ended task, it's not your life's work! You have to close it off somehow, sometime, so one of the issues addressed in CS710 will be how to stop a thesis.
Hopefully, you will have a wise supervisor, but that doesn't mean they're always right, particularly on trivialities such as "don't use abbreviations" or "don't use I, you or we". Another problem is what to do if your supervisor says that each draft you show them is fine, without giving any real (negative) feedback, and you're worried that it's not fine really. A trick David Brée suggested is to throw in a completely garbled paragraph and see what happens!
Over the next few weeks, CS710 will try to address the issues that were raised in this meeting, plus two standard events:
I would also be interested to see some theses that failed, together with an analysis of what was wrong with them. I think that would help us to avoid mistakes.
Are the REGULATIONS FOR THE PRESENTATION OF THESES AND DISSERTATIONS at http://www.cs.man.ac.uk/rgd/David/CS710/theses.html all we need to know, or ought we to get a copy of the "University Regulations" from the Examinations Office?
In today's seminar, we discussed what is expected of a PhD thesis, in particular the candidate's expectations of their own thesis.
A PhD candidate expects, at a minimum, that their thesis will get them a PhD. In addition, it's nice if it's not boring!
A PhD candidate expects their thesis to be "good". This goodness boils down to two sorts of correctness:-
(There's no viva for an MPhil thesis, is there?)
Intrinsic correctness is much more important than extrinsic correctness in terms of passing the PhD examination - they're not going to torpedo you if you miss some obscure reference (and the chances are, if you haven't come across it in three years of burying yourself in this stuff, then neither will they!).
It's very important that a thesis is a ROUNDED STORY.
A thesis is, by ancient definition, a coherent argument, though the word `thesis' is often used to refer to the physical report in which the thesis itself is laid out.
A thesis is an argument that can be summarized in a nutshell. So you start by writing the Abstract, in which you make a claim (or state a problem to be solved) and then, in the body of the thesis, you prove the claim (or show how you solved the problem). As mentioned in CS700, a PhD doesn't have to be a revolution, in fact it shouldn't be unless you are outstandingly brilliant! (Most people get their PhDs by making a `normal science' contribution, and succeed through determination and perseverance rather than sheer brilliance.)
You need an examiner who is sympathetic to your approach to science.
It's important that the external examiner is an expert in the area for which you wish to be examined, especially if your work is interdisciplinary. For example, if what you're doing is a cross-fertilisation of Computer Science and Physics, then don't get an external who's a Physics specialist if your work is more of a contribution to Computer Science! I guess I have a kind of similar thing myself, as my research concerns the application of case-based reasoning to automatic programming. Many PhDs who fail come to grief on this particular reef.
On the other hand, if the examiner is not an expert in your field then they are less likely to be able to catch you out - for gaps in your literature knowledge, for example. It's nice to feel that you're the expert, when there is a dearth of expertise in the country - this is particularly the case in Case-Based Reasoning!
Bear in mind that the longer a thesis, the more difficult it is to hold it together - both the argument logically and the report physically! :-D
The rule of thumb is that it takes one working day to write (and later redraft) each page, so don't postpone starting for too long. I must confess I haven't started writing my MPhil thesis yet (I hope to have it finished by the end of September), but I should be writing a page a day by May!
Don't expect to get it right on the first draft - expect a lot of rewriting!
The important message is to allow plenty of time for writing the thesis: allocate at least 10% more time than you think you'll need! (In my experience, most tasks in life take longer than you think beforehand - I'm always having to postpone things! Getting down to them is the hardest part.)
As with most tasks where you don't know what to do, break it into smaller parts: into chapters, into sections, into subsections. Then write the little chunks of text when you get excited! (You don't necessarily have to write them in order!) Take a top-down view of each chapter, and get the general plan clear in your mind before you write the text.
I wholeheartedly agree with this approach of splitting the thesis at the top level and working out the hierarchy before writing the text (rather than just writing the whole thing from beginning to end) because I think it's more enjoyable this way as well as helping to keep it coherent. I used this approach for my second CS3411 essay and it was pure fun to write!
In her book Problem Solving Strategies for Writing, Linda Flower suggests the use of an issue tree to formulate an argument. This was illustrated in the lecture with the example of constructing magic squares (which I won't regurgitate here, 'cause I don't like it). Again, it's the idea of breaking a big problem into smaller parts when you get stuck.
A sort of multiple inheritance occurs when a problem is a cross-fertilisation of two disciplines (e.g. Physics and graphics). But again, you just break the problem into smaller parts.
There are two kinds of thesis plan you should make:
First, however, David Brée went over the points the examiners are looking for in Masters theses, which I have written down as best I can.
Does anyone have a verbatim copy of these points, or are they on the Web somewhere? If you are David Brée reading this, please give them as a handout!
1. Does the thesis show satisfactory experience of research methods as can be gained in one year?
2. Has the work been carried out in a satisfactory manner?
3. Is there a discussion of the purpose of the investigation, with reference to previous work?
4. Is the thesis satisfactorily presented, with diagrams and references? (see Nicholas Higham's book Handbook of Writing for the Mathematical Sciences)
1. Has the candidate been successful in achieving his (or her) aims and objectives? This means that it is important to be clear at the beginning and the end of the thesis, and link the conclusion back to the beginning.
2. Has the candidate shown originality and independent critical judgement? You have to relate the literature review to your work, and show how each reference is relevant to you - make it clear why you need to review their work. The purpose of the literature review is not to show that you're familiar with the literature of your general field, it is to analyse the work of others that is relevant to yours, so only include those references that are necessary. Don't include irrelevant stuff. Don't include crap stuff (articles are the right place to flame somebody's work, not theses!). Do include the external examiner's stuff. Therefore, don't have a crap external examiner!
3. Does the research reported in the thesis constitute an addition to knowledge? This is rather difficult to judge, as "addition to knowledge" is a very open-textured concept. It should be a useful and scientific contribution, but it does not have to be all that significant a contribution for an MPhil, and not even for a PhD! Some people think that the contribution of a PhD has to turn the subject upside-down and leave it rocking on its foundations, but that's a vast exaggeration! For a PhD, it's more important to show that you're a fully professional researcher with an expert grasp of your field, and that you know its boundaries so that you can extend them as needed. So don't get too paranoid about not making a novel contribution. Since Computer Science is actually very much an engineering subject, we have much less of a problem in this regard than, say, pure mathematics, where your novel contribution does go out the window if someone comes up with the same equation as you and publishes it first!
Bear in mind that all new knowledge is essentially just a combination of old knowledge!
Paliouras's MSc was an evaluation thesis rather than a new contribution (it analyses the scalability of existing algorithms rather than inventing any new ones). This is quite common for Masters theses, because an evaluation thesis is easier to write (but it's so boring and devoid of thrills!). The contribution of Paliouras's MSc was that nobody had ever done such a thorough review of the field of Machine Learning before.
The table of contents should be understandable in and of itself. I was able to get the gist of Paliouras's thesis just by skimming through the contents, because I'm very intelligent and I know a lot about AI (in fact, the field of Machine Learning is not unrelated to Case-Based Reasoning!). But it's also to Paliouras's credit that he wrote a good contents!
The only criticism of this table of contents is nit-picking things like the style of his headings: it uses a lot of abbreviations, which impair understanding if you don't know what they mean. For example, even David Brée didn't know what PLS1 stands for without looking it up, so avoid trips down jargon lane!
Some people think you should avoid capital letters in headings wherever possible, but I think it's just important to have a case convention and stick to it consistently. For example, Section 1.4 is titled "Motivation for the project", whereas Section 1.5 is titled "The Structure of the Thesis", which is rather ugly because it's inconsistent.
Another point about Paliouras's thesis, judging just by the table of contents, is that Section 2.4 is only weakly connected to Chapter 3, even though they're both obviously about machine learning algorithms.
Do not underestimate the importance of the table of contents! The examiners usually read the contents second (after reading the abstract), and keep referring to it throughout their reading of the thesis, so the contents should be a good road map of the thesis! In certain ways, you're a bad judge of the comprehensibility of your own table of contents (and of your writing in general), because you know it all, and may have omitted something that's obvious to you but possibly unbeknown to the reader. So it's a good idea to get someone else to read it!
So you should consider putting stuff which is boring but essential background in appendices rather than in the thesis itself. The best example I've seen of this is in Ian Pratt's Artificial Intelligence book, where he relegates a tutorial on the predicate calculus to the Appendix so that that mundane stuff, with which some but not all readers will be familiar, does not disrupt the flow of his exciting book as he goes on to discuss logic and inference in Chapter 3!
Things like experimental results (data and graphs) and protocol analysis (a transcript of someone's thoughts as they solve a problem, which can provide useful insights for AI) should be put in appendices, with just a sample included in the main text where you're trying to make a point about the results.
The strange things that struck me about it were that it was typewritten rather than word-processed as it would surely have been in this day and age, and the thesis was so short - 65 pages for the main text is shorter than my CS3900 report!
The other thing that jumps out at you when you peruse the table of contents is that the literature review is in Chapter 5, which is unusually late, since most theses have the literature review in Chapter 2. This implies that his work can be understood independently of others'.
The purpose of the literature review is to show that you're an authority in your field. It should not be a list that describes references for the sake of it; it should really tie them into your thesis. It's nice if it shows an insight, such as a novel classification of the literature. You should step back from your subject and look at what it really means rather than how we normally think of it.
In essence, the abstract has to say two things:
You should begin the abstract with a major claim, saying what you have done in the first sentence: "A system of techniques is presented for proving mechanically that a computer program terminates cleanly." Then you should say exactly what you have done: "Clean termination means that the program has no infinite loops and no semantic errors."
The abstract should give the motivation for what you have done. It should say what artifact you have implemented and how you did it. Note that Sites's abstract says nothing about results - it omits to mention that there is no computer implementation of the techniques, and it doesn't mention what results were achieved using the techniques (no doubt he had his reasons, but this comes across as a somewhat incomplete abstract).
It's very important in your thesis to be clear about what you have done and how you know you've achieved it. The two extremes for evaluation are deduction and hypothesis testing. There are various ways of testing IT systems, such as doing experiments, getting users to fill in questionnaires, and ethnographic methods (I've no idea what those are!). In areas such as information systems, sometimes there's no choice as to how to evaluate systems!
Although I wouldn't recommend having such a section in your own thesis, it is nevertheless instructive about some features that a thesis ought to have:
It gets a bit negative on page 2: the techniques do not apply to the broad range of programs, as the abstract led us to hope!
The introduction expands the last paragraph of the abstract, which contrasts proof of clean termination with proof of correctness. It establishes the scope - and non-scope - of the thesis.
On page 3, Sites uses an example (TREESORT3) to show that proof of clean termination is useful.
The limitations of the work described in the thesis are stated on page 4. It is important to be clear about what your artifact doesn't do as well as what it does! It's better to be up front about your shortcomings than to be `discovered' by the examiners, and it's also a sign of maturity as a researcher. So many journal articles aren't honest about limitations, which makes it more difficult for us to critically assess their work!
It's disappointing that Sites didn't manage a computer implementation of his techniques, despite the jazzy title of the thesis!
Don't overclaim, either - e.g. "This is better than Fermat's Last Theorem!" It's better to be modest than to proclaim loudly how wonderful you are and then fail to live up to your own hype (unless like me you really are brilliant, and then you're entitled to boast! :-)). Stating the limitations is also another way of scoping things out of your thesis project.
Even though, again, it's supposed to be the best thesis of its kind, we still managed to rip it to shreds! Only once a thesis has been written can it really be understood and criticised, including by the author! It's common for people who get a PhD to look back at their thesis a year later and think "that was a mistake!" (presumably of a particular aspect of it rather than the whole thing! ;-)).
Cox's thesis title, "The Application of Visualisation Techniques to Three-Dimensional Semiconductor Device Simulation" seems to capture the two key concepts that I thought the thesis was about when we spent five minutes at the start of the seminar reading through the table of contents, the abstract and the prologue: device simulation and visualisation.
Note that a device in this context means anything that has semiconductors in it.
I'm finding it hard to come up with a good title for my MPhil thesis, which is about applying Case-Based Reasoning to Automatic Programming - in particular, to the transformation of constraints in information models for schema-to-schema mapping - I'm not sure how I can express that succinctly in a title!
The most arresting thesis title I've ever read is that of Paul Pun's PhD thesis: "KNOWLEDGE-BASED APPLICATIONS = KNOWLEDGE-BASE + MAPPINGS + APPLICATIONS".
Remember that a table of contents should be understandable in and of itself - you should read it as if you knew nothing and see if it makes sense. Cox's Contents are a bit unclear: for example, what is AVS (Section 2.5) - is it a particular visualisation tool? (abbreviations should be largely avoided in the contents). It's not clear what system Chapter 3 ("System Overview") is an overview of, whether it's his own work or someone else's. And some people find it strange that there's a "Summary and Overview" of the thesis tucked away in Section 2.6 (although it's also in the Prologue). I don't find this so strange myself, because presumably it serves to re-orientate the reader after dredging through all that background!
The contents should give the reader an impression of what you have done. Cox's contents are not very good at that - they are obscured by all that background!
Nitty-gritty details (e.g. of the implementation) should be in appendices rather than the main text.
Cox's thesis is scientific engineering (building something to test a theory) rather than just engineering. This means that you have to test that it works, and why it works! It's dangerous to set out to engineer something that's new, because Computer Science is so rapidly advancing that three or four years down the road, it won't be new any more (e.g. the World Wide Web). If you do this, you have to save yourself by making it scientific!
Here are a couple of nasty questions that could get asked in the viva (oral exam):
For one thing, the abstract doesn't claim enough - it doesn't say anything about building a system, and nowhere does it look like he ran any experiments! It's not even clear from the abstract that there was a software implementation! There also seems to be no conclusion in the abstract.
Let's go through the abstract blow-by-blow:
> Semiconductor device simulation, by allowing the detailed derivation and analysis of
> device behaviour, provides a cost effective tool for the development and testing of
> prototype device structures.
You should begin a thesis with a good sentence, but this is a horrible one, because it uses a subclause to qualify the first noun, which badly delays the verb phrase!
Semiconductor device simulation does not provide a tool, it is a tool!
It is bad to say that the tool is "cost effective", because it's a claim that you can't back up in the thesis, plus this is not an application for an EPSRC grant!
> This thesis investigates the scope, merits and practicality of employing
> visualisation methodology to improve the usability and efficiency of the simulator.
It's not strictly correct to animate the thesis by making it the subject of the sentence - the thesis doesn't do the investigating, it just sits on a shelf! Also, "methodology" is the wrong word here - better "visualisation methods" or "visualisation techniques".
> There are two approaches, of differing scope, to the application of visualisation techniques.
> The narrowest application is to use data visualisation techniques to post-process simulation
> data in order to allow its visual interpretation and analysis.
It should be "the narrower application", because there are just two of them!
> This helps to satisfy requirements both for visual verification of the problem description
> and the analysis of the corresponding simulation results.
"For" and "and" are a bad combination here, as they make it ambiguous as to whether it's "for visual verification of the problem description and for the analysis of the corresponding simulation results" or "for visual verification of the problem description and visual verification of the analysis of the corresponding simulation results".
> The broadest application integrates visualisation techniques within the computational
> simulation environment.
The broader application does not integrate visualisation techniques - the author is the one doing the integrating - it uses visualisation techniques.
> The visual representation of data provides an intuitive visual interface to the simulation
> process which can hide underlying simulation management functionality.
Who's doing the hiding? (Okay, this is just nit-picking!) The moral is, be careful with "which" clauses.
> This enhances usability.
It is bad to use an anaphor, especially after such a long, nasty sentence before it, because it's unclear what it refers back to.
> This thesis first introduces device simulation and visualisation methodology. It then
> discusses the scope for applying visualisation techniques to this problem domain. Each of
> the two approaches identified is then discussed with reference to a formal model, and
> evaluated via a concrete practical implementation.
This is the last paragraph - it dishes out the dirt too late!
The examiner was obviously very generous not to fail the thesis on account of the abstract - if it was me, I'd at least refer it back for `minor corrections'!
Three golden rules for abstracts:
Next week Rizos Sakellariou, who gave that wonderful CS700 talk last semester, will give another one in CS710 on his thesis!
Rizos Sakellariou's thesis was very mathematical in nature, so the following two books were appropriate for writing it:
[1] Higham N.J. (1993). Handbook of Writing for the Mathematical Sciences. Society for Industrial & Applied Mathematics. ISBN 0898713145.
[2] Knuth D.E. (1989). Mathematical Writing. Mathematical Association of America. ISBN 088385063X.
2. Model the reader! Keep asking yourself: "How are they going to misunderstand this?" Include guidelines for the reader.
3. Master the medium and the material!
4. Simplify!
5. Aim for excellence!
Do try to give an example to represent every theorem. I'm reminded of the great Ian Pratt, who always starts with an example before explaining a difficult concept. Spend time looking for hidden analogies that will help the reader to understand (this is one of Rizos Sakellariou's strengths). Start with simple cases and examples before presenting the main results; present special cases of a theorem before presenting the general case of that theorem. For example, if a theorem in geometry applies to higher dimensions, start by restricting it to 2D geometry so that the reader can grasp it before having to try to visualise it in hyperspace!
2. The opening section should be the best section! You need to make a good impression on the general reader (and, in particular, the examiners), particularly at the beginning! You should avoid deep technicalities in the first chapter, and predict what the gaps in the readers' knowledge will be (in general, you should assume they are well-versed in Computer Science but are not specialists in the particular area of your thesis).
You need to come up with a title for the thesis that conveys the essence of the thesis. The title of Rizos Sakellariou's PhD thesis was "ON THE QUEST FOR PERFECT LOAD BALANCE IN LOOP-BASED PARALLEL COMPUTATIONS." The key words here are perfect load balance, loop-based and parallel computations.
3. You shouldn't use the same notation for different things - it's confusing! A good tip is to include a summary of the notation used in the thesis, at the beginning of the thesis.
4. Don't start a sentence with a symbol! A sentence beginning "w is the workload..." looks awkward and is unacceptable. You should integrate equations into sentences properly (e.g. if an equation is at the end of the sentence, you should still finish the sentence with a full stop). An alternative is to use reference numbers, e.g.
x + y = z (EQ10)
and then refer to this equation as EQ10 in the text - this is especially useful for referring to equations that are several pages away.
5. Adopt a methodical numbering convention and stick to it consistently. References to chapters, sections, figures, tables, theorems and lemmata should be capitalised if accompanied by a number, e.g. Chapter 2, Section 2.4.1, Figure 2.5, Table 4.6 etc. Sections, figures, tables and equations should include the chapter number, and you may even choose to include section and subsection numbers: Equation 4.3.2.5 looks rather long-winded, but it tells the reader exactly where it is!
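For a thesis typeset in LaTeX (an assumption on my part; the seminar did not prescribe any particular tool), the advice in points 4 and 5 might be sketched roughly as below. The label name eq:sum and the chapter and section titles are made up for illustration. The \numberwithin and \eqref commands come from the amsmath package; the report class already prefixes equation numbers with the chapter number.

```latex
\documentclass{report}
\usepackage{amsmath}

% Optional: also include the section number, e.g. (1.1.1) instead of (1.1).
\numberwithin{equation}{section}

\begin{document}

\chapter{Workload model}   % hypothetical chapter title
\section{Notation}         % hypothetical section title

% The equation ends the sentence, so it still carries the full stop.
The three quantities are related by
\begin{equation}
  x + y = z .
  \label{eq:sum}
\end{equation}

% Capitalised because the reference is accompanied by a number.
Equation~\eqref{eq:sum} can now be cited even from several pages away.

\end{document}
```

Referring to the label rather than typing the number by hand means the references should stay correct if equations are added or moved during redrafting.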
6. Remember to never split an infinitive! ;-) Monstrosities such as "to briefly discuss" (a split infinitive) and "has been also discussed" (a misplaced adverb) are quite common in English, but are unacceptable for academic writing.
7. Common mistakes (see the cited books for a proper treatment of these!):