CS700 Second Presentation

28th January 1998

Andrew Broad

http://www.cs.man.ac.uk/~broada/

Supervisor: Nick Filer
CAD Group

Case-Based Reasoning applied to Automatic Programming

Automatic Programming applied to transforming constraints in information models

Automatic Programming

Generation of executable programs in ordinary programming languages from higher-level descriptions

i.e. Computer, Program Thyself!

We seek descriptions that are:
- very high level!
- extremely declarative! (what rather than how)
- abstract! (not too concerned with details such as algorithms and data structures)

Forward Engineering: High Level Description --> Program Code

Reverse Engineering: Program Code --> High Level Description

Recap: Case-Based Reasoning

Solving new problems by analogy with old ones.

A Case-Based Reasoner remembers a previous similar problem and adapts its solution to solve the new problem.

(diagram of CBR cycle)

Advantages of Case-Based Reasoning (for Automatic Programming)

Learn from experience and improve capabilities!

Solve problems (write programs) quicker and easier!

Suggest solutions even in poorly-understood domains!

Warn of past mistakes and avoid them! (e.g. bugs)

Good for prototyping!

Disadvantages

Large case libraries take up a lot of space!

Inappropriate retrievals could be counterproductive!

Application: Transforming Constraints in Information Models

What is Information Modelling?

An Information Model defines:
- what entities (objects) exist in a domain
- what attributes (fields/slots) those entities have
- relationships between those entities/attributes
- constraints on those entities/attributes

A schema is a collection of related entities and other gubbins (such as type definitions and global constraints)

Here's an example written in the information modelling language EXPRESS:

SCHEMA University;
	ENTITY Course;
		code : STRING;
		title : OPTIONAL STRING;
		department : STRING;
		number_of_students : INTEGER;
	UNIQUE
		code;
	END_ENTITY;
END_SCHEMA;

Schema-to-Schema Mapping

Mapping data from one format to another

(Data Repository)--->(Data Repository)

Schemata may differ structurally.

Schemata may differ semantically, i.e. in terms of their constraints. e.g.

SCHEMA University_2;
	ENTITY Department;
		name : STRING;
		courses : SET [1 : ?] OF Course;
	END_ENTITY;

	ENTITY Course;
		code : STRING;
		title : STRING;
		students : INTEGER;
	INVERSE
		the_department : Department FOR courses;
	UNIQUE
		code;
	WHERE
		students >= 10 AND students <= 250;
	END_ENTITY;
END_SCHEMA;

Some records in the source repository cannot be transferred, because they would violate constraints in the target repository!

So, it would be useful to transform constraints on the target repository to apply to records in the source repository before transferring those records.

This is where automatic programming comes in - given models of the source and target schemata, generate a program to map from one to the other.

This has been done for structural transformations, but the problem of semantic difference (constraints) remains untackled!

These transformations will be carried out at a higher level of abstraction than on the constraints themselves, which involves a combination of reverse and forward engineering ("reengineering").

(reengineering diagram)

Some Research Questions

Knowledge

What knowledge does an Automatic Programming system need?

              `what'         `how'
         +--------------+--------------+
specific | KNOWLEDGE OF | KNOWLEDGE OF |
         | THE PROBLEM  | THE PROGRAM  |
         +--------------+--------------+
 general | KNOWLEDGE OF | KNOWLEDGE OF |
         | THE DOMAIN   | PROGRAMMING  |
         +--------------+--------------+

What knowledge can be captured automatically from existing code? (reverse engineering)

- Code topology
- Call graphs
- Control flow graphs
- Data flow graphs
- Variable reference graphs
- Resource flow graphs
- State-transition diagrams
- Inheritance hierarchies
- Module dependency graphs
- Various metrics (e.g. cohesion and coupling)

Architecture

What components should be included in an architecture for automatic programming?

What should be automated and what not?

Which parts should be case-based and which not?

From the CBR perspective:

- What constitutes a case? (a whole program, or are parts of a program also cases in their own right?)

- How should cases be represented and indexed?

- How should the case library be organised such that retrieval is efficient as well as effective?

- How should similarity be measured?

- Is analogical reasoning needed?

- Is adaptation needed, and if so, what kinds?

- Is evaluation and repair of solutions required? If so, how should solutions be evaluated?

- What is a sensible policy for memory update? (storing cases)

- Is a multi-stage CBR process needed? If so, how should it be controlled?

Back to CS700

Hosted by www.Geocities.ws