Case-Based Reasoning applied to Automatic Programming
Automatic Programming applied to transforming constraints in information models
Automatic Programming
Generation of executable programs
in ordinary programming languages
from higher-level descriptions
i.e. Computer, Program Thyself!
We seek descriptions that are:
- very high level!
- extremely declarative! (what rather than how)
- abstract! (not too concerned with details such as algorithms and data structures)
Forward Engineering: High Level Description --> Program Code
Reverse Engineering: Program Code --> High Level Description
Recap: Case-Based Reasoning
Solving new problems by analogy with old ones.
A Case-Based Reasoner remembers a
previous similar problem and adapts its solution
to solve the new problem.
(diagram of CBR cycle)
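The retrieve-and-adapt idea can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not a prescribed design. Problems are described by hypothetical feature sets, similarity is Jaccard overlap, and adaptation is naive textual substitution in the retrieved program.

```python
# Minimal retrieve-and-adapt sketch. Problems are described by sets of
# hypothetical feature strings; solutions are small program texts.
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    problem: frozenset[str]   # features describing the old problem
    solution: str             # program text that solved it


def similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity: shared features / all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve(library: list[Case], problem: frozenset[str]) -> Case:
    """Remember: return the stored case most similar to the new problem."""
    return max(library, key=lambda c: similarity(c.problem, problem))


def adapt(case: Case, substitutions: dict[str, str]) -> str:
    """Adapt: naive textual substitution of names in the old solution."""
    code = case.solution
    for old, new in substitutions.items():
        code = code.replace(old, new)
    return code


library = [
    Case(frozenset({"sum", "list", "int"}), "def total(xs): return sum(xs)"),
    Case(frozenset({"max", "list", "int"}), "def largest(xs): return max(xs)"),
]

new_problem = frozenset({"max", "list", "float"})
best = retrieve(library, new_problem)
print(adapt(best, {"largest": "largest_float"}))
# prints: def largest_float(xs): return max(xs)
```

Jaccard overlap is only one possible similarity measure; real program cases would need much richer representations.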
Advantages of Case-Based Reasoning
(for Automatic Programming)
Learn from experience and improve capabilities!
Solve problems (write programs) quicker and easier!
Suggest solutions even in poorly-understood domains!
Warn of past mistakes and avoid them! (e.g. bugs)
Good for prototyping!
Disadvantages
Large case libraries take up a lot of space!
Inappropriate retrievals could be counterproductive!
Application:
Transforming Constraints
in Information Models
What is Information Modelling?
An Information Model defines:
- what entities (objects) exist in a domain
- what attributes (fields/slots) those entities have
- relationships between those entities/attributes
- constraints on those entities/attributes
A schema is a collection of related entities and
other gubbins (such as type definitions and global
constraints)
Here's an example written in the information
modelling language EXPRESS:
Schemata may differ semantically, i.e. in terms of
their constraints, e.g.:
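SCHEMA University_2;

ENTITY Department;
  name    : STRING;
  courses : SET [1 : ?] OF Course;   -- at least one course per department
END_ENTITY;

ENTITY Course;
  code     : STRING;
  title    : STRING;
  students : INTEGER;
INVERSE
  -- each Course is listed by exactly one Department
  the_department : Department FOR courses;
UNIQUE
  code;                              -- no two courses share a code
WHERE
  -- parentheses needed: relational operators bind less tightly than AND
  (students >= 10) AND (students <= 250);
END_ENTITY;

END_SCHEMA;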
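To make the rules of University_2 concrete, here is a sketch that checks Course records against the WHERE rule (10 to 250 students) and the UNIQUE rule on code. The Python names (where_rule, unique_violations, violations) and the dict record format are hypothetical; the INVERSE and SET [1 : ?] cardinality rules are not checked.

```python
# Hypothetical Python rendering of two local rules of Course in University_2.
# Records are plain dicts keyed by attribute name.
from collections import Counter


def where_rule(course: dict) -> bool:
    """WHERE: (students >= 10) AND (students <= 250)."""
    return 10 <= course["students"] <= 250


def unique_violations(courses: list[dict]) -> set[str]:
    """UNIQUE code: return every code shared by more than one course."""
    counts = Counter(c["code"] for c in courses)
    return {code for code, n in counts.items() if n > 1}


def violations(courses: list[dict]) -> list[str]:
    """Describe each rule violation in a population of Course records."""
    problems = [
        f"{c['code']}: students={c['students']} outside 10..250"
        for c in courses
        if not where_rule(c)
    ]
    problems += [f"duplicate code {code}" for code in sorted(unique_violations(courses))]
    return problems


courses = [
    {"code": "CS101", "title": "Programming", "students": 120},
    {"code": "CS999", "title": "Reading Group", "students": 4},
    {"code": "CS101", "title": "Programming II", "students": 80},
]
for p in violations(courses):
    print(p)
# prints:
# CS999: students=4 outside 10..250
# duplicate code CS101
```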
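When records are transferred from a repository based on one
schema (the source) to a repository based on another (the
target), some source records cannot be transferred because
they would violate constraints in the target repository!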
So, it would be useful to transform constraints on the
target repository to apply to records in the source
repository before transferring those records.
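In its simplest, extensional form, making a target constraint apply to source records means composing it with the structural mapping from source to target. The sketch below assumes a hypothetical source schema whose Course entity has the attributes course_code, name and enrolment; only the 10..250 range comes from University_2. An automatic programming system would go further and produce the rewritten constraint symbolically (e.g. 10 <= enrolment <= 250) rather than as a composed function.

```python
# Sketch: pulling a target-schema constraint back onto source records.
# The source attribute names are hypothetical.
from typing import Callable

Record = dict
Predicate = Callable[[Record], bool]


def to_target(src: Record) -> Record:
    """Hypothetical structural mapping from a source Course to a target Course."""
    return {"code": src["course_code"], "title": src["name"], "students": src["enrolment"]}


def target_rule(course: Record) -> bool:
    """WHERE rule of Course in University_2: 10 <= students <= 250."""
    return 10 <= course["students"] <= 250


def pull_back(rule: Predicate, mapping: Callable[[Record], Record]) -> Predicate:
    """Turn a constraint on target records into one on source records."""
    return lambda src: rule(mapping(src))


def partition(records: list[Record], rule: Predicate) -> tuple[list[Record], list[Record]]:
    """Split records into (transferable, rejected) according to rule."""
    ok = [r for r in records if rule(r)]
    rejected = [r for r in records if not rule(r)]
    return ok, rejected


source_rule = pull_back(target_rule, to_target)
source = [
    {"course_code": "CS101", "name": "Programming", "enrolment": 120},
    {"course_code": "CS999", "name": "Reading Group", "enrolment": 4},
]
ok, rejected = partition(source, source_rule)
print([r["course_code"] for r in ok], [r["course_code"] for r in rejected])
# prints: ['CS101'] ['CS999']
```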
This is where automatic programming comes in -
given models of the source and target schemata,
generate a program to map from one to the other.
This has been done for structural transformations, but
the problem of semantic difference (constraints)
remains untackled!
These transformations will be carried out at a higher
level of abstraction than the constraints themselves:
constraints are reverse engineered into an abstract
description, transformed there, and then forward
engineered back into code ("reengineering").
(reengineering diagram)
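One way to picture the reengineering route is as three steps: reverse engineer the target constraint into an abstract form, transform that form (here, renaming attributes to their source-schema equivalents), and forward engineer executable code from the result. This is a toy sketch under loud assumptions: it recognises only the rule shape (attr >= lo) AND (attr <= hi), written with parentheses because relational operators bind less tightly than AND in EXPRESS, and the renaming of students to enrolment is hypothetical.

```python
# Reengineering sketch: EXPRESS rule text -> abstract interval ->
# renamed interval -> generated Python source.
import re
from dataclasses import dataclass, replace

# Only one hypothetical rule shape is supported: (attr >= lo) AND (attr <= hi)
RULE = re.compile(
    r"\(\s*(\w+)\s*>=\s*(-?\d+)\s*\)\s*AND\s*\(\s*(\w+)\s*<=\s*(-?\d+)\s*\)",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Interval:
    attribute: str
    low: int
    high: int


def reverse_engineer(rule_text: str) -> Interval:
    """Recover an abstract interval constraint from EXPRESS rule text."""
    m = RULE.fullmatch(rule_text.strip().rstrip(";").strip())
    if m is None or m.group(1) != m.group(3):
        raise ValueError(f"unsupported rule: {rule_text!r}")
    return Interval(m.group(1), int(m.group(2)), int(m.group(4)))


def transform(c: Interval, rename: dict[str, str]) -> Interval:
    """Re-express the constraint in terms of source-schema attribute names."""
    return replace(c, attribute=rename.get(c.attribute, c.attribute))


def forward_engineer(c: Interval, name: str) -> str:
    """Generate Python source for a predicate over source records."""
    return (
        f"def {name}(record):\n"
        f"    return {c.low} <= record[{c.attribute!r}] <= {c.high}\n"
    )


abstract = reverse_engineer("(students >= 10) AND (students <= 250);")
code = forward_engineer(transform(abstract, {"students": "enrolment"}), "check_course")
print(code)
namespace: dict = {}
exec(code, namespace)  # compile the generated predicate
print(namespace["check_course"]({"enrolment": 4}))  # prints: False
```

Working on the abstract Interval rather than on rule text is what lets the same transformation target other output languages.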
Some Research Questions
Knowledge
What knowledge does an Automatic Programming
system need?
               'what'         'how'
          +--------------+--------------+
specific  | KNOWLEDGE OF | KNOWLEDGE OF |
          | THE PROBLEM  | THE PROGRAM  |
          +--------------+--------------+
general   | KNOWLEDGE OF | KNOWLEDGE OF |
          | THE DOMAIN   | PROGRAMMING  |
          +--------------+--------------+
What knowledge can be captured automatically from
existing code? (reverse engineering)
- Code topology
- Call graphs
- Control flow graphs
- Data flow graphs
- Variable reference graphs
- Resource flow graphs
- State-transition diagrams
- Inheritance hierarchies
- Module dependency graphs
- Various metrics (e.g. cohesion and coupling)
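As an illustration of capturing one of these automatically, the following sketch extracts a call graph from Python source using the standard-library ast module. It is deliberately minimal: it records only direct calls to plain names (so method calls such as text.split() are ignored), and the sample program is made up.

```python
# Sketch: reverse engineering a call graph (caller -> called names)
# from Python source with the standard-library ast module.
import ast

SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''


def call_graph(source: str) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Collect calls whose target is a bare name, e.g. parse(...)
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph


for caller, callees in sorted(call_graph(SOURCE).items()):
    print(caller, "->", sorted(callees))
# prints:
# load -> ['parse', 'read']
# parse -> []
# read -> ['open']
```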
Architecture
What components should be included in an
architecture for automatic programming?
What should be automated and what not?
Which parts should be case-based and which not?
From the CBR perspective:
- What constitutes a case? (a whole program,
or are parts of a program also cases in their own right?)
- How should cases be represented and indexed?
- How should the case library be organised such that
retrieval is efficient as well as effective?
- How should similarity be measured?
- Is analogical reasoning needed?
- Is adaptation needed, and if so, what kinds?
- Are evaluation and repair of solutions required?
If so, how should solutions be evaluated?
- What is a sensible policy for memory update? (storing cases)
- Is a multi-stage CBR process needed?
If so, how should it be controlled?
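As one possible, hypothetical answer to the indexing and memory-update questions, this sketch keeps an inverted index from features to cases, so that only cases sharing at least one feature with a new problem are scored. It stores a new case only when no retrieved case is at least as similar as a novelty threshold (0.8 here, an arbitrary choice). The features and solutions in the usage lines are invented.

```python
# Sketch: an indexed case library with a simple memory-update policy.
from collections import defaultdict


def jaccard(a: frozenset, b: frozenset) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


class CaseLibrary:
    def __init__(self, novelty_threshold: float = 0.8):
        self.cases: list[tuple[frozenset, str]] = []          # (problem, solution)
        self.index: dict[str, set[int]] = defaultdict(set)   # feature -> case ids
        self.novelty_threshold = novelty_threshold

    def candidates(self, problem: frozenset) -> set[int]:
        """Ids of cases sharing at least one feature with the problem."""
        ids: set[int] = set()
        for feature in problem:
            ids |= self.index.get(feature, set())
        return ids

    def retrieve(self, problem: frozenset):
        """Best (similarity, solution) among indexed candidates, or None."""
        scored = [(jaccard(self.cases[i][0], problem), self.cases[i][1])
                  for i in self.candidates(problem)]
        return max(scored, default=None)

    def store(self, problem: frozenset, solution: str) -> bool:
        """Memory update: keep the case only if it is sufficiently novel."""
        best = self.retrieve(problem)
        if best is not None and best[0] >= self.novelty_threshold:
            return False
        case_id = len(self.cases)
        self.cases.append((problem, solution))
        for feature in problem:
            self.index[feature].add(case_id)
        return True


lib = CaseLibrary()
print(lib.store(frozenset({"range", "integer"}), "lo <= x <= hi"))          # True
print(lib.store(frozenset({"range", "integer"}), "x in range(lo, hi + 1)"))  # False
print(lib.store(frozenset({"unique", "string"}), "len(set(xs)) == len(xs)"))  # True
print(lib.retrieve(frozenset({"range", "real"})))
```

The index speeds retrieval but can miss relevant cases that share no surface feature with the query, which is exactly the efficiency-versus-effectiveness trade-off raised above.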