HOW-WRKS.TXT - Explains how ECC-Eliza V3.80 works and functions.

ECC-Eliza Homepage: http://ecceliza.cjb.net
Alternate Homepage: http://users.surfree.net.il/juventus    (more reliable)

Introduction and explanations

This file is useful if you are interested in more information on how the
program works and how the data file works.

The data file (ELIZA.DAT) is the part of the program which is used to
recognize the user's text (according to different parts of it) and answer
according to the data found.

90% of what Eliza says is found in the data file. The rest of what Eliza
says is found inside the executable file (ECCELIZA.EXE) and thus it cannot
be modified (under normal conditions).
The reason for this is that the built-in sentences that Eliza says are
general and do not depend on the content of the user's sentences.
For example, you may have noticed that Eliza complains if you repeat
yourself or press Enter on an empty prompt. For such cases the program
calls a built-in routine which answers accordingly, for example:
"DON'T REPEAT" if you repeat yourself.
You don't have to worry about such cases; Eliza will handle them fine.
These built-in sentences cover many events, such as: asking the
user for his/her name, greeting the user, blank input, repeating,
too short quote, too long quote, quote contains gibberish (such as DGFHJK),
the conversation is getting long, asking to talk about a specific subject,
warning the user to "behave" well, dumping the user, checking if the user
is sure about quitting, complaining if the user quits too soon, saying
a normal goodbye to the user.

ECCELIZA.EXE reads the data file when the program loads (and possibly
later too, but rarely) in order to copy the information it holds into
memory. The program cannot work without the data file.


Basic Data file structure

The data file has a special design. Before explaining its structure in
detail, I will first explain it in general terms, along with how the main
executable (ECCELIZA.EXE) uses it.
Once you type in a quote (to Eliza), and assuming there is no problem with
it and none of the built-in sentences is used (if you repeat yourself there
is a good chance that a built-in reply will be used, but suppose not), Eliza
will check the data file (which it has copied into memory).
The data file is divided into categories. Each category has keywords and
replies. Keywords are text patterns to be found in the user's quote.
A keyword can be longer than one word (several words), but not longer than
40 characters. ECC-Eliza searches the different categories (by their
order). In each category Eliza checks if ANY of the keywords are found
in the user's text, for example: MONEY is found inside I WANT MONEY, but
not found inside: I WANTMONEY. Next, Eliza will check the replies of the
specific category (the one she found a matching pattern in), and will
write one of the replies (chosen mostly randomly). There needs to be at
least one reply and at least one keyword in a category.
In case no keyword matched the user's text, there is a category named
KLast which holds different replies for when no match has been found.
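
The lookup described above can be sketched in Python. This is only an
illustration and not the program's actual code: the CATEGORIES list, the
KLAST_REPLIES list and the function names are made up for the example, and
whole-word matching is inferred from the MONEY example.

```python
import random
import re

# Hypothetical in-memory copy of the data file, highest priority first.
CATEGORIES = [
    {"keywords": ["MONEY", "CASH"], "replies": ["WHY DO YOU NEED MONEY?"]},
    {"keywords": ["I AM", "I'M"], "replies": ["WHY ARE YOU *?"]},
]
# Hypothetical replies of the KLast category.
KLAST_REPLIES = ["PLEASE GO ON.", "TELL ME MORE."]


def contains_keyword(text, keyword):
    """True if keyword occurs in text as whole words, not inside a longer word."""
    pattern = r"(?<![A-Z0-9'])" + re.escape(keyword) + r"(?![A-Z0-9'])"
    return re.search(pattern, text.upper()) is not None


def find_category(text):
    """Return (category, keyword) for the first category with a matching keyword."""
    for category in CATEGORIES:
        for keyword in category["keywords"]:
            if contains_keyword(text, keyword):
                return category, keyword
    return None, None


def choose_reply(text):
    """Pick a random reply from the matching category, or from KLast if none."""
    category, _ = find_category(text)
    replies = category["replies"] if category else KLAST_REPLIES
    return random.choice(replies)
```

With these sample categories, contains_keyword("I WANT MONEY", "MONEY") is
true and contains_keyword("I WANTMONEY", "MONEY") is false. The reply is
returned unexpanded; parameters such as * are handled separately.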


Parameters

Now that you understand the basic structure of the data file, I'll describe
the special parameters it supports. A parameter is a special symbol that
the program replaces with a value (in ECC-Eliza data file terms).
You might have noticed that Eliza calls you by name many times.
In the data file, a reply might be: |, HOW ARE YOU? Notice the | sign.
This sign will be substituted with the user's name, which Eliza asks for when
you start the program. Whenever the program finds the "|" character it will
substitute it with the user's name.
Another important parameter is "\". The backslash (\) sign is used to
indicate the keyword used by Eliza. Whenever a backslash is found Eliza will
substitute it with the keyword (pattern) it has found in the user's text
and inside the data file. If there is only one keyword, it is not necessary.
But many times a category contains multiple keywords, and in that case you
might want to call up the keyword which the user has written.
An additional parameter is the caret "^" character. The "^" character
is substituted with a description, in words, of the current time.
If it's a special day it might be the name of it (for example: BOXING DAY).
It may also be the time of day (e.g. AFTERNOON), day of week (e.g. MONDAY),
or both the time of day and weekday (e.g. MONDAY AFTERNOON).
There are two other, more important parameters. A reply containing either
of them is called a variable reply. The two signs are the asterisk "*" and
the underscore "_". Please note that ONLY 1 of them can be used in a reply,
and it cannot appear more than once in that reply (unlike the above
parameters). The underscore (_) character is a simplification of the asterisk
character (*). The asterisk character returns the text found after the
matched keyword and before a punctuation mark or breaking word.
For instance, if you typed: I AM VERY HAPPY, and the keyword found is I AM,
and the reply in the data file is WHY ARE YOU *?, Eliza will say:
WHY ARE YOU VERY HAPPY? Eliza substituted * with VERY HAPPY, because VERY
HAPPY was found after the identified keyword I AM. Another example for the
above data file example: BECAUSE I AM SAD, RIGHT?, would cause Eliza to
reply: WHY ARE YOU SAD? Eliza will substitute * only with SAD, because there
is a punctuation mark after it which signals another part of the sentence.
Some words (breaking words) are treated just like punctuation marks; such
words indicate another part of the sentence, such as: IF, AND, IN ADDITION,
etc.
The asterisk character DOES NOT return what has been found BEFORE the keyword,
only after it.
The underscore character (_) returns ONLY the first word which is found
after the identified keyword. Actually it returns the first word of *.
Sometimes it is useful, but rarely.
Another parameter, which is really a sequence of characters, is Q123.
When Q123 appears as the reply (with nothing before it and nothing
after it) it means that Eliza will ask the user whether he/she wants to quit.
This should only be used in the quitting category. In fact, it is not a
typical parameter, but rather a program instruction.
The underscore (_), asterisk (*), and backslash (\) values go through a
pronouns reversing routine; read below for more information.
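
As a rough sketch of how the |, \, * and _ parameters could be expanded,
consider the Python below. It is an assumption-laden illustration, not the
program's real routine: the function names are invented, BREAKING_WORDS
holds only the three breaking words named above, and both the ^ parameter
and the pronouns reversing step are left out.

```python
import re

# Assumed subset of the breaking words; the full list is internal to the program.
BREAKING_WORDS = ["IN ADDITION", "IF", "AND"]


def text_after_keyword(text, keyword):
    """Return the text after keyword, cut at punctuation or a breaking word."""
    text = text.upper()
    start = text.find(keyword)  # plain substring search, for brevity
    if start < 0:
        return ""
    rest = text[start + len(keyword):]
    # Stop at the first punctuation mark.
    rest = re.split(r"[.,;:!?]", rest, maxsplit=1)[0]
    # Stop at the first breaking word that appears as a whole word.
    for word in BREAKING_WORDS:
        rest = re.split(r"\b" + re.escape(word) + r"\b", rest, maxsplit=1)[0]
    return rest.strip()


def expand_reply(reply, name, keyword, text):
    """Substitute |, \\, * and _ in a reply (pronoun reversing is left out)."""
    star = text_after_keyword(text, keyword)
    first_word = star.split()[0] if star else ""
    values = {"|": name, "\\": keyword, "*": star, "_": first_word}
    # One pass over the reply, so substituted text is never expanded again.
    return "".join(values.get(ch, ch) for ch in reply)
```

For example, expand_reply("WHY ARE YOU *?", "DAN", "I AM",
"BECAUSE I AM SAD, RIGHT?") gives WHY ARE YOU SAD? (DAN is just a sample
user name).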


Pronouns reversing routine

Variable replies (consisting of underscore or asterisk) are supposed to
contain the user's text. This is done to make Eliza feel more personalized.
If you say: YOU ARE A MAD COMPUTER, and it replies to you: I AM NOT A MAD
COMPUTER, it gives you the feeling that it understands you (I hope).
But remember that Eliza has a different perspective than you. If it repeats
you like a parrot it is meaningless. For example, if you say I AM NOT YOUR
PATIENT, and she replies WHY ARE YOU NOT YOUR PATIENT? (for WHY ARE YOU *?),
it sounds bad. And so, for the underscore (_) and asterisk (*) parameters
Eliza reverses the pronouns. That is, every time it finds "YOUR" it changes
it to "MY" and vice versa. The same applies to 25 other cases, such as
YOU'VE to I'VE, YOU ARE to I AM, etc. ECC-Eliza does this pretty well. The
only problem happens with "YOU". "YOU" can be "translated" into either I or
ME. Choosing correctly takes a better grasp of English than a computer has.
Usually Eliza gets it right, but not always. Other than that, Eliza should
handle all other cases flawlessly.
Please note that for the backslash (\) parameter Eliza ALSO reverses the
pronouns. Although it is not helpful if you want to quote the user, you
will find it useful many times when the keyword contains a pronoun such as
I, YOU, MINE, etc.
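
A minimal Python sketch of such a reversing step is shown below. The SWAPS
table is an assumed, partial list (the real one is inside the program), the
function name is invented, and "YOU" is always turned into "ME" here, which
is exactly the ambiguous case mentioned above.

```python
import re

# Assumed subset of the swap table; the program handles many more cases.
SWAPS = {
    "YOU ARE": "I AM",
    "I AM": "YOU ARE",
    "YOU'VE": "I'VE",
    "I'VE": "YOU'VE",
    "YOUR": "MY",
    "MY": "YOUR",
    "I": "YOU",
    "ME": "YOU",
    "YOU": "ME",  # ambiguous: the right answer is sometimes "I"
}

# Longest phrases first, so YOU ARE is tried before YOU.
_ALTERNATIVES = "|".join(
    re.escape(phrase) for phrase in sorted(SWAPS, key=len, reverse=True)
)
_PATTERN = re.compile(r"(?<![A-Z'])(" + _ALTERNATIVES + r")(?![A-Z'])")


def reverse_pronouns(text):
    """Swap first and second person in one pass, so nothing is swapped back."""
    return _PATTERN.sub(lambda match: SWAPS[match.group(1)], text.upper())
```

With this table, NOT YOUR PATIENT becomes NOT MY PATIENT, so the reply
WHY ARE YOU *? would read WHY ARE YOU NOT MY PATIENT?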


Priority

ECC-Eliza scans the categories by order. Once Eliza recognizes a keyword
pattern inside the user's text it immediately looks for the reply in the
matching category, regardless of whether other keywords in the data file
also match. This is the reason why different categories have different
priorities. The category which Eliza scans first has the highest priority,
the 2nd category has the 2nd highest priority and so on. KLast has the
lowest priority. The categories included in the data file are sorted by
the priority we found best. You may play around with it a little.
CATEGORY.TXT file lists the different categories found by their priority
(order in the data file). The highest category in the data file (uppermost)
has the highest priority.
When you configure the categories make sure they are sorted the way you want.


Priority mismatches

If a category is listed below another category (has lower priority) and
contains a sub-keyword of one of that category's keywords, it is considered
a stuck category.
For example, one category has the keyword: YOU. The other category has
lower priority (comes after it), and has the keyword YOU ARE.
The second category (YOU ARE) is stuck. This is due to the fact that if
YOU is found in the user's text the first category will be used.
YOU ARE is a sub-keyword of YOU because every sentence that contains
YOU ARE contains YOU, but not every sentence that contains YOU also
contains YOU ARE.
Make sure not to have such mismatches, or else what you did is useless.
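
One possible way to look for such mismatches is sketched below in Python.
The function names and the list-of-lists input are assumptions made for the
example. The check works per keyword: it reports every keyword that can never
be reached because a higher-priority keyword is contained in it.

```python
def contains_phrase(text, phrase):
    """True if phrase appears in text as whole words."""
    return f" {phrase} " in f" {text} "


def find_shadowed_keywords(categories):
    """categories: one list of keywords per category, highest priority first.

    Returns (keyword, blocker) pairs where blocker belongs to an earlier
    category and is contained in keyword, so keyword can never be reached.
    """
    shadowed = []
    for index, keywords in enumerate(categories):
        earlier = [k for group in categories[:index] for k in group]
        for keyword in keywords:
            for blocker in earlier:
                if contains_phrase(keyword, blocker):
                    shadowed.append((keyword, blocker))
                    break  # one blocker is enough to report the keyword
    return shadowed
```

For the example above, find_shadowed_keywords([["YOU"], ["YOU ARE"]])
returns [("YOU ARE", "YOU")], while the opposite order returns an empty
list.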


Data file code

The data file contains keywords and replies. A category is actually a
keyword-reply block. Such a block contains a number of keywords, each on its
own line starting with the letter K, which indicates a keyword. The keyword
part of the block is followed by the replies, each on its own line starting
with the letter R, which indicates a reply. The letter K (or R) needs to be
written for each keyword (or reply).
A keyword-reply block example would be:

K I AM
K I'M
R WHY ARE YOU *?
R I AM * TOO
R YOU ARE LYING TO ME
R NO, YOU ARE NOT *

This category may be an "I AM" category. Another category may immediately
follow it.

You may insert a "#" sign which indicates a note. Everything written on
the same line after the "#" character is ignored by Eliza, and is
used to annotate the data file. The "#" character, like the other field
characters, must appear as the first character of the line.
Usually a "#" line separates categories, but it is not required.
The KLast category appears as: K KLast, the rest is as usual.
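
To make the layout concrete, here is a small Python sketch of a reader for
K, R and # lines. It is not the program's loader: parse_data_file is an
invented name, a new category is assumed to start at the first K line that
follows an R line, blank lines are assumed to be allowed, and any other
field is simply skipped.

```python
def parse_data_file(lines):
    """Group K and R lines into categories, in file (priority) order."""
    categories = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line[0] == "#":
            continue  # blank line (assumed harmless) or note
        field, value = line[0], line[1:].strip()
        if field == "K":
            # A K line after replies starts the next keyword-reply block.
            if current is None or current["replies"]:
                current = {"keywords": [], "replies": []}
                categories.append(current)
            current["keywords"].append(value)
        elif field == "R" and current is not None:
            current["replies"].append(value)
        # Other fields are ignored in this sketch.
    return categories
```

Feeding it the I AM block above, followed by K KLast and one R line, yields
two categories, the first with the keywords I AM and I'M and four replies.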


Subject and Missing asterisk field

Two recent additions to the data file are two more fields, indicated by a
leading character just like the K, R and # fields.
The subject field is the name of the subject which the category handles.
It is optional. It is used when Eliza asks the user to speak about a previous
subject (this appears as a built-in sentence, one of many such sentences).
Eliza inserts the subject's name into the built-in sentence. The subject's
name cannot exceed 25 characters (the next characters will be ignored).
Make sure the subject's name appears in plural (FAMILIES), or is a general
word (PEOPLE), or is in verb form (BEING HATED, YOUR LOVE TOWARDS ME, etc).
The subject field starts with the S character on the first character of a
line and appears between the last keyword of a category (K) and the first
reply of a category (R). As said, it is only optional.
The second new field is the M field which indicates missing asterisk.
Sometimes the value of asterisk is nil (does not exist). For example,
if the keyword is I AM, and the reply is I WAS *, an input such as I AM!
would yield I WAS, which does not sound correct. In such a case you could
insert the M field anywhere between the last keyword and the first reply
of a category. The word or words coming after M are limited to 25 characters
and indicate the value of asterisk if it was supposed to be nil.
For the above example, M WHAT YOU ARE, would result in I WAS WHAT YOU ARE.
The value of missing asterisk does NOT go through the pronouns reversing
process, and appears "as is".
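
The fallback can be pictured with the short Python sketch below. The
function name and arguments are hypothetical, and the captured text is
assumed to have been pronoun-reversed already, while the M value is inserted
unchanged.

```python
def fill_asterisk(reply, captured, missing_value=""):
    """Replace * with the captured text, or with the M value if it is empty."""
    # The M value is limited to 25 characters and is used "as is".
    value = captured.strip() or missing_value[:25]
    return reply.replace("*", value, 1)
```

With the reply I WAS * and M WHAT YOU ARE, an empty capture gives
I WAS WHAT YOU ARE, while a capture of SAD gives I WAS SAD.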


Internal program instruction field

A field that should NOT be used is the P field. P is followed by a
number which indicates a program instruction. You shouldn't use it.
It is used to tell the program what to do if a specific category
was encountered. Currently it is not used much.
In future versions it may control the program's behavior.


Tips on modifying the data file

First of all, you need to be creative. The tools (parameters) available to
you open up a wide variety of choices. This does not mean that the
time-of-day parameter (^) needs to be written every two replies, but you may
integrate it in many replies. I would recommend that you check the data file
for yourself and see how such parameters are being used.
If you are interested in modifying a specific category by adding keywords,
it is important that you check for any asterisks (*) or backslashes (\) in
the replies. If a reply contains a backslash make sure the new keyword would
fit that reply logically. Also make sure that the words that could come after
that keyword in a sentence would fit the asterisk (*) parameter.
You should think through practically every possible case. The more time you
spend on that, the more logical Eliza may sound in different situations.
You should also check for priority mismatches, and make sure that your
keywords are in the right priority.
You will uncover additional tips once you get to know the usage better.
Please note that the start of the data file contains useful and summarized
information on how to edit it. You may find that helpful.


Behind the scenes work

The way the data file works is described in this information file.
However, ECC-Eliza does NOT always use the data file as described here.
ECC-Eliza processes the user's text many times before it consults the
data file contents. In addition, there is a chance that keywords will be
skipped even if identified, due to the program's decision.
There is much other "behind the scenes" work that is meant to ensure
better handling of the user's text. For example, it determines the
program's attitude, and helps it choose among a variety of replies if it
encounters a different attitude in the text.
So the bottom line is that although Eliza follows many of the guidelines
described here, it may also act differently many times. But as said, this
is done to make Eliza better, and in our opinion, it achieves that.


Limitations and errors

The data file and the program have some limitations:

- The maximum reply length is 255 characters.
- The maximum keyword length is 40 characters.
- Missing asterisk length is limited to 25 characters.
- Subject name length is limited to 25 characters.
- The number of different categories is limited to 420.
  If you exceed this number the program might act strangely.
- The size of the data file is limited to 300KB - 500KB, depending on the
  memory state.
- Up to 1 asterisk (*) parameter may be used in a reply.
- Up to 1 underscore (_) parameter may be used in a reply.
- Up to 1 backslash (\) parameter may be used in a reply.
- A reply cannot contain both an asterisk and an underscore parameter.

The only valid fields of the data file are #, K, R, P, M and S.
If a line begins with any other character it will be ignored and you will
be notified when the program (ECCELIZA.EXE) is run, with the line
number and line context.
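
Some of these limits can be checked before running the program. The Python
sketch below is one hypothetical way to do it; check_lines and its messages
are invented, blank lines are assumed to be allowed, and an over-long
subject name is reported even though the program itself only ignores the
extra characters.

```python
VALID_FIELDS = "#KRPMS"
# Maximum value length per field, taken from the list above.
MAX_LENGTH = {"K": 40, "R": 255, "M": 25, "S": 25}


def check_lines(lines):
    """Return (line number, message) pairs for lines that break a limit."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are assumed to be harmless
        field, value = line[0], line[1:].strip()
        if field not in VALID_FIELDS:
            problems.append((number, "unknown field"))
            continue
        limit = MAX_LENGTH.get(field)
        if limit is not None and len(value) > limit:
            problems.append((number, f"longer than {limit} characters"))
        if field == "R":
            # Each of *, _ and \ may appear at most once in a reply.
            if max(value.count(ch) for ch in "*_\\") > 1:
                problems.append((number, "parameter used more than once"))
            if "*" in value and "_" in value:
                problems.append((number, "both * and _ in one reply"))
    return problems
```

A reply such as R WHY ARE YOU * AND *? is reported for repeating the
asterisk, and a line starting with X is reported as an unknown field.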







