Here is the 128-character set listed in order of each character’s numeric code:
   0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2  SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
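To read the chart, note that the row label is the high hexadecimal digit of a character's code and the column label is the low one, so the code is 16 × row + column. A minimal Python sketch of that arithmetic (the function names chart_position and chart_code are illustrative only and are not part of the RuleList):

```python
def chart_position(ch):
    """Return (row, column) of a 7-bit ASCII character in the chart."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    # High hex digit is the row, low hex digit is the column.
    return divmod(code, 16)

def chart_code(row, column):
    """Inverse lookup: numeric code of the chart cell at (row, column)."""
    return 16 * row + column

row, col = chart_position("A")
print(f"{row:X}{col:X}")    # prints 41, i.e. row 4, column 1
print(chart_code(2, 0))     # prints 32, the code of SP
```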
This 128-character ASCII set is somewhat special because all StringNames are written in ASCII. This means that a CharacterCode is always identified in ASCII, even if the Character it denotes does not belong to ASCII.
Some character codes do not yield a printable character. They are represented using the backslash token, “\”, in conformance with C-string “escape sequence” coding. Specifically, the standard ASCII RuleList implements the following non-printing character token system:
‘\0’ Null character 000 Noop[]
‘\b’ backspace Noop
‘\f’ form feed (top of next page)
‘\n’ newline
‘\r’ carriage return
‘\t’ tab
‘\v’ vertical tab
‘\’’ single quotation mark
‘\”’ double quotation mark
‘\\’ backslash
‘\xxx’ octal value xxx
The above conventions are the standard C conventions for representing non-printing characters by means of printing characters.
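As a quick cross-check of the single-character escapes listed above, the following Python sketch pairs each escape letter with its ASCII code. The name SIMPLE_ESCAPES is hypothetical; the codes themselves are the standard ASCII values. The octal form ‘\xxx’ is not covered here because it needs a reader rather than a lookup.

```python
# Escape letter -> character code for the single-character escapes in the list.
SIMPLE_ESCAPES = {
    "0": 0,     # null character
    "b": 8,     # backspace
    "t": 9,     # tab
    "n": 10,    # newline
    "v": 11,    # vertical tab
    "f": 12,    # form feed
    "r": 13,    # carriage return
    '"': 34,    # double quotation mark
    "'": 39,    # single quotation mark
    "\\": 92,   # backslash
}

for letter, code in SIMPLE_ESCAPES.items():
    # Show each escape with its decimal and three-digit octal code.
    print(f"\\{letter}  decimal {code:3d}  octal {code:03o}")
```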
The above non-printing Character token system is consistent with C-string conventions because…
• The above backslash “\” non-printing character conventions are elegant and widely used.
• Algorithms written in Grok32` are designed to generate C-- code, or C code interpretable by a C compiler.
The one extension to the C-string-non-printing character system implements QABS because these structures are efficient and flexible. This extension is also essential if this standard ASCII hopes to live up to its role as default String interpretation.
The above C-string conventions can be implemented as a Token.
Formal Expression               ASCIIRuleList Expression
Pattern[Sequence]               __
Pattern[Type[head]][Sequence]   __head (or …[Type[head]]…)
Pattern[ptrnBody][Sequence]     __[ptrnBody] (or …[ptrnBody]…)
The Standard ASCII RuleList is documented to have the following characteristics:
If the EscapeSequence is not just the CharacterCode for the EndCharacter, then it is specified as follows:
EscapeSequence = {EscapeSequencePrefix, EscapeSequenceFunction}
The EscapeSequence identifies special (non-printing) Characters (like EndCharacter, backspace, tab, EOF etc.) and provides a way to specify Characters by CharacterCode. A CharacterSet defined with an EscapeSequence must include the EndCharacter as one outcome. An example of a noop-character which might be an EscapeSequenceFunction result is EOF (the name normally given to the Character signaling the end of a CharacterStream). The EscapeSequencePrefix is a token that invokes the EscapeSequenceFunction, which reads the sequence of Characters following the prefix and returns a single CharacterCode as the result.
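The prefix-plus-function arrangement described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the prefix is taken to be the backslash, the function follows the C convention of up to three octal digits, and the names escape_sequence_function and decode are hypothetical rather than taken from the RuleList.

```python
PREFIX = "\\"   # assumed EscapeSequencePrefix
SIMPLE = {"b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
          '"': 34, "'": 39, "\\": 92}

def escape_sequence_function(text, pos):
    """Decode the escape whose prefix sits at text[pos].

    Returns (character_code, index_just_past_the_escape).
    """
    if text[pos] != PREFIX:
        raise ValueError("no EscapeSequencePrefix at this position")
    i = pos + 1
    if i >= len(text):
        raise ValueError("prefix at end of input")
    # Octal form: one to three octal digits, as in C ('\0' is the shortest case).
    digits = ""
    while i < len(text) and len(digits) < 3 and text[i] in "01234567":
        digits += text[i]
        i += 1
    if digits:
        return int(digits, 8), i
    if text[i] in SIMPLE:
        return SIMPLE[text[i]], i + 1
    raise ValueError(f"unrecognized escape \\{text[i]}")

def decode(text):
    """Turn a string with escapes into a list of character codes."""
    codes, i = [], 0
    while i < len(text):
        if text[i] == PREFIX:
            code, i = escape_sequence_function(text, i)
        else:
            code, i = ord(text[i]), i + 1
        codes.append(code)
    return codes

print(decode(r"a\tb\004\0"))   # [97, 9, 98, 4, 0]
```

Each escape, however many Characters it spans in the source text, contributes exactly one CharacterCode to the result, which is the property the paragraph above asks for.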
Noops = {{‘\0’, (*EndCharacter*) 000}, {‘\004’, (*EOF …StreamNoop*) 004}, {“\b”, (*BackSpace*) 010},
{(*StartComment*) ‘/*’, (*EndComment*) ‘*/’},
{“ ”, (*EmptySpaceString*) 040},
{“\”, (*StringEscapeCode*) 033}, …}
Letters = {{ContextMark, CC}, {LetterChar, CC}…}
(* The CharacterCode (CC) in the Characters in the following list has been deleted. *)
ParaPuncs = {{‘[‘, ‘,’, ‘]’}, {‘{‘, ‘,’, ‘}’, List}, {‘…’, ‘, ‘, ‘…’, PatternFormSequence}, {StartString_String, Separator_String, EndString_String, function_Name}…}
(* Operators = {ContextMark, StartGroup, EndGroup, operator…}
…where
Operator = {OperatorString, expressionHead} *)
The following is an almost complete version of the assignment for the StandardASCIIRuleList.
The representation of Characters is still a little murky and undecided.
One idea is to allow Characters to be given in two different ways…
A file containing a CharacterSet’s RuleList will be named so as to identify the CharacterSet. The contents of this file are then presumed to contain only Characters from that CharacterSet. Characters all have the same integer number of bytes…
Noops={{‘\0’, 0 (*NUL 0 EndCharacter*)},
{`\ESC`, 27 (*Escape StringEscapeCode*)},
{`\004`, 4 (* EOT (end of transmission) - EOF…Stream[Noop]…StreamNoop *)},
{‘\\’, 92 (* LiteralPrefix *)}, {‘\b’, 8 (*BackSpace*)},
{‘(*’, StartComment}, {‘*)’, EndComment},
{‘\\?’, UnrecognizedCharacter},
{‘ ‘, EmptySpace},
…}
Digits = {{‘0’,48}, {‘1’,49}, {‘2’,50}, {‘3’,51}, {‘4’,52}, {‘5’,53}, {‘6’,54}, {‘7’, 55}, {‘8’, 56}, {‘9’, 57}}
Letters = {{‘`’,96}, {‘A’,65}, {‘B’,66}, {‘C’,67}, {‘D’,68}, {‘E’,69}, {‘F’,70}, {‘G’,71}, {‘H’,72}, {‘I’,73}, {‘J’,74}, {‘K’,75}, {‘L’,76}, {‘M’,77}, {‘N’,78}, {‘O’,79}, {‘P’,80}, {‘Q’,81}, {‘R’,82}, {‘S’,83}, {‘T’,84}, {‘U’,85}, {‘V’,86}, {‘W’,87}, {‘X’,88}, {‘Y’,89}, {‘Z’,90}, {‘a’,97}, {‘b’,98}, {‘c’,99}, {‘d’,100}, {‘e’,101}, {‘f’,102}, {‘g’,103}, {‘h’,104}, {‘i’,105}, {‘j’,106}, {‘k’,107}, {‘l’,108}, {‘m’,109}, {‘n’,110}, {‘o’,111}, {‘p’,112}, {‘q’,113}, {‘r’,114}, {‘s’,115}, {‘t’,116}, {‘u’,117}, {‘v’,118}, {‘w’,119}, {‘x’,120}, {‘y’,121}, {‘z’,122}}
ParaPuncs = {{‘[‘(*91*), ‘,’(*44*), ‘]’(*93*)}, (* This is THE Bracket Sequence. *)
{‘{‘(*123*), ‘,’, ‘}’(*125*), List},
{'(', '|', ')', AlternativePattern},
{‘…’, ‘, ‘, ‘…’, PatternFormSequence},
{StartString_String, Separator_String, EndString_String, function_Name}…}
(* Tokens are identified by Strings rather than CharacterCodes.
If the String is expressed in ASCII, but is from another CharacterSet, the “\xxx”
LiteralPrefix Octal Code must be used.
WHY NOT ALLOW BOTH LITERAL STRINGS AND CHARACTERCODE?
*)
Operators = {{‘(‘, 40}, {‘)’, 41}, {‘^’ (*94*), Power}, {‘_’ (*95*), Blank}, {‘<’(*60*),Less}, {‘=’(*61*),Name}, {‘>’(*62*),Greater}, …}
OtherCharacters = {'|'(*124*), '~' (*126*), DEL (*127*) }
(*
The ContextMark is #96 and is FIRST amongst the Letters.
Letters=Compute[Or[(64 < Slot[1] < 91), (95 < Slot[1] < 123)]]
*)
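The range test in the comment above can be checked mechanically. The Python sketch below restates it (is_letter is a hypothetical name) and confirms that it selects the ContextMark, code 96, together with the 52 upper- and lower-case letters, while excluding the neighbouring punctuation.

```python
def is_letter(code):
    """Same test as the Compute expression: 64 < code < 91 or 95 < code < 123."""
    return 64 < code < 91 or 95 < code < 123

letters = [chr(code) for code in range(128) if is_letter(code)]

print(len(letters))       # 53: the ContextMark plus 26 + 26 letters
print("`" in letters)     # True: code 96 is included
print("_" in letters)     # False: code 95 is an Operator, not a Letter
```

Note that a list ordered by code places the ContextMark between "Z" and "a"; listing it first, as the Letters assignment does, is a matter of the RuleList's own ordering and not of the numeric test.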
(*************************************************)
Formally, the result returned by String[Cardinal["whatever"]
matches the following PatternSequence:
[The following was taken from PatternSequence.]
Pattern[Sequence]
...also written as...
__
...matches any Sequence of Expressions.
Pattern[Type[head]][Sequence]
...also written as...
__head (or …[Type[head]]…)
...matches any Sequence of Expressions with Head matching head.
(8) Pattern[ptrnBody][Sequence]
...also written as...
__[ptrnBody]
...matches any Sequence of Expressions matching the PatternForm, ptrnBody.
Similarly, the following matches nothing or the PatternSequence matching expr.
(?) Or[Pattern[Noop], Pattern[Sequence][expr]]
...also written as...
___[expr]
In the above ElicitationForm, the presence of Pattern[Noop] makes this Or[...] into a Pattern which can match either something with no elements or Pattern[Sequence][expr].
The above form is used throughout this Language Specification, but written as:
____
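The difference between the sequence forms "__", "__head" and "___" can be illustrated with a small Python sketch. It rests on two assumptions that the text does not state outright: that a Sequence needs at least one Expression (otherwise the Noop alternative would be redundant), and that an Expression's Head can be approximated by its Python type. Both function names are hypothetical.

```python
def matches_sequence(items, head=None):
    """Sketch of "__" (head=None) and "__head".

    Assumption: the sequence must be non-empty, and every item's
    Python type stands in for its Head.
    """
    if len(items) == 0:
        return False
    return head is None or all(type(item) is head for item in items)

def matches_optional_sequence(items, head=None):
    """Sketch of "___": nothing at all (the Noop case) or whatever "__" matches."""
    return len(items) == 0 or matches_sequence(items, head)

print(matches_sequence([1, 2, 3], int))     # True
print(matches_sequence([1, "two"], int))    # False
print(matches_sequence([]))                 # False
print(matches_optional_sequence([]))        # True
```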
[The following comes from Pattern's Rule explication.]
If ptrnSet is a PatternSet, the form below is called a Rule.
(13) If[ptrnSet, match]
In the Standard ASCII RuleList (13) is written as...
ptrnSet -> match
A Rule behaves like a PatternSet insofar as it expects input to compare to ptrnSet. A Rule is unlike a PatternSet insofar as it does not generally return a boolean result.
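One way to picture the contrast between a PatternSet and a Rule is the Python sketch below. It is an illustration only: the PatternSet is modelled as a boolean predicate, the Rule as a function that yields the match value instead of a boolean, and returning None when the input does not match is an assumption, since the text does not say what a Rule yields on failure. The names make_rule and digit_rule are hypothetical.

```python
def make_rule(ptrn_set, match):
    """Build a callable standing in for "ptrnSet -> match".

    ptrn_set: predicate playing the role of the PatternSet test.
    match:    value the Rule yields when the input satisfies ptrn_set.
    """
    def rule(value):
        # Assumption: a failed comparison yields None.
        return match if ptrn_set(value) else None
    return rule

# Codes 48..57 are the Digits of the Standard ASCII RuleList.
digit_rule = make_rule(lambda ch: 48 <= ord(ch) <= 57, "Digit")

print(digit_rule("7"))   # Digit
print(digit_rule("x"))   # None
```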
Reckon[Expression[Branch]][exprBranchSequence__] executes each ExpressionAtom in "exprBranchSequence".
[The following is from Xor.]
Xor[set1, ___sets] matches elements from set1 found in zero or any even number of elements in "___sets".
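Read literally, the sentence above keeps an element of set1 when the number of remaining sets that contain it is zero or even. A short Python sketch of that reading (xor_match is a hypothetical name, and Python sets stand in for the specification's sets):

```python
def xor_match(set1, *sets):
    """Elements of set1 that occur in an even number (including zero) of the other sets."""
    return {x for x in set1 if sum(x in s for s in sets) % 2 == 0}

result = xor_match({1, 2, 3, 4}, {2, 3}, {3, 5})
print(sorted(result))   # [1, 3, 4]
```

Here 1 and 4 occur in none of the other sets, 3 occurs in two of them, and 2 occurs in exactly one and is dropped. Under this reading the result is drawn only from set1, so 5 never appears.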
"==" means "is equivalent to". If algeb
"===" tests if both sides are equal
after both sides match. In other words, "===" is an equivalence test operator.
E.G.
If[Sequence === Slot[0][strNam],
(* This will only be true if "str" is a complex String. *)
Tally[String[Name][Slot]Slot, Slot[str]],
strNam[str]]
[The following is from here.]
Not[xets] is equivalent to "~[xets]"
...
If "string" uses more than one CharacterSet, and SlotOperator is one of the following,
(Slot[Cardinal] | Slot[n] | Slot[m][Cardinal] | Slot[m, n]), then…
SlotOperator["string"]
…returns respectively…
( substring quantity | a specific substring | length of a substring | substring Character ),
where each "substring" is drawn from (and is delimited by) a single, uniform CharacterSet.
Substring fields can be specifically accessed
using constructions with the form:
Slot["string"](
[m] | [m, n] | [Cardinal] | [Cardinal, m] )
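The four results listed above can be pictured with a Python sketch that splits a string into runs of uniform CharacterSet. Several things here are assumptions made only for illustration: there are just two CharacterSets (7-bit ASCII and everything else), indices are Python's zero-based ones, and the names character_set and substrings are hypothetical.

```python
from itertools import groupby

def character_set(ch):
    # Assumption: two sets only, 7-bit ASCII and everything else.
    return "ASCII" if ord(ch) < 128 else "other"

def substrings(string):
    """Split a string into maximal runs whose characters share one CharacterSet."""
    return ["".join(run) for _, run in groupby(string, key=character_set)]

# "abc", then two Greek letters (lambda, mu), then "d".
parts = substrings("abc" + "\u03bb\u03bc" + "d")

print(len(parts))                  # 3, the substring quantity
print(parts[0])                    # abc, a specific substring
print(len(parts[1]))               # 2, the length of a substring
print(parts[1][0] == "\u03bb")     # True, a single substring Character
```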
__[_String[__Cardinal]]
The above is short for:
Pattern[Pattern[Type[String]][Pattern[Type[Cardinal]][Sequence]]][Sequence]
...which is testimony to the formal excellence of the Standard ASCII RuleList!
(c) 2004-2008 by
John Van Wie Bergamini.