Standard ASCII RuleList

 

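Here is the 128-character set listed in order of each character’s numeric code. The row label is the high hexadecimal digit of the code and the column label is the low hexadecimal digit: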

 

     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F

0   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
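
The layout of the table above can be sketched in a few lines of Python. This is only an illustration, not part of Grok32`; the names CONTROL_NAMES, char_code and char_name are hypothetical. It assumes the row label is the high hexadecimal digit and the column label is the low hexadecimal digit, so that the code is 16 times the row plus the column.

```python
# Names of the control characters for codes 0-31, in code order.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()


def char_code(row, col):
    """Return the character code found at table row (0-7) and column (0-15)."""
    if not (0 <= row <= 7 and 0 <= col <= 15):
        raise ValueError("row must be 0-7 and col must be 0-15")
    return 16 * row + col


def char_name(code):
    """Return the table entry for a code: a control name, SP, DEL, or the character."""
    if not 0 <= code <= 127:
        raise ValueError("code must be 0-127")
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "SP"
    if code == 127:
        return "DEL"
    return chr(code)
```

For example, char_name(char_code(4, 1)) gives the entry in row 4, column 1 of the table.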

 

 

This 128-character ASCII set is somewhat special because all StringNames are written in ASCII. This means that a CharacterCode is identified in ASCII even if the Character it identifies is not itself an ASCII character.




Non-Printing Character Tokens

 

Some character codes do not yield a printable character, and they are represented using the backslash, “\”, token in conformance with C-string “escape sequence” coding. Specifically, the standard ASCII RuleList implements the following non-printing character token system:

 

‘\0’        null character                    000        Noop[]
‘\b’        backspace                                    Noop
‘\f’        form feed (top of next page)
‘\n’        newline
‘\r’        carriage return
‘\t’        tab
‘\v’        vertical tab
‘\’’        single quotation mark
‘\"’        double quotation mark
‘\\’        backslash
‘\xxx’      octal value                       xxx

 

The above conventions are the standard C conventions for representing non-printing characters using printing characters.

 

The above non-printing Character token system is consistent with C-string conventions because…

            • The above backslash “\” non-printing character conventions are elegant and widely used.

            • Algorithms written in Grok32` are designed to generate C-- code, or C code interpretable by a C compiler.

 

The one extension to the C-string-non-printing character system implements QABS because these structures are efficient and flexible.  This extension is also essential if this standard ASCII hopes to live up to its role as default String interpretation.

 

 The above C-string conventions can be implemented as a Token.

  

 

 


ASCII Operators

 

            Formal Expression                    ASCIIRuleList Expression

            Pattern[Sequence]                    __
            Pattern[Type[head]][Sequence]        __head            (or       …[Type[head]])
            Pattern[ptrnBody][Sequence]          __[ptrnBody]      (or       …[ptrnBody]…   )

 

 


Standard ASCII RuleList

 

The Standard ASCII RuleList is documented to have the following characteristics:

 

If the EscapeSequence is not just the CharacterCode for the EndCharacter, then it is specified as follows:

 

                        EscapeSequence = {EscapeSequencePrefix, EscapeSequenceFunction}

 

The EscapeSequence identifies special (non-printing) Characters (such as the EndCharacter, backspace, tab, EOF, etc.) and provides a way to specify Characters by CharacterCode.  A CharacterSet defined with an EscapeSequence must include the EndCharacter as one outcome.  An example of a noop Character that might be an EscapeSequenceFunction result is EOF (the name normally given to the Character signaling the end of a CharacterStream).  The EscapeSequencePrefix is a token that invokes the EscapeSequenceFunction.  The EscapeSequenceFunction reads a sequence of Characters following the prefix and returns a single CharacterCode as the result.
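
The prefix-and-function arrangement described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Grok32` implementation: the names SIMPLE_ESCAPES, ESCAPE_PREFIX, read_escape and decode are hypothetical, the prefix is assumed to be the backslash, and octal values are assumed to run to at most three digits as in C.

```python
# Single-character escape bodies and the CharacterCodes they stand for (C conventions).
SIMPLE_ESCAPES = {
    "b": 8,    # backspace
    "t": 9,    # tab
    "n": 10,   # newline
    "v": 11,   # vertical tab
    "f": 12,   # form feed
    "r": 13,   # carriage return
    '"': 34,   # double quotation mark
    "'": 39,   # single quotation mark
    "\\": 92,  # backslash
}

ESCAPE_PREFIX = "\\"  # assumed EscapeSequencePrefix
OCTAL_DIGITS = "01234567"


def read_escape(text, pos):
    """Play the role of the EscapeSequenceFunction.

    Reads the characters starting at text[pos] (just after the prefix) and
    returns a pair (character_code, next_position).
    """
    if pos >= len(text):
        raise ValueError("escape prefix at end of input")
    ch = text[pos]
    if ch in OCTAL_DIGITS:
        # Octal value: one to three octal digits; this also covers '\0'.
        end = pos
        while end < len(text) and end - pos < 3 and text[end] in OCTAL_DIGITS:
            end += 1
        return int(text[pos:end], 8), end
    if ch in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[ch], pos + 1
    raise ValueError("unrecognized escape: " + ESCAPE_PREFIX + ch)


def decode(text):
    """Turn a string containing escape sequences into a list of CharacterCodes."""
    codes = []
    pos = 0
    while pos < len(text):
        if text[pos] == ESCAPE_PREFIX:
            code, pos = read_escape(text, pos + 1)
        else:
            code, pos = ord(text[pos]), pos + 1
        codes.append(code)
    return codes
```

Under these assumptions the octal escape 033 decodes to the decimal code 27 (ESC), and 004 decodes to 4 (EOT).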

 

 

Here are the RuleLists without any specific reference to ASCII characters:

 

Noops = {{‘\0’, (*EndCharacter*) 000}, {‘\004’, (*EOFStreamNoop*) 004}, {‘\b’, (*BackSpace*) 010},

                        {(*StartComment*) ‘/*’, (*EndComment*) ‘*/’},

                        {UnrecognizedCharacter, CC},

                        {“ ”, (*EmptySpaceString*) 040}…}

 

 

{“\”, (*StringEscapeCode*) 033},

 

Digits = {{DigitChar, CC}…}

 

Letters = {{ContextMark, CC}, {LetterChar, CC}…}

 

(* The CharacterCodes (CC) of the Characters in the following list have been omitted. *)

ParaPuncs = {{‘[’, ‘,’, ‘]’}, {‘{’, ‘,’, ‘}’, List}, {‘…’, ‘, ’, ‘…’, PatternFormSequence}, {StartString_String, Separator_String, EndString_String, function_Name}…}

 

(*         Operators = {ContextMark, StartGroup, EndGroup, operator…}

…where

            Operator = {OperatorString, expressionHead}               *)

 

 

 


Here are the actual RuleLists…

 

The following is a nearly complete version of the assignment for the StandardASCIIRuleList.

The representation of Characters is still somewhat murky and undecided.

One idea is to allow Characters to be given in two different ways…

A file containing a CharacterSet’s RuleList will be named so as to identify the CharacterSet.  The contents of this file are then presumed to contain only Characters from that CharacterSet.  Characters all have the same integer number of bytes…

 

 

Noops={{‘\0’, 0 (*NUL 0 EndCharacter*)},

      {‘\ESC’, 27 (*Escape StringEscapeCode; octal 033*)},

      {‘\004’, 4 (* EOT - end of transmission - EOF…Stream[Noop]…StreamNoop *)},

      {‘\\’, 92 (* LiteralPrefix *)}, {‘\b’, 8 (*BackSpace*)},

      {‘(*’, StartComment}, {‘*)’, EndComment},

      {‘\\?’, UnrecognizedCharacter},

      {‘ ’, EmptySpace},

      …}

 

Digits = {{‘0’,48}, {‘1’,49}, {‘2’,50}, {‘3’,51}, {‘4’,52}, {‘5’,53}, {‘6’,54}, {‘7’, 55}, {‘8’, 56}, {‘9’, 57}}

 

Letters = {{‘`’,96}, {‘A’,65}, {‘B’,66}, {‘C’,67}, {‘D’,68}, {‘E’,69}, {‘F’,70}, {‘G’,71}, {‘H’,72}, {‘I’,73}, {‘J’,74}, {‘K’,75}, {‘L’,76}, {‘M’,77}, {‘N’,78}, {‘O’,79}, {‘P’,80}, {‘Q’,81}, {‘R’,82}, {‘S’,83}, {‘T’,84}, {‘U’,85}, {‘V’,86}, {‘W’,87}, {‘X’,88}, {‘Y’,89}, {‘Z’,90}, {‘a’,97}, {‘b’,98}, {‘c’,99}, {‘d’,100}, {‘e’,101}, {‘f’,102}, {‘g’,103}, {‘h’,104}, {‘i’,105}, {‘j’,106}, {‘k’,107}, {‘l’,108}, {‘m’,109}, {‘n’,110}, {‘o’,111}, {‘p’,112}, {‘q’,113}, {‘r’,114}, {‘s’,115}, {‘t’,116}, {‘u’,117}, {‘v’,118}, {‘w’,119}, {‘x’,120}, {‘y’,121}, {‘z’,122}}

 


ParaPuncs = {{‘[’(*91*), ‘,’(*44*), ‘]’(*93*)}, (* This is THE Bracket Sequence. *)

      {‘{’(*123*), ‘,’, ‘}’(*125*), List},

      {‘(’, ‘|’, ‘)’, AlternativePattern},

      {‘…’, ‘, ’, ‘…’, PatternFormSequence},

      {StartString_String, Separator_String, EndString_String, function_Name}…}



(* Tokens are identified by Strings rather than CharacterCodes.

      If the String is expressed in ASCII, but is from another CharacterSet, the “\xxx”

      LiteralPrefix Octal Code must be used.

WHY NOT ALLOW BOTH LITERAL STRINGS AND CHARACTERCODE?

*)

Operators = {{‘(’, 40}, {‘)’, 41}, {‘^’ (*94*), Power}, {‘_’ (*95*), Blank}, {‘<’(*60*),Less}, {‘=’(*61*),Name}, {‘>’(*62*),Greater}, …}

 

 

 

OtherCharacters = {'|'(*124*), '~' (*126*), DEL (*127*) }

 

(*

            The ContextMark is #96 and is FIRST amongst the Letters.

Letters=Compute[Or[(64 < Slot[1] < 91), (95 < Slot[1] < 123)]]

*)
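
The Letters predicate in the comment above, together with the Digits list, can be checked with a short Python sketch. The function names is_digit, is_letter and classify are hypothetical, and the category labels are only for illustration; the numeric ranges are taken from the lists above.

```python
def is_digit(code):
    """Digits are the codes 48 ('0') through 57 ('9')."""
    return 48 <= code <= 57


def is_letter(code):
    """Mirror of Or[(64 < Slot[1] < 91), (95 < Slot[1] < 123)].

    The second range starts at 96, so the ContextMark (`) counts as a Letter.
    """
    return 64 < code < 91 or 95 < code < 123


def classify(code):
    """Return a rough category label for a code (labels are illustrative only)."""
    if is_digit(code):
        return "Digit"
    if is_letter(code):
        return "Letter"
    return "Other"
```

Note that code 95 (‘_’) falls outside both ranges, which agrees with its listing among the Operators rather than the Letters.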

 

(*************************************************)

Examples of Symbolic Conventions from the Language Specification.

(Separated by horizontal lines.)



Formally, the result returned by String[Cardinal["whatever"]] matches the following PatternSequence:

 

                _String[_Cardinal...]...


[The following was taken from PatternSequence. ]

 

                            Pattern[Sequence]

 

...also written as...

                                                __

 

...matches any Sequence of Expressions.

 

                            Pattern[Type[head]][Sequence]

 

...also written as...

                                    __head

or

                        __[Type[head]]

 

...matches any Sequence of Expressions with Head matching head.

 

 

(8)                    Pattern[ptrnBody][Sequence]

 

...also written as

                                    __[ptrnBody]


 

...matches any Sequence of Expressions matching the PatternForm, ptrnBody.


Similarly, the following matches nothing or the PatternSequence matching expr.

 

(?)                    Or[Pattern[Noop], Pattern[Sequence][expr]]


...also written as...
                                   ___[expr]

 

In the above ElicitationForm, the presence of Pattern[Noop] makes this Or[...] into a Pattern that can match either something with no elements or Pattern[Sequence][expr]. The above form is used throughout this Language Specification, but written as:




____


[The following comes from Pattern's Rule explication.]


If ptrnSet is a PatternSet, the form below is called a Rule.

 

(13)            If[ptrnSet, match]

 

In the Standard ASCII RuleList (13) is written as...

 

                                    ptrnSet -> match

 

A Rule behaves like a PatternSet insofar as it expects input to compare to ptrnSet.
A Rule is unlike a PatternSet insofar as it does not generally return a boolean result.



Reckon[Expression[Branch]][exprBranchSequence__] executes each ExpressionAtom in exprBranchSequence.


[The following is from Xor.]

Xor[set1, ___sets] matches elements from set1 found in zero or any even number of elements in "___sets".




"==" means "is equivalent to". If algeb




"===" tests whether both sides are equal after both sides match. In other words, "===" is an equivalence test operator.
E.G.
  If[Sequence === Slot[0][strNam],                         (* This will only be true if "str" is a complex String. *)

        Tally[String[Name][Slot]Slot, Slot[str]],

         strNam[str]
    ]

[The following is from here.]

When String Slot field constructs fail to have any assigned value, the Slot["str"][strDatum] construct returns unchanged.
This means that for each undefined String Slot field alternative,

strDatum:( _Integer | _Integer, _Integer | Integer | Integer, _Integer ),

[The above uses ":" as an operator that invokes a canonizer... that transforms the above into something that will match the following NamePattern.

    Name[strDatum,
           
Or[Pattern[Type[Cardinal]], Pattern[Pattern[Type[Cardinal]], Pattern[Type[Cardinal]]],
                    Pattern[Cardinal], Pattern[Cardinal, Pattern[Type[Cardinal]]]            ]
    ]

]





Not[xets] is equivalent to "~[xets]"

...

If "string" uses more than one CharacterSet, and SlotOperator is one of the following,

(Slot[Cardinal] | Slot[n] | Slot[m][Cardinal] | Slot[m, n]), then…

SlotOperator["string"]

…returns respectively…

( substring quantity | a specific substring | length of a substring | substring Character ),

where each "substring" is uniform in (and is delimited by) its CharacterSet.
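
The four SlotOperator results listed above can be illustrated with a small Python sketch. It rests on several assumptions that the text does not settle: only two CharacterSets are distinguished (ASCII and everything else), indices are counted from 1, and in Slot[m, n] the first index selects the substring and the second selects the Character. All function names are hypothetical.

```python
from itertools import groupby


def split_by_character_set(text):
    """Split text into maximal runs whose characters share one CharacterSet.

    Assumption: a character belongs to the ASCII set if its code is below 128,
    and to a single 'other' set if not.
    """
    return ["".join(run) for _, run in groupby(text, key=lambda ch: ord(ch) < 128)]


def substring_count(text):
    """Slot[Cardinal]: the number of uniform substrings."""
    return len(split_by_character_set(text))


def substring(text, n):
    """Slot[n]: the n-th uniform substring (counted from 1, by assumption)."""
    return split_by_character_set(text)[n - 1]


def substring_length(text, m):
    """Slot[m][Cardinal]: the length of the m-th uniform substring."""
    return len(substring(text, m))


def substring_character(text, m, n):
    """Slot[m, n]: the n-th Character of the m-th uniform substring."""
    return substring(text, m)[n - 1]
```

For the string "abc\u0434\u0435xyz" (three ASCII letters, two Cyrillic letters, three ASCII letters) this sketch finds three substrings, the second of which has length 2.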


 

Substring fields can be specifically accessed using constructions with the form:

                            Slot["string"](  [m] | [m, n] | [Cardinal] | [Cardinal, m] )


A "<"...">"-delimited Expression is a ParaPunc for {StartLinkAdrs, EndLinkAdrs}.
This comes from the convention of documenting hyperlinks by following the link's location (in the document) with the link address. No quotes.


Here is how the Alternatives ParaPunc is used.

                            Slot["string"](  [m] | [m, n] | [Cardinal] | [Cardinal, m] )




                   __[_String[__Cardinal]]


The above is short for:


        Pattern[Pattern[Type[String]][Pattern[Type[Cardinal]][Sequence]]][Sequence]


...which is testimony to the formal excellence of the Standard ASCII RuleList!
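
The expansion from the short form to the formal form can be sketched mechanically. The Python function below is a hypothetical helper, not part of Grok32`: it covers only the three "__" forms from the ASCII Operators table plus the single-blank forms _head and _head[body], whose expansion is inferred from the example above rather than stated in the text. It also assumes each term has at most one bracketed body.

```python
import re

# One shorthand term: one or two blanks, an optional head name,
# and an optional bracketed body that runs to the end of the term.
_TERM = re.compile(r"(_{1,2})([A-Za-z][A-Za-z0-9]*)?(?:\[(.*)\])?")


def expand(shorthand):
    """Rewrite a shorthand pattern as a formal Pattern[...] expression."""
    text = shorthand.strip()
    m = _TERM.fullmatch(text)
    if m is None:
        return text  # not shorthand (for example a plain name): leave unchanged
    blanks, head, body = m.groups()
    if blanks == "__":
        if head is None and body is None:
            return "Pattern[Sequence]"
        if body is None:
            return "Pattern[Type[" + head + "]][Sequence]"
        if head is None:
            return "Pattern[" + expand(body) + "][Sequence]"
    else:
        # Single blank: inferred by analogy with the example, an assumption.
        if head is not None and body is None:
            return "Pattern[Type[" + head + "]]"
        if head is not None:
            return "Pattern[Type[" + head + "]][" + expand(body) + "]"
    raise ValueError("shorthand form not covered by this sketch: " + repr(text))
```

Applied to __[_String[__Cardinal]], the sketch reproduces the long form given above.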

(14) also has the lovely, editable feature of transforming a Pattern into SubstitutionNames for use in another program. Specifically, (14) becomes:
                        charNamCodes__[charSetNam_String[cCodes__Cardinal]]

SlotSequences are, in the Standard ASCII RuleList, two Cardinal numbers conjoined by an ellipsis.
E.G., "Slot[3, 4...9, 3]".
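
One possible reading of a SlotSequence is sketched below in Python. The text does not say how the ellipsis is interpreted, so this assumes that "4...9" stands for every Cardinal from 4 through 9 inclusive; the function name expand_slot_arguments is hypothetical.

```python
def expand_slot_arguments(arguments):
    """Expand the comma-separated arguments of a Slot into a list of Cardinals.

    Assumption: 'a...b' denotes the inclusive run a, a+1, ..., b.
    """
    result = []
    # Accept the single-character ellipsis as well as three full stops.
    for part in arguments.replace("\u2026", "...").split(","):
        part = part.strip()
        if "..." in part:
            start, end = (int(piece) for piece in part.split("..."))
            result.extend(range(start, end + 1))
        else:
            result.append(int(part))
    return result
```

Under that assumption, the arguments of the example Slot[3, 4...9, 3] expand to 3, then 4 through 9, then 3 again.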



"This example"-> String[This example]


Glossary


canonizer n
A program that transforms its input into a canonical form.


Grok32`

(c) 2004-2008 by
John Van Wie Bergamini.
