Characters

Why is ASCII used to code Grok32`?

 

"We usually think of characters as letters of the alphabet, but they encompass even more than that.  In fact, a character value can be any member of a machine's character set.  The available characters and their internal representation depend on the machine on which the program runs.  The most common character sets are ASCII and EBCDIC."

 

The standard called ASCII (American Standard Code for Information Interchange) defines 128 different characters, each with a 7-bit numeric code, that a computer can use.  Newer 8-bit extensions, often called extended ASCII, provide 256 characters.  Here is a link to a 256-character extended ASCII set.

 

All modern operating systems recognize ASCII.  Grok32` is defined in 128-character ASCII and parses 128-character ASCII Strings into Expressions.  
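
As a rough illustration of what "128-character ASCII" means for an input String, the following Python sketch checks that every character has a numeric code from 0 through 127.  The helper name is_seven_bit_ascii is hypothetical and is not part of Grok32`; how Grok32` itself validates its input is not described here.

```python
def is_seven_bit_ascii(text: str) -> bool:
    """Return True if every character has a code in the range 0..127.

    Hypothetical helper for illustration only; not part of Grok32`.
    """
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    # "na\u00efve" contains U+00EF, which lies outside 7-bit ASCII.
    for sample in ["Hello, world!", "na\u00efve", ""]:
        print(repr(sample), is_seven_bit_ascii(sample))
```

An empty string passes the check, since it contains no characters outside the range.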

 

Here is the 128-character set listed in order of each character's numeric code.  The row label is the first hexadecimal digit of the code and the column label is the second:

 

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0   NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1   DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
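
To read a numeric code off the table, multiply the row label by 16 and add the column label.  The Python sketch below shows that arithmetic in both directions; the function names ascii_code and table_position are made up for this illustration.

```python
def ascii_code(row: int, col: int) -> int:
    """Numeric code of the table cell at the given row (0..7) and column (0..15)."""
    if not (0 <= row <= 7 and 0 <= col <= 15):
        raise ValueError("row must be 0..7 and col must be 0..15")
    return row * 16 + col


def table_position(ch: str) -> tuple:
    """Return (row, col) of a single ASCII character in the table."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return divmod(code, 16)


if __name__ == "__main__":
    print(ascii_code(4, 1), chr(ascii_code(4, 1)))  # row 4, column 1
    print(table_position("z"))                       # column 10 is labelled A
```

For example, row 4, column 1 gives 4 * 16 + 1 = 65, the code for the letter A.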

 

The first 32 characters in the table above (codes 0 through 31) are somewhat esoteric.  They are often referred to as Control Codes, and are interpreted as follows:

 

NUL (null)

SOH (start of heading)

STX (start of text)

ETX (end of text)

EOT (end of transmission) - Not the same as ETB

ENQ (enquiry)               

ACK (acknowledge)            

BEL (bell) - Caused teletype machines to ring a bell.  Causes a beep in many common terminals and terminal emulation programs.

BS (backspace) - Moves the cursor (or print head) backwards (left) 1 space.

HT (horizontal tab, TAB) - Moves the cursor (or print head) right to the next tab stop.  The spacing of tab stops depends on the output device, but is often either 8 or 10.

LF (line feed; also NL, new line) - Moves the cursor (or print head) down to a new line.  On Unix systems, it moves to a new line AND all the way to the left.

VT (vertical tab)           

FF (form feed) - Advances paper to the top of the next page (if the output device is a printer).

CR (carriage return) - Moves the cursor all the way to the left, but does not advance to the next line.

SO (shift out) - Switches output device to alternate character set.            

SI (shift in)  - Switches output device back to default character set.

DLE (data link escape)       

DC1 (device control 1)       

DC2 (device control 2)       

DC3 (device control 3)       

DC4 (device control 4)       

NAK (negative acknowledge)   

SYN (synchronous idle)       

ETB (end of transmission block) - Not the same as EOT  

CAN (cancel)                

EM (end of medium)  

SUB (substitute)            

ESC (escape)

FS (file separator)

GS (group separator)

RS (record separator)

US (unit separator)
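
Several of these control codes can still be typed directly in most programming languages through backslash escapes.  The Python sketch below lists the 32 abbreviations in code order, using the abbreviations from the character table, and prints which code each common escape produces.  The names CONTROL_NAMES and describe are invented for this example.

```python
# Abbreviations for codes 0..31, in the order used by the character table.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def describe(ch: str) -> str:
    """Return the table abbreviation for a control code, or the character itself."""
    code = ord(ch)
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "SP"
    if code == 127:
        return "DEL"
    return ch


if __name__ == "__main__":
    # Escapes as written in source code, paired with the character they produce.
    escapes = {"\\a": "\a", "\\b": "\b", "\\t": "\t", "\\n": "\n",
               "\\v": "\v", "\\f": "\f", "\\r": "\r"}
    for written, ch in escapes.items():
        print(f"{written} -> code {ord(ch):2d} ({describe(ch)})")
```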

 

 

 

Most word-processing applications have a "Save" option which allows writing to be saved as "plain" text with no formatting such as bold or underlining: the raw format that any computer can understand.  This is usually so the writing can be easily imported into any application without issues.  "Plain" text usually means ASCII text, although there are minor variations such as "MS-DOS Format".

("Text Document" (with or without "MS-DOS Format") | "Unicode Document" | "Rich Text Document")

 

A drawback of ASCII is its small character set.  Languages such as Chinese and Japanese have thousands of characters, and languages such as Arabic use alphabets that ASCII does not contain.  ASCII cannot be used in these situations.

 

Grok32` is designed to facilitate CharacterSets of arbitrary complexity.  The object is to model any language without prejudice, to the extent this is possible, while maintaining an ASCII default.  Furthermore, it is believed that the String embodies fundamental aspects of all verbal communication.  For example, the String could probably be adapted to help parse phonetic linguistics or any character-expression sequence, whether human or not.  Frequently, communication is multi-channel, and a correct "parsing" requires parallel String processing of completely different kinds of "character strings".  A "character string" could be a sequence of human gestures.  What complex of CharacterString channels is sufficient to characterize a whale species' language?

 

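The response to ASCII's small character set was Unicode, which originally allowed for up to 65,536 different characters and has since been extended to 1,114,112 code points.  Unicode is more complex than ASCII, but all major modern operating systems now support it.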
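
The relationship between the two standards can be seen in a language whose strings are Unicode, such as Python.  In the sketch below, an ASCII character keeps the same numeric code in Unicode, while a Japanese or Arabic character has a code well above 127 and cannot be written as ASCII at all.  The helper name try_ascii is hypothetical.

```python
def try_ascii(text: str):
    """Return the ASCII bytes for text, or None if it contains non-ASCII characters."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # "A", a Japanese character (U+65E5) and an Arabic letter (U+0639).
    for ch in ["A", "\u65e5", "\u0639"]:
        print(f"U+{ord(ch):04X}", "ASCII" if try_ascii(ch) is not None else "not ASCII")
```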

 

 

ISO-Latin1 character set

 

A graphical list of all the characters that may be used in an HTML document.  See ISO Latin-1 Characters and Control Characters Table.

 

 


Grok32`

2004, 2005

by John Van Wie Bergamini.

All rights reserved.

