SMOKE-16 Portable Character Encoding ----------------------------
$Id: portable.txt,v 1.3 2001/09/07 17:36:43 bsittler Exp $
This document contains a brief description of the SMOKE-16
portable character encoding.                               --BCWS
-----------------------------------------------------------------

NOTE ============================================================
The Blue Madness refers to EBCD*C, a bizarre set of non-standard
character encodings "invented" by a certain ancient and nameless
mainframe manufacturer as part of a devious customer-control plot.
Fortunately, it is rapidly being replaced by ASCII-derived
encodings.
=================================================================

OVERVIEW OF THE SMOKE-16 PORTABLE CHARACTER ENCODING

Object files must remain portable among different SMOKE-16 toolset
hosts. This includes textual information such as symbol names and
archive member names. To keep this information portable among hosts
with incompatible native character encodings (such as The Blue
Madness, ISO 8859-x, SJIS, UTF-8 and ASCII), the SMOKE-16 toolset uses
a portable character encoding (the "portable encoding") for the
"string table" section of object files. See 'a_out.txt' in the 'doc'
directory for more information on the string table, and on SMOKE-16
object files in general.

dec  oct  hex    char   ascii name
               +------+
  0 0000 0x00  | '\0' | null (nul)
              ...

  7 0007 0x07  | '\a' | bell (bel)
  8 0010 0x08  | '\b' | backspace (bs)
  9 0011 0x09  | '\t' | character tabulation (ht)
 10 0012 0x0a  | '\n' | line feed (lf)
 11 0013 0x0b  | '\v' | line tabulation (vt)
 12 0014 0x0c  | '\f' | form feed (ff)
 13 0015 0x0d  | '\r' | carriage return (cr)
              ...

 32 0040 0x20  | ' '  | space
 33 0041 0x21  | '!'  | exclamation mark
 34 0042 0x22  | '\"' | quotation mark
 35 0043 0x23  | '#'  | number sign
 36 0044 0x24  | '$'  | dollar sign
 37 0045 0x25  | '%'  | percent sign
 38 0046 0x26  | '&'  | ampersand
 39 0047 0x27  | '\'' | apostrophe
 40 0050 0x28  | '('  | left parenthesis
 41 0051 0x29  | ')'  | right parenthesis
 42 0052 0x2a  | '*'  | asterisk
 43 0053 0x2b  | '+'  | plus sign
 44 0054 0x2c  | ','  | comma
 45 0055 0x2d  | '-'  | hyphen-minus
 46 0056 0x2e  | '.'  | full stop
 47 0057 0x2f  | '/'  | solidus
 48 0060 0x30  | '0'  | digit zero
 49 0061 0x31  | '1'  | digit one
 50 0062 0x32  | '2'  | digit two
 51 0063 0x33  | '3'  | digit three
 52 0064 0x34  | '4'  | digit four
 53 0065 0x35  | '5'  | digit five
 54 0066 0x36  | '6'  | digit six
 55 0067 0x37  | '7'  | digit seven
 56 0070 0x38  | '8'  | digit eight
 57 0071 0x39  | '9'  | digit nine
 58 0072 0x3a  | ':'  | colon
 59 0073 0x3b  | ';'  | semicolon
 60 0074 0x3c  | '<'  | less-than sign
 61 0075 0x3d  | '='  | equals sign
 62 0076 0x3e  | '>'  | greater-than sign
 63 0077 0x3f  | '\?' | question mark
 64 0100 0x40  | '@'  | commercial at
 65 0101 0x41  | 'A'  | latin capital letter a
 66 0102 0x42  | 'B'  | latin capital letter b
 67 0103 0x43  | 'C'  | latin capital letter c
 68 0104 0x44  | 'D'  | latin capital letter d
 69 0105 0x45  | 'E'  | latin capital letter e
 70 0106 0x46  | 'F'  | latin capital letter f
 71 0107 0x47  | 'G'  | latin capital letter g
 72 0110 0x48  | 'H'  | latin capital letter h
 73 0111 0x49  | 'I'  | latin capital letter i
 74 0112 0x4a  | 'J'  | latin capital letter j
 75 0113 0x4b  | 'K'  | latin capital letter k
 76 0114 0x4c  | 'L'  | latin capital letter l
 77 0115 0x4d  | 'M'  | latin capital letter m
 78 0116 0x4e  | 'N'  | latin capital letter n
 79 0117 0x4f  | 'O'  | latin capital letter o
 80 0120 0x50  | 'P'  | latin capital letter p
 81 0121 0x51  | 'Q'  | latin capital letter q
 82 0122 0x52  | 'R'  | latin capital letter r
 83 0123 0x53  | 'S'  | latin capital letter s
 84 0124 0x54  | 'T'  | latin capital letter t
 85 0125 0x55  | 'U'  | latin capital letter u
 86 0126 0x56  | 'V'  | latin capital letter v
 87 0127 0x57  | 'W'  | latin capital letter w
 88 0130 0x58  | 'X'  | latin capital letter x
 89 0131 0x59  | 'Y'  | latin capital letter y
 90 0132 0x5a  | 'Z'  | latin capital letter z
 91 0133 0x5b  | '['  | left square bracket
 92 0134 0x5c  | '\\' | reverse solidus
 93 0135 0x5d  | ']'  | right square bracket
 94 0136 0x5e  | '^'  | circumflex accent
 95 0137 0x5f  | '_'  | low line
 96 0140 0x60  | '`'  | grave accent
 97 0141 0x61  | 'a'  | latin small letter a
 98 0142 0x62  | 'b'  | latin small letter b
 99 0143 0x63  | 'c'  | latin small letter c
100 0144 0x64  | 'd'  | latin small letter d
101 0145 0x65  | 'e'  | latin small letter e
102 0146 0x66  | 'f'  | latin small letter f
103 0147 0x67  | 'g'  | latin small letter g
104 0150 0x68  | 'h'  | latin small letter h
105 0151 0x69  | 'i'  | latin small letter i
106 0152 0x6a  | 'j'  | latin small letter j
107 0153 0x6b  | 'k'  | latin small letter k
108 0154 0x6c  | 'l'  | latin small letter l
109 0155 0x6d  | 'm'  | latin small letter m
110 0156 0x6e  | 'n'  | latin small letter n
111 0157 0x6f  | 'o'  | latin small letter o
112 0160 0x70  | 'p'  | latin small letter p
113 0161 0x71  | 'q'  | latin small letter q
114 0162 0x72  | 'r'  | latin small letter r
115 0163 0x73  | 's'  | latin small letter s
116 0164 0x74  | 't'  | latin small letter t
117 0165 0x75  | 'u'  | latin small letter u
118 0166 0x76  | 'v'  | latin small letter v
119 0167 0x77  | 'w'  | latin small letter w
120 0170 0x78  | 'x'  | latin small letter x
121 0171 0x79  | 'y'  | latin small letter y
122 0172 0x7a  | 'z'  | latin small letter z
123 0173 0x7b  | '{'  | left curly bracket
124 0174 0x7c  | '|'  | vertical line
125 0175 0x7d  | '}'  | right curly bracket
126 0176 0x7e  | '~'  | tilde
              ...

               +------+

Not coincidentally, this portable character encoding includes all the
printable ASCII characters and those ASCII control characters having
standard C character escape sequences ('\a', '\b', '\t', '\n', '\v',
'\f' and '\r'). The null character ('\0') is also included, since it
has the same value in all of the host encodings mentioned above; it is
used to terminate entries in the string table.
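
The membership rule described above can be sketched in a few lines of
Python. This is only an illustration derived from the table; the names
PORTABLE_CODES and is_portable are hypothetical and are not part of
the SMOKE-16 toolset.

```python
# Code values defined by the portable encoding, taken from the table:
# null (0), the seven controls with C escapes (7..13), and the
# printable ASCII range (32..126).
PORTABLE_CODES = (
    frozenset([0]) | frozenset(range(7, 14)) | frozenset(range(32, 127))
)


def is_portable(code: int) -> bool:
    """Return True if `code` is a value defined by the portable encoding."""
    return code in PORTABLE_CODES
```

Under this reading of the table the encoding defines 103 values in
all: one null, seven control characters and 95 printable characters.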

SYMBOLS AND ARCHIVE MEMBERS

Symbol names and archive member names are restricted to characters
from the portable encoding. This means that object files must have
names that can be translated to the portable encoding before they can
be placed in a SMOKE-16 object archive. This ensures that archive
members can be manually extracted on a wide range of systems.
Unfortunately, it also means you can't use extended characters from
The Blue Madness, ISO 8859-x, SJIS or UTF-8, arbitrary ASCII control
characters, or other extended characters in symbol names or in archive
member filenames. That's the price of wide portability.
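
As a rough illustration of this restriction, the Python sketch below
reports which bytes of a candidate name fall outside the portable
encoding. It assumes the name has already been translated to portable
code values, and it also rejects an embedded null on the assumption
that null can only terminate a string table entry. The function name
check_name is hypothetical; the real toolset may perform this check
differently.

```python
def check_name(name: bytes) -> list:
    """Return (offset, value) pairs for bytes not allowed in a name.

    An empty list means every byte is a portable code value.
    Assumption: null (0) is rejected inside a name because it is
    used to terminate entries in the string table.
    """
    allowed = set(range(7, 14)) | set(range(32, 127))
    return [(i, b) for i, b in enumerate(name) if b not in allowed]
```

For example, check_name(b"_start") returns an empty list, while a name
containing UTF-8 bytes such as b"caf\xc3\xa9.o" is reported as
[(3, 195), (4, 169)].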

ASSEMBLING PORTABLE PROGRAMS

By default, when you assemble SMOKE-16 programs, the resulting
SMOKE-16 object files use the assembling machine's native character
encoding (the "native encoding") for character and string constants.
These object files will still work on machines that use a different
character encoding, but their character and string constants (and any
program logic that depends on them) will remain in the native encoding
of the machine that assembled them.

      When you give the '-portable' option to 'as', character and
      string constants are assembled using the portable encoding
      instead of the native encoding, and characters outside the
      portable character encoding must be referred to numerically,
      using hexadecimal or octal character escape sequences.
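
The translation that '-portable' implies can be sketched as follows.
This is not the code used by 'as'; it is a minimal Python illustration
that assumes the host's native encoding is available as a Python codec
(for example "cp037" for one variant of The Blue Madness). The
function name to_portable is hypothetical. Characters with no portable
value are rejected, mirroring the rule that they must be written as
numeric escapes instead.

```python
def to_portable(native: bytes, host_codec: str) -> bytes:
    """Translate a native-encoded string constant to portable codes.

    Assumption: `host_codec` names a Python codec matching the host's
    native encoding. Portable code values coincide with the ASCII
    values of the same characters, so the character's code point is
    used directly once it is known to be in the portable set.
    """
    out = bytearray()
    for ch in native.decode(host_codec):
        code = ord(ch)
        if code == 0 or 7 <= code <= 13 or 32 <= code <= 126:
            out.append(code)
        else:
            raise ValueError(
                f"character {ch!r} is outside the portable encoding; "
                "use a hexadecimal or octal escape instead"
            )
    return bytes(out)
```

With this sketch, the constant "Hi" encoded in cp037 comes out as the
two portable bytes 0x48 0x69, the same bytes an ASCII host would
produce.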

Of course, the ideal solution would be for every host to use the same
character set.

RUNNING PORTABLE PROGRAMS

When you run SMOKE-16 executables using the SMOKE-16 emulator 'emu',
strings from the emulated SMOKE-16 environment are passed directly to
the emulating machine's C library and/or system calls, with no
character encoding translation. For this reason, programs must be
assembled using a character encoding compatible with the emulating
machine's execution character encoding.
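
The consequence of passing strings through untranslated can be seen
with a small Python sketch. It models the emulating machine's
interpretation of the bytes as a Python codec, which is an assumption
made for illustration only; the function name as_seen_by_host is
hypothetical and is not part of 'emu'.

```python
def as_seen_by_host(data: bytes, host_codec: str) -> str:
    """Interpret raw program bytes the way a host with the given
    native encoding would, with no translation step in between."""
    return data.decode(host_codec)


# Bytes of a string constant assembled with the portable encoding.
portable_bytes = b"Hi"

# An ASCII-compatible host reads the intended text.
ascii_view = as_seen_by_host(portable_bytes, "ascii")

# A host using cp037 (an EBCDIC variant) reads different characters.
ebcdic_view = as_seen_by_host(portable_bytes, "cp037")
```

Here ascii_view is "Hi", while ebcdic_view is some other two-character
string, which is why the encoding used at assembly time has to be
compatible with the emulating machine.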
