chcode.doc     3/01/91

**********************************************************************
Copyright (c) 1991

The Workstation Vertical Integration Project, PDD, III

Share and Enjoy it.
Any part of this program or publication can be reproduced, transmitted,
transcribed, stored in your system, or translated into any language or
computer language, in any form or by any means, electronic, mechanical,
magnetic, optical, chemical, biological, or otherwise, if you like.
Questions are welcome.

The Workstation Vertical Integration Project, PDD, III
8FL., 106, HOPING E. RD., SEC. 2 TAIPEI, TAIWAN, R.O.C.
(02) 7377197 Dan_Yi Liu

**********************************************************************

            The Transformation of Chinese Utility Codes

<A> NAME

  chcode -- convert a file in any one of eight Chinese codes (BIG5,
            IBM 5550, IBM HOST, NSC Internal, NSC with Protocol, EUC,
            TCA, TELegraph) to any of the other seven.

<B> SYNOPSIS

  chcode <INop> infile <OUTop> outfile

<C> DESCRIPTION

      'chcode' converts an existing file in one of the Chinese codes
      to an output file in any of the other seven. It must be given
      the following arguments:
      ( Note: If chcode is given no arguments, it prints its usage. )

          <INop>   This is the type of the Chinese codes in the
                   input file.
          infile   This is the name of the input file which will
                   be converted by the program.
          <OUTop>  This is the type of the Chinese codes in the
                   output file.
          outfile  This is the name of the destination file to
                   which the conversion results of the program
                   are written.

      Options to chcode must always precede file names.  The following
      options are available for chcode:

                   B   BIG5 code
                   I   IBM 5550 code
                   H   IBM HOST code
                   N   NSC(National Standard Code) Internal code
                   P   NSC with Protocol code
                   E   EUC code
                   T   TCA code
                   L   TELegraph code

<D> CODE STRUCTURE

      The number of different Chinese internal code systems developed
      by various organizations in Taiwan has grown large enough to
      hinder data transfer between different systems (especially
      over networks). This program alleviates that difficulty by
      providing conversions among the eight most commonly used
      internal codes. Most of these Chinese internal codes are 2-byte
      codes; the exception is EUC, which takes up to four bytes. The
      IBM Host code uses the control codes SI (Shift In, 0FH) and SO
      (Shift Out, 0EH) to distinguish Chinese mode from English mode,
      and NSC with Protocol follows the ISO 2022 standard. In this
      program, the data type that holds a code, called 'CODE', is four
      bytes in size.

      The placement of code is as follows:

           +-------------------------------------------------+
           |         typedef  union                          |
           |                {                                |
           |                 unsigned char Byte[4];          |
           |                 long code;                      |
           |               } CODE;                           |
           +-------------------------------------------------+

           2-byte code:
           ( BIG5, IBM 5550, NSC Internal, TCA, TEL codes. )
              Byte[0] ::= High Byte
              Byte[1] ::= Low Byte

           4-byte code:
           ( EUC code. )
              Byte[0] ::= 1st Byte
              Byte[1] ::= 2nd Byte
              Byte[2] ::= 3rd Byte
              Byte[3] ::= 4th Byte

           Protocol code:
           ( IBM Host, NSC with Protocol codes. )
              CODE ::= [Control_Code] Internal_Code
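
      The placement rules above can be sketched in C as follows. This
      is only an illustration: the helper names make_code2 and
      make_code4 are hypothetical and are not part of chcode, and the
      union uses a fixed 32-bit member in place of 'long' (an
      assumption, so that the type stays four bytes on systems where
      long is wider).

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the CODE union above. uint32_t stands in for 'long'
   (assumption: keeps the union at four bytes on 64-bit systems). */
typedef union {
    unsigned char Byte[4];
    uint32_t      code;
} CODE;

/* Hypothetical helper: store a 2-byte code (BIG5, IBM 5550, ...). */
static CODE make_code2(unsigned char high, unsigned char low)
{
    CODE c;
    memset(&c, 0, sizeof c);   /* unused bytes stay zero */
    c.Byte[0] = high;          /* Byte[0] ::= High Byte */
    c.Byte[1] = low;           /* Byte[1] ::= Low Byte  */
    return c;
}

/* Hypothetical helper: store a 4-byte EUC code, first byte first. */
static CODE make_code4(const unsigned char b[4])
{
    CODE c;
    memcpy(c.Byte, b, 4);
    return c;
}
```

      For example, make_code2(0xA4, 0x40) holds the BIG5 code A440H
      with Byte[0] = A4H and Byte[1] = 40H.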

      The greatest difference between the codes lies in their code
      spaces, which are illustrated below:

         (1) BIG5 code range

                      |  40H      7EH      A1H      FEH   (Lbyte)
               -------+---+--------+--------+--------+----
                      |   |        |        |        |
                  A1H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  C8H +---+--------+--------+--------+----
                  C9H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  FEH +---+--------+--------+--------+----
               (Hbyte)|   |        |        |        |

          1st Plane: A140H -> C8FEH
             Symbol Set ------------------- A140H -> A3E0H
             Frequently Used Set ---------- A440H -> C67EH
          2nd Plane: C940H -> FEFEH
             Less Frequently Used Set ----- C940H -> F9D5H
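
      As an illustration of the BIG5 code space above, the following
      C sketch classifies a byte pair by plane. The function name
      big5_plane is hypothetical; the byte ranges are taken directly
      from the diagram (high byte A1H to FEH, low byte 40H to 7EH or
      A1H to FEH).

```c
/* Returns 1 for a first-plane code, 2 for a second-plane code, and
   0 if the byte pair lies outside the BIG5 code space shown above. */
static int big5_plane(unsigned char high, unsigned char low)
{
    int low_ok = (low >= 0x40 && low <= 0x7E) ||
                 (low >= 0xA1 && low <= 0xFE);

    if (!low_ok)
        return 0;
    if (high >= 0xA1 && high <= 0xC8)
        return 1;                 /* A140H -> C8FEH */
    if (high >= 0xC9 && high <= 0xFE)
        return 2;                 /* C940H -> FEFEH */
    return 0;
}
```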

         (2) IBM 5550 code range

                      |  40H      7EH      A1H      FCH   (Lbyte)
               -------+---+--------+--------+--------+----
                      |   |        |        |        |
                  81H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  A8H +---+--------+--------+--------+----
                  A9H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  FCH +---+--------+--------+--------+----
               (Hbyte)|   |        |        |        |

          1st Plane: 8140H -> A8FCH
             Symbol Set -------------------- 8140H -> 8B61H
             Frequently Used Set ----------- 8C40H -> A8C9H
          2nd Plane: A940H -> FCFCH
             Less Frequently Used Set ------ A940H -> D1C4H

         (3) NSC code range

                      |  21H                        7EH   (Lbyte)
               -------+---+--------------------------+----
                      |   |                          |
                  21H +---+--------------------------+----
                      |   |//////////////////////////|
                      |   |//////////////////////////|
                      |   |//////////////////////////|
                      |   |//////////////////////////|
                      |   |//////////////////////////|
                      |   |//////////////////////////|
                  7EH +---+--------------------------+----
               (Hbyte)|   |                          |

          1st Plane: 2121H -> 7E7EH
             Symbol Set -------------------- 2121H -> 4241H
             Frequently Used Set ----------- 4421H -> 7D4CH
          2nd Plane: 2121H -> 7E7EH
             Less Frequently Used Set ------ 2121H -> 7244H
             User Definition Set ----------- 2121H -> 6435H

          . Designate / Transfer plane:

            1>. NSC Internal code
                1st Plane: InCode = Code + 0x8080
                2nd Plane: InCode = Code + 0x8000

            2>. NSC code with Protocols
                ProCode = [ <Ctrl_code> ] Code
                          <Ctrl_code> ::= <Designate_code> |
                                          <Transfer_code>
                a>. Designate plane:
                    English mode --
                      <Designate_code> ::= ESC <G_set> <F>
                    Chinese mode --
                      <Designate_code> ::= ESC 2/4 <G_set> <F>
                               <G_set> ::= <G0> | <G1> | <G2> | <G3>
                                  <G0> ::= 2/8
                                  <G1> ::= 2/9
                                  <G2> ::= 2/A
                                  <G3> ::= 2/B
                                   <F> ::= 3/0 | 3/1 | 3/2 | 3/3 |
                                           3/4 | 3/5 | 3/6 | 3/7 |
                                           3/8 | 3/9 | 3/A | 3/B |
                                           3/C | 3/D | 3/E | 3/F
                b>. Transfer plane:
                    <Transfer_code> ::= <SI> | <SO> |
                                        <LS2> | <LS3> | <SS2> | <SS3>
                               <SI> ::= 0/F
                               <SO> ::= 0/E
                              <LS2> ::= ESC 6/E
                              <LS3> ::= ESC 6/F
                              <SS2> ::= ESC 4/E
                              <SS3> ::= ESC 4/F
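
      The NSC Internal code formulas and the Chinese-mode designate
      sequence given above can be sketched in C as follows. The
      function names nsc_to_internal and nsc_designate are
      hypothetical; the sketch assumes the usual column/row reading
      of the notation (2/4 is the byte 24H, ESC is 1BH) and is not
      taken from the chcode source.

```c
/* NSC Internal code for a 2-byte NSC code in 2121H..7E7EH.
   plane is 1 or 2. Returns 0 on invalid input. */
static unsigned int nsc_to_internal(int plane, unsigned int code)
{
    unsigned int hi = (code >> 8) & 0xFF;
    unsigned int lo = code & 0xFF;

    if (code > 0xFFFF || hi < 0x21 || hi > 0x7E || lo < 0x21 || lo > 0x7E)
        return 0;
    if (plane == 1)
        return code + 0x8080;     /* 1st Plane: InCode = Code + 0x8080 */
    if (plane == 2)
        return code + 0x8000;     /* 2nd Plane: InCode = Code + 0x8000 */
    return 0;
}

/* Build the Chinese-mode designate sequence ESC 2/4 <G_set> <F>.
   g selects G0..G3 (0..3); f is the final byte 3/0..3/F.
   Returns the sequence length (4), or 0 on invalid input. */
static int nsc_designate(int g, unsigned char f, unsigned char out[4])
{
    if (g < 0 || g > 3 || f < 0x30 || f > 0x3F)
        return 0;
    out[0] = 0x1B;                        /* ESC        */
    out[1] = 0x24;                        /* 2/4        */
    out[2] = (unsigned char)(0x28 + g);   /* 2/8 .. 2/B */
    out[3] = f;                           /* 3/0 .. 3/F */
    return 4;
}
```

      For example, the first-plane NSC code 4421H becomes the
      internal code C4A1H.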

         (4) EUC code range

          1st Plane:
                                   +----------+----------+
                                   | A1 -- FE | A1 -- FE |
                                   +----------+----------+
          2nd Plane:
             +----------+----------+----------+----------+
             |    8E    | A2 -- B0 | A1 -- FE | A1 -- FE |
             +----------+----------+----------+----------+

          1st Plane: A1A1H -> FEFEH
             Symbol Set -------------------- A1A1H -> C2C1H
             Frequently Used Set ----------- C4A1H -> FDCBH
          2nd Plane: 8EA2A1A1H -> 8EA2FEFEH
             Less Frequently Used Set ------ 8EA2A1A1H -> 8EA2F2C4H
             Least Frequently Used Set ----- 8EAEA1A1H -> 8EAEE4B5H
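
      Given the EUC layout above, the length of a character can be
      decided from its first byte alone. The following C sketch shows
      one way to do so. The function name euc_char_len is
      hypothetical, and treating every other byte as a 1-byte ASCII
      character is an assumption.

```c
/* Length in bytes of the EUC character that starts with 'lead':
   4 for codes prefixed by 8EH, 2 for first-plane codes (A1H..FEH),
   and 1 otherwise (assumed to be ASCII). */
static int euc_char_len(unsigned char lead)
{
    if (lead == 0x8E)
        return 4;
    if (lead >= 0xA1 && lead <= 0xFE)
        return 2;
    return 1;
}
```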

         (5) TCA code range

                      |  30H 39H  41H 5AH  61H 7AH  80H    FDH   (Lbyte)
               -------+---+---+----+---+----+---+----+------+----
                      |   |   |    |   |    |   |    |      |
                  81H +---+---+----+---+----+---+----+------+----
                      |   |///|    |///|    |///|    |//////|
                      |   |///|    |///|    |///|    |//////|
                  AFH +---+---+----+---+----+---+----+------+----
                  B0H +---+---+----+---+----+---+----+------+----
                      |   |///|    |///|    |///|    |//////|
                      |   |///|    |///|    |///|    |//////|
                  DEH +---+---+----+---+----+---+----+------+----
                  DFH +---+---+----+---+----+---+----+------+----
                      |   |///|    |///|    |///|    |//////|
                      |   |///|    |///|    |///|    |//////|
                  FDH +---+---+----+---+----+---+----+------+----
               (Hbyte)|   |   |    |   |    |   |    |      |

          1st Plane: 8130H -> AFFDH
             Symbol Set -------------------- A140H -> A3E0H
             Frequently Used Set ----------- A440H -> C67EH
          2nd Plane: B030H -> DEFDH
             Less Frequently Used Set ------ C940H -> F9D5H
          User Definition Plane: DF30H -> FCFDH
          Code Extension Plane: FD30H -> FDFDH

         (6) TELegraph code range

                      |  21H      7EH      A1H      FEH   (Lbyte)
               -------+---+--------+--------+--------+----
                      |   |        |        |        |
                  A1H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  FEH +---+--------+--------+--------+----
               (Hbyte)|   |        |        |        |

          1st Plane: A1A1H -> FEFEH
             Frequently Used Set ----------- A1A1H -> F6D8H
          2nd Plane: A121H -> FE7EH
             Less Frequently Used Set ------ A121H -> FB3EH

         (7) IBM Host code range

                      |  40H      7FH      81H      FDH   (Lbyte)
               -------+---+--------+--------+--------+----
                      |   |        |        |        |
                  41H +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  46H +---+--------+--------+--------+----
                  4CH +---+--------+--------+--------+----
                      |   |////////|        |////////|
                      |   |////////|        |////////|
                  91H +---+--------+--------+--------+----
               (Hbyte)|   |        |        |        |

             Symbol Set -------------------- 4141H -> 46F9H & (4040H)
             Frequently Used Set ----------- 4C41H -> 68CAH
             Less Frequently Used Set ------ 6941H -> 91C5H

<E> FUNCTION STRUCTURE

  ... Alter(InString, OutString, InCodeNo, OutCodeNo) function
      input>.  InString : Input string that is to be converted.
               InCodeNo : The internal code No. in InString.
               OutCodeNo: The internal code No. in OutString.
                      Internal Code No:
                                       0: BIG5
                                       1: IBM 5550
                                       2: NSC Internal code
                                       3: EUC
                                       4: TCA
                                       5: TELegraph
                                       6: IBM Host
                                       7: NSC with Protocol
      output>. OutString: Receives the converted result; Chinese mode
                          is switched back to English mode.
      return>. Upon successful completion, a string is returned.
               Otherwise, a value of -1 is returned, and ErrorNo
               is set to indicate the error.

  ... Get2Byte(string) function
      input>.  string: Input the address of a string.
      output>. Return 1 (ASCII code) or 2 byte(s).
               (string address will increase by 1 or 2)
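
      A reader with the behaviour described for Get2Byte might look
      like the C sketch below. It is not the chcode implementation:
      the name get2byte_sketch is hypothetical, and the rule used to
      detect a 2-byte code (first byte 80H or above) is an assumption
      that fits the 2-byte code ranges shown in section <D>.

```c
/* Read one code from *s and advance *s by 1 or 2 bytes.
   Assumption: a first byte >= 80H starts a 2-byte code; anything
   else is a 1-byte ASCII code. */
static unsigned int get2byte_sketch(const unsigned char **s)
{
    unsigned int c = *(*s)++;

    /* Do not read past the terminating NUL of a truncated string. */
    if (c >= 0x80 && **s != '\0')
        c = (c << 8) | *(*s)++;
    return c;
}
```

      Called on the bytes 41H A4H 40H, the sketch returns 41H and then
      A440H, advancing the string address by 1 and then by 2.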

  ... Get4Byte(string) function
      input>.  string: Input a string.
      output>. Return 1 (ASCII code), 2 or 4 byte(s).
               (string address will increase by 1, 2 or 4)

  ... GetIBMHcode(string) function
      input>.  string: Input a string.
      output>. Return 1 (ASCII code), or 2 byte(s).
               (string address will increase by 1, or 2)
               Set SI, or SO flag.

  ... GetNSCPcode(string) function
      input>.  string: Input a string.
      output>. Return 1 (ASCII code), or 2 byte(s).
               (string address will increase by 1, or 2)
               Set SI, SO, LS2, LS3, SS2, or SS3 flag(s).

  ... BIG5order(b5) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.
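
      To illustrate what a "sequential No." can look like, the C
      sketch below numbers the words of the BIG5 Frequently Used Set
      (A440H to C67EH). Several points are assumptions rather than
      facts from chcode: the name big5_word_order is hypothetical,
      numbering starts at 0, and each high byte is taken to hold 157
      consecutive cells (63 low bytes in 40H..7EH followed by 94 in
      A1H..FEH, as in the BIG5 diagram). The values of 'ASCIIOffset'
      and 'SymOffset' are not given in this document, so ASCII and
      Symbol codes are simply rejected here.

```c
/* Sequential number of a word in the BIG5 Frequently Used Set,
   or -1 if the code is outside A440H..C67EH or has an invalid
   low byte. Numbering from 0 is an assumption. */
static long big5_word_order(unsigned int code)
{
    unsigned int hi = code >> 8;
    unsigned int lo = code & 0xFF;
    unsigned int col;

    if (code < 0xA440 || code > 0xC67E)
        return -1;
    if (lo >= 0x40 && lo <= 0x7E)
        col = lo - 0x40;          /* cells 0..62   */
    else if (lo >= 0xA1 && lo <= 0xFE)
        col = lo - 0xA1 + 63;     /* cells 63..156 */
    else
        return -1;
    return (long)(hi - 0xA4) * 157 + (long)col;
}
```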

  ... IBMorder(ibm) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... IBMHorder(hst) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... NSCorder(nsc) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... NSCPorder(nsc) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
               Additionally, reset SI, SO, LS2, LS3, SS2, or SS3
               flag(s).
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... EUCorder(euc) function
      input>.  1, 2, or 4 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... TCAorder(tca) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  ... TELorder(tel) function
      input>.  1, or 2 byte(s) code.
      output>. If the input code is ASCII, return the sum of the
               Hex. value of the ASCII code plus the value of
               'ASCIIOffset'. Else, if the input code is Symbol,
               return the sum of the sequential No. of the Symbol
               plus the value of 'SymOffset'. Otherwise, if the
               input code is word, return the sequential No. of
               the word.
      return>. Upon successful completion, a value of the number
               is returned. Otherwise, a value of -1 is returned,
               and ErrorNo is set to indicate the error.

  The code sets are divided into three groups. The sets within a
  group share the same sequential order of words. The 'IBM group'
  includes the BIG5 set, the IBM 5550 set and the IBM Host set; the
  'NSC group' includes the NSC set, the NSC with Protocol set, the
  EUC set and the TCA set; and the 'TEL group' contains only the
  TEL set.
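
  The grouping above can be written down as a small lookup in C. The
  sketch below uses the internal code numbers listed for Alter; the
  enum and function names are hypothetical. It assumes that two sets
  in the same group need no order conversion, which is presumably the
  case that an identity mapping would cover.

```c
/* Internal code numbers as listed for Alter. */
enum code_no {
    BIG5 = 0, IBM5550 = 1, NSC_INTERNAL = 2, EUC = 3,
    TCA = 4, TEL = 5, IBM_HOST = 6, NSC_PROTOCOL = 7
};

enum code_group { GROUP_IBM, GROUP_NSC, GROUP_TEL, GROUP_NONE };

/* Group of a code set, or GROUP_NONE for an unknown code number. */
static enum code_group group_of(int code_no)
{
    switch (code_no) {
    case BIG5: case IBM5550: case IBM_HOST:
        return GROUP_IBM;
    case NSC_INTERNAL: case NSC_PROTOCOL: case EUC: case TCA:
        return GROUP_NSC;
    case TEL:
        return GROUP_TEL;
    default:
        return GROUP_NONE;
    }
}

/* 1 if both sets share the same sequential order of words. */
static int same_order(int in_code_no, int out_code_no)
{
    enum code_group g = group_of(in_code_no);
    return g != GROUP_NONE && g == group_of(out_code_no);
}
```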

  ... IBM2NSC(order) function
      input>.  The sequential No. in IBM Set.
      output>. The sequential No. in NSC Set.

  ... NSC2IBM(order) function
      input>.  The sequential No. in NSC Set.
      output>. The sequential No. in IBM Set.

  ... IDent(order) function
      Output No. is identical with input No.

  ... TCA2TEL(order) function
      input>.  The sequential No. in NSC Set.
      output>. The sequential No. in TEL Set.

  ... TEL2TCA(order) function
      input>.  The sequential No. in TEL Set.
      output>. The sequential No. in NSC Set.

  ... SymbolTab(InCodeNo, OutCodeNo, order) function
      input>.  order    : Input order that is to be converted.
               InCodeNo : The input internal code No.
               OutCodeNo: The output internal code No.
                      Internal Code No:
                                       0: BIG5
                                       1: IBM 5550
                                       2: NSC Internal code
                                       3: EUC
                                       4: TCA
                                       5: TELegraph
                                       6: IBM Host
                                       7: NSC with Protocol
      output>. The converted result.
      return>. Upon successful completion, a value of the number is
               returned. Otherwise, a value of the first Symbol is
               returned, and ErrorNo is set to indicate the error.

  ... BIG5code(b5) function
      input>.  the sequential No.
      output>. internal code

  ... IBMcode(ibm) function
      input>.  the sequential No.
      output>. internal code

  ... IBMHcode(hst) function
      input>.  the sequential No.
      output>. internal code

  ... NSCcode(nsc) function
      input>.  the sequential No.
      output>. internal code

  ... NSCPcode(nsc) function
      input>.  the sequential No.
      output>. [Ctrl_code] internal code

  ... EUCcode(euc) function
      input>.  the sequential No.
      output>. internal code

  ... TCAcode(tca) function
      input>.  the sequential No.
      output>. internal code

  ... TELcode(tel) function
      input>.  the sequential No.
      output>. internal code

  ... Put2Byte(word, stream) function
      input>.  word: The output code
               stream: The address of a string.
      output>. send the word to a STREAMS file.

  ... Put4Byte(word, stream) function
      input>.  word: The output code
               stream: The address of a string.
      output>. send the word to a STREAMS file.

  ... PutNSCcode(word, stream) function
      input>.  word: The output code
               stream: The address of a string.
      output>. send the word to a STREAMS file.

  ... PutIBMHcode(word, stream) function
      input>.  word: The output code
               stream: The address of a string.
      output>. Send the control code, SI or SO, to a STREAMS file when
               the plane changes. Otherwise, send the word to the
               STREAMS file.
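
      One plausible reading of the PutIBMHcode behaviour is sketched
      below in C. It is not the chcode implementation. The name
      put_host_sketch and the chinese_mode parameter are
      hypothetical, a word above FFH is assumed to be a 2-byte
      Chinese code, and the sketch writes the control code and the
      word in the same call. It follows the usual host convention
      that SO (0EH) enters 2-byte mode and SI (0FH) leaves it.

```c
#include <stdio.h>

#define HOST_SO 0x0E   /* Shift Out: enter Chinese (2-byte) mode    */
#define HOST_SI 0x0F   /* Shift In: return to English (1-byte) mode */

/* Write 'word' to 'stream', preceded by SO or SI when the mode
   changes. *chinese_mode holds the current mode (0 = English).
   Returns 0 on success, -1 on a write error. */
static int put_host_sketch(unsigned int word, FILE *stream,
                           int *chinese_mode)
{
    int want_chinese = word > 0xFF;   /* assumption: 2-byte => Chinese */

    if (want_chinese != *chinese_mode) {
        if (fputc(want_chinese ? HOST_SO : HOST_SI, stream) == EOF)
            return -1;
        *chinese_mode = want_chinese;
    }
    if (want_chinese &&
        fputc((int)((word >> 8) & 0xFF), stream) == EOF)
        return -1;
    if (fputc((int)(word & 0xFF), stream) == EOF)
        return -1;
    return 0;
}
```

      Writing 4C41H and then C1H, starting in English mode, produces
      the bytes 0EH 4CH 41H 0FH C1H.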

<F> EXAMPLES

    Usage: chcode <INop> infile <OUTop> outfile
           <INop | OUTop> ::= B | I | N | E | T | L | H | P
                  Comment :  (B)IG5
                             (I)BM 5550
                             (N)SC Internal code
                             (E)UC
                             (T)CA
                             TE(L)egraph
                             IBM (H)OST
                             NSC with (P)rotocol
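
    The option letters above correspond to the internal code numbers
    listed for Alter in section <E>. A minimal C sketch of that
    mapping follows. The function name option_to_code_no is
    hypothetical, and accepting only upper-case letters is an
    assumption.

```c
/* Map a chcode option letter to its internal code No.
   Returns -1 for an unknown letter (upper case only; assumption). */
static int option_to_code_no(char op)
{
    switch (op) {
    case 'B': return 0;   /* BIG5              */
    case 'I': return 1;   /* IBM 5550          */
    case 'N': return 2;   /* NSC Internal code */
    case 'E': return 3;   /* EUC               */
    case 'T': return 4;   /* TCA               */
    case 'L': return 5;   /* TELegraph         */
    case 'H': return 6;   /* IBM Host          */
    case 'P': return 7;   /* NSC with Protocol */
    default:  return -1;
    }
}
```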
