High Level Encoding

Now that we have an idea of the physical layout of a PDF417 bar code, we need to know how to encode data.

All data is represented by codewords with values between 0 and 928. High-level encoding converts the data to be printed into corresponding codeword values. How we encode each codeword, described below, depends on what the codeword represents.

Low-level encoding, by contrast, involves physically converting the codeword values into their respective bar/space patterns.

PDF417 encodes values according to clusters, or mutually exclusive encodation sets. The bar-space pattern of each codeword depends on both the value to be encoded and the cluster used by that row.

Clusters

The entire set of 929 codewords in PDF417 is represented in three mutually exclusive symbol character sets, or clusters. Each cluster encodes the 929 available PDF417 codewords into different bar-space patterns so that one cluster is distinct from another. The cluster numbers are 0, 3, and 6. The cluster definition applies to all PDF417 symbol characters except the start and stop characters.

Each row uses only one of the three clusters (0, 3, or 6) to encode data, with the same cluster repeating sequentially every third row. Row 0 codewords use cluster 0, row 1 uses cluster 3, and row 2 uses cluster 6, etc. In general, cluster number = ((row number) mod 3) *3. To encode any codeword value, it is necessary first to know what cluster it must belong to.

Each cluster presents the 929 available values with distinct bar-space patterns so that one cluster cannot be confused with another. Because any two adjacent rows use different clusters, scan stitching can be used without confusing rows.
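
The cluster rule above can be sketched in a few lines of Python. This is only an illustration: the function name cluster_for_row is hypothetical, and rows are assumed to be numbered from 0 as in the text.

```python
def cluster_for_row(row_number: int) -> int:
    """Return the cluster (0, 3, or 6) used by a zero-based row number."""
    if row_number < 0:
        raise ValueError("row_number must be non-negative")
    # cluster number = ((row number) mod 3) * 3
    return (row_number % 3) * 3


# Rows 0..5 cycle through the three clusters in order.
clusters = [cluster_for_row(r) for r in range(6)]
print(clusters)
```

Because the cluster depends only on the row number mod 3, any two adjacent rows always get different clusters.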

Encoding Data

High level encoding converts the data characters into their corresponding codewords. Data compaction schemes are used to achieve high level encoding. Depending on data type, there are several ways to encode data in PDF417, which involve selecting a mode and encoding codeword values.

Modes

A mode is simply a method of compacting data.  PDF417 provides three modes to encode data. In one PDF417 symbol it is possible to switch back and forth between modes as often as required.

The optimal mode for an application may be a combination of the three modes, each used in different portions of the symbol. By using mode latches and mode shifts you can move at will from mode to mode within the same symbol.

The three modes are defined below.  Each mode defines a particular efficient mapping between user-defined data and codeword sequences:

            Text Compaction mode

            Byte Compaction mode

            Numeric Compaction mode

900 codewords are available in each mode for data encodation and other functions within the mode.  The remaining 29 codewords are assigned to specific functions independent of the current compaction mode.

Codewords 900 to 928 are assigned as function codewords as follows:

            Switching between modes

            Enhanced applications using Extended Channel Interpretations (ECIs)

            Other enhanced applications

At present codewords 903 to 912 and 914 to 920 are reserved.  Below is a table that defines the list of assigned and reserved function codewords.

 

Assignments of PDF417 Function Codewords

Codeword        Function
900             mode latch to Text Compaction mode
901             mode latch to Byte Compaction mode
902             mode latch to Numeric Compaction mode
903 to 912      reserved for future functions
913             mode shift to Byte Compaction mode
914 to 920      reserved for future functions
921             reader initialization
922             terminator codeword for Macro PDF control block
923             sequence tag to identify the beginning of optional fields in the Macro PDF control block
924             mode latch to Byte Compaction mode (used differently from 901)
925             identifier for a user defined Extended Channel Interpretation (ECI)
926             identifier for a general purpose ECI format
927             identifier for an ECI of a character set or code page
928             Macro marker codeword to indicate the beginning of a Macro PDF Control Block

 

 

A mode latch codeword may be used to switch from the current mode to the indicated destination and remains in effect until another mode switch is explicitly brought into use.  Codewords 900 to 902 and 924 are assigned to this function.  Below is a table that defines the Mode Definition and Mode Switching Codewords function.

The mode shift codeword 913 causes a temporary switch from Text Compaction mode to Byte Compaction mode. This switch shall be in effect for only the next codeword, after which the mode shall revert to the prevailing sub-mode of the Text Compaction mode. Codeword 913 is only available in Text Compaction mode; its use is described under "Switching to Byte Compaction Mode" below.
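
The difference between a latch and a shift can be illustrated with a small mode tracker. This is a simplified sketch, not a decoder: the name trace_modes is hypothetical, and Text Compaction sub-modes, ECIs, and Macro PDF codewords are deliberately ignored.

```python
# Mode latch codewords and their destination modes.
LATCHES = {900: "text", 901: "byte", 902: "numeric", 924: "byte"}
BYTE_SHIFT = 913  # temporary switch, valid only in Text Compaction mode


def trace_modes(codewords):
    """Return (codeword, mode) pairs for the data codewords in a sequence."""
    mode = "text"          # default mode at the start of every symbol
    shift_pending = False  # True right after codeword 913
    traced = []
    for cw in codewords:
        if cw in LATCHES:
            mode = LATCHES[cw]      # a latch stays in effect until the next switch
            continue
        if cw == BYTE_SHIFT and mode == "text":
            shift_pending = True    # a shift affects only the next codeword
            continue
        if shift_pending:
            traced.append((cw, "byte"))
            shift_pending = False   # revert to Text Compaction mode
        else:
            traced.append((cw, mode))
    return traced


print(trace_modes([10, 913, 65, 11, 902, 1, 900, 5]))
```

In the sample sequence, only the codeword directly after 913 is read as a byte, while the latches 902 and 900 change the mode for everything that follows them.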

Mode Definition and Mode Switching Codewords

Destination Mode        Mode Latch      Mode Shift
Text Compaction         900             none
Byte Compaction         901/924         913
Numeric Compaction      902             none

 

 

PDF417 also supports the Extended Channel Interpretation system, which allows different interpretations of data to be accurately encoded in the symbol.

To select the optimal mode, first select the mode that supports the characters that must be encoded. Next, when there is a choice among modes, consider the “efficiency” of the mode. This involves answering two questions:

1.         Per mode, how many characters per codeword may be encoded?

2.         If mode switching seems promising, what additional overhead (e.g., shift, latch characters) is needed, or is it more efficient to use one mode only?

Each available mode in PDF417 is described in the following sections. The issue of “efficiency” will become clear as we discuss each mode’s advantages and limitations.

Text Compaction Mode

Text Compaction mode permits all printable ASCII characters to be encoded, i.e. values 32 - 126 inclusive in accordance with ISO/IEC 646, as well as selected control characters.

The Text Compaction Mode has four sub-modes:

                       Alpha (uppercase alphabetic)

                       Lower (lowercase alphabetic)

                       Mixed (numeric and some punctuation)

                       Punctuation

Each sub-mode contains 30 characters, including sub-mode latch and shift characters.

The default compaction mode for PDF417 in effect at the start of each symbol is always Text Compaction mode, Alpha sub-mode (uppercase alphabetic). A latch codeword from another mode to the Text Compaction mode will always switch to the Text Compaction Alpha sub-mode.

Byte Compaction Mode

Byte Compaction mode permits all 256 possible 8-bit byte values to be encoded.  This includes all ASCII characters value 0 to 127 inclusive and provides for international character set support.

The Byte Compaction mode lets you encode any 8-bit value from 0 to 255, which covers the full ASCII set as well as international character sets. This mode encodes about 1.2 bytes per codeword. In terms of the breadth of encodable data, it is a powerful mode; in terms of the resulting size of the printed symbol, it is the least efficient mode.

Encoding Data in Byte Compaction Mode

The Byte Compaction mode enables a sequence of 8-bit bytes to be encoded into a sequence of codewords. It is accomplished by a base 256 to base 900 conversion, which achieves a compaction ratio of 6 Byte Compaction characters to 5 codewords (1.2:1).

All the characters and their values (0 to 255) are defined in ______?.   This is the default graphical and control character interpretation.  When ECIs are invoked this interpretation may be defined as either ECI 000000 or ECI 000002.

Switching to Byte Compaction Mode

Because the default mode for PDF417 is Text Compaction mode, to switch to Byte Compaction mode, it is necessary to use one of the following codewords:

           mode latch 924 shall be used when the total number of Byte Compaction characters to be encoded is an integer multiple of 6

           mode latch 901 shall be used when the total number of Byte Compaction characters to be encoded is not a multiple of 6

           mode shift 913 can be used instead of codeword 901 when a single Byte Compaction character has to be encoded

If only a single byte (one 8-bit value) needs to be encoded while in Text Compaction mode, mode shift 913 may be used. The codeword that follows 913 is then treated as a single 8-bit value.
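
The choice among 924, 901, and 913 can be written as a small helper. This is a sketch under assumptions: the name byte_mode_switch and the in_text_mode flag are hypothetical, and the helper always prefers the shift for a single byte in Text Compaction mode, although the rules above only say it can be used.

```python
def byte_mode_switch(byte_count: int, in_text_mode: bool = True) -> int:
    """Pick the codeword that introduces byte_count Byte Compaction characters."""
    if byte_count <= 0:
        raise ValueError("byte_count must be positive")
    if byte_count == 1 and in_text_mode:
        return 913  # mode shift: applies to the next codeword only
    if byte_count % 6 == 0:
        return 924  # mode latch for an exact multiple of 6 bytes
    return 901      # mode latch for any other length


for count in (1, 6, 9, 12):
    print(count, byte_mode_switch(count))
```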

Compaction Rules for Encoding Longer Byte Compaction Character Strings (Using Mode Latch 924 or 901)

The following procedure shall be used to encode Byte Compaction character data:

            1.         Establish the total number of Byte Compaction characters.

            2.         If a perfect multiple of 6, mode latch 924 shall be used; or else mode latch 901 shall be used.

            3.         Sub-divide the number of Byte Compaction characters into a sequence of 6 characters, from left to right (the most to least significant characters).  If less than 6 characters go to Step 7.

            4.         Assign the decimal values of the 6 data bytes to be encoded in Byte Compaction mode as b5 to b0 (where b5 is the first data byte).

            5.         Carry out a base 256 to base 900 conversion to produce a sequence of 5 codewords.  Annex

Do the Math

Encoding binary data using latches 901/924 requires converting the data from base 256 to base 900. The encoding process, specified by latch codeword 924, takes 6 base 256 numbers or values at a time, converting them to 5 base 900 codewords. The process continues until no more numbers or values need to be encoded.

If the number of bytes to encode is not a multiple of 6, use mode latch 901. Each complete group of 6 bytes is converted using the same algorithm, and any remaining bytes are appended using 1 codeword per byte.

The following algorithm performs a base 256 to base 900 conversion:

            To calculate the codewords, we must first define the following:

                        n = number of codewords (In this case, it is always 5.)

                        t = temporary variable

            We then calculate t as follows:

            t = d5*256^5 + d4*256^4 + d3*256^3 + d2*256^2 + d1*256^1 + d0*256^0

            Next, we calculate each codeword as follows:

            For each codeword ci = c0 ... cn-1

            BEGIN

                        ci         = t mod 900

                        t           = t div 900

            END   

                        Where:

                        ci         = codeword i (c0 is the least significant codeword)

                        Remember:     mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.

            div is the integer division operator

For example:

            Encode the numbers {1,2,3,4,5,6}

            Calculate sum of data bytes to encode:

            t           = 1*256^5 + 2*256^4 + 3*256^3 + 4*256^2 + 5*256^1 + 6*256^0

                        = 1108152157446

            Calculate codeword 0

                        c0        = 1108152157446 mod 900    = 846

                        t           = 1108152157446 div 900      = 1231280174

            Calculate codeword 1

                        c1        = 1231280174 mod 900          = 74

                        t           = 1231280174 div 900            = 1368089

            Calculate codeword 2

                        c2        = 1368089 mod 900                = 89

                        t           = 1368089 div 900                  = 1520

            Calculate codeword 3

                        c3        = 1520 mod 900                      = 620

                        t           = 1520 div 900                        = 1

            Calculate codeword 4 (last codeword)

                        c4        = 1 mod 900                            = 1

                        t           = 1 div 900                              = 0

            The codeword sequence is (including Byte Compaction mode latch 924): 924,1,620,89,74,846

Example:

            Encode the numbers 1,2,3,4,5,6,7,8,4

            The mode latch 901 is used since the number of bytes to encode is not a multiple of 6. The above results apply for the first 6 bytes. The remaining bytes are appended as codewords, in order of their occurrence.

            The codeword sequence is:  901,1,620,89,74,846,7,8,4
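
The two worked examples above can be reproduced with a short Python sketch. The function name encode_byte_compaction is hypothetical; the sketch assumes the latch is always emitted and does not handle the single-byte shift 913.

```python
from typing import List


def encode_byte_compaction(data: bytes) -> List[int]:
    """Encode bytes as a Byte Compaction latch followed by data codewords."""
    if not data:
        raise ValueError("data must not be empty")
    # 924 for an exact multiple of 6 bytes, otherwise 901.
    codewords = [924 if len(data) % 6 == 0 else 901]
    full = len(data) - len(data) % 6
    for start in range(0, full, 6):
        # t = d5*256^5 + ... + d0*256^0, with d5 the first byte of the group.
        t = int.from_bytes(data[start:start + 6], "big")
        group = []
        for _ in range(5):
            group.append(t % 900)  # c0 first (least significant)
            t //= 900
        codewords.extend(reversed(group))  # emit most significant codeword first
    # Remaining bytes (fewer than 6) use one codeword per byte.
    codewords.extend(data[full:])
    return codewords


print(encode_byte_compaction(bytes([1, 2, 3, 4, 5, 6])))
print(encode_byte_compaction(bytes([1, 2, 3, 4, 5, 6, 7, 8, 4])))
```

Five codewords are always enough for a group, since the largest 6-byte value (256^6 - 1) is smaller than 900^5.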

Numeric Compaction Mode 

The Numeric Compaction mode is a method for base 10 to base 900 data compaction and should be used to encode long strings of consecutive numeric digits.  The Numeric Compaction mode encodes up to 2.93 numeric digits per codeword.  

Latch to Numeric Compaction Mode

Numeric Compaction mode may be invoked when in Text Compaction or Byte Compaction modes using mode latch 902.

Encoding Data in Numeric Mode

Compaction Rules for Encoding Long Strings of Consecutive Numeric Digits

The following procedure is used to compact numeric data:

1. Divide the string of digits into groups of 44 digits. The last group may contain fewer digits.

2. For each group add the digit 1 to the most significant position to prevent the loss of leading zeros.

                                    EXAMPLE:

                                                original data                 00246812345678

                                                after step 2                  100246812345678

                                    NOTE:  The leading digit 1 is removed in the decode algorithm.

3. Perform a base 10 to base 900 conversion as follows:

            Starting from least significant codeword 0, calculate the following:

            For each codeword ci = c0 ... cn-1

            BEGIN

                        ci         = x mod 900

                        x          = x div 900

                        If x = 0, then

                                    stop encoding

            END   

                        where:

                                    x          = numeric string to encode, preceded by number 1

            Remember:     mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.

            div is the integer division operator.

Repeat from Step 2 as necessary.

For example:

            Encode the numeric string 00021329000.

            First precede the string by the number one:

                        100021329000

            Next use the conversion algorithm to produce the desired codewords:

            Start by setting x = 100021329000

            Calculate codeword 0

                        c0        = 100021329000 mod 900      = 0

                        x          = 100021329000 div 900        = 111134810

Calculate codeword 1

                        c1        = 111134810 mod 900            = 110

                        x          = 111134810 div 900  = 123483

            Calculate codeword 2

                        c2        = 123483 mod 900                  = 183

                        x          = 123483 div 900                    = 137

            Calculate codeword 3

                        c3        = 137 mod 900                        = 137

                        x          = 137 div 900              = 0

The codeword sequence then is (including numeric mode latch 902): 902,137,183,110, 0
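
The numeric procedure, including the split into 44-digit groups, can be sketched as follows. The function name encode_numeric_compaction is hypothetical, and the sketch assumes a single 902 latch in front of all groups.

```python
from typing import List


def encode_numeric_compaction(digits: str) -> List[int]:
    """Encode a digit string as latch 902 followed by base 900 codewords."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("digits must be a non-empty string of 0-9")
    codewords = [902]
    for start in range(0, len(digits), 44):
        # Prefix each group with 1 so that leading zeros survive.
        x = int("1" + digits[start:start + 44])
        group = []
        while x > 0:
            group.append(x % 900)  # c0 first (least significant)
            x //= 900
        codewords.extend(reversed(group))  # emit most significant codeword first
    return codewords


print(encode_numeric_compaction("00021329000"))
```

Python integers have arbitrary precision, so the 45-digit value of a full group needs no special handling; other languages would need a big-integer type or manual long division.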

The following rules can be used to determine the precise number of codewords in Numeric Compaction mode:

                       Groups of 44 numeric digits compact to 15 codewords.

                       For groups of shorter sequences of digits, the number of codewords can be calculated as follows:

                                    Codewords = INT (number of digits / 3) +1

                                    EXAMPLE:

                                    For a 28 digit sequence

                                                INT (28 / 3) + 1

                                                = 9 + 1

                                                = 10 codewords
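
The counting rules above translate directly into a helper. The name numeric_codeword_count is hypothetical, and the count excludes the 902 latch itself.

```python
def numeric_codeword_count(digit_count: int) -> int:
    """Number of data codewords needed for digit_count numeric digits."""
    if digit_count < 0:
        raise ValueError("digit_count must be non-negative")
    full_groups, rest = divmod(digit_count, 44)
    count = full_groups * 15      # each group of 44 digits compacts to 15 codewords
    if rest:
        count += rest // 3 + 1    # INT(number of digits / 3) + 1 for the short group
    return count


print(numeric_codeword_count(28))
```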

Switching from Numeric Compaction Mode

Numeric Compaction mode may be terminated by the end of the symbol, or by any of the following codewords:

                       900 (Text Compaction mode latch)

                       901 (Byte Compaction mode latch)

                       902 (Numeric Compaction mode latch)

                       924 (Byte Compaction mode latch)

                       928 (Beginning of Macro PDF417 Control Block)

                       923 (Beginning of Macro PDF417 Optional Field)

                       922 (Macro PDF417 Terminator)

The latter three codewords only occur within the Macro PDF417 Control Block of a Macro PDF417 symbol.  Numeric Compaction mode is also affected by the presence of a reserved codeword.

Re-invoking Numeric Compaction mode (by using codeword 902 while in Numeric Compaction mode) serves to terminate the current Numeric Compaction mode grouping as described in??, and then to start a new grouping.  This procedure may be necessary when an ECI assignment number needs to be encoded.

During the decode process for Numeric Compaction mode, the base 900 to base 10 conversion must produce a number whose most significant digit is '1'; otherwise, the symbol is invalid. The leading '1' is then removed to recover the original digits.
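
The decode check can be sketched for a single group of codewords. The function name decode_numeric_group is hypothetical; it expects the data codewords of one group, most significant first, without the 902 latch.

```python
from typing import Sequence


def decode_numeric_group(codewords: Sequence[int]) -> str:
    """Convert one group of base 900 codewords back to its digit string."""
    x = 0
    for cw in codewords:
        if not 0 <= cw < 900:
            raise ValueError("data codewords must be in the range 0 to 899")
        x = x * 900 + cw  # base 900 to base 10
    text = str(x)
    if not text.startswith("1"):
        raise ValueError("invalid symbol: leading digit is not 1")
    return text[1:]  # drop the leading 1 added during encoding


print(decode_numeric_group([137, 183, 110, 0]))
```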

Encoding the Left and Right Row Indicators

 

The row indicators in a PDF417 symbol contain several key components: row number, number of rows, security level, and number of data columns.

Not every row indicator contains every component. The information is spread over several rows, and the pattern repeats every three rows, as shown below. Spreading and repeating the information in this way makes the symbol as robust as possible.

Row 0:   Left R.I. (Row #, # of Rows)          Right R.I. (Row #, # of Columns)

Row 1:   Left R.I. (Row #, Security Level)     Right R.I. (Row #, # of Rows)

Row 2:   Left R.I. (Row #, # of Columns)       Right R.I. (Row #, Security Level)

For example, row 3's (or 6’s, 9’s, 12’s, etc.) right row indicator is calculated the same way as row 0's right row indicator; row 3's (or 6’s, 9’s, 12’s, etc.) left row indicator is calculated the same way as row 0's left row indicator, etc.

To make this clear, formulas and specific examples follow.

Left Row Indicators

Left row indicators are calculated as follows, where Row 0, Row 1, and Row 2 denote rows whose row number mod 3 is 0, 1, and 2 respectively:

Row 0  30 * (row number div 3) + ((number of rows - 1) div 3)

Row 1  30 * (row number div 3) + (security level * 3) + ((number of rows - 1) mod 3)

Row 2  30 * (row number div 3) + (number of columns - 1)

Notes:  div is the integer division operator

mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.
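
The three left row indicator formulas can be combined into one function. This is a sketch only: the name left_row_indicator and its parameter names are hypothetical, rows are numbered from 0, and the 10-row, 4-column, security level 2 symbol in the usage lines is made up for illustration.

```python
def left_row_indicator(row: int, num_rows: int, num_columns: int,
                       security_level: int) -> int:
    """Return the left row indicator codeword value for a zero-based row."""
    base = 30 * (row // 3)
    position = row % 3  # the pattern repeats every three rows
    if position == 0:
        return base + (num_rows - 1) // 3
    if position == 1:
        return base + security_level * 3 + (num_rows - 1) % 3
    return base + (num_columns - 1)


# Hypothetical symbol: 10 rows, 4 data columns, security level 2.
for r in range(6):
    print(r, left_row_indicator(r, num_rows=10, num_columns=4, security_level=2))
```

All inputs here are non-negative, so Python's // and % operators behave exactly like the div and mod operators described in the notes.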