SMOKE-16 'a.out' and Related Formats ----------------------------
$Id: a_out.txt,v 1.8 2000/11/09 19:55:12 bsittler Exp $
This document contains a brief description of the SMOKE-16
binary file formats.                                       --BCWS
-----------------------------------------------------------------

NOTE ============================================================
Revisions 19980411 and earlier of toolset v1 incorrectly stamped their
object files as being written by "v0" of the toolset.  Later revisions
of toolset v1 can read these "v0" files, but write v1 files, which
have a different header format and default load address and are
therefore unusable by the older toolset revisions.

      The '-A v0' option to 'as' and 'ld' writes files usable by
      the older toolset revisions.
=================================================================

Note: The '.h' files mentioned below are in the 'include/smoke16'
      directory.

OVERVIEW OF SMOKE-16 BINARY FILE FORMATS

The SMOKE-16 toolset uses a variant of the 'a.out' file format for
object files (OMAGIC), pure executables (NMAGIC), split I&D
executables (JMAGIC), and object archives (LMAGIC).

You may wish to refer to 'a_out.h' when reading this document, as it
contains more detailed information on the file format. Note that this
document specifies the portable, on-disk format. The fget16* functions
(listed in 'libObj.h') convert these structures into a host-specific
format when loading them into memory, so the alignment, padding,
endianness and word size of the in-memory structures may differ from
the portable format. The fput16* functions convert back into the
portable format.

The portable format stores numbers with the most significant byte
first. Strings (see the STRING TABLE description, below) are stored in
an ASCII subset described in 'portable.txt' in the 'doc' directory.

A SMOKE-16 'a.out' file contains the following sections: header, text
section (or archive directory,) data section (or member objects,)
text and data relocation sections, symbol table, and string table.
The uninitialized data section (BSS) is described by the header but
occupies no space in the file.

HEADER

'struct a_out16_exec'
     +--------+--------+
   0 | a_info          | 1  a_dynamic:1 a_toolversion:7 a_machtype:8
     +--------+--------+
   2 | a_magic         | 3
     +--------+--------+
   4 | a_text          | 5
     +        +        +
   6 |                 | 7
     +--------+--------+
   8 | a_data          | 9
     +        +        +
  10 |                 | 11
     +--------+--------+
  12 | a_bss           | 13
     +--------+--------+
  14 | a_syms          | 15
     +--------+--------+
  16 | a_entry         | 17
     +--------+--------+
  18 | a_trsize        | 19
     +--------+--------+
  20 | a_drsize        | 21
     +--------+--------+

  a_toolversion Description                  Symbol ('a_out.h')
-----------------------------------------------------------------
  0            "v0" (see NOTE above)
  1             v1 (current)                 SMOKE16_TOOLVERSION
-----------------------------------------------------------------

  a_machtype    Description                  Symbol ('a_out.h')
-----------------------------------------------------------------
  120           SMOKE-16                     M_SMOKE16
-----------------------------------------------------------------

  a_magic       Description                  Symbol ('a_out.h')
-----------------------------------------------------------------
  0407          object or impure executable  SMOKE16_OMAGIC
  0410          pure executable              SMOKE16_NMAGIC
  0411          split I&D executable         SMOKE16_JMAGIC
  0440          object archive               SMOKE16_LMAGIC
-----------------------------------------------------------------

The header is a 22-byte structure 'struct a_out16_exec'. It starts
with a two-byte information field 'a_info' which identifies the file
format version ('a_toolversion'; must be 0 or 1 as of this writing,)
the machine type ('a_machtype'; must be M_SMOKE16 as of this writing,)
and whether the file uses dynamic linking ('a_dynamic'; must be zero
as of this writing.)  Next is a two-byte magic number 'a_magic' which
identifies the file as an object file (OMAGIC), pure executable
(NMAGIC), split I&D executable (JMAGIC), or object archive (LMAGIC).
Together, these first four bytes of the header provide a reasonably
distinctive signature usable by the `file' command; see the provided
'magic' file for more information.
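
As an illustration, the first four bytes could be decoded along the
following lines. This is only a sketch in Python, not part of the
toolset: the function name 'identify' is hypothetical, and the bit
positions within 'a_info' are an assumption (the fields are taken in
the order shown in the diagram, starting from the most significant
bit). Consult 'a_out.h' for the authoritative layout.

```python
import struct

# Magic numbers from the a_magic table above (octal).
MAGIC_NAMES = {
    0o407: "SMOKE16_OMAGIC",
    0o410: "SMOKE16_NMAGIC",
    0o411: "SMOKE16_JMAGIC",
    0o440: "SMOKE16_LMAGIC",
}

def identify(first4):
    """Decode 'a_info' and 'a_magic' from the first four bytes of a file."""
    # ">HH": two big-endian (most significant byte first) 16-bit values.
    a_info, a_magic = struct.unpack(">HH", first4[:4])
    return {
        # ASSUMPTION: bit fields are packed from the top bit downwards.
        "a_dynamic": (a_info >> 15) & 0x01,
        "a_toolversion": (a_info >> 8) & 0x7F,
        "a_machtype": a_info & 0xFF,
        # None if the magic number is not one of the four listed above.
        "a_magic": MAGIC_NAMES.get(a_magic),
    }
```

Under that assumption, a v1 object file for machine type 120 would
begin with the bytes 0x01 0x78 0x01 0x07.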

The next four bytes of the header contain the length of the text
section in bytes ('a_text'). Immediately following this are four more
bytes which give the size of the initialized data section in bytes
('a_data').

After that, there are two bytes giving the length of the uninitialized
data section, or BSS ("Block Started by Symbol", so-called for
historical reasons) in bytes ('a_bss').  This section is not recorded
in the object file, since it is uninitialized.

Following this are two bytes which give the length of the symbol table
in bytes ('a_syms'), and two more bytes which give the absolute
address of the program entry point ('a_entry').

      The '-E' option can override this address for `as',
      `ld' and `emu'.

      The '-e' option for `ld' can specify an entry symbol
      (the default is '__entry'.)

Finally, the header contains two bytes giving the length of the text
relocations in bytes ('a_trsize') and two bytes giving the length of
the data relocations in bytes ('a_drsize').
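
The header layout described above can be sketched in Python as
follows. The names 'parse_header' and 'section_offsets' are
hypothetical and not part of the toolset. The offset calculation
additionally assumes that the sections are stored back to back,
without padding, in the order text, data, text relocations, data
relocations, symbol table, string table; this document does not state
the file order explicitly, so check it against the toolset sources.

```python
import struct
from collections import namedtuple

# Big-endian, no padding: 2+2+4+4+2+2+2+2+2 = 22 bytes.
HEADER = struct.Struct(">HHIIHHHHH")

Exec = namedtuple(
    "Exec",
    "a_info a_magic a_text a_data a_bss a_syms a_entry a_trsize a_drsize",
)

def parse_header(buf):
    """Unpack the 22-byte portable header found at the start of 'buf'."""
    return Exec._make(HEADER.unpack_from(buf, 0))

def section_offsets(h):
    """Return file offsets of the stored sections.

    ASSUMPTION: sections follow the header contiguously in the order
    listed below. The BSS is not stored, so it has no file offset.
    """
    offsets = {}
    pos = HEADER.size
    for name, size in (
        ("text", h.a_text),
        ("data", h.a_data),
        ("text_reloc", h.a_trsize),
        ("data_reloc", h.a_drsize),
        ("symbols", h.a_syms),
    ):
        offsets[name] = pos
        pos += size
    offsets["strings"] = pos
    return offsets
```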

TEXT SECTION

In normal object files (OMAGIC), pure executables (NMAGIC), and split
I&D executables (JMAGIC), the text section contains the program
code. The text section is loaded into the program's instruction
address space starting at address 0x0400. In split I&D executables
(JMAGIC), the instruction address space is separate from the data
address space.

      The '-Ttext' option can override this address for `as',
      `ld' and `emu'.

Constant data is usually placed in the data section, but may be placed
in the text section instead (see the DATA SECTION description, below.)

DATA SECTION

In normal object files (OMAGIC), pure executables (NMAGIC), and split
I&D executables (JMAGIC), the data section contains any initialized
data the program may require. In normal object files (OMAGIC) and pure
executables (NMAGIC), the data section is loaded into the program's
data address space immediately following the text section. In split
I&D executables (JMAGIC), the data section is loaded into the
program's data address space starting at address 0x0400.

      The '-Tdata' option can override this address for `as',
      `ld' and `emu'.

By default, constant data (the assembler ".rdata" section) is written
to the object file in the data section. This data can instead be
placed in the text section, but that will cause problems when
generating split I&D executables (JMAGIC).

      The '-R' option for 'as' places constant data in the text section.

ARCHIVE DIRECTORY [LMAGIC]

'struct a_out16_dirent'
     +--------+--------+
   0 | d_strx          | 1
     +--------+--------+
   2 | d_magic         | 3
     +--------+--------+
   4 | d_value         | 5
     +        +        +
   6 |                 | 7
     +--------+--------+
   8 | d_size          | 9
     +        +        +
  10 |                 | 11
     +--------+--------+
  12 | d_mtime         | 13
     +        +        +
  14 |                 | 15
     +--------+--------+

In object archives (LMAGIC), the text section serves as a directory of
the archive. The archive directory consists of a sequence of 16-byte
directory entry structures 'struct a_out16_dirent' with the following
format: a two-byte offset into the string table for the member's
filename 'd_strx' (see the STRING TABLE description, below,) a
two-byte copy of the member's magic number 'd_magic' (see the HEADER
description, above,) a four-byte offset to the member within the data
section 'd_value', a four-byte length of the member 'd_size', and a
four-byte modification time 'd_mtime'.
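
A minimal Python sketch of reading the archive directory follows. The
function name 'parse_directory' is hypothetical; the field order and
sizes are taken from the diagram above.

```python
import struct
from collections import namedtuple

# Big-endian, no padding: 2+2+4+4+4 = 16 bytes per entry.
DIRENT = struct.Struct(">HHIII")

Dirent = namedtuple("Dirent", "d_strx d_magic d_value d_size d_mtime")

def parse_directory(text_section):
    """Split an archive's text section into directory entries.

    iter_unpack raises struct.error if the section length is not a
    multiple of 16 bytes.
    """
    return [Dirent._make(f) for f in DIRENT.iter_unpack(text_section)]
```

The position of an entry in the returned list is its ordinal in the
directory.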

MEMBER OBJECTS [LMAGIC]

In object archives (LMAGIC), the data section holds the member object
files.

UNINITIALIZED DATA SECTION (BSS)

The uninitialized data section is not stored in the object file. It is
placed in the program's data address space immediately following the
data section.

      The '-Tbss' option can override this address for `ld' and `emu'.
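
The default placement of the text, data and bss sections can be
summarized in a short Python sketch. The function name
'default_layout' is hypothetical. The sketch assumes the toolset v1
default load address of 0x0400, no '-T' overrides, and no alignment
padding between sections; alignment symbols (see the SYMBOL TABLE
description, below) may change the real layout.

```python
OMAGIC, NMAGIC, JMAGIC = 0o407, 0o410, 0o411
DEFAULT_BASE = 0x0400  # default load address given in this document

def default_layout(a_magic, a_text, a_data, a_bss):
    """Return default start addresses for text, data and bss.

    ASSUMPTION: sections are contiguous, with no alignment padding.
    """
    text = DEFAULT_BASE
    if a_magic == JMAGIC:
        # Split I&D: data lives in its own address space.
        data = DEFAULT_BASE
    elif a_magic in (OMAGIC, NMAGIC):
        # Data immediately follows the text section.
        data = text + a_text
    else:
        raise ValueError("not an object file or executable")
    bss = data + a_data  # bss immediately follows the data section
    return {"text": text, "data": data, "bss": bss, "end": bss + a_bss}
```

For a split I&D executable the 'text' address refers to the
instruction address space and the other addresses to the data address
space.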

TEXT AND DATA RELOCATION SECTIONS

'struct a_out16_reloc'
     +--------+--------+
   0 | r_addr_high     | 1
     +--------+--------+
   2 | r_addr_low      | 3
     +--------+--------+
   4 | r_index         | 5
     +--------+--------+
   6 | r_info          | 7 r_extern:1 r_high:1 r_low:1 ... r_type:2
     +--------+--------+
   8 | r_value         | 9
     +--------+--------+

  r_type        Description                  Symbol ('a_out.h')
-----------------------------------------------------------------
  0             absolute address             SMOKE16_RELOC_ABSOLUTE
  1             %pc-relative displacement    SMOKE16_RELOC_DISP8
-----------------------------------------------------------------

The relocation sections hold lists of relocations, or fixups, which
need to be made to the relevant sections before they can be used. Each
relocation is described in a 10-byte structure 'struct a_out16_reloc':

The first two bytes 'r_addr_high' hold the address, within the
relevant section, of the byte that receives the high byte of the
relocated address. The next two bytes 'r_addr_low' hold the address
of the byte that receives the low byte.

The next two bytes 'r_index' hold the symbol ordinal for the symbol
(in the case of symbol-relative relocations) or the section (see
'n_type' in the SYMBOL TABLE description, below) that the relocation
is relative to.

The next two bytes 'r_info' hold various flags relating to the
relocation, such as whether it is symbol-relative or not ('r_extern',)
whether each of the high and low bytes is to be relocated
('r_high'/'r_low',) and the type of relocation to perform once the
address is known ('r_type'.)

The final two bytes hold the offset to be added to the relocated
address 'r_value'.
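
The following Python sketch reads relocation entries and applies one
absolute relocation. It is an illustration under several assumptions,
not the toolset's algorithm: the names 'parse_relocs' and
'patch_absolute' are hypothetical, 'r_addr_high' and 'r_addr_low' are
treated as offsets from the start of the section, 'r_value' is treated
as unsigned with 16-bit wraparound, and the 'r_high'/'r_low' flags are
passed in by the caller because the exact bit positions within
'r_info' are not given here (see 'a_out.h').

```python
import struct
from collections import namedtuple

# Big-endian, no padding: five 16-bit fields = 10 bytes per entry.
RELOC = struct.Struct(">HHHHH")

Reloc = namedtuple("Reloc", "r_addr_high r_addr_low r_index r_info r_value")

def parse_relocs(section):
    """Split a relocation section into entries (raw fields only)."""
    return [Reloc._make(f) for f in RELOC.iter_unpack(section)]

def patch_absolute(section, r, base, high=True, low=True):
    """Apply one absolute relocation to 'section' (a bytearray).

    'base' is the resolved address of the symbol or section that
    'r_index' refers to; resolving it is outside this sketch.
    """
    target = (base + r.r_value) & 0xFFFF  # ASSUMPTION: 16-bit wraparound
    if high:
        section[r.r_addr_high] = target >> 8
    if low:
        section[r.r_addr_low] = target & 0xFF
```

Storing the two byte addresses separately allows the high and low
bytes of an address to sit at non-adjacent locations in the section.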

SYMBOL TABLE

'struct a_out16_nlist'
     +--------+--------+
   0 | n_strx          | 1
     +--------+--------+
   2 | n_type |n_other | 3
     +--------+--------+
   4 | n_desc          | 5
     +--------+--------+
   6 | n_value         | 7
     +--------+--------+

  n_type        Description                  Symbol ('a_out.h')
-----------------------------------------------------------------
  0x01          external (flag)              SMOKE16_N_EXT
  0x1e          basic types (mask)           SMOKE16_N_TYPE
  0x00          undefined/common             SMOKE16_N_UNDF
  0x02          absolute                     SMOKE16_N_ABS
  0x04          text section                 SMOKE16_N_TEXT
  0x06          data section                 SMOKE16_N_DATA
  0x08          bss                          SMOKE16_N_BSS
  0x0c          alignment                    SMOKE16_N_ALIGN
  0xe0          debugging types (mask)       SMOKE16_N_STAB
-----------------------------------------------------------------

The symbol table contains a list of symbol descriptions in 8-byte
structures 'struct a_out16_nlist':

The first two bytes hold the offset into the string table (see the
STRING TABLE description, below) of the symbol's name 'n_strx', or
zero if the symbol is anonymous.

The following byte contains the type of the symbol 'n_type'. The type
may be internal or external, and may indicate that the symbol is to be
used for debugging purposes. The basic types are undefined/common,
absolute, text-segment relative, data-segment relative, bss-relative,
and alignment.

The following byte contains the stab "other" field 'n_other' for
debugging symbols. The next two bytes contain the stab "desc" field
'n_desc'. [stab is a debugging information format; it's not yet
properly supported by the SMOKE-16 toolset.]

The final two bytes contain the value or segment offset of the symbol
'n_value'. For common symbols (SMOKE16_N_UNDF with non-zero
'n_value'), this is the size of the common symbol in bytes. For
alignment symbols, this is the alignment shift in bits.

      The supported alignment symbols are "@t" (text section
      alignment,) "@d" (data section alignment,) and "@b" (bss
      alignment.)

In object archives (LMAGIC), the symbol table contains copies of all
symbols exported by the member objects. In this case, the symbol
values 'n_value' contain the ordinals of the defining members in the
directory.
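
The symbol entry layout and the 'n_type' masks can be sketched in
Python as follows. The names 'parse_symbols' and 'describe' are
hypothetical, and the returned descriptions are for illustration
only; the constants come from the n_type table above.

```python
import struct
from collections import namedtuple

# Big-endian, no padding: 2+1+1+2+2 = 8 bytes per entry.
NLIST = struct.Struct(">HBBHH")

Nlist = namedtuple("Nlist", "n_strx n_type n_other n_desc n_value")

N_EXT, N_TYPE, N_STAB = 0x01, 0x1E, 0xE0
BASIC = {
    0x00: "undefined",
    0x02: "absolute",
    0x04: "text",
    0x06: "data",
    0x08: "bss",
    0x0C: "alignment",
}

def parse_symbols(table):
    """Split a symbol table into entries."""
    return [Nlist._make(f) for f in NLIST.iter_unpack(table)]

def describe(sym):
    """Return a short description of a symbol's type."""
    if sym.n_type & N_STAB:
        return "debugging"
    kind = BASIC.get(sym.n_type & N_TYPE, "unknown")
    if kind == "undefined" and sym.n_value != 0:
        kind = "common"  # n_value is the size in bytes
    scope = "external" if sym.n_type & N_EXT else "internal"
    return scope + " " + kind
```

Note that in an object archive 'n_value' holds a member ordinal, so
the common-symbol test above applies to ordinary object files only.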

STRING TABLE

     +--------+--------+
   0 | n               | 1
     +--------+--------+
   2 |        |
    ...

 n-1 |        |
     +--------+

If a string table is present, its first two bytes contain 'n', the
size of the entire string table in bytes (including these two bytes.)
The remainder of the string table holds the names of symbols (indexed
by the 'n_strx' field in the symbol table.)

In object archives (LMAGIC), the string table is also used to hold the
names of the member objects (indexed by the 'd_strx' field in the
archive directory.)

Strings are referred to by their byte offsets from the start of the
string table, so the lowest valid string index is 2. A string index of
0 refers to an empty string. Each string (with the possible exception
of the last one) is terminated by a null byte.
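
A string lookup following these rules might look like the Python
sketch below. The function name 'get_string' is hypothetical, and
plain ASCII decoding stands in for the subset described in
'portable.txt'.

```python
def get_string(strtab, index):
    """Return the string stored at 'index' in the raw string table.

    'strtab' is the whole table as bytes, including the two size bytes.
    """
    if index == 0:
        return ""  # index 0 always denotes the empty string
    if index < 2 or index >= len(strtab):
        raise IndexError("string index out of range")
    end = strtab.find(b"\0", index)
    if end < 0:
        end = len(strtab)  # the last string may lack a terminator
    return strtab[index:end].decode("ascii")
```

For a table holding "main" followed by an unterminated "__entry", the
size field is 14 and the two strings are found at indexes 2 and 7.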
