SMOKE-16 Architecture -------------------------------------------
$Id: opcodes.txt,v 1.18 2001/07/26 07:15:46 bsittler Exp $
This document contains a brief description of the SMOKE-16
architecture. None of this is set in stone yet.            --BCWS
-----------------------------------------------------------------

OVERVIEW OF THE SMOKE-16 ARCHITECTURE

The SMOKE-16 is a big-endian 16-bit architecture, capable of
addressing up to 128k 8-bit bytes of memory from user mode (or up to
64k when data and address space are unified). The supervisor mode
design hasn't been finalized yet, so I won't tease you with much of it
here. There is currently no SMOKE-16 support for floating-point
acceleration, but such support may be added someday, probably using a
variant of the lds/sts instructions.
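
As an illustration of the big-endian word layout mentioned above, here
is a small Python sketch. The helper names 'read_word' and 'write_word'
are hypothetical (they are not taken from the emulator sources), and
the 64k bytearray simply stands in for a unified address space.

```python
# Sketch only: read_word/write_word are hypothetical helper names.

def read_word(mem, addr):
    """Return the big-endian 16-bit word stored at addr and addr + 1."""
    return (mem[addr] << 8) | mem[addr + 1]

def write_word(mem, addr, value):
    """Store a 16-bit value at addr, most significant byte first."""
    mem[addr] = (value >> 8) & 0xff
    mem[addr + 1] = value & 0xff

mem = bytearray(0x10000)          # 64k of byte-addressed memory
write_word(mem, 0x0100, 0x1234)   # 0x12 lands at 0x0100, 0x34 at 0x0101
```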

The user mode is emulated by the 'emu' program, which reflects a few
of your operating system's basic features through SMOKE-16 interrupt
calls (listed in 'smoke16/syscall.h'.) In general, the interrupt calls
have UNIX system call semantics. See the emulator sources and the C
library sources for details.

The assembler syntax is heavily influenced by GNU as (gas) and the GNU
assembler preprocessor (gasp). In the future, I expect to use gasp as
a preprocessor for all my SMOKE-16 assembly, but for now I use cpp.

REGISTERS

The SMOKE-16 has 16 general purpose registers and 4 system registers
(S-REGS) which are visible to the programmer. Additional mnemonics for
a register are listed after the primary mnemonic. All registers are 16
bits wide.

- SYSTEM REGISTERS (%sX)

  %s0/%pc: program counter, contains address of the current instruction
  %s1/%ps: processor status, contains several fields:
    - 0x8000: supervisor mode (K) (not yet...)
    - 0x7f00: [mask] priority of current hardware interrupt (not yet...)
    - 0x00c0: [mask] comparison flags:
      0x0080: "less than"/"carry"/"borrow" flag (C)
      0x0040: "equal to"/"zero" flag (Z)
    - 0x003f: [mask] reserved for future use
  %s2/%pt: page table base (not yet...)
  %s3/%tt: trap table base (not yet...)

- GENERAL REGISTERS (%X/%rX)

  %0/%r0: always 0x0000, writing to it has no effect
  %1/%r1/%t0: temporary, saved
  %2/%r2/%t1: temporary, saved
  %3/%r3/%t2: temporary, saved
  %4/%r4/%t3: temporary, saved
  %5/%r5/%t4: temporary, saved
  %6/%r6/%t5: temporary, saved
  %7/%r7/%t6: temporary, non-saved (reserved for assembler use)
  %8/%r8/%a0: argument 0/return value 0, non-saved
  %9/%r9/%a1: argument 1/return value 1, non-saved
  %10/%r10/%a2: argument 2/return value 2, non-saved
  %11/%r11/%a3: argument 3/return value 3, non-saved
  %12/%r12/%fp: frame pointer (points to caller's frame pointer)
  %13/%r13/%sp: stack pointer (points to highest unused word on stack)
  %14/%r14/%ra: return address
  %15/%r15/%gp: global pointer (use discouraged)

- By convention, return values of 4 words or fewer are returned in
  %a0..%a3, and all other values are returned by having the caller pass the
  address of a temporary structure on the stack as an "invisible" first
  argument. *This address is returned by the callee.*

- By convention, arguments of more than 4 words are passed by reference
  to temporary stack space set aside by the caller. Arguments past the fourth
  word are passed on the stack in the caller's frame, with the first
  argument at the lowest address.

- Saved means a function must restore the register to its
  original value before returning. Non-saved means a function that
  still needs the value must place it on the stack before calling a
  sub-function, and restore it from the stack after the sub-function
  returns.

INSTRUCTION SET

The SMOKE-16 has a fixed-length instruction word of 16 bits. This is
somewhat limiting (SMOKE-16 has only 32 actual instructions), but it's
not an insurmountable obstacle. Many instructions commonly used on
other processors are aliased to one or more actual instructions
with additional arguments. Some instructions have more than one name,
and the different names can be used interchangeably.

Actual instructions are marked with a double asterisk ('**');
aliased instructions are marked with a double dash ('--').

See the end of this file for additional notes.

MOV/LD/ST/LDS/STS/LDI - load/store word

-- mov <symexpr>, %rdest
-- ldi <symexpr>, %rdest
   => sethi %hi(<symexpr>), %rdest
      movb %lo(<symexpr>), %rdest

-- mov %rsrc, %rdest
-- st %rsrc, %rdest
   => or %0, %rsrc, %rdest

-- mov [<symexpr>], %rdest
-- ld [<symexpr>], %rdest
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      mov [%t6 + %0], %rdest

** mov [%raddr + <offset4>], %rdest
-- ld [%raddr + <offset4>], %rdest
   Encoding: 0x6000 | (rdest << 8) | ((offset4 & 0x1e) << 3) | raddr
   Load the word starting at <offset4> + %raddr into %rdest.

-- mov [%raddr], %rdest
   => mov [%raddr + 0], %rdest

-- mov %rsrc, [<symexpr>]
-- st %rsrc, [<symexpr>]
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      mov %rsrc, [%t6 + %0]

** mov %rsrc, [%raddr + <offset4>]
-- st %rsrc, [%raddr + <offset4>]
   Encoding: 0xd000 | (rsrc << 8) | ((offset4 & 0x1e) << 3) | raddr
   Store %rsrc in the word starting at <offset4> + %raddr.

-- mov %rsrc, [%raddr]
   => mov %rsrc, [%raddr + 0]

** mov %ssrc, %rdest
-- lds %ssrc, %rdest
-- ld %ssrc, %rdest
   Encoding: 0xf400 | (rdest << 4) | ssrc
   [supervisor mode only]
   Copy the value in special register %ssrc into %rdest.

** mov %rsrc, %sdest
-- sts %rsrc, %sdest
-- st %rsrc, %sdest
   Encoding: 0xf500 | (rsrc << 4) | sdest
   [supervisor mode only]
   Copy the value in %rsrc into special register %sdest.
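
The register-plus-offset load and store encodings given earlier in
this section can be sketched in Python as follows. The function names
are hypothetical, and rejecting odd or out-of-range offsets (rather
than silently masking them) is an assumption about what an assembler
would want to do.

```python
# Sketch only: function names and the range checks are assumptions.

def _offset_field(offset4):
    """Pack a doubled 4-bit offset (0, 2, ..., 30) into bits 4-7."""
    if offset4 % 2 or not 0 <= offset4 <= 30:
        raise ValueError("offset4 must be an even value from 0 to 30")
    return (offset4 & 0x1e) << 3

def encode_load_word(rdest, raddr, offset4=0):
    """mov [%raddr + <offset4>], %rdest"""
    return 0x6000 | (rdest << 8) | _offset_field(offset4) | raddr

def encode_store_word(rsrc, raddr, offset4=0):
    """mov %rsrc, [%raddr + <offset4>]"""
    return 0xd000 | (rsrc << 8) | _offset_field(offset4) | raddr

# mov [%sp + 2], %a0  (%a0 is register 8, %sp is register 13)
word = encode_load_word(8, 13, 2)   # 0x681d
```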

SETHI - set high byte, clear low byte

** sethi <symexpr8>, %rdest
   Encoding: 0x7000 | (rdest << 8) | symexpr8
   Loads <symexpr8> into the upper half of %rdest.
   Clears the lower half of %rdest.

MOVB/LDB/STB/LDBI - load/store byte

-- movb [<symexpr>], %rdest
-- ldb [<symexpr>], %rdest
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      movb [%t6], %rdest

** movb [%raddr], %rdest
-- ldb [%raddr], %rdest
   Encoding: 0xf200 | (rdest << 4) | raddr
   Load the byte at address %raddr into the low half of %rdest.

-- movb %rsrc, [<symexpr>]
-- stb %rsrc, [<symexpr>]
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      movb %rsrc, [%t6]

** movb %rsrc, [%raddr]
-- stb %rsrc, [%raddr]
   Encoding: 0xf300 | (rsrc << 4) | raddr
   Store the low half of %rsrc at the address %raddr.

** movb <symexpr8>, %rdest
-- ldbi <symexpr8>, %rdest
   Encoding: 0xe000 | (rdest << 8) | symexpr8
   Load <symexpr8> into the lower half of %rdest.
   Does not affect the upper half of %rdest.
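
The sethi/movb pair is what the aliases above use to build a full
16-bit constant. A minimal Python sketch of the register effects,
assuming 16-bit inputs (the function names are hypothetical, and the
extra masking in hi() is an assumption, since the notes define %hi(x)
simply as x >> 8):

```python
# Sketch only: models register contents, not instruction encodings.

def hi(x):
    """%hi(x): upper byte of a 16-bit value (masked here as an assumption)."""
    return (x >> 8) & 0xff

def lo(x):
    """%lo(x): lower byte."""
    return x & 0xff

def sethi(imm8):
    """New register value: imm8 in the upper half, lower half cleared."""
    return (imm8 & 0xff) << 8

def movb_imm(reg, imm8):
    """New register value: imm8 in the lower half, upper half untouched."""
    return (reg & 0xff00) | (imm8 & 0xff)

def load_immediate(value):
    """Mimic 'sethi %hi(value), %r' followed by 'movb %lo(value), %r'."""
    reg = sethi(hi(value))
    return movb_imm(reg, lo(value))
```

The order matters: sethi clears the lower half, so it has to come
before the movb.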

ADD/ADDI - add to words

** add %rsrc, %rarg, %rdest
   Encoding: 0x0000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the sum of the values in %rsrc and %rarg in %rdest.
   Affects the C and Z flags.

-- add %rarg, %rdest
   => add %rdest, %rarg, %rdest

** add <immed4>, %rdest
-- addi <immed4>, %rdest
   Encoding: 0xfe00 | (rdest << 4) | immed4
   Increment %rdest by <immed4>.
   Affects the C and Z flags.
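
The text does not spell out exactly how add sets the flags. The Python
sketch below assumes the obvious reading: C is the carry out of bit 15
and Z is set when the 16-bit result is zero. The function name is
hypothetical.

```python
# Sketch only: the flag rules here are an assumption, not a spec.

def add16(a, b):
    """Return (result, C, Z) for a 16-bit add."""
    total = (a & 0xffff) + (b & 0xffff)
    result = total & 0xffff
    carry = total > 0xffff      # assumed meaning of C for add
    zero = result == 0          # assumed meaning of Z
    return result, carry, zero
```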

SUB/SUBI - subtract from words

** sub %rsrc, %rarg, %rdest
   Encoding: 0x1000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the difference %rsrc - %rarg in %rdest.
   Affects the C and Z flags.

-- sub %rarg, %rdest
   => sub %rdest, %rarg, %rdest

** sub <immed4>, %rdest
-- subi <immed4>, %rdest
   Encoding: 0xfd00 | (rdest << 4) | immed4
   Decrement %rdest by <immed4>.
   Affects the C and Z flags.

CMP/CMPL - compare word

** cmp %rsrc1, %rsrc2
   Encoding: 0xf100 | (rsrc1 << 4) | rsrc2
   Compare %rsrc1 and %rsrc2.
   Affects the C and Z flags. Treats operands as signed.

-- cmpl %rsrc1, %rsrc2
   => sub %rsrc1, %rsrc2, %0
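
The practical difference between cmp and cmpl is signed versus
unsigned ordering. A Python sketch, assuming that C means
"%rsrc1 is less than %rsrc2" for cmp and "borrow from
%rsrc1 - %rsrc2" for cmpl (the operand order is an assumption, and the
function names are hypothetical):

```python
# Sketch only: each function returns the assumed (C, Z) pair.

def to_signed(x):
    """Interpret a 16-bit pattern as a two's complement value."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x

def cmp_signed(rsrc1, rsrc2):
    """cmp: operands treated as signed."""
    return to_signed(rsrc1) < to_signed(rsrc2), (rsrc1 & 0xffff) == (rsrc2 & 0xffff)

def cmp_unsigned(rsrc1, rsrc2):
    """cmpl: a subtract into %0, so C acts as an unsigned borrow."""
    return (rsrc1 & 0xffff) < (rsrc2 & 0xffff), (rsrc1 & 0xffff) == (rsrc2 & 0xffff)
```

For example, 0xffff compares as less than 1 when signed (it is -1),
but as greater than 1 when unsigned.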

AND - bitwise and

** and %rsrc, %rarg, %rdest
   Encoding: 0x4000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise AND of the values in %rsrc and %rarg in %rdest.

-- and %rarg, %rdest
   => and %rdest, %rarg, %rdest

OR - bitwise inclusive or

** or %rsrc, %rarg, %rdest
   Encoding: 0x5000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise OR of the values in %rsrc and %rarg in %rdest.

-- or %rarg, %rdest
   => or %rdest, %rarg, %rdest

XOR - bitwise exclusive or

** xor %rsrc, %rarg, %rdest
   Encoding: 0xb000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise XOR of the values in %rsrc and %rarg in %rdest.

-- xor %rarg, %rdest
   => xor %rdest, %rarg, %rdest
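
The add, sub, and, or and xor instructions share one three-register
layout, so a single encoder covers them. A Python sketch using the
opcode values given in the Encoding lines (the table and function
names are hypothetical):

```python
# Opcode bases copied from the Encoding lines; names are hypothetical.
THREE_REG_OPCODES = {
    "add": 0x0000,
    "sub": 0x1000,
    "and": 0x4000,
    "or":  0x5000,
    "xor": 0xb000,
}

def encode_three_reg(mnemonic, rsrc, rarg, rdest):
    """Encode '<mnemonic> %rsrc, %rarg, %rdest'."""
    for reg in (rsrc, rarg, rdest):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers run from 0 to 15")
    return THREE_REG_OPCODES[mnemonic] | (rdest << 8) | (rsrc << 4) | rarg

# 'mov %t0, %t1' is an alias for 'or %0, %t0, %t1'
word = encode_three_reg("or", 0, 1, 2)   # 0x5201
```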

NOT - bitwise complement

** not %rsrc, %rdest
   Encoding: 0xf000 | (rdest << 4) | rsrc
   Place the bitwise complement of the value in %rsrc in %rdest.

-- not %rdest
   => not %rdest, %rdest

ROL/ROR - rotate bits in word

** rol %rsrc, <immed4>, %rdest
   Encoding: 0x2000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc rotated left by <immed4> bits in %rdest.
   Affects the C and Z flags.

-- rol <immed4>, %rdest
   => rol %rdest, <immed4>, %rdest

** ror %rsrc, <immed4>, %rdest
   Encoding: 0x3000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc rotated right by <immed4> bits in %rdest.
   Affects the C and Z flags.

-- ror <immed4>, %rdest
   => ror %rdest, <immed4>, %rdest
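
A Python sketch of the 16-bit rotates, covering the result value only.
How rol/ror set the C and Z flags is not described above, so the
sketch leaves the flags out. The function names are hypothetical.

```python
# Sketch only: models the rotated value, not the flags.

def rol16(value, immed4):
    """Rotate a 16-bit value left by immed4 bits (0..15)."""
    value &= 0xffff
    n = immed4 & 0xf
    return ((value << n) | (value >> (16 - n))) & 0xffff

def ror16(value, immed4):
    """Rotate a 16-bit value right by immed4 bits (0..15)."""
    value &= 0xffff
    n = immed4 & 0xf
    return ((value >> n) | (value << (16 - n))) & 0xffff
```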

SL/SLA/SLL/SR/SRA/SRL - shift bits in word

** sl %rsrc, <immed4>, %rdest
-- sla %rsrc, <immed4>, %rdest
-- sll %rsrc, <immed4>, %rdest
   Encoding: 0x8000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it left by <immed4> bits.
   Affects the C and Z flags.

-- sl <immed4>, %rdest
-- sll <immed4>, %rdest
-- sla <immed4>, %rdest
   => sl %rdest, <immed4>, %rdest

** sr %rsrc, <immed4>, %rdest
-- sra %rsrc, <immed4>, %rdest
   Encoding: 0x9000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it right by <immed4> bits,
   performing sign-extension of negative values.
   Affects the C and Z flags.

-- sr <immed4>, %rdest
-- sra <immed4>, %rdest
   => sr %rdest, <immed4>, %rdest

** srl %rsrc, <immed4>, %rdest
   Encoding: 0xa000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it right by <immed4> bits,
   treating values as unsigned.
   Affects the C and Z flags.

-- srl <immed4>, %rdest
   => srl %rdest, <immed4>, %rdest
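
The difference between sr (sign-extending) and srl (unsigned) only
shows up for values with the top bit set. A Python sketch of the three
shifts, covering the result value only and leaving the flags out (the
function names are hypothetical):

```python
# Sketch only: models the shifted value, not the flags.

def sl16(value, immed4):
    """sl/sla/sll: shift left, discarding bits shifted out of bit 15."""
    return (value << immed4) & 0xffff

def sr16(value, immed4):
    """sr/sra: shift right, copying the sign bit into vacated bits."""
    value &= 0xffff
    signed = value - 0x10000 if value & 0x8000 else value
    # Python's >> on a negative int is an arithmetic shift.
    return (signed >> immed4) & 0xffff

def srl16(value, immed4):
    """srl: shift right, filling vacated bits with zero."""
    return (value & 0xffff) >> immed4
```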

BL/BC/BGE/BNC/BE/BZ/BNE/BNZ/BG/B - conditional and unconditional short branch

** bl <disp8>
-- bc <disp8>
   Encoding: 0xf600 | (disp8 >> 1)
   Branch to <disp8> if the C flag is set.

** be <disp8>
-- bz <disp8>
   Encoding: 0xf700 | (disp8 >> 1)
   Branch to <disp8> if the Z flag is set.

** bne <disp8>
-- bnz <disp8>
   Encoding: 0xf800 | (disp8 >> 1)
   Branch to <disp8> if the Z flag is not set.

** bge <disp8>
-- bnc <disp8>
   Encoding: 0xf900 | (disp8 >> 1)
   Branch to <disp8> if the C flag is not set.

** bg <disp8>
   Encoding: 0xfa00 | (disp8 >> 1)
   Branch to <disp8> if neither the Z flag nor the C flag is set.

** b <disp8>
   Encoding: 0xfb00 | (disp8 >> 1)
   Branch to <disp8> unconditionally.
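
Since the branch displacement is measured from the instruction
following the branch, turning a target address into an encoding takes
a little arithmetic. A Python sketch follows; the names are
hypothetical, and masking the halved displacement to 8 bits (so that
backward branches do not disturb the opcode byte) is an assumption
that the Encoding lines leave implicit.

```python
# Opcode bases copied from the Encoding lines; names are hypothetical.
BRANCH_OPCODES = {
    "bl": 0xf600, "be": 0xf700, "bne": 0xf800,
    "bge": 0xf900, "bg": 0xfa00, "b": 0xfb00,
}

def encode_branch(mnemonic, branch_addr, target_addr):
    """Encode a short branch located at branch_addr that jumps to target_addr."""
    # Displacement is relative to the instruction after the branch.
    disp = target_addr - (branch_addr + 2)
    if disp % 2:
        raise ValueError("branch targets must be word-aligned")
    if not -256 <= disp <= 254:
        raise ValueError("target out of range for a short branch")
    # Assumption: the halved displacement is stored as 8-bit two's complement.
    return BRANCH_OPCODES[mnemonic] | ((disp >> 1) & 0xff)
```

Under these assumptions a short branch reaches from 256 bytes before
to 254 bytes after the following instruction; anything farther needs
j or jal.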

JAL/J/CALL/RET - long jump, procedure call/return

** jal %raddr + %roff, %rret
   Encoding: 0xc000 | (rret << 8) | (roff << 4) | raddr
   Jump to the address %raddr + %roff, placing the address of the
   instruction which would have been executed next in %rret.

-- jal <symexpr> + %roff, %rret
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %roff, %rret

-- jal <symexpr>, %rret
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %0, %rret

-- jal %raddr, %rret
   => jal %raddr + %0, %rret

-- j <symexpr> + %roff
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %roff, %0

-- j <symexpr>
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %0, %0

-- j %raddr + %roff
   => jal %raddr + %roff, %0

-- j %raddr
   => jal %raddr + %0, %0

-- call <symexpr> + %roff
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %roff, %ra

-- call <symexpr>
   => sethi %hi(<symexpr>), %t6
      movb %lo(<symexpr>), %t6
      jal %t6 + %0, %ra

-- call %raddr + %roff
   => jal %raddr + %roff, %ra

-- call %raddr
   => jal %raddr + %0, %ra

-- ret
   => jal %ra + %0, %0
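
Every alias in this section ends in a single jal, so one encoder is
enough to check the expansions. A Python sketch (the function and
constant names are hypothetical; the register numbers come from the
REGISTERS section):

```python
# Sketch only: encodes the final jal of each alias.
R0 = 0    # always zero
RA = 14   # %ra, return address

def encode_jal(raddr, roff, rret):
    """Encode 'jal %raddr + %roff, %rret'."""
    return 0xc000 | (rret << 8) | (roff << 4) | raddr

ret_word = encode_jal(RA, R0, R0)    # 'ret'      => jal %ra + %0, %0
call_word = encode_jal(1, R0, RA)    # 'call %t0' => jal %t0 + %0, %ra
```

Because writes to %0 have no effect, using %0 as %rret turns jal into
a plain jump.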

INT/IRET - interrupt call/return

** int <immed7>
   Encoding: 0xfc00 | immed7
   Store the processor state to the kernel stack, and execute the
   handler for interrupt <immed7>. (Doesn't actually do this in
   emu or emu--, but user code can't tell the difference anyhow.)
   See syscall.h for some useful (i.e., implemented) interrupts.

** iret
   Encoding: 0xffff
   [supervisor mode only]
   Restore the processor state from the kernel stack.

NOP - no operation

** nop
   Encoding: 0xfff0
   No effect.

PUSH/POP - stack pseudo-instructions

-- push
   => sub 2, %sp

-- push %rsrc
   => sub 2, %sp
      mov %rsrc, [%sp + 2]

-- pop
   => add 2, %sp

-- pop %rdest
   => mov [%sp + 2], %rdest
      add 2, %sp
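
The push and pop expansions keep %sp pointing at the highest unused
word, which is why the data lives at %sp + 2. A Python sketch of the
two register forms, assuming a bytearray for memory and big-endian
word storage (the function names and the starting address are
hypothetical):

```python
# Sketch only: models 'push %rsrc' and 'pop %rdest' on a byte array.

def push(mem, sp, value):
    """sub 2, %sp ; mov %rsrc, [%sp + 2]. Returns the new %sp."""
    sp = (sp - 2) & 0xffff
    mem[sp + 2] = (value >> 8) & 0xff   # high byte first (big-endian)
    mem[sp + 3] = value & 0xff
    return sp

def pop(mem, sp):
    """mov [%sp + 2], %rdest ; add 2, %sp. Returns (value, new %sp)."""
    value = (mem[sp + 2] << 8) | mem[sp + 3]
    sp = (sp + 2) & 0xffff
    return value, sp

mem = bytearray(0x1000)
sp = push(mem, 0x0ffe, 0xabcd)   # word stored at 0x0ffe, sp becomes 0x0ffc
```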

== NOTES ==============================================================
- <value> is an arithmetic expression involving constants, the
  usual C arithmetic operators, and the %hi()/%hi16() and %lo()/%lo16()
  constructs.
- %hi(x) is equivalent to x >> 8
- %hi16(x) is equivalent to x >> 16
- %lo(x) is equivalent to x & 0xff
- %lo16(x) is equivalent to x & 0xffff
- constants can be:
  - decimal digits, with optional sign prefix
  - 0 followed by octal digits, with optional sign prefix
  - 0x followed by hexadecimal digits, with optional sign prefix
  - a C character constant (e.g. 'c', '\n', '\0377', '\0x1e', etc.)
- <symexpr> is one of the following:
  - <symbol>
  - .
  - <symexpr> + <value>
  - <symexpr> - <value>

- <offset4> is a doubled 4-bit <value>

- <disp8> is one of the following:
  - <value> (which is interpreted as an absolute address)
  - <symexpr>
  <disp8> has a limited range since it uses a signed, doubled 8-bit
  displacement from the instruction following the branch

- <immedX> is an X-bit <value>

- <symexpr8> is %hi(<symexpr>), %lo(<symexpr>), or <immed8>

=======================================================================
