If you've done much reading at all about assembly (or talked to anyone who
has attempted to learn assembly and failed) then you will have been told about
how scary it is, that you "work directly with the memory" [gasp!]. This isn't
nearly as difficult to do or understand as people would have you believe, just
different from C++.
First off, there are variables in assembly that you can define, use, and
reuse. But they are all declared at once, in a special section of your program
at the very beginning. I'll rephrase that too, because it's important: you
cannot define variables in the body of your program as you are going along. All
variables that you are going to use in the scope of your program must be defined
at the beginning. (In a way this is similar to using function prototypes in C++:
the computer must be told up front to set aside a certain amount of memory for
your variables.) Unlike C++, capitalization does not matter, and although you can
use numbers and letters, your variable's name must start with a letter. Other
special characters can be used in variable names too, but using them is not
recommended because some of those characters have special meanings to the
assembler.
An assembly language program is divided up into segments, and all your
variables must be declared in the data segment. The format of it goes something
like this: (for now, don't worry about the syntax of declaring variables in
assembly. If you're really curious, the "db" stands for "define byte" which as
you learned in the last section is eight bits of memory.)
.data
var1 db 45
var2 db 29
.code
[code goes here]
As you can probably see, this gets a bit inconvenient depending on how many
times you will use a certain variable. What if you only need it once? Waste all
that memory for it? To get around this, assembly implements a concept called
registers, which are high-speed memory locations. Registers are very similar
to how variables are done in the TI calculator line (I'm thinking TI-83, because
that's what I use, but I believe it is the same in all of them). In the
calculators there are 27 variables you can use, A-Z and theta. When you put a
number into one of the variables (say Q), you can use it anywhere else on the
calculator with Q. Different programs can be sure to use the same value by using
Q. But you have to be careful not to replace the value in Q with something else.
This is the way that registers in assembly work: you cannot change their names,
there is a set number of them, and any place in your program can use the same
value if it is contained in a register. Registers are great for holding
temporary values, for doing calculations (in which intermediate numbers will be
discarded as soon as the answer is found), for use as counter variables, and for
anything else that requires high-speed data access. The registers are actually
on the CPU, so they can be accessed far faster than variables, which live in
main memory.
To keep the speed of assembly language execution high, the processor forbids
operations that act on two variables at once. These are called "memory to
memory" operations. Take for example a simple add instruction:
add var1, var2
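Clearly we want to add var2 to var1 (the operation always acts on the first
operand in the statement), but this is forbidden. We would have to put var2 into
a register and then add it, like so. (Because var1 and var2 were declared with
db, they are one byte each, so we use AL, the one-byte low half of AX. Register
sizes are covered further down.)
mov al, var2
add var1, al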
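To see the workaround in context, here is a minimal sketch of a complete program built around it. It assumes a MASM- or TASM-style assembler and a 16-bit DOS target. The .model and .stack lines, the main label, the two instructions that load DS, and the int 21h exit call are standard boilerplate for that setup which this tutorial has not covered, so treat them as assumptions rather than something you need to understand yet. The sketch moves var2 through AL, the one-byte low half of AX, because both variables were declared with db and the sizes have to match.

```asm
; Sketch only: assumes a MASM/TASM-style assembler and a 16-bit DOS target.
.model small            ; assumed boilerplate: one code segment, one data segment
.stack 100h             ; assumed boilerplate: reserve 256 bytes for the stack

.data
var1 db 45
var2 db 29

.code
main:                   ; "main" is just a label name chosen for this sketch
    mov ax, @data       ; assumed boilerplate: point DS at the data segment
    mov ds, ax

    ; add var1, var2    ; rejected: both operands are in memory
    ; mov ax, var2      ; rejected too: AX is two bytes, var2 is one byte
    mov al, var2        ; copy var2 (29) into the one-byte register AL
    add var1, al        ; var1 = 45 + 29 = 74

    mov ax, 4C00h       ; assumed boilerplate: DOS "exit program" call
    int 21h
end main
```

Note that var2 itself is unchanged afterwards. Only var1, the first operand of the add, receives the result.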
Which brings us to our next point about registers: they all have a
specialized function and cannot necessarily be used interchangeably.
General Purpose Registers:
AX - "accumulator register" - This is generally the default register for doing
arithmetic and data moving operations. It is fastest and it can do many
different kinds of work.
BX - "base register" - This can do the same things that AX can do, slightly
slower, and it can also be used for indirect addressing; not important now, but
very important later.
CX - "counting register" - unlike base and accumulator, the counting register
does exactly what its name says. Although it can also be used for math and data
movement, it is primarily used in loops. When the loop instruction is used, CX
is automatically decremented, and when it reaches zero the loop ends.
DX - "data register" - used for storing the memory addresses of data that is
going to be used in input and output.
These four are the main registers you will be using at first. Many of the
others will be used as well, but you will not have to work with them directly.
I include the rest merely for completeness at this point.
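As an illustration of the counting register at work, the sketch below adds the numbers 5 down to 1 using the loop instruction. It assumes the same MASM/TASM-style, 16-bit DOS setup as the rest of this tutorial's code. The .model and .stack lines and the int 21h exit call are boilerplate assumptions, and the label name "again" is made up for this example.

```asm
; Sketch only: assumes a MASM/TASM-style assembler and a 16-bit DOS target.
.model small            ; assumed boilerplate
.stack 100h             ; assumed boilerplate

.code
main:                   ; "main" is just a label name chosen for this sketch
    mov ax, 0           ; AX holds the running total
    mov cx, 5           ; CX is the loop counter: repeat 5 times
again:                  ; hypothetical label marking the top of the loop
    add ax, cx          ; add the current count: 5 + 4 + 3 + 2 + 1
    loop again          ; CX = CX - 1, then jump to "again" unless CX is now 0

    ; at this point AX = 15 and CX = 0

    mov ax, 4C00h       ; assumed boilerplate: DOS "exit program" call
    int 21h
end main
```

Because loop always counts in CX, the body of the loop should not store anything else in CX unless it restores the counter before reaching the loop instruction.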
Segment registers - not to be changed in the code of your program or you will
seriously mess things up!
CS - Code segment - contains the memory address of where your program code
starts. If you change this to some random location in memory the computer will
start executing whatever it finds there (thinking that it is your program) and
you can seriously mess things up.
DS - Data segment - contains the memory address of where your variables can be
found. If you change this it won't be able to find your variables and will give
you random data when you try to access them.
SS - Stack segment - contains the memory address of the system stack. The stack
is yet another place to hold data temporarily. Changing this renders all your
valuable data on the stack unreachable.
ES - Extra segment - points to an additional area of memory for data if your
data segment gets full. You will never need this in the scope of this tutorial.
Index registers - Contain memory addresses instead of actual data
SI - Source index - Can be used in indirect addressing, like BX.
DI - Destination index - Can be used in indirect addressing, like BX. Most often
used in tandem with SI.
BP - Base pointer - used to locate variables on the system stack, but not ones
that are on the top of the stack.
SP - Stack pointer - contains the memory address of the value on the top of the
system stack.
Special registers - Don't touch these either
IP - Instruction pointer - holds the memory address of the next instruction to
be executed in your program. It takes care of itself and by changing it you are
likely to mess things up.
Flags - the flags register has 16 bits (they all do actually) in which each bit,
by virtue of being on or off, relates information about the current state of the
CPU. We do not need to memorize which bits mean what, but be aware that
different assembly language instructions may access or modify the flags register.
All of the above registers are 16 bits, which is equal to 2 bytes and can hold a maximum of 1111111111111111b, FFFFh, or 65535d. When you are moving data from one register to another, or from one memory location to a register, the sizes must match. That means that the variables declared using db (define byte) above cannot be moved into a register that is two bytes long. How to get around this? All the general purpose registers can be broken in half, into a one-byte high and low end for each one. As a result we can use AL, AH, BL, BH, CL, CH, and so on. These are not separate from AX, BX, CX. They just allow you to access half a register at a time. Look at the following example of operations on the AX register.
|         AX          |
|    AH    |    AL    |
| 01101000 | 11110010 |
Now we move something new into the high end of the register, into AH.
mov ah, 11110000b
AX now looks like this:
|         AX          |
|    AH    |    AL    |
| 11110000 | 11110010 |
An addition operation on the AH register affects just the AH register.
add ah, 1
AX now looks like this:
|         AX          |
|    AH    |    AL    |
| 11110001 | 11110010 |
But if we add something to AX, the register is treated as one 16-bit number
rather than as two separate halves, so the addition starts at the lowest end of
the AX register (and any carry out of AL would move up into AH).
add ax, 1
AX now looks like this:
|         AX          |
|    AH    |    AL    |
| 11110001 | 11110011 |
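The difference between working on a half and working on the whole register shows up most clearly when the low half is already full. The sketch below is an illustration under the same assumptions as before (MASM/TASM-style assembler, 16-bit DOS target, boilerplate .model, .stack, and int 21h exit lines that are not part of this lesson). It adds 1 first to AL alone and then to all of AX, starting from the same value each time.

```asm
; Sketch only: assumes a MASM/TASM-style assembler and a 16-bit DOS target.
.model small            ; assumed boilerplate
.stack 100h             ; assumed boilerplate

.code
main:                   ; "main" is just a label name chosen for this sketch
    mov ah, 00000001b
    mov al, 11111111b   ; AX = | 00000001 | 11111111 |
    add al, 1           ; only AL changes and wraps around to zero:
                        ; AX = | 00000001 | 00000000 |

    mov al, 11111111b   ; AX = | 00000001 | 11111111 | again
    add ax, 1           ; all 16 bits take part, so the carry moves into AH:
                        ; AX = | 00000010 | 00000000 |

    mov ax, 4C00h       ; assumed boilerplate: DOS "exit program" call
    int 21h
end main
```

In the first case the overflow out of AL is not added to AH; in the second it is, because AX is handled as a single 16-bit value.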
This doesn't cover all aspects of assembly
language memory, but doing that would overwhelm you, the beginner, with details
that you won't need for a while. I'll gradually introduce new concepts as needed
throughout later lessons.