Assembly Lesson 5
A variable that you have stored in memory is composed
of two things, the contents of the variable (the value you have stored in it)
and the address in memory where that value can be found. Once you understand
this concept and how it applies to assembly language, a whole new world opens up
in understanding how computers work. Here is a diagram of a variable:
| Contents: | 10011011 |
| Memory address: | 001A |
Suppose we create a variable, call it num1, that
contains 155d (10011011 in binary). A one-byte chunk of memory is reserved to hold
this value, in this example at memory address 001Ah. In other words, if you started at
the very beginning of memory (address 00h) and counted 1Ah bytes (that's
26 in base ten) forward from there, you would find 10011011 at that location.
C++ and other high-level languages hide this process from you completely,
but in assembly you have to give the program the location in memory of the variables
you want to use.
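To make the contents/address distinction concrete, here is a small Python model (an illustration only, not how an assembler actually works) that treats memory as a row of numbered bytes. The names memory and NUM1_ADDR are hypothetical; the values are the ones from the diagram above.

```python
# A toy model of memory: a bytearray where the index is the address
# and the element is the contents. Purely illustrative.
memory = bytearray(64)          # 64 bytes, addresses 00h..3Fh

NUM1_ADDR = 0x001A              # hypothetical address chosen for num1
memory[NUM1_ADDR] = 0b10011011  # store 155d in the byte at 001Ah

# The address and the contents are two different numbers.
print(f"address  = {NUM1_ADDR:04X}h ({NUM1_ADDR}d)")
print(f"contents = {memory[NUM1_ADDR]:08b}b ({memory[NUM1_ADDR]}d)")
```

Running it prints "address  = 001Ah (26d)" followed by "contents = 10011011b (155d)".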
Picture memory as a long row of numbered bytes, with num1 occupying the single
byte at address 001Ah. A real computer has vastly more memory than a small diagram
like this can show, which leads us to the next problem: how large a space does it
take to store a memory address? Memory is counted in bytes, and once you get
millions or billions of bytes away from 00h, an address needs more bits than a
small integer can hold. A 16-bit register can hold only 65,536 different values,
so on its own it can reach only 64 KB of memory. How do you find an address in
memory using assembly registers that are only 16 bits wide? Assembly uses a
technique called segmented (segment:offset) addressing to solve this. With
segmented addressing, the address is broken into two pieces, and two values work
in tandem to find it. One value points to the beginning of a segment (such as the
data segment, code segment, or stack segment), and the other tells how far, in
bytes, the piece of data you are looking for lies from the beginning of that
segment. On the 8086 in real mode, the processor combines them by multiplying the
segment value by 16 (shifting it left four bits) and adding the offset, which
produces a 20-bit address that can reach 1 MB of memory.
The pair is usually written as segment:offset. For example, the cs and ip registers
(written as cs:ip) combine to form the address of the next instruction to be
executed in your program. The cs register holds the beginning of the code segment,
and ip (the instruction pointer) advances automatically past each instruction as it
executes, so it always tells the processor where in the code you are.
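The arithmetic behind segment:offset can be sketched in a few lines. This assumes 8086 real mode, where the physical address is the segment shifted left four bits (multiplied by 16) plus the offset, wrapped to 20 bits; the function name phys_addr and the cs/ip values are made up for this example.

```python
# Sketch of 8086 real-mode address translation (illustrative, not an emulator).
REG_MAX = 2**16          # a 16-bit register can only hold 0..65535
ADDR_SPACE = 2**20       # the 8086 has 20 address lines: 1 MB of memory

def phys_addr(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    assert 0 <= segment < REG_MAX and 0 <= offset < REG_MAX
    # segment * 16 + offset; the 8086 wraps around past the 1 MB mark
    return ((segment << 4) + offset) % ADDR_SPACE

# Different segment:offset pairs can name the same byte.
print(hex(phys_addr(0x1000, 0x0010)))  # 0x10010
print(hex(phys_addr(0x1001, 0x0000)))  # 0x10010

# cs:ip works the same way: with cs = 0700h and ip = 0003h (hypothetical),
# the next instruction is fetched from physical address 07003h.
print(hex(phys_addr(0x0700, 0x0003)))  # 0x7003
```

Note that the segment value is not a byte count: each step of the segment moves the starting point by 16 bytes, which is why 1000h:0010h and 1001h:0000h land on the same byte.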
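    .data
    msg db "i heart computar mashines!!111$"   ; function 9 prints up to the "$"
    .code
    mov ax, @data        ; address of the data segment...
    mov ds, ax           ; ...copied into ds by way of ax
    mov ah, 9            ; DOS function 9: print a string
    mov dx, offset msg   ; offset of msg within the data segment
    int 21h              ; call DOS, which prints the string at ds:dx
    [...]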
Here we have something very similar to the Hello World
program, except that now you have a better idea of what is going on. Just as the cs
and ip registers work together, the ds register and the dx register (containing the
offset of your variable) are used to find the piece of data you're going to work
with. The first two instructions after .code move the address of the data segment
into the ds register; the value goes through ax because the 8086 cannot move an
immediate value such as @data directly into a segment register. The line
"mov dx, offset msg" puts into dx the number of bytes from the start of the data
segment (the address held in ds) to msg.
For example, if msg is 10h bytes from the start of the data segment, the value
of "offset msg" is 10h, and moving that into the dx register gives you all
the information you need to find msg in memory. With the beginning address of
the data segment in ds and the offset of msg in dx, ds:dx is the complete
memory address of msg.
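As a rough model of what int 21h function 9 does with ds:dx, the sketch below reads bytes starting at the address ds:dx names and stops at the first "$". The segment value 0800h and the helper names are assumptions for illustration, the 10h offset is the one used in the text, and real DOS writes to the screen rather than returning a string.

```python
ADDR_SPACE = 2**20
memory = bytearray(ADDR_SPACE)   # a flat 1 MB real-mode memory (toy model)

def phys_addr(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) % ADDR_SPACE

def dos_print_string(ds: int, dx: int) -> str:
    """Mimic int 21h, ah=9: collect characters at ds:dx up to (not incl.) '$'."""
    out = []
    addr = phys_addr(ds, dx)
    while memory[addr] != ord("$"):
        out.append(chr(memory[addr]))
        addr += 1
    return "".join(out)

DATA_SEG = 0x0800        # hypothetical value that "mov ax, @data" would load
MSG_OFFSET = 0x10        # "offset msg", as in the text
text = b"i heart computar mashines!!111$"
start = phys_addr(DATA_SEG, MSG_OFFSET)
memory[start:start + len(text)] = text

print(dos_print_string(DATA_SEG, MSG_OFFSET))
```

Neither register alone locates msg: dx = 10h only means something once it is paired with the segment in ds.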
Indirect addressing means accessing a variable through a register that holds its
memory address, instead of using the value in the register directly. Only certain
registers (bx, bp, si, and di) can be used for this task; the others cannot, and
bp normally refers to the stack segment rather than the data segment.
For example, suppose we move the value A6h into bx. If we write "mov ax, bx", we
put the literal value in bx, A6h, into ax. However, with indirect addressing the
similar line "mov ax, [bx]" treats A6h as an address: it moves the data stored at
memory location A6h in the data segment (ds:00A6h) into ax. Because ax is a 16-bit
register, it loads the two bytes at A6h and A7h. This is very useful when dealing
with arrays.
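The difference between "mov ax, bx" and "mov ax, [bx]" can be modeled like this. It is a sketch that ignores segments (it treats ds as 0) and assumes the 8086's little-endian byte order, in which the low byte of a word sits at the lower address; the stored value 1234h is invented for the example.

```python
memory = bytearray(0x200)      # toy memory; ds is assumed to be 0
memory[0xA6] = 0x34            # low byte of the word 1234h
memory[0xA7] = 0x12            # high byte of the word 1234h

bx = 0xA6

# mov ax, bx   -> copy the number in bx itself
ax_direct = bx

# mov ax, [bx] -> treat bx as an address and load the 16-bit word there
ax_indirect = memory[bx] | (memory[bx + 1] << 8)

print(hex(ax_direct))    # 0xa6
print(hex(ax_indirect))  # 0x1234
```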
Up to this point, if you look back at all the variables
that have been declared, you will see that they have all been created using
the "db" (define byte) directive, even the strings. That is because when you
declare a string (example: string1 db "array",0Dh,0Ah,"$"), you are putting all
the characters in memory, but you are defining them one byte at a time, and the
offset associated with that variable name points only at the very first
character. Using the example I just declared, and assuming that the first
character is at memory location 100h:
| Character: | a | r | r | a | y | CR | LF | $ |
| ASCII code (hex): | 61h | 72h | 72h | 61h | 79h | 0Dh | 0Ah | 24h |
| Memory location: | 100h | 101h | 102h | 103h | 104h | 105h | 106h | 107h |
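The table can be regenerated from the declaration itself. This sketch assumes, as the text does, that string1 starts at 100h, and simply lays the bytes of "array",0Dh,0Ah,"$" out one per address; the names STRING1 and layout are hypothetical.

```python
STRING1 = 0x100                                   # starting address assumed in the table
string1 = b"array" + bytes([0x0D, 0x0A]) + b"$"   # "array",0Dh,0Ah,"$"

# Map each address to the byte stored there, one byte per character.
layout = {STRING1 + i: byte for i, byte in enumerate(string1)}

for addr, byte in layout.items():
    print(f"{addr:X}h  {byte:02X}h  {chr(byte)!r}")
```

The name string1 corresponds only to the first key, 100h; the other seven bytes are reached by counting forward from it.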
Now look at this code fragment, which uses int 21h function 2. That function takes a
single character stored in the dl register and prints it to the screen.
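    mov ah, 2              ; DOS function 2: print the character in dl
    mov bx, 3              ; index of the fourth character (counting from 0)
    mov dl, [string1+bx]   ; load the byte at string1+3 (assumes ds is set up)
    int 21h                ; call DOS to print it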
Accessing the second a in string1 by writing "[string1+bx]" is very similar to the segment:offset scheme learned earlier, but now string1 is the base and bx is an index added to it. Instead of moving the actual value in bx (or the number string1+bx itself), we are moving the data stored at memory location string1+bx in the data segment into dl. Since bx is 3, that is location 103h, which holds 61h, the code for "a".

Later you will learn how to use a loop that can increment bx and sequentially access every value in the array. There are two other syntaxes for indirectly addressing the contents of an array besides "[string1+bx]":
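Base-plus-index addressing can be mimicked by adding bx to string1's address before reading memory. The loop at the end previews the kind of bx-incrementing loop mentioned above; it is a Python model of the idea, not assembly, and stopping at "$" is an assumption borrowed from function 9's convention.

```python
memory = bytearray(0x200)                 # toy memory; ds is assumed to be 0
STRING1 = 0x100                           # address of string1, as in the table
data = b"array\r\n$"                      # "array",0Dh,0Ah,"$"
memory[STRING1:STRING1 + len(data)] = data

bx = 3
dl = memory[STRING1 + bx]                 # mov dl, [string1+bx]
print(chr(dl))                            # a  (the second 'a', at 103h)

# Walking the array: bump bx by 1 each time, since every element is a byte.
chars = []
bx = 0
while memory[STRING1 + bx] != ord("$"):
    chars.append(chr(memory[STRING1 + bx]))
    bx += 1
print(repr("".join(chars)))               # 'array\r\n'
```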
Almost all of the variables you declare will be
bytes (db), but some will need to be larger than that. The directives dw (define
word) and dd (define doubleword) are used to make larger variables. A "word" is
two bytes (two characters, 16 bits), the same size as a 16-bit register. A
doubleword is 4 bytes (32 bits), but you won't need to use it much here.
However, it is important to know that when you traverse an array of words,
you need to increment your index counter by two instead of one, because each
element occupies two bytes; otherwise you will be stepping through the array by
the byte instead of by the word.
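Here is why the index must step by two in a word array. The sketch packs a hypothetical dw array into little-endian bytes (the 8086's byte order) and reads it back once with a stride of 2 and once, wrongly, with a stride of 1. The array values and the read_word helper are invented for the example.

```python
import struct

# Hypothetical: words dw 1000h, 2000h, 3000h -> stored little-endian, 2 bytes each
words = [0x1000, 0x2000, 0x3000]
memory = struct.pack("<3H", *words)   # b'\x00\x10\x00\x20\x00\x30'

def read_word(mem: bytes, offset: int) -> int:
    """Load a 16-bit little-endian word starting at offset."""
    return mem[offset] | (mem[offset + 1] << 8)

# Correct: step the index by 2 (the size of a word).
good = [read_word(memory, i) for i in range(0, len(memory), 2)]

# Wrong: step by 1, so every other read straddles two different words.
bad = [read_word(memory, i) for i in range(0, len(memory) - 1)]

print([hex(w) for w in good])   # ['0x1000', '0x2000', '0x3000']
print([hex(w) for w in bad])    # ['0x1000', '0x10', '0x2000', '0x20', '0x3000']
```

The reads at odd offsets combine the high byte of one word with the low byte of the next, which is exactly the garbage you get from a byte-sized stride.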