Assembly Lesson 5

Offsets and Indirect Addressing

A variable that you have stored in memory is composed of two things: the contents of the variable (the value you have stored in it) and the address in memory where that value can be found. Once you understand this concept and how it applies to assembly language, a whole new world opens up in understanding computers. Here is a diagram of a variable:

Contents:	10011011
Memory address:	001A

When we create a variable, which we'll call num1, that contains 155d, the computer reserves a byte-sized chunk of memory for that value at memory location 001Ah. In other words, if you started at the very beginning of your memory (memory address 00) and counted 1Ah (that's 26 in base ten) bytes away from it, you would find 10011011 at that location. C++ and other high-level languages completely hide this process from you, but in assembly you have to give the program the location in memory of the variables you want to use.

In the accompanying diagram, the arrow points to the location referenced by num1, and the white space indicates how much of the system's memory it takes up. Clearly a real computer has thousands of times more memory than this diagram shows, which leads us to the next problem: how large a space does it take to store a memory address? A 16-bit register can only count up to FFFFh (65,535), so once a piece of data sits more than 64 KB away from address 00, its address no longer fits in a single register. How do you find an address in memory using registers that are 16 bits at most?

Assembly uses a technique called direct-offset addressing (you will also see it called segment:offset addressing) to solve this. The address is broken into two pieces, and two values work in tandem to find it. One value points to the beginning of a segment (such as the data segment, code segment, or stack segment), and the other tells how many bytes away from the beginning of that segment the piece of data you are looking for is. The pair is usually written as segment:offset. On the 8086 in real mode, the processor multiplies the segment value by 10h (16) and adds the offset to get the actual address, so 1234h:0010h refers to address 12350h. For example, the cs and ip registers (written as cs:ip) combine to form the address of the instruction your program is about to execute. The cs register holds the beginning of the code segment, and ip advances after every instruction to keep track of where in the code you are.

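.data
msg db "i heart computar mashines!!111$"   ; function 9 prints up to the "$" terminator

.code
mov ax, @data        ; ax = address of the data segment
mov ds, ax           ; ds now points at the data segment
mov ah, 9            ; int 21h function 9: print the string at ds:dx
mov dx, offset msg   ; dx = offset of msg within the data segment
int 21h
[...]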

Here we have something very similar to the Hello World program, except now you have a better idea of what is going on. Just as the cs and ip registers work together, the ds register and the dx register (containing the offset of your variable) are used to find the piece of data you're going to work with. The first two instructions after .code move the address of the data segment into the ds register; the value goes through ax because a segment register cannot be loaded with a number directly. The line "mov dx, offset msg" puts into dx the number of bytes you would move from the start of the data segment to find msg.

Suppose msg is 10h bytes away from the start of the data segment. The value of "offset msg" would then be 10h, and moving that into the dx register gives you all the information you need to find msg in memory. With the beginning address of the data segment in ds and the offset of msg in dx, ds:dx is the complete memory address of msg.
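To see the ds:dx pair at work in a whole program, here is a minimal sketch. It assumes MASM/TASM syntax, the small memory model, and a 100h-byte stack; the variable names and the message text are made up for illustration. The assembler and linker work out the real value of "offset msg" for you, so you never have to count bytes by hand.

```asm
; A sketch only: assumes MASM/TASM syntax and the small memory model.
.model small
.stack 100h

.data
other db 0                                  ; hypothetical variable placed ahead of msg
msg   db "Found me at ds:dx", 0Dh, 0Ah, "$" ; function 9 stops printing at "$"

.code
main proc
    mov ax, @data        ; ax = address of the data segment
    mov ds, ax           ; ds = start of the data segment (the "segment" half)
    mov dx, offset msg   ; dx = distance of msg from that start (the "offset" half)
    mov ah, 9            ; int 21h function 9: print the "$"-terminated string at ds:dx
    int 21h

    mov ax, 4C00h        ; int 21h function 4Ch: exit to DOS with return code 0
    int 21h
main endp
end main
```

Because other comes first in the data segment, the offset of msg is at least one byte larger than the offset of other; that difference is exactly what "offset" measures.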

Indirect Addressing

Indirect addressing means reaching a value through a register that holds its memory address instead of naming the variable directly. Only certain registers (bx, si, and di, plus bp, which works relative to the stack segment by default) can be used for this task; the majority cannot. Using the same values as the previous example, let's move the value A6h into bx. If we say "mov ax, bx", we are copying the literal value in bx, A6h, into ax. With indirect addressing, the similar line "mov ax, [bx]" instead treats bx as an offset and moves the contents of the memory location at ds:bx (offset A6h in the data segment) into ax. This is very useful when dealing with arrays.
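The difference between the two lines is easier to see side by side. This is a small sketch, again assuming MASM/TASM syntax and the small memory model; the variable name value and its contents are hypothetical.

```asm
; A sketch only: assumes MASM/TASM syntax and the small memory model.
.model small
.stack 100h

.data
value dw 1234h            ; hypothetical word variable

.code
main proc
    mov ax, @data
    mov ds, ax            ; ds = start of the data segment

    mov bx, offset value  ; bx = offset of value within the data segment
    mov ax, bx            ; ax = a copy of bx, that is, the offset itself
    mov ax, [bx]          ; ax = 1234h, the word stored in memory at ds:bx

    mov ax, 4C00h         ; exit to DOS
    int 21h
main endp
end main
```

The brackets are the whole story: without them you get the number in the register, with them you get whatever is stored at that offset.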

Arrays in Assembly

If you look back at all the variables that have been declared up to this point, you will see that they have all been created using the "db" (define byte) directive, even the strings. That is because when you declare a string (example: string1 db "array",0Dh,0Ah,"$") you are putting all the characters in memory, but you are defining them one byte at a time, and the offset associated with the variable name points only at the very first character. Using the example I just declared, and assuming that the first character is at memory location 100h:

Character:	a	r	r	a	y	CR	LF	$
ASCII code (hex):	61h	72h	72h	61h	79h	0Dh	0Ah	24h
Memory location:	100h	101h	102h	103h	104h	105h	106h	107h

Now look at this code fragment, which uses int 21h function 2. That function takes a single character stored in the dl register and prints it to the screen.

mov ah, 2
mov bx, 3
mov dl, [string1+bx]
int 21h

Accessing the second a in string1 by saying "[string1+bx]" is very similar to the direct-offset scheme learned earlier, but now string1 is the base and bx is the offset from it. Instead of moving the actual value in bx (or the number string1+bx), we are moving the byte stored in memory at ds:(string1+bx) into dl. Later you will learn how to use a loop that can increment bx and sequentially access every value in the array. There are two other syntaxes for indirectly addressing the contents of an array besides "[string1+bx]":

string1[bx] (this is the one I will use from here on, since it is most like the C++ equivalent of accessing array elements with an index value)

[string1]+bx
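Here is a short sketch showing that the first two spellings reach the same byte. It assumes MASM/TASM syntax and the small memory model. I have left the third form out of the code because support for it may vary between assemblers and versions, so try it with your own assembler before relying on it.

```asm
; A sketch only: assumes MASM/TASM syntax and the small memory model.
.model small
.stack 100h

.data
string1 db "array", 0Dh, 0Ah, "$"

.code
main proc
    mov ax, @data
    mov ds, ax             ; ds = start of the data segment

    mov bx, 3              ; index of the second "a" (counting from 0)

    mov dl, [string1+bx]   ; dl = byte at offset string1 + 3
    mov ah, 2              ; int 21h function 2: print the character in dl
    int 21h

    mov dl, string1[bx]    ; same byte, written in array-index style
    mov ah, 2              ; reload ah in case the previous call changed it
    int 21h

    mov ax, 4C00h          ; exit to DOS
    int 21h
main endp
end main
```

Both int 21h calls print the same character, because the two operands assemble to the same address.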

Other Variable-Defining Directives

Almost all of the variables you will declare will be bytes (db), but some will need to be larger than that. The directives dw (define word) and dd (define doubleword) are used to make larger variables. A "word" is two bytes (two characters, 16 bits), the same size as a 16-bit register. A doubleword is 4 bytes, but you won't need to use it much here. It is important to know, however, that when you are traversing an array of words you need to increment your index by two instead of one, or you will be stepping through it by the byte instead of by the word.
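The sketch below walks a small word array to show why the index has to move in steps of two. It assumes MASM/TASM syntax and the small memory model, and the array name nums and its values are made up. It also relies on a fact not covered above: the 8086 stores the low byte of a word first in memory.

```asm
; A sketch only: assumes MASM/TASM syntax and the small memory model.
.model small
.stack 100h

.data
nums dw 1234h, 5678h     ; hypothetical array; in memory the bytes are 34h 12h 78h 56h

.code
main proc
    mov ax, @data
    mov ds, ax           ; ds = start of the data segment

    mov bx, 0
    mov ax, nums[bx]     ; ax = 1234h, the first word (bytes 0 and 1)
    add bx, 2            ; one word forward = two bytes forward
    mov ax, nums[bx]     ; ax = 5678h, the second word (bytes 2 and 3)

    mov bx, 1            ; the mistake: stepping by one byte instead of two
    mov ax, nums[bx]     ; ax = 7812h, half of each word glued together

    mov ax, 4C00h        ; exit to DOS
    int 21h
main endp
end main
```

The last load does not fail; it quietly reads bytes 1 and 2, which is why this kind of bug can be hard to spot.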
