Assembly Lesson 2

Memory in Assembly

If you've done much reading at all about assembly (or talked to anyone who has attempted to learn assembly and failed) then you will have been told about how scary it is, that you "work directly with the memory" [gasp!]. This isn't nearly as difficult to do or understand as people would have you believe, just different from C++.

First off, there are variables in assembly that you can define, use, and reuse. But they are all declared at once, in a special section of your program at the very beginning. I'll rephrase that too, because it's important: you cannot define variables in the body of your program as you are going along. All variables that you are going to use in the scope of your program must be defined at the beginning. (In a way this is similar to using function prototypes in C++; the computer must be told to set aside a certain amount of memory for your variables.) Unlike C++, capitalization does not matter, and although you can use numbers and letters, your variable's name must start with a letter. Other special characters can be used in variable names too, but using them is not recommended because some of those characters have special meanings to the assembler.

An assembly language program is divided up into segments, and all your variables must be declared in the data segment. The format of it goes something like this: (for now, don't worry about the syntax of declaring variables in assembly. If you're really curious, the "db" stands for "define byte" which as you learned in the last section is eight bits of memory.)

.data
var1 db 45
var2 db 29

.code
[code goes here]
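
If you think more easily in a high-level language, here is a rough Python model of what the two declarations above ask for. It is only a sketch: the names data_segment, symbols, and define_byte are made up for this illustration, and it treats every byte as an unsigned value from 0 to 255. A real assembler does this bookkeeping for you.

```python
# Hypothetical model of the .data section above: each "db" reserves one byte.
data_segment = bytearray()
symbols = {}  # variable name -> offset from the start of the data segment


def define_byte(name, value):
    """Model of `name db value`: reserve one byte and remember where it lives."""
    if not 0 <= value <= 255:
        raise ValueError("a byte can only hold 0..255 in this sketch")
    symbols[name.lower()] = len(data_segment)  # capitalization does not matter
    data_segment.append(value)


define_byte("var1", 45)
define_byte("var2", 29)

print(symbols)             # {'var1': 0, 'var2': 1}
print(list(data_segment))  # [45, 29]
```

The point to take away is that a variable name is just a label for a position in the data segment, and that every declaration costs memory whether or not you use it often.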

Registers

As you can probably see, this gets a bit inconvenient depending on how many times you will use a certain variable. What if you only need it once? Waste all that memory for it? To get around this, assembly implements a concept called registers, which are high-speed memory locations. Registers are very similar to how variables are done in the TI calculator line (I'm thinking TI-83, because that's what I use, but I believe it is the same in all of them). In the calculators there are 27 variables you can use, A-Z and theta. When you put a number into one of the variables (say Q), you can use it anywhere else on the calculator with Q. Different programs can be sure to use the same value by using Q. But you have to be careful not to replace the value in Q with something else. This is the way that registers in assembly work: you cannot change their names, there is a set number of them, and any place in your program can use the same value if it is contained in a register. Registers are great for holding temporary values, doing calculations (in which the intermediate numbers will be discarded as soon as the answer is found), serving as counter variables, and anything else that requires high-speed data access. The registers are actually on the CPU, so they can be accessed at speeds far greater than that of variables in memory.

The processor does not allow ordinary instructions to act on two variables at once. These are called "memory to memory" operations: an instruction like add can name at most one memory location, so the other operand has to be a register or a constant. Take for example a simple add instruction:

add var1, var2

Clearly we want to add var2 to var1 (the operation always acts on the first operand in the statement), but this is forbidden. We would have to put var2 into a register and then add it, like so:

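mov al, var2    ; var2 is one byte, so it goes in the one-byte register AL
add var1, al    ; now only one operand is a memory location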
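
The same two-step idea can be sketched in Python. This is a toy model, not how the CPU is built: the memory dictionary and the add_via_register function are invented for the illustration, and the register is modeled as a one-byte value because var1 and var2 were declared with db.

```python
# Hypothetical model: memory is a dict of byte variables, the register is an int.
memory = {"var1": 45, "var2": 29}


def add_via_register(dest, src):
    """Model of:  mov reg, src  followed by  add dest, reg  (one-byte values)."""
    reg = memory[src]                           # mov reg, src  (memory -> register)
    memory[dest] = (memory[dest] + reg) & 0xFF  # add dest, reg (register -> memory)
    return memory[dest]


print(add_via_register("var1", "var2"))  # 74
print(memory["var2"])                    # 29, the source is left unchanged
```

Each step touches memory only once, which is exactly the restriction the two-instruction version works around.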

Which brings us to our next point about registers: they all have a specialized function and cannot necessarily be used interchangeably.

General Purpose Registers:
AX - "accumulator register" - This is generally the default register for doing arithmetic and data moving operations. Some instructions have a shorter form when they use AX, and it can do many different kinds of work.
BX - "base register" - This can do most of the same things that AX can do, and it can also be used for indirect addressing; not important now, but very important later.
CX - "counting register" - Unlike the base and accumulator registers, the counting register does exactly what its name says. Although it can also be used for math and data movement, it is primarily used in loops. When the loop instruction is used, CX is automatically decremented, and when it reaches zero the loop ends.
DX - "data register" - used for storing the memory addresses of data that is going to be used in input and output.
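
To make the counting register's job concrete, here is a small Python sketch of how the loop instruction uses CX. The function run_loop and the idea of passing the loop body in as a Python function are assumptions made for the illustration; only the decrement-then-test behavior comes from the instruction itself.

```python
def run_loop(count, body):
    """Model of a block of code that ends with the `loop` instruction."""
    cx = count & 0xFFFF          # CX is a 16-bit register
    while True:
        body(cx)                 # the instructions inside the loop
        cx = (cx - 1) & 0xFFFF   # loop decrements CX first...
        if cx == 0:              # ...and stops jumping back once CX is zero
            break
    return cx


seen = []
final_cx = run_loop(3, seen.append)
print(seen)      # [3, 2, 1]
print(final_cx)  # 0
```

Notice that the body always runs at least once, because the test happens at the bottom. That is also why you must not overwrite CX inside the loop unless you save and restore it.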

These four are the main registers you will be using at first. Understand that many of the others will be used, but you will not have to work with them directly. I include the rest merely for completeness at this point.

Segment registers - not to be changed in the code of your program or you will seriously break things!
CS - Code segment - contains the memory address of where your program code starts. If you change this to some random location in memory the computer will start executing whatever it finds there (thinking that it is your program) and you can seriously mess things up.
DS - Data segment - contains the memory address of where your variables can be found. If you change this it won't be able to find your variables and will give you random data when you try to access them.
SS - Stack segment - contains the memory address of the system stack. The stack is yet another place to hold data temporarily. Changing this renders all your valuable data on the stack unreachable.
ES - Extra segment - points to an additional segment of memory that you can use for data if your data segment gets full. You will never need this in the scope of this tutorial.

Index and pointer registers - Contain memory addresses instead of actual data
SI - Source index - Can be used in indirect addressing, like BX.
DI - Destination index - Can be used in indirect addressing, like BX. Most often used in tandem with SI.
BP - Base pointer - used to locate variables on the system stack, but not ones that are on the top of the stack.
SP - Stack pointer - contains the memory address of the value on the top of the system stack.

Special registers - Don't touch these either
IP - Instruction pointer - holds the memory address of the next instruction to be executed in your program. It takes care of itself and by changing it you are likely to mess things up.
Flags - the flags register has 16 bits (they all do actually) in which each bit, by virtue of being on or off, relates information about the current state of the CPU. We do not need to memorize which bits mean what, but be aware that different assembly language instructions may access or modify the flags register.
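
As a taste of what "modifying the flags" means, here is a Python sketch of a one-byte add that also reports two flags. The carry flag and the zero flag are real flags that the add instruction updates, but they are not covered in this lesson, and the function add8 and the dictionary it returns are invented for the illustration.

```python
def add8(a, b):
    """Add two bytes the way a one-byte `add` would, and report two flags."""
    total = a + b
    result = total & 0xFF       # only eight bits fit in the destination
    flags = {
        "carry": total > 0xFF,  # the true answer did not fit in one byte
        "zero": result == 0,    # the stored result is exactly zero
    }
    return result, flags


print(add8(1, 2))      # (3, {'carry': False, 'zero': False})
print(add8(200, 100))  # (44, {'carry': True, 'zero': False})
print(add8(255, 1))    # (0, {'carry': True, 'zero': True})
```

Later instructions can look at flags like these to decide what to do next, which is why you should not change the flags register by hand.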

Register Capacity

All of the above registers are 16 bits, which is equal to 2 bytes and can hold a maximum of 1111111111111111b, FFFFh, or 65535d. When you are moving data from one register to another, or from one memory location to a register, the sizes must match. That means that the variables declared using db (define byte) above cannot be moved into a register that is two bytes long. How to get around this? All the general purpose registers can be broken in half, into a one-byte high and low end for each one. As a result we can use AL, AH, BL, BH, CL, CH, and so on. These are not separate from AX, BX, CX. They just allow you to access half a register at a time. Look at the following example of operations on the AX register.

      AH        AL
AX    01101000  11110010

Now we move something new into the high end of the register, into AH.

mov ah, 11110000b

AX now looks like this:

      AH        AL
AX    11110000  11110010

An addition operation on the AH register affects just the AH register.

add ah, 1

AX now looks like this:

      AH        AL
AX    11110001  11110010

But if we add something to AX, the register is treated as one 16-bit number rather than two separate halves. The addition starts at the lowest end of AX, which is AL, and any carry out of AL would continue into AH.

add ax, 1

AX now looks like this:

      AH        AL
AX    11110001  11110011
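
The whole walk-through can be replayed with a small Python model. The class Register16 and its attribute names x, h, and l are made up for this sketch (they stand in for AX, AH, and AL); the bit patterns are the ones from the example above. The last two lines add a case the example does not show: a carry out of the low byte spilling into the high byte.

```python
class Register16:
    """Hypothetical model of a 16-bit register such as AX and its two halves."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF          # the full register (like AX)

    @property
    def h(self):                         # the high byte (like AH)
        return self.x >> 8

    @h.setter
    def h(self, value):
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def l(self):                         # the low byte (like AL)
        return self.x & 0xFF

    @l.setter
    def l(self, value):
        self.x = (self.x & 0xFF00) | (value & 0xFF)

    def __str__(self):
        return f"{self.h:08b} {self.l:08b}"


ax = Register16(0b0110100011110010)
print(ax)                    # 01101000 11110010
ax.h = 0b11110000            # mov ah, 11110000b
print(ax)                    # 11110000 11110010
ax.h = ax.h + 1              # add ah, 1
print(ax)                    # 11110001 11110010
ax.x = (ax.x + 1) & 0xFFFF   # add ax, 1
print(ax)                    # 11110001 11110011

bx = Register16(0x00FF)
bx.x = (bx.x + 1) & 0xFFFF   # adding to the full register carries into the high byte
print(bx)                    # 00000001 00000000
```

Because h and l are just views onto x, there is never a second copy of the data, which is the same reason AL and AH are not separate from AX.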

This doesn't cover all aspects of assembly language memory, but doing that would overwhelm you, the beginner, with details that you won't need for a while. I'll gradually introduce new concepts as needed throughout later lessons.
