Assembly Lesson 4
We finally have enough background knowledge to take a look at the typical "Hello World" program for assembly. It is a good deal longer for us than such a simple program would be in any higher level language, but it is much shorter for the computer. I'm going to give the source code, then follow that with a line by line analysis of the program, ending with a more abstract look at the general structure of any assembly program.
title Hello World
.model small
.stack 100h
.data
hello db "Hello World!", 0Dh, 0Ah, "$"
.code
    mov ax, @data
    mov ds, ax
    mov ah, 9
    mov dx, offset hello
    int 21h
    mov ah, 4Ch
    int 21h
end
title Hello World -
this is more for your benefit than for the computer's. title is a directive that
generates no code; the text following it is used only as a heading in the
assembler's listing file. You can use this to give your program a title that can
be seen from the source code.
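.model small - Assembly has several different memory "models" that you can
use. These determine how many code and data segments your program may have and
whether addresses within them are near or far. The small model gives you one
code segment and one data segment, each up to 64K. Almost all the programs in
this tutorial will use the small model.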
.stack 100h - This tells the assembler how much memory to set aside for
the system stack. Even though we don't explicitly use the stack in this program,
some of the functions we use will. In this case, we are setting aside 100
hexadecimal (256 in base ten) bytes for the stack.
.data - this is the segment of your program where all your variables must
be declared.
hello db "Hello World!", 0Dh, 0Ah, "$" - this line puts the string of
bytes "Hello World!" in memory and gives the name hello to the memory address
of the first character in the string. The "0Dh" is the ASCII code for carriage
return (moving the cursor back to the front of the line), "0Ah" is line feed
(moving the cursor down a row), and the "$" is the string terminator that the
DOS print string function expects. It tells that function that the end of the
string has been reached. Without it, the function keeps going through memory
until it happens to find a $. That could be far outside your data segment and
will give you lots of garbage (perhaps even unprintable) output. In other
words, don't forget your string terminator, even though sometimes you may opt
to leave off the carriage return and line feed (note, however, that when you do
use them, you should keep them together, and in that order).
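To make the role of the terminator and the carriage return/line feed pair concrete, here is a minimal sketch modeled on this lesson's program. The names part1 and part2 are made up for illustration. The first string leaves off the 0Dh, 0Ah pair, so the second string should continue on the same line.

```asm
title Two Strings
.model small
.stack 100h
.data
part1   db "Hello ", "$"               ; no CR/LF: the cursor stays on this line
part2   db "World!", 0Dh, 0Ah, "$"     ; CR, then LF, then the terminator
.code
        mov ax, @data
        mov ds, ax
        mov ah, 9                      ; DOS print string function
        mov dx, offset part1
        int 21h
        mov ah, 9                      ; set ah again before the second call
        mov dx, offset part2
        int 21h
        mov ah, 4Ch                    ; end the program
        int 21h
end
```

Each string needs its own "$", because the print string function stops at the first one it meets.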
.code - this is the segment where all the instructions for your program
will be located.
mov ax, @data - @data contains the memory address for the beginning of
the data segment. This line and the next work together to store that address in
the ds register, which is what your program uses to find the variables you've
created.
mov ds, ax - @data cannot be moved directly into the ds register because
@data is a constant (an immediate value), and the processor has no instruction
that loads a constant straight into a segment register. To get it there, you
must use a general purpose register as a middleman. This line and the line
above it must be entered verbatim into all your programs as the first things in
your code.
mov ah, 9 - DOS has a collection of prewritten subroutines that you can
access with the int instruction. You place a code number that corresponds to a
predefined DOS function in the ah register, and when you execute the int 21h
instruction, DOS runs whatever function is selected in ah. In this case, 9 is
the DOS function for printing a string.
mov dx, offset hello - As was mentioned in the section about memory, dx
is the favorite location for holding variables used in input/output. Here the
print string function expects dx to hold the offset of the string within the
data segment (which it finds through ds), so we dutifully move the offset of
hello into dx for it. (The next section on offsets and indirect addressing
explains this in more detail, but for now this gives you a good idea of how the
program works.)
int 21h - the int instruction does not stand for integer; it is a mnemonic
for "interrupt" because it interrupts the execution of your program and goes off
to do something else for a while. There are hardware interrupts, like stopping
your program because a key has been pressed, and software interrupts where it
goes off to run DOS functions, like this one. 21h is just one of many interrupt
numbers the int instruction can take, but it is the only one we will be using
for a long time. The DOS routine behind interrupt 21h looks to see what function
you have specified in the ah register and runs that one. Here it runs the print
string function (9) and you see "Hello World!" on the screen.
** all of the numbers for specifying both interrupts and the functions in
the ah register are in hexadecimal, but sometimes the h is left off numbers
under 10, because below 10, decimal and hexadecimal numbers are the same.
However, it is very important not to forget the h on higher numbers. int 21
and int 21h are NOT the same: without the h, 21 is read as decimal, which is
15h. **
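As a rough illustration of the note above, the sketch below writes the same calls in decimal instead of hexadecimal. It assumes the assembler's default number base is decimal, which is the usual setting; the name msg is made up for the example. 21h is 33 in decimal and 4Ch is 76, so this should behave like a program written with int 21h and mov ah, 4Ch.

```asm
title Hex Or Decimal
.model small
.stack 100h
.data
msg     db "Same calls, different notation", 0Dh, 0Ah, "$"
.code
        mov ax, @data
        mov ds, ax
        mov ah, 9               ; 9 is the same in decimal and hexadecimal
        mov dx, offset msg
        int 33                  ; 33 decimal = 21h, the DOS interrupt
        mov ah, 76              ; 76 decimal = 4Ch, the exit function
        int 33
end
```

Writing int 21 here instead would call interrupt 15h, which has nothing to do with DOS functions. Sticking to hexadecimal with the h suffix keeps your source matching the function numbers as they are usually listed.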
mov ah, 4Ch - 4Ch is the DOS function for ending a program and returning
control to the OS. It plays the role of the return 0; at the end of your C++
programs.
int 21h - executes the function to end your program. If you forget these
two lines, your program will still assemble, but the processor will run straight
past the end of your code and try to execute whatever bytes happen to follow it
in memory. The result is unpredictable: garbage output, a hang, or a crash.
end - somewhat self-explanatory. The syntax for this may change if you
start delineating your program with subroutines.
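Other DOS functions are called with the same pattern: put the function number in ah, put its arguments in the registers it expects, and execute int 21h. As a small sketch, the program below uses DOS function 2, which prints the single character held in dl, and then exits through function 4Ch with an explicit exit code in al. The variable name letter is made up for illustration.

```asm
title One Character
.model small
.stack 100h
.data
letter  db "A"                  ; a single byte; function 2 needs no "$"
.code
        mov ax, @data
        mov ds, ax              ; ds must be set before letter is read
        mov ah, 2               ; DOS function 2: print the character in dl
        mov dl, letter          ; load the byte stored at letter into dl
        int 21h
        mov ax, 4C00h           ; ah = 4Ch (exit), al = 00h (exit code 0)
        int 21h
end
```

Loading ax with 4C00h sets ah and al in one instruction. DOS hands the value in al back to whatever started the program, for example a batch file checking ERRORLEVEL.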
Presented all at once, the entire source code for even
so simple a program can be overwhelming, but hopefully breaking it down made it
easier to understand piecewise. Now we're going to step back and look at the
structure of the overall program.
The processor relies on information contained in certain registers to tell it
where it can find the instructions to execute for your program (cs, code
segment), the variables you've assigned (ds, data segment), and where the system
stack is (ss, stack segment). The code and the stack are set up for you when
DOS loads the program, and you should never change the information contained in
the cs and ss registers. The data segment needs to be set up explicitly, which
you can do (and have to do) with the first two lines of code written above. All
data must be contained in the data segment (everything between .data and the
next defined segment), or the program will not be able to find it, and all your
code must be in the code segment (between .code and end). Other directives such
as .stack and .model tell the assembler other useful information about your
program. The code itself has the structure we outlined in the last section: a
list of instructions, one instruction per line.
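The overall layout described above can be sketched as a bare skeleton. This is only a template built from the pieces of this lesson's program; it declares no variables and does nothing except set up ds and return to DOS.

```asm
title Skeleton
.model small                    ; memory model
.stack 100h                     ; 256 bytes for the stack
.data
                                ; your variables go here
.code
        mov ax, @data           ; these two lines point ds
        mov ds, ax              ; at the data segment
                                ; your instructions go here
        mov ah, 4Ch             ; return control to DOS
        int 21h
end
```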
Assembly is not object oriented, nor does it lend itself to being easily
separated into functions. It is what is referred to as "falling rock" code,
where instructions are sequentially executed from start to finish without much
jumping about.