During a backup, the Informix engine reads data and sends it to ON-Bar, and
ON-Bar in turn sends it to the storage manager. This cycle repeats until the
engine signals that there is no more data left to send. With this in mind, a
bottleneck can exist in only a few places.
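As a rough mental model (a simplification, not a description of how Informix schedules work internally), the engine, ON-Bar and the storage manager form a pipeline, so sustained backup throughput can be no higher than that of the slowest stage. The sketch below assumes each stage has a steady rate in MB/s; the function name, stage names and rates are hypothetical.

```python
def pipeline_bottleneck(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate (MB/s).

    Simplified model: stages run concurrently and hand buffers to each other,
    so the whole backup moves no faster than the slowest stage.
    """
    if not stage_rates:
        raise ValueError("need at least one stage")
    slowest = min(stage_rates, key=stage_rates.get)
    return slowest, stage_rates[slowest]


if __name__ == "__main__":
    # Hypothetical rates for illustration only, not measurements.
    rates = {
        "disk reads": 40.0,
        "Informix engine": 120.0,
        "ON-Bar": 300.0,
        "storage manager": 15.0,
    }
    stage, rate = pipeline_bottleneck(rates)
    print(f"bottleneck: {stage} at {rate} MB/s")
```

Tuning any stage other than the slowest one does not change the result of this model, which is why the steps later in this page focus on finding that stage first.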
Disk Access: The Informix engine sends I/O requests to the OS to bring data from
disk into shared memory. This is relatively expensive. Once data is in shared
memory, it can move to ON-Bar quickly, because memory copies are much faster
than disk reads. Bottlenecks can occur here for a few simple reasons:
Plain slow hardware.
The dbspaces being backed up at the same time all reside on the same disk. This
is a very common problem. Not only does it leave a number of drives sitting
idle, but the few drives that are busy also run less efficiently than they
could. If many dbspaces on the same disk are backed up at the same time, the
disk head spends a lot of time seeking back and forth across the disk. This is
not what you want: you want the head to move as little as possible. You can
achieve this by having ON-Bar back up only one dbspace per disk at a time. That
way the head starts at a specific offset and moves smoothly along the disk.
(NOTE: This suggestion is for large systems. Smaller systems may not see enough
gain to warrant these considerations.)
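One way to plan "one dbspace per disk at a time" is to group the dbspaces by disk and back them up in waves, where each wave contains at most one dbspace from any given disk. This is a minimal sketch assuming each dbspace lives on a single disk (in practice a dbspace's chunks can span several disks, so the mapping would have to come from your own chunk layout); `schedule_by_disk` and the dbspace and disk names are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def schedule_by_disk(dbspace_disk: dict[str, str]) -> list[list[str]]:
    """Split dbspaces into waves so no wave has two dbspaces on the same disk.

    Assumes each dbspace maps to exactly one disk.
    """
    per_disk: dict[str, list[str]] = defaultdict(list)
    for dbspace, disk in sorted(dbspace_disk.items()):
        per_disk[disk].append(dbspace)
    # Wave i takes the i-th dbspace from every disk that still has one left.
    waves = []
    for row in zip_longest(*per_disk.values()):
        waves.append([name for name in row if name is not None])
    return waves


if __name__ == "__main__":
    # Hypothetical layout: two dbspaces share disk1.
    layout = {
        "rootdbs": "disk1",
        "data1": "disk1",
        "data2": "disk2",
        "logdbs": "disk3",
    }
    for i, wave in enumerate(schedule_by_disk(layout), start=1):
        print(f"wave {i}: {' '.join(wave)}")
```

Each wave could then be handed to ON-Bar as a separate backup run, so that each disk head sweeps through only one dbspace at a time.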
Informix Engine: The Informix engine 'collects' the data using two methods:
Reading from disk sequentially. For each dbspace being backed up, Informix
goes through each chunk associated with that dbspace. For each chunk, Informix
starts at its beginning offset and reads up to the end of that chunk, using the
physical I/O described above.
Reading from shared memory. Checkpoints may occur during an archive. When this
happens, Informix gathers the pages that should be put on the archive and sends
them to ON-Bar. This allows work to continue on the system while an archive is
running. Note, however, that performing an archive while the system is under
heavy load is not recommended; schedule archives during off-peak hours. Other
users accessing the system while an archive is running will increase the
archive time.
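The sequential disk pass can be pictured as a loop over each chunk, from its starting offset to its end, one fixed-size block at a time. This Python sketch only illustrates that access pattern on an ordinary binary file object; it is not Informix code, and `read_chunk_sequentially` is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator


def read_chunk_sequentially(
    f: BinaryIO, offset: int, size: int, block_size: int
) -> Iterator[bytes]:
    """Yield a chunk's bytes in order, block by block, from offset to offset + size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    f.seek(offset)  # start at the chunk's beginning offset
    remaining = size
    while remaining > 0:
        block = f.read(min(block_size, remaining))
        if not block:  # the file ended before the chunk did
            break
        remaining -= len(block)
        yield block


if __name__ == "__main__":
    device = io.BytesIO(bytes(range(256)) * 4)  # stand-in for a raw device
    blocks = list(read_chunk_sequentially(device, offset=100, size=500, block_size=128))
    print([len(b) for b in blocks])
```

Because every read continues where the previous one ended, the disk head moves steadily forward, which is exactly the pattern that several concurrent dbspaces on one disk would break up.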
ON-Bar: All ON-Bar really does is receive buffers from the Informix engine and
send those buffers to the storage manager. I have yet to see this process be a
bottleneck.
Storage Manager: Generic tape writes are the slowest of all writes. Many
storage manager vendors have found ways to raise throughput to very high
levels. The 'general' limiting factor with storage managers is the number of
available drives. Normally you want to configure ON-Bar so that there are no
more onbar processes than there are drives in the storage manager
(BAR_MAX_BACKUP #_of_tape_drives).
However, all storage managers are different. Some have a feature called
interleaving or multiplexing that can increase throughput. For more
information on how to configure ON-Bar to use that feature, see
Storage Manager Interleaving Considerations. Also, for more detailed and
reliable information, consult your storage manager's vendor. This site will
not discuss how to optimize the storage manager, as all storage managers are
different.
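To see why extra onbar processes do not help without interleaving, here is some back-of-the-envelope arithmetic: if each drive takes one dbspace at a time, only min(BAR_MAX_BACKUP, drives) streams make progress at once. This assumes every dbspace takes about the same time and ignores multiplexing; the function name and the numbers in the example are hypothetical.

```python
import math


def estimated_backup_seconds(
    n_dbspaces: int, secs_per_dbspace: float, bar_max_backup: int, tape_drives: int
) -> float:
    """Rough wall-clock estimate with one stream per drive and no interleaving."""
    if bar_max_backup < 1 or tape_drives < 1:
        raise ValueError("need at least one process and one drive")
    # In this model, processes beyond the number of drives just wait for a drive.
    streams = min(bar_max_backup, tape_drives)
    waves = math.ceil(n_dbspaces / streams)
    return waves * secs_per_dbspace


if __name__ == "__main__":
    # Hypothetical system: 12 dbspaces, 66 s each, 4 tape drives.
    for max_backup in (2, 4, 8):
        t = estimated_backup_seconds(12, 66.0, max_backup, tape_drives=4)
        print(f"BAR_MAX_BACKUP={max_backup}: ~{t:.0f} s")
```

Under these assumptions, raising BAR_MAX_BACKUP from 4 to 8 on a 4-drive system gives no gain, while dropping it to 2 doubles the elapsed time.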
Steps to find where the slow-down is (backup):
1. Reduce the backup to a manageable size that still exposes the throughput
   problem. Ideally, you would initially test backing up BAR_MAX_BACKUP
   dbspaces.
2. Time an initial archive to act as a baseline for future timings.
3. Time the archive using the BSALIBSUB library. You should notice that
   nothing is happening on the storage manager side, because this library does
   not send anything there; it simply throws the data away. This is important
   because it measures the maximum speed ON-Bar can run without any
   interference from the storage manager. If the storage manager is active
   during this test, the library was not installed correctly.
   If there is a considerable difference between the times in #2 and #3, the
   bottleneck is with the storage manager. If they are close, proceed to #4.
4. Using the READER program, measure the speed at which the OS can read the
   disk(s) associated with the dbspace being tested. Remember to use the same
   block size that Informix uses (8 * BUFFSIZE). If the throughput is within
   about 10% of the throughput you got in #2, the system is I/O bound. If the
   difference between #2 and #4 is greater than about 10%, the problem lies
   somewhere with ON-Bar or the Informix engine.
5. Using the XBSAWriter utility, measure how fast the storage manager can
   receive data through the XBSA interface.
Run each of the four tests (#2 through #5) a few times so you can compute an
average time. Your results should look like the following:
From the results above (taken from a sample test), you can see that the
bottleneck lies in the storage manager. Informix/ON-Bar can extract the data
at a rate of approximately 29 seconds per dbspace, while the storage manager
can only receive it at a rate of approximately 66 seconds per dbspace.
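The comparisons in steps 3 and 4 can be condensed into a small helper. This is a sketch under stated assumptions: every argument is an average of several runs expressed as seconds per dbspace (larger means slower), so throughput is compared as time for the same amount of data; the page only says "considerable" for step 3, so reusing the ~10% threshold from step 4 there is an assumption; all function names are hypothetical.

```python
from statistics import mean


def secs_per_dbspace(run_seconds: list[float], n_dbspaces: int) -> float:
    """Average several timed runs and normalise to seconds per dbspace."""
    return mean(run_seconds) / n_dbspaces


def close(a: float, b: float, tolerance: float = 0.10) -> bool:
    """True if a and b differ by no more than `tolerance` of the larger value."""
    return abs(a - b) <= tolerance * max(a, b)


def diagnose(baseline: float, bsalibsub: float, disk_read: float) -> str:
    """Apply the step 3 and step 4 comparisons (all in seconds per dbspace)."""
    if not close(baseline, bsalibsub):
        # Step 3: taking the storage manager out of the path changed a lot.
        return "storage manager"
    if close(disk_read, baseline):
        # Step 4: the backup already runs at the raw disk read speed.
        return "disk I/O"
    return "ON-Bar or Informix engine"


if __name__ == "__main__":
    # Made-up timings for a 4-dbspace test, for illustration only.
    baseline = secs_per_dbspace([240.0, 250.0, 245.0], 4)
    bsalibsub = secs_per_dbspace([100.0, 104.0, 96.0], 4)
    disk_read = secs_per_dbspace([60.0, 62.0, 58.0], 4)
    print(diagnose(baseline, bsalibsub, disk_read))
```

When the helper points at the storage manager, the XBSAWriter timing from step 5 is the number to compare against the BSALIBSUB timing, as in the sample results.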
Once again, knowing what to tune is the most important part of effectively
increasing ON-Bar's performance.
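If the READER program is not at hand, the idea behind step 4 can be approximated with a short script that reads a file or raw device sequentially in blocks of 8 * BUFFSIZE and reports MB/s (10^6 bytes per second). This is a rough stand-in, not the READER program itself: the function name is hypothetical, the BUFFSIZE value below is an assumption you must replace with your own, and reads through a cooked file system may be served from the OS cache, which inflates the result.

```python
import os
import sys
import tempfile
import time
from typing import Optional


def measure_read_throughput(path: str, block_size: int, limit_bytes: Optional[int] = None):
    """Read `path` sequentially in `block_size` pieces; return (bytes, seconds, MB/s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() per block
        while limit_bytes is None or total < limit_bytes:
            want = block_size if limit_bytes is None else min(block_size, limit_bytes - total)
            block = f.read(want)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1_000_000 if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    buffsize = 2048  # assumption: replace with your instance's BUFFSIZE
    block_size = 8 * buffsize
    scratch = None
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        # No path given: use a throwaway 8 MB scratch file (likely cached by the OS).
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(os.urandom(8 * 1024 * 1024))
        path = scratch = tmp.name
    try:
        n, secs, mb_s = measure_read_throughput(path, block_size, limit_bytes=512 * 1024 * 1024)
        print(f"read {n} bytes in {secs:.2f} s ({mb_s:.1f} MB/s)")
    finally:
        if scratch:
            os.remove(scratch)
```

Pointing it at the device behind the dbspace under test, with a `limit_bytes` close to the size of that dbspace, gives a figure that can be compared with the baseline from step 2.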