Simple facts
HP-UX 11
Oracle 8.?
2 CPUs
1G RAM
6 mirrored pairs disk
fact file = 120Meg = 400,000
rows
We load this data into a stage
table first, and then use PL/SQL packages to
validate business rules and
load the fact table.
We have only been able to achieve
10 records per second loading the fact
table. The size of the
file mentioned above is not relevant. A file of 150K
rows loads just as slow.
We do not have much data in the fact table yet
either. We just started
loading.
Our PL/SQL is fairly straightforward,
but does about 9 lookups into other
tables for validation. We
are thinking about making the smaller table
lookups into array lookups.
Will this help? Most of the lookups are on very
small tables (< 20 rows).
Two of the lookups are on larger tables (100K -
200K rows). Are we doing too
many lookups per record?
Also, the cursor on the stage
table is implicit. Should this be explicit for
perf reasons?
We are also issuing a commit
after each row. Should we change that to every
100, or every 1000 rows?
All of the things I can think
about are code related since I am not a DBA. We
can tweak the code, but I
am looking for an order of magnitude of improvement
that may not be possible tweaking
code.
Does anyone have any other
recommendations that may help, or any benchamrks to
compare with? I think
that 10 records a second is poor, even if we perform 9
lookups.
Other DB things we have tried
We moved data files around
disk to spread i/o issue.
We have not yet used table
partitioning.
We are dropping indexes on
fact table.
A DBA has performed general
DB tuning.
can you use declaritive integrity
instead of lookups? declaritive check
constraints + foreign keys
will be much faster then procedural code any day.
can you 'parallelize' the plsql
processing. can you kick off X scripts, each of
which might be able to process
10 rows/second?
have you set timed statistics
to true and enabled sql_trace and used tkprof to
analyze the results (see server
tuning manual for info on the above)....
>Also, the cursor on the stage
table is implicit. Should this be explicit for
>perf reasons?
>
>We are also issuing a commit
after each row. Should we change that to every
>100, or every 1000 rows?
>
make it more then 1. go for 1000 or more.
>All of the things I can think
about are code related since I am not a DBA. We
>can tweak the code, but I
am looking for an order of magnitude of improvement
>that may not be possible
tweaking code.
>
you might just be able to run
10 at the same time and get the speed up. You'll
have to "partition" your data
somehow in the stage table by key or something to
give each process something
to work with.
If you check the execution
plans for the 9 lookups you
will probably find that a
number of them are doing tablescans,
giving you perhaps several
hundred comparisons per second.
Yes, one commit per row will
also have an impact.
No, the difference between
implicit and explicit will be insignificant
Further, if you are using a
single PL/SQL cursor on the
stage table you are either
single-threading, in which case
you are wasting a whole CPU,
or you are multi-streaming
and without careful planning
you could be getting signficant
conflicts on the inserts -
particularly if your fact table is
pre-indexed.
Suggested Strategies (in probable
order of cost/benefit)
1) Make
sure you have good access paths on the lookup
2) Commit
every 100 rows
3) Try to
partition the input data and run multiple streams
but increase initrans on the fact table/indexes
and set freelists = number of threads
4) Look
at options for:
create intermediate table unrecoverably in parallel as
select join between stage table and lookup tables.
and test for lookup failures by the presence of null columns
Note: A good strategy
- if you are using a PL/SQL loop - is to
update the stage table so
that you can restart the loop in case
of failure.
--
Jonathan Lewis
Yet another Oracle-related
web site: www.jlcomp.demon.co.uk