Why DBWR is slow?

Why DBWR is slow?(x10)

I have a huge performance problem with the DBWR-proces(ses) on an AIX
4.2.1. While the redo-logs (4x50Mb) get filled in a couple of minutes
(during which there is high IO on the disk containing them), the
checkpointing (which starts when a redo-log is half-full) takes more than
an hour, resulting, of course, in locking up when all of the redo-logs are
full.

The data-files of the tablespace being loaded via import, are on one
SSA-disk. This disk also contains the log of the Journaled File System.
The DBWR(s) show(s) activity of around 70-80. But iostat (5 sec sample)
shows:
Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk3_b         4.0      16.8       4.2          0        84
hdisk3_d         0.0       0.0       0.0          0         0
hdisk3_e         0.0       0.0       0.0          0         0
hdisk3_g         0.0       0.0       0.0          0         0
hdisk4_b         9.4      31.9       8.0          0       160
hdisk4_d         0.0       0.0       0.0          0         0
hdisk4_e         0.0       0.0       0.0          0         0
hdisk4_g         0.0       0.0       0.0          0         0

hdisk3_b and hdisk4_b each contain a copy of the JFS-log, so less than 80
Kb were written to the data-files in 5 seconds.

I started with 4 DBWR-processes with use_async_io = false, then switched to
async_io which resulted in the DBWR doing all the work (sic) and the 4
child-processes idling. Same lack of performance.

A very funny symptom: I just shut down abort the database, but the dbwr
kept running for a while, never saw that before. See for yourself:
SVRMGR> shutdown
^CORA-01013: user requested cancel of current operation
Database dismounted.
^CORA-01013: user requested cancel of current operation
SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> host ps -fu orapre | grep MIS
orapre 22952 25970   5 09:39:38 pts/0 0:00 grep MIS
orapre 26228 25970   0 09:25:29      - 0:00 oracleMIS
(DESCRIPTION=(LOCAL=YES
)(ADDRESS=(PROTOCOL=beq)))
orapre 34194     1 69 23:17:19      - 287:26 ora_dbwr_MIS
SVRMGR> exit
Server Manager complete.

Ans:

Slaves consume small amount of CPU because they perform only I/O.
All cpu-intensive work (like buffer lists scans) is done by master (dbwr).
I suspect some problems related to latches preventing dbwr from
effective maintaining dirty buffer and free buffer list.
BTW, is it multiprocessor machine ?
Sure, some problems may be related to particular RDBMS release.
Let's know about your RDBMS version (not dying processes, but you should
try using shutdown immediate instead of shutdown normal; ^C; shutdown
abort) and buffer cache size.
Some problems may be easily found via utlbstat/utlestat report.

--
Piotr Kolodziej [email protected]
Just my private opinion.

Reply:

My excuses for the format of this reply, I don't succeed in setting a fixed
font, you should read it with Courier.

I can't find the reason for the bad performance in the utlbstat/utlestat
myself.

It's an 8 processor machine RS/6000 J50

version:

Oracle7 Server Release 7.3.4.3.0 - Production
PL/SQL Release 2.3.4.3.0 - Production
CORE Version 3.5.4.0.0 - Production
TNS for IBM/AIX RISC System/6000: Version 2.3.4.0.0 - Production
NLSRTL Version 3.2.4.0.0 - Production

Latches:

LATCH_NAME         GETS        MISSES      HIT_RATIO   SLEEPS
SLEEPS/MISS
------------------ ----------- ----------- ----------- -----------
-----------
cache buffers chai    11019219           0           1           0
0
cache buffers lru       113078           7           1           0
0
dml lock allocatio        4147           0           1           0
0
enqueue hash chain       35256           0           1           0
0
enqueues                 64709           0           1           0
0
ktm global data             12           0           1           0
0
latch wait list             54           0           1           0
0
library cache            64590          44        .999          18
409
library cache load         260           0           1           0
0
list of block allo        7101           0           1           0
0
loader state objec          83           0           1           0
0
messages                157629         108        .999           2
019
modify parameter v         311           0           1           0
0
process allocation           1           0           1           0
0
redo allocation         298484         120           1           1
008
redo copy                 1172        1157        .013        1530
1.322
row cache objects        42639           0           1           0
0
session allocation        2736           1           1           2
2
session idle bit          3992           0           1           0
0
session switching          522           0           1           0
0
shared pool              12042          15        .999           9
6
sort extent pool           414           0           1           0
0
system commit numb      229645           0           1           0
0
transaction alloca        3897           0           1           0
0
undo global data         70373           0           1           0
0

LATCH_NAME         NOWAIT_GETS      NOWAIT_MISSES    NOWAIT_HIT_RATIO
------------------ ---------------- ---------------- ----------------
cache buffers chai           269124                0                1
cache buffers lru            386570                8                9
process allocation                1                0                1
redo copy                    140787             2534             2535

file statistics:

TABLE_SPACE                    FILE_NAME
READS      BLKS_READ READ_TIME WRITES     BLKS_WRT   WRITE_TIME MEGABYTES
------------------------------
------------------------------------------------
---------- ---------- ---------- ---------- ---------- ----------
----------
ORADATA                        /misb/oradata02MIS.dbf
         0          0          0      11368      11368 122441112
1074
ORADATA                        /misb/oradata03MIS.dbf
        15         15         21       4003       4003   38897899
944

--
Kind Regards,

Gerard

Reply to the Reply:

What's the value of log_simultaneous_copies ?
Specify it 16 as 2*cpu_count (default=cpu_count) in init<sid>.ora.
But it is not directly related to dbwr.

>TABLE_SPACE                    FILE_NAME
>READS      BLKS_READ READ_TIME WRITES     BLKS_WRT   WRITE_TIME MEGABYTES
>------------------------------
>------------------------------------------------
>---------- ---------- ---------- ---------- ---------- ----------
>----------
>ORADATA                        /misb/oradata02MIS.dbf
>         0          0          0      11368      11368 122441112
>1074
>ORADATA                        /misb/oradata03MIS.dbf
>        15         15         21       4003       4003   38897899
>944

Oupsss.... However Oracle docs say that write time stats may be
misleading, but I don't think that in such large scale....
I'm affraid dbwr may wait for sth (lgwr? have you checkpoint process
started ?).
Do they contain temp segments ? It looks a bit like that ...
If so configure using sort_direct_writes.
What about waiting event sections (background and non-background
processes) in report? It is very interesting.
What's the value of db_block_lru_latches ? If it is default value
(=cpu_count/2) set it to 8 (=cpu_count) when possible.

--
Piotr Kolodziej [email protected]
Just my private opinion.

Hosted by www.Geocities.ws