Oracle Performance Triage:

Stop the Bleeding!

Craig A. Shallahamer

OraPub, Inc.

Portland, Oregon USA

Draft Revision 2c December 18, 2000

Abstract

Triage: A system designed to produce the greatest benefit from limited treatment facilities for battlefield casualties… If you manage Oracle performance, you live on a battlefield. While people's lives usually are not at risk, your peace of mind, along with that of your colleagues and your family, definitely is. This paper will help you set up and execute Oracle performance triage. We start by introducing a powerful Oracle triage method followed by an introduction to a number of fundamental performance concepts. Once this foundation has been established, we move into an actual triage case study. The case study is very real, follows the presented triage method, and uses common UNIX performance tools and free OraPub tools.

Table of Contents

Introduction

Overview: How to perform Oracle triage

Key Triage Concepts

The Holistic Problem Isolation Method (HPIM)

Methods of Contention Identification

The Decreasing Relevance of Increasing Data

Understanding "Deltas"

Two Basic Approaches to Gather Performance Data

Choosing the Appropriate Data Gathering Frequency

Total Performance Management (TPM)

Case Study: Performing Oracle Triage

Don't panic

Scope out the current situation

Document the current performance

Install your tools

Develop a simple communications strategy

Isolate the problem

Quickly perform your rock-solid analysis

Quickly implement your recommendations

Monitor the situation

Concluding thoughts

Figures

References

Acknowledgments

About the Author

Introduction

Welcome to the world of Oracle triage. It's not pretty, but it's not boring either. If you are taking the time to read this paper, I suspect you have had the opportunity to experience, to some degree, the joy of intense work pressure surrounded by panic combined with the reality of a significant portion of your company's business being in serious jeopardy.

I wrote this paper to help DBAs enthusiastically triumph while performing Oracle triage and to help better prepare students for taking my Advanced Performance Management for Oracle Based Systems class [1]. The type of training required to do Oracle triage is unique, requiring a broad range of abilities such as effective communication, Oracle internals, Oracle bottleneck detection, and operating system bottleneck detection...just to name a few. Oracle triage is not your typical DBA day (hopefully) and so the required training is not typical either.

This is how I plan on helping you effectively perform Oracle triage. I'm going to introduce you to an actual triage situation and then present the core Oracle triage steps. Then I'll present the key concepts you must understand to successfully triage. Then I'll perform an actual Oracle triage using only widely available UNIX operating system tools and free OraPub tools [2]. I'm hoping that with a solid step-by-step triage outline, the key concepts coverage, and an actual triage example, you will be prepared to take your initial plunge into Oracle triage.

Overview: How to perform Oracle triage

In your mind's eye live this for a few minutes... It's two weeks before Christmas and the company's massive mission critical production Oracle application just finished a significant upgrade that would be extremely painful to back out of. Within a few minutes after the application was placed back online, not only was the system as a whole obviously very slow, but specific OLTP and batch processes were taking an unacceptable period of time to complete. The pressure quickly escalated and you are just as quickly surrounded by your peers, your boss (who is now perspiring profusely), and your boss's boss. The CFO is so upset he is actually grabbing people by their ties and yelling at them! People begin to panic and act irrationally. You are their savior and they are watching your every move!

As if this situation couldn't get any worse, it is complicated by the fact it is completely reactionary. No one expected this to happen and everyone is getting ready for a nice peaceful Christmas celebration. You know you have to act quickly in your problem determination, analysis, recommendations, and implementation. But it all begins to happen so fast and with such intensity, you soon begin to question every keystroke you feebly attempt to make.

So what are you going to do? Don't panic, scope out the situation, document the current performance, install your tools, develop a simple communication strategy, isolate and correlate the computing system's bottlenecks, perform a rock-solid analysis, implement your recommendations, monitor the results, and repeat the cycle until the problem is eliminated. Let me explain each of these in a little more detail.

1. Don't panic. No matter how badly you mess this up, Oracle DBAs are in such high world-wide demand you could easily get another DBA job...and probably make more money. So you got that going for you.

2. Scope out the situation. Before you begin digging into the details it is important you technically detach yourself from the situation and think like a manager. The first question you ask yourself is, "Can I do this myself?" If you're not sure, you will need to ask for reinforcements. Do not play games here. If the problem is very serious, you have every right to ask for an operating system expert, an Oracle internals expert, an Oracle SQL tuning expert, an application specialist, and someone to help shield you from all the incoming bullets that will be shot directly at you. Hopefully your manager can be your shield. In a triage situation, if you try to be a hero and do it yourself, make sure to save some time to update your resume.

3. Document the current performance. It's as simple as this: You cannot prove you significantly improved performance if you don't write down what performance was before you got involved. To set yourself up for success, make sure to carefully and quantitatively document the key performance areas.

4. Install your tools. Triage requires a different type of tool kit than daily database administration. You need a simple tool kit that will allow you to investigate the fundamental computing systems. This needs to be done both quickly and easily from both a detailed and a wide breadth perspective. When I speak of systems, I am referring to the Oracle subsystem, the operating subsystem, and the application subsystem. When I triage, I use my OraPub System Monitor (OSM) tool kit [2]. Since you will need to collect data for later analysis as well as look at the systems interactively, get your tools installed and gathering data as soon as possible. You will need good data to build a strong case to support your recommendations.

5. Develop a simple communication strategy. DBAs are not trained in public relations, but you will need to demonstrate some skills in this area. Unfortunately, when the situation heats up, everyone from your boss, your colleagues, your operating system vendor, your application vendor, and your Oracle sales team will be pointing their grimy little fingers directly at YOU. So be prepared. Come up with a simple, timely, and graphical way to outline your approach, your progress, and your gleaming successes.

6. Isolate the problem. Once your tools have been installed, immediately begin to interactively isolate the problem. Once your historical tool kit has had some time to gather data, begin to add breadth and depth to your analysis. The fastest and most complete way to isolate the problem is to perform a Holistic Problem Isolation Method (HPIM) analysis [3]. While this is discussed in more detail below, using the HPIM is the proven way to isolate the problem with nearly a zero percent risk of making a mistake. Even on massive and very complex Oracle based systems, you should be able to isolate the problem in only a few hours.

7. Quickly perform your rock-solid analysis. I cannot emphasize this enough: your analysis must be bulletproof. People will be looking for a fall guy and you do not want it to be you. Make sure you use the HPIM and the session wait contention identification approach. Both of these are explained in more detail below. Your recommendations should be very concise, backed up with solid and obvious data, and make sense to nearly everyone. Remember, a weak but well understood analysis is usually better received than a solid but misunderstood analysis.

8. Quickly implement your recommendations. As soon as possible, implement your recommendations. You don't want to make mistakes, so carefully plan out exactly what you are going to do and review it. I like to have someone with me so I don't make any typergraphical errors. The more tired you are, the more likely you are to make a mistake, so it's important to have someone check your work.

9. Monitor the situation. After your changes have been implemented, monitor and document their effect. This not only provides a way to prove you have really improved performance, but it also sets you up for the next round of changes. It is very difficult to effectively optimize a system when changes are continually made without proper monitoring and documentation.

10. Repeat until the heat is off. It is perfectly normal to go through multiple rounds of changes and monitoring periods. Multiple carefully planned and executed cycles are far better than making a bunch of changes in a desperate attempt to improve performance. Mature DBAs show their wisdom by repeatedly making calculated changes that improve performance as opposed to a desperate attempt to do everything at once.

While this all seems relatively simple on paper, it is amazing what a carefully executed Oracle triage can do in just a few short days or even hours. The trick is to not be pulled into the chaos. You have got to stay focused on your work by carefully following your method. Once you get pulled down, it is easy to get caught in an endless and hopeless cycle. However, if you follow the guidelines listed above, understand the concepts discussed below, and closely follow the case study below, you should be in a good position to tackle Oracle triage.

Key Triage Concepts

There are a handful of key performance concepts that most Oracle DBAs don't quite understand. Yet if they did, their effectiveness would radically improve. While these concepts are more fully explained in my Total Performance Management paper, I feel they are so important to successful Oracle triage that I have included them below.

The Holistic Problem Isolation Method (HPIM)

The HPIM was first presented in my original 1994 Total Performance Management [3] paper. The concept is very simple, yet following this method has prevented me from making rash contention identification conclusions. This method is so fundamental to performance management, I offer a free Internet Video Seminar specifically on this topic [4].

Most performance specialists tend to focus on only one of the Oracle system, the operating system, or the application system. This results in an ill-defined problem definition that will translate into a lop-sided solution. In many cases, the solution, while appearing to solve the problem from one viewpoint, results in overall system performance degradation.

The HPIM identifies the bottleneck in each system and then looks for their overlap or where they correlate. Within this overlap lies the first bottleneck. Identifying the bottleneck from three different perspectives and having them support each other builds an extremely powerful base for you to perform your analysis. And the risk of identifying the wrong bottleneck is substantially reduced because it has been correlated from the other two systems.
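As a rough illustration of the "overlap" idea, here is a minimal Python sketch. The findings listed for each subsystem are invented for the example, and the function name `correlated_bottlenecks` is hypothetical; it is not part of the HPIM or of any OraPub tool. It simply treats each subsystem's suspected bottlenecks as a set and keeps what all three agree on.

```python
# Hypothetical findings: the resources each subsystem's analysis flags as contended.
findings = {
    "operating system": {"disk I/O"},
    "Oracle": {"disk I/O", "buffer cache"},
    "application": {"disk I/O", "CPU"},
}

def correlated_bottlenecks(findings_by_system):
    """Return the resources flagged by every subsystem (their overlap)."""
    return set.intersection(*findings_by_system.values())

overlap = correlated_bottlenecks(findings)
print(overlap)
```

In practice the correlation is a judgment made by the analyst, not a mechanical set operation, but the sketch shows why a finding confirmed from all three perspectives carries more weight than one seen from a single viewpoint.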

Methods of Contention Identification

Quickly identifying Oracle system contention will provide you with more time to perform your analysis. There are two basic ways to identify contention within Oracle. The first method, known as the ratio method, basically creates ratios by placing one statistic in the numerator and another in the denominator. The data block buffer cache hit ratio is an example of a ratio. When enough ratios have been calculated over a period of time, coupled with system knowledge and ratio contention identification experience, they will direct one towards where the Oracle contention resides.

A superior method of Oracle contention identification is performed by querying the various session wait performance views. Identifying Oracle contention using the session wait views is so fundamental to quickly identifying contention, I wrote a paper specifically about using the session wait views [5] and offer an Internet Video Seminar on the topic [6]. The paper is freely available on OraPub's web site. The session wait views reveal specific Oracle contention for the entire system and for specific processes. For example, the session wait views could show session 645 is waiting for a specific database block because of a full table scan. With session wait information you can quickly identify Oracle contention.

As a side note, most Oracle related publications spend an enormous amount of paper describing the authors' expertise using the ratio method of contention identification, when a good basic understanding of Oracle architecture combined with an introduction to the session wait views will prove much, much more beneficial.

The Decreasing Relevance of Increasing Data

Remember when you were a Freshman in college and it was easy to swing your grade point average either up or down? Or, when you were a Senior and it was nearly impossible to bring your grade point average up very much? It is because historical activities locked you into the past and did not reflect the current reality. When looking at Oracle performance data, one can get caught in this same trap.

Because nearly all Oracle performance views are reset when the Oracle instance re-starts, after a few days of counter incrementation[1], what would radically change a ratio becomes increasingly non-reflective of current activity.

Consider the Oracle data block cache hit ratio (DBCHR). Suppose the instance has been running for 30 minutes with a 70% DBCHR. The 70% is the average since the instance started, not the average over the past 15 minutes. The average over the last five minutes could have been 95%, but the data from the earlier minutes has masked the more recent data. So when looking at many "v$" related statistics, make sure you know whether you are looking at data from the last x minutes or since the instance started.
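To make the masking effect concrete, here is a minimal Python sketch. The counter values are invented for illustration (they are not from a real system), and the helper name `hit_ratio` is hypothetical. It assumes the ratio is computed as one minus physical reads divided by logical reads, using two cumulative counter snapshots taken five minutes apart.

```python
# Hypothetical cumulative counters since instance startup:
# (minutes since startup, logical reads, physical reads)
snapshot_25_min = (25, 800_000, 290_000)
snapshot_30_min = (30, 1_000_000, 300_000)

def hit_ratio(logical_reads, physical_reads):
    """Fraction of block requests satisfied without a physical read."""
    return 1 - physical_reads / logical_reads

_, lr_prev, pr_prev = snapshot_25_min
_, lr_now, pr_now = snapshot_30_min

# Ratio over the whole life of the instance (what a raw "v$" query reflects).
since_startup = hit_ratio(lr_now, pr_now)

# Ratio over only the last five minutes, computed from the change in each counter.
last_interval = hit_ratio(lr_now - lr_prev, pr_now - pr_prev)

print(f"since startup: {since_startup:.0%}")  # 70%
print(f"last 5 minutes: {last_interval:.0%}")  # 95%
```

The same counters produce a 70% ratio since startup and a 95% ratio for the most recent interval, which is the trap described above.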

Understanding "Deltas"

A startling example of deltas can be seen when one runs a "top SQL" report at 10am each morning. Will the "top SQL" reflect intensive nightly batch processing or will it reflect business morning activity? The answer is, there is a good chance the "top SQL" will be skewed towards reflecting late night activity. The way to get around this is to periodically gather "v$" data and display the change from one time period to the next. I call this change a delta. Deltas highlight more recent activity and provide a much better performance indicator. Good tools, such as the OraPub System Monitor (OSM) tool kit, will always use deltas whenever possible.

Quickly determine which series is increasing and which is decreasing.

154833232, 154833357, 154833437, 154834889

154833149, 154833271, 154933200, 154933240

Actually they both are increasing, but it takes some time to make this determination. Part of performance triage is doing things quickly. We'll try anything to increase our ability to spot trends or notice significant changes in data. One way to do this is to display the differences between two data points, that is the deltas, instead of displaying the raw data. When looking at six or twelve digit numbers, it is very difficult for our eyes and minds to pick up trends and find when the trend peaks. However, by looking at the deltas, our eyes will quickly notice these trends. Usually the deltas are more important to us than the raw data. Good performance triage tools will always display deltas.
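As a small illustration, the following Python sketch computes the deltas for the two series shown above. The function name `deltas` is just a label chosen for this example; it is not taken from the OSM tool kit.

```python
def deltas(values):
    """Return the differences between consecutive samples."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

series_a = [154833232, 154833357, 154833437, 154834889]
series_b = [154833149, 154833271, 154933200, 154933240]

print(deltas(series_a))  # [125, 80, 1452]
print(deltas(series_b))  # [122, 99929, 40]
```

Every delta is positive, so both series are increasing, and the spikes (1452 in the first series, 99929 in the second) stand out immediately in a way they do not in the raw nine-digit values.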

Two Basic Approaches to Gather Performance Data

An Oracle performance tool will either look at historical data or at current activity. Each approach has significant differences that when used properly will improve Oracle triage.

First, let me further define the difference. Most DBAs have a set of SQL scripts they frequently use. These tools usually will directly reference the system as it is currently running. For example, doing a select count(*) from v$session to determine the current number of Oracle sessions is referencing current system activity. I typically call this the interactive approach. Contrast this with tools that periodically gather and store data for later retrieval and analysis. I typically call this the historical approach. Both approaches offer distinct advantages compared to the other when used properly.

The interactive approach allows one to quickly dive down into excruciating detail, detail that most historical based tools don't capture, like v$session_wait data. However, in a triage environment, things can happen so fast, you will miss important activity while involved in something else. With historical data saved, you can look at your data in a different way or follow different performance analysis paths. The historical approach also offers the ability to highlight trends and overall system activity and contention, and it allows analysis in ways that were not thought of during crunch-time triage. It is important that your tools support both the interactive and the historical based approach. There are many tools on the market today which meet this requirement. OraPub's OSM tool kit supports both interactive and historical approaches.

Choosing the Appropriate Data Gathering Frequency

Now that you have decided to gather historical data, the next question is, "How often should the data be gathered?" I have trained myself to look at data gathered at intervals of between 30 and 60 minutes. Any longer than 60 minutes and I can miss out on an important event, but any less than 30 minutes and I will have pages and pages of data to sift through, making it more difficult to spot a trend. Or I'll end up staring at my computer screen for hours watching amazing animated graphics and mumbling words like, "Wow! Did you see that line jump?" or "I wonder how this would look if it had a green background?" Anyway, an important consideration is that gathering, storing, and retrieving tons of data puts a load on the various computing systems involved. It's also not real exciting when one of your performance tool SQL statements shows up as one of the most resource intensive statements...

Total Performance Management (TPM)

I first published the Total Performance Management (TPM) [3] method in 1994 at an Oracle Applications User Group (OAUG) conference. It has since been one of the most frequently downloaded technical papers on OraPub's web site. It maintained its usefulness because it is a method about how to quickly transform an Oracle environment characterized by explosive surprises and poor performance into an environment where users do not even think about performance...it is just there.

To quickly summarize, the TPM method comprises three phases: the audit phase, the tuning phase, and the proactive maintenance phase. The audit phase focuses on scoping out the situation and determining the overall approach to solving the problem. The tuning phase is an iterative process that over the course of many cycles can quickly eliminate performance problems, allowing the DBA to break out of the "performance is never good enough" cycle. Once the DBA does break out of the tuning phase and into the proactive maintenance phase, time is spent putting tools, processes, and communication pieces in place to thwart future performance threats.

Case Study: Performing Oracle Triage

Below is a relatively simple triage situation. However, the same methods and tools can be used in any size situation. I have written this section like a DBA might write an email or log as he or she makes notes during triage. I also followed the triage method outlined earlier in this paper. So here we go!

Don't panic

OK. I'm not panicking but I'm not dancing either...at least not dancing for joy. I realize that as a DBA my skills are in great demand so I'll try to enjoy this escapade.

Scope out the current situation

After walking around a bit, talking with a few people, and poking around on the system I've determined:

• The main system having problems is the company public web site.

• This is an e-commerce company with no standard retail outlet.

• The marketing department is complaining the most.

• During peak hours, the web site receives over 2000 hits each minute.

• Each hit is recorded in a log file (actually an Oracle table).

• The system is so slow, ten Marketing Analysts were sent home yesterday.

• The Marketing Director is livid because he is putting together a new targeted marketing campaign and must run a number of reports to properly prepare and execute the campaign.

• The Marketing Director and the Information Systems director are good friends.

• The UNIX System Administrators feel the Oracle application is the cause of this problem.

• The Oracle DBAs feel the operating system is poorly configured thereby constraining Oracle.

• The initial bottleneck appears to be I/O.


After taking this all in, I've decided to form a Triage Team consisting of a SQL tuning, operating system, Oracle internals, and Marketing Department user expert. With these folks, I'm hoping to quickly isolate and confirm the bottleneck(s). After my analysis, I'm hoping this team will quickly reach consensus on our first round tuning recommendations.

I'm also setting expectations that this could take a week to resolve and even longer if the problem turns out to be extremely complex requiring a team of people to resolve or an additional system capacity purchase.

Document the current performance

My Marketing Department user told me there is one key on-line marketing query which all the analysts run multiple times a day. This query provides an on-line summary of hourly web site activity broken down by each web site area's page type. The on-line report can be easily executed by a simple click. Usually the query returns in around 10 seconds, but this past week it has been taking around 3 minutes.

I had the Marketing Department user actually run the report for me on my computer three times. We timed the runs on our watches and they took 193, 177, and 210 seconds. That's an average of 197 seconds or 3.28 minutes.

I also heard that web site activity has substantially increased. I checked with one of our UNIX System Administrators and he said that during peak we received around 2000 hits each minute and around 720000 hits per day. I'm going to verify this with the Oracle DBA because each web hit should be stored in the database.

Install your tools

I installed OraPub's OSM tools [2]; both the interactive and the historical tool kits. I modified the historical tool kit's driver script, rock, to enable the gathering of the Oracle "v$" views and operating system cpu, memory, disk I/O, and network activity. I didn't have to write any scripts to gather web site hit activity because it's already recorded in the web log. I did modify the key marketing report so it inserts a line into a log file every time the report is run. The line includes both the start and stop time, so I can calculate response time. I'll use this to document triage success. I set the OSM tools to gather data every 30 minutes.
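The response time calculation from such a log can be sketched in a few lines of Python. The log format here is an assumption made for illustration (one line per report run, with ISO 8601 start and stop times separated by a comma), the sample entries are invented, and `response_times` is a hypothetical helper name; the actual log written by the modified report may look quite different.

```python
from datetime import datetime

# Hypothetical log format: one line per report run, "start,stop" in ISO 8601.
# The three entries below are invented sample data.
SAMPLE_LOG = """\
2000-12-18T14:00:00,2000-12-18T14:03:00
2000-12-18T14:01:10,2000-12-18T14:04:30
2000-12-18T14:02:00,2000-12-18T14:05:10
"""

def response_times(log_text):
    """Return the elapsed seconds for each logged report run."""
    times = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start_text, stop_text = line.split(",")
        start = datetime.fromisoformat(start_text)
        stop = datetime.fromisoformat(stop_text)
        times.append((stop - start).total_seconds())
    return times

elapsed = response_times(SAMPLE_LOG)
print(elapsed)                      # [180.0, 200.0, 190.0]
print(sum(elapsed) / len(elapsed))  # 190.0
```

Recording both timestamps on one line keeps the calculation trivial and avoids having to pair up separate start and stop entries later.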

Develop a simple communications strategy

To minimize triage distraction, I created one simple yet key communication graphic. By hour, it shows the number of web site hits, the number of marketing queries run, and the average marketing query response time. This report will show if query response time correlates with web site and marketing query activity...which I suspect it does. I am also creating a brief status email twice a day that I'll email to anyone who asks. The email consists of a two sentence summary paragraph followed by more details if appropriate. I'm also going to host an open meeting every morning at 10am to anyone who is interested. I'm hoping with my frequent but brief communications, combined with an open attitude about the situation, trust will develop between the triage group and the suffering users. (I think I really do care.)
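The data behind an hourly graphic of this kind could be assembled roughly as follows. This is only a sketch under stated assumptions: the hit counts and query timings are invented, the input shapes (a dictionary of hits per hour and a list of hour and seconds pairs) are hypothetical, and `hourly_summary` is a name made up for the example.

```python
from collections import defaultdict

# Hypothetical inputs: web site hits per hour, and one (hour, seconds)
# pair for each marketing query run. All values are invented sample data.
hits_by_hour = {13: 90_000, 14: 120_000, 15: 60_000}
query_runs = [(13, 150.0), (13, 170.0), (14, 190.0),
              (14, 200.0), (14, 210.0), (15, 90.0)]

def hourly_summary(hits, runs):
    """Return rows of (hour, hits, query count, average response seconds)."""
    by_hour = defaultdict(list)
    for hour, seconds in runs:
        by_hour[hour].append(seconds)
    rows = []
    for hour in sorted(hits):
        times = by_hour.get(hour, [])
        average = sum(times) / len(times) if times else None
        rows.append((hour, hits[hour], len(times), average))
    return rows

summary = hourly_summary(hits_by_hour, query_runs)
for hour, hits, count, average in summary:
    print(f"{hour:02d}:00  hits={hits:>7}  queries={count:>2}  avg_secs={average}")
```

Plotting the three columns against the hour on one chart makes it easy to see whether response time rises and falls with web site and query activity.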

Isolate the problem

Since I just installed the OSM tool kit, I don't have enough historical data to review. So I ran some of the interactive reports to quickly isolate the bottleneck: either CPU, memory, I/O, or network. The triage chart in figure 1 outlines the various tools I used, organized by their areas of investigation, followed by my comments. After there was a reasonable amount of historical data to review, I went back and updated the triage chart (figure 1).

Ruminating on the data I gathered and my comments (as shown in figure 1), there is an obvious I/O bottleneck caused by both the marketing reports and the increased web site activity.

Quickly perform your rock-solid analysis

Summary and Recommendations. The computing system is seriously I/O bound as the result of increased web site activity, intensive and repetitive marketing queries, and potentially un-optimized SQL. However, there appear to be a number of items that, when combined, should dramatically improve overall system performance as well as marketing query response time. While a more detailed analysis is below, here are my recommendations.

1. Move the HITS table into its own Oracle tablespace.

2. Move the HITS tablespace to a non-busy RAID 0+1 array.

3. Move TEMP tablespace database files to a non-busy RAID 0+1 array.

4. Tune the marketing query SQL.

5. Reduce marketing query executions by having the query automatically run every 15 minutes and allowing the results to be instantly viewed via a standard web browser.

6. Increase Oracle's data block buffer cache to better absorb activity bursts.

Any one of these recommendations should substantially improve performance, but combined I anticipate a dramatic performance improvement.

Operating System Performance Analysis. There is a clear I/O bottleneck, supported by CPUs waiting for I/O before processes can be run, no memory paging or swapping, no network collision or latency problems, and four extremely active disks. The hot disks are part of the same RAID array and contain the web log (an Oracle table). There are many other disks which are less than 10% busy, so there is an opportunity to spread out the I/O across more disks or simply place the heavy I/O files on a different RAID array. I'll be talking with my O/S expert and administrator about this.

Oracle System Performance Analysis. Oracle sessions are waiting predominantly for I/O because of full-table scans on the HITS table. In fact, the MKTG tablespace and related database files, where the HITS table resides, are significantly more active than the other database files. The MKTG tablespace is also heavily full-table scanned, supporting my wait event analysis. Because each web page is logged, that is, written to the HITS table, when web site activity peaks there is so much insert activity that the server processes have to wait to get a free block in the buffer cache. This is exacerbated by the MKTG reports, which are also looking for free blocks to fill with HITS blocks used for the marketing queries.

My initial approach to improve performance in this area is to increase the database block buffer cache and to isolate the HITS table into its own tablespace which will reside on a very non-active RAID array. Since there is plenty of memory, I will increase the database block buffer cache to better absorb bursty buffer activity thereby increasing the possibility of more free buffers. This is a short-term solution.

Other key contention possibilities like latching contention and enqueue contention are not an issue at this point. Once we deal with the immediate problems, they could raise their ugly heads.

Another interesting point is that because of the MKTG query related full-table scans, there is a tremendous amount of temporary tablespace activity. However, I suspect that this will be substantially reduced once the MKTG query is tuned. But before that happens, I'm going to move the TEMP tablespace data files to a very non-busy RAID array.

Application System Performance Analysis. The MKTG and the web server processes are by far the most resource consuming processes. I looked at the SQL for both processes. The SQL is fairly straightforward and I suspect our SQL tuning expert can significantly reduce the number of I/Os the MKTG query requires (I hope so anyway!).

I also can't figure out why so many marketing queries are being run. It seems silly and a waste of resources. I'm going to talk with the Marketing Director about setting up some process where the key marketing query will be run every 15 minutes and written to a text file where any web browser can instantly look at it.

Doing a little math: During the peak bottleneck time (around 2pm), there are 12 marketing queries being run simultaneously. They each touch around 26000 Oracle blocks and take an average of 197 seconds to run. So on average, within 197 seconds there are 312000 Oracle blocks touched. Our database consists of 8KB blocks, so this means 2496000 KB of data is touched every 197 seconds. The data block buffer cache hit ratio during peak is around 90% and the UNIX buffer cache ratio is also around 90%, so only 10% of 10% of the data touched, or 24960 KB, is physically read every 197 seconds. Doing a little more math, this means that only 126.7 KB/sec is physically read from oxide. Even with the associated insert activity, any decent RAID 0+1 array (which we have) should be able to handle the load. Moving the HITS table to its own non-busy RAID array should dramatically improve I/O response times, allowing some time for the SQL to be tuned!
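The arithmetic above can be checked with a short Python sketch. The input figures are the ones quoted in the paragraph; the variable names are chosen for this example, and the sketch assumes (as the paragraph does) that the two caches act in series, so only the misses from the Oracle buffer cache are candidates for the UNIX buffer cache.

```python
concurrent_queries = 12
blocks_per_query = 26_000
block_size_kb = 8
elapsed_seconds = 197
oracle_cache_hit_ratio = 0.90
unix_cache_hit_ratio = 0.90

# Total data touched by the 12 concurrent queries.
blocks_touched = concurrent_queries * blocks_per_query   # 312,000 blocks
kb_touched = blocks_touched * block_size_kb              # 2,496,000 KB

# Only Oracle cache misses reach the UNIX buffer cache,
# and only UNIX cache misses reach the disks.
kb_missed_by_oracle = kb_touched * (1 - oracle_cache_hit_ratio)
kb_physically_read = kb_missed_by_oracle * (1 - unix_cache_hit_ratio)

kb_per_second = kb_physically_read / elapsed_seconds

print(f"{kb_physically_read:,.0f} KB physically read")  # 24,960 KB
print(f"{kb_per_second:.1f} KB/sec")                    # 126.7 KB/sec
```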

Once the "15 minute marketing query" process is in place, the HITS table is moved to a non-busy RAID 0+1 array, and the query is tuned, I anticipate a dramatic response time improvement.

Quickly implement your recommendations

I met with both the Marketing Director and the lead UNIX System Administrator. The meetings were very profitable. The Marketing Director agreed to my "15 minute marketing query" idea and the UNIX Administrator agreed to provide virtually idle RAID 0+1 arrays for the HITS and TEMP tablespaces. The SQL tuning expert is already working on the marketing query. The lowest system activity occurs between 2am and 6am, so that's my window of opportunity.

Here's the plan:

• The SQL tuning expert is already working on the marketing query. As soon as it is optimized, it can be placed easily into production. We should experience absolutely no associated downtime.

• Before 2am:

• Because of advanced volume management tools, the UNIX Administrator will move existing files around resulting in the creation of two idle RAID 0+1 arrays. There should be absolutely no associated downtime.

• Once the UNIX Administrator has one of the RAID arrays ready, I'll create a new temporary tablespace, alter all Oracle users to point to the new temporary tablespace, then remove the old temporary tablespace. There should be absolutely no associated downtime.

• I created a script to quickly move the HITS tablespace and its associated database files to the second non-busy RAID array. Creating the script involves absolutely no downtime.

• At 2am:

• I will run the script to move the HITS tablespace and all its associated database files. I anticipate 5 minutes of downtime. If the move fails, the system should be back up within 10 minutes. The worst case scenario is 60 minutes of downtime (database file level point-in-time recovery).

• Next day:

• I will work with my application specialists to begin designing the "15 minute marketing query" system. Because of the concept and integration simplicity, I suspect this project will take three days to complete.
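The paper does not show the actual scripts, so the following is only a sketch of one common way the two tablespace steps in the plan could be expressed. It generates the statements rather than running them. The tablespace names, file paths, user names, and sizes are all hypothetical, the offline/copy/rename/online sequence is an assumption about how the move might be done, and the exact SQL syntax available depends on the Oracle release in use.

```python
from typing import List, Tuple


def temp_tablespace_swap(
    new_ts: str, tempfile_path: str, size_mb: int,
    users: List[str], old_ts: str,
) -> List[str]:
    """Statements to create a new temporary tablespace, repoint users, drop the old one."""
    stmts = [
        f"CREATE TEMPORARY TABLESPACE {new_ts} "
        f"TEMPFILE '{tempfile_path}' SIZE {size_mb}M"
    ]
    stmts += [f"ALTER USER {user} TEMPORARY TABLESPACE {new_ts}" for user in users]
    stmts.append(f"DROP TABLESPACE {old_ts} INCLUDING CONTENTS")
    return stmts


def datafile_move(tablespace: str, moves: List[Tuple[str, str]]) -> List[str]:
    """Steps to relocate a tablespace's data files while it is offline."""
    steps = [f"ALTER TABLESPACE {tablespace} OFFLINE NORMAL"]
    for old_path, new_path in moves:
        # The copy itself happens at the operating system level, not in SQL.
        steps.append(f"-- OS level: copy {old_path} to {new_path}")
        steps.append(
            f"ALTER TABLESPACE {tablespace} "
            f"RENAME DATAFILE '{old_path}' TO '{new_path}'"
        )
    steps.append(f"ALTER TABLESPACE {tablespace} ONLINE")
    return steps


# Hypothetical names and paths, for illustration only.
for line in temp_tablespace_swap("TEMP2", "/raid_a/temp2_01.dbf", 500,
                                 ["WEBAPP", "MKTG"], "TEMP"):
    print(line + ";")
for line in datafile_move("HITS", [("/raid_old/hits_01.dbf", "/raid_b/hits_01.dbf")]):
    print(line if line.startswith("--") else line + ";")
```

Generating the statements ahead of time makes it possible to review and rehearse the sequence before the 2am window, which keeps the actual downtime short.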

Monitor the situation

All my "2am" recommendations have gone into effect and I'm working with the applications specialists to complete the "15 minute marketing query" solution. This is going very well and should be implemented in a couple of days. While the SQL tuning expert has improved query response time (a 50% logical I/O reduction), the improvement is less than I'm expecting. I'm hoping we get, at a minimum, a 90% logical I/O reduction.

I performed an interactive and historical performance analysis just as I did in the Isolate The Problem triage phase. I won't show all the details here, but the situation has substantially improved and the bottleneck has shifted (as I expected it would). The bottleneck is still I/O, but it is now associated with the e-commerce application itself, not the activity logging or the associated marketing component.

I had the Marketing Director come over to my computer and we re-ran the "3 minute" query. It took an average of 57 seconds. He was pleased but was expecting sub-second response. I gently explained that the SQL is still being tuned and the "15 minute marketing query" solution has not yet been implemented, and that once either one of these items is implemented, I am expecting a response time of less than three seconds. He was pleased with that.

So at this point the heat is off! I'm still working with the SQL tuning expert and the application expert, but the pressure has substantially subsided, business is booming, and people don't think about performance so much anymore. Time to get some sleep...zzzz

Concluding thoughts

Oracle performance triage is an exciting place to be, but no one can live there all the time. I hope that by writing down my research, experience, and conversations, I have been able to accurately convey how to triage appropriately. The real measure of my success will come when you are faced with a triage situation and successfully apply the concepts, methods, and tools described in this paper. If this paper has been useful to you, please let me know. Feel free to email me to tell me how it has helped you, or with any comments that would improve its usefulness. Thank you for taking the time to read this paper.

Figures

Figure 1. Oracle Triage Chart. This is an example chart that can be used as a checklist to triage any Oracle-based system. While additional areas can and probably will need to be investigated, working through the chart will bring out those additional areas. This specific chart is being used for the discussed case study.

References

1. "Advanced Performance Management For Oracle Based Systems" Class Notes (2001). OraPub, Inc., http://www.orapub.com

2. "OraPub System Monitor (OSM)" tool kit (2001). OraPub, Inc., http://www.orapub.com

3. Shallahamer, Craig A. (1995). Total Performance Management. Published and presented at various Oracle related conferences world-wide. http://www.orapub.com

4. Shallahamer, Craig A. (2000). Holistic Problem Isolation Method. OraPub Internet Video Seminar. http://www.orapub.com

5. Shallahamer, Craig A. (1999). Direct Contention Identification Using Oracle's Session Wait Views. Published and presented at various Oracle related conferences world-wide. http://www.orapub.com

6. Shallahamer, Craig A. (1999). Direct Contention Identification Using Oracle's Session Wait Views. OraPub Internet Video Seminar. http://www.orapub.com

Acknowledgments

A special thanks to my clients and students who have brought forth a plethora of stimulating discussions and challenging dilemmas. These situations, coupled with my unusual enthusiasm for Oracle triage, have evolved into this technical paper.

About the Author

Mr. Shallahamer's seventeen-plus years of experience in the IT marketplace bring a unique balance of controlled creativity to any person, team, or classroom. As the President of OraPub, Inc., his objective is to empower Oracle performance specialists and capacity planners. His specializations include "doing" and teaching others to "do" whole system performance optimization and capacity planning for Oracle-based systems. Mr. Shallahamer authors and teaches both of OraPub's key courses: Advanced Performance Management For Oracle Based Systems and Capacity Planning - Performance Modeling & Prediction. In addition to course development and delivery, Mr. Shallahamer consults, is helping to develop a landmark performance management product, technically reviews Oracle books, and is involved with starting an Oracle research center in Cairo, Egypt. While at Oracle Corporation from 1989 to 1998, Mr. Shallahamer directed the global technical training efforts for Oracle Consulting's technical consultants, directed the Western area Oracle Services System Performance Group (SPG), co-founded three highly respected technical consulting groups (National Product Specialist Team, Core Technologies, System Performance Group), and worked at hundreds of client sites in North America, South America, Western Europe, Eastern Europe, the Middle East, and Eastern Asia. As a result, Mr. Shallahamer has had the pleasure of publishing and presenting a number of papers at EOUG-E/ME, OAUG-A/E, IOUG-A, and OpenWorld, and in Oracle Magazine. Mr. Shallahamer can be contacted by email at [email protected] .

[1] I don't think incrementation is a real word, but it seems to fit nicely here.
