There are plenty of articles, blogs and write-ups available that explain what an Automatic Workload Repository (AWR) report is and what information it provides. I have read a few and found them very useful. AWR is an ocean of information and can be quite overwhelming at times. If you are trying to find out the meaning of each topic and sub-topic, Oracle's documentation would be your one-stop solution (well... not really!). But if you are looking at an AWR report in order to analyse Oracle performance and its impact on overall application performance, then you need a planned approach to explore these statistics.
Before diving into analysis, I want to reiterate a few known facts.
- The data in an AWR report is the delta, or changes, between the accumulated metrics within each snapshot.
- An AWR report contains three primary sections – (1) Oracle configuration details, (2) performance statistics and (3) advisories.
- Any snapshot interval over 60 mins may not be ideal for performance analysis (it can be changed though) as it can hide the actual issue.
- The default Snapshot retention period is 7 days (can be changed though).
In general, the types of problems observed in Oracle database are –
- The database is slow (period)
- It works fine in most cases but grinds to a halt for specific (one or a few) business scenarios.
- One or a few SQL statements take forever.
So your plan should be different when tackling different problems.
In this article, I have tried to explain a generic approach to start analysing Oracle database performance by looking at the statistics available in an AWR report. So the sequence of topics will not be the same as what you see in the AWR report.
To begin the analysis, I would start by looking at the snapshot details. You will not find any performance statistics here, but you need to make sure that you are looking at the right data. A few checks in this section include –
- Start and end snapshot details. If you are comparing with another AWR report, please make sure you are comparing like with like.
- The elapsed time between the snapshots.
- Server details e.g. Memory, number of CPUs, Oracle Database version etc.
Elapsed Time & Database Time
Elapsed time is simply the time interval between the 'begin snap id' and the 'end snap id'. DB time is equal to CPU time spent processing plus wait time. If the DB time is much greater than the elapsed time, drill down further in the report to understand the wait events.
In this example, Elapsed Time = 60 mins.
Therefore, total CPU time = 60 mins * <number of CPUs> = 60 mins * 16 CPUs = 960 mins.
This means the database has consumed 292.24 mins out of 960 mins, i.e. (292.24/960) * 100 = 30.44% CPU (average for the entire duration).
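The arithmetic above can be sketched as a small helper for checking any report's figures; the values are the ones from this example, and the function name is just for illustration:

```python
def db_cpu_utilisation(db_time_mins: float, elapsed_mins: float, cpu_count: int) -> float:
    """DB time as a percentage of the total CPU minutes available in the snapshot."""
    total_cpu_mins = elapsed_mins * cpu_count  # e.g. 60 mins * 16 CPUs = 960 mins
    return db_time_mins / total_cpu_mins * 100

# Example values from this report:
pct = db_cpu_utilisation(db_time_mins=292.24, elapsed_mins=60, cpu_count=16)
print(f"{pct:.2f}% average CPU utilisation")  # -> 30.44% average CPU utilisation
```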
Is this bad? Not really; if you have executed a performance test and stressed the system, this is quite expected.
However, if you have executed a load test that was not supposed to stress the system (probably < 20-30% of the actual load) then yes, it is possibly a problem. [Please remember, extrapolation is not a good practice and could be completely misleading; read more about it here.]
Then it is about sessions. What is a session? A session represents a connection between an application and the database, and it holds persistent state such as transactions and cached objects.
In this report, we can see the number of sessions at the beginning was 198 and then it gradually came down to 158. This means the database was already busy even before this AWR report's start time and it gradually calmed down. So you will tend to see more activity at the beginning of the snapshot window (~15:00 + 15 mins) rather than at the end (16:00 – 15 mins). It is probably a good idea to find out the nature of the load at the beginning and at the end of the snapshot.
And the database had (292.24/60.04) = ~5 average active sessions for the entire period. Again, this is not bad; however, please keep an eye on the number of active sessions.
If you have executed a load test and it finished within this 60 mins, then I would expect to see the number of active sessions go down to a minimum and not remain at 158. To confirm, please check the number of active sessions in the next snapshot.
Then it is the Load Profile. There are two primary reasons for looking at the load profile first.
- If you are comparing one AWR report with another, the load profile will help you confirm that you have executed the exact same (or a similar) test on your application [and that no unexpected activity impacted the database performance during that period – for example, a database backup job]. I am not saying the numbers will be exactly the same, but they should be in an expected range. If that is not the case, then you must find out what changes were made either in the test you are running or in the system (application or database).
- To analyse the performance statistics themselves.
In my earlier post about analysing PostgreSQL server performance, I created a video; the concept presented in the video is the same across all databases. In every application, a SQL statement is sent to the database; the database receives the SQL, parses it into its own internal representation, executes it to extract the required data and then sends the data back to the application. AWR essentially provides lots of information about all these stages but categorises them differently.
In the load profile section, you will find the statistics for the period between the database receiving a query and sending all the data back to the application.
You can see the DB Time in the snapshot details is 292.24 mins = 17,534.4 sec.
The DB worked 17,534.4 sec within 60 mins, i.e. 3,600 sec (the snapshot interval), without considering any background processing.
Therefore, it has worked 17,534.4 / 3,600 = ~4.9 seconds of DB time per elapsed second.
In this example, it is definitely not bad.
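This ratio is what is often called average active sessions, and it can be computed directly (the variable names below are mine; the figures are this report's):

```python
# Average active sessions = DB time / elapsed time for the snapshot.
db_time_secs = 292.24 * 60   # 17,534.4 sec of DB time
elapsed_secs = 60 * 60       # 3,600 sec snapshot interval

avg_active_sessions = db_time_secs / elapsed_secs
print(f"~{avg_active_sessions:.1f} average active sessions")  # -> ~4.9
```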
Let’s talk about the parser then. What is a parser? It is Oracle's way of validating the incoming request and then preparing to serve it. The general rule is – hard parsing is bad and soft parsing is good. Although it is quite difficult to put a fixed number against each of these types, anything more than 10% hard parsing is definitely bad. So what is the difference between these two types of parsing?
You provide a SQL statement and request Oracle to execute the query and return records. Oracle first tries to find out if it has already executed a similar query, because if it has, then it does not have to load the query into the shared pool (and therefore saves a lot of work) – this is a soft parse. But if it finds that it is a completely new query, then it has to first load the query into the shared pool. This is a hard parse.
In the AWR report, soft parses are not listed; they are derived by subtracting the hard parses from the total parses. A soft parse reuses the work of a previous hard parse; hence it consumes far fewer resources.
In this example, a total of 96.4 parses per second took place and 1 of them was a hard parse; so ~99% were soft parses.
This is definitely good.
But please be careful: (as I mentioned) the converse is not always true, i.e. 99% soft parsing indicates good performance, but good performance does not necessarily mean 99% soft parsing.
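The soft-parse ratio can be derived from the two figures the report does show (a sketch; the function name is mine):

```python
def soft_parse_pct(parses_per_sec: float, hard_parses_per_sec: float) -> float:
    """Soft parses are not reported directly; they are total parses minus hard parses."""
    return (parses_per_sec - hard_parses_per_sec) / parses_per_sec * 100

# Example values from this report: 96.4 parses/sec, of which 1/sec was a hard parse.
ratio = soft_parse_pct(96.4, 1.0)
print(f"soft parse ratio: ~{ratio:.0f}%")  # -> ~99%
```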
Redo Logs
REDO is Oracle's way to go back to a previous state if necessary. Oracle keeps a record of every change in the REDO log until a commit. UPDATE and INSERT type queries always generate a lot of REDO, while SELECT statements sometimes do too (although on a much smaller scale).
So what should the optimum size be? Well, that is a difficult one. But a high volume of REDO indicates a lot of changes taking place in the database.
In this example, Oracle is generating ~1,230 KB of logs per sec.
This means, if you are expecting a similar load on the system 24 hrs a day, then you are probably talking about 1,230 KB * 24 * 3,600 = ~100 GB of REDO logs per day.
This seems high. [Please don’t forget to check your Database size and calculate the relative growth]
Also, only 6 transactions per second took place within the period, and each of these transactions generated about 205 KB of REDO, i.e. (205,077 * 6 =) ~1,230,457 bytes per sec. This is definitely high.
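The redo projections above can be reproduced as follows (figures are from this example; the 24-hour extrapolation is only a rough what-if, as noted earlier):

```python
# Redo projections from the load-profile figures in this example.
redo_bytes_per_sec = 1_230_457
txns_per_sec = 6

redo_bytes_per_txn = redo_bytes_per_sec / txns_per_sec    # ~205 KB per transaction
daily_redo_gb = redo_bytes_per_sec * 24 * 3600 / 1024**3  # at this rate, ~99 GB/day
print(f"{redo_bytes_per_txn:,.0f} bytes of redo per txn, ~{daily_redo_gb:.0f} GB/day")
```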
Check the database size before running any more tests and compare the size increase. Discuss with developers to find out the types of query executed during this test; if these queries predominantly are SELECT type and not many UPDATE or INSERT then this could be a problem.
User Calls
Oracle increments this counter whenever the application makes a request. User session logins, soft and hard parses, physical reads and physical writes – everything is accounted for. This means, if you increase the load on the system, you should expect to see more User Calls.
Reads (Logical & Physical)
These two are interesting statistics. So what is a read? It is when the user (i.e. the application) has requested data (user calls), Oracle has validated and transformed the query (i.e. soft and hard parsing) and then started to extract the data.
As a process, Oracle will first try to see if the data is available in the buffer cache, i.e. the SGA. If it is, then the logical read counter increases by 1.
To explain a bit further, if Oracle gets the data from a block that is consistent with a given point in time, then a counter named “Consistent Gets” increases by 1.
But if the data is found in current mode – that is, the most up-to-date copy of the data in that block, as it is right now – then a different counter named “DB Block Gets” increases by 1.
Therefore, a logical read is calculated as the total number of “Consistent Gets” plus the total number of “DB Block Gets”. These two values can be observed in the ‘Instance Activity Stats’ section.
But if Oracle does not find the data in the buffer cache, then it reads it from the physical block and the physical read counter increases by 1.
Clearly, a buffer get is less expensive than a physical read, because for a physical read the database has to work harder (and longer) to get the data: essentially the time it would have taken if the data were in the buffer cache plus the time actually taken to read it from the physical block.
In this example, a total of 556,323,564 Consistent Gets and 16,162,346 DB Block Gets took place;
Therefore, logical reads per sec = (556,323,564 + 16,162,346) / 3,600 sec ≈ 159,024.
Is this good? Maybe not!
Please remember, fewer physical reads compared to logical reads indicates good performance, but too many logical reads is definitely not good.
In this case, the distribution is –
(1) Physical reads = (333.1/159,245.6) * 100 = 0.2% and
(2) Logical Read = 99.8%.
This shows a well-sized SGA and that is definitely good; however, the fact that ~158K logical reads happen every second is not good. You need to explore the report to find the SQL statement(s) causing these high logical reads.
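The read breakdown can be computed from the 'Instance Activity Stats' counters (example figures from this report; the per-second value differs slightly from the report's own figure because the interval length is rounded here to 3,600 sec):

```python
# Logical vs physical read breakdown for the snapshot.
consistent_gets = 556_323_564
db_block_gets = 16_162_346
physical_reads_per_sec = 333.1   # from the load profile
elapsed_secs = 3600

logical_reads_per_sec = (consistent_gets + db_block_gets) / elapsed_secs
physical_pct = physical_reads_per_sec / logical_reads_per_sec * 100
print(f"{logical_reads_per_sec:,.0f} logical reads/sec, {physical_pct:.1f}% physical")
```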
Transactions (and Rollbacks) & Executes
What is a transaction in Oracle? A transaction in Oracle is simply the total number of commits and rollbacks.
In this example, the database rolled back 360 operations in 60 mins (0.1 * 3,600 sec) and committed 21,240 operations (total transactions – rollbacks, i.e. 6 * 3,600 – 360).
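Those commit and rollback counts come straight from the per-second rates (a sketch with this report's figures):

```python
# Commits and rollbacks over the 60-min snapshot.
rollbacks_per_sec = 0.1
txns_per_sec = 6
elapsed_secs = 3600

rollbacks = rollbacks_per_sec * elapsed_secs       # 360 rollbacks
commits = txns_per_sec * elapsed_secs - rollbacks  # 21,240 commits
print(f"{commits:,.0f} commits, {rollbacks:,.0f} rollbacks")
```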
Remember, Rollbacks are expensive operations.
So, are any of these numbers good or bad?
Well, to answer that question, first try to find out about those rollbacks. Speak to the DBAs and developers to find the root cause of those rollbacks. Sometimes even a small increase in rollbacks can impact database performance significantly.
Now, about the number of commits: I would not necessarily say good or bad; rather, I will use this number to direct the analysis and investigation further.
Please remember, the important aspect is not just the number of transactions; you also need to calculate how many queries Oracle has executed (on average) per transaction.
This can be calculated as total executions per sec / number of transactions per sec.
In this example, Oracle has executed 311.5/6 = ~52 SQL statements per transaction. Although there is no absolute boundary to decide low or high, I would say this is quite high. You need to explore the section detailing which SQL statements were executed most.
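And the executes-per-transaction figure (example values from this report):

```python
# Average SQL executions per transaction.
executes_per_sec = 311.5
txns_per_sec = 6

sql_per_txn = executes_per_sec / txns_per_sec
print(f"~{sql_per_txn:.0f} SQL executions per transaction")  # -> ~52
```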
I am still working on the other sections as described in the diagram in the beginning and will try to publish as soon as possible.