Solaris Linux iostat in DBAs view

Background

When troubleshooting  any database performance issue, we need to understand that it can happen in two possible sequence,

1. Additional large utilization (load) on database server causing the delay in normal database operation. Which can be from newly added application component, query that is running abnormally (e.g plan change),  new parallel execution etc..

2. A change or suboptimal operation on underlying hardware/software layer that causes the default load of the database to take more time than it usually does. Here database system is not the culprit but the victim. It is trying its best to perform as it used to, but a bottleneck in different layer that is not in the control of database is causing the delay.

Example

Below is a demonstration using iostat command in Solaris environment running Oracle database, which shows the metric differences while database system is experiencing performance delays. In both situation the result is the same, distorted db performance where queries taking more time to execute, transaction timing out in applications etc.. but the cause leading to the result is completely different.

Case 1 - iostat during new query executing with parallel degree of 24

solaris iostat output explained

Here in highlighted part of the iostat output, we can observe very large amount of read I/O (kr/s) with average service time +35 ms (asvc_t) and disk busy percentage (%b) around 100%.
All these iostat outputs indicate overloaded disk sub system, resulted from IO heavy query running with parallel 24.
So in this type of scenario next steps for resolution would be to look at query optimization reducing the actual IO load. 

Case 2 - iostat during delay failure of SAN storage controller affecting set of disks


Linux iostat output explained


Here when we look at the highlighted outputs, we see almost zero IO read requests (kr/s) or IO write requests (kw/s) but still some disks show busy percentage (%b) of100%. Some queries were impacted as they were not completing.
Here when providing the solution we need to understand that database is impacted by underlying infrastructure and need to widen troubleshooting beyond the database. Above example iostat output was during a SAN storage controller failure which impacted some of the disks.

Comments

Popular posts from this blog

ORA-16433: The database or pluggable database must be opened in read/write

Wait for unread message on broadcast channel - Blocking Sessions

ORA-14126 - While splitting max value partition