Server Load on Linux Systems and Its Relationship to Database Performance

 What Is Server Load?

Server load is basically a measure of the amount of computational work a system is performing. In Linux, the system load primarily reflects the demand for the CPU’s resources over time. This measurement includes both the processes currently being executed by the CPU and those waiting for CPU access. The output of server load is typically represented as three numbers, which gives the average load over the last 1, 5, and 15 minutes.

  • 1 minute load average: Immediate snapshot of the CPU demand.
  • 5 minute load average: Mid term trend in resource usage.
  • 15 minute load average: Long term overview of system load

Higher numbers indicate higher CPU demand but related to the number of CPU cores of the server. For example, a load of 4 on a system with 4 cores means the system is running at capacity. On the other hand, a load of 8 on the same system indicates that processes are waiting for CPU time, which could signal a performance bottleneck.

Run Queue and Block Queue and the relation to database load

  • Run Queue: This refers to the list of processes waiting to be executed by the CPU. A high run queue indicates that many processes are competing for CPU time.. In a database server slow queries, large number of concurrent connections or highly parallelized tasks can contribute to the run queue if the system is under resource pressure.

  • Block Queue: This includes processes waiting for I/O (input/output) operations, such as disk or network access. Database operations completely depends on reading or writing to storage and  poor disk performance or heavy I/O load can increase the block queue. If the database is waiting for data to be written or retrieved from storage, it will stay in the block queue, further delaying query execution.

These two queues directly impact the perceived load of a Linux server. In database systems, high load due to CPU contention or I/O delays can lead to cascading slow query responses, application timeouts, or even crashes in extreme cases.

Long Running (Slow) query and Server Load

Long running processes can contribute to both the run queue and block queue and these processes could be heavy queries, batch jobs, or background processes that consume CPU, memory and I/O resources over extended periods. For example:

  • A poorly optimized query (unindexed, heavy joins) may execute continuously, using CPU cycles and I/O bandwidth.
  • Scheduled jobs that process large amounts of data could compete for system resources, increasing overall system load.
  • Large number of inactive server side process or threads (those that have finished their work in DB but not closed by client or waiting in pool) can contribute to load.

Monitoring Server Load Using top and vmstat

Two essential tools to monitor server load  are top and vmstat. These might not be directly available on database that runs as a cloud service (DBaaS) such as AWS RDS or GCP CloudSQL, but these offerings still provide metrices that can be used to identify related system load parameters. Hence understanding the fundamental server load characteristics does help in both Cloud and on-premise. 

1. Monitoring with top

top provides real-time overview of system processes including CPU usage, memory usage, and server load averages. It does provide averages in different time frames (1 min, 5 min and 15 min) and it helps in identifying which processes are consuming the most resources.

  • Key Metrics to Look At:

    • Load averages: As discussed earlier, this shows the average CPU load over the last 1, 5, and 15 minutes.
    • %CPU: Displays the percentage of CPU time each process is consuming. Helps in identifying  heavy process which are hogging CPU resources.
    • %MEM: Shows memory usage. Processes consuming a large portion of memory can cause the system to swap, increasing I/O load and impacting performance.

    Example Output:

    yaml
    top - 15:44:30 up 10 days, 5:23, 3 users, load average: 1.58, 1.44, 1.36
    Tasks: 209 total, 1 running, 208 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 2.5 us, 1.2 sy, 0.0 ni, 96.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 8046120 total, 2672088 free, 1008088 used, 4365944 buff/cache
    KiB Swap: 3991552 total, 3991552 free, 0 used. 6228084 avail Mem
2. Monitoring with vmstat

vmstat provides more detailed information about system performance, particularly CPU queues, memory swapping, I/O utilization and I/O performance. Unlike top, which focuses on real-time process information, vmstat provides matrices over a specified interval, making it easier to spot trends. The metrices will average out with higher intervals.

  • Key Metrics to Look At:

    • r (run queue length): Shows the number of processes waiting for CPU time. A consistently high number comapred to avialble CPU cores in the server points to CPU bottlenecks.
    • b (block queue length): Shows the number of processes waiting for I/O. A high block queue suggests that the system is I/O-bound.
    • si/so (swap-in/swap-out): Indicates the rate at which the system is swapping memory to disk. High swap activity can slow down overall performance.

    Example Command:

    vmstat 1

    Example Output:

    css
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    1 0 0 1594208 147908 4161960 0 0 39 40 20 200 2 1 97 0 0
    0 0 0 1592208 147908 4161960 0 0 0 5 20 30 1 0 99 0 0

In the example above, we see the run queue length (r) is low, indicating that the CPU is keeping up with demand. However, in scenarios where r and b are high, you would need to investigate further, potentially checking long-running database queries, system processes, or disk performance issues.

Conclusion

Server load on Linux systems is a key metric when evaluating database performance. Understanding how run queues and block queues correlate with CPU and I/O bottlenecks helps in diagnosing whether the database is struggling with internal workload or external system issues. Tools like top and vmstat provide valuable insights into system resource utilization, allowing for targeted optimizations to improve performance.

Comments

Popular posts from this blog

ORA-16433: The database or pluggable database must be opened in read/write

Oracle Multitenant - Create new service for PDB using DBMS_SERVICE

ORA-30013 undo Tablespace currently in use during PDB open