Monday, May 16, 2011

ESX Service Console-2

Checking for resource starvation of the ESX Service Console

Symptoms

  • High CPU utilization on an ESX host
  • High memory utilization on an ESX host
  • Slow response when administering an ESX host

Purpose

For troubleshooting purposes it may be necessary to check if any processes are consuming a substantial amount of resources on the service console. Processes consuming a substantial amount of resources can prevent correct operation of the ESX system. This article provides you with the steps to check for starvation of resources on the ESX host service console.

Resolution

Introduction to performance monitoring

If any process is using a substantial amount of CPU or memory on your ESX host service console, it can prevent correct operation of the system. ESX includes the top utility for checking resource utilization on the service console. Use it to view the current values of the statistics and to determine whether the service console is starved of resources.
 
To check the utilization of the processes on the service console:
  1. Log in to your ESX host service console as root from either an SSH session or directly from the console of the server.
  2. Type top and press Enter.
  3. To exit top, press Q.
  4. When you have finished reviewing the output, type logout and press Enter to exit the system.
This screen appears and shows the resource utilization and running processes on the server:

Checking for CPU Starvation of an ESX host

The statistics you must review are load average and CPU Idle. These statistics provide an overall indication of how busy the ESX host is.
 
Load average is a measurement of the number of processes that are currently waiting in the run queue plus the number of processes that are being executed, averaged over one, five, and 15 minute intervals. A load average of 1.00 means that the ESX host machine's physical CPUs are fully utilized, and a load average of 0.5 indicates they are half utilized. A load average of 2.00 indicates that the system is busy. If the load average is over 4.00, the system is heavily utilized and performance is impacted.
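The thresholds above can be turned into a quick check. The following is a minimal Python sketch, not part of the original article: it assumes the three load averages are read from /proc/loadavg on a Linux-based console, and the function names and the band labels (including the label for values between 1.00 and 2.00) are illustrative choices.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def classify_load(load):
    """Map one load average onto the rough bands described in the text.

    The band names are assumptions made for this sketch.
    """
    if load > 4.0:
        return "heavily utilized"
    if load >= 2.0:
        return "busy"
    if load >= 1.0:
        return "fully utilized"
    return "not saturated"


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages from the given file."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```

Calling classify_load() on each value returned by read_loadavg() gives a label for the one, five, and 15 minute intervals. A high one minute value with low five and 15 minute values suggests a short spike rather than sustained load.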
 
This screen indicates that the ESX Service Console does not have a queue of tasks waiting to process:
 
 
A load average similar to this screen indicates that tasks are waiting in the run queue to be processed:
 
 
The CPU state counters provide an overview of the CPU utilization in each state on the system. This screen shows a system with a high CPU idle percentage. A high CPU idle percentage means that the system is not busy:
 
 
If the CPU idle counter output is low, investigate which state is consuming the CPU time. The different states mean:
  • User is the percentage of the processor time used for running user processes, such as an application.
  • Nice is the percentage of the processor time used for a user process that is running with an altered scheduling priority.
  • System is the percentage of the processor time used for a system process, such as kernel or driver calls.
  • Irq is the percentage of the processor time used for hardware interrupt requests.
  • Softirq is the percentage of the processor time used for software interrupt requests.
  • Iowait is the percentage of the processor time spent waiting for the completion of disk input/output.
  • Idle is the percentage of the processor time that processors are free.
This screen shows the CPU idle state at 0%:
 
 
In this case, the CPU time is being consumed in the iowait state. When this happens, check the disk subsystem to determine what is delaying responses from the storage subsystem.
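The per-state percentages that top displays can also be derived by hand. The following is a minimal Python sketch, not taken from the original article: it assumes the counters come from the aggregate "cpu" line of /proc/stat, sampled twice, and that any state missing on an older kernel counts as zero. The function names are illustrative.

```python
STATES = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Return {state: ticks} from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(value) for value in fields[1:1 + len(STATES)]]
    # Older kernels report fewer columns; treat the missing states as zero.
    values += [0] * (len(STATES) - len(values))
    return dict(zip(STATES, values))


def cpu_percentages(before, after):
    """Return the share of CPU time spent in each state between two samples."""
    deltas = {state: after[state] - before[state] for state in STATES}
    total = sum(deltas.values())
    if total <= 0:
        return {state: 0.0 for state in STATES}
    return {state: 100.0 * ticks / total for state, ticks in deltas.items()}


def busiest_state(percentages):
    """Return the non-idle state that consumed the most CPU time."""
    candidates = {s: p for s, p in percentages.items() if s != "idle"}
    return max(candidates, key=candidates.get)
```

Take two samples a few seconds apart, pass the parsed lines to cpu_percentages(), and then call busiest_state() on the result. If it returns "iowait" while the idle share is near zero, the situation matches the one described above.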
 
Note: If the CPU time is being consumed in the user state, you can determine the process that is consuming the CPU from the list of tasks below the statistics. The list of tasks refreshes every few seconds to provide an updated view of the process list. In the following example vmware-hostd is consuming 0.9% of the available CPU:
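The same check can be scripted outside of top. The following is a minimal Python sketch, not part of the original article: it assumes the process list is captured as text from `ps -eo pcpu,comm` (a header row followed by one "%CPU COMMAND" row per process). The function name is hypothetical.

```python
def top_cpu_processes(ps_output, count=3):
    """Return (command, cpu_percent) pairs sorted by CPU use, highest first.

    ps_output is text in the shape printed by `ps -eo pcpu,comm`.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            rows.append((parts[1].strip(), float(parts[0])))
        except ValueError:
            continue  # ignore rows whose first column is not a number
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:count]
```

Given a listing in which vmware-hostd shows 0.9 in the %CPU column and the other processes show less, the first pair returned is ("vmware-hostd", 0.9).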

Checking for Memory Starvation of an ESX host

Memory and swap are the statistics you need to review. These statistics provide an overall indication of how much memory is being used and if there is heavy swapping occurring on the system. This screen shows an example of the expected output:
 
 
The example above indicates that there is 268248KB (268MB) of RAM in the system and that 84864KB (85MB) is free. There is 554168KB (554MB) of swap available in the system and 503152KB (503MB) is free. In this case there is substantial RAM available for the service console to use and therefore very little swapping occurs.
 
Note: This view shows only the amount of RAM that is assigned to the ESX host service console. It does not provide a view of the total RAM in the server.
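The free memory and free swap shares can be worked out from the same figures. The following is a minimal Python sketch, not part of the original article: it assumes the values are taken from /proc/meminfo in the "Name: value kB" layout, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} for 'Name:  value kB' lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header rows and any line whose first value is not a number.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def percent_free(total_kb, free_kb):
    """Return free_kb as a percentage of total_kb (0.0 if total is zero)."""
    return 100.0 * free_kb / total_kb if total_kb else 0.0


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the memory statistics from the given file."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

With the example figures above, percent_free(268248, 84864) shows that a little under a third of the RAM is free, and percent_free(554168, 503152) shows that roughly nine tenths of the swap is free. A low free RAM share combined with a shrinking free swap share is the pattern that points to memory starvation.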
 
To troubleshoot an ESX host that shows a low amount of RAM and high amount of swapping:
  • Disable any third-party services that have been installed for testing. These services may be using up memory resources.
  • Try increasing the amount of RAM that has been assigned to the ESX host service console. For more information, see Increasing the amount of RAM assigned to the ESX Server Service Console (1003501).
  • Check all virtual machine configurations to ensure none of them have an unreasonably high CPU reservation, like 10000MHz.
Note: You can also see the amount of memory and swap currently in use from the /proc/meminfo file.
I/O starvation can be caused by many issues, but it commonly occurs when a LUN is removed and the ESX host is not rescanned. To properly remove LUNs from your ESX host, see Unpresenting a LUN containing a datastore from ESX 4.x and ESXi 4.x (1015084).