For all my TAM customers who use vROps (vRealize Operations Manager), I usually configure the default Oversized Virtual Machine Report to be sent to them weekly. This report lists all the VMs that are reported as oversized, along with the recommended reduction in vCPU and memory for each VM. Reducing the size of VMs frees up resources and increases capacity for future use. This is done as part of resource reclamation.
We understand that not all VMs are good candidates for right-sizing or reclamation. Some have minimum vCPU and memory requirements stated by the OS or application vendor, and these must be honoured if the VM is to remain supported later on. Reducing the size of VMs can sometimes be challenging, not because of the technicalities but because of the politics.
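To put a number on what the weekly report could reclaim while still honouring vendor minimums, a small calculation helps. The sketch below is a hypothetical helper, not part of vROps: it assumes you have exported the configured and recommended vCPU/memory values from the report and added the vendor minimums yourself (`VmSizing`, `target_size` and `reclaimable` are illustrative names). It never shrinks a VM below its stated minimum and never grows it.

```python
from dataclasses import dataclass


@dataclass
class VmSizing:
    name: str
    vcpu: int                # currently configured vCPUs
    mem_gb: float            # currently configured memory (GB)
    rec_vcpu: int            # vCPUs recommended by the oversized report
    rec_mem_gb: float        # memory recommended by the oversized report (GB)
    min_vcpu: int = 1        # hypothetical: vendor/OS minimum vCPUs
    min_mem_gb: float = 0.0  # hypothetical: vendor/OS minimum memory (GB)


def target_size(vm: VmSizing) -> tuple[int, float]:
    """Honour the vendor minimum, and never size a VM above its current config."""
    vcpu = min(vm.vcpu, max(vm.rec_vcpu, vm.min_vcpu))
    mem = min(vm.mem_gb, max(vm.rec_mem_gb, vm.min_mem_gb))
    return vcpu, mem


def reclaimable(vms: list[VmSizing]) -> tuple[int, float]:
    """Total vCPUs and memory (GB) freed if every VM is resized to its target."""
    total_vcpu, total_mem = 0, 0.0
    for vm in vms:
        vcpu, mem = target_size(vm)
        total_vcpu += vm.vcpu - vcpu
        total_mem += vm.mem_gb - mem
    return total_vcpu, total_mem


# Hypothetical example rows: sql01 has a vendor minimum of 4 vCPU / 16 GB.
vms = [
    VmSizing("app01", 16, 64.0, 4, 16.0),
    VmSizing("sql01", 8, 32.0, 2, 8.0, min_vcpu=4, min_mem_gb=16.0),
]
print(reclaimable(vms))
```

Here sql01 stops at its 4 vCPU / 16 GB minimum instead of the report's 2 vCPU / 8 GB, so the total is smaller than the report alone would suggest.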
More often than not, right-sizing a VM can actually improve its performance. This is particularly true if the VM is configured with a lot of vCPU and memory. Not only does a big VM deprive other VMs of CPU and memory, it often has to wait until enough physical CPUs are free to schedule its many vCPUs. This can slow the VM down. In addition, we also have to take the NUMA architecture of the host into consideration. The rule of thumb is not to allocate more vCPUs to a VM than the host has physical cores. There is a NUMA-based sizing guide, which you can find here.
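The rule of thumb above can be expressed as a quick check. This is a minimal sketch, assuming one NUMA node per CPU socket (common, but not guaranteed on every host) and counting physical cores only, not hyperthreads. The `numa_check` function is hypothetical; the NUMA sizing guide remains the authority.

```python
def numa_check(vm_vcpu: int, host_sockets: int, cores_per_socket: int) -> str:
    """Classify a VM's vCPU count against the host's physical core layout.

    Assumes one NUMA node per socket; hyperthreads are deliberately ignored.
    """
    physical_cores = host_sockets * cores_per_socket
    if vm_vcpu > physical_cores:
        return "too big: exceeds the physical cores of the host"
    if vm_vcpu > cores_per_socket:
        return "wide VM: spans more than one NUMA node"
    return "fits within a single NUMA node"


# Example: a 2-socket host with 16 cores per socket (32 physical cores).
for vcpu in (8, 24, 48):
    print(vcpu, numa_check(vcpu, host_sockets=2, cores_per_socket=16))
```

A "wide VM" is not automatically wrong, but it is a good prompt to check the guide before accepting the configured size.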
I had a customer who came to me and asked what he could do with the weekly Oversized VM Report he received. Were there any other metrics he could look at to justify the recommended reduction in vCPU and memory? We know there are other metrics that can affect VM performance, such as CPU Ready time and IO wait, but the default report does not show these additional metrics.
I came across a blog whose writer had created a vROps dashboard showing VM right-sizing together with additional key metrics. I find it very useful and informative. Here is a view from one of my customers, whom I helped to set it up. From here you get a bird's-eye view of your clusters and of all the VMs within the cluster you have selected.
CPU Ready is one of the key metrics to look at when determining whether a VM is oversized. CPU Ready time, measured in ms, is the time a VM is ready to run but cannot be scheduled on the physical host CPUs, either because the host is busy running other VMs or because it does not have enough free physical CPUs for the VM. This can impact the performance of the VM. As seen below, a couple of VMs have CPU Ready time shown in RED. Any CPU Ready time higher than 1000 ms is considered high.
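CPU Ready in milliseconds is easier to compare across VMs when it is also expressed as a percentage of the sampling interval. The sketch below uses the commonly cited conversion: ready % = ready ms x 100 / (interval seconds x 1000), optionally divided by the number of vCPUs to get a per-vCPU figure. The interval is an input because it depends on where the value comes from (for example, 20 seconds for vCenter real-time charts); check how your vROps metric is aggregated before comparing percentages. The 1000 ms default in `is_high` simply mirrors the threshold mentioned above. Both function names are hypothetical.

```python
def ready_percent(ready_ms: float, interval_s: float, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    Dividing by vcpus gives the average ready percentage per vCPU.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


def is_high(ready_ms: float, threshold_ms: float = 1000) -> bool:
    """Flag a VM whose CPU Ready time exceeds the threshold (default 1000 ms)."""
    return ready_ms > threshold_ms


# 1000 ms of ready time in a 20 s real-time sample:
print(ready_percent(1000, 20))           # whole VM
print(ready_percent(1000, 20, vcpus=4))  # averaged per vCPU
```

For a 20 s sample this gives 5% for the whole VM, or 1.25% per vCPU on a 4-vCPU VM. The same 1000 ms spread over a longer interval is a much smaller percentage, which is why the interval matters.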
I set up this dashboard for my customer and showed them how to dive deeper into the Oversized VM Report and pull out key metrics to justify the importance of right-sizing to the VM owner.
Here are the steps to set up the dashboard in your vROps:
- Download the dashboard here
- Log in to vROps with administrative permissions
- Navigate to Visualize -> Dashboards -> Manage -> 3 dots -> Import
- Browse -> Select “Dashboard-2021-12-23 06-47-57 AM.zip”, select “Overwrite” and then click Import
- A new dashboard will be added to vROps. You can rename it.
- Next, we will import into vROps the Views that come with the download
- Navigate to Visualize -> Views -> Manage -> 3 dots -> Import
- Browse -> Select “Rightsize_Views.zip”, select “Overwrite existing View” and then click Import
- Once finished, you will see the following Views added
- Lastly, we need to import a Super Metric. This Super Metric shows the uptime of a VM
- To import, in vROps, navigate to Super Metrics -> 3 dots -> Import
- Browse -> Select “supermetric_uptime_days.json”, select “Overwrite existing Super Metric” and then click Import
- So far we have added a Dashboard, a couple of Views and a Super Metric. We are not done yet. Glad that you have reached this far.
- Next, we need to enable CPU Ready (ms) and the imported Super Metric. By default, CPU Ready (ms) is not enabled.
- To enable CPU Ready (ms) collection, navigate to Configure -> Policies -> Select the “Default” Policy. Note the “D” under the Priority column. “D” stands for Default
- Select the Default Policy -> Edit Policy -> Metrics and Properties -> Under Select Object Type: -> Virtual Machine -> All Filters: -> type “ready” and press Enter
- Expand Metrics -> CPU -> Select Ready (ms) -> Enabled State -> Instanced State -> Turn on CPU Collect -> click Save
- Finally, we enable the Super Metric. The steps are similar to enabling CPU Ready (ms). Navigate to Configure -> Policies -> Select Default Policy -> Edit Policy -> Metrics and Properties -> Select Object Type: Virtual Machine -> Expand Super Metrics
- Select uptime_in_days -> Enabled -> Click Save
There you have it. Easy!!!
Once you have completed all the steps, your dashboard is ready for viewing. It may take a couple of vROps collection cycles for the data to populate.
Here is another view of a cluster with several big VMs showing high CPU Ready time
You can see the VM uptime column generated by the Super Metric if you scroll to the far right of the table
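The uptime column is a unit conversion into days. The formula inside “supermetric_uptime_days.json” is not shown here, so the sketch below is only an illustration, assuming the underlying VM uptime metric is reported in seconds. `uptime_days` is a hypothetical helper for checking a value by hand.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def uptime_days(uptime_seconds: float, ndigits: int = 1) -> float:
    """Convert an uptime in seconds to days, rounded for display."""
    if uptime_seconds < 0:
        raise ValueError("uptime cannot be negative")
    return round(uptime_seconds / SECONDS_PER_DAY, ndigits)


print(uptime_days(1_296_000))  # 15 days
print(uptime_days(90_000))     # just over 1 day
```

A VM that has been up for a long time without a reboot is usually easier to schedule for a resize during its next maintenance window than one that is restarted often.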
At the bottom of the dashboard, you will also find a quick summary of all the hosts in the selected cluster
This is one of the most useful vROps dashboards for justifying right-sizing and resource reclamation to VM owners. It shows most of the key metrics in a single table for easy reference and consumption. I hope this post helps you optimize and right-size your VM workloads.