A Virtualization, Photography, Techie, Foodie, Gadget Blog

I am adding a new topic to the blog: High Performance Computing and Supercomputing on VMware vSphere.  This topic came up in a recent customer discussion on how they could build a distributed cloud for grid computing that scales on par with traditional supercomputers.  Coming from a family that has been involved with supercomputers for decades, I found this topic to be of keen interest.  As a result, I’ve started to collect a series of notes and white papers on High Performance Computing (HPC) on VMware vSphere and the changes that must be considered to maximize the performance and scalability of a virtual HPC cluster environment.

First, an old but mandatory read on the topic is a paper by Cam Macdonell and Paul Lu from the University of Alberta titled:  Pragmatics of Virtual Machines for High-Performance Computing: A Quantitative Study of Basic Overheads
http://www.vmware.com/files/pdf/paullu.vmware.final.pdf

Macdonell and Lu’s paper is a little dated, but many of the ideas still hold.  Points to be aware of are how vSphere 5 has been optimized to reduce overhead and allow for better performance, and the automated orchestration features that allow for capacity on demand.  The overhead concerns Macdonell and Lu close with have been reduced to less than 1% for most compute types, and as CPU performance increases every year, this overhead becomes smaller yet.

Next, VMware employee Jeff Buell posted a blog post on “HPC Application Performance on ESX 4.1: Stream” back in Sept 2010.

http://blogs.vmware.com/performance/2010/09/hpc-application-performance-on-esx-41-stream.html

Jeff takes great strides in identifying factors that will optimize performance of the compute cluster.  Most notable are writing applications to use local memory, which ensures optimal memory bandwidth once deployed, and keeping compute resources within a single NUMA node to optimize resource utilization.  While vSphere can address 1TB of RAM and up to 32 CPUs from a single VM, optimizing for performance lies in keeping VMs tuned and sized to run within the limits of the server the VM is hosted upon.
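
As a rough illustration of the sizing point above, here is a minimal Python sketch that checks whether a VM would fit inside one NUMA node of its host. The `Host` class, the function name, and the assumption of one NUMA node per socket with memory split evenly across sockets are mine for illustration only; they are not taken from Jeff's post or from any VMware tool.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host description (assumes one NUMA node per socket)."""
    sockets: int
    cores_per_socket: int
    memory_gb: float


def fits_in_numa_node(host: Host, vm_vcpus: int, vm_memory_gb: float) -> bool:
    """Return True if the VM's vCPUs and memory fit within a single NUMA node.

    Assumes memory is spread evenly across sockets, which is typical but
    not guaranteed on real hardware.
    """
    node_cores = host.cores_per_socket
    node_memory_gb = host.memory_gb / host.sockets
    return vm_vcpus <= node_cores and vm_memory_gb <= node_memory_gb


# Example: a two-socket, six-core-per-socket host with 96GB of RAM.
host = Host(sockets=2, cores_per_socket=6, memory_gb=96)
print(fits_in_numa_node(host, vm_vcpus=4, vm_memory_gb=32))
print(fits_in_numa_node(host, vm_vcpus=8, vm_memory_gb=32))
```

A VM that fails this check can still run; it simply spans nodes and may pay for remote memory access.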

Next, I want to make sure you follow the blog posts of Josh Simons.  Josh works in the Office of the CTO at VMware as a strategist specializing in HPC and maintains the VMware Blog posts on HPC here: http://communities.vmware.com/community/vmtn/cto/high-performance

Josh has contributed several videos and discussions on the topic.  With the recent Supercomputing 2011 event in Seattle, Josh pulled together several interviews and overviews of technologies that will enable Cloud-based HPC.

In addition, see Josh’s 2010 overview of HPC in the Cloud:

http://communities.vmware.com/community/vmtn/cto/high-performance/blog/2010/11/02/video-available-isc-cloud-10-in-frankfurt

Lastly, my own observations and comments:

With the recent release of vSphere 5 and Auto Deploy, maintaining a scalable Cloud infrastructure has become considerably simpler.  Updating an entire server farm can be reduced from weeks to minutes by leveraging PXE and an image server to refresh entire farms of servers at reboot.  By adding solutions such as templates, workflow orchestration, and capacity management, we can now scale up clusters of computers on demand to accommodate almost any size of distributed workload.  Adding vCloud Director and vCloud Connector allows us to scale the compute cluster even further, into one or more public cloud providers on demand.  In addition, with the scalability improvements of vSphere 5, we are finding larger VMs, more addressable RAM, less overhead, and significant IO gains at the hypervisor.  All of these improvements contribute to the greater acceptance of HPC workloads in the Cloud and in a VM.

As I dig deeper into the topic, I hope I can contribute some of my own personal works to the field and leverage the knowledge of my colleagues to ensure others can explore this emerging growth area.

Looks like VMware made public a revised vSphere 5.0 License Model today. A lot of very good changes that directly benefit the end user. ( http://www.vmware.com/files/pdf/vsphere_pricing.pdf )

The biggest changes are around vRAM Maximums per processor.

vRAM Entitlement per CPU Socket by vSphere edition
- 32GB vRAM/CPU for Essentials Kit (up from 24GB)
- 32GB vRAM/CPU for Essentials Plus Kit (up from 24GB)
- 32GB vRAM/CPU for Standard (up from 24GB)
- 64GB vRAM/CPU for Enterprise (up from 32GB)
- 96GB vRAM/CPU for Enterprise Plus (up from 48GB)

This means a 4-CPU server licensed with Enterprise Plus is entitled to 384GB of vRAM for powered-on VMs (4 x 96GB), a 100% increase over the initial model (4 x 48GB = 192GB). These entitlements can still be pooled across vCenter Server (including vCenter Servers in Linked Mode) and only apply to the configured virtual RAM of powered-on VMs. VDI users and VMware View users have a different license model for high-density VDI servers (discussed below).
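
The entitlement arithmetic is easy to sketch in Python. The per-CPU figures come from the list above; the function names and the simple compliance check are my own illustration (assuming all licenses in the pool are of one edition), not a VMware tool.

```python
# vRAM entitlement per CPU license in GB, from the revised list above.
VRAM_PER_CPU_GB = {
    "Essentials": 32,
    "Essentials Plus": 32,
    "Standard": 32,
    "Enterprise": 64,
    "Enterprise Plus": 96,
}


def pooled_entitlement_gb(edition: str, cpu_licenses: int) -> int:
    """Total pooled vRAM entitlement for CPU licenses of a single edition."""
    return VRAM_PER_CPU_GB[edition] * cpu_licenses


def within_entitlement(edition: str, cpu_licenses: int, powered_on_vram_gb) -> bool:
    """Compare the configured vRAM of powered-on VMs against the pool.

    powered_on_vram_gb is a list of configured vRAM sizes (GB), one per
    powered-on VM; powered-off VMs are simply left out of the list.
    """
    return sum(powered_on_vram_gb) <= pooled_entitlement_gb(edition, cpu_licenses)


print(pooled_entitlement_gb("Enterprise Plus", 4))  # 384
```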

Users of the free VMware vSphere Hypervisor will also be happier. Under the old model, the free hypervisor was capped at 8GB. The revision increases the cap to 32GB per server (not a per-CPU limit). This is a hard limit and, unlike in the retail versions, cannot be exceeded. Personally, I felt the 8GB limit was very limiting. The 32GB change is a significant step in the right direction and should accommodate most SMB, branch office, and home office users. Those who feel hindered by the 32GB hard limit should look at vSphere Essentials and Essentials Plus (starting at $560), which provide 6 CPUs (up to three servers with 2 CPUs each) with a vRAM entitlement of 192GB. Alternatively, move to the revised vSphere Acceleration Kits (Standard, Enterprise, and Enterprise Plus), which provide bundles of 6 CPUs at substantial discounts. Pricing for vSphere Essentials starts at $83/CPU retail!

Even better, there is no longer a penalty for very large VMs. Each VM only counts vRAM consumption up to 96GB; any vRAM over 96GB is not counted. That means a 1TB VM is now covered by a single Enterprise Plus CPU license. I can’t imagine running more than a single 1TB VM on a host, but people will always surprise you.
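
The per-VM cap can be expressed in a couple of lines of Python. The 96GB figure is from the revised model described above; the function names are hypothetical and only meant to show the counting rule.

```python
# Per-VM cap on counted vRAM (GB), from the revised licensing model.
VM_VRAM_CAP_GB = 96


def counted_vram_gb(configured_gb: float) -> float:
    """vRAM that counts against the pool for one powered-on VM."""
    return min(configured_gb, VM_VRAM_CAP_GB)


def total_counted_vram_gb(configured_sizes_gb) -> float:
    """Sum of counted vRAM across a list of powered-on VM sizes (GB)."""
    return sum(counted_vram_gb(size) for size in configured_sizes_gb)


# A 1TB (1024GB) VM counts as 96GB, which one Enterprise Plus CPU license covers.
print(counted_vram_gb(1024))  # 96
```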

vRAM usage is now monitored as a 12-month rolling average of daily high-water marks. This makes large, infrequent deployments less of an issue for customers who anticipate going over the vRAM entitlement but know that they will be removing VMs later.
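
To see why a short burst matters less under this scheme, here is a minimal Python sketch of a rolling average over daily high-water marks. The exact calculation VMware uses is not spelled out here, so the 365-day window, the class name, and the sampling approach are all assumptions for illustration.

```python
from collections import deque


class VramUsageTracker:
    """Hypothetical tracker: rolling average of daily vRAM high-water marks."""

    def __init__(self, window_days: int = 365):
        # Only the most recent window_days daily peaks are kept.
        self.daily_peaks = deque(maxlen=window_days)

    def record_day(self, samples_gb) -> None:
        """Store the highest vRAM sample (GB) observed during one day."""
        self.daily_peaks.append(max(samples_gb))

    def rolling_average_gb(self) -> float:
        """Average of the stored daily peaks, or 0.0 if nothing is recorded."""
        if not self.daily_peaks:
            return 0.0
        return sum(self.daily_peaks) / len(self.daily_peaks)


# Ten days at 300GB followed by a one-day burst to 500GB.
tracker = VramUsageTracker()
for _ in range(10):
    tracker.record_day([280, 300, 290])
tracker.record_day([300, 500])
print(round(tracker.rolling_average_gb(), 1))
```

The single burst day moves the average far less than it moves the peak, which is the point of averaging over a long window.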

As before, VMware View environments don’t follow the vRAM model. View is a CPU socket based license for View desktops. Non-VMware View users will be able to leverage a vSphere Desktop edition just for VDI. vSphere Desktop is licensed based on the total number of powered-on desktop virtual machines and can be purchased either stand-alone in packs of 100 desktop VMs or included with the VMware View bundle.

Another interesting note for current vSphere 4.1 users with valid Support and Subscription (SnS): customers who purchased licenses for vSphere 4.x (or previous versions) prior to September 30, 2011 to host desktop virtualization, and hold current SnS agreements, may upgrade to vSphere 5.0 while retaining access to unlimited vRAM entitlement. Desktop licenses covered by this provision, however, may not be managed by the same vCenter Server instance that is used to manage non-desktop OS virtual machines.

Lastly, I have heard that vCenter will get an update shortly after release to accurately report these last-minute changes. In the meantime, we should expect a new tool to report actual vRAM consumption.

There are lots of key aspects addressed in the initial vSphere 5 guide that have not changed but are different from the vSphere 4 model. Overall, I think the consumer gains a lot more value in the new editions: unlimited RAM per server, no more core/CPU limits, a substantial increase in the CPUs-per-VM limit by edition (32 CPUs/VM for Ent+!), and vMotion all the way down to the Standard edition.

A lot has changed. I see it all as a major improvement for both the SMB users and for the Enterprise.

vSphere 5 Feature Model per Edition

Taken from the revised vSphere 5 Pricing Guide

Over the past four years, I have been amazed at how many companies have resisted virtualizing Oracle on VMware products for fear of losing support from Oracle. Most had heard FUD over the years regarding corrupt data, performance issues, and DR nightmares from their Oracle account teams, while at the same time being told to virtualize Oracle on Oracle VM VirtualBox. While Oracle never claimed the platform was better, the reasoning was that Oracle could be accountable for the entire stack and any performance issues would be solely their own. This logic is flawed.

Fortunately, we no longer need to worry about Oracle denying support for a solution on VMware vSphere. As of November 2010, Oracle finally opened its eyes, and ears, to its customers’ feedback. The single document maintained by Oracle covering the support statements for Oracle products on VMware hypervisors has been updated. Document ID #249212.1, available on MyOracleSupport.com, now addresses all Oracle products, including Oracle Real Application Clusters (RAC) on VMware vSphere.

From the VMware side, we will take a proactive role in your Oracle support. From the VMware Oracle Support Page, we share the following steps to ensure your issues are resolved smoothly between Oracle and VMware.

Should you encounter a problem while running Oracle in a VMware environment, please follow these steps to ensure rapid resolution to your issue:

When troubleshooting Oracle 10 or 11 running on VMware vSphere 4:

      Open a Support Request with Oracle Support.
      Concurrently open a separate ticket with VMware Global Support Services (GSS) using your VMware Production Support, Business Critical Support or Mission Critical Support agreement.

VMware GSS provides support for our customers running Oracle products. VMware GSS will open a Support Request for all Oracle cases referred to VMware technical support, and will take complete ownership of the issue until resolution.

 

If your organization is concerned about running Oracle products in a VMware vSphere Environment, take a moment to review the VMware Customer Success Stories of customers running Oracle in a virtualized environment on vSphere.

Also, be sure to check out the VMware Communities post on Oracle RAC Performance on vSphere 4.1 posted December 16th, 2010.
 

Links:

Oracle RAC on vSphere 4.1
http://blogs.vmware.com/performance/2010/12/oracle-rac-performance-on-vsphere-41.html

VMware Support Policy for Oracle Products
http://vmware.com/support/policies/oracle-support.html

Oracle Database Customer Success Stories
http://vmware.com/solutions/partners/alliances/oracle-database-customers.html

VMware announced a promotion offering VMware Alive VM free when you purchase an eligible VMware vSphere product.  Under the promotion, you receive VMware Alive VM with the ability to manage up to 50 virtual machines, plus one year of basic support, at no additional charge.  This is a great way to get in on a production product without any additional investment.

As I stated a few days ago with the release of VMware Alive 7.2, Alive is a uniquely designed product that enables the VM administrator to have a comprehensive view of the overall performance of their complete vSphere environment.

  • Indicators of health, workload and capacity
  • Heat maps to easily locate trouble areas
  • Mapping of virtual machines to host to cluster and datacenter to get to root cause analysis
  • Trending and analytics for quick and effective problem solving

To qualify, you must purchase an eligible product between Nov 23, 2010 and March 1, 2011.

Promotion Details: http://www.vmware.com/landing_pages/vsphere-promotion/
Eligible products for vSphere Promotion

VMware vSphere Performance Resolution Cheat Sheet

After taking an ESX performance troubleshooting course a few years back, I regularly find that most of the issues we encounter in the field come down to vSphere admins not knowing the root cause of their performance issues.  Sure, you could take a wild guess and point at disk or CPU saturation, but often the issue is much more obscure.  As a result, I started working on a small cheat sheet to assist customers in troubleshooting the root cause of their performance issues in ESX.

Armed with my trusty putty.exe and my cheat sheet, I set out to validate a few performance issues with a friend’s server.  Stepping through the diagnostics and the output of esxtop quickly took us to an HBA issue.  All along we had been blaming the disk.

I’ve gotten so much value from this little gem, I am sharing it with the ESX Community! 

Download my VMware vSphere Performance Resolution Cheat Sheet.PDF

Additional resources:
VMware vSphere 4 Performance Troubleshooting Guide from VMware

Duncan Epping and Frank Denneman have been working hard on delivering a book called VMware vSphere 4.1 HA and DRS Technical deepdive (Volume 1).  Their hard work has paid off and the book is available this week through Amazon and CreateSpace!  I saw Jason Boche just picked his copy up yesterday and I just placed my order.
Here are some details on the book!  It looks great, Duncan and Frank!

VMware vSphere 4.1 HA and DRS Technical Deepdive zooms in on two key components of every VMware based infrastructure and is by no means a “how to” guide. It covers the basic steps needed to create a VMware HA and DRS cluster, but even more important explains the concepts and mechanisms behind HA and DRS which will enable you to make well educated decisions. This book will take you in to the trenches of HA and DRS and will give you the tools to understand and implement e.g. HA admission control policies, DRS resource pools and resource allocation settings. On top of that each section contains basic design principles that can be used for designing, implementing or improving VMware infrastructures.

Coverage includes:

  • HA node types
  • HA isolation detection and response
  • HA admission control
  • VM Monitoring
  • HA and DRS integration
  • DRS imbalance algorithm
  • Resource Pools
  • Impact of reservations and limits
  • CPU Resource Scheduling
  • Memory Scheduler
  • DPM

Praise for the HA and DRS technical deepdive:

Marc Sevigny (senior staff engineer VMware HA group): “Duncan Epping’s intimate knowledge of VMware HA internals, plus his experience working with HA installations in a wide range of configurations make him the de facto HA guru. Anyone considering or using VMware HA should become one of his devout followers.”

Anne Holler (senior staff engineer VMware Distributed group): “Frank Denneman has extensive knowledge and experience across a wide range of VMware products from ESX to vCenter to  DRS. He has written an indispensable reference for using and understanding DRS and DPM.”

VMware vSphere 4.1 HA and DRS Technical deepdive (Volume 1)

Somehow I missed this announcement last week!  VMware vCenter CapacityIQ 1.5 released on Dec. 2nd, 2010.

vCenter CapacityIQ helps customers analyze, optimize and forecast capacity in their vSphere environments. Release 1.5 adds compelling new capabilities around storage and reporting:

  • Storage Analytics – visibility into storage capacity, forecast, bottlenecks via disk space and storage I/O trending
  • Resource Optimizations  – storage aware workload modeling and what-if scenarios, detection of constrained and underutilized hosts and outlier detection
  • Scheduled Reports – automated delivery of capacity utilization and optimization reports

https://www.vmware.com/products/vcenter-capacityiq/

On December 1st, VMware released an updated PowerCLI.  The new release introduces commands and features such as:

  • ESXCLI functionality is now available directly through a new Get-EsxCli cmdlet
  • Esxtop statistics through a Get-EsxTop cmdlet
  • Enhanced vDS support
  • Support for vCenter Server alarms
  • Various host storage enhancements
  • Encrypted credential store

Some of the features I am excited to see include:

  • Wait-Tools: allows you to wait for VMware Tools on the specified virtual machines to load before proceeding.
  • Added support for querying disk and disk partition information of hosts through the Get-VMHostDisk and Get-VMHostDiskPartition cmdlets.
  • Added support for converting hard disks from Thin to Thick and vice versa through the DiskStorageFormat parameter of Move-VM.

In all, this should make scripting and automating builds and labs much easier!

PowerCLI 4.1.1 Documentation

The full PowerCLI 4.1.1 Changelog

Download the PowerCLI 4.1.1 Code

Remember: If you need to do a task more than twice, script it!