Virtual Machine hardware version influencing QueryPerformanceCounter behaviour on reboot aka Nagios showing wrong system uptimes

The VMware knowledge base already has a KB article about QueryPerformanceCounter that affects Windows XP and Windows Server 2003:

QueryPerformanceCounter behaves improperly inside a virtual machine when /usepmtimer is used with some Windows HALs

Last week a colleague of mine had another customer express concerns about how this counter is handled on his Windows Server 2008 R2 VMs. They noticed that under certain circumstances the counter would no longer reset when performing an in-guest reboot. For a lot of customers this would not be of any concern, but the customer's monitoring solution, in the way it was implemented on their site, relied on this counter to calculate the system uptime. So if the counter was not reset after a reboot, misleading uptimes would be shown.
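To make that dependency concrete, here is a minimal sketch of how an uptime figure can be derived from such a counter: the tick count divided by the counter frequency gives seconds. How the customer's monitoring plugin actually computes it is not known, so the helper names below are hypothetical. On Windows the two inputs would come from QueryPerformanceCounter and QueryPerformanceFrequency.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: convert a raw performance-counter reading into whole
// seconds, given the counter frequency in ticks per second.
// Returns 0 for a non-positive frequency instead of dividing by zero.
std::int64_t ticksToSeconds(std::int64_t ticks, std::int64_t frequency) {
    if (frequency <= 0) {
        return 0;
    }
    return ticks / frequency;
}

// Hypothetical helper: render a number of seconds as "<d>d <h>h <m>m <s>s".
std::string formatUptime(std::int64_t seconds) {
    const std::int64_t days = seconds / 86400;
    const std::int64_t hours = (seconds % 86400) / 3600;
    const std::int64_t minutes = (seconds % 3600) / 60;
    const std::int64_t secs = seconds % 60;
    return std::to_string(days) + "d " + std::to_string(hours) + "h " +
           std::to_string(minutes) + "m " + std::to_string(secs) + "s";
}
```

If a reboot does not reset the counter, the tick count simply keeps growing, and a calculation like this reports the time since the last power cycle rather than the time since the last guest reboot.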

The issue was easily reproducible using the following proof-of-concept piece of code.

// Basic test program to see if QueryPerformanceCounter resets on reboot

#include "stdafx.h"
#include <iostream>
#include <windows.h>

using namespace std;

int _tmain(int argc, _TCHAR* argv[]) {
    // Read the current value of the high-resolution performance counter
    LARGE_INTEGER StartingTime;
    QueryPerformanceCounter(&StartingTime);

    // Print the raw tick count; compare the value before and after a reboot
    cout << StartingTime.QuadPart << endl;
    return 0;
}

The results can be seen in the following pairs of screenshots, showing the counter before and after a reboot in a default Windows Server 2008 R2 installation using hardware version 8.

HW8_prereboot

As the customer expected, the counter did reset after rebooting.

HW8_postreboot

So let’s have a look at the exact same VM (a clone that simply got guest customization applied to avoid duplicate IPs on the network) after it was upgraded to hardware version 10.

HW10_prereboot

The program still works, which is a good thing 🙂 But initiating a reboot from within the guest and checking the counter again does give a different picture.

HW10_postreboot

We can see that the counter actually kept growing. It would only reset once we performed a real power cycle on the guest by shutting it down and powering it up again.

HW10_postshutdown

So what is different between those two hardware versions?

The answer can be found in the following VMware blog post.

Microsoft Operating System Time Sources and Virtual Hardware 10

We did change the timer hardware for that operating system in hardware version 10. Mark (@vmMarkA on Twitter, do follow if you are interested in virtualization and performance) was kind enough to establish contact with one of the devs responsible, who quickly pointed out the following source of information.

x86 architecture initial state

We can see that it is indeed expected behaviour for the CPU's TSC register to remain unmodified after an INIT operation and to be set to 0 only after a reset.
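The difference between the two restart paths can be illustrated with a toy model. This is not how a hypervisor or CPU is implemented; the type and function names are made up for illustration, and it assumes that an in-guest reboot reaches the virtual CPU as an INIT while a shutdown and power on results in a reset.

```cpp
#include <cstdint>

// Toy model only: the two ways a (virtual) CPU can be restarted.
enum class RestartKind {
    Init,   // soft reset, assumed here to be what an in-guest reboot causes
    Reset   // full reset / power-up, as after shutdown and power on
};

// Toy stand-in for a CPU's time-stamp counter (TSC).
struct ToyCpu {
    std::uint64_t tsc = 0;

    // Let the counter advance by the given number of ticks.
    void run(std::uint64_t ticks) { tsc += ticks; }

    // INIT leaves the TSC untouched; only a reset sets it back to 0.
    void restart(RestartKind kind) {
        if (kind == RestartKind::Reset) {
            tsc = 0;
        }
    }
};
```

In this model a counter that is derived from the TSC keeps its value across an Init restart, which matches what the hardware version 10 screenshots show.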

So how do we get out of this without rolling back to an older hardware version or using a different hardware timer that might impact guest performance? It turns out that there is a setting that can be placed either in the vmx file of the VM or in /etc/vmware/config, in which case all VMs will pick it up on their next power on.

monitor_control.enable_softResetClearTSC = "TRUE"
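When rolling this out to many VMs it can help to verify that the entry is really present. The following is a rough sketch of such a check on the text of a configuration file. The helper names are hypothetical, and it assumes one key = "value" entry per line and that the first matching entry decides, which may not match how the product treats duplicates.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: returns true if the configuration text sets
// monitor_control.enable_softResetClearTSC to TRUE (value case-insensitive).
// Assumption of this sketch: the first matching entry decides.
bool softResetClearTscEnabled(const std::string& configText) {
    const std::string wantedKey = "monitor_control.enable_softResetClearTSC";
    std::istringstream lines(configText);
    std::string line;
    while (std::getline(lines, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;          // not a key/value line
        if (trim(line.substr(0, eq)) != wantedKey) continue;
        std::string value = trim(line.substr(eq + 1));
        // Drop surrounding double quotes if present.
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
            value = value.substr(1, value.size() - 2);
        }
        std::transform(value.begin(), value.end(), value.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::toupper(c));
                       });
        return value == "TRUE";
    }
    return false;  // key not found
}
```

The function only inspects text that has already been read into a string, so reading the vmx file or /etc/vmware/config is left to the caller.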

The customer confirmed that this setting did indeed resolve their issue. Looking at the table from the blog post, Server 2012 might also be affected by this, so if monitoring solutions are showing wrong system uptimes and hardware version 10 VMs are in use, this is definitely one setting to try out.
