VMware Workstation drivers fighting over NICs – Windows is the clear loser

Share on:

Overview

It has been a while since my last post and I partly blame the job change on it. lots to do, little time... those were the words I have written for this exact post back in 2015, however they still apply in 2020 as well apparently. This is a repost from an old blog of myself way back then simply to test out Hugo, diving a bit into Markdown and getting myself back into the habit of blogging.

The Problem

This weekend my laptop lost power and offered me the possibility to go into full nerd mode for a change of pace again. I basically updated to Windows 10 the day it came out and have been very happy with it since but after the power outage the login mask would suddenly completely freeze.

The first instinct was to blame the power outage and some sort of corrupt system update installation or inconsistent file data. Taking a look at the system restore points quickly revealed that no updates had been installed though.

No restore points available

The Troubleshooting

One thing I learned at my time at VMware was that usually the last thing you remember isn’t actually the last thing that changed on the system. So what was the real last thing that happened? The Reboot due to the power outage! There would be 2 clear and easy ways out of the predicament straight away. 1) Try and restore an earlier restore point and hope for the best, 2) Reinstall.

Both options to me sound pretty annoying though as I don’t perform regular backups and am too lazy to configure everything from scratch again. Additionally I am an Engineer and like to fix problems rather than to erase them.

The next phase was hoping for something just taking a while so after another reboot I put the laptop away for a couple of minutes, grabbed some food and saw a bluescreen (or rather the new equivalent in Windows 10) reading DPC_WATCHDOG_VIOLATION. Quickly doing some google research this error seems to be driver related, but I could not remember having installed any new driver components in ages except for a graphics driver a couple of weeks back and I have rebooted the system since then a couple of times. Since it seemed to be driver related though I tried safe mode booting in the hopes of actually getting some system access to see what else changed and possibly even take a look at the event log for any clues.

Safe mode did indeed let the system start just fine and the programs panel showed that indeed some Windows Updates had been installed. So I uninstalled those and also ran a chkdsk just to be sure the power outage did not screw up something on the file system in any way. Unfortunately this was also fruitless. So another safe mode boot followed.

The system event logs did not really reveal much in terms of any failing driver component as well but in this state my laptop would actually read a USB key just fine. This means I could take a peek at the memory dump the BSOD has been writing (never turn that feature off). So I copied the dump over to another machine, loaded the latest WinDBG binaries and began to analyze the dump.

 1    6: kd> !analyze -v
 2    *******************************************************************************
 3    * Bugcheck Analysis *
 4    *******************************************************************************
 5
 6    DPC_WATCHDOG_VIOLATION (133)
 7    The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
 8    or above.
 9    Arguments:
10    Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
11    DISPATCH_LEVEL or above. The offending component can usually be
12    identified with a stack trace.
13    Arg2: 0000000000001e00, The watchdog period.
14    Arg3: 0000000000000000
15    Arg4: 0000000000000000
16
17    SYMBOL_NAME: vmnetbridge+70f8
18    FOLLOWUP_NAME: MachineOwner
19    MODULE_NAME: vmnetbridge
20    IMAGE_NAME: vmnetbridge.sys
21    DEBUG_FLR_IMAGE_TIMESTAMP: 53d4fef8
22    BUCKET_ID_FUNC_OFFSET: 70f8
23    FAILURE_BUCKET_ID: 0x133_ISR_vmnetbridge!Unknown_Function
24    BUCKET_ID: 0x133_ISR_vmnetbridge!Unknown_Function
25    PRIMARY_PROBLEM_CLASS: 0x133_ISR_vmnetbridge!Unknown_Function
26    ANALYSIS_SOURCE: KM
27    FAILURE_ID_HASH_STRING: km:0x133_isr_vmnetbridge!unknown_function
28    FAILURE_ID_HASH: {9badc5ac-bdaf-6816-6d94-61ede9c1ab5f}

So the automatic analyzer thinks it’s the vmnetbridge driver but it will normally just take the first thing off the stack that it doesn’t know about and blame that. So let’s look at the stack.

 1    6: kd> k
 2    # Child-SP RetAddr Call Site
 3    00 ffffd000`9845bc78 fffff803`a49f3e7a nt!KeBugCheckEx
 4    01 ffffd000`9845bc80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
 5    02 ffffd000`9845bd10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
 6    03 ffffd000`9845bf40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
 7    04 ffffd000`9845bf70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
 8    05 ffffd000`9845bfb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
 9    06 ffffd000`989dda50 fffff803`a4896d53 nt!KiInterruptDispatchNoLockNoEtw+0x37
10    07 ffffd000`989ddbe0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x23
11    08 ffffd000`989ddc10 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
12    09 ffffd000`989ddc40 fffff800`2ba095d3 vmnetbridge+0x70f8
13    0a ffffd000`989ddd60 fffff800`2ba1c0e1 ndis!ndisMIndicateNetBufferListsToOpen+0x133
14    0b ffffd000`989dde20 fffff800`2ba2ae9f ndis!ndisMDispatchReceiveNetBufferListsWithLock+0x201
15    0c ffffd000`989ddf50 fffff800`2ba5410b ndis!ndisMTopReceiveNetBufferLists+0x21dbf
16    0d ffffd000`989de050 fffff800`2ba2c433 ndis!ndisInvokeNextReceiveHandler+0x4b
17    0e ffffd000`989de120 fffff800`2ba0cf67 ndis!ndisFilterIndicateReceiveNetBufferLists+0x1fe23
18    0f ffffd000`989de1c0 fffff800`2d18b06a ndis!NdisFIndicateReceiveNetBufferLists+0x57
19    10 ffffd000`989de200 fffff800`2d18bdf1 jnprns+0xb06a
20    11 ffffd000`989de260 fffff800`2d1876e2 jnprns+0xbdf1
21    12 ffffd000`989de2d0 fffff800`2d187b8e jnprns+0x76e2
22    13 ffffd000`989de360 fffff800`2ba0cc33 jnprns+0x7b8e
23    14 ffffd000`989de3e0 fffff800`2ba1ff2e ndis!ndisCallReceiveHandler+0x43
24    15 ffffd000`989de430 fffff803`a49091f5 ndis!ndisDataPathExpandStackCallback+0x3e
25    16 ffffd000`989de480 fffff800`2ba1ffd5 nt!KeExpandKernelStackAndCalloutInternal+0x85
26    17 ffffd000`989de4d0 fffff800`2ba5435a ndis!ndisExpandStack+0x19

Indeed after the internal Windows functions below the BugCheck we find vmnetbridge on the stack. Going a bit further down though we can see jnprns as well which seems to be the Juniper VPN client software and its system drivers. Since I didn’t need that software anymore anyway I decided to uninstall all components of it but again no change.

I was pondering to uninstall VMware Workstation next but an error in the uninstaller prevented this, in safe mode I had no real access to the custom drivers it installs and the uninstaller simply errors out unless it can remove everything (I would define that as a bug as the error message is simply generic without any hint on what is going wrong unless you look into the actual vminst.log).

But this actually got me thinking, one change I did since the last reboot was to install Oracle VirtualBox alongside VMware Workstation. Even though I did some due diligence on google before to see if there is any issues with having those 2 co-installed I did not stumble across anything that would have made me think again before installing it.

I know that Workstation has to interact with the Windows networking stack quite a bit for the host-only networks, and VirtualBox is actually doing the same. So what seems to be the issue here. Let’s take a look at all CPU callstacks in the dump.

  1    6: kd> !running -it
  2
  3    System Processors: (00000000000000ff)
  4    Idle Processors: (0000000000000000)
  5
  6    Prcbs Current (pri) Next (pri) Idle
  7    0 fffff803a4bf0180 ffffe001d0f43040 (12) fffff803a4c66740 …………….
  8
  9    # Child-SP RetAddr Call Site
 10    00 ffffd000`9854b8a0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x23
 11    01 ffffd000`9854b8d0 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 12    02 ffffd000`9854b900 fffff800`2ba095d3 vmnetbridge+0x70f8
 13
 14    1 ffffd000981dc180 ffffe001cbb84040 (12) ffffd000981e8cc0 …………….
 15
 16    # Child-SP RetAddr Call Site
 17    00 ffffd000`9818d418 fffff803`a4a8505d hal!KeQueryPerformanceCounter
 18    01 ffffd000`9818d420 fffff803`a4a80d3c nt!KiFreezeTargetExecution+0x2c5
 19    02 ffffd000`9818d530 fffff803`a49db344 nt!KeBugCheck2+0xc14
 20    03 ffffd000`9818dc40 fffff803`a49f3e7a nt!KeBugCheckEx+0x104
 21    04 ffffd000`9818dc80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
 22    05 ffffd000`9818dd10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
 23    06 ffffd000`9818df40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
 24    07 ffffd000`9818df70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
 25    08 ffffd000`9818dfb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
 26    09 ffffd000`98529710 fffff803`a4896d48 nt!KiInterruptDispatchNoLockNoEtw+0x37
 27    0a ffffd000`985298a0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x18
 28    0b ffffd000`985298d0 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 29    0c ffffd000`98529900 fffff800`2ba095d3 vmnetbridge+0x70f8
 30
 31    2 ffffd0009818e180 ffffe001ccbff040 (12) ffffd0009819acc0 …………….
 32
 33    # Child-SP RetAddr Call Site
 34    00 ffffd000`98269418 fffff803`a4a85055 nt!KiSaveProcessorControlState+0x97
 35    01 ffffd000`98269420 fffff803`a4a80d3c nt!KiFreezeTargetExecution+0x2bd
 36    02 ffffd000`98269530 fffff803`a49db344 nt!KeBugCheck2+0xc14
 37    03 ffffd000`98269c40 fffff803`a49f3e7a nt!KeBugCheckEx+0x104
 38    04 ffffd000`98269c80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
 39    05 ffffd000`98269d10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
 40    06 ffffd000`98269f40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
 41    07 ffffd000`98269f70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
 42    08 ffffd000`98269fb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
 43    09 ffffd000`989eba50 fffff803`a4896d50 nt!KiInterruptDispatchNoLockNoEtw+0x37
 44    0a ffffd000`989ebbe0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x20
 45    0b ffffd000`989ebc10 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 46    0c ffffd000`989ebc40 fffff800`2ba095d3 vmnetbridge+0x70f8
 47
 48    3 ffffd000982aa180 ffffe001ccbc1040 (12) ffffd000982b6cc0 …………….
 49
 50    # Child-SP RetAddr Call Site
 51    00 ffffd000`982e2418 fffff803`a4a85055 nt!KiSaveProcessorControlState+0x97
 52    01 ffffd000`982e2420 fffff803`a4a80d3c nt!KiFreezeTargetExecution+0x2bd
 53    02 ffffd000`982e2530 fffff803`a49db344 nt!KeBugCheck2+0xc14
 54    03 ffffd000`982e2c40 fffff803`a49f3e7a nt!KeBugCheckEx+0x104
 55    04 ffffd000`982e2c80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
 56    05 ffffd000`982e2d10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
 57    06 ffffd000`982e2f40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
 58    07 ffffd000`982e2f70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
 59    08 ffffd000`982e2fb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
 60    09 ffffd000`989c8710 fffff803`a4896d53 nt!KiInterruptDispatchNoLockNoEtw+0x37
 61    0a ffffd000`989c88a0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x23
 62    0b ffffd000`989c88d0 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 63    0c ffffd000`989c8900 fffff800`2ba095d3 vmnetbridge+0x70f8
 64
 65    4 ffffd00098323180 ffffe001d0edc040 (12) ffffe001d43ae080 (13) ffffd0009832fcc0 …………….
 66
 67    # Child-SP RetAddr Call Site
 68    00 ffffd000`9832f8c8 fffff803`a48518fe nt!KeBugCheckEx
 69    01 ffffd000`9832f8d0 fffff803`a4ae00e0 hal!HalBugCheckSystem+0x7e
 70    02 ffffd000`9832f910 fffff803`a48528de nt!WheaReportHwError+0x258
 71    03 ffffd000`9832f970 fffff803`a4a86c78 hal!HalHandleNMI+0xfe
 72    04 ffffd000`9832f9a0 fffff803`a49e33c2 nt!KiProcessNMI+0x150
 73    05 ffffd000`9832f9f0 fffff803`a49e3236 nt!KxNmiInterrupt+0x82
 74    06 ffffd000`9832fb30 fffff803`a4831b62 nt!KiNmiInterrupt+0x176
 75    07 ffffd000`9835b388 fffff803`a481d372 hal!HalpTscQueryCounter+0x2
 76    08 ffffd000`9835b390 fffff803`a4a8505d hal!KeQueryPerformanceCounter+0x62
 77    09 ffffd000`9835b3c0 fffff803`a4a80d3c nt!KiFreezeTargetExecution+0x2c5
 78    0a ffffd000`9835b4d0 fffff803`a49db344 nt!KeBugCheck2+0xc14
 79    0b ffffd000`9835bbe0 fffff803`a49f3e7a nt!KeBugCheckEx+0x104
 80    0c ffffd000`9835bc20 fffff803`a4896c26 nt! ?? ::FNODOBFM::`string’+0xa07a
 81    0d ffffd000`9835bcb0 fffff803`a489766d nt!KiUpdateRunTime+0x56
 82    0e ffffd000`9835bd10 fffff803`a481eaa6 nt!KeClockInterruptNotify+0x53d
 83    0f ffffd000`9835bf40 fffff803`a495e257 hal!HalpTimerClockInterrupt+0x56
 84    10 ffffd000`9835bf70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
 85    11 ffffd000`9835bfb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
 86    12 ffffd000`99786a50 fffff803`a4896d40 nt!KiInterruptDispatchNoLockNoEtw+0x37
 87    13 ffffd000`99786be0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x10
 88    14 ffffd000`99786c10 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 89    15 ffffd000`99786c40 fffff800`2ba095d3 vmnetbridge+0x70f8
 90
 91    5 ffffd00098360180 ffffe001ccbfe040 (12) ffffd0009836ccc0 …………….
 92
 93    # Child-SP RetAddr Call Site
 94    00 ffffd000`991fa530 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x23
 95    01 ffffd000`991fa560 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
 96    02 ffffd000`991fa590 fffff800`2ba095d3 vmnetbridge+0x70f8
 97
 98    6 ffffd000983dd180 ffffe001cdb55040 (12) ffffd000983e9cc0 …………….
 99
100    # Child-SP RetAddr Call Site
101    00 ffffd000`9845bc78 fffff803`a49f3e7a nt!KeBugCheckEx
102    01 ffffd000`9845bc80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
103    02 ffffd000`9845bd10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
104    03 ffffd000`9845bf40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
105    04 ffffd000`9845bf70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
106    05 ffffd000`9845bfb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
107    06 ffffd000`989dda50 fffff803`a4896d53 nt!KiInterruptDispatchNoLockNoEtw+0x37
108    07 ffffd000`989ddbe0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x23
109    08 ffffd000`989ddc10 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
110    09 ffffd000`989ddc40 fffff800`2ba095d3 vmnetbridge+0x70f8
111
112    7 ffffd0009849c180 ffffe001cba6d040 (12) ffffd000984a8cc0 …………….
113
114    # Child-SP RetAddr Call Site
115    00 ffffd000`984d4418 fffff803`a4a85055 nt!KiSaveProcessorControlState+0x97
116    01 ffffd000`984d4420 fffff803`a4a80d3c nt!KiFreezeTargetExecution+0x2bd
117    02 ffffd000`984d4530 fffff803`a49db344 nt!KeBugCheck2+0xc14
118    03 ffffd000`984d4c40 fffff803`a49f3e7a nt!KeBugCheckEx+0x104
119    04 ffffd000`984d4c80 fffff803`a48971cf nt! ?? ::FNODOBFM::`string’+0xa07a
120    05 ffffd000`984d4d10 fffff803`a481ed15 nt!KeClockInterruptNotify+0x9f
121    06 ffffd000`984d4f40 fffff803`a495e257 hal!HalpTimerClockIpiRoutine+0x15
122    07 ffffd000`984d4f70 fffff803`a49dc60a nt!KiCallInterruptServiceRoutine+0x87
123    08 ffffd000`984d4fb0 fffff803`a49dca37 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
124    09 ffffd000`9876fa50 fffff803`a4896d50 nt!KiInterruptDispatchNoLockNoEtw+0x37
125    0a ffffd000`9876fbe0 fffff803`a48942af nt!KxWaitForSpinLockAndAcquire+0x20
126    0b ffffd000`9876fc10 fffff800`2dc570f8 nt!KeAcquireSpinLockAtDpcLevel+0x1f
127    0c ffffd000`9876fc40 fffff800`2ba095d3 vmnetbridge+0x70f8

So all CPUs are stuck in waiting to acquire a spinlock which eventually leads to the timeout mentioned in the BSOD. Since it is a VMware networking driver I would suspect the issue to be there. What to do to fix this now though?

The Solution

I already tried putting the laptop into airplane mode so I was sure it was not the wifi driver conflicting here somehow. When I opened my network devices I saw something surprising though.

Windows network adapters

Additionally to the ones shown here there was 1 more connection, a VirtualBox host local adapter. So I deactivated all network interfaces and booted into normal mode and this time it actually succeeded. I activated all VMware adapters again and surely no issues even with the next reboot. Activating the VirtualBox adapter though immediately caused the issue to be there again. So to me it seems like the Workstation network drivers get stuck in a deadlock as soon as another product’s Windows network adapter is present. I am currently working around the issue by using NAT-only networking in VirtualBox and I did indeed never restart the laptop after having installed VirtualBox. There is no issue when adding an adapter during runtime of the operating system but a reboot grinds it to a complete halt.

This process overall cost me 4 hours easily but on the upside I did not lose any of my data and I could actually find the cause without having to resort to guess work or reinstall, my inner nerd is satisfied!


Closing Comments

This section is the new 2020 addition from my old copy and paste... I managed to fight myself through the Hugo setup and initial struggles with the web hoster (still fighting one more battle with inconsistent path linkage that I cannot explain, so far their tech support is looking into it). Since this is also somewhat readable it looks like my 10 minutes spent on reading through Markdown syntax in 2020 was worth that time...

Big thanks to Chip Zoller (@chipzoller) for providing the Clarity based Hugo theme for this!