Removing stale hosts/cluster compute resource entries from vRA 7
Update: This procedure removed all compute resource configuration within my fabric groups, including for the compute resources I did not want to remove (potentially because I had not yet created any reservations that actually used these fabric groups and compute resources). Please use with caution!
After finally finding the time to set up the latest version of vRealize Automation (7.0.1) and adding my vCenter endpoint, everything was looking good. Then I got carried away, performed some other tasks, renamed a few clusters in vCenter, and only then went back to configure my fabric groups.
Too bad that data collection had already run before I renamed the clusters… so as a result I now had stale entries for compute resources when trying to add a new fabric group.
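To make "stale" concrete: a stale entry is a compute resource name that vRA collected earlier but that no longer exists in vCenter. The sketch below only illustrates that idea as a simple comparison of two name lists. It does not talk to vRA or vCenter, the function name `find_stale` is made up for this example, and every cluster name other than "Lab" is hypothetical.

```python
def find_stale(collected, current):
    """Return names that were collected earlier but no longer exist.

    collected: names vRA picked up during an earlier data collection
    current:   names that exist in vCenter right now
    """
    current_set = set(current)
    # Keep the original order of the collected list.
    return [name for name in collected if name not in current_set]


# Hypothetical inventory: "Lab" was renamed after data collection ran.
collected_by_vra = ["Lab", "Management"]
current_in_vcenter = ["Lab-Renamed", "Management"]

print(find_stale(collected_by_vra, current_in_vcenter))
```

Anything this comparison returns is what shows up as a leftover choice when building a fabric group.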
As we can see, there is no longer a cluster called “Lab” in my environment. I tried a tip from someone who had seen this in the past and stopped and restarted the agent, but that didn’t exactly help.
Since I hadn’t set anything up yet, I tried the brutal method of simply removing the endpoint itself, without much success either…
Well, at least we get an error message. Putting that into Google led to: https://kb.vmware.com/kb/2105809.
Too bad that the KB is only marked for 6.x.
Next step: Annoy Carl Prahl and Kimberly Delgado to squeeze an answer out of them. This did help. The method seems to be valid for 7.0 as well, but they also mentioned that the same result could be achieved via the CloudClient. I had never played with that before, so I thought I would give it a go.
The download and documentation can be found at: https://developercenter.vmware.com/tool/cloudclient/4.0.0.
First impression: Runs under Windows => me likey! Needs Java … oh crap :/
Second impression: I am too lazy to read documentation, but it does tab completion and it has the help command built in for live documentation while trying stuff.
So first things first, let’s try help and see what happens.
Long command list and I actually see something with compute resources in there as well. So let’s list those.
Or not… guess we need to log in first (there are ways to automate this; that exercise is left to the reader to figure out).
That actually worked, cool! Now I can also list all or only inactive compute resources, progress!
Let’s pause here for a second… whoever wrote this message deserves a good timeout in a quiet corner. This is just utter ^&*()! Why on earth would you scare any end user by telling them that a manual process is needed, and that skipping it before continuing might lead to data corruption, WITHOUT actually disclosing what that process is, and then just prompt the user to continue? This is just really, really bad design.
What makes this even worse is that, even after searching around, the above-mentioned KB article was the only thing I could find on how to remove stale hosts or clusters from vRA. There is nothing in the official documentation about this, and nothing in the CloudClient documentation about this “manual process” either.
So I decided to continue on, as it is a lab… I snapshotted the appliance and made an IaaS DB backup to prepare for the worst case of corruption and a resulting rollback.
The next step tells you which resources will be removed (since I don’t have any active reservations, this maps to all entities vRA has collected so far). After confirming that as well, you get a surprise though.
After basically having wasted my time, the developer of this great tool now decides it’s time to let me in on the secret of the manual action after all. Worst possible timing, you just increased my frustration by a factor of 10… Luckily it’s a rather easy task to perform.
The compute resources are indeed gone now, so it is time to start the agent again and let it collect new data. Indeed, after a couple of seconds my stale “Lab” cluster is gone. Mission accomplished, and I didn’t even need to remove the endpoint after all. I did lose all configuration for compute resources in my fabric groups though, so do be careful with this!