One of the great things about running the home lab is you get to triage issues just like you would see in production. I came across a new issue for me in the home lab running vSAN 7.0 on top of the latest build of vSphere ESXi 7.0b, build 16324942. This has to do with one of the Skyline health checks for vSAN calle vSAN daemon liveness. In this post, we will take a look at vSphere vSAN daemon liveness check failed with a failure starting the epd process. Let’s dive more into this issue and see the symptoms of the issue.
vSphere vSAN daemon liveness check failed
I came into the lab this morning with a “red bang” on the vSAN cluster. After looking at the Skyline health of the cluster itself, I saw the vSAN daemon check was the culprit.
The VMware KB here describes it this way:
“vSAN daemons may still have issues, but this test does a very basic check to make sure that they are still running. If this reports an error, the state of the CLOMD, EPD, and CMMDSD service(s) is not working as expected and needs to be checked on the relevant ESXi host. A good way to further probe into CLOMD health is to perform a virtual machine creation test (Proactive tests), as this involves object creation that will exercise and test CLOMD thoroughly. For more information about this issue, refer to the following article: vSAN CLOMD daemon may fail when trying to repair objects with 0 byte components (2149968)“
Interestingly for me, this was an issue that was only with the EPD component of the vSAN daemon check.
The other two checks were normal. The check performs checks on the following:
- COMD
- EPD
- CMMDSD
From the KB:
Data node of stretched cluster | Witness node of stretched cluster | Data node of metadata cluster | Metadata node of metadata cluster | |
CLOMD | Yes | No | Yes | Yes |
EPD | Yes | No | Yes | No |
CMMDSD | Yes | Yes | Yes | Yes |
Troubleshooting
According to the KB, you can try a few things for troubleshooting purposes including the following:
For me, I only had an issue with the EPD service. When I run the relevant commands for only that service, I received the following:
As you can see, manually attempting to start the service fails with the error:
- EPD uses a ramdisk for the db file
On the left, I have the server that is “unhealthy” and on the right, I have a server that is “healthy” from the vSAN daemon liveness checks. As you can see, the server on the right, there are two files:
- epd-store.db
- epd-store.db-journal
So, with a couple of these findings, is it corruption? I am running the ESXi 7.0 hosts booting from USB devices.
One weird thing, in Googling the error, I have turned up a couple of other posts related to vSAN 7.0 with ESXi 7.0b hosts. Take a look at those posts here:
As of today, I have not resolved the issue. However, definitely seeing some weird layout on my USB boot disk for sure. It makes me wonder however with others seeing the exact same error if this is a bug with 7.0b.
Another interesting note to make about this error, it does not prevent the vSAN host from running VMs. As soon as I brought this host up from multiple reboots troubleshooting the issue, each time, DRS was able to successfully place virtual machines on the server as you can see below. For now, I am going to leave the host in play to continue to tinker with the issue for a possible workaround.
Please make a comment if you have ran into this issue with vSAN 7.0 and the vSAN daemon liveness check error.
Wrapping Up
The vSAN Daemon Liveness Check Failed error with vSAN 7 in my case was due to the EPD service failing with a ramdisk for the db file error message. I am still attempting to narrow this down to corruption on the boot disk or perhaps a bug with the 7.0b release due to finding two other cases of this being posted out in the community forum as well as blog posts. Let me know if you guys see this.
2 Comments
Hi! I'm not sure if you resolved this, but I had the same problem with booting from an SD card with vSAN running. Ultimately, it appears that a symbolic link was missing within the /var/lib/vmware/osdata directory.
ReplyDeleteYou need to have a link from /var/lib/vmware/osdata/locker to your main locker VMFS volume. You can find this number by running "df -h' and getting the volume ID.
Then, you should be able to run 'ln -s /vmfs/volumes/5fb5b4fa-f178e79a-1b6c-78e3b51827a4/ /var/lib/vmware/osdata/locker' where you use your volume ID.
Vsan Daemon Liveness Check Failed – Vsan 7 >>>>> Download Now
ReplyDelete>>>>> Download Full
Vsan Daemon Liveness Check Failed – Vsan 7 >>>>> Download LINK
>>>>> Download Now
Vsan Daemon Liveness Check Failed – Vsan 7 >>>>> Download Full
>>>>> Download LINK Lf