Proxmox HA Cluster Configuration for Virtual Machines

 

If you are learning the Proxmox hypervisor or want high-availability cluster resources for learning and self-hosting services with some resiliency, building a cluster is not too difficult. Also, you can easily create Proxmox HA virtual machine clustering once you create cluster nodes. Let’s look at Proxmox HA virtual machine deployment and how to ensure your VM is protected against failure and increase uptime, much like VMware HA.

Proxmox cluster: the starting point

The starting point for a high availability solution with Proxmox is the Proxmox cluster. Most start with a single Proxmox server in the home lab. However, building a cluster requires a 2nd and third node. There are ways to increase the vote of one node if you have two Proxmox servers in a cluster if one goes down. However, for “production” Proxmox VE, having 3 nodes is the standard for configuring a minimum Proxmox clusters for scalability and avalability.

Below, I have three nodes in a Proxmox cluster running Proxmox 8.1.3 in the Proxmox UI.

Proxmox cluster running three nodes in an ha configuration
Proxmox cluster running three nodes in an ha configuration

A Proxmox cluster includes multiple Proxmox servers or nodes that operate together as a logical unit to run your workloads. Understanding how to set up and manage the PVE cluster service effectively is important to ensure your VM data is protected and you have containers hardware redundancy.

Remember that this doesn’t replace all the other best practices with hardware configurations, such as redundant network hardware and power supplies in your Proxmox hosts and UPS battery backup as the basics.

 

Shared storage

When you are thinking about a Proxmox cluster and virtual machine high availability, you need to consider integration with shared storage as part of your design. Shared storage is a requirement so that all Proxmox cluster hosts have access to the data for your VMs. If a Proxmox host goes down, the other Promox hosts can pick up running the VM with the data they already have access to.

You can run a Proxmox cluster where each node has local storage, but this will not allow the VM to be highly available.

For my test cluster, I configured Proxmox Ceph storage. However, many other types of shared storage can work such as an iSCSI or other connection to a ZFS pool, etc. Below, we are navigating to Ceph and choosing to Install Ceph.

Getting started installing ceph
Getting started installing ceph

This launches the Info screen. Here I am choosing to install Ceph Reef and using the No-Subscription repo.

Starting the ceph setup
Starting the ceph setup

Type Y to begin the installation of Ceph.

Confirm the ceph installation
Confirm the ceph installation

Create the Ceph Pool, including configuring the:

  • Name
  • Size
  • Min Size
  • Crush Rule
  • # of PGs
  • PG autoscale mode

Below is the default value for the configuration.

Creating the ceph pool
Creating the ceph pool

A healthy Ceph pool after installing Ceph on all three nodes, creating OSDs, Managers, Monitors, etc.

A healthy ceph storage configuration and pool
A healthy ceph storage configuration and pool

 

Setting Up Your Proxmox Cluster

The journey to high availability begins with the creation of a Proxmox cluster. Here, we’ll guide you through the process of joining multiple nodes to form a unified system. Each Proxmox node will contribute to the cluster’s overall strength, offering redundancy and reliability.

Key Steps in Creating a Proxmox Cluster

  1. Choosing Cluster Nodes: Selecting the right Proxmox nodes is the first step. Ensure that each node is equipped with redundant network hardware and adequate storage capabilities.
  2. Configuring the Network: A stable cluster network is vital. We’ll explore how to set up a network that supports HA, focusing on IP address configuration and avoiding common pitfalls like split brain scenarios.
  3. Cluster Formation: The process of forming a cluster involves initializing the first node and then adding additional nodes with the join cluster function. We’ll walk you through the commands and steps necessary to create your cluster.

Let’s look at screnshots of creating a Proxmox cluster and joining nodes to the cluster. Navigate to Datacenter > Cluster > Create Cluster.

Beginning to create the cluster
Beginning to create the cluster

This will launch the Create Cluster dialog box. Name your cluster. It will default to your primary network link. You can Add links as a failover. Click Create.

Create the new cluster
Create the new cluster

The task will begin and should complete successfully.

The cluster creation completes successfully
The cluster creation completes successfully
Get the cluster join information
Get the cluster join information

You can then click the Cluster join information to display the information needed to join the cluster for the other nodes. You can click the copy information button to easily copy the join information to the clipboard.

Viewing the join information
Viewing the join information

On the target node, we can click the Join Cluster button under the Datacenter > Cluster menu.

Join the cluster from another node
Join the cluster from another node

Now we can use the join information from our first node in the cluster to join additional nodes to the cluster. You will also need the root password of the cluster node to join the other Proxmox nodes.

Entering the join information
Entering the join information

Below, I have created a cluster with 4 Proxmox hosts running Ceph shared storage.

A 4 node proxmox ve cluster
A 4 node proxmox ve cluster

 

Configuring Virtual machine HA

Once your Proxmox cluster is operational, the next step is configuring HA for your virtual machines. The Virtual Machine HA config provides automation for restarting a VM that is owned by a failed host, on a healthy host. This involves setting up shared storage, understanding HA manager, and defining HA groups administration.

High Availability Setup Requirements

When provisioning Proxmox high availability, there are a number of infrastructure requirements.

  1. Shared Storage Configuration: For VMs to migrate seamlessly between nodes, shared storage is a necessity as we have mentioned above so data does not have to move during a failover.
  2. The HA Manager: Proxmox’s HA manager plays a critical role in monitoring and managing the state of VMs across the cluster. It works like an automated sysadmin. After you configure the resources it should oversee, such as VMs and containers, the ha-manager monitors their performance and manages the failover of services to another node if errors occur. Also, the ha-manager can process regular user commands, including starting, stopping, relocating, and migrating services.
  3. Defining HA Groups (optional) : HA groups determine how VMs are distributed across the cluster.

Let’s look at a basic example of configuring a single VM for high availability. Below, in the web interface we have navigated to the Datacenter > HA > Resources > Add button. Click the Add button.

Add resources for ha
Add resources for ha

Select the VM ID to create the HA resource.

Add the resource id for the vm you want to be ha
Add the resource id for the vm you want to be ha

This will configure a service for the VM to make the VM highly available. The service will start and enter the started state. Now, we have the VM configured for HA.

The ha service for the vm is started
The ha service for the vm is started

Configuring HA groups (optional)

The HA group configuration file /etc/pve/ha/groups.cfg defines groups of cluster nodes and how resources are spread across the nodes in the cluster. You can configure resources to only run on the members of a certain group. You can use this to give priority to certain VMs, on certain hosts. Below is the Create HA Group configuration dialog box.

Create an ha group for resources
Create an ha group for resources

Fencing device configuration

Fencing is important for managing node failures in Proxmox VE. When a host goes down and is completely offline, it prevents resource duplication during recovery. Without fencing, resources could run on multiple nodes simultaneously which can corrupt data.

Unfenced nodes can access shared resources, posing a risk. For instance, a VM on an unfenced node might still write to shared storage even if it’s unreachable from the public network, causing race conditions and potential data loss if the VM is started elsewhere.

Proxmox VE employs various fencing methods, including traditional ones like power cutoffs and network isolation, as well as self-fencing using watchdog timers. These timers, integral in critical systems, reset regularly to prevent system malfunctions. If a malfunction occurs, the timer triggers a server reboot. Proxmox VE utilizes built-in hardware watchdogs on modern servers or falls back to the Linux Kernel softdog when necessary.

Fencing configuration in proxmox ve
Fencing configuration in proxmox ve

 

Now, I simulated a failure of the Proxmox host by disconnecting the network connection. The pings to the VM start timing out.

Virtual machine going down after a failed proxmox host
Virtual machine going down after a failed proxmox host

The host is now taking down the VM resource.

Proxmox ha viewing the server as dead
Proxmox ha viewing the server as dead

The HA process will restart the VM on a healthy host.

Virtual machine restarted on a healthy proxmox host
Virtual machine restarted on a healthy proxmox host

After just a couple of minutes, the VM restarts and starts pinging on a different host.

The proxmox ha virtual machine configuration has brought the vm back up
The proxmox ha virtual machine configuration has brought the vm back up

Best Practices for Cluster Health

  1. Regular Updates and Backups: Keeping your Proxmox servers and VMs up-to-date is critical. High availability is not a replacement for VM and container backup. Always protect your data with something like Proxmox Backup Server.
  2. Monitoring Tools and Techniques: Proxmox has several tools for monitoring the health and performance of your cluster. Keep a check on your cluster health. Make sure monitor your nodes from the GUI and ensure things like shared storage are in a healthy state.
  3. Handling Node Failures: Even with HA, node failures can happen. We’ll cover the steps to recover from a failed node and how to ensure minimal impact on your virtual machines.
  4. Documentation: Be sure to document the configuration of the cluster, including IPs, storage configuration, etc.

Rebooting Proxmox Servers running HA 

If you want to reboot a Proxmox server for maintenance or other that is part of an HA cluster, you need to stop the following service on the node, either from the command line or GUI:

/etc/init.d/rgmanager stop

 

Frequently Asked Questions About Proxmox HA Configuration

Can I configure HA with just two Proxmox nodes?

Yes, it’s possible to configure HA with two nodes, but it’s not ideal due to the potential risk of split-brain scenarios. For optimal redundancy and reliability, a minimum of three nodes is recommended.

 
How does Proxmox handle VM migration in HA setups?

Proxmox automatically migrates VMs from a failed node to a functioning one within the cluster. This process is managed by the HA manager, which monitors node and VM states to initiate automatic failover.

 
What are the key considerations for Proxmox HA network configuration?

Key considerations include having a redundant network setup, ensuring reliable IP address allocation, and configuring a separate network for cluster communication to prevent data traffic interference.

 
Can I use local storage for Proxmox HA?

Local storage can be used, but it doesn’t support live migration of VMs in case of node failure. Shared storage solutions like Ceph or NFS from a NAS as an option are preferred for true HA capabilities and settings.

 
What happens if the HA manager fails?

Proxmox’s HA manager is designed with redundancy. If the primary manager fails and a change, another node in the cluster takes over its duties and goes into action, ensuring continuous monitoring and management of the HA setup.

 
How do I update VMs in a Proxmox HA cluster without downtime?

Use live migration to move VMs to another node in order to update the original node software. This ensures that your VMs remain operational during updates, minimizing downtime.

 
What is the role of a quorum in a Proxmox cluster?

A quorum is used to ensure that decisions (like VM failover) are made reliably in the cluster. It prevents split-brain scenarios by requiring a majority of nodes to agree on the cluster state.

Wrapping Up

Proxmox virtualization has some great features, including high-availability configuration for virtual machines. In this article, we have considered the configuration of a high-availability Proxmox VE cluster and then configuring high availability for VMs. In the comments, let me know if you are running a Proxmox VE cluster, what type of storage you are using, and any other details you would like to share.

Post a Comment

0 Comments