Proxmox Best Practice Part 5 – Performance optimization: squeeze out the last few percent!

Proxmox Performance Optimizations – A few helpful tips for you

CPU optimizations for virtual machines: configuring the host CPU type correctly

When you create a VM, Proxmox uses a generic CPU type by default. This is compatible, but not optimal for performance. With the CPU type ‘host’, you can pass all the features of your physical CPU through to the VM:

qm set 100 --cpu host

This setting ensures that the VM can use all the CPU features of the host system, from dedicated instruction sets to hardware acceleration. This brings significant performance advantages, but makes the VM less portable between different hardware systems.
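
A quick way to see what ‘host’ would expose is to compare the CPU flags on the host with those inside the guest. A minimal check, assuming a Linux guest; run the same command on both sides and compare the output:

# List a few interesting CPU flags (run on the host and inside the VM)
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -Ex 'aes|avx|avx2|sse4_2'

If flags such as aes or avx2 only show up on the host, the guest is still running with a more generic CPU model.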

Alternatively, you can choose a specific CPU type and add specific features:

qm set 100 --cpu "Haswell,+aes"

Here, the Haswell CPU type with AES encryption is used. This is useful when you need a balance between performance and compatibility.

NUMA Awareness for Better Memory Performance

For modern multi-socket systems, NUMA (Non-Uniform Memory Access) is an important performance factor. You can first view the NUMA topology of your system:

numactl --hardware

For VMs with high performance requirements, you should enable NUMA:

qm set 100 --numa 1
qm set 100 --memory 8192 --sockets 2 --cores 4

This configuration divides the VM resources according to the physical NUMA nodes. This reduces memory latency and improves overall performance, especially for memory-intensive applications.
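
To verify that the guest memory is actually spread across the NUMA nodes, you can inspect the VM's QEMU process on the host. A small check, assuming VM 100 and the usual Proxmox PID file location:

# Show per-NUMA-node memory usage of the QEMU process for VM 100
numastat -p $(cat /var/run/qemu-server/100.pid)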

Understand and apply memory optimizations

Configure ballooning intelligently

Memory ballooning is a clever feature that dynamically redistributes RAM between host and VMs. By default, it is enabled:

qm set 100 --balloon 4096 # Minimum RAM

This value determines how much RAM the VM retains at least, even if the balloon system is active. For most applications, this works well and saves RAM.

However, for performance-critical VMs, you should disable ballooning:

qm set 100 --balloon 0

Without ballooning, the VM has constant access to all its RAM, providing more consistent performance – especially important for databases or real-time applications.
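
Whichever way you go, it is worth double-checking what is actually configured. A quick look, assuming VM 100 as in the examples above:

# Show the current balloon setting of the VM (no output means the default is in effect)
qm config 100 | grep -i balloon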

Huge pages for memory-intensive workloads

Huge Pages reduce memory management overhead for VMs with a lot of RAM. First, you have to enable them on the host:

echo 'vm.nr_hugepages=1024' >> /etc/sysctl.conf
sysctl -p

The number depends on your available RAM – each Huge Page is usually 2MB in size. Then you activate them for the VM:

qm set 100 --hugepages 2   # page size in MB; use 1024 for 1GB pages

This is especially beneficial for VMs with 8GB+ RAM, as it significantly improves memory performance.
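
Before starting the VM, check whether the host has actually reserved the pages. A simple sanity check:

# Show how many huge pages are allocated and free, and their size
grep -i huge /proc/meminfo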

Optimizing Storage I/O Performance

I/O schedulers for different storage types

The I/O scheduler decides how disk accesses are organized. For SSDs, mq-deadline is optimal:

echo mq-deadline > /sys/block/sda/queue/scheduler

SSDs have no mechanical parts, so a simple scheduler is best. For HDDs, however, bfq (Budget Fair Queueing) is the better choice:

echo bfq > /sys/block/sdb/queue/scheduler

BFQ considers the mechanical properties of HDDs and optimizes accordingly. To make these settings permanent, create a udev rule:

cat > /etc/udev/rules.d/60-scheduler.rules << 'EOF'
# SSD Scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
# HDD Scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
EOF
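
After creating the rule, you can reload udev and verify which scheduler is active; the one currently in use is shown in square brackets:

# Reload the udev rules and re-trigger them for existing devices
udevadm control --reload-rules && udevadm trigger
# The active scheduler per disk appears in brackets
cat /sys/block/sd*/queue/scheduler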

Understanding Disk Cache Modes

The cache modes determine how write operations are handled:

  • writethrough: Each write is committed to the disk immediately. This is very safe, but also the slowest mode, because every write waits for the disk to confirm it.
  • writeback: Write operations are first buffered in RAM and later written to the hard disk. This is much faster, but carries the risk of data loss in the event of a sudden power failure.
  • none: Disables any caching. This is ideal for shared storage systems such as NFS or Ceph, where the storage system itself takes over the caching (see the example after this list).
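
For example, for a VM disk that lives on shared storage you would typically switch host-side caching off; the VM ID and disk name are the hypothetical ones used throughout this article:

qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=none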

Apply VM-specific I/O optimizations

I/O threads outsource disk operations to separate threads:

qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1

This reduces CPU load on the main thread of the VM and improves I/O performance.

For maximum performance, you can activate the writeback cache:

qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback

For SSDs, you should also enable TRIM support and SSD optimizations:

qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on,ssd=1

The discard=on option enables TRIM commands, which help the SSD manage deleted blocks. The ssd=1 flag tells the VM that the disk is an SSD, which activates guest-internal optimizations.
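
To confirm that TRIM actually reaches the SSD, you can trigger it manually. Inside the guest, fstrim reports how much was discarded; if the QEMU guest agent is installed, you can also trigger it from the host (VM 100 is the example ID used throughout):

# Inside the guest: trim all mounted file systems and report the result
fstrim -av
# From the host, via the guest agent (requires the agent to be installed and enabled)
qm guest cmd 100 fstrim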

You should implement these optimizations step by step while monitoring performance.

Not every optimization fits every workload, so test in a development environment before adapting your production VMs.


Network performance

VirtIO optimizations:

# Activate multi-queue
qm set 100 --net0 virtio,bridge=vmbr0,queues=4
# SR-IOV / PCI passthrough for dedicated hardware
qm set 100 --hostpci0 0000:01:00.0,pcie=1

Monitoring and troubleshooting

Important log files

# Proxmox logs
tail -f /var/log/daemon.log             # General Proxmox logs
tail -f /var/log/pve-firewall.log       # Firewall logs
tail -f /var/log/pveproxy/access.log    # Web interface accesses
# VM-specific logs
tail -f /var/log/qemu-server/100.log    # VM 100 logs
# Cluster logs
tail -f /var/log/corosync/corosync.log
tail -f /var/log/pve-cluster/pmxcfs.log

Performance monitoring (Disk health, CEPH Monitoring, Notifications)

CLI tools:

# CPU and memory usage
htop
# Disk I/O
iotop -ao
# Network traffic
nethogs
# Process monitoring
ps aux --sort=-%cpu | head -20

Retrieve RRD graphs via API:

# CPU usage for a node
curl -k -H "Authorization: PVEAPIToken=root@pam!monitoring=SECRET" \
  "https://proxmox:8006/api2/json/nodes/proxmox1/rrddata?timeframe=hour&cf=AVERAGE"

Common problems and solutions

Problem: ‘TASK ERROR: command “lvcreate” failed’

# LVM thin pool full - free up space
lvs --all                        # Show pools
lvextend -L +50G /dev/pve/data   # Expand the pool

Problem: VM does not start – ‘kvm: could not insert module’

# Load the KVM modules
modprobe kvm
modprobe kvm-intel   # or kvm-amd
# Activate permanently
echo kvm >> /etc/modules
echo kvm-intel >> /etc/modules   # or kvm-amd

Problem: High I/O wait in VMs

# Check I/O statistics
iostat -x 1
# Set VM-specific I/O limits
qm set 100 --scsi0 local-lvm:vm-100-disk-0,mbps_rd=100,mbps_wr=50

Extended scenarios

High availability cluster

3-node cluster setup:

# On node 1
pvecm create mycluster
# On nodes 2 and 3
pvecm add 192.168.1.10   # IP of node 1
# Check cluster status
pvecm status

Configuring Fencing (important for split-brain avoidance):

# Activate the watchdog timer
echo softdog >> /etc/modules
update-initramfs -u
# Configure a fencing device (e.g. IPMI)
ha-manager add fence-device ipmi --options "lanplus=1,username=admin,password=secret,ip=192.168.1.100"

GPU passthrough for VMs

Activate IOMMU:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"   # Intel
# or
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"     # AMD
update-grub
reboot

Pass GPU to VM:

# View PCI devices
lspci -nn | grep -i nvidia
# Disconnect the GPU from the host driver
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
# Pass it through to the VM
qm set 100 --hostpci0 0000:01:00.0,pcie=1,x-vga=1

Proxmox Advanced features

Cloud init: VM creation

What is Cloud-Init And why should you use it?

Cloud-Init is the de facto standard package for initializing VM instances. Think of it like an intelligent initial setup wizard that automatically configures your VMs. Instead of manually copying SSH keys, configuring the network, and installing software after each VM creation, Cloud-Init does it all on the first boot.

This not only saves you time, but also makes your VM deployments reproducible and error-resistant. Cloud-Init enables dynamic deployment of instances without manual intervention.

Creating a Cloud-Init Template - The Right Way

Step 1: Choosing the Right Cloud Image

Not all cloud images are the same. Here are some examples of downloads:

# Debian 12 (Bookworm) - stable, with long-term support
wget https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2
# Debian 13 (Trixie) - testing, for the latest features
wget https://cloud.debian.org/images/cloud/trixie/latest/debian-13-generic-amd64.qcow2
# Ubuntu 24.04 LTS (Noble Numbat)
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
# Ubuntu 25.04 (Plucky Puffin) - minimal installation
wget https://cloud-images.ubuntu.com/minimal/releases/plucky/release/ubuntu-25.04-minimal-cloudimg-amd64.img

Pro tip: The minimal images are much smaller and contain only the absolutely necessary packages. Perfect for container-like deployments.
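
Whichever image you pick, it is worth verifying the download before importing it. A hedged example for the Debian 12 image; the exact checksum file name and location are assumptions and may differ per distribution:

# Download the published checksums and verify the image
wget https://cloud.debian.org/images/cloud/bookworm/latest/SHA512SUMS
sha512sum -c SHA512SUMS --ignore-missing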

Step 2: Create and configure the VM correctly

# Create the VM with optimal settings
qm create 9000 --memory 2048 --cores 2 --name ubuntu-cloud-template \
  --net0 virtio,bridge=vmbr0 --agent enabled=1
# VirtIO SCSI controller - essential for modern Linux distributions!
qm set 9000 --scsihw virtio-scsi-pci
# Import the cloud image (adjust the path!)
qm set 9000 --scsi0 local-lvm:0,import-from=/path/to/noble-server-cloudimg-amd64.img
# Add the cloud-init drive
qm set 9000 --ide2 local-lvm:cloudinit
# Set the boot order
qm set 9000 --boot order=scsi0
# Enable the serial console (important for cloud images!)
qm set 9000 --serial0 socket --vga serial0
# Enlarge the disk (cloud images are often only 2GB)
qm disk resize 9000 scsi0 +8G
# Mark as template
qm template 9000

Why these specific settings?

  • VirtIO SCSI: Modern cloud images expect this controller
  • Serial console: Cloud images often use the serial console instead of VGA
  • QEMU Agent: Enables better integration between host and VM
  • Disk resize: Cloud images are deliberately kept small
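
Before you start cloning, it does no harm to review the finished template once more:

# Show the final template configuration (template ID 9000 from the steps above)
qm config 9000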

Deploy VMs from Template Intelligently

Basic deployment

# Clone the template with a meaningful name
qm clone 9000 201 --name webserver-prod-01 --full
# Cloud-init basic configuration
qm set 201 --sshkey ~/.ssh/id_rsa.pub
qm set 201 --ipconfig0 ip=10.0.10.201/24,gw=10.0.10.1
qm set 201 --nameserver 1.1.1.1
qm set 201 --searchdomain example.com
qm set 201 --ciuser admin
qm set 201 --cipassword $(openssl passwd -6 "SuperSafe password123!")
# Start the VM
qm start 201
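
If you need several VMs of the same type, the individual steps above can also be wrapped in a small loop. A sketch, assuming template 9000 and that the VM IDs 202-204 and the matching IPs in 10.0.10.0/24 are still free:

# Clone and configure three more web servers from template 9000 in one go
for i in 202 203 204; do
  qm clone 9000 $i --name webserver-prod-$i --full
  qm set $i --sshkey ~/.ssh/id_rsa.pub --ipconfig0 ip=10.0.10.$i/24,gw=10.0.10.1 --ciuser admin
  qm start $i
done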

Advanced configuration with custom user data

This is where it gets really powerful! Create a custom cloud config:

# /var/lib/vz/snippets/webserver-config.yaml
#cloud-config
locale: en_US.UTF-8
timezone: Europe/Berlin

# Install packages
packages:
  - nginx
  - git
  - htop
  - curl
  - wget
  - unzip
  - vim
  - ufw

# Configure users
users:
  - name: admin
    groups: [adm, sudo]
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAB... # Your SSH key here

# Configure and start services
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - ufw allow ssh
  - ufw allow 'Nginx Full'
  - ufw --force enable
  - sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
  - systemctl reload sshd

# Replace the Nginx default page
write_files:
  - path: /var/www/html/index.html
    content: |
      <!DOCTYPE html>
      <html>
      <head><title>Web server ready</title></head>
      <body>
        <h1>The server is ready!</h1>
        <p>Deployed on: $(date)</p>
      </body>
      </html>

# Restart the system after setup
power_state:
  delay: 1
  mode: reboot
  message: "Cloud-Init setup completed, rebooting..."

Use template with Custom Config:

qm set 201 --cicustom "user=local:snippets/webserver-config.yaml"
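
Proxmox merges your snippet with the options set via qm. You can inspect exactly what will be served to the VM before the first boot:

# Dump the effective cloud-init user data for VM 201
qm cloudinit dump 201 user
# The generated network configuration can be checked the same way
qm cloudinit dump 201 network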

Advanced cloud init features

Vendor data for special configurations

# /var/lib/vz/snippets/vendor-config.yaml
#cloud-config
# Advanced network configuration
network:
  version: 2
  ethernets:
    ens18:
      dhcp4: false
      addresses:
        - 10.0.10.201/24
      gateway4: 10.0.10.1
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
        search: [example.com]
qm set 201 --cicustom "user=local:snippets/webserver-config.yaml,vendor=local:snippets/vendor-config.yaml"

Metadata for VM-specific information

# Metadata can also be set via the API
qm set 201 --tags "production,webserver,nginx"
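
The same information can be read back over the API, which is handy for inventory scripts; the node name proxmox1 matches the monitoring examples further down and is of course an assumption:

# Read the VM configuration, including tags, via the CLI/API
pvesh get /nodes/proxmox1/qemu/201/config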

Optimize network performance

VirtIO Multi-Queue for better throughput

# Activate multi-queue (number = CPU cores)
qm set 100 --net0 virtio,bridge=vmbr0,queues=4
# For very high loads: optimize packet processing
qm set 100 --net0 virtio,bridge=vmbr0,queues=8,mtu=9000

What does that mean? Multi-queue distributes network interrupts across multiple CPU cores. Single queue uses only one core for network I/O.
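
Whether the queues are actually in use can be checked inside the guest; the interface name eth0 is an assumption and may be ens18 or similar on your system:

# Show the current and maximum number of queues of the virtio NIC (inside the guest)
ethtool -l eth0
# If needed, raise the number of combined queues up to the configured maximum
ethtool -L eth0 combined 4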

SR-IOV for dedicated hardware performance

# Pass a PCI device through directly (highest performance)
qm set 100 --hostpci0 0000:01:00.0,pcie=1
# With ROM-BAR for better compatibility
qm set 100 --hostpci0 0000:01:00.0,pcie=1,rombar=1

When to use SR-IOV? For high-performance applications such as firewalls, load balancers or if you need native NIC features.
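
As a rough sketch of the host side: with an SR-IOV-capable NIC you first create virtual functions and then pass one of them to the VM. Interface name, VF count and PCI address are assumptions and depend entirely on your hardware and driver:

# Create 4 virtual functions on the physical NIC (must be supported by NIC and driver)
echo 4 > /sys/class/net/eno1/device/sriov_numvfs
# The new VFs show up as separate PCI devices
lspci | grep -i 'virtual function'
# Pass one VF to the VM instead of the whole card
qm set 100 --hostpci0 0000:01:00.2,pcie=1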

Master monitoring and troubleshooting

Systematically monitor important log files

# Proxmox core logs
tail -f /var/log/daemon.log              # General system events
tail -f /var/log/pve-firewall.log        # Firewall activity
tail -f /var/log/pveproxy/access.log     # Web interface accesses
# VM-specific logs (adjust the VM ID)
tail -f /var/log/qemu-server/100.log     # VM 100 QEMU logs
# Cluster-specific logs
tail -f /var/log/corosync/corosync.log   # Cluster communication
tail -f /var/log/pve-cluster/pmxcfs.log  # Cluster file system

Performance monitoring like the professionals

CLI tools for live monitoring

# System overview with htop
htop
# Disk I/O in detail
iotop -ao   # Shows cumulative I/O statistics
# Network traffic by process
nethogs
# Top CPU consumers
ps aux --sort=-%cpu | head -20
# Detailed memory usage
free -h && echo "---" && cat /proc/meminfo | grep -E "(MemTotal|MemFree|MemAvailable|Cached|Buffers)"
# Test storage performance
dd if=/dev/zero of=/tmp/testfile bs=1G count=1 oflag=dsync

Retrieve RRD data via API (for own dashboards)

# CPU usage for a node
curl -k -H "Authorization: PVEAPIToken=root@pam!monitoring=your-secret-here" \
  "https://proxmox:8006/api2/json/nodes/proxmox1/rrddata?timeframe=hour&cf=AVERAGE"
# VM-specific metrics
curl -k -H "Authorization: PVEAPIToken=root@pam!monitoring=your-secret-here" \
  "https://proxmox:8006/api2/json/nodes/proxmox1/qemu/100/rrddata?timeframe=day"
# Storage metrics
curl -k -H "Authorization: PVEAPIToken=root@pam!monitoring=your-secret-here" \
  "https://proxmox:8006/api2/json/nodes/proxmox1/storage/local-lvm/rrddata"

Solving common problems – tried-and-tested solutions

Problem: ‘TASK ERROR: command “lvcreate” failed’

# Check storage status
df -h
lvs --all
vgs
# Extend the LVM thin pool
lvextend -L +50G /dev/pve/data
# or by percentage
lvextend -l +100%FREE /dev/pve/data
# If the pool is completely full:
# stop VMs and delete snapshots first
qm list
qm stop VMID
qm delsnapshot VMID snapshot-name

Problem: VM does not start - KVM modules are missing

# Check virtualization support
egrep -c '(vmx|svm)' /proc/cpuinfo   # Should be > 0
# Load the KVM modules manually
modprobe kvm
modprobe kvm-intel   # Intel CPUs
# or
modprobe kvm-amd     # AMD CPUs
# Activate permanently
echo "kvm" >> /etc/modules
echo "kvm-intel" >> /etc/modules   # or kvm-amd
# Check
lsmod | grep kvm

Problem: High I/O wait hurts performance

# Analyze I/O statistics in detail
iostat -x 1 5   # Five samples at one-second intervals
# Set VM-specific I/O limits
qm set 100 --scsi0 local-lvm:vm-100-disk-0,mbps_rd=100,mbps_wr=50,iops_rd=1000,iops_wr=500
# Dedicated I/O thread for individual VM disks
qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1

Problem: Out-of-Memory Kills (OOM)

# Check for OOM kills and memory overcommitment
grep -i oom /var/log/kern.log
# Adjust VM memory balancing
qm set 100 --balloon 0    # Disable ballooning
qm set 100 --shares 2000  # Higher CPU priority
# Optimize host memory handling
echo 1 > /proc/sys/vm/overcommit_memory   # Aggressive overcommit

Build high-availability clusters

3-Node Cluster Setup - Production Ready

# Node 1 (initialize the cluster)
pvecm create production-cluster --bindnet0_addr 192.168.1.10 --ring0_addr 192.168.1.10
# Join node 2
pvecm add 192.168.1.10 --ring0_addr 192.168.1.11
# Join node 3
pvecm add 192.168.1.10 --ring0_addr 192.168.1.12
# Validate cluster status
pvecm status
pvecm nodes

Configure fencing for split-brain protection

# Enable the hardware watchdog
echo "softdog" >> /etc/modules
update-initramfs -u
# Configure IPMI fencing (recommended for production)
ha-manager add fencing-device ipmi-node1 \
  --options "lanplus=1,username=admin,password=secret,ip=192.168.100.10"
ha-manager add fencing-device ipmi-node2 \
  --options "lanplus=1,username=admin,password=secret,ip=192.168.100.11"
ha-manager add fencing-device ipmi-node3 \
  --options "lanplus=1,username=admin,password=secret,ip=192.168.100.12"
# Configure HA services
ha-manager add vm:100 --state started --node node1 --max_restart 2
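
Once resources are under HA control, the current manager state and resource placement can be checked at any time:

# Show quorum, master node and the state of all HA-managed resources
ha-manager status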

Setting up shared storage for HA

# Ceph cluster as internal storage
ceph-deploy new node1 node2 node3
ceph-deploy install node1 node2 node3
ceph-deploy mon create-initial
# Or external NFS/iSCSI storage
pvesm add nfs shared-nfs --server 192.168.1.200 --export /storage/proxmox \
  --content images,vztmpl,backup
# Configure storage replication
pvesr create-local-job 100-0 node2 --schedule "*/15"

GPU passthrough for power users

Enable IOMMU correctly

# Edit the GRUB config
vim /etc/default/grub
# For Intel CPUs:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# For AMD CPUs:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# Update GRUB and reboot
update-grub
reboot

GPU blacklisting and VM assignment

# Find the GPU PCI IDs
lspci -nn | grep -i nvidia
# Example output: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e82]
# Blacklist the host drivers
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia*" >> /etc/modprobe.d/blacklist.conf
# Load the VFIO modules
echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
echo "vfio_virqfd" >> /etc/modules
# Bind the GPU to VFIO
echo "options vfio-pci ids=10de:1e82,10de:10f8" > /etc/modprobe.d/vfio.conf
update-initramfs -u
reboot
# Pass the GPU through to the VM
qm set 100 --hostpci0 0000:01:00.0,pcie=1,x-vga=1
# For multi-function GPUs: pass both PCI functions
qm set 100 --hostpci0 0000:01:00.0,pcie=1,x-vga=1
qm set 100 --hostpci1 0000:01:00.1,pcie=1   # Audio part of the GPU

Fix GPU passthrough issues

# Check the IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
# Workaround for GPU reset bugs
echo "options vfio-pci disable_vga=1" >> /etc/modprobe.d/vfio.conf
# vendor-reset for AMD GPUs
git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset
make && make install
echo "vendor-reset" >> /etc/modules
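
Before starting the VM, it is worth confirming that the GPU is really bound to vfio-pci and no longer to the host driver; the PCI address matches the example above:

# The line 'Kernel driver in use' should show vfio-pci
lspci -nnk -s 01:00.0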

Backup strategies for professionals

Automated backup jobs

# Daily backup of all VMs
pvesh create /cluster/backup --schedule "02:00" --mode snapshot \
  --compress lzo --node proxmox1 --storage backup-nfs --all 1 \
  --mailto admin@example.com
# Incremental backups for large VMs
pvesh create /cluster/backup --schedule "06:00" --mode snapshot \
  --compress zstd --node proxmox1 --storage backup-nfs \
  --vmid 100,101,102 --bwlimit 50000

External backup replication

# Set up PBS (Proxmox Backup Server) as a storage backend
pvesh create /storage --storage pbs-backup --type pbs \
  --server backup.example.com --datastore proxmox-backups \
  --username backup@pbs --password secret --fingerprint XX:XX:XX...
# Configure backup retention
pvesh set /cluster/backup/backup-job-id --prune-backups "keep-daily=7,keep-weekly=4,keep-monthly=6"
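
For one-off backups outside the regular schedule, vzdump can be called directly; storage name and VM ID follow the examples above:

# Ad-hoc snapshot backup of VM 100 to the backup storage
vzdump 100 --storage backup-nfs --mode snapshot --compress zstd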

This smorgasbord covers some important aspects you need for professional and fast Proxmox deployments. From cloud-init automation to high-availability GPU clusters, you'll find tried-and-tested solutions to real-world challenges.

Conclusion and further resources

Proxmox is a powerful tool, but with great power comes great responsibility. 😉 The best practices shown here are the result of hands-on experience. Start with the basics and gradually work your way up to the advanced features.

Your next steps:

  1. Build a testing/staging environment: Test all configurations in a separate environment
  2. Implement monitoring: Monitor your system from the beginning
  3. Test your backup strategy: Perform regular restore tests
  4. Join the Community: The Proxmox forum is very helpful

So remember: take your time and understand the basics before you move on to more complex setups. The Proxmox admin guide, which I have linked several times in this article as a reference, is also worth its weight in gold. Have a look around the forum if you have a question; the official YouTube channel is another good entry point. And for those of you working in an enterprise environment: the makers of Proxmox also offer training courses.

I have also linked the remaining parts of this article series here again for you: Part 1: network | Part 2: storage | Part 3: backup | Part 4: security | Part 5: performance

And the most important thing again at the end:
Always have a working backup.