Hypervisors: Major Upgrades (Debian Bookworm, QEMU v8 & Linux 6.2)

Resolved

The minor and major upgrades to the final compute node have been successfully applied.

It paid off to be cautious: investigating the fault, uncovering the bug, and helping the hypervisor developer.

Testing came back stable, but please let us know if you’re seeing any issues.

Updated

We’ll be upgrading the final node tonight. Thank you for your patience.

The fix appears to be stable and effective, so this upgrade should be uneventful.

The 3-year-old bug that resurfaced here should be gone for good very shortly.

Ended
Updated

G’day,

We’ve completed the major upgrade works on all machines except the one that hit the bug.
That machine will be upgraded either tomorrow or next weekend, depending on a few factors.

Most clients are now past all impact. The minor and major updates went well, and VM kernels have also been updated.

We’re grateful for your patience throughout these works. If you’ve any questions or concerns, do reach out.

Cheers,
LEOPARD.host

Updated

G’day,

Just a quick update: works impacting the following service types are now successfully completed:

  1. Internet Radio
  2. Corporate Assets
  3. Legacy Web Hosting

However, for our Fully Managed clients, we hit an LVM metadata parsing[1] bug[2] in the GRUB bootloader. While there is a simple fix (regenerate the metadata, then reinstall GRUB to the boot partition, etc.), we’re endeavouring to help the hypervisor developer understand the cause. We have seen this bug on only 1 machine of many, so regardless of their responsiveness, we will work through all other non-impacted machines this evening: they completed minor patching OK and are just pending a reboot before we can do the major upgrades.

It is likely that they won’t need further information, in which case we hope to retry the update on the impacted node (the major update, following the minor patching update) along with the other machines. On-VM patching is already done for Fully Managed clients; however, VM power cycling is synced with the hypervisors, so the new kernel versions won’t take effect until the 2 reboots per node have been applied to the machines in the impacted category.

Please let us know if you’ve any queries or concerns before works continue this evening. Thank you.

Cheers,
LEOPARD.host

[1] GRUB apparently fails to parse LVM metadata correctly when there is a wraparound in the metadata ring buffer.
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008 (marked as fixed, but the bug either remains or this is an incredibly similar one)
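To illustrate footnote [1]: LVM keeps its text metadata in a circular (ring) buffer, so a newer copy can start near the end of the metadata area and continue from the beginning. The following is a minimal Python sketch of that idea only, not GRUB’s or LVM’s actual code; the tiny fixed-size header, the byte layout, and the function names (`write_circular`, `read_circular`, `read_naive`) are hypothetical simplifications. It shows how a reader that ignores the wraparound ends up with a truncated record.

```python
def _check(area_len: int, header_size: int, offset: int, size: int) -> None:
    # The usable ring is everything after the (hypothetical) fixed header.
    if not header_size <= offset < area_len:
        raise ValueError("offset outside the ring")
    if size > area_len - header_size:
        raise ValueError("record larger than the ring")


def write_circular(area: bytearray, header_size: int, offset: int, data: bytes) -> None:
    """Write data at offset, continuing just after the header if it runs off the end."""
    _check(len(area), header_size, offset, len(data))
    split = len(area) - offset  # bytes that fit before the end of the area
    if len(data) <= split:
        area[offset:offset + len(data)] = data
        return
    area[offset:] = data[:split]
    area[header_size:header_size + len(data) - split] = data[split:]


def read_circular(area: bytes, header_size: int, offset: int, size: int) -> bytes:
    """Wraparound-aware read: join the tail of the area with the start of the ring."""
    _check(len(area), header_size, offset, size)
    split = len(area) - offset
    if size <= split:
        return area[offset:offset + size]
    return area[offset:] + area[header_size:header_size + size - split]


def read_naive(area: bytes, offset: int, size: int) -> bytes:
    """Reader that ignores wraparound: the slice silently stops at the end of the area."""
    return area[offset:offset + size]


if __name__ == "__main__":
    HEADER_SIZE = 4
    area = bytearray(b"HDR!" + b"." * 12)  # 16-byte area: 4-byte header + 12-byte ring
    record = b"{vg0:root}"                 # 10 bytes, written near the end so it wraps
    write_circular(area, HEADER_SIZE, 10, record)
    print(bytes(area))                                               # b'HDR!oot}..{vg0:r'
    print(read_circular(bytes(area), HEADER_SIZE, 10, len(record)))  # b'{vg0:root}'
    print(read_naive(bytes(area), 10, len(record)))                  # b'{vg0:r'
```

The only difference between the two readers is the second slice taken from just after the header; a parser that skips it sees a cut-off record, which is the kind of wraparound case footnote [1] describes.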

Began
Maintenance Planned

G’day,

We’ll be performing major upgrades to all production hypervisors, following successful testing. These works were originally booked for 2 weeks ago but were not completed, so they have been rescheduled.

This will be conducted remotely, as the hardware that required manual intervention during specific update sequences was decommissioned some months ago. If assistance is required, we have smart-hands on-site 24/7/365, as well as TNC specialists who can travel to site to support them.

We have already completed minor patching within the current major release, but have not yet rebooted the nodes to apply it. During these works, one node at a time, guests will be shut down gracefully; the node will then be rebooted, upgraded to the new major release, and rebooted again; and once testing is successful, guests will be started up and return to normal service.

Please let us know if you’ve any queries or concerns about these works.

Cheers,
LEOPARD.host

Debian 12 Bookworm: https://www.debian.org/releases/bookworm/
QEMU (KVM) 8 info: https://www.qemu.org/2023/04/20/qemu-8-0-0/
Linux 6.2: https://lkml.iu.edu/hypermail/linux/kernel/2302.2/03207.html

8 Affected Services:
The Network Crew Pty Ltd (TNC)

Network: AS138521