Installing AWX on AlmaLinux 9

I ran into some issues installing AWX on AlmaLinux 9 on Proxmox (I had the same issues with Alma 8.7). This also applies to RockyLinux 9.

I was installing AWX via Rancher following https://github.com/ansible/awx-operator#basic-install. I made it all the way to the section where you create the awx-demo.yaml, add it to your kustomization.yaml and build via kustomize build . | kubectl apply -f -. From there I was receiving errors such as “unable to determine if virtual resource”,”gvk”:”apps/v1″ and the build would ultimately fail out.

In order to make it past that error I found a found a few posts which suggested changing the CPU type from “Default (kvm64)” to Host. This sets the VM to match the CPU of the host.

***If you are running HyperV, there is a similar option, see the final post in this Google Group conversation: https://groups.google.com/g/awx-project/c/4tmP0TlRODU.***

After resetting the CPU type, rebooting the vm and re-running the kustomize build, I was able to make it quite a bit further. The logs looked like there were no issues, then towards the end the script once again failed. This time I was seeing the following error: “awx unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1:”. The Pod itself was also down with a CrashLoopBackOff error. From there I found the following link which was able to get me past all of my installation issues: https://stackoverflow.com/questions/62442679/could-not-get-apiversions-from-kubernetes-unable-to-retrieve-the-complete-list

I ran: kubectl api-resources which listed the resources and metrics.k8s.io/v1beta1 was in fact down.

Next I ran: kubectl delete apiservice/v1beta1.metrics.k8s.io

From there I re-ran the kustomize build command and awx installation completed successfully after the installation. I did have to open the firewall ports in Alma to allow my browser to access AWX.

Steps to Install AWX:

#Install Rancher
curl -sfL https://get.k3s.io | sh -

#Install Kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash

#Move Kustomize binary
mv kustomize /usr/local/bin/

#Goto AWX Readme and follow along from there:
# https://github.com/ansible/awx-operator#basic-install

Feel free to contact me if you have any comments or questions

Veeam Backup Failing (VSS_WS_FAILED_AT_PREPARE_SNAPSHOT) (Resolved)

Veeam Backup Failing (‘VSS_WS_FAILED_AT_PREPARE_SNAPSHOT’)

I had a Veeam backup job that was failing with: Retrying snapshot creation attempt (Writer ‘Microsoft Hyper-V VSS Writer’ is failed at ‘VSS_WS_FAILED_AT_PREPARE_SNAPSHOT’. The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. –tr:Failed to verify writers state. –tr:Failed to perform pre-backup tasks.)

Researching this error online was telling me the issue was on the host, but I wasn’t believing that as all of my other vm’s were backing up without issue daily.

To play it safe I checked the host by running: vssadmin list writers

I received the following error on the host:

microsoft hyper-v vss writer non-retryable error

Looking further on the host’s event logs for the error I saw this:

At this point I was still convinced the host wasn’t at fault due to the fact all other vm’s still backed up fine, so I logged onto the vm in question and ran: vssadmin list writers

I received the following on the vm:

sqlserverwriter non-retryable error

Looking into the event viewer I saw:

Researching these errors online I found several solution saying to delete the old backup software. This server used to use another backup solution prior to Veeam called Altaro, which I was pretty sure I had removed a long time ago. I checked add/remove programs and verified Altaro wasn’t listed. I even checked the vss writers for any other backup software listed and found nothing. Running out of ideas, I checked Windows backup to make sure it wasn’t running and no backup jobs were listed. I then looked into Task Scheduler and found a few manual backup jobs listed. I disabled and deleted these jobs. I then restarted the SQL VSS writer service, restarted SQL VSS service, verified it showed no errors after re-running vssasdmin list writers. I then retried the Veeam backup again and it failed out once again.

Re-running vss list writer I received the same error. I was now convinced this was tied to the old task scheduled backups I had removed.

Next, I tried: vssadmin delete shadows /all

After running that command, I received:

Error: Snapshots were found, but they were outside of your allowed context.  Try
 removing them with the backup application which created them.

After much more research, I found an outside the box way of deleting the snapshots from another site.

How to Fix “outside of your allowed context” Errors

In order to get rid of these kinds of shadows we need to apply a “trick”. Basically the VSS diff area storage is where VSS keeps these shadows “alive”.

By seriously cutting this limit to the bare minimum we invoke a mechanism in VSS itself that causes it to dump all shadows.

So we proceed by telling VSS to cut the limit down to 401 MB. For some reason the user interface will claim the bottom is 300MB but on several versions of Windows it refuses and reports:

Error: Specified number is invalid

The command that works uses 401MB and is (adapt it to your drive letter as needed):

vssadmin resize shadowstorage /for=D: /on=D: /maxsize=401MB  

*****I ran this against the C: and D: drive of my VM*****

Then once you get “success” you can increase the limit once again to the recommended “unbounded” setting, or an actual limit value if you are using shadow copies for other purposes:

vssadmin resize shadowstorage /for=d: /on=D: /maxsize=unbounded

*****I ran this against the C: and D: drive of my VM*****

Then, vssadmin happily reports:

Successfully resized the shadow copy storage association

and a quick check using

vssadmin list shadows

reveals all VSS shadow copies are now gone!

I then re-ran Veeam the Veeam backup job against the VM and it ran successfully!

STOP 0x0000007B Resolved on P2V’d Windows SBS 2011

***The following was on a Hyper-V vm, but this also applies to VMware.***

****This should work on most versions of Windows (doesn’t have to be SBS)****

The other week we picked up a new client with an emergency issue. They had an SBS 2011 Server on failing hardware. The hardware was so bad that we didn’t think it would last until the replacement server would arrive. We had an older Server that had enough power to handle their server virtualized until their new hardware arrived. So I started the virtualization process. This is where the fun began. (There were several issues minor issues, but I’ll stick to the major problem here.)

After creating the vm without any disk drives, I attached the newly created drives and powered up the vm and was greeted by the BSOD: STOP 0x0000007B.

Luckily there is an easy fix for this and  you don’t need restart the p2v.

  • Boot the vm off any Windows CD/DVD (Windows 7 & up. Doesn’t have to be the same OS as vm. You could also mount the drive on the host or another vm. If you mount the drive, just run regedit)
  • After booting off OS cd, when you encounter the language selection, hit Shift-F10 for a command prompt
  • At the command prompt, run regedit
  •  In regedit, highlight Hkey_Local_Machine
  • With Hkey_Local_Machine highlighted, goto File, and Load Hive
  • In Load Hive, select the drive letter where Windows OS was installed (C: in this case), then go to: Windows\System32\config\system
  • Name the Hive whatever you want (IE: recovery)
  • Expand HKEY_LOCAL_MACHINE\recovery\ControlSet1\Services\intelide
  • Change the data for value “Start” from “3” to “0”
  • Now goto File and “Unload Hive” (If you run into issues make sure Hkey_Local_Machine is highlighted)
  • Exit regedit and reboot the machine and you’re good to go

If you still have issues after reboot, check the following keys and set them to:

Aliide = 3
Amdide =3
Atapi = 0
Cmdide = 3
iaStorV = 3
intelide = 0
msahci = 3
pciide = 3
viaide = 3