Using AWX Container groups for Kerberos authentication of playbooks/templates running against Windows servers/hosts

I have been porting some of my Ansible playbooks for Windows over to AWX, and while they worked in my home lab, they didn’t cooperate when I moved them over to my work environment. That’s because I was initially testing against stand-alone Windows servers and clients in my home lab, while my office environment obviously uses a Windows AD domain. With the Ansible CLI, I would just set up Kerberos authentication on my Ansible host. This is not as easy when dealing with AWX running in Kubernetes Pods.

In this situation I will use the stock “AWX EE (latest)” Execution Environment, but you will need to tell AWX how to reach your Kerberos server (your AD domain controller). We will configure a Container Group linked to the Ansible Execution Environment, which lets Ansible know about your Kerberos environment. If you haven’t already configured your Windows hosts for connections via WinRM, you can read the Ansible documentation on setting up Windows hosts. My environment was already set up for this, since I had already been controlling/automating my Windows servers via the Ansible CLI.
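
For reference, on the Ansible side a Kerberos-over-WinRM connection usually boils down to a handful of inventory variables along these lines (the file name, port, and account below are placeholders for illustration, not taken from my setup):

# group_vars/windows.yml (sketch - adjust to your environment)
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: kerberos
ansible_winrm_server_cert_validation: ignore
ansible_user: svc_ansible@CONTOSO.COM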

To prepare Kubernetes for this container group, you will need to create a ConfigMap that will handle your Kerberos authentication. In your favorite editor (mine is vi), create a file in your home directory or /tmp called krb5.conf. In my example below I have two realms listed because my AWX host works against two domains.

[libdefaults]
 default_realm = CONTOSO.COM

[realms]
 CONTOSO.COM = {
  kdc = DC2.CONTOSO.COM
 }
 STUFF.COM = {
  kdc = DOUBLE.STUFF.COM
 }

[domain_realm]
.contoso.com = CONTOSO.COM
contoso.com = CONTOSO.COM
.stuff.com = STUFF.COM
stuff.com = STUFF.COM

Now we can load this file into Kubernetes as a ConfigMap:

kubectl -n awx create configmap awx-kerberos-config --from-file=krb5.conf

Now that your krb5.conf is stored in Kubernetes, verify the ConfigMap was created by running the following:

kubectl -n awx get configmap awx-kerberos-config -o yaml

You should see output in YAML format that shows your krb5.conf. Now, in AWX, on the left column, click “Instance Groups” under the Administration section.

In the “Instance Groups” menu, click “Add”, then “Add Container group”

In the new Container group menu, you can name it whatever you want; in my case I am naming it Kerberos. The only other thing you need to do is make sure you check “Customize pod specification”.

Now edit the “Custom pod spec” YAML; mine looks like this:

apiVersion: v1
kind: Pod
metadata:
  namespace: awx
spec:
  serviceAccountName: default
  automountServiceAccountToken: false
  containers:
    - image: 'quay.io/ansible/awx-ee:latest'
      name: worker
      args:
        - ansible-runner
        - worker
        - '--private-data-dir=/runner'
      resources:
        requests:
          cpu: 250m
          memory: 100Mi
      volumeMounts:
        - name: awx-kerberos-volume
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
  volumes:
    - name: awx-kerberos-volume
      configMap:
        name: awx-kerberos-config

Make sure you save when you’re done. Now we need to link this Container group to your template (the AWX equivalent of a playbook run from the Ansible CLI). To link the Container group, edit your template and, towards the bottom of the page, you will see “Instance Groups”; from there, select your Container group.

Now you should be able to run your Windows-based playbooks/templates in AWX. For me, my issue was not solved there; I had some extra troubleshooting to do, which turned out to be Kubernetes (k3s) DNS issues that I will talk about in my next post. If you need assistance troubleshooting, you can refer to the README located here. You can always contact me as well.

Installing AWX on AlmaLinux 9

I ran into some issues installing AWX on AlmaLinux 9 on Proxmox (I had the same issues with Alma 8.7). This also applies to RockyLinux 9.

I was installing AWX on K3s (Rancher’s lightweight Kubernetes) following https://github.com/ansible/awx-operator#basic-install. I made it all the way to the section where you create the awx-demo.yaml, add it to your kustomization.yaml and build via kustomize build . | kubectl apply -f -. From there I was receiving errors such as "unable to determine if virtual resource","gvk":"apps/v1" and the build would ultimately fail out.

In order to get past that error, I found a few posts which suggested changing the CPU type from “Default (kvm64)” to Host, which sets the VM’s CPU to match the CPU of the Proxmox host.

***If you are running Hyper-V, there is a similar option; see the final post in this Google Group conversation: https://groups.google.com/g/awx-project/c/4tmP0TlRODU***

After resetting the CPU type, rebooting the VM, and re-running the kustomize build, I was able to make it quite a bit further. The logs looked clean, then towards the end the build once again failed, this time with the following error: "awx unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1". The pod itself was also down with a CrashLoopBackOff error. From there I found the following link, which got me past all of my remaining installation issues: https://stackoverflow.com/questions/62442679/could-not-get-apiversions-from-kubernetes-unable-to-retrieve-the-complete-list

I ran kubectl api-resources, which listed the resources and confirmed that metrics.k8s.io/v1beta1 was in fact down.

Next I ran: kubectl delete apiservice/v1beta1.metrics.k8s.io

From there I re-ran the kustomize build command and the AWX installation completed successfully. I did have to open the firewall ports in Alma to allow my browser to access AWX.
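
If you need to do the same, something along these lines should work; the service name comes from the basic-install example and the NodePort is whatever Kubernetes assigned, so check it first:

kubectl -n awx get svc awx-demo-service
firewall-cmd --permanent --add-port=30080/tcp   # replace 30080 with the NodePort shown above
firewall-cmd --reload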

Steps to Install AWX:

#Install K3s (Rancher's lightweight Kubernetes)
curl -sfL https://get.k3s.io | sh -

#Install Kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash

#Move Kustomize binary
mv kustomize /usr/local/bin/

#Go to the AWX Operator README and follow along from there:
# https://github.com/ansible/awx-operator#basic-install
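
For reference, the two files you end up creating by following that README look roughly like this (a sketch, not a copy of the README; substitute the operator version tag the README currently recommends for the placeholder):

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/ansible/awx-operator/config/default?ref=<operator-version>
  - awx-demo.yaml
images:
  - name: quay.io/ansible/awx-operator
    newTag: <operator-version>
namespace: awx

# awx-demo.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
spec:
  service_type: nodeport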

Feel free to contact me if you have any comments or questions

Disabling Inactive Domain User and Computer Accounts in Active Directory with Ansible

In my last article I wrote about having Ansible run several audit requests, including “We need a list of all inactive user accounts” and “We need a list of inactive computer accounts”. Now that we have those listed, we can let Ansible clean them up. I preferred to create a new playbook for these tasks. It first lists the users and computers it will be handling, then disables each account and moves it to a dedicated OU (Disabled_Accounts or Disabled_Computers in the scripts below). I never delete the accounts; we prefer to disable them and then move them.

Below is my ansible playbook “fix_AD_Inactive-Users-AND-Computers-90days.yml”

---
- hosts: pdc
  gather_facts: no
  tasks:
     - name: copy fix_inactive_usr.ps1 to Windows
       win_copy:
          src: files/fix_inactive_usr.ps1
          dest: c:\it\fix_inactive_usr.ps1

     - name: copy fix_inactive_pc.ps1 to Windows
       win_copy:
          src: files/fix_inactive_pc.ps1
          dest: c:\it\fix_inactive_pc.ps1

     - name: Fix inactive users - 90 days
       win_shell: c:\it\fix_inactive_usr.ps1
       register: inactive_usr

     - debug: var=inactive_usr.stdout_lines

     - name: Fix inactive computers - 90 days
       win_shell: c:\it\fix_inactive_pc.ps1
       register: inactive_computer

     - debug: var=inactive_computer.stdout_lines

Below is the code for “fix_inactive_usr.ps1”

# Find enabled user accounts that have not logged on in the last 90 days
$date = (Get-Date).AddDays(-90)

$USR = (Get-ADUser -Filter {LastLogonDate -lt $date} -Property Enabled | Where-Object {$_.Enabled -like "true"} | Select DistinguishedName).DistinguishedName
echo $USR
# Disable each account and move it to the Disabled_Accounts OU
ForEach ($Item in $USR){
   Disable-ADAccount $Item
   Move-ADObject -Identity $Item -TargetPath "OU=Disabled_Accounts,DC=contoso,DC=com"
   }

Please note that in the PowerShell scripts above and below, you will need to change “DC=contoso,DC=com” to reflect your actual domain.

Below is the code for “fix_inactive_pc.ps1”

# Specify inactivity range value below
$DaysInactive = 90
# $time variable converts $DaysInactive to LastLogonTimeStamp property format for the -Filter switch to work

$time = (Get-Date).Adddays(-($DaysInactive))

# Identify inactive computer accounts

$PC = (Get-ADComputer -Filter {LastLogonTimeStamp -lt $time} -Property Enabled | Where-Object {$_.Enabled -like "true"} | Select DistinguishedName).DistinguishedName
echo $PC
ForEach ($Item in $PC){
   Disable-ADAccount $Item
   Move-ADObject -Identity $Item -TargetPath "OU=Disabled_Computers,DC=contoso,DC=com"
   }
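
If you want to preview what the scripts would touch before running them for real, one option (my addition, not part of the scripts above) is a first pass with -WhatIf on the destructive cmdlets, for example in fix_inactive_usr.ps1:

ForEach ($Item in $USR){
   Disable-ADAccount $Item -WhatIf
   Move-ADObject -Identity $Item -TargetPath "OU=Disabled_Accounts,DC=contoso,DC=com" -WhatIf
   }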

Audit Active Directory with Ansible

Everyone loves an audit, right? We have to deal with audits quite a bit, and that brings repetitive requests like “We need a list of AD user accounts that have been locked out”, “We need a list of all inactive user accounts”, “We need a list of inactive computer accounts”, “We need a list of all members of the Domain Admins group”, and “We need a list of all AD accounts”. All of these can easily be scripted with PowerShell. Since I love to automate things and would rather not run these commands separately, I figured I would create an Ansible playbook to run all of the requests at the same time. That way I log on once, select my playbook, and let it run; I don’t even need to log on to the DC to run these tasks. I can sit back and let Ansible deal with it.

This simple Ansible playbook uses 3 PowerShell commands and 2 PowerShell scripts that I’m sure most Windows Administrators are familiar with.

---
- hosts: pdc
  gather_facts: no
  tasks:
     - name: copy audit_AD_inactive_users.ps1 to Windows
       win_copy:
          src: files/audit_AD_inactive_users.ps1
          dest: c:\cit\audit_AD_inactive_users.ps1

     - name: copy audit_AD_inactive_computers.ps1 to Windows
       win_copy:
          src: files/audit_AD_inactive_computers.ps1
          dest: c:\cit\audit_AD_inactive_computers.ps1

     - name: Run Audit for Locked-Out Accounts
       win_shell: Search-AdAccount -LockedOut | select Name, LockedOut,LastLogonDate,distinguishedName
       register: lockedoutaccounts

     - debug: var=lockedoutaccounts.stdout_lines

     - name: Run Audit of inactive users - 90 days
       win_shell: c:\cit\audit_AD_inactive_users.ps1
       register: inactive_users

     - debug: var=inactive_users.stdout_lines

     - name: Run Audit of inactive computers - 90 days
       win_shell: c:\cit\audit_AD_inactive_computers.ps1
       register: inactive_computers

     - debug: var=inactive_computers.stdout_lines

     - name: Run Audit for members of Domain Admins group
       win_shell: Get-ADGroupMember -Identity 'Domain Admins' | Select-Object name, objectClass,distinguishedName
       register: dom_admin_users

     - debug: var=dom_admin_users.stdout_lines

     - name: Run Audit for all domain users
       win_shell: Get-ADUser -Filter * -SearchBase "dc=contoso,dc=com" | select Name, objectClass,distinguishedName
       register: all_dom_users

     - debug: var=all_dom_users.stdout_lines

Not bad, right? Ansible rocks! The only complaint I can see is that I’m not outputting the results to a CSV file, but if you run this playbook often, you shouldn’t need the fancy format.
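
If you ever do need CSVs, one simple variation (my sketch, not part of the playbook above; the output path is just an example) is to have the PowerShell side export to a file instead of printing, e.g. for the locked-out accounts check:

Search-ADAccount -LockedOut |
   Select-Object Name, LockedOut, LastLogonDate, DistinguishedName |
   Export-Csv -Path C:\cit\lockedout_accounts.csv -NoTypeInformation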

Below is the first PowerShell script “audit_AD_inactive_users.ps1”

$date = (get-date).AddDays(-90)

Get-ADUser -Filter {LastLogonDate -lt $date} -Property Enabled | Where-Object {$_.Enabled -like "true"} | Select Name, SamAccountName, DistinguishedName

Below is the second PowerShell script “audit_AD_inactive_computers.ps1”

# Specify inactivity range value below
$DaysInactive = 90
# $time variable converts $DaysInactive to LastLogonTimeStamp property format for the -Filter switch to work

$time = (Get-Date).Adddays(-($DaysInactive))

# Identify inactive computer accounts

Get-ADComputer -Filter {LastLogonTimeStamp -lt $time} -ResultPageSize 2000 -resultSetSize $null -Properties Name, OperatingSystem, SamAccountName, DistinguishedName, LastLogonDate | Select DNSHostName, LastLogonDate, DistinguishedName

Migrating VMs from XEN to VMware

Two years ago I was tasked with migrating some VMs off XEN to VMware; these were my notes:

1.) Take SNAPSHOT!!!!

2.) Uninstall Citrix via Add/Remove Programs (don’t restart)

3.) Manually run C:\Program Files (x86)\Citrix\XenTools\uninstaller.exe (don’t restart)

4.) In Device Manager, uninstall devices with Citrix drivers (don’t reboot; you may have to uninstall twice)

5.) In Device Manager, show hidden devices, look for Citrix drivers, and uninstall any that are shown

6.) Restart the machine – take another snapshot (just in case)

7.) Open Device Manager and double-check for XEN drivers (there shouldn’t be any)

8.) Open the registry editor (regedit) and navigate to:

HKLM\SYSTEM\CurrentControlSet\Services\

Delete all keys that begin with “XEN*” and repeat this for every other control set you may have, for example:

HKLM\SYSTEM\ControlSet001\Services\
HKLM\SYSTEM\ControlSet002\Services\

Now navigate to:

HKLM\SYSTEM\CurrentControlSet\Control\Class\

and delete the “UpperFilters” value found under the contents of the following two Keys:

{4D36E96A-E325-11CE-BFC1-08002BE10318}
{4D36E97D-E325-11CE-BFC1-08002BE10318}

Repeat this for every other control set, for example:

HKLM\SYSTEM\ControlSet001\Control\Class\{4D36E96A-E325-11CE-BFC1-08002BE10318}
HKLM\SYSTEM\ControlSet001\Control\Class\{4D36E97D-E325-11CE-BFC1-08002BE10318}
HKLM\SYSTEM\ControlSet002\Control\Class\{4D36E96A-E325-11CE-BFC1-08002BE10318}
HKLM\SYSTEM\ControlSet002\Control\Class\{4D36E97D-E325-11CE-BFC1-08002BE10318}

(A scripted version of this registry clean-up is sketched after these notes.)

9.) Go to C:\Windows\System32 and delete all XEN drivers

10.) Reboot and make sure there is no BSOD

11.) Run VMware Converter
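
As noted in step 8, the registry clean-up can also be scripted. The PowerShell below is only a sketch of those manual steps (it assumes the usual CurrentControlSet/ControlSet001/ControlSet002 key names); leave -WhatIf in place until the matches look right:

# Remove all XEN* service keys from each control set (drop -WhatIf to actually delete)
$controlSets = 'CurrentControlSet', 'ControlSet001', 'ControlSet002'
foreach ($cs in $controlSets) {
    Get-ChildItem "HKLM:\SYSTEM\$cs\Services" -ErrorAction SilentlyContinue |
        Where-Object { $_.PSChildName -like 'XEN*' } |
        Remove-Item -Recurse -Force -WhatIf
}

# Remove the UpperFilters value from the two device class keys in each control set
$classes = '{4D36E96A-E325-11CE-BFC1-08002BE10318}', '{4D36E97D-E325-11CE-BFC1-08002BE10318}'
foreach ($cs in $controlSets) {
    foreach ($class in $classes) {
        Remove-ItemProperty -Path "HKLM:\SYSTEM\$cs\Control\Class\$class" -Name UpperFilters -ErrorAction SilentlyContinue -WhatIf
    }
}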

Ansible Automation: Gather a list of all services on Windows servers and clients

I had another audit request to gather all services on Windows servers in an environment of about 70+ servers. I knew doing this through Ansible would be a lot faster than going to each server individually. In the end it took less than 5 minutes to gather the services on 70+ servers.

When running the playbook I usually tee the output to a text file, e.g.:

ansible-playbook Audit_win_list_all_services.yml | tee /tmp/audit/Windows_services.txt

Here is my playbook:

Audit_win_list_all_services.yml
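
Roughly, a playbook like that can be sketched as follows (this is my sketch, not the original file; the host group is a placeholder):

---
- hosts: windows
  gather_facts: no
  tasks:
     - name: Gather all Windows services
       win_shell: Get-Service | Select-Object Status, Name, DisplayName | Sort-Object Name | Format-Table -AutoSize
       register: all_services

     - debug: var=all_services.stdout_lines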

Ansible Automation: Gather a list of all software installed on Windows servers and clients

I had a request to gather all software installed on Windows servers in an environment of about 70+ servers. I knew doing this through Ansible would be a lot faster than going to each server individually. In the end it took less than 5 minutes to gather the installed software on 70+ servers.

I had seen a few playbooks online from other Ansible admins doing this via Win32_Product, but there are well-known warnings about Win32_Product causing problems (querying it can trigger MSI consistency checks and repairs).

So after reading this article, I created the following playbook. (I initially used a normal debug statement, but the output had a lot of unnecessary info, so I split the output by newline and printed that list.)

Below is my playbook:

win_list_all_programs.yml
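
Again, this is only a sketch of what such a playbook can look like, not the original file; it reads the uninstall keys from the registry instead of Win32_Product, and the host group is a placeholder:

---
- hosts: windows
  gather_facts: no
  tasks:
     - name: List installed software from the uninstall registry keys
       win_shell: |
         Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*',
                                'HKLM:\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*' -ErrorAction SilentlyContinue |
           Where-Object { $_.DisplayName } |
           Select-Object DisplayName, DisplayVersion, Publisher |
           Sort-Object DisplayName
       register: programs

     - debug: var=programs.stdout_lines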

Automating with Ansible: Adding new Windows server clients to Prometheus/Grafana

I needed a way to install windows_exporter on our Windows systems and to automate the configuration of the new client in Prometheus. I came up with this Ansible playbook to handle the task. I’m sure there are other ways of doing this, and I’m always open to suggestions. Here is what I have:

Playbooks (Can be downloaded):

win_install_prometheus.yml which calls install_prometheus_part2.yml
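
I won’t reproduce the playbooks here, but the exporter-install side of a setup like this can be sketched roughly as follows (my sketch, not the downloadable playbooks above; the MSI version in the URL is just an example, so check the prometheus-community/windows_exporter releases page for the current one). The Prometheus-side configuration (adding the new target and reloading) would be a separate play against the Prometheus server.

---
- hosts: windows
  gather_facts: no
  tasks:
     - name: Install windows_exporter from its MSI (version/URL are examples)
       win_package:
          path: https://github.com/prometheus-community/windows_exporter/releases/download/v0.25.1/windows_exporter-0.25.1-amd64.msi
          state: present

     - name: Allow Prometheus to scrape the exporter (default port 9182)
       win_firewall_rule:
          name: windows_exporter
          localport: 9182
          direction: in
          action: allow
          protocol: tcp
          state: present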

I imported a dashboard from Grafana.com, but at the time it was built for the older wmi_exporter. I was able to edit the dashboard and update it to work with the new exporter. Here is my dashboard (in JSON format for importing):

Creating an AD Realm on a Cisco ASA 5508-x running FTD (via FTM)

This was done on FTD version 6.2.3-83.

  1. In the top menu (Monitoring, Policies, Objects, Device), select Objects
  2. Under the Object Types side menu, select Identity Realm
  3. Enter a Realm name (I entered the client domain).
  4. For me, the Type: Active Directory was grayed out (it was my only choice anyway)
  5. For Base DN, I entered: dc=example,dc=com
  6. For AD Primary Domain, I entered our domain name
  7. For Hostname, I entered the IP of the AD server and left the port at the default of 389
  8. I left Encryption as None. I then ran the connection test, which came back successful, and saved the config.

Please check out my related article:

Setting up AnyConnect VPN’s on the Cisco ASA 5508x (FTD)

Veeam Backup Failing (VSS_WS_FAILED_AT_PREPARE_SNAPSHOT) (Resolved)

I had a Veeam backup job that was failing with: Retrying snapshot creation attempt (Writer 'Microsoft Hyper-V VSS Writer' is failed at 'VSS_WS_FAILED_AT_PREPARE_SNAPSHOT'. The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. --tr:Failed to verify writers state. --tr:Failed to perform pre-backup tasks.)

Researching this error online pointed to an issue on the host, but I didn’t believe that, since all of my other VMs were backing up daily without issue.

To play it safe I checked the host by running: vssadmin list writers

I received the following error on the host:

microsoft hyper-v vss writer non-retryable error

Looking further into the host’s event logs, I found the corresponding error entries.

At this point I was still convinced the host wasn’t at fault, because all other VMs still backed up fine, so I logged onto the VM in question and ran: vssadmin list writers

I received the following on the VM:

sqlserverwriter non-retryable error

Looking into the VM’s Event Viewer, I found related errors there as well.

Researching these errors online, I found several solutions saying to remove any old backup software. This server used to use another backup solution prior to Veeam called Altaro, which I was pretty sure I had removed a long time ago. I checked Add/Remove Programs and verified Altaro wasn’t listed. I even checked the VSS writers for any other backup software and found nothing. Running out of ideas, I checked Windows Backup to make sure it wasn’t running, and no backup jobs were listed. I then looked into Task Scheduler and found a few manual backup jobs listed, which I disabled and deleted. I then restarted the SQL Server VSS Writer service, verified it showed no errors after re-running vssadmin list writers, and retried the Veeam backup, which failed out once again.
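
For reference, restarting that writer from an elevated prompt looks like this (SQLWriter is the underlying service name for “SQL Server VSS Writer”):

net stop SQLWriter
net start SQLWriter
vssadmin list writers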

Re-running vssadmin list writers, I received the same error. I was now convinced this was tied to the old Task Scheduler backups I had removed.

Next, I tried: vssadmin delete shadows /all

After running that command, I received:

Error: Snapshots were found, but they were outside of your allowed context.  Try
 removing them with the backup application which created them.

After much more research, I found an outside-the-box way of deleting the snapshots, described on another site:

How to Fix “outside of your allowed context” Errors

In order to get rid of these kinds of shadows we need to apply a “trick”. Basically the VSS diff area storage is where VSS keeps these shadows “alive”.

By seriously cutting this limit to the bare minimum we invoke a mechanism in VSS itself that causes it to dump all shadows.

So we proceed by telling VSS to cut the limit down to 401 MB. For some reason the user interface claims the minimum is 300 MB, but on several versions of Windows that value is refused with:

Error: Specified number is invalid

The command that works uses 401MB and is (adapt it to your drive letter as needed):

vssadmin resize shadowstorage /for=D: /on=D: /maxsize=401MB  

*****I ran this against the C: and D: drive of my VM*****

Then once you get “success” you can increase the limit once again to the recommended “unbounded” setting, or an actual limit value if you are using shadow copies for other purposes:

vssadmin resize shadowstorage /for=d: /on=D: /maxsize=unbounded

*****I ran this against the C: and D: drive of my VM*****

Then, vssadmin happily reports:

Successfully resized the shadow copy storage association

and a quick check using

vssadmin list shadows

reveals all VSS shadow copies are now gone!

I then re-ran the Veeam backup job against the VM and it completed successfully!