HAProxy to the rescue

I have a client who did not want some of their employees to have internet access because of lost productivity. The employee workstations were on their own network, which was firewalled off from the regular network. The firewall allowed very limited access to the internal office network and no access to the internet.

They ran into an issue where some of the employees required access to a certain website to do their job. I could easily open a hole in the firewall to that site, but the site was hosted on AWS and its IPs changed daily. I could keep adding new IPs, or I could go the proxy route. In the past I had set up and configured a Squid proxy server to handle this, but I really wanted to see if I could get HAProxy to do it. I knew HAProxy could forward web traffic, but I had only done that for a specific site with static IPs. I tested HAProxy in http mode as well as tcp mode, pointing to known IPs, and it would work until the IP changed.

After some searching, I found that HAProxy can use DNS service discovery to detect server changes on the fly and apply them automatically. All I needed to do was add a DNS resolvers section to my HAProxy config along with load balancing. I will post my configuration below, with an explanation following. In the configuration below, I’ve changed the name of the website the client is using to a more generic name, “fedex.com”.

global
   stats socket :9000 mode 660 level admin

resolvers dns1
   nameserver dns1 192.168.3.53:53
   accepted_payload_size 8192 # allow larger DNS payloads

frontend https
   bind *:443
   option tcplog
   mode tcp
   default_backend fedex-https


backend fedex-https
   mode tcp
   balance source
   server-template fedex1 3 www.fedex.com:443 check resolvers dns1 init-addr none inter 2000 rise 2 fall 5 verify none

The frontend listens on port 443 (clients are directed to it in their proxy configuration via an AD GPO). The backend server-template line creates 3 server slots and fills them from DNS lookups. To choose that number, first run a manual nslookup against the host you want to reach and count the addresses that come back. In my case I got 6, so I used 3; you never want to go above the number of addresses your manual nslookup returns. I could just as easily have set it to 2, and the backend would swap between the first 2 hosts it gets when it checks DNS. In my actual configuration, which I’m not showing, I set the number to 2. The “init-addr none” option lets HAProxy start even if it cannot resolve the hostname at startup.
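
How quickly the backend follows an IP change depends on the resolvers section. As a rough sketch, the resolvers block could be extended with the timing directives below. The directive names come from HAProxy's resolvers section, but every value shown is an illustrative assumption rather than a setting from the configuration above, so tune them against your own DNS server.

```haproxy
resolvers dns1
   nameserver dns1 192.168.3.53:53
   accepted_payload_size 8192   # allow larger DNS payloads
   resolve_retries 3            # queries to send before giving up on a resolution (assumed value)
   timeout resolve 1s           # interval between name resolutions (assumed value)
   timeout retry   1s           # wait between queries when no valid response was received (assumed value)
   hold valid      10s          # how long the last valid answer is kept (assumed value)
   hold obsolete   30s          # how long to keep a server whose address dropped out of the DNS answer (assumed value)
```

A longer “hold obsolete” keeps existing connections steadier when AWS rotates addresses, while a shorter one drops stale addresses sooner. Which trade-off is right depends on how often the site's records actually change.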

Now I have a hole in my firewall that lets the firewalled employees reach my HAProxy server, and only on port 443. I have an AD GPO that sets their computers to use my HAProxy server for internet access. If they try to go to any other site, they get nothing. It only lets them reach fedex.com.

A more detailed explanation can be found here:

https://www.haproxy.com/blog/dns-service-discovery-haproxy/

and:

https://www.haproxy.com/blog/client-ip-persistence-or-source-ip-hash-load-balancing/