IceNine (Member), Jan 22, 2015 at 4:01 pm, #164801
Not sure if this belongs here; if it doesn’t, please accept my apologies and move it to the proper forum. Also, grab some coffee, this will be a long one.
We use Citrix NetScaler HLBs at work. We have set up a virtual IP which checks both of our View Connection Servers and routes traffic to the one with the least connections. Currently it uses ping to determine whether a server is live. I feel this is not a robust enough check of the health of a View Connection Server. This past Monday my fears came true.
Back story: an offshore resource was charged with doing maintenance on the VMware farm, specifically updating VMware Tools on all servers. Procedure was not followed: when they finally got to one of the Connection Servers, they went ahead and updated VMware Tools. For whatever reason the server was not restarted, and during the update the ADAM process crashed. The person just restarted the VMware Tools services and called it a day.
That particular Connection Server stopped passing authentications to AD (unbeknownst to yours truly at the time). Of course this happened on Monday morning, and among the few people who failed to authenticate were a Principal and the CTO. FML! times a million. As luck would have it, I was about 5 minutes from the office and was able to jump on the issue. The symptoms:
a. A user tries to log in using the VMware View client but gets an error stating they are not entitled to any pools.
b. Trying to log in to the View admin console gives an error stating authentication failure.
Because the server was still pingable, the NetScaler HLB kept serving it up and users kept having issues.
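To make the gap concrete, here is a rough Python sketch of what I think happened. This is not NetScaler configuration, just a toy model: the server names, connection counts and the two monitor functions are all made up for illustration. It compares a ping-only check with a check that also requires a test login to succeed, using a least-connections pick in both cases.

```python
from typing import Callable, Optional

# Hypothetical state of two connection servers (names and numbers are made up).
# cs1 models the broken server: it answers ping but cannot authenticate.
SERVERS = {
    "cs1": {"pingable": True, "auth_ok": False, "connections": 3},
    "cs2": {"pingable": True, "auth_ok": True, "connections": 7},
}


def ping_monitor(name: str) -> bool:
    """Liveness only: the host answers on the network."""
    return SERVERS[name]["pingable"]


def auth_monitor(name: str) -> bool:
    """Application-level: the host is up AND a test login succeeds."""
    return SERVERS[name]["pingable"] and SERVERS[name]["auth_ok"]


def pick_server(monitor: Callable[[str], bool]) -> Optional[str]:
    """Least-connections choice among servers the monitor marks as up."""
    up = [name for name in SERVERS if monitor(name)]
    if not up:
        return None
    return min(up, key=lambda name: SERVERS[name]["connections"])


print(pick_server(ping_monitor))  # cs1: the broken server still gets traffic
print(pick_server(auth_monitor))  # cs2: the broken server is skipped
```

With the ping-only check the broken server actually wins, because its failed logins keep its connection count low. That is the behaviour I would like the load balancer to avoid.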
Attempted fixes in the heat of battle: restarting the View services on the server, without success. I finally ended up restarting the server, which fixed everything. I then worked with support and found out that the ADAM process had crashed and authentications were not being forwarded to AD.
OK, on to my actual question now. Is there a way to check authentication with the NetScaler somehow, in order to possibly avoid this fluke occurrence? What do you guys use in your own environments to load balance Connection Servers?
All of our users are internal so we don’t use security servers in our deployment.
Any help is appreciated.