Wednesday, May 18, 2016

Updates - Sprinting down the yellow-brick road

It feels like a whole geologic age has passed since my last, triumphant return to this blog, after which I promptly never updated it again.  Old stuff has been fixed and revamped, new stuff has come online, and plenty of new mysteries have been revealed and illuminated in equal measure.

Given what all's happened, and what all is happening soon, and since I don't get much work done sitting in a touch-down space away from my desk and monitors and Han Solo, I thought it a good opportunity to regroup and update this thing.

VMware Horizon View

One of the big projects that got built, rebuilt, expanded, and will soon have to be migrated, is the Horizon View project.  For my part, it was a lot of learning the types of traffic and methods for moving it and building it accordingly on the NetScalers.

Here are some of the highlights:
  1. The Workspace aggregate portal, as well as the VMware-only View portal, and all their accessory pieces were stood up on our NetScaler MPX 7500s.  Using separate DNS zones, internal traffic hits the internal configuration directly, while external clients proxy through a dedicated View Access Point proxying configuration on our DMZ appliances.
  2. The final config involves separate VIPs for separate View configurations, based on the current separation of our Carrollton and Austin datacenter resources.  Frankly, this split led to a pretty messy configuration - one that I hope to change in future revisions of the build - but each separate traffic-management piece is, in itself, a pretty simple build.
  3. Though I didn't get to finish developing it, I remain convinced content switching (CSW) is absolutely viable with this product so long as two things are true: the people at the table aren't afraid to buck VMware's recommended build documentation, and you're fully offloading SSL at the ADC despite said documentation.  Workspace is particularly simple, and most of View's complexity lies in VMware's insistence that you use SSL (with strict thumbprint verification) all the way to the back-end.
  4. Persistence groups pretty soundly manage the persistence requirements of the product, something I was glad to see after fears about persistence were on track to drive an even more [unnecessarily] expansive implementation.  Adding each View-related virtual server in the config (TCP 4172, UDP 4172, Blast 8443, SSL 443) to a SOURCEIP-enabled persistence group (because cookie persistence isn't an option once a UDP resource is added) keeps the Access Point appliances pretty happy.  A rough sketch of that build follows this list.
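
As a rough sketch of what that persistence-group build looks like on the NetScaler - the group and vserver names here are hypothetical stand-ins for the real View/Access Point virtual servers:

    # Tie every View-related virtual server to one persistence group
    bind lb group grp_view_ap lb_vsv_view_ssl_443
    bind lb group grp_view_ap lb_vsv_view_pcoip_tcp_4172
    bind lb group grp_view_ap lb_vsv_view_pcoip_udp_4172
    bind lb group grp_view_ap lb_vsv_view_blast_8443
    # SOURCEIP persistence, since cookie persistence is off the table once a UDP member joins
    set lb group grp_view_ap -persistenceType SOURCEIP -timeout 10

With the group in place, a given client IP lands on the same Access Point across all four protocols instead of persisting per-vserver.
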
We've learned a lot, and there's yet more to figure out in future revisions.  Since the whole thing's currently built on NetScaler MPX appliances that I've an active project (described in a later section) to evacuate and decommission, we'll have a few more opportunities to further refine the config.  I plan to add GSLB support and hopefully cut down on the separate Carrollton/Austin namespaces we have today in the process.  I also hope to test using the NetScaler as the IdP for authentication and pass authenticated clients through to both Workspace and View, mostly as part of a larger endeavor to test central, front-end Unified Gateway portals for collections of CHST applications that ease the experience with the recently mandated two-factor authentication.

New SDX platform

Arguably the biggest new thing that's come along is our new NetScaler SDX 11520 environment.  Each of our datacenters has two of them, and I've now stood up several internal- and DMZ-facing virtual instances to use for all manner of fun stuff.  

I've had to learn a ton about NetScaler as a network appliance to make all this happen.  One of the biggest, hardest lessons so far has been routing and network interfaces.  The short version goes something like this: with the default gateway bound to the management interface (e.g., int 0/1), data traffic you'd rather send along an LACP channel (e.g., LA/1) absolutely will use the management interface instead without the proper routing build.

What exactly constitutes "proper routing build" is still a subject of learning for me, but in my own testing I've found it to include a couple of things:
  • Policy-based routes that ensure traffic to and from the NSIP uses the same interface, and therefore the same MAC, in both directions (i.e., int 0/1), and that anything that isn't management-related - i.e., SNIP- and VIP-network traffic not sourced from the NSIP or the management-network SNIP - does the same on the data side (i.e., LA/1).
  • Static routes that define a gateway for subnets for which the appliance has no subnet IP.  I found these particularly important when trying to use a SNIP on one network to reach a host on another network.
The PBR side of things definitely took some trial and error, since I'm not hugely familiar with PBRs (or, more honestly, routing in general).  I did ultimately settle upon a configuration that, according to a slew of NSTRACE evaluations, successfully keeps SRC and DST MAC addresses looking as they should.
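
For what it's worth, the shape of the PBR piece ended up roughly like the sketch below.  The addresses are made up (NSIP 10.0.0.10 with a management gateway of 10.0.0.1, data SNIP 10.20.30.5 with a data gateway of 10.20.30.1), so treat this as an illustration rather than our actual config:

    # Anything sourced from the NSIP stays on the management gateway (int 0/1 side)
    add ns pbr pbr_nsip_mgmt ALLOW -srcIP = 10.0.0.10 -nextHop 10.0.0.1
    # Anything sourced from the data-network SNIP goes out the data gateway (LA/1 side)
    add ns pbr pbr_snip_data ALLOW -srcIP = 10.20.30.5 -nextHop 10.20.30.1
    # PBRs don't take effect until applied
    apply ns pbrs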

The static routes were a bit more straightforward; the hardest part was figuring out why and when one was necessary.  There is a hierarchy to how the NetScaler determines which SNIP to use to communicate with which network: unless explicitly configured to do otherwise, it will always use the SNIP that's logically closest to the target network.

I knew this going into the configuration, but I didn't account for how it affected which interface got used.  Defining a SNIP gives the NetScaler a directly connected route to that particular network; to use that SNIP to talk to a different network, though, the NetScaler moves up its routing chain to figure out where to go.  Without a static route that explicitly defines a gateway in the subnet of the SNIP you wish to use, the NetScaler has no choice but to use its default route.  Its default route, however, leverages the management interface by default, resulting in traffic entering one way but leaving another.
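
A hedged example of the static-route piece, again with made-up networks: say the appliance owns a SNIP in 10.20.30.0/24 and needs to reach servers in 10.40.50.0/24.  Without the route below, it falls back to the default route out the management interface:

    # Reach 10.40.50.0/24 through the gateway on the network we actually own a SNIP in
    add route 10.40.50.0 255.255.255.0 10.20.30.1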

My challenge now, of course, is taking what I've recently learned about all this and retroactively correcting the instances I'd already stood up.  I didn't notice the problem initially because I was dealing only with internal-only instances, and there was no internal firewall stopping these asymmetrically transmitted packets.  Everything looked OK until we got to the DMZ, where a firewall was blocking traffic from the management network to the interior.  It wasn't until we started combing through packet capture data that we saw the SRC and DST MACs very obviously weren't matching up, and down the rabbit hole we went.

AirWatch

The main center-stage attraction in the works right now is AirWatch, aimed at replacing the McAfee mobile management suite in use today.  The build on my end has been pretty interesting, since it's involved a lot of content switching.  I was able to leverage some new skills involving string maps to build the CSW action and policy.  The approach lets me define a single action and a single policy, both of which trigger based upon the key:value pairs kept in the referenced string map.  The net effect is that any new addition we wish to make to the config simply requires a new key:value pair (plus whatever LB build is required, obviously), rather than a whole new CS action, policy, and binding.
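
Roughly, the pattern looks like the sketch below.  The map, vserver, and path names are hypothetical (the real AirWatch paths and LB builds differ), but the single-action/single-policy mechanics are the same:

    # String map: key = first element of the URL path, value = target LB vserver
    add policy stringmap sm_airwatch_paths
    bind policy stringmap sm_airwatch_paths DeviceManagement lb_vsv_aw_dm
    bind policy stringmap sm_airwatch_paths AirWatch lb_vsv_aw_console

    # One action and one policy cover every entry in the map
    add cs action csa_airwatch -targetVserverExpr 'HTTP.REQ.URL.PATH.GET(1).MAP_STRING("sm_airwatch_paths")'
    add cs policy csp_airwatch -rule 'HTTP.REQ.URL.PATH.GET(1).IS_STRINGMAP_KEY("sm_airwatch_paths")' -action csa_airwatch
    bind cs vserver csw_vsv_airwatch -policyName csp_airwatch -priority 100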

Computationally, as I understand it, a string map lookup is a lot cheaper than evaluating even a basic policy expression, especially when there are a lot of policy expressions to iterate through for a given request.  Realistically, for the amount of traffic we push, this difference in computational overhead - if any exists - likely has no bearing on end-user experience; it's just a much more elegant approach to an already cool build process.  :)

The product comprises a management console, device management servers, secure email gateway ("SEG") servers, some application tunneling proxies, and a few other extraneous pieces.  Each set of servers leverages a distinct web app or service path, so the CSW logic has been fairly straightforward to define.  It has taken some trial and error, some awkward questions to the vendor, and some good ol' fashioned Fiddler tracing to figure some pieces out, but overall the build has gone well.  We've successfully tested all components in an internal-only build, and just this week we tested some of the public-facing components as well.
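
So onboarding another component - the SEG, say - looks roughly like this (hypothetical names and addresses again; the LB side still gets built as usual, but the CS side needs nothing beyond one new map entry):

    # Build the LB piece for the new component as usual (SSL cert binding omitted for brevity)
    add service svc_aw_seg_01 10.60.70.11 SSL 443
    add lb vserver lb_vsv_aw_seg SSL 0.0.0.0 0
    bind lb vserver lb_vsv_aw_seg svc_aw_seg_01
    # One new key:value pair routes its path - no new CS action, policy, or binding
    bind policy stringmap sm_airwatch_paths SEG lb_vsv_aw_seg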

There's a ton more, and a ton more detail in which I plan to explore all this - I've just gotten woefully behind on documentation in recent weeks.

More to come!

~Fin~