Here’s the scenario:
The Epic MyChart patient portal is accessible to clients
through two logical pathways: to the patient population at large from the
public internet using the Epic DMZ NetScaler appliance and to Hyperspace users
from the internal network using the existing McKesson NetScaler appliance.
Each path to MyChart uses the core services available to it,
like DNS. Patients connecting from the
outside to mychart.baptistoncare.org resolve the name publicly, while
Hyperspace users connecting from the inside to mychart.bmhcc.org resolve the
name using internal DNS resources.
Check it.
Graph 1: External Access
Graph 2: Internal Access
Wat. :|
Notice that the “stair step” between each transfer is almost
exactly 1 second. I’m not good enough at
math to appreciate with any accuracy the transfer performance difference
here. Instead, my plebeian mind conjures terms
like terribad, or silly-bad, to fill the gap.
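For anyone who does want a number, here is a minimal sketch of how the stair step could be quantified. The timestamps are made up for illustration (they are not read off the graphs), and the function names are hypothetical.

```python
from statistics import median

def gaps_between(timestamps):
    """Return the gaps (in seconds) between consecutive transfer timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def summarize_gaps(timestamps):
    """Summarize the pauses between transfers for one capture."""
    gaps = gaps_between(timestamps)
    return {"count": len(gaps), "median": median(gaps), "max": max(gaps)}

# Hypothetical sample data: seconds at which each transfer started.
external = [0.00, 0.05, 0.11, 0.16, 0.22]   # assumed "healthy" pattern
internal = [0.00, 1.01, 2.00, 3.02, 4.01]   # assumed ~1 second stair step

external_summary = summarize_gaps(external)
internal_summary = summarize_gaps(internal)

# Ratio of typical pauses: how many times longer the internal path waits.
slowdown = internal_summary["median"] / external_summary["median"]
print(f"external median gap: {external_summary['median']:.3f}s")
print(f"internal median gap: {internal_summary['median']:.3f}s")
print(f"slowdown factor: {slowdown:.1f}x")
```

With real numbers exported from a packet capture, the median gap is a more honest summary than eyeballing the graph, since one odd pause won't skew it.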
The game is afoot!
The first thing I was able to confirm with certainty was that
performance to the server itself was just like it was from the outside. My problems apparently lie in the NetScaler
config for this particular application, since James confirmed we’re not aware
of any reports from any of the other applications running through it.
Despite no other reports of trouble, I checked NetScaler
resource utilization first. It was
pretty uninteresting.
From here, I started pulling off SSL at the various steps of
the config to identify any problems that might exist in my certificate build.
The first thing was to create an HTTP-only service group.
I then had to create a few different test virtual servers to
host different configurations without affecting actual MyChart traffic. James had already established a TCP test
server, but I needed something a bit more fanciful. Enter my test HTTP and SSL virtual servers.
I used different combinations of my SSL and HTTP virtual
servers and service groups to better understand where in the transaction my
slowness was manifesting itself.
Here’s the breakdown of the results:
Virtual Server | Service Group | Result (Pass = Normal / Fail = Slow)
HTTP | HTTP | Pass
HTTP | SSL | Pass
SSL | HTTP | Pass
SSL | SSL | Fail
Wat wat. :|
Not 100% sure what this means, I performed an NSTRACE while
using the problematic config.
Weeding through an NSTRACE file sucks.
It is, however, a useful exercise, as it reveals all manner
of network gossip going on around the resources in question. I tried to configure a trace that only looked
at the traffic for a particular virtual server, but it gave me everything
anyway.
There are only a couple of things that really stand out to
me from other applications’ traffic:
- Lots of out-of-order TCP traffic
- Tons of TLSv1 “Server Hello” and cipher renegotiation
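For context on the first bullet, here is a rough sketch of what "out-of-order" amounts to. It assumes a deliberately simplified definition: a segment counts as out of order when its sequence number is lower than the highest one already seen on the same flow. Real analyzers also separate retransmissions from reordering and handle sequence wraparound, which this ignores. The packet tuples and the function name are hypothetical, not something pulled from the NSTRACE.

```python
from collections import defaultdict

def count_out_of_order(packets):
    """Count segments arriving behind the highest sequence number seen so far.

    packets: iterable of (flow_id, sequence_number) tuples in capture order.
    Returns a dict mapping flow_id to its out-of-order segment count.
    """
    highest_seen = {}
    out_of_order = defaultdict(int)
    for flow, seq in packets:
        if flow in highest_seen and seq < highest_seen[flow]:
            # This segment landed after a later one on the same flow.
            out_of_order[flow] += 1
        else:
            highest_seen[flow] = seq
    return dict(out_of_order)

# Hypothetical capture: flow "a" has one late segment, flow "b" is clean.
sample = [("a", 100), ("b", 10), ("a", 200), ("a", 150), ("b", 20), ("a", 300)]
late_segments = count_out_of_order(sample)
print(late_segments)
```

Counting per flow makes it easier to tell whether the reordering is concentrated on the slow MyChart connections or spread across everything on the appliance.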
What these two facts mean is still beyond me. I’ve reached out to Chris Lancaster, Citrix
Bro and NetScaler Extraordinaire, to meet with me later this afternoon and work
through my methods.
To sum this thing up so far, there is something in how the
7500’s configured that creates major per-transaction delays when brokering a
client HTTPS connection to an HTTPS-enabled server. That problem does not exist in isolation on
either side; that is, neither HTTPS only from the client nor HTTPS only to the server
creates the issue.
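The results table from earlier backs this up. As a minimal sketch, the same reasoning can be written as a quick check over the four combinations. The results are copied from the table; the function names are hypothetical.

```python
# (virtual server protocol, service group protocol) -> result, from the table.
results = {
    ("HTTP", "HTTP"): "Pass",
    ("HTTP", "SSL"): "Pass",
    ("SSL", "HTTP"): "Pass",
    ("SSL", "SSL"): "Fail",
}

FRONT_END = 0  # client-facing virtual server
BACK_END = 1   # server-facing service group

def failing_combos(results):
    """Return every (virtual server, service group) pair that was slow."""
    return [combo for combo, outcome in results.items() if outcome == "Fail"]

def ssl_alone_fails(results, side):
    """True if SSL on only the given side (HTTP on the other) was slow."""
    other = 1 - side
    return any(
        outcome == "Fail"
        for combo, outcome in results.items()
        if combo[side] == "SSL" and combo[other] == "HTTP"
    )

print("failing combos:", failing_combos(results))
print("front-end SSL alone fails:", ssl_alone_fails(results, FRONT_END))
print("back-end SSL alone fails:", ssl_alone_fails(results, BACK_END))
```

Both single-sided checks come back false, and the only failing pair is SSL on both ends, which is what points at the interaction between the two SSL legs rather than at either certificate or cipher setup by itself.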
Stay tuned.