Tuesday, May 6, 2014

MyChart Performance Weirdness - Part 1


Here’s the scenario:


The Epic MyChart patient portal is reachable through two logical pathways: the patient population at large comes in from the public internet through the Epic DMZ NetScaler appliance, and Hyperspace users come in from the internal network through the existing McKesson NetScaler appliance.


Each path to MyChart uses the core services available to it, like DNS.  Patients connecting from the outside to mychart.baptistoncare.org resolve the name publicly, while Hyperspace users connecting from the inside to mychart.bmhcc.org resolve the name using internal DNS resources.
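
A quick way to confirm the split resolution is to query each name from the appropriate side. Something like the lookups below works; the commands are generic, and what comes back depends on which DNS servers the client is actually using:

    # From an outside client: should return the public VIP on the Epic DMZ NetScaler
    nslookup mychart.baptistoncare.org

    # From an inside client: should return the internal VIP on the McKesson NetScaler
    nslookup mychart.bmhcc.org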


Check it.

Graph 1: External Access




Graph 2: Internal Access




Wat. :|


Notice that the “stair step” between each transfer is almost exactly 1 second.  I’m not good enough at math to appreciate with any accuracy the transfer performance difference here.  Instead, my plebeian mind conjures terms like terribad, or silly-bad, to fill the gap.

The game is afoot!


The first thing I was able to confirm with certainty was that performance straight to the server itself was just as good as it is from the outside.  My problem apparently lies in the NetScaler config for this particular application, since James confirmed we’re not aware of any similar reports from any of the other applications running through it.

Despite no other reports of trouble, I checked NetScaler resource utilization first.  It was pretty uninteresting.
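
For the record, those checks were nothing more exotic than the appliance’s built-in counters. A rough sketch of what I mean, run from the NetScaler CLI:

    # Overall appliance health: CPU, memory, throughput
    stat system

    # Per-core CPU utilization
    stat cpu

    # SSL sessions, handshakes, and crypto engine activity
    stat ssl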


From here, I started pulling off SSL at the various steps of the config to identify any problems that might exist in my certificate build.


The first thing was to create an HTTP-only service group.
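
In NetScaler CLI terms, that’s roughly the following; the group name and server name are placeholders rather than the production objects:

    # Service group that talks plain HTTP to the back end
    add serviceGroup sg-mychart-http HTTP

    # Bind the MyChart web server to it on port 80
    bind serviceGroup sg-mychart-http mychart-web-01 80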




I then had to create a few different test virtual servers to host different configurations without affecting actual MyChart traffic.  James had already established a TCP test server, but I needed something a bit more fanciful.  Enter my test HTTP and SSL virtual servers.
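
Building those looks roughly like this; the names, VIPs, and certkey are placeholders for illustration only:

    # Plain HTTP test virtual server
    add lb vserver vs-test-http HTTP 10.1.1.50 80

    # SSL test virtual server
    add lb vserver vs-test-ssl SSL 10.1.1.51 443

    # Bind a test certificate to the SSL virtual server
    bind ssl vserver vs-test-ssl -certkeyName test-certkey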




I used different combinations of my SSL and HTTP virtual servers and service groups to better understand where in the transaction my slowness was manifesting itself.
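
Each test case was just a matter of re-pointing the bindings, along these lines (placeholder object names again):

    # HTTP front end to HTTP back end
    bind lb vserver vs-test-http sg-mychart-http

    # SSL front end to SSL back end
    bind lb vserver vs-test-ssl sg-mychart-ssl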

Here’s the breakdown of the results:


Virtual Server   Service Group   Result (Pass = Normal / Fail = Slow)
HTTP             HTTP            Pass
HTTP             SSL             Pass
SSL              HTTP            Pass
SSL              SSL             Fail



Wat wat. :|


Not 100% sure what this meant, I performed an NSTRACE while using the problematic config.

Weeding through an NSTRACE file sucks


It is, however, a useful exercise, as it reveals all manner of network gossip going on around the resources in question.  I tried to configure a trace that only looked at the traffic for a particular virtual server, but it gave me everything anyway.  
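
For what it’s worth, the filtered attempt looked something like the following. The exact filter expression and the VIP are approximations from memory, so treat this as a sketch:

    # Capture with no size cap, filtered on the test SSL VIP
    start nstrace -size 0 -filter "CONNECTION.DSTIP.EQ(10.1.1.51)"

    # Reproduce the slow MyChart transaction, then stop the capture;
    # the files end up under /var/nstrace/ on the appliance
    stop nstrace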


There are only a couple of things that really stand out to me from other applications’ traffic:

  • Lots of out-of-order TCP traffic
  • Tons of TLSv1 “Server Hello” and cipher renegotiation
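
That second bullet can also be eyeballed from the client side without a full trace. A generic check (not part of the original troubleshooting) is openssl’s s_client, which prints the handshake states as they happen:

    # Watch the TLS handshake against the internal MyChart virtual server
    openssl s_client -connect mychart.bmhcc.org:443 -state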

What these two facts mean is still beyond me.  I’ve reached out to Chris Lancaster, Citrix Bro and NetScaler Extraordinaire, to meet with me later this afternoon and work through my methods.


To sum this thing up so far: there is something in how the 7500 is configured that creates major per-transaction delays when brokering a client HTTPS connection to an HTTPS-enabled server.  The problem does not show up with HTTPS in isolation on either side; that is, HTTPS only from the client or HTTPS only to the server performs normally.  It takes HTTPS on both legs to trigger the slowness.


Stay tuned.