A Fortuitous Concatenation: Z-Scores and Standardized Reporting of Abnormal Echo Measurements

In what appears to be some small bit of determinism of my own, I have arrived at this point in the implementation of a data-driven, standardized echo report:

Can we use z-scores to determine the severity of abnormal findings?

Certainly, we are already doing this, with widely varying effect:

And, from my recent ad hoc survey:

To me, it seems that there is still no consensus and no standardization, in spite of our obvious need.

A scientifically rigorous approach to categorizing abnormal findings, one that takes the spectrum of disease into consideration, is proposed by Vasan et al.:

We classified values of each echocardiographic variable into the following five categories based on sex- and height-specific percentiles (indicating increasing deviation from the reference limits):

category 0 (reference limits), value <= 95th percentile of the reference sample;

category 1, 95th percentile of reference sample < value <= 95th percentile of broad sample;

category 2, 95th percentile of broad sample < value <= 98th percentile of broad sample;

category 3, 98th percentile of broad sample < value <= 99th percentile of broad sample; and

category 4, value > 99th percentile of broad sample.
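To make the five categories concrete, here is a minimal sketch of that rule in Python. The function name and parameter names are my own, not Vasan et al.'s, and the caller is assumed to supply the sex- and height-specific percentile cutoffs for the measurement in question. The numbers in the usage lines are invented purely for illustration.

```python
def vasan_category(value, ref_p95, broad_p95, broad_p98, broad_p99):
    """Return category 0-4 for one measurement.

    Hypothetical helper: the four cutoffs are assumed to be the sex- and
    height-specific percentiles for this variable, in ascending order.
    """
    if not (ref_p95 <= broad_p95 <= broad_p98 <= broad_p99):
        raise ValueError("cutoffs must be in ascending order")
    if value <= ref_p95:
        return 0  # within reference limits
    if value <= broad_p95:
        return 1
    if value <= broad_p98:
        return 2
    if value <= broad_p99:
        return 3
    return 4  # beyond the 99th percentile of the broad sample


# Made-up cutoffs (cm), for illustration only.
cutoffs = dict(ref_p95=3.0, broad_p95=3.4, broad_p98=3.7, broad_p99=3.9)
category_for_3_5_cm = vasan_category(3.5, **cutoffs)
```

Note that every cutoff except the first comes from the broad sample, so this cannot be applied without data on that larger population.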

What we don't yet have in pediatric echo is the benefit of their "broad sample": the larger population that includes individuals with disease and echocardiographic abnormalities. All we have to go by are the normal patients used to construct the z-scores, and so we have to make educated guesses at what this "other" population might look like.
Here's my (somewhat) educated (and exaggerated, for effect) guess:

(typical "normal" sample in blue, speculated "broad" sample in purple)

In the absence of precise (or, for that matter, any) knowledge of the "broad sample", an approach we can still use is advocated by the authors of the Strong Heart Study:

... by the simple procedure, which we have used previously, of considering values

2 to 3 standard deviations from the normal mean as mildly abnormal,

3 to 4 standard deviations as moderately abnormal, and

>4 standard deviations as severely abnormal.
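As a rough sketch, the standard-deviation rule can be written as a small Python function. The function name is hypothetical, and I am making three assumptions the quote does not spell out: the rule is applied to the absolute value of the z-score (so that small and large structures are graded alike), values at or below 2 standard deviations are labeled "within normal limits", and a value exactly on a boundary falls into the lower grade (the quote only fixes this for 4, via ">4").

```python
def severity_from_z(z):
    """Grade a z-score by its distance from the normal mean.

    Assumptions (not stated in the quoted rule): the sign is ignored,
    |z| <= 2 counts as within normal limits, and boundary values
    fall into the lower grade.
    """
    magnitude = abs(z)
    if magnitude > 4:
        return "severe"
    if magnitude > 3:
        return "moderate"
    if magnitude > 2:
        return "mild"
    return "within normal limits"


# The two aortic root z-scores quoted later in this post.
grades = [severity_from_z(z) for z in (-5.9, 6.3)]
```

Treating negative and positive z-scores symmetrically is a simplification; it would be the first thing to revisit if hypoplasia and dilatation turn out to need different cutoffs.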

As a simple and purely anecdotal test, let's see how two hypothetical patients stand up to this classification, both deliberately chosen from the severe end of the spectrum:

Infant with HLHS: BSA = 0.21 m²; aortic root = 0.2 cm
Teenager with Marfan's: BSA = 2.0 m²; aortic root = 5 cm

The aortic root z-scores are:

-5.9 (infant with HLHS)
+6.3 (teenager with Marfan's)

Both are appropriately classified by this scheme and easily meet criteria as severe.

Works for me, so far.

It appears to work for the ASE as well: they adopted this same scheme in their most recent Recommendations for Chamber Quantification.

Short of requiring subjective grading of every measurement, and in the absence of anything better, this is what will probably be adopted in an effort to avoid errors in our echo reports. Some fine-tuning may be required to address any differences between the severity of hypoplasia and that of dilatation, and I would not be surprised to discover some minor differences in how individual cardiac structures tolerate deviations from the mean.

So, back to the question: Can we use z-scores to determine the severity of abnormal findings?

I think so.
What do you think? Is there a better way?
