Tuesday, January 13, 2009

Line Fitting for Pediatric Cardiology (and everyone else)

Described as "one of the fundamental tasks of scientific inquiry", model selection could consume the better part of an afternoon and an important part of one's budgeted time with a statistician.

Enter ZunZun.com.

If you're looking for quality curve fitting and surface fitting, this is the site for you!

The power law applied by Sable et al. in their description of coronary artery reference values caught my attention. In particular, the scaling exponents for the individual coronary arteries are all different, and not what I would have intuitively guessed them to be based on the principle of geometric similarity. So, I wanted to test a theory: perhaps the coronary arteries scale well with something besides body surface area (BSA).

Consider this small data set of 10 hypothetical patients:

Ht (cm)   Wt (kg)   BSA (Haycock, m²)
57        6.1       0.3187
61        7.0       0.3525
83        12.6      0.5464
98        14.2      0.6223
104       16.6      0.6930
120       20.5      0.8215
148       41.0      1.2961
172       88.6      2.0820
176       58.0      1.6729
178       65.5      1.7940
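For anyone who wants to reproduce the BSA column, here is a minimal sketch using the published Haycock formula, BSA = 0.024265 × Wt^0.5378 × Ht^0.3964 (weight in kg, height in cm, result in m²). The function name `haycock_bsa` and the `patients` list are my own illustrative names, not anything from ZunZun.com or the source paper.

```python
def haycock_bsa(height_cm, weight_kg):
    """Body surface area in m^2 by the Haycock formula."""
    return 0.024265 * weight_kg ** 0.5378 * height_cm ** 0.3964


# (height in cm, weight in kg) for the 10 hypothetical patients above
patients = [
    (57, 6.1), (61, 7.0), (83, 12.6), (98, 14.2), (104, 16.6),
    (120, 20.5), (148, 41.0), (172, 88.6), (176, 58.0), (178, 65.5),
]

# One BSA value per patient, in the same order as the table
bsa_values = [haycock_bsa(h, w) for h, w in patients]

for (h, w), bsa in zip(patients, bsa_values):
    print(f"{h:>4} cm  {w:>5} kg  {bsa:.4f} m^2")
```

The printed values should agree with the table to within rounding.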

From this I predicted the diameter of the LAD and height-based LV mass for each hypothetical subject.
I then constructed a second table of super hypothetical data:

LV Mass (g)   LAD (mm)
16.86         1.38
19.19         1.44
34.42         1.72
45.69         1.82
50.17         1.90
63.28         2.04
99.64         2.46
149.46        2.99
158.99        2.73
163.79        2.81

Then I did some line fitting:

hypothetical LAD vs. LVM

The model fitted is:

y = a * x^b

The reported coefficients are:

a =  5.2619076425282296E-01
b =  3.3269001508780827E-01

The "b" term is the scaling exponent: 0.333.
That is to say, in this small sample of hypothetical data, the LAD (a linear measure) scales with LV mass (a volumetric measure) to the 1/3 power.
Maybe that is just random.
Or, maybe that is just… cool.
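If you would rather check the exponent yourself, a power law y = a * x^b becomes a straight line after taking logs: ln(y) = ln(a) + b * ln(x). The sketch below fits that line by ordinary least squares using only the Python standard library. This is an assumption on my part about how to approximate the fit; I don't know which algorithm ZunZun.com uses, and a fit in log space weights the points differently from a direct nonlinear fit, so expect coefficients close to, but not identical to, the ones reported above. The name `fit_power_law` is hypothetical.

```python
import math

# Hypothetical data from the second table
lv_mass = [16.86, 19.19, 34.42, 45.69, 50.17, 63.28, 99.64, 149.46, 158.99, 163.79]
lad = [1.38, 1.44, 1.72, 1.82, 1.90, 2.04, 2.46, 2.99, 2.73, 2.81]


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on ln(y) = ln(a) + b * ln(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the log-log line is the scaling exponent b
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept of the log-log line is ln(a)
    a = math.exp(mean_y - b * mean_x)
    return a, b


a, b = fit_power_law(lv_mass, lad)
print(f"a = {a:.4f}, b = {b:.4f}")
```

On this data the exponent should land near one third, which is the point of the exercise.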

 

Of course, selection of the best model depends on numerous factors, some of which are the regression "fit" statistics and things like the Bayesian information criterion (BIC). Excel won't report these bits, but ZunZun.com throws a bunch at you.
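To give a feel for what the Bayesian information criterion does, here is a rough sketch that compares two models in log space: one with a freely fitted exponent b, and one with the exponent fixed at the geometric-similarity value of 1/3. It uses the common least-squares form BIC = n * ln(RSS / n) + k * ln(n), which assumes Gaussian errors (here, on ln(y)) and drops constants shared by both models; lower is better. The function names and the choice of these two candidate models are my own assumptions, not ZunZun.com's output.

```python
import math

lv_mass = [16.86, 19.19, 34.42, 45.69, 50.17, 63.28, 99.64, 149.46, 158.99, 163.79]
lad = [1.38, 1.44, 1.72, 1.82, 1.90, 2.04, 2.46, 2.99, 2.73, 2.81]

log_x = [math.log(x) for x in lv_mass]
log_y = [math.log(y) for y in lad]
n = len(log_x)
mean_x = sum(log_x) / n
mean_y = sum(log_y) / n


def rss_for_slope(slope):
    """Residual sum of squares in log space, with the best intercept for this slope."""
    intercept = mean_y - slope * mean_x
    return sum((ly - (intercept + slope * lx)) ** 2 for lx, ly in zip(log_x, log_y))


def bic(rss, k):
    """Least-squares BIC, up to an additive constant; k = number of fitted parameters."""
    return n * math.log(rss / n) + k * math.log(n)


# Model 1: exponent fitted freely (2 parameters: ln(a) and b)
sxx = sum((lx - mean_x) ** 2 for lx in log_x)
sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
b_free = sxy / sxx
rss_free = rss_for_slope(b_free)
bic_free = bic(rss_free, k=2)

# Model 2: exponent fixed at 1/3 (1 parameter: ln(a))
rss_fixed = rss_for_slope(1.0 / 3.0)
bic_fixed = bic(rss_fixed, k=1)

print(f"free exponent : b = {b_free:.4f}, RSS = {rss_free:.5f}, BIC = {bic_free:.2f}")
print(f"fixed at 1/3  : RSS = {rss_fixed:.5f}, BIC = {bic_fixed:.2f}")
```

The free-exponent model can never fit worse than the fixed one, but the BIC charges it ln(n) for the extra parameter, so the simpler model wins whenever the improvement in fit is too small to pay for it.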

It's free, by the way, unlike the statistician's time.