Enhancing a somatic maturity prediction model

Purpose: Assessing biological maturity in studies of children is challenging. Sex-specific regression equations developed using anthropometric measures are widely used to predict somatic maturity. However, their prediction accuracy has not been established in external samples. Thus, we aimed to evaluate the fit of these equations, assess them for overfitting (adjusting as necessary), and calibrate them using external samples.
Methods: We evaluated potential overfitting using data from the original Pediatric Bone Mineral Accrual Study (PBMAS; 79 boys and 72 girls; 7.5–17.5 yr). We assessed change in R² and standard error of the estimate (SEE) with the addition of predictor variables. We determined the effect of within-subject correlation using cluster-robust variance and fivefold random splitting followed by forward-stepwise regression. We used dominant predictors from these splits to assess predictive abilities of various models. We calibrated using participants from the Healthy Bones Study III (HBS-III; 42 boys and 39 girls; 8.9–18.9 yr) and Harpenden Growth Study (HGS; 38 boys and 32 girls; 6.5–19.1 yr).
Results: Changes in R² and SEE were negligible when later predictors were added during step-by-step refitting of the original equations, suggesting overfitting. After redevelopment, new models included age × sitting height for boys (R², 0.91; SEE, 0.51) and age × height for girls (R², 0.90; SEE, 0.52). These models calibrated well in external samples, where b0 is the calibration intercept (standard error (SE)), b1 is the calibration slope (SE), and RMSE is the root mean squared error of prediction. HBS-III boys: b0, 0.04 (0.05); b1, 0.98 (0.03); RMSE, 0.89. HBS-III girls: b0, 0.35 (0.04); b1, 1.01 (0.02); RMSE, 0.65. HGS boys: b0, −0.20 (0.02); b1, 1.02 (0.01); RMSE, 0.85. HGS girls: b0, −0.02 (0.03); b1, 0.97 (0.02); RMSE, 0.70. We subsequently developed an age × height alternate for boys, allowing for predictions without sitting height.
Conclusion: Our redeveloped equations fit well in external samples and offer an alternative to commonly used models. The original prediction equations were simplified with no meaningful increase in estimation error.
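
The calibration statistics reported in the Results (b0, b1, and RMSE) can be illustrated with a minimal sketch. It assumes calibration means regressing the observed maturity offset on the predicted one by ordinary least squares, which is one common approach and not necessarily the exact procedure used here. The function name `calibrate` and the toy values are hypothetical. The standard errors are plain OLS ones, so they ignore the within-subject correlation that the cluster-robust variance in the Methods is meant to handle.

```python
import math

def calibrate(predicted, observed):
    """Regress observed on predicted (simple OLS) and summarize calibration.

    Returns the calibration intercept b0, slope b1, their plain OLS standard
    errors, and the RMSE of prediction. Perfect calibration gives b0 = 0 and
    b1 = 1. Assumption: independent observations (no cluster adjustment).
    """
    n = len(predicted)
    if n != len(observed) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")

    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))

    b1 = sxy / sxx                 # calibration slope
    b0 = mean_o - b1 * mean_p      # calibration intercept

    # Residual variance around the calibration line (n - 2 degrees of freedom).
    resid_ss = sum((o - (b0 + b1 * p)) ** 2 for p, o in zip(predicted, observed))
    sigma2 = resid_ss / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    se_b0 = math.sqrt(sigma2 * (1.0 / n + mean_p ** 2 / sxx))

    # RMSE of prediction: observed versus the raw (uncalibrated) predictions.
    rmse = math.sqrt(sum((o - p) ** 2 for p, o in zip(predicted, observed)) / n)

    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1, "rmse": rmse}

# Toy, invented values (years from peak height velocity); not study data.
toy_predicted = [-2.0, -1.0, 0.0, 1.0, 2.0]
toy_observed = [-1.5, -0.5, 0.5, 1.5, 2.5]
toy_result = calibrate(toy_predicted, toy_observed)
```

In this toy case every prediction is half a year too low, so the slope is 1 but the intercept is shifted: a model can track maturity well (b1 near 1) while still carrying a systematic offset (b0 away from 0), which is why both are reported alongside RMSE.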