Information on data

Published

1 Oct 2023

Kyphosis

The kyphosis data set is originally from Hastie and Tibshirani (1990):

Data were collected on 83 patients undergoing corrective spinal surgery. The objective was to determine important risk factors for kyphosis, or the forward flexion of the spine of at least 40 degrees from vertical, following surgery. The risk factors are location of the surgery along the spine and age. These data are analysed in some detail in Chapter 10, and listed in Appendix A.

Bell et al. (1989) studied multiple level thoracic and lumbar laminectomy, a corrective spinal surgery commonly performed in children for tumour and congenital or developmental abnormalities such as syrinx, diastematomyelia and tethered cord. The incidence of postoperative deformity is not known. The purpose of the study is to delineate the true incidence and nature of spinal deformities following this surgery and to assess the importance of age at time of surgery, as well as the effect of the number and location of vertebrae levels decompressed. The data in the study consists of retrospective measurements on 83 patients, one of the largest studies of this procedure to date.

The specific outcome of interest here is the presence (1) or absence (0) of kyphosis, defined to be a forward flexion of the spine of at least 40 degrees from vertical. The available predictors are age in months at time of the operation, the starting and ending range of vertebrae levels involved in the operation (start and end) and the number of levels involved (number). These last predictors are related by number = end — start +1. The goal of the analysis is to identify risk factors for kyphosis, and a natural approach is to model the prevalence of kyphosis as a function of the predictors. In order to investigate this relationship, we fit a number of generalized additive logistic models. By the results of Chapter 5, the exact linear dependence between the level variables cause an exact concurvity in any additive model involving all three of them. We therefore want to include only two of the three level variables in the model. The medical investigator felt a priori that number and start would be more interpretable, so we use these in the analysis. For start, the range 1-12 corresponds to the thoracic vertebrae, while 13-17 are the lumbar vertebrae; this dichotomy was felt to be an important one.

There are 65 zeros and 18 ones for the response kyphosis. This is not a large sample, especially for binary data; we have to bear in mind the warnings of section 6.10 about overinterpreting additive logistic fits to binary data. Despite the small sample size, this study is the largest and considered to be one of the most important of its kind for studying kyphosis.

The version of the data set included in the rpart package only has 81 observations; two observations were judged to be outliers and excluded. (In addition the end variable was dropped, the variable names were capitalized, and the 0/1 response variable was changed to a factor with levels “absent” and “present”.)

Gopher tortoises

From Ozgul et al. (2009)

In addition to the [capture-mark-recapture] analysis, we analyzed the number of fresh (<40 months old) tortoise shell remains found in each site and year to establish whether either (1) the number of shell remains found was correlated with seroprevalence [frequency of antibodies to disease] in a given site and year or (2) the number of shell remains found was greater in high- than in low-prevalence sites. Specifically, we wanted to fit a statistical model that would allow (given the total population on a site) overdispersion in shell counts, possible variation among sites and years, and a positive effect of prevalence on shell counts.

  • site: sampling site
  • year: sampling year
  • shells: number of shells found
  • type: fresh (<40 months) or old (>40 months)
  • area: site area (km^2?)
  • density: population density
  • prev: seroprevalence of M. agassizi

Get these data from here

References

Hastie, T. J., and R. J. Tibshirani. 1990. Generalized Additive Models. CRC Press.
Ozgul, Arpat, Madan K. Oli, Benjamin M. Bolker, and Carolina Perez-Heydrich. 2009. “Upper Respiratory Tract Disease, Force of Infection, and Effects on Survival of Gopher Tortoises.” Ecological Applications 19 (3): 786–98. https://doi.org/10.1890/08-0219.1.