Penalized Lasso Methods in Health Data: Application to Trauma and Influenza Data of Kerman

Document Type: Original Article

Authors

1 Department of Biostatistics and Epidemiology, Modeling in Health Research Center, Faculty of Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

2 Professor, Department of Biostatistics, Physiology Research Center, Institute of Basic and Clinical Physiology Sciences & Modeling in Health Research Center, Faculty of Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

3 Professor, Department of Biostatistics and Epidemiology, Modeling in Health Research Center, Faculty of Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

4 Associate Professor, Department of Biostatistics and Epidemiology, Social Determinants of Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

5 Department of Biostatistics and Epidemiology, HIV/STI Surveillance Research Center, and WHO Collaborating Centre for HIV Surveillance, Kerman University of Medical Sciences, Kerman, Iran

6 Associate Professor, Department of Emergency Medicine, Kerman University of Medical Sciences, Kerman, Iran

7 Department of Emergency Medicine, Kerman University of Medical Sciences, Kerman, Iran

Abstract

Background: Two main issues that challenge model building are a low number of events per variable (EPV) and multicollinearity among explanatory variables. Our aim was to review statistical methods that tackle these issues, with emphasis on the penalized lasso regression model. The present study aimed to explain the problems that small sample size and multicollinearity cause for traditional regression models in trauma and influenza data, and to introduce lasso regression as a modern shrinkage method.
Methods: Two data sets, corresponding to events-per-variable values of 1.5 and 3.4, were used. The outcomes in these two data sets were hospitalization due to trauma and hospitalization of patients with influenza, respectively. In total, four models were developed: classic Cox and logistic regression models, as well as their lasso-penalized forms. The tuning parameters were selected through 10-fold cross-validation.
Results: The traditional Cox model was not able to detect the significance of any variable. The lasso Cox model identified respiratory rate, focused assessment with sonography in trauma, the difference between blood sugar on admission and 3 h after admission, and international normalized ratio as significant. In the second data set, the lasso logistic model selected four variables as significant, whereas the classic logistic model identified the importance of only one variable.
Conclusion: The Akaike information criterion (AIC) was lower for the lasso models than for the traditional regression models. The lasso method has practical appeal when the number of events per variable is low and multicollinearity exists in the data.
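
As an illustration of the approach summarized in the Methods, the following is a minimal sketch of a lasso-penalized logistic regression whose tuning parameter is chosen by 10-fold cross-validation. It is not the authors' implementation: the abstract does not name the software used, so the fit here is done by plain proximal gradient descent (soft-thresholding) in standard-library Python, on synthetic data. All function names, the step size, the candidate penalty grid, and the simulated data set are assumptions made for illustration, and the predictors are assumed to be standardized. The Cox version is not shown.

```python
import math
import random

def events_per_variable(n_events, n_candidate_vars):
    """EPV = number of outcome events / number of candidate predictors."""
    return n_events / n_candidate_vars

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=200):
    """Minimize mean negative log-likelihood + lam * sum(|beta_j|).

    Proximal gradient descent; the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta

def deviance(X, y, b0, beta):
    """Mean binomial deviance (-2 * mean log-likelihood)."""
    total = 0.0
    for xi, yi in zip(X, y):
        pr = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi)))
        pr = min(max(pr, 1e-12), 1 - 1e-12)  # guard against log(0)
        total += -2.0 * (yi * math.log(pr) + (1 - yi) * math.log(1 - pr))
    return total / len(y)

def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    """Return the penalty with the lowest mean held-out deviance over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for lam in lambdas:
        fold_dev = []
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            b0, beta = fit_lasso_logistic(X_tr, y_tr, lam)
            fold_dev.append(deviance([X[i] for i in fold],
                                     [y[i] for i in fold], b0, beta))
        scores.append(sum(fold_dev) / k)
    best = min(range(len(lambdas)), key=scores.__getitem__)
    return lambdas[best], scores

# Synthetic example (hypothetical data, not the trauma or influenza sets):
# only the first of p predictors is truly related to the outcome.
rng = random.Random(1)
n, p = 60, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.5 * xi[0] - 1.0) else 0 for xi in X]

lambdas = [0.01, 0.05, 0.1, 0.2]  # assumed candidate penalties
best_lam, cv_dev = cv_select_lambda(X, y, lambdas, k=10)
b0, beta = fit_lasso_logistic(X, y, best_lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("EPV:", events_per_variable(sum(y), p))
print("chosen lambda:", best_lam, "selected predictors:", selected)
```

Because the penalty sets some coefficients exactly to zero, the fitted model doubles as a variable selector, which is the property the abstract relies on when the number of events per variable is low. In practice an established package with a tested cross-validation routine would normally be preferred over a hand-written solver like this one.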

 
 
