Predicting Type Two Diabetes and Determination of Effectiveness of Risk Factors Applying Logistic Regression Model


1 MSc Student of Biostatistics, Research Center for Modelling in Health, Institute for Future Studies in Health, and Epidemiology & Biostatistics Department, School of Health, Kerman University of Medical Sciences, Kerman, Iran

2 Professor of Biostatistics, Physiology Research Center and Epidemiology & Biostatistics Department, School of Health, Kerman University of Medical Sciences, Kerman, Iran

3 Professor of Physiology, Physiology Research Center, Kerman University of Medical Sciences, Kerman, Iran


Background & Aim: Diabetes is a chronic disease with no curative treatment. It is the most common cause of amputation, blindness, and chronic renal failure, and the most important risk factor for heart disease. Logistic regression is a statistical prediction model that can be used to assess the relationship between a dependent variable and independent predictor variables while controlling for confounding variables. The aim of this study was to determine the magnitude of the effect of risk factors on diabetes and to estimate a logistic regression model for predicting diabetes.
Methods: A total of 5357 people in Kerman, Iran, were enrolled. Diabetes was the response variable, and weight, height, body mass index (BMI), waist circumference, hip circumference, waist-to-hip ratio (WHR), age, gender, occupation, education, medications, drug abuse, physical activity, systolic and diastolic blood pressure, and levels of total cholesterol, high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides were included as independent variables in the model. Sensitivity, specificity, accuracy, the Kappa measure of agreement, and the receiver operating characteristic (ROC) curve were used to assess the predictive power of the model.
Results: The sensitivity, specificity, accuracy, Kappa measure of agreement, and area under the ROC curve of the model were 0.764, 0.725, 0.731, 0.312, and 0.822, respectively.
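The evaluation measures listed above can be computed from a 2x2 table of predicted versus observed diabetes status (sensitivity, specificity, accuracy, kappa) and from the model's predicted probabilities (area under the ROC curve). The Python sketch below is illustrative only: the function names are hypothetical, the kappa is assumed to be Cohen's kappa, the AUC is computed with the standard Mann-Whitney equivalence (the probability that a randomly chosen diabetic person receives a higher predicted score than a randomly chosen non-diabetic person), and all counts and scores are invented for the example, not taken from the study.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Cohen's kappa from a 2x2 table.

    tp / fn: diabetic subjects predicted positive / negative.
    fp / tn: non-diabetic subjects predicted positive / negative.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n  # observed agreement p_o
    # Agreement expected by chance, from the predicted and observed marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher predicted score; ties count as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts and scores for illustration only; NOT the study's data.
    print(confusion_metrics(tp=80, fp=150, fn=20, tn=750))
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4]))
```

With these invented counts, sensitivity is 0.80, specificity about 0.833, accuracy 0.83, and kappa about 0.40. This illustrates how a chance-corrected kappa can sit well below raw accuracy when non-diabetic subjects dominate the sample. The pairwise AUC loop is quadratic in sample size, which is still manageable for a few thousand subjects; a rank-based formula gives the same value faster.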
The following variables were significant, listed in order of their impact and importance: WHR (β = 2.66, OR = 14.32), antihypertensive drug (β = 1.279, OR = 3.59), sex (β = 0.707, OR = 2.028), level of education, walking and cycling (β = 0.136, OR = 1.146), waist circumference (β = 0.12, OR = 1.127), weight (β = 0.112, OR = 1.118), BMI (β = 0.053, OR = 1.054), systolic blood pressure (β = 0.052, OR = 1.054), age (β = 0.046, OR = 1.047), diastolic blood pressure (β = 0.043, OR = 1.044), total cholesterol (β = 0.003, OR = 1.003), triglycerides (β = 0.01, OR = 1.011), LDL (β = 0.001, OR = 1.001), hip circumference (β = -0.025, OR = 1.025), height (β = -0.071, OR = 0.932), HDL (β = -0.078, OR = 0.925), and 10 minutes of intense work activity (β = -0.507, OR = 0.602).
Conclusion: Based on the accuracy and predictive power of the model, and on the area under the ROC curve (0.822), which indicates good accuracy in diagnosing diabetes, the logistic regression model was an appropriate model for predicting diabetes in this study.
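In a logistic regression model the log-odds of diabetes are linear in the predictors, logit(p) = β0 + β1x1 + ... + βkxk, so each odds ratio reported above is exp(β). For a change of Δ units in a predictor, the odds are multiplied by exp(β·Δ). The Python sketch below reproduces a few of the reported odds ratios from their β values and shows the inverse-logit step that turns a linear predictor into a predicted probability. The function names are hypothetical. The intercept β0 is not reported in the abstract, so no actual subject-level prediction is attempted. The 10-year age example assumes age is coded in years, which the abstract does not state.

```python
import math

# Coefficients (change in log-odds per unit) copied from the abstract.
REPORTED_BETAS = {
    "WHR": 2.66,
    "antihypertensive drug": 1.279,
    "sex": 0.707,
    "age": 0.046,
    "HDL": -0.078,
    "10 minutes of intense work activity": -0.507,
}


def odds_ratio(beta: float, units: float = 1.0) -> float:
    """Odds ratio for a change of `units` in a predictor: exp(beta * units)."""
    return math.exp(beta * units)


def predicted_probability(linear_predictor: float) -> float:
    """Inverse logit p = 1 / (1 + exp(-eta)), written to avoid overflow."""
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    z = math.exp(linear_predictor)
    return z / (1.0 + z)


if __name__ == "__main__":
    for name, beta in REPORTED_BETAS.items():
        print(f"{name}: beta = {beta:+.3f}, OR = {odds_ratio(beta):.3f}")
    # Assumption: age is measured in years (units are not given in the abstract).
    print(f"age, 10-year increase: OR = {odds_ratio(REPORTED_BETAS['age'], 10):.3f}")
```

Small differences from the published odds ratios (for example exp(2.66) ≈ 14.30 against the reported 14.32) are consistent with the β values having been rounded before publication. Under the years assumption, a 10-year age difference multiplies the odds of diabetes by about 1.58.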