Document Type: Original Article
Assistant Professor, Department of Biostatistics and Epidemiology, Faculty of Health, Shahrekord University of Medical Sciences, Shahrekord, Iran
Assistant Professor, Chronic Kidney Disease Research Center, Labbafinejad Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Professor, Urology and Nephrology Research Center, Labbafinejad Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and for improving CKD management. In metabolomics studies, prediction accuracy is important, but variable importance is also critical, because the identified biomarkers reveal the pathogenic metabolic processes underlying the progression of CKD. We aimed to use k-important neighbors (KIN) to analyze a high-dimensional metabolomics dataset and classify patients into mild or advanced progression of CKD.
Methods: Urine samples were collected from CKD patients (n=73). Based on metabolite biomarkers, the patients were classified into two groups: mild CKD (glomerular filtration rate (GFR) > 60 mL/min per 1.73 m²) and advanced CKD (GFR < 60 mL/min per 1.73 m²). Accordingly, 48 and 25 patients were in the mild (class 1) and advanced (class 2) groups, respectively. KIN was recently proposed as a novel approach to high-dimensional binary classification. By employing a hybrid dissimilarity measure, KIN incorporates information on both variables and distances simultaneously.
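The general idea of letting variable importance shape a neighbor-based classifier can be illustrated with a minimal sketch. This is not the KIN hybrid dissimilarity measure itself, whose exact form is not given here; it is a plain k-nearest-neighbors vote with an importance-weighted Euclidean distance, used as a stand-in. The function names, the weights, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with each variable scaled by an importance weight.

    Assumption: a simple weighted distance standing in for a hybrid
    dissimilarity; it is not the measure defined for KIN.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k training samples closest to `query`."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: variable 0 separates the classes,
# variable 1 is noise on a much larger scale.
train_X = [[0.1, 5.0], [0.2, 0.3], [0.15, 9.0],
           [0.9, 4.8], [1.0, 0.1], [0.95, 8.7]]
train_y = [1, 1, 1, 2, 2, 2]
query = [0.12, 4.9]

# Equal weights: the noisy variable dominates the distance.
unweighted = predict(train_X, train_y, query, weights=[1.0, 1.0], k=3)
# Hypothetical importance weights that down-weight the noisy variable.
weighted = predict(train_X, train_y, query, weights=[1.0, 0.01], k=3)
```

On this toy data the equally weighted vote assigns the query to class 2, because the noisy variable pulls in two class 2 neighbors, while the importance-weighted vote assigns it to class 1. In a high-dimensional setting with many uninformative variables, the same effect is one plausible reason a dissimilarity that accounts for variable importance could outperform plain KNN.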
Results: The proposed KIN not only selected a small number of biomarkers but also reached a higher accuracy than traditional k-nearest neighbors (KNN) (61.2% versus 60.4%) and random forest (61.2% versus 58.5%), which are currently regarded as the best classifiers.
Conclusion: Results on a real metabolomics dataset demonstrate the superiority of the proposed KIN over KNN in terms of both classification accuracy and variable importance.