Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Document Type: Original Article


1 Assistant Professor, Department of Biostatistics and Epidemiology, Faculty of Health, Shahrekord University of Medical Sciences, Shahrekord, Iran

2 Assistant Professor, Chronic Kidney Disease Research Center, Labbafinejad Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran

3 Professor, Urology and Nephrology Research Center, Labbafinejad Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran


Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and for improving CKD management. In metabolomics studies, prediction accuracy matters, but variable importance is equally critical, because the identified biomarkers reveal the pathogenic metabolic processes underlying CKD progression. We aimed to use k-important neighbors (KIN) to analyze a high-dimensional metabolomics dataset and classify patients as having mild or advanced CKD.
Methods: Urine samples were collected from 73 CKD patients. Patients were classified, based on metabolite biomarkers, into two groups: mild CKD (glomerular filtration rate (GFR) > 60 mL/min per 1.73 m²) and advanced CKD (GFR < 60 mL/min per 1.73 m²). Accordingly, 48 patients were in the mild group (class 1) and 25 in the advanced group (class 2). KIN was recently proposed as a novel approach to binary classification in high-dimensional settings. By employing a hybrid dissimilarity measure, KIN incorporates information on both variables and distances simultaneously.
Results: The proposed KIN not only selected a small number of biomarkers but also achieved higher accuracy than traditional k-nearest neighbors (KNN; 61.2% versus 60.4%) and random forest (61.2% versus 58.5%), which are currently known as the best classifiers.
Conclusion: Results on a real metabolomics dataset demonstrate the superiority of the proposed KIN over KNN in terms of both classification accuracy and variable importance.
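
As a rough illustration of why combining variable information with distances can matter, the sketch below compares plain nearest-neighbor voting with a version whose distance down-weights an uninformative variable. It is not the KIN algorithm or its hybrid dissimilarity measure, whose definition is not given in this abstract. The function names, the per-variable weights, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a per-variable weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Return the majority label among the k training points closest to `query`."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: variable 0 separates the classes,
# variable 1 is noise on a much larger scale.
train_X = [
    [0.1, -5.0], [0.2, -4.0], [0.0, 9.0],   # class 1
    [1.0, 4.5], [0.9, 4.0], [1.1, -3.5],    # class 2
]
train_y = [1, 1, 1, 2, 2, 2]
query = [0.15, 4.4]  # close to class 1 on the informative variable

# Equal weights: the noisy variable dominates the distance.
unweighted = weighted_knn_predict(train_X, train_y, query, weights=[1.0, 1.0])

# Hypothetical importance weights that switch off the noisy variable.
weighted = weighted_knn_predict(train_X, train_y, query, weights=[1.0, 0.0])

print("equal weights ->", unweighted)
print("importance weights ->", weighted)
```

In this toy case the equally weighted vote is driven by the noisy variable, while the importance-weighted vote follows the informative one. How such weights are obtained is outside the scope of this sketch.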

