Predicting mortality risk in patients suffering from hepatocellular carcinoma using machine learning.
Sarthak Vajpayee,
and Akansha Mangal,
Medical Hypotheses, Elsevier. (under review),
2021
Liver cancer is one of the most frequently diagnosed cancers in the world. The most common type of liver cancer is hepatocellular carcinoma (HCC), which begins in liver cells called hepatocytes. It can be cured with surgery or transplant if detected early but is incurable in more advanced cases. The exact cause of HCC is unknown, but several demographic factors, risk factors, laboratory features, and underlying conditions such as hepatitis B and hepatitis C virus infection, autoimmune hepatitis, and heavy drinking increase the risk of death from HCC. In this paper, we use some of these factors to predict the chances of survival of a patient diagnosed with HCC. For our study, we use a publicly available dataset of 165 patients maintained and made available by a University Hospital in Portugal. The dataset contains 23 quantitative and 26 qualitative variables, with 10.22% of the data missing. In our approach, we first standardized the data and then handled the missing values by comparing four imputation techniques: mean, median, KNN, and random-forest-based MICE. After imputation, the ANOVA F-value and mutual information were used to select the relevant features. The prepared data were then evaluated with five classifiers: Logistic Regression, Support Vector Machine, Random Forest, Bagging Classifier, and Multilayer Perceptron (MLP). The MLP classifier, combined with random-forest-based MICE imputation and feature selection, produced the best results, with an accuracy of 90.02% and an F1-score of 0.914. These results suggest that our machine-learning-based approach could be evaluated on larger databases and applied to predict mortality risk in patients and aid clinicians.
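The preprocessing order described in the abstract (standardize, impute, then score features) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, covers just the mean and median imputers and the ANOVA F-value, and leaves out KNN, MICE, mutual information, and the classifiers. The function names and the toy values are hypothetical and are not taken from the HCC dataset.

```python
from statistics import mean, median, pstdev

def standardize(column):
    """Z-score a column using observed values only; None marks a missing entry."""
    observed = [x for x in column if x is not None]
    mu, sigma = mean(observed), pstdev(observed)
    if sigma == 0:
        return [None if x is None else 0.0 for x in column]
    return [None if x is None else (x - mu) / sigma for x in column]

def impute(column, strategy="mean"):
    """Fill missing entries with the mean or median of the observed values."""
    observed = [x for x in column if x is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if x is None else x for x in column]

def anova_f(column, labels):
    """One-way ANOVA F-value: between-class variance over within-class variance."""
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(column), len(groups)
    grand = mean(column)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data (hypothetical): two features with one missing value each.
labels = [0, 0, 0, 1, 1, 1]          # assumed encoding: 0 = died, 1 = survived
feature_a = [1.0, 2.0, None, 8.0, 9.0, 10.0]   # separates the classes well
feature_b = [5.0, 1.0, 9.0, None, 2.0, 8.0]    # carries little class signal

prepared_a = impute(standardize(feature_a), strategy="mean")
prepared_b = impute(standardize(feature_b), strategy="median")

# Features with a larger F-value would be kept by an F-based selector.
scores = {"feature_a": anova_f(prepared_a, labels),
          "feature_b": anova_f(prepared_b, labels)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the data are standardized before imputation, the mean imputer fills each gap with a value close to zero. A model-based imputer such as KNN or MICE would instead estimate the gap from the other features, which is the comparison the abstract reports.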