Predict Cancer Using Machine Learning Models
When a patient shows symptoms of cancer (often a conspicuous tumor in an internal organ), a sample of the tumor is removed and genetically sequenced. A single tumor can carry thousands of genetic mutations. If we skip the biological technicalities, each of these mutations has a unique ID consisting of two fields: gene and variation.
Based on these two fields and some corresponding medical text data, we’ll build a multiclass classifier that assigns each genetic mutation to one of nine categories. Some mutations are malignant (drivers that lead to tumor growth), and others are benign (passengers). The presence of any malignant mutation in the tumor cell puts the patient at significant risk of having cancer.
In this project, we’ll also analyze the data: cleaning the textual data, checking feature importance, and comparing different machine learning (ML) models and different encodings.
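To make the setup concrete, here is a minimal sketch of what one mutation record might look like, together with a simple text-cleaning step and a one-hot encoding of the gene field. Everything in it is an assumption made for illustration: the `Mutation` class, the `clean_text` and `one_hot` helpers, the short stop-word list, and the toy records with placeholder gene names, texts, and labels are all hypothetical, not the project's actual data or pipeline. One-hot encoding is only one of the encodings that could be compared.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "was"}


@dataclass
class Mutation:
    gene: str        # first half of the mutation ID
    variation: str   # second half of the mutation ID
    text: str        # medical text associated with the mutation
    label: int       # one of the nine classes, numbered 1..9 here


def clean_text(text: str) -> str:
    """Lowercase, turn non-alphanumeric runs into spaces, drop stop words."""
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one indicator row per input value."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        row[index[v]] = 1
        rows.append(row)
    return vocab, rows


# Toy records with placeholder names, made-up text, and made-up labels.
records = [
    Mutation("GENEA", "VAR1", "The protein was truncated, in vitro.", 1),
    Mutation("GENEB", "VAR2", "Loss of function is reported.", 4),
    Mutation("GENEA", "VAR3", "An assay of the variant.", 5),
]

cleaned = [clean_text(r.text) for r in records]
gene_vocab, gene_rows = one_hot([r.gene for r in records])

print(cleaned)
print(gene_vocab, gene_rows)
```

The encoded gene rows and the cleaned text would then be turned into numeric features and passed to whichever ML models are being compared.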