Problem Definition and the Data

Get an overview of the problem definition and the data used in the project.

Chronic Kidney Disease (CKD) is one of the leading causes of death around the globe and places a significant financial burden on the global health care system. One of the major challenges of CKD is that it usually doesn’t show symptoms and can damage the kidneys silently. People with early kidney disease may not know anything is wrong, because the damage cannot be felt until kidney function has already been lost. The disease progresses slowly and in stages. Early detection, combined with proper treatment, can slow its progression.

This is a real-world problem, and experts are investing resources in developing a medical diagnostic test that improves on the current diagnosis system for CKD. Existing clinical data from CKD patients could play a vital role in building a machine learning algorithm that predicts CKD at an early stage in high-risk individuals (those with diabetes, high blood pressure, a family history of CKD, or an age over 65 years). Screening most often relies on three simple measurements: the amount of waste in the blood, the amount of protein in the urine, and the patient’s blood pressure.

What could be the data science problem? We can develop a machine learning algorithm that predicts early-stage CKD with high accuracy. It should keep both the number of false positives (healthy people flagged as having CKD) and the number of false negatives (CKD patients who are missed) as low as possible.
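To make the terms accuracy, false positives, and false negatives concrete, here is a minimal sketch in plain Python. The function names (`confusion_counts`, `screening_metrics`) and the label lists are hypothetical and made up purely for illustration; they are not taken from the CKD dataset. The sketch assumes binary labels, where 1 means CKD and 0 means no CKD.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, assuming binary labels (1 = CKD, 0 = no CKD)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # CKD patient correctly flagged
        elif pred == 1 and truth == 0:
            fp += 1  # healthy person wrongly flagged (false positive)
        elif pred == 0 and truth == 0:
            tn += 1  # healthy person correctly cleared
        else:
            fn += 1  # CKD patient missed (false negative)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Derive accuracy, sensitivity, and specificity from the four counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual CKD cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of healthy cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for illustration only, not real patient data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

print(confusion_counts(y_true, y_pred))
print(screening_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy is useful here because accuracy alone can look high even when the model misses many CKD cases, for example when most people in the data are healthy.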


Let’s work with the dataset on CKD.
