How to Implement a Disease Prediction Project

Predicting diseases early can make a huge difference in improving treatment outcomes and reducing healthcare costs. With the rise of machine learning, it is now possible to analyze symptoms and medical data to identify potential health conditions even before they become serious. Machine learning models can learn patterns from large datasets of patient records and use this knowledge to predict what disease a person may have based on the symptoms they report.

These models not only help doctors make faster and more accurate decisions but also empower users to check their health status instantly through apps and websites. By combining medical knowledge with data-driven algorithms, disease prediction systems offer a smarter, more efficient way to support healthcare. In this blog, we’ll explore how machine learning can be used to predict diseases from symptoms, the types of models involved, and the benefits of using AI in modern healthcare.

For this project we used a dataset from Kaggle; you can also find other disease-related datasets and apply the same approach to them.

Prerequisites

Before starting this project, install the following libraries.
1. pandas: data processing
2. scikit-learn: model building and train-test split
3. matplotlib: data visualization

Data preprocessing & EDA (Exploratory Data Analysis)

First, we combined the train and test datasets into one large dataset and inspected the target variable (prognosis) using the unique function in pandas. We then checked for missing values and found none in this dataset. Next, we checked whether the target classes are imbalanced: if some diseases have far more rows than others, the model will be biased toward them, so it is important to verify this before moving on. Using the value_counts function in pandas on the target variable (prognosis), we found no imbalance.
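The checks described above can be sketched roughly as follows. The two small DataFrames and their column names (itching, fever) are stand-ins invented for illustration so the snippet runs on its own; in the real project they would be read from the Kaggle CSV files, whose names are not given here.

```python
import pandas as pd

# Stand-in data (assumption): the real frames would come from the Kaggle
# files via pd.read_csv(...). Column names other than "prognosis" are
# hypothetical.
train_df = pd.DataFrame({
    "itching": [1, 0, 0, 1],
    "fever": [0, 1, 1, 0],
    "prognosis": ["Allergy", "Malaria", "Malaria", "Allergy"],
})
test_df = pd.DataFrame({
    "itching": [1, 0],
    "fever": [0, 1],
    "prognosis": ["Allergy", "Malaria"],
})

# Combine train and test into one dataset
df = pd.concat([train_df, test_df], ignore_index=True)

# Distinct diseases in the target column
diseases = df["prognosis"].unique()

# Missing values per column (all zeros means nothing is missing)
missing_per_column = df.isnull().sum()

# Rows per disease; similar counts suggest a balanced target
class_counts = df["prognosis"].value_counts()

print(diseases)
print(missing_per_column)
print(class_counts)
```

If one disease had far fewer rows than the others in class_counts, that would be the signal to rebalance before training.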

For a clearer picture, we also plotted the distribution of the target variable (prognosis) using matplotlib.
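One possible way to draw such a plot is a simple bar chart of rows per disease. This is only a sketch: the prognosis values are made up, the chart type is an assumption (the original plot may look different), and the Agg backend and temp-file output path are choices made so the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in target column (assumption); in the project this is df["prognosis"]
prognosis = pd.Series(
    ["Allergy", "Malaria", "Malaria", "Allergy", "Typhoid", "Typhoid"],
    name="prognosis",
)

# Count rows per disease and draw one bar per class
counts = prognosis.value_counts()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("prognosis")
ax.set_ylabel("number of rows")
ax.set_title("Rows per disease")
ax.tick_params(axis="x", labelrotation=90)  # keep long disease names readable
fig.tight_layout()

# Save the chart; the output location is an arbitrary choice for this sketch
out_path = os.path.join(tempfile.gettempdir(), "prognosis_counts.png")
fig.savefig(out_path)
```

Bars of roughly equal height indicate a balanced target. In a notebook, plt.show() can be used instead of saving to a file.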

The next step in data preprocessing is to encode the categorical variables. This step is required because the models understand only numeric values, not categorical ones. We used LabelEncoder from the scikit-learn library, which maps each distinct category to an integer from 0 to n_classes - 1.
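A minimal sketch of label encoding the target column is shown below. The disease names are invented for illustration, and the new column name prognosis_encoded is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in target column (assumption)
df = pd.DataFrame({"prognosis": ["Malaria", "Allergy", "Typhoid", "Allergy"]})

# Fit the encoder on the disease names and replace them with integer codes
encoder = LabelEncoder()
df["prognosis_encoded"] = encoder.fit_transform(df["prognosis"])

# classes_ keeps the original names in sorted order; position i is code i
print(encoder.classes_)
print(df["prognosis_encoded"].tolist())

# inverse_transform maps integer codes back to disease names
print(encoder.inverse_transform([0, 2]))
```

Keeping the fitted encoder (or its classes_ array) around matters later, because the model predicts integer codes that have to be turned back into disease names.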

Model Building

After data preprocessing and EDA, we create the X and y variables, where X holds the independent variables and y is the dependent variable (prognosis).

Next comes the train-test split, for which we used the scikit-learn library. You can hold out any share of the data for testing, such as 25%, or choose a ratio that suits your dataset.
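The X/y separation and a 25% hold-out could look roughly like this. The data is a stand-in, and the random_state and stratify arguments are assumptions added for reproducibility and class balance; the original project may not have used them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): two symptom columns and an already-encoded target
df = pd.DataFrame({
    "itching":   [1, 0, 1, 0, 1, 0, 1, 0],
    "fever":     [0, 1, 0, 1, 0, 1, 0, 1],
    "prognosis": [0, 1, 0, 1, 0, 1, 0, 1],
})

# X: independent variables, y: dependent variable
X = df.drop(columns=["prognosis"])
y = df["prognosis"]

# Hold out 25% of the rows for testing; stratify keeps the class ratio
# the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```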

Model training and testing

Our dataset poses a classification problem, so we chose models that support classification: the XGB classifier and the Random Forest classifier. We tried different parameters for each model for fine-tuning.

Next, we fit each model on X_train and y_train and predicted the output for X_test. To evaluate the predictions against the actual values, we used the accuracy score and the classification report. The XGB classifier reached an accuracy score of 0.997 (about 99.7%), and the precision and recall in the classification report were around 1.00.
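The fit, predict, and evaluate loop can be sketched as below. To keep the snippet dependent on scikit-learn only, it trains a Random Forest on synthetic data from make_classification; the dataset, the n_estimators value, and the random seeds are all assumptions, so the scores it prints say nothing about the project's real results. XGBClassifier from the xgboost package follows the same fit/predict pattern.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic multi-class data standing in for the symptom dataset (assumption)
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the model on the training set
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Compare predictions with the actual labels
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"accuracy: {acc:.3f}")
print(report)
```

The classification report lists precision, recall, and F1 per class, which is more informative than a single accuracy number when some diseases are harder to separate than others.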

We also cross-validated this model as part of hyperparameter tuning to guard against overfitting, using StratifiedKFold and cross_val_score from the scikit-learn library.

For StratifiedKFold we used 5 splits, and the cross-validation scores across these splits averaged 0.994 (about 99.4%).
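A 5-split stratified cross-validation can be sketched like this. The data is synthetic, and shuffle, random_state, and the model settings are assumptions rather than the project's actual parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic multi-class data standing in for the symptom dataset (assumption)
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10, n_classes=4, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5 folds, each preserving the class proportions of the full dataset
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold; the model is refit from scratch each time
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores)
print(f"mean accuracy: {scores.mean():.3f}")
```

A mean cross-validation score close to the single test-set score suggests the model is not simply memorizing one particular split.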

With the steps above, the XGB classifier scored 0.994. We then repeated the same process for the Random Forest classifier. Before hyperparameter tuning, it reached an accuracy score of 0.9613 (about 96.13%), and the precision and recall in its classification report were around 1.00. After the same hyperparameter tuning steps, the accuracy we got was 0.9.

Finally, we compared the two models we tested and chose the Random Forest classifier, because its accuracy is good and it helps prevent overfitting.

Next, we saved the trained Random Forest classifier model and the output variable classes so they can be shown in the frontend.
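Saving the trained model and the class names with pickle might look like the sketch below. The file names model.pkl and classes.pkl, the temporary output folder, and the tiny training data are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in training data (assumption): two symptom flags per row
X = [[1, 0], [0, 1]] * 4
labels = ["Allergy", "Malaria"] * 4

encoder = LabelEncoder()
y = encoder.fit_transform(labels)
model = RandomForestClassifier(n_estimators=25, random_state=42).fit(X, y)

# Hypothetical output location and file names
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "model.pkl"
classes_path = out_dir / "classes.pkl"

# Save the fitted model and the list of class names
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(classes_path, "wb") as f:
    pickle.dump(encoder.classes_.tolist(), f)

# Load both back, as the web app would, and map a prediction to its name
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(classes_path, "rb") as f:
    loaded_classes = pickle.load(f)

predicted_index = loaded_model.predict([[1, 0]])[0]
predicted_name = loaded_classes[predicted_index]
print(predicted_name)
```

Storing the class names next to the model is what lets the frontend show a disease name instead of an integer code. Pickle files should only be loaded from sources you trust.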

Web App building

So far, we have completed the model building and testing part. Next, we build the web app. For the tech stack, we used Flask, HTML, and CSS.

Flask is a Python web framework that helps us build APIs and serve the HTML templates and static CSS files used for rendering. Install this library on your system before continuing. HTML and CSS, as you probably know, are used to build the frontend.

First, we created a separate file and imported the Flask library, which renders the HTML file through Jinja templates. In the HTML, we created a text box where users enter their details, based on the input variables present in the dataset. For styling, we used CSS to set the color, font size, and background image of each HTML element. We also created a Submit button that triggers the prediction.

Next, we connected the CSS to our HTML file using an external link tag. We then loaded the model pickle file in the Flask app to read the input variables and connected it to our HTML file using Jinja templates, so that results show in the frontend. When the user enters values in the text box and hits the Submit button, we also load the pickle file of output classes to display the predicted result.
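A stripped-down sketch of such a Flask app is shown below. Several things are assumptions made to keep it self-contained: the model is trained inline instead of being loaded from a pickle file, the symptom and class names are invented, the template is an inline string rendered with render_template_string rather than a separate HTML file with linked CSS, and the comma-separated input format is only one possible design.

```python
from flask import Flask, render_template_string, request
from sklearn.ensemble import RandomForestClassifier

# Stand-ins for the pickled artifacts (assumption): in the project these
# would be loaded with pickle.load(...) at startup
SYMPTOMS = ["itching", "fever"]
CLASSES = ["Allergy", "Malaria"]
model = RandomForestClassifier(n_estimators=25, random_state=42).fit(
    [[1, 0], [0, 1]] * 4, [0, 1] * 4
)

# Inline Jinja template with a text box, a Submit button, and a result line
PAGE = """
<form method="post">
  <input type="text" name="symptoms" placeholder="itching, fever">
  <button type="submit">Submit</button>
</form>
{% if prediction %}<p>Predicted disease: {{ prediction }}</p>{% endif %}
"""

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # Turn "itching, fever" into a set of lowercase symptom names
        raw = request.form.get("symptoms", "")
        entered = {s.strip().lower() for s in raw.split(",")}
        # Build a 0/1 feature row in the same column order used for training
        features = [[1 if name in entered else 0 for name in SYMPTOMS]]
        # The model returns a class index; map it back to the disease name
        prediction = CLASSES[model.predict(features)[0]]
    return render_template_string(PAGE, prediction=prediction)
```

In a real app.py, the template would live in a templates folder and be rendered with render_template, the CSS would sit in a static folder, and the file would end with a call to app.run() so that running python app.py starts the development server.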

Testing and Deployment of Web App

Before deploying the web app to production, it is essential to test it in a local environment to make sure it works. To check your Flask app, run python app.py. If the app is configured correctly, the terminal will show the localhost URL, and opening that URL takes you to your web app. If the web app does not look the way you want, debug your HTML or CSS file; if your APIs have issues, check the endpoint. Following the same process we show here should make it easier to create your own website or web app.

In my case, I also debugged the app and made changes as needed before moving on to deployment. You can deploy your web app however you like, either on the cloud or on another platform.

We deployed our web app on a Hostinger VPS server, connecting to it by IP address with the WinSCP software. We also made some changes to the nginx config file.

To upload your app files to the server, you can follow the same process.

Once you have your server's IP address, connect to the server using WinSCP. Then create a folder and upload the app files into it: the pickle files, app.py, and the HTML and CSS files. After that, you need to configure the server, and the details depend on which service provider you use for deployment.

Here is our deployed Disease Prediction web app.
