Heart Disease Prediction And Comparison

BTech 6th Semester Project - Machine Learning

Fuzail Ahmad Khan
2022-310-066
Jamia Hamdard

Select Model

Logistic Regression

Accuracy: 80%

Logistic Regression Confusion Matrix

Decision Tree

Accuracy: 85%

Decision Tree Confusion Matrix

Random Forest

Accuracy: 92%

Random Forest Confusion Matrix

SVM

Accuracy: 92%

SVM Confusion Matrix

KNN

Accuracy: 86%

KNN Confusion Matrix
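
Each confusion matrix above tabulates correct and incorrect predictions per class. As a minimal sketch in plain Python (the labels below are made up for illustration and are not the project's predictions; the encoding 1 = disease, 0 = no disease is an assumption), the four cells and the resulting accuracy could be computed like this:

```python
from collections import Counter

def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = no disease, 1 = disease)."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(0, 0)], counts[(0, 1)]],
        [counts[(1, 0)], counts[(1, 1)]],
    ]

def accuracy(matrix):
    """Fraction of all predictions that fall on the diagonal (TN + TP)."""
    (tn, fp), (fn, tp) = matrix
    return (tn + tp) / (tn + fp + fn + tp)

# Made-up labels for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

matrix = confusion_matrix_2x2(y_true, y_pred)
print(matrix)            # [[4, 1], [1, 4]]
print(accuracy(matrix))  # 0.8
```

Rows correspond to the true class and columns to the predicted class, so the off-diagonal cells are the false positives and false negatives.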

Model Performance Comparison

Accuracy Comparison

Precision Comparison

Recall Comparison

F1 Score Comparison

ROC AUC Comparison

Inference Time Comparison
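
How inference time was measured is not stated in this report. One plausible approach, sketched here under that caveat, is to average the wall-clock time of repeated prediction calls. The `threshold_rule` function is a hypothetical stand-in for a trained model's predict method, and the input rows are made up.

```python
import time

def mean_inference_time(predict, rows, repeats=100):
    """Average seconds per call to predict(rows), measured over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(rows)
    return (time.perf_counter() - start) / repeats

def threshold_rule(rows):
    # Hypothetical stand-in for a trained model's predict method:
    # flags a row when its first value exceeds 1.0.
    return [1 if row[0] > 1.0 else 0 for row in rows]

rows = [[0.5], [2.3], [1.4]]  # made-up input rows
seconds = mean_inference_time(threshold_rule, rows)
print(f"{seconds * 1e6:.2f} microseconds per call")
```

Averaging over many calls smooths out timer noise, which matters when a single prediction takes only microseconds.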

Heart Disease Risk Assessment

Patient Data

Prediction Result

Recommendations

  • Complete assessment to see recommendations

Project Documentation

Abstract

This project develops a machine learning-based system that predicts the likelihood of heart disease in patients. The system uses five classification algorithms (Logistic Regression, Decision Tree, Random Forest, SVM, and KNN) to analyze clinical parameters and produce risk assessments.

The models were trained on the Cleveland Heart Disease dataset, which contains 303 patient records with 14 attributes (13 clinical features plus the diagnosis target). After preprocessing and feature engineering, the models were evaluated on accuracy, precision, recall, F1-score, and ROC AUC.

Methodology

  1. Data Collection: Cleveland Heart Disease dataset from the UCI Machine Learning Repository
  2. Data Preprocessing: Handling missing values, feature scaling, one-hot encoding
  3. Feature Selection: Correlation analysis, feature importance ranking
  4. Model Training: Five classification algorithms (Logistic Regression, Decision Tree, Random Forest, SVM, KNN)
  5. Model Evaluation: Confusion matrix, ROC curves, performance metrics
  6. Web Interface: Interactive dashboard for predictions and visualization
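
The preprocessing, training, and evaluation steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes scikit-learn (the report does not name a library), uses synthetic stand-in data rather than the Cleveland dataset, and picks the imputation strategy, hyperparameters, and 80/20 split arbitrarily.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Cleveland dataset): two numeric columns
# and one categorical column, with a few missing numeric values.
rng = np.random.default_rng(0)
n = 200
numeric = rng.normal(size=(n, 2))
categorical = rng.integers(0, 3, size=(n, 1)).astype(float)
y = (numeric[:, 0] + 0.5 * numeric[:, 1] > 0).astype(int)
numeric[rng.choice(n, size=10, replace=False), 1] = np.nan
X = np.hstack([numeric, categorical])

# Step 2: impute and scale numeric columns, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), [0, 1]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),
])

# Step 4: the five algorithms named in the report (hyperparameters are assumptions)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 5: fit each model behind the same preprocessing and score it on held-out data
accuracies = {}
for name, model in models.items():
    clf = Pipeline([("preprocess", preprocess), ("model", model)])
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

Wrapping preprocessing and model in one pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into training.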

Results

The Random Forest model achieved the highest accuracy of 94% with the following performance metrics:

  • Precision: 0.93
  • Recall: 0.95
  • F1-Score: 0.94
  • AUC-ROC: 0.97
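
For reference, the metrics listed above have simple definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses hypothetical counts and scores, not the project's results, and computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive (disease) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores, not taken from the project
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=5)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} auc={auc:.2f}")
# precision=0.80 recall=0.89 f1=0.84 auc=0.75
```

In a screening setting, recall is often the metric to watch, since a false negative means a patient with heart disease goes unflagged.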

Feature importance analysis revealed that the most significant predictors were:

  1. ST Depression induced by exercise (oldpeak)
  2. Number of major vessels colored by fluoroscopy (ca)
  3. Maximum heart rate achieved (thalach)
  4. Age
  5. Resting blood pressure (trestbps)
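
A ranking like the one above can be obtained from a fitted Random Forest. The sketch below assumes scikit-learn and its impurity-based `feature_importances_` attribute; the report does not say which importance measure was used. The data is random stand-in data, so the printed ranking illustrates the mechanics only and does not reproduce the reported one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Column names borrowed from the list above; the values are synthetic
feature_names = ["oldpeak", "ca", "thalach", "age", "trestbps"]

# Random stand-in data, NOT the Cleveland dataset. The target is built
# to depend mainly on the first two columns, purely for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a permutation-based check on held-out data is a common complement.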

Conclusion

The developed system demonstrates that machine learning algorithms can effectively predict heart disease risk from clinical parameters. Among the five algorithms evaluated, the Random Forest model achieved the highest accuracy.

Future Work:

  • Integration with electronic health records
  • Mobile application development
  • Expansion to other cardiovascular diseases
  • Implementation of deep learning models