This project explores and analyzes life expectancy data from the World Health Organization (WHO), obtained via Kaggle. Using Python's data science libraries, we clean and transform the data, visualize global health trends, and then apply machine learning to predict life expectancy from socioeconomic and health factors.
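
As a rough illustration of the cleaning step, the sketch below tidies a WHO-style table with pandas. It assumes the columns include `Country`, `Year` and `Life expectancy`, and that some headers in the downloaded Kaggle CSV may carry stray spaces. The helper name `clean_life_expectancy` and the imputation strategy (per-country median, then global median) are illustrative assumptions, not the project's confirmed method; in practice the input frame would come from `pd.read_csv` on the downloaded file.

```python
import numpy as np
import pandas as pd

TARGET = "Life expectancy"  # assumed name of the target column after cleanup


def clean_life_expectancy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy headers, drop rows without a target, impute features."""
    out = df.copy()
    # Trim leading/trailing spaces and collapse repeated inner spaces in headers.
    out.columns = [" ".join(str(c).split()) for c in out.columns]
    # Rows with no target value cannot be used to train or evaluate a model.
    out = out.dropna(subset=[TARGET]).copy()
    feature_cols = [
        c for c in out.select_dtypes(include="number").columns if c not in (TARGET, "Year")
    ]
    for col in feature_cols:
        global_median = out[col].median()  # computed before any imputation
        country_median = out.groupby("Country")[col].transform("median")
        # Prefer the country's own median; fall back to the global median.
        out[col] = out[col].fillna(country_median).fillna(global_median)
    return out.reset_index(drop=True)


# Toy usage with made-up values (not taken from the WHO dataset).
raw = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2000],
    "Status": ["Developing"] * 6,
    "Life expectancy ": [60.0, 61.0, 62.0, 70.0, np.nan, 80.0],
    " GDP": [100.0, np.nan, 300.0, np.nan, 500.0, 1000.0],
})
clean = clean_life_expectancy(raw)
print(clean)
```

Imputing within each country first keeps a country's missing years close to its own history; the global median only fills countries that have no observed value at all.
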

Tools and libraries:

- Python
- pandas, NumPy
- Matplotlib, Seaborn
- scikit-learn
- TensorFlow/Keras

Skills demonstrated:

- Data preprocessing: handling missing values, normalization, and encoding.
- Feature engineering to improve model inputs.
- Exploratory data analysis (EDA) with Matplotlib and Seaborn.
- Training and evaluating machine learning models with TensorFlow/Keras.
- Interpreting results to derive health-policy insights.
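
A minimal scikit-learn sketch of the normalization and encoding steps listed above, with transformers fitted on the training split only. The feature names `GDP`, `Schooling`, `HIV/AIDS` and `Status` are assumed to match the cleaned WHO table, the data is synthetic, and a plain `LinearRegression` baseline stands in for the project's TensorFlow/Keras model, so the printed scores say nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["GDP", "Schooling", "HIV/AIDS"]  # assumed numeric feature columns
CATEGORICAL = ["Status"]                      # assumed: "Developed" / "Developing"

# Synthetic stand-in data (made up, NOT the WHO dataset) so the sketch runs offline.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GDP": rng.uniform(0, 20_000, n),
    "Schooling": rng.uniform(2, 20, n),
    "HIV/AIDS": rng.uniform(0, 10, n),
    "Status": rng.choice(["Developed", "Developing"], n),
})
df["Life expectancy"] = (
    50 + 0.0005 * df["GDP"] + 1.2 * df["Schooling"] - 1.5 * df["HIV/AIDS"]
    + 3.0 * (df["Status"] == "Developed") + rng.normal(0, 1, n)
)

preprocess = ColumnTransformer([
    ("scale", StandardScaler(), NUMERIC),                             # zero mean, unit variance
    ("encode", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),  # one-hot encoding
])
# Baseline regressor; a stand-in for the project's Keras model.
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])

X_train, X_test, y_train, y_test = train_test_split(
    df[NUMERIC + CATEGORICAL], df["Life expectancy"], test_size=0.25, random_state=42
)
model.fit(X_train, y_train)  # scaler and encoder learn statistics from training rows only
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
print(f"R^2 = {r2:.3f}, MAE = {mae:.2f} years")
```

Keeping the scaler and encoder inside the `Pipeline` ensures test-set statistics never leak into preprocessing. Swapping in a Keras network would mean feeding it the arrays produced by the fitted `preprocess` transformer instead of using `LinearRegression`.
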

Key findings:

- GDP and schooling are strongly positively correlated with life expectancy.
- HIV/AIDS prevalence is strongly negatively correlated with life expectancy.
- Countries with higher healthcare expenditure generally have higher life expectancy.
- Model performance suggests that socioeconomic indicators can be good predictors of life expectancy, although some regional anomalies remain.
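
Correlation findings like these can be checked with a short pandas sketch that ranks numeric columns by their Pearson correlation with the target. The column names are assumptions, and the data below is synthetic with the signs built in, so its output only demonstrates the mechanics and is not evidence for the findings.

```python
import numpy as np
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "Life expectancy") -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr(method="pearson")[target].drop(target)
    # Rank by absolute value so strong negative associations rank as high as positive ones.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic data (made up, NOT the WHO dataset) with signs chosen to mimic the findings.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "GDP": rng.uniform(0, 20_000, n),
    "Schooling": rng.uniform(2, 20, n),
    "HIV/AIDS": rng.uniform(0, 10, n),
})
df["Life expectancy"] = (
    50 + 0.0005 * df["GDP"] + 1.2 * df["Schooling"] - 1.5 * df["HIV/AIDS"]
    + rng.normal(0, 1, n)
)
corrs = target_correlations(df)
print(corrs.round(2))
```

On the real dataset, every numeric column (including `Year`) would appear in the ranking; passing `numeric.corr()` to Seaborn's `heatmap` is a common way to view the full matrix.
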

```bash
# Install required libraries
pip install pandas numpy matplotlib seaborn scikit-learn tensorflow
```