Data Analysis Using Python: Pandas, NumPy, and Matplotlib
🔧 Tools Covered:
Python
Pandas
NumPy
Matplotlib
1. Introduction
Data is everywhere, and making sense of it is the real challenge. In this post, I’ll walk you through how I analyzed a real-world dataset (netflix_titles.csv) using the Python libraries Pandas, NumPy, and Matplotlib. Whether you're a beginner or brushing up on your skills, this guide will help you get started.
2. Loading and Exploring the Dataset
Let’s begin by importing the necessary libraries and loading the dataset.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('netflix_titles.csv')
df.head()
```
This gives us a first look at the structure of our data.
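Beyond `head()`, a few other calls give a quick picture of a DataFrame's size, column types, and gaps. Here is a minimal sketch; the `sample` DataFrame and its rows are hypothetical stand-ins for the CSV so the snippet runs on its own, and the column names simply mirror ones used later in this post.

```python
import pandas as pd

# Hypothetical sample rows standing in for netflix_titles.csv,
# so the snippet runs without the file.
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["Title A", "Title B", "Title C"],
    "date_added": ["September 25, 2021", "August 4, 2017", None],
})

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # data type of each column
sample.info()          # non-null counts per column and memory usage
print(sample.describe(include="all"))  # summary statistics for every column
```

On the real data, the same calls would be made on `df` instead of `sample`.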
3. Data Cleaning
Now, let’s clean the data by handling missing values and checking for duplicates.
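```python
# Count missing values in each column
df.isnull().sum()

# Drop exact duplicate rows in place
df.drop_duplicates(inplace=True)
```
Duplicates are now removed, and the per-column counts from isnull().sum() show where missing values remain, so they can be filled or dropped as appropriate.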
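How to deal with the remaining gaps depends on the column. One possible approach, sketched here on hypothetical sample rows, is to fill missing text with a placeholder and drop rows that lack a value you need. The column names, the `"Unknown"` placeholder, and the choice of which columns to fill or drop are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one duplicate row and two gaps.
sample = pd.DataFrame({
    "title": ["Title A", "Title A", "Title B", "Title C"],
    "director": ["Director X", "Director X", np.nan, "Director Y"],
    "date_added": ["September 25, 2021", "September 25, 2021",
                   "August 4, 2017", np.nan],
})

missing_before = sample.isnull().sum()   # missing values per column
print(missing_before)

cleaned = sample.drop_duplicates()                  # drop exact duplicate rows
cleaned = cleaned.fillna({"director": "Unknown"})   # placeholder for missing text
cleaned = cleaned.dropna(subset=["date_added"])     # drop rows without a date
print(cleaned)
```

Filling keeps the row available for other analyses, while dropping is safer when the missing field is the one being analyzed.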
4. Data Analysis
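Let’s answer some interesting questions like:
- What’s the most popular genre?
- How many titles were added each year?
```python
# date_added is when a title joined the catalog, so store its year as year_added
# rather than overwriting any existing release_year column.
df['year_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce').dt.year

# Drop rows without a parseable date, then count titles per year
year_counts = df['year_added'].dropna().astype(int).value_counts().sort_index()
year_counts.plot(kind='bar', figsize=(10, 4))
plt.title("Content Added Per Year")
plt.xlabel("Year")
plt.ylabel("Number of Titles")
plt.grid(True)
plt.tight_layout()
plt.show()
```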
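For the genre question, one possible approach is to split a comma-separated genre column into individual labels and count them. This is a minimal sketch on hypothetical sample rows. The column name `listed_in` is an assumption, since this post does not name the genre column, so adjust it to match your file.

```python
import pandas as pd

# Hypothetical sample rows; `listed_in` is assumed to hold
# comma-separated genre labels for each title.
sample = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C", "Title D"],
    "listed_in": [
        "Dramas, International Movies",
        "Comedies",
        "Dramas, Comedies",
        "Dramas",
    ],
})

genre_counts = (
    sample["listed_in"]
    .str.split(",")    # one list of genres per title
    .explode()         # one row per (title, genre) pair
    .str.strip()       # remove spaces left over from the split
    .value_counts()    # count how often each genre appears
)
print(genre_counts)
print("Most common genre:", genre_counts.idxmax())
```

Because a title can carry several labels, the counts add up to more than the number of titles.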
5. Visualizations
Visualization makes data stories clearer. Here’s a pie chart showing the distribution of content types:
```python
df['type'].value_counts().plot(kind='pie', autopct='%1.1f%%', colors=['#66b3ff','#ff9999'])
plt.title("Distribution of Content Types")
plt.ylabel("")
plt.show()
```
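The percentages on the pie chart can also be computed directly, which is handy when you want the numbers in a table or report. This minimal sketch uses a hypothetical `types` Series in place of `df['type']`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df['type'].
types = pd.Series(["Movie", "Movie", "TV Show", "Movie"], name="type")

counts = types.value_counts()                 # absolute count per content type
shares = types.value_counts(normalize=True)   # fractions that sum to 1
percent = np.round(shares * 100, 1)           # one decimal place, like '%1.1f%%'
print(percent)
```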
6. Conclusion
This project helped me understand the power of Python for data analysis. Libraries like Pandas, NumPy, and Matplotlib simplify every step, from cleaning to visualizing. I’ll continue sharing more deep dives into real-world datasets, so stay tuned!