Data Analysis Using Python: Pandas, NumPy, and Matplotlib

Dipin Silwal — Wed, 02 Jul 2025 05:54:28 GMT

🔧 Tools Covered:

Python
Pandas
NumPy
Matplotlib

1. Introduction

Data is everywhere, and making sense of it is the real challenge. In this post, I’ll walk you through how I analyzed a real-world dataset (a catalog of Netflix titles) using the Python libraries Pandas, NumPy, and Matplotlib. Whether you're a beginner or brushing up on your skills, this guide will help you get started.

2. Loading and Exploring the Dataset

Let’s begin by importing the necessary libraries and loading the dataset.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('netflix_titles.csv')
df.head()
```

This gives us a first look at the structure of our data.
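
Beyond `df.head()`, a few more calls help with a first exploration. The sketch below is only illustrative: it uses a tiny hand-made frame named `sample` as a stand-in for `netflix_titles.csv`, and the `title` and `director` columns are assumptions about what the real file contains.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data would come from netflix_titles.csv
sample = pd.DataFrame({
    "title": ["Show A", "Film B", "Film C"],
    "type": ["TV Show", "Movie", "Movie"],
    "release_year": [2019, 2020, 2020],
    "director": [None, "Jane Doe", "John Roe"],
})

n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")

print(sample.dtypes)                   # data type of each column
sample.info()                          # non-null counts and memory usage
print(sample.describe(include="all"))  # summary statistics for every column
```

Running the same calls on `df` shows how many rows were loaded, which columns are numeric, and where values are missing.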

3. Data Cleaning

Now, let’s clean the data by handling missing values and checking for duplicates.

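```python
print(df.isnull().sum())          # missing values per column
print(df.duplicated().sum())      # number of fully duplicated rows
df.drop_duplicates(inplace=True)  # drop the duplicates in place
```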

With duplicates removed, the per-column null counts show which columns still need attention. Whether to fill or drop the missing values depends on the column.
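
As a minimal sketch of what handling missing values could look like, the example below fills one text column with a placeholder and drops rows that lack a date. The frame `sample`, the `director` column, and the `"Unknown"` placeholder are assumptions made for illustration, not steps taken from the original analysis.

```python
import pandas as pd

# Hypothetical stand-in rows with a few gaps
sample = pd.DataFrame({
    "title": ["Show A", "Film B", "Film C", "Show D"],
    "director": [None, "Jane Doe", None, "John Roe"],
    "date_added": ["September 1, 2020", None, "March 5, 2019", "July 9, 2021"],
})

# List only the columns that actually contain missing values
missing = sample.isnull().sum()
print(missing[missing > 0])

cleaned = sample.copy()
# Descriptive text column: keep the row and mark the gap explicitly
cleaned["director"] = cleaned["director"].fillna("Unknown")
# Column needed for time-based analysis: drop rows where it is missing
cleaned = cleaned.dropna(subset=["date_added"]).reset_index(drop=True)

print(cleaned)
```

Filling keeps every row available for counts by type or genre, while dropping is reserved for columns the later analysis cannot do without.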

4. Data Analysis

Let’s answer some interesting questions like:

- What’s the most popular genre?
- How many shows were released each year?
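
The first question could be approached as sketched here. This assumes that genres are stored as a comma-separated string in a column named `listed_in` (an assumption about the dataset, not something shown in this post), and it uses a small made-up frame named `sample` in place of `df`.

```python
import pandas as pd

# Hypothetical sample; assumes genres are comma-separated in a listed_in column
sample = pd.DataFrame({
    "title": ["Show A", "Film B", "Film C"],
    "listed_in": ["Dramas, Comedies", "Dramas", "Documentaries, Dramas"],
})

genre_counts = (
    sample["listed_in"]
    .str.split(",")   # "Dramas, Comedies" -> ["Dramas", " Comedies"]
    .explode()        # one row per (title, genre) pair
    .str.strip()      # remove the space left after each comma
    .value_counts()   # count titles per genre, most frequent first
)

print(genre_counts.head(10))
top_genre = genre_counts.idxmax()
print("Most common genre:", top_genre)
```

Because a title can belong to several genres, the counts add up to more than the number of titles.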

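```python
# date_added records when a title joined the catalog, not when it was released,
# so plot the release_year column directly instead of overwriting it
# (assumes the CSV provides release_year, as the Kaggle Netflix titles file does).
df['release_year'].value_counts().sort_index().plot(kind='bar', figsize=(10,4))
plt.title("Content Released Per Year")
plt.xlabel("Year")
plt.ylabel("Number of Titles")
plt.grid(True)
plt.tight_layout()
plt.show()
```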

5. Visualizations

Visualization makes data stories clearer. Here’s a pie chart showing the distribution of content types:

```python
df['type'].value_counts().plot(kind='pie', autopct='%1.1f%%', colors=['#66b3ff','#ff9999'])
plt.title("Distribution of Content Types")
plt.ylabel("")
plt.show()
```
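
The percentages that `autopct` prints on the pie chart can also be computed directly, which is a handy cross-check and a small use of NumPy. In this sketch the frame `sample` and its `Movie` / `TV Show` values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the type column of the real dataset
sample = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})

counts = sample["type"].value_counts()   # titles per content type
values = counts.to_numpy()
shares = np.round(values / values.sum() * 100, 1)  # percentage of the total

for label, share in zip(counts.index, shares):
    print(f"{label}: {share}%")
```

The same numbers can go into a table alongside the chart when exact values matter more than the visual impression.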

6. Conclusion

This project helped me understand the power of Python for data analysis. Libraries like Pandas, NumPy, and Matplotlib simplify the process from cleaning to visualizing. I’ll continue sharing more deep dives into real-world datasets, so stay tuned!
