Navigate and Filter Data Using Pandas DataFrames
Business Scenario
Welcome!
Today is your fifth day as a Junior Data Analyst at a retail analytics company.
Retail organizations store sales, customer, product, and inventory information in tabular formats. Data analysts use Pandas DataFrames to navigate through large datasets, access specific records, and filter information based on business requirements.
Understanding how to work with DataFrames and apply filtering techniques helps analysts quickly identify trends, monitor performance, and support business decisions.
Pre-Lab Preparation
Topic: Data Manipulation with Pandas
1) Navigating DataFrames and Series Using Pandas & Polars
2) Data Preparation Essentials
Click here to download previous lab file: DM LAB 4
git pull origin branchNameGit Pull
Click to download Dataset : Retail_Dataset
Task 1: Work with DataFrames
In retail analytics, data is often stored in rows and columns. Pandas DataFrames provide powerful tools to organize, explore, and manipulate this information efficiently.
What is a DataFrame?
A DataFrame is a two-dimensional tabular data structure consisting of rows and columns. It allows analysts to perform data exploration, cleaning, transformation, and analysis.
Open Google Colab
1
2
Import Required Libraries
import pandas as pd
import numpy as np3
Upload the Retail Dataset
4
Load Dataset Using Pandas
df = pd.read_csv("/content/Retail_Dataset_Modified.csv")
print("Dataset Loaded Successfully")Display First Five Records
df.head()5
7
6
Display Last Five Records
df.tail()Display Random Sample Records
df.sample(5)8
Display Column Names
print("Columns in Dataset:")
print(df.columns)9
8
Display Data Types
print("Data Types:")
print(df.dtypes)Display Dataset Information
df.info()10
Display Statistical Summary
print(df.describe())11
Check Missing Values
print("Missing Values:")
print(df.isnull().sum())12
13
Display Unique Values from Product Category
print(df['Category'].unique())Count Occurrences of Each Product Category
print(df['Category'].value_counts())14
Select Specific Columns
selected_columns = df[['Revenue',
'Units_Sold',
'Profit_Margin']]
print(selected_columns.head())15
16
Access a Specific Row
print(df.iloc[0])17
Display Dataset Shape
print("Dataset Shape:")
print(df.shape)Task 2: Filter Data
Retail analysts frequently filter data to identify specific business conditions such as high revenue transactions, low inventory products, or top-performing categories.
What is Filtering?
Filtering is the process of selecting rows that satisfy specific conditions.
1
Filter Products with Unit Price Greater Than 1000
high_price = df[df['Unit_Price'] > 1000]
high_price[['Product_Name', 'Category', 'Unit_Price']].head()Filter Products with Unit Price Less Than 500
2
low_price = df[df['Unit_Price'] < 500]
low_price[['Product_Name', 'Category', 'Unit_Price']].head()3
Filter Products with Unit Price Between 500 and 1500
medium_price = df[
(df['Unit_Price'] >= 500) &
(df['Unit_Price'] <= 1500)
]
medium_price[['Product_Name', 'Category', 'Unit_Price']].head()4
Sort Data by Unit Price (Highest to Lowest)
sorted_price = df.sort_values(
by='Unit_Price',
ascending=False
)
sorted_price[['Product_Name', 'Category', 'Unit_Price']].head()5
Display Top 10 Highest Unit Price Products
top_price = df.nlargest(10, 'Unit_Price')
print(top_price[['Unit_Price']])6
Count Records with Unit Price Greater Than 1000
print("Number of Products with Unit Price > 1000:")
print(len(high_price))
Great job!
You have successfully completed your lab on Navigate and Filter Data Using Pandas DataFrames.In this lab, you have: Loaded data into Pandas DataFrames, Explored rows and columns, Accessed specific records, Filtered data using conditions, Applied multiple filtering criteria, Sorted and analyzed business data
You are now ready to move to the next stage of Junior Data Analyst.
Checkpoint
Git Push
git push origin branchNameNext-Lab Preparation
Topic: Data Manipulation with Pandas
1) Polishing Data: From Messy To Meaningful