Navigate and Filter Data Using Pandas DataFrames

Business Scenario

Welcome!

Today is your fifth day as a Junior Data Analyst at a retail analytics company.

Retail organizations store sales, customer, product, and inventory information in tabular formats. Data analysts use Pandas DataFrames to navigate through large datasets, access specific records, and filter information based on business requirements.

Understanding how to work with DataFrames and apply filtering techniques helps analysts quickly identify trends, monitor performance, and support business decisions.

Pre-Lab Preparation

Topic: Data Manipulation with Pandas

1) Navigating DataFrames and Series Using Pandas & Polars

2) Data Preparation Essentials

Click here to download previous lab file: DM LAB 4

git pull origin branchName

Git Pull

Click to download Dataset : Retail_Dataset

Task 1: Work with DataFrames

In retail analytics, data is often stored in rows and columns. Pandas DataFrames provide powerful tools to organize, explore, and manipulate this information efficiently.

What is a DataFrame?

A DataFrame is a two-dimensional tabular data structure consisting of rows and columns. It allows analysts to perform data exploration, cleaning, transformation, and analysis.

Open Google Colab

1

2

Import Required Libraries

import pandas as pd
import numpy as np

3

Upload the Retail Dataset

4

Load Dataset Using Pandas

df = pd.read_csv("/content/Retail_Dataset_Modified.csv")

print("Dataset Loaded Successfully")

Display First Five Records

df.head()

5

7

6

Display Last Five Records

df.tail()

Display Random Sample Records

df.sample(5)

8

Display Column Names

print("Columns in Dataset:")
print(df.columns)

9

8

Display Data Types

print("Data Types:")
print(df.dtypes)

Display Dataset Information

df.info()

10

Display Statistical Summary

print(df.describe())

11

Check Missing Values

print("Missing Values:")
print(df.isnull().sum())

12

13

Display Unique Values from Product Category

print(df['Category'].unique())

Count Occurrences of Each Product Category

print(df['Category'].value_counts())

14

Select Specific Columns

selected_columns = df[['Revenue',
                        'Units_Sold',
                        'Profit_Margin']]

print(selected_columns.head())

15

16

Access a Specific Row

print(df.iloc[0])

17

Display Dataset Shape

print("Dataset Shape:")
print(df.shape)

Task 2: Filter Data

Retail analysts frequently filter data to identify specific business conditions such as high revenue transactions, low inventory products, or top-performing categories.

What is Filtering?

Filtering is the process of selecting rows that satisfy specific conditions.

1

Filter Products with Unit Price Greater Than 1000

high_price = df[df['Unit_Price'] > 1000]
high_price[['Product_Name', 'Category', 'Unit_Price']].head()

Filter Products with Unit Price Less Than 500

2

low_price = df[df['Unit_Price'] < 500]
low_price[['Product_Name', 'Category', 'Unit_Price']].head()

3

Filter Products with Unit Price Between 500 and 1500

medium_price = df[
    (df['Unit_Price'] >= 500) &
    (df['Unit_Price'] <= 1500)
]
medium_price[['Product_Name', 'Category', 'Unit_Price']].head()

4

Sort Data by Unit Price (Highest to Lowest)

sorted_price = df.sort_values(
    by='Unit_Price',
    ascending=False
)
sorted_price[['Product_Name', 'Category', 'Unit_Price']].head()

5

Display Top 10 Highest Unit Price Products

top_price = df.nlargest(10, 'Unit_Price')
print(top_price[['Unit_Price']])

6

Count Records with Unit Price Greater Than 1000

print("Number of Products with Unit Price > 1000:")
print(len(high_price))

 

Great job!

You have successfully completed your lab on Navigate and Filter Data Using Pandas DataFrames.In this lab, you have: Loaded data into Pandas DataFrames, Explored rows and columns, Accessed specific records, Filtered data using conditions, Applied multiple filtering criteria, Sorted and analyzed business data

You are now ready to move to the next stage of Junior Data Analyst.

Checkpoint

   Git Push

git push origin branchName

Next-Lab Preparation

Topic: Data Manipulation with Pandas

1) Polishing Data: From Messy To Meaningful