Measure Data Spread Using Dispersion Metrics

Business Scenario

Welcome!

Today is your ninth day as a Junior Data Analyst at a retail analytics company.

After calculating descriptive statistics such as Mean, Median, and Mode, businesses also need to understand how widely their data is spread. Two datasets may have the same average value but very different levels of variation.

Dispersion metrics help analysts measure consistency, identify fluctuations in sales performance, evaluate customer behavior, and detect unusual patterns in retail operations.

Understanding data spread enables better forecasting, inventory planning, and business decision-making.

Pre-Lab Preparation

Click here to download previous lab file: DM LAB 8

Git Pull

git pull origin branchName

Topic: Decoding Your Data

1) Measuring Data Spread: Dispersion Insights

Click to download dataset: Retail_Dataset_Cleaned

Task 1: Understanding Variance and Standard Deviation

Retail organizations frequently analyze sales and revenue data to understand business performance. Knowing the average revenue alone is not enough.

Analysts must determine:

  • How much revenue varies across transactions
  • Whether sales are consistent
  • Whether extreme values exist

Two important dispersion metrics are:

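Variance

Measures how far data points are spread from the mean. It is the average of the squared deviations from the mean, so it is expressed in squared units.

Standard Deviation

The square root of the variance. It describes the typical distance of data points from the mean, in the same units as the data.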
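To see why the average alone is not enough, the short sketch below compares two hypothetical stores. The store names and revenue values are made up for illustration and are not taken from the retail dataset.

```python
import pandas as pd

# Two hypothetical stores with made-up daily revenue (illustrative values only).
store_a = pd.Series([98, 100, 102, 100, 100])
store_b = pd.Series([20, 180, 100, 60, 140])

for name, sales in [("Store A", store_a), ("Store B", store_b)]:
    # Both stores share the same mean, but their spread differs sharply.
    print(
        name,
        "| mean:", sales.mean(),
        "| variance:", sales.var(),
        "| std:", round(sales.std(), 2),
    )
```

Both stores average 100, yet Store B's values swing far more widely, which is exactly what variance and standard deviation capture.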

What is Dispersion?

Dispersion refers to the degree of spread or variability present in a dataset.

A small dispersion indicates data values are close to the average.

A large dispersion indicates data values are widely spread.

1

Open Google Colab

2

Import Required Libraries

import pandas as pd
import numpy as np

3

Upload the Retail Dataset

4

Load Dataset Using Pandas

df = pd.read_csv("/content/Retail_Dataset_Modified.csv")

print("Dataset Loaded Successfully")

5

Display First Five Records

df.head()

6

Check Dataset Information

df.info()
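Before calculating dispersion, it can help to confirm that the columns used later exist and are numeric. The sketch below is one possible check, not part of the original lab: the helper name check_numeric_columns and the small stand-in DataFrame are hypothetical, and the real dataset may contain different values.

```python
import pandas as pd

# Hypothetical stand-in for the retail dataset; the real file may differ.
sample = pd.DataFrame({
    "Revenue": [1200.0, 950.5, 1430.0],
    "Units_Sold": [12, 9, 15],
    "Delivery_Time": [3, 5, 4],
})

# Columns used by the later steps of this lab.
required = ["Revenue", "Units_Sold", "Delivery_Time"]


def check_numeric_columns(frame, columns):
    """Return (missing, non_numeric) lists for the requested columns."""
    missing = [c for c in columns if c not in frame.columns]
    non_numeric = [
        c for c in columns
        if c in frame.columns and not pd.api.types.is_numeric_dtype(frame[c])
    ]
    return missing, non_numeric


missing, non_numeric = check_numeric_columns(sample, required)
print("Missing columns:", missing)
print("Non-numeric columns:", non_numeric)
```

With the real data, the same helper could be called as check_numeric_columns(df, required) right after df.info().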

7

Calculate Variance of Revenue

revenue_variance = df["Revenue"].var()

print("Revenue Variance:", revenue_variance)

 

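Formula

$$\text{Variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Here $x_i$ is each value, $\bar{x}$ is the mean, and $n$ is the number of values. The pandas var() method uses this sample form (dividing by n - 1) by default.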
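The sample variance (sum of squared deviations from the mean, divided by n - 1) can be verified by hand. The sketch below uses made-up revenue values rather than the lab dataset and compares the manual result with pandas.

```python
import pandas as pd

# Made-up revenue values, used only to illustrate the formula.
revenue = pd.Series([200.0, 220.0, 180.0, 240.0, 160.0])

n = len(revenue)
mean = revenue.mean()

# Squared distance of every value from the mean.
squared_deviations = (revenue - mean) ** 2

# Sample variance: divide the sum of squared deviations by n - 1.
manual_variance = squared_deviations.sum() / (n - 1)

print("Manual variance:", manual_variance)
print("pandas .var():  ", revenue.var())
```

The two printed values should agree, since var() uses ddof=1 unless told otherwise.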

8

Calculate Standard Deviation of Revenue

revenue_std = df["Revenue"].std()

print("Revenue Standard Deviation:", revenue_std)

                            

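Formula

$$\text{Standard Deviation} = s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Here $x_i$ is each value, $\bar{x}$ is the mean, and $n$ is the number of values. The standard deviation is the square root of the variance, and the pandas std() method divides by n - 1 by default.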
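Because this lab imports both pandas and NumPy, it is worth knowing that their defaults differ: pandas std() divides by n - 1 (ddof=1), while NumPy's np.std divides by n (ddof=0). The sketch below, using made-up values, shows the relationship to variance and the effect of ddof.

```python
import numpy as np
import pandas as pd

# Made-up revenue values for illustration only.
revenue = pd.Series([200.0, 220.0, 180.0, 240.0, 160.0])

sample_std = revenue.std()               # pandas default: ddof=1 (divide by n - 1)
population_std = revenue.std(ddof=0)     # divide by n instead
numpy_std = np.std(revenue.to_numpy())   # NumPy default: ddof=0

# Standard deviation is the square root of the variance.
print("sqrt(variance): ", np.sqrt(revenue.var()))
print("Sample std:     ", sample_std)
print("Population std: ", population_std)
print("NumPy default:  ", numpy_std)
```

If results from pandas and NumPy disagree slightly, the ddof setting is the first thing to check.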

9

Compare Revenue Variance and Standard Deviation

comparison = pd.DataFrame({
    "Metric": ["Variance", "Standard Deviation"],
    "Value": [revenue_variance, revenue_std]
})

comparison

10

Calculate Variance of Units Sold

units_variance = df["Units_Sold"].var()

print("Units Sold Variance:", units_variance)

11

Calculate Standard Deviation of Units Sold

units_std = df["Units_Sold"].std()

print("Units Sold Standard Deviation:", units_std)

12

Analyze Delivery Time Variability

delivery_std = df["Delivery_Time"].std()

print("Delivery Time Standard Deviation:", delivery_std)
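Standard deviation can also help flag unusual values. The sketch below marks delivery times that lie more than k standard deviations from the mean. The threshold k = 2 is an assumed rule of thumb rather than something specified in this lab, and the delivery times are made up.

```python
import pandas as pd

# Made-up delivery times in days; one value is deliberately extreme.
delivery = pd.Series([3, 4, 3, 5, 4, 3, 4, 15], name="Delivery_Time")

mean = delivery.mean()
std = delivery.std()
k = 2  # assumed threshold: flag values more than 2 standard deviations away

# Keep only the values whose distance from the mean exceeds k * std.
unusual = delivery[(delivery - mean).abs() > k * std]

print("Mean:", mean, "| Std:", round(std, 2))
print("Unusual delivery times:", unusual.tolist())
```

Note that extreme values inflate the standard deviation themselves, so this check works best as a first screen rather than a final verdict.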

13

Display Results

print("Revenue Variance:", revenue_variance)
print("Revenue Standard Deviation:", revenue_std)

print("Units Sold Variance:", units_variance)
print("Units Sold Standard Deviation:", units_std)

print("Delivery Time Standard Deviation:", delivery_std)
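The individual results can also be collected into one table for comparison. The sketch below is an optional extension, not part of the original lab: it uses a made-up stand-in DataFrame and adds the coefficient of variation (std divided by mean), one common way to compare spread across metrics measured in different units.

```python
import pandas as pd

# Hypothetical stand-in data; column names match those used in this lab.
sample = pd.DataFrame({
    "Revenue": [200.0, 220.0, 180.0, 240.0, 160.0],
    "Units_Sold": [10, 12, 8, 14, 6],
    "Delivery_Time": [3, 4, 3, 5, 5],
})

# One row per metric, one column per statistic.
summary = sample.agg(["mean", "var", "std"]).T

# Coefficient of variation: spread relative to the mean (unitless).
summary["cv"] = summary["std"] / summary["mean"]

print(summary.round(3))
```

A higher cv suggests a metric fluctuates more relative to its own average, which makes metrics such as Revenue and Delivery_Time comparable despite their different scales.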

 

Great job!

You have successfully completed your lab on Measure Data Spread Using Dispersion Metrics.

Checkpoint

In this lab, you have:

  • Calculated Variance for business metrics
  • Measured Standard Deviation of sales data
  • Analyzed Revenue variability
  • Evaluated Units Sold fluctuations
  • Examined Delivery Time consistency
  • Compared dispersion across multiple retail metrics
  • Extracted meaningful insights from data spread

You are now ready to move to the next stage of your Junior Data Analyst training.

Git Push

git push origin branchName

Next-Lab Preparation

Topic: Decoding Your Data

1) Decoding Skewness: Understanding Data Distribution

DM9 LAB: Measure Data Spread Using Dispersion Metrics

By Content ITV
