Dynamic Pricing Strategy¶
This notebook provides a comprehensive analysis of retail store inventory data, focusing on demand forecasting, pricing strategies, and revenue optimization. It includes data loading, exploratory data analysis, demand trend analysis, competitor pricing impact, discount strategies, and the development of a pricing model. The insights and models aim to guide data-driven decisions for maximizing revenue while maintaining competitiveness in the market.
Importing the Required Libraries¶
- KaggleHub: For loading datasets directly from Kaggle.
- NumPy: For numerical computations, including array operations and mathematical functions.
- Pandas: For data manipulation and analysis.
- Plotly: For creating interactive visualizations.
- Python: Core programming language used for data analysis and modeling.
- Scikit-learn: For machine learning tasks, including regression modeling.
import kagglehub
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook_connected' # Ensure plots are displayed in the notebook
from kagglehub import KaggleDatasetAdapter
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Loading the dataset from Kaggle¶
This dataset, sourced from Kaggle, is titled Retail Store Inventory Forecasting Dataset and is provided by Anirudh Chauhan. It contains detailed information about retail store inventory, including pricing, discounts, demand forecasts, and competitor pricing. The dataset is ideal for analyzing demand trends, evaluating pricing strategies, and building predictive models for inventory and revenue optimization.
df = kagglehub.dataset_load(KaggleDatasetAdapter.PANDAS, 'anirudhchauhan/retail-store-inventory-forecasting-dataset', 'retail_store_inventory.csv')
Exploratory Data Analysis¶
- Dataset Overview: The dataset is loaded and basic information such as column names, data types, and missing values are inspected.
- Descriptive Statistics: Summary statistics are calculated to understand the distribution of numerical features.
- Duplicate and Missing Values: The dataset is checked for duplicate rows and missing values.
- Unique Values: The number of unique values in each column is analyzed to identify categorical variables and potential data inconsistencies.
print('Basic Information:')
print(df.info())
print(f'Descriptive Statistics:\n{df.describe()}')
print(f'Duplicate Rows: {df.duplicated().sum()}')
print(f'Missing Values: {df.isnull().sum().sum()}')
print("\nUnique Values per Column:")
for column in df.columns:
print(f"{column}: {df[column].nunique()} unique values")
Basic Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73100 entries, 0 to 73099
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 73100 non-null object
1 Store ID 73100 non-null object
2 Product ID 73100 non-null object
3 Category 73100 non-null object
4 Region 73100 non-null object
5 Inventory Level 73100 non-null int64
6 Units Sold 73100 non-null int64
7 Units Ordered 73100 non-null int64
8 Demand Forecast 73100 non-null float64
9 Price 73100 non-null float64
10 Discount 73100 non-null int64
11 Weather Condition 73100 non-null object
12 Holiday/Promotion 73100 non-null int64
13 Competitor Pricing 73100 non-null float64
14 Seasonality 73100 non-null object
dtypes: float64(3), int64(5), object(7)
memory usage: 8.4+ MB
None
Descriptive Statistics:
Inventory Level Units Sold Units Ordered Demand Forecast \
count 73100.000000 73100.000000 73100.000000 73100.000000
mean 274.469877 136.464870 110.004473 141.494720
std 129.949514 108.919406 52.277448 109.254076
min 50.000000 0.000000 20.000000 -9.990000
25% 162.000000 49.000000 65.000000 53.670000
50% 273.000000 107.000000 110.000000 113.015000
75% 387.000000 203.000000 155.000000 208.052500
max 500.000000 499.000000 200.000000 518.550000
Price Discount Holiday/Promotion Competitor Pricing
count 73100.000000 73100.000000 73100.000000 73100.000000
mean 55.135108 10.009508 0.497305 55.146077
std 26.021945 7.083746 0.499996 26.191408
min 10.000000 0.000000 0.000000 5.030000
25% 32.650000 5.000000 0.000000 32.680000
50% 55.050000 10.000000 0.000000 55.010000
75% 77.860000 15.000000 1.000000 77.820000
max 100.000000 20.000000 1.000000 104.940000
Duplicate Rows: 0
Missing Values: 0
Unique Values per Column:
Date: 731 unique values
Store ID: 5 unique values
Product ID: 20 unique values
Category: 5 unique values
Region: 4 unique values
Inventory Level: 451 unique values
Units Sold: 498 unique values
Units Ordered: 181 unique values
Demand Forecast: 31608 unique values
Price: 8999 unique values
Discount: 5 unique values
Weather Condition: 4 unique values
Holiday/Promotion: 2 unique values
Competitor Pricing: 9751 unique values
Seasonality: 4 unique values
Insights from the Output¶
- Dataset Overview: The dataset contains 73,100 rows and 21 columns, with no missing values. Key columns include
Price,Competitor Pricing,Discount, andDemand Forecast. - Descriptive Statistics: Numerical features such as
Price,Discount, andDemand Forecastexhibit diverse ranges, indicating variability in product pricing and demand. - Duplicate and Missing Values: The dataset has no duplicate rows or missing values, ensuring data integrity for analysis.
- Unique Values: Columns like
Category,Region, andSeasonalityhave a limited number of unique values, making them suitable for categorical analysis.
Data Wrangling¶
- Convert Date Column: The
Datecolumn is converted to a datetime format for easier time-based analysis. - Calculate Revenue: A new column
Revenueis created by calculating the revenue for each transaction using the formula:
Revenue = (Price - (Price * Discount / 100)) * Units Sold - Inspect Data: The updated DataFrame is inspected to ensure the transformations were applied correctly.
df['Date'] = pd.to_datetime(df['Date'])
df['Revenue'] = (df['Price'] - (df['Price'] * df['Discount'] / 100)) * df['Units Sold']
print(df.info())
print(df.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73100 entries, 0 to 73099
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 73100 non-null datetime64[ns]
1 Store ID 73100 non-null object
2 Product ID 73100 non-null object
3 Category 73100 non-null object
4 Region 73100 non-null object
5 Inventory Level 73100 non-null int64
6 Units Sold 73100 non-null int64
7 Units Ordered 73100 non-null int64
8 Demand Forecast 73100 non-null float64
9 Price 73100 non-null float64
10 Discount 73100 non-null int64
11 Weather Condition 73100 non-null object
12 Holiday/Promotion 73100 non-null int64
13 Competitor Pricing 73100 non-null float64
14 Seasonality 73100 non-null object
15 Revenue 73100 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(5), object(6)
memory usage: 8.9+ MB
None
Date Store ID Product ID Category Region Inventory Level \
0 2022-01-01 S001 P0001 Groceries North 231
1 2022-01-01 S001 P0002 Toys South 204
2 2022-01-01 S001 P0003 Toys West 102
3 2022-01-01 S001 P0004 Toys North 469
4 2022-01-01 S001 P0005 Electronics East 166
Units Sold Units Ordered Demand Forecast Price Discount \
0 127 55 135.47 33.50 20
1 150 66 144.04 63.01 20
2 65 51 74.02 27.99 10
3 61 164 62.18 32.72 10
4 14 135 9.26 73.64 0
Weather Condition Holiday/Promotion Competitor Pricing Seasonality \
0 Rainy 0 29.69 Autumn
1 Sunny 0 66.16 Autumn
2 Sunny 1 31.32 Summer
3 Cloudy 1 34.74 Autumn
4 Sunny 0 68.95 Summer
Revenue
0 3403.600
1 7561.200
2 1637.415
3 1796.328
4 1030.960
Analyzing Demand Trends¶
- Daily Demand Trends: Groups data by date and calculates total demand to observe daily patterns.
- Weekly and Monthly Trends: Adds Week and Month columns to analyze demand at different time granularities.
- Visualizations: Plots line and bar charts to visualize demand trends over time.
# Group data by date and calculate total demand (assuming a 'Demand' column exists)
demand_trends = df.groupby('Date')['Demand Forecast'].sum()
# Plot the demand trends
fig = go.Figure()
fig.add_trace(go.Scatter(
x=demand_trends.index,
y=demand_trends.values,
mode='lines',
name='Daily Demand',
line=dict(color='#324fff'),
hovertemplate=(
"Date: %{x}<br>"
"Total Demand: %{y}<br>"
"Min Demand: %{customdata[0]:.2f}<br>"
"Mean Demand: %{customdata[1]:.2f}<br>"
"Median Demand: %{customdata[2]:.2f}<br>"
"Max Demand: %{customdata[3]:.2f}<extra></extra>"
),
customdata=np.stack((
[demand_trends.min()] * len(demand_trends),
[demand_trends.mean()] * len(demand_trends),
[demand_trends.median()] * len(demand_trends),
[demand_trends.max()] * len(demand_trends)
), axis=-1)
))
fig.update_layout(
title='Demand Trends Over Time',
xaxis_title='Date',
yaxis_title='Total Demand',
template='plotly_white',
)
fig.show()
# Analyze demand by week & month
df['Week'] = df['Date'].dt.isocalendar().week
df['Month'] = df['Date'].dt.month
weekly_demand = df.groupby('Week')['Demand Forecast'].sum()
monthly_demand = df.groupby('Month')['Demand Forecast'].sum()
# Plot weekly demand
fig_weekly = px.bar(x=weekly_demand.index, y=weekly_demand.values, labels={'x': 'Week', 'y': 'Total Demand'}, title='Weekly Demand')
fig_weekly.update_traces(marker=dict(color=weekly_demand.values, colorscale='viridis', colorbar=dict(title='Total Demand')))
fig_weekly.update_layout(template='plotly_white')
fig_weekly.show()
# Plot monthly demand
fig_monthly = px.bar(x=monthly_demand.index, y=monthly_demand.values, labels={'x': 'Month', 'y': 'Total Demand'}, title='Monthly Demand')
fig_monthly.update_traces(marker=dict(color=monthly_demand.values, colorscale='viridis', colorbar=dict(title='Total Demand')))
fig_monthly.update_layout(template='plotly_white')
fig_monthly.show()
Insights from Data Trends¶
- Daily Demand Forecast Trends: The line chart reveals fluctuations in daily demand forecasts, with noticeable peaks during specific periods, indicating potential seasonality or promotional effects.
- Weekly Demand Forecast Trends: The bar chart shows consistent weekly demand, with Week 52 exhibiting the highest demand forecast, likely due to holiday or year-end promotions.
- Monthly Demand Forecast Trends: The monthly demand forecast analysis highlights July as the peak month, suggesting increased activity during mid-year.
Competitor Pricing Impact¶
- Correlation Analysis: Calculates the correlation between competitor pricing and demand to understand their relationship.
- Scatter Plot: Visualizes how demand varies with competitor pricing.
- Price Difference Analysis: Computes the difference between your price and the competitor's price and analyzes its impact on demand.
- Scatter Plot for Price Difference: Visualizes the relationship between price difference and demand.
## Competitor Pricing Impact
# Check if 'Competitor Pricing' column exists
if 'Competitor Pricing' in df.columns:
# Correlation Analysis
print("\nCorrelation between Competitor Price and Demand:")
print(df[['Competitor Pricing', 'Demand Forecast']].corr())
# Scatter Plot
fig_competitor = px.scatter(
df,
x='Competitor Pricing',
y='Demand Forecast',
title='Demand vs Competitor Price',
labels={'Competitor Pricing': 'Competitor Pricing', 'Demand Forecast': 'Demand Forecast'},
opacity=0.6
)
fig_competitor.update_traces(marker=dict(size=8, color=df['Demand Forecast'], colorscale='Viridis', colorbar=dict(title='Demand Forecast')))
fig_competitor.update_layout(template='plotly_white')
fig_competitor.show()
# Price Difference Analysis
df['Price Difference'] = df['Price'] - df['Competitor Pricing']
print("\nCorrelation between Price Difference and Demand:")
print(df[['Price Difference', 'Demand Forecast']].corr())
# Scatter Plot for Price Difference
fig_price_diff = px.scatter(
df,
x='Price Difference',
y='Demand Forecast',
title='Demand vs Price Difference',
labels={'Price Difference': 'Price Difference (Your Price - Competitor Price)', 'Demand Forecast': 'Demand Forecast'},
opacity=0.6
)
fig_price_diff.update_traces(marker=dict(size=8, color=df['Demand Forecast'], colorscale='Viridis', colorbar=dict(title='Demand Forecast')))
fig_price_diff.update_layout(template='plotly_white')
fig_price_diff.update_traces(
hovertemplate='Price Difference=%{x}<br>Demand Forecast=%{y}<br>Mean Price=%{customdata[0]:.2f}<br>Mean Competitor Price=%{customdata[1]:.2f}<extra></extra>',
customdata=np.stack((df['Price'], df['Competitor Pricing']), axis=-1)
)
fig_price_diff.show()
else:
print("The dataset does not contain a 'Competitor Pricing' column.")
Correlation between Competitor Price and Demand:
Competitor Pricing Demand Forecast
Competitor Pricing 1.000000 0.000844
Demand Forecast 0.000844 1.000000
Correlation between Price Difference and Demand:
Price Difference Demand Forecast
Price Difference 1.00000 -0.00157
Demand Forecast -0.00157 1.00000