Dynamic Pricing Strategy¶
This notebook provides a comprehensive analysis of retail store inventory data, focusing on demand forecasting, pricing strategies, and revenue optimization. It includes data loading, exploratory data analysis, demand trend analysis, competitor pricing impact, discount strategies, and the development of a pricing model. The insights and models aim to guide data-driven decisions for maximizing revenue while maintaining competitiveness in the market.
Importing the Required Libraries¶
- KaggleHub: For loading datasets directly from Kaggle.
- NumPy: For numerical computations, including array operations and mathematical functions.
- Pandas: For data manipulation and analysis.
- Plotly: For creating interactive visualizations.
- Python: Core programming language used for data analysis and modeling.
- Scikit-learn: For machine learning tasks, including regression modeling.
import kagglehub
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook_connected' # Ensure plots are displayed in the notebook
from kagglehub import KaggleDatasetAdapter
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Loading the dataset from Kaggle¶
This dataset, sourced from Kaggle, is titled Retail Store Inventory Forecasting Dataset and is provided by Anirudh Chauhan. It contains detailed information about retail store inventory, including pricing, discounts, demand forecasts, and competitor pricing. The dataset is ideal for analyzing demand trends, evaluating pricing strategies, and building predictive models for inventory and revenue optimization.
df = kagglehub.dataset_load(KaggleDatasetAdapter.PANDAS, 'anirudhchauhan/retail-store-inventory-forecasting-dataset', 'retail_store_inventory.csv')
Exploratory Data Analysis¶
- Dataset Overview: The dataset is loaded and basic information such as column names, data types, and missing values are inspected.
- Descriptive Statistics: Summary statistics are calculated to understand the distribution of numerical features.
- Duplicate and Missing Values: The dataset is checked for duplicate rows and missing values.
- Unique Values: The number of unique values in each column is analyzed to identify categorical variables and potential data inconsistencies.
print('Basic Information:')
print(df.info())
print(f'Descriptive Statistics:\n{df.describe()}')
print(f'Duplicate Rows: {df.duplicated().sum()}')
print(f'Missing Values: {df.isnull().sum().sum()}')
print("\nUnique Values per Column:")
for column in df.columns:
print(f"{column}: {df[column].nunique()} unique values")
Basic Information: <class 'pandas.core.frame.DataFrame'> RangeIndex: 73100 entries, 0 to 73099 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 73100 non-null object 1 Store ID 73100 non-null object 2 Product ID 73100 non-null object 3 Category 73100 non-null object 4 Region 73100 non-null object 5 Inventory Level 73100 non-null int64 6 Units Sold 73100 non-null int64 7 Units Ordered 73100 non-null int64 8 Demand Forecast 73100 non-null float64 9 Price 73100 non-null float64 10 Discount 73100 non-null int64 11 Weather Condition 73100 non-null object 12 Holiday/Promotion 73100 non-null int64 13 Competitor Pricing 73100 non-null float64 14 Seasonality 73100 non-null object dtypes: float64(3), int64(5), object(7) memory usage: 8.4+ MB None Descriptive Statistics: Inventory Level Units Sold Units Ordered Demand Forecast \ count 73100.000000 73100.000000 73100.000000 73100.000000 mean 274.469877 136.464870 110.004473 141.494720 std 129.949514 108.919406 52.277448 109.254076 min 50.000000 0.000000 20.000000 -9.990000 25% 162.000000 49.000000 65.000000 53.670000 50% 273.000000 107.000000 110.000000 113.015000 75% 387.000000 203.000000 155.000000 208.052500 max 500.000000 499.000000 200.000000 518.550000 Price Discount Holiday/Promotion Competitor Pricing count 73100.000000 73100.000000 73100.000000 73100.000000 mean 55.135108 10.009508 0.497305 55.146077 std 26.021945 7.083746 0.499996 26.191408 min 10.000000 0.000000 0.000000 5.030000 25% 32.650000 5.000000 0.000000 32.680000 50% 55.050000 10.000000 0.000000 55.010000 75% 77.860000 15.000000 1.000000 77.820000 max 100.000000 20.000000 1.000000 104.940000 Duplicate Rows: 0 Missing Values: 0 Unique Values per Column: Date: 731 unique values Store ID: 5 unique values Product ID: 20 unique values Category: 5 unique values Region: 4 unique values Inventory Level: 451 unique values Units Sold: 498 unique values Units Ordered: 181 unique values Demand Forecast: 31608 unique values Price: 8999 unique values Discount: 5 unique values Weather Condition: 4 unique values Holiday/Promotion: 2 unique values Competitor Pricing: 9751 unique values Seasonality: 4 unique values
Insights from the Output¶
- Dataset Overview: The dataset contains 73,100 rows and 21 columns, with no missing values. Key columns include
Price
,Competitor Pricing
,Discount
, andDemand Forecast
. - Descriptive Statistics: Numerical features such as
Price
,Discount
, andDemand Forecast
exhibit diverse ranges, indicating variability in product pricing and demand. - Duplicate and Missing Values: The dataset has no duplicate rows or missing values, ensuring data integrity for analysis.
- Unique Values: Columns like
Category
,Region
, andSeasonality
have a limited number of unique values, making them suitable for categorical analysis.
Data Wrangling¶
- Convert Date Column: The
Date
column is converted to a datetime format for easier time-based analysis. - Calculate Revenue: A new column
Revenue
is created by calculating the revenue for each transaction using the formula:
Revenue = (Price - (Price * Discount / 100)) * Units Sold - Inspect Data: The updated DataFrame is inspected to ensure the transformations were applied correctly.
df['Date'] = pd.to_datetime(df['Date'])
df['Revenue'] = (df['Price'] - (df['Price'] * df['Discount'] / 100)) * df['Units Sold']
print(df.info())
print(df.head())
<class 'pandas.core.frame.DataFrame'> RangeIndex: 73100 entries, 0 to 73099 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 73100 non-null datetime64[ns] 1 Store ID 73100 non-null object 2 Product ID 73100 non-null object 3 Category 73100 non-null object 4 Region 73100 non-null object 5 Inventory Level 73100 non-null int64 6 Units Sold 73100 non-null int64 7 Units Ordered 73100 non-null int64 8 Demand Forecast 73100 non-null float64 9 Price 73100 non-null float64 10 Discount 73100 non-null int64 11 Weather Condition 73100 non-null object 12 Holiday/Promotion 73100 non-null int64 13 Competitor Pricing 73100 non-null float64 14 Seasonality 73100 non-null object 15 Revenue 73100 non-null float64 dtypes: datetime64[ns](1), float64(4), int64(5), object(6) memory usage: 8.9+ MB None Date Store ID Product ID Category Region Inventory Level \ 0 2022-01-01 S001 P0001 Groceries North 231 1 2022-01-01 S001 P0002 Toys South 204 2 2022-01-01 S001 P0003 Toys West 102 3 2022-01-01 S001 P0004 Toys North 469 4 2022-01-01 S001 P0005 Electronics East 166 Units Sold Units Ordered Demand Forecast Price Discount \ 0 127 55 135.47 33.50 20 1 150 66 144.04 63.01 20 2 65 51 74.02 27.99 10 3 61 164 62.18 32.72 10 4 14 135 9.26 73.64 0 Weather Condition Holiday/Promotion Competitor Pricing Seasonality \ 0 Rainy 0 29.69 Autumn 1 Sunny 0 66.16 Autumn 2 Sunny 1 31.32 Summer 3 Cloudy 1 34.74 Autumn 4 Sunny 0 68.95 Summer Revenue 0 3403.600 1 7561.200 2 1637.415 3 1796.328 4 1030.960
Analyzing Demand Trends¶
- Daily Demand Trends: Groups data by date and calculates total demand to observe daily patterns.
- Weekly and Monthly Trends: Adds Week and Month columns to analyze demand at different time granularities.
- Visualizations: Plots line and bar charts to visualize demand trends over time.
# Group data by date and calculate total demand (assuming a 'Demand' column exists)
demand_trends = df.groupby('Date')['Demand Forecast'].sum()
# Plot the demand trends
fig = go.Figure()
fig.add_trace(go.Scatter(
x=demand_trends.index,
y=demand_trends.values,
mode='lines',
name='Daily Demand',
line=dict(color='#324fff'),
hovertemplate=(
"Date: %{x}<br>"
"Total Demand: %{y}<br>"
"Min Demand: %{customdata[0]:.2f}<br>"
"Mean Demand: %{customdata[1]:.2f}<br>"
"Median Demand: %{customdata[2]:.2f}<br>"
"Max Demand: %{customdata[3]:.2f}<extra></extra>"
),
customdata=np.stack((
[demand_trends.min()] * len(demand_trends),
[demand_trends.mean()] * len(demand_trends),
[demand_trends.median()] * len(demand_trends),
[demand_trends.max()] * len(demand_trends)
), axis=-1)
))
fig.update_layout(
title='Demand Trends Over Time',
xaxis_title='Date',
yaxis_title='Total Demand',
template='plotly_white',
)
fig.show()
# Analyze demand by week & month
df['Week'] = df['Date'].dt.isocalendar().week
df['Month'] = df['Date'].dt.month
weekly_demand = df.groupby('Week')['Demand Forecast'].sum()
monthly_demand = df.groupby('Month')['Demand Forecast'].sum()
# Plot weekly demand
fig_weekly = px.bar(x=weekly_demand.index, y=weekly_demand.values, labels={'x': 'Week', 'y': 'Total Demand'}, title='Weekly Demand')
fig_weekly.update_traces(marker=dict(color=weekly_demand.values, colorscale='viridis', colorbar=dict(title='Total Demand')))
fig_weekly.update_layout(template='plotly_white')
fig_weekly.show()
# Plot monthly demand
fig_monthly = px.bar(x=monthly_demand.index, y=monthly_demand.values, labels={'x': 'Month', 'y': 'Total Demand'}, title='Monthly Demand')
fig_monthly.update_traces(marker=dict(color=monthly_demand.values, colorscale='viridis', colorbar=dict(title='Total Demand')))
fig_monthly.update_layout(template='plotly_white')
fig_monthly.show()
Insights from Data Trends¶
- Daily Demand Forecast Trends: The line chart reveals fluctuations in daily demand forecasts, with noticeable peaks during specific periods, indicating potential seasonality or promotional effects.
- Weekly Demand Forecast Trends: The bar chart shows consistent weekly demand, with Week 52 exhibiting the highest demand forecast, likely due to holiday or year-end promotions.
- Monthly Demand Forecast Trends: The monthly demand forecast analysis highlights July as the peak month, suggesting increased activity during mid-year.
Competitor Pricing Impact¶
- Correlation Analysis: Calculates the correlation between competitor pricing and demand to understand their relationship.
- Scatter Plot: Visualizes how demand varies with competitor pricing.
- Price Difference Analysis: Computes the difference between your price and the competitor's price and analyzes its impact on demand.
- Scatter Plot for Price Difference: Visualizes the relationship between price difference and demand.
## Competitor Pricing Impact
# Check if 'Competitor Pricing' column exists
if 'Competitor Pricing' in df.columns:
# Correlation Analysis
print("\nCorrelation between Competitor Price and Demand:")
print(df[['Competitor Pricing', 'Demand Forecast']].corr())
# Scatter Plot
fig_competitor = px.scatter(
df,
x='Competitor Pricing',
y='Demand Forecast',
title='Demand vs Competitor Price',
labels={'Competitor Pricing': 'Competitor Pricing', 'Demand Forecast': 'Demand Forecast'},
opacity=0.6
)
fig_competitor.update_traces(marker=dict(size=8, color=df['Demand Forecast'], colorscale='Viridis', colorbar=dict(title='Demand Forecast')))
fig_competitor.update_layout(template='plotly_white')
fig_competitor.show()
# Price Difference Analysis
df['Price Difference'] = df['Price'] - df['Competitor Pricing']
print("\nCorrelation between Price Difference and Demand:")
print(df[['Price Difference', 'Demand Forecast']].corr())
# Scatter Plot for Price Difference
fig_price_diff = px.scatter(
df,
x='Price Difference',
y='Demand Forecast',
title='Demand vs Price Difference',
labels={'Price Difference': 'Price Difference (Your Price - Competitor Price)', 'Demand Forecast': 'Demand Forecast'},
opacity=0.6
)
fig_price_diff.update_traces(marker=dict(size=8, color=df['Demand Forecast'], colorscale='Viridis', colorbar=dict(title='Demand Forecast')))
fig_price_diff.update_layout(template='plotly_white')
fig_price_diff.update_traces(
hovertemplate='Price Difference=%{x}<br>Demand Forecast=%{y}<br>Mean Price=%{customdata[0]:.2f}<br>Mean Competitor Price=%{customdata[1]:.2f}<extra></extra>',
customdata=np.stack((df['Price'], df['Competitor Pricing']), axis=-1)
)
fig_price_diff.show()
else:
print("The dataset does not contain a 'Competitor Pricing' column.")
Correlation between Competitor Price and Demand: Competitor Pricing Demand Forecast Competitor Pricing 1.000000 0.000844 Demand Forecast 0.000844 1.000000
Correlation between Price Difference and Demand: Price Difference Demand Forecast Price Difference 1.00000 -0.00157 Demand Forecast -0.00157 1.00000