Importing the Libraries¶
In [1]:
import kagglehub
import os
import pandas as pd
Loading the Kaggle Dataset¶
In [2]:
from kagglehub import KaggleDatasetAdapter
df = kagglehub.dataset_load(KaggleDatasetAdapter.PANDAS, "yasserh/walmart-dataset", "Walmart.csv") # Loads the latest version
df.to_csv("Walmart.csv", index=False)
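Since `dataset_load` fetches from Kaggle on every run, a small caching wrapper can skip the download when the CSV is already on disk. This is a sketch, not part of the original pipeline; it assumes the same dataset slug and filename as above.

```python
import os
import pandas as pd

def load_walmart(csv_path="Walmart.csv"):
    """Return the Walmart dataset, preferring a cached local CSV."""
    if os.path.exists(csv_path):
        # Re-use the local copy instead of hitting the Kaggle API again.
        return pd.read_csv(csv_path)
    import kagglehub
    from kagglehub import KaggleDatasetAdapter
    df = kagglehub.dataset_load(KaggleDatasetAdapter.PANDAS,
                                "yasserh/walmart-dataset", "Walmart.csv")
    df.to_csv(csv_path, index=False)
    return df
```

On later runs this reads the saved `Walmart.csv` directly, so the notebook works offline once the first download has succeeded.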
Data Overview¶
In [3]:
df = pd.read_csv("Walmart.csv")
print(df.columns)
print(df.dtypes)
Index(['Store', 'Date', 'Weekly_Sales', 'Holiday_Flag', 'Temperature',
       'Fuel_Price', 'CPI', 'Unemployment'],
      dtype='object')
Store             int64
Date             object
Weekly_Sales    float64
Holiday_Flag      int64
Temperature     float64
Fuel_Price      float64
CPI             float64
Unemployment    float64
dtype: object
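Before transforming anything, it is worth confirming the data is complete: per-column missing counts and a duplicate-row count catch most loading problems early. A minimal sketch on a synthetic frame with the same columns (the values here are illustrative, not from the real dataset):

```python
import pandas as pd

# Small synthetic sample mimicking the Walmart.csv schema
sample = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["05-02-2010", "12-02-2010", "05-02-2010"],
    "Weekly_Sales": [1643690.90, 1641957.44, 2136989.46],
    "Holiday_Flag": [0, 1, 0],
})

def sanity_report(df):
    """Return per-column missing-value counts and the number of duplicate rows."""
    return df.isna().sum().to_dict(), int(df.duplicated().sum())

missing, dupes = sanity_report(sample)
```

Running the same check on the real `df` would flag any column that needs imputation before the dtype conversions below.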
Transform data types¶
The 'Date' column is converted to datetime format, and 'Holiday_Flag' is converted from a 0/1 integer to a boolean. This ensures that each column has an appropriate, meaningful data type for further processing.
In [4]:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
df['Holiday_Flag'] = df['Holiday_Flag'].astype('bool')
print(df.dtypes)
df.to_csv("Walmart.csv", index=False)
print("DataFrame saved to Walmart.csv")
Store                    int64
Date            datetime64[ns]
Weekly_Sales           float64
Holiday_Flag              bool
Temperature            float64
Fuel_Price             float64
CPI                    float64
Unemployment           float64
dtype: object
DataFrame saved to Walmart.csv
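The `format='%d-%m-%Y'` argument matters here: the dates are day-first, so without an explicit format pandas could mis-parse ambiguous values like `05-02-2010`. A small self-contained check of both conversions (sample values are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": ["05-02-2010", "12-02-2010"],   # dd-mm-yyyy, as in Walmart.csv
    "Holiday_Flag": [0, 1],
})

# Explicit day-first format: 05-02-2010 is 5 February 2010, not 2 May
sample["Date"] = pd.to_datetime(sample["Date"], format="%d-%m-%Y")
# 0/1 integers become proper booleans
sample["Holiday_Flag"] = sample["Holiday_Flag"].astype("bool")
```

After the conversion, datetime accessors such as `sample["Date"].dt.month` become available, which is useful for any seasonal analysis later on.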
Load the dataset to PostgreSQL¶
In [5]:
from sqlalchemy import create_engine
# Read connection details from the environment, falling back to local defaults;
# this keeps real credentials out of the notebook
db_user = os.getenv('DB_USER', 'postgres')
db_password = os.getenv('DB_PASSWORD', '1234')
db_host = os.getenv('DB_HOST', 'localhost')
db_port = os.getenv('DB_PORT', '5432')
db_name = os.getenv('DB_NAME', 'retail-data-insights')
engine = create_engine(f'postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}')
df.to_sql('Walmart', engine, if_exists='replace', index=False)
print("DataFrame loaded to PostgreSQL table 'Walmart'.")
DataFrame loaded to PostgreSQL table 'Walmart'.
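To confirm the load round-trips, the table can be read back with `pd.read_sql`. The sketch below uses an in-memory SQLite engine so it runs without a PostgreSQL server; the `to_sql`/`read_sql` calls are identical, only the connection URL differs (the sample data is illustrative).

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for PostgreSQL in this sketch
engine = create_engine("sqlite://")

df = pd.DataFrame({
    "Store": [1, 2],
    "Weekly_Sales": [1643690.90, 2136989.46],
})
df.to_sql("Walmart", engine, if_exists="replace", index=False)

# Read the table back to verify the write succeeded
df_back = pd.read_sql('SELECT * FROM "Walmart"', engine)
```

Note that `to_sql('Walmart', ...)` creates a mixed-case table name, which PostgreSQL treats as case-sensitive, so later queries must double-quote it as `"Walmart"`.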