Regression on Price Promotion data

Patiparn Nualchan
10 min read · Jan 26, 2023


How to use a regression model to predict offtake (sales out) from sales-in, number of stores, and promotion data in the retail business.

Hello, it's me again, talking about sales forecasting/prediction in retail. I have worked in this domain for a long time, and sales always comes first.

In a previous article, I talked about sales forecasting on a single sales series (univariate) with the sktime package, and another article covered sales forecasting with a deep learning technique.

In this article I will work on tabular data / table data / DataFrame data (whatever you call it), which means we work with multivariate data.

1. Data set

I have tabular train and test data (the test set is kept as unseen data) in CSV. It contains: date, sales_in, number of stores, and offtake (average sales out per store) as our target, all numerical, plus one categorical column, pro (price promotion).

My interest in this project was the relation between sales_in and offtake, and between stores and offtake. How are they related? Should a larger number of stores mean higher offtake? Should higher sales_in mean higher offtake as well? Is that correct? That is what I would like to explore in this project. Let's go together!

df.tail() to see the data from the bottom, by author

After loading and some data cleaning, my first action was simply to generate more useful features from existing ones, such as extracting year, month, and week_of_year from the date, and getting dummies for the categorical column.

Dataset after extracting date features and getting dummies for the categorical column, by author

Now we have expanded from the original 5 feature columns to 12.
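To make this step concrete, here is a minimal sketch of the feature engineering, assuming the raw columns are named date and pro as in the screenshots (the actual notebook may use different names):

import pandas as pd

# Extract calendar features from the date column
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)

# One-hot encode the categorical promotion column
df = pd.get_dummies(df, columns=['pro'])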

2. Data Visualization

Now that the data has no missing values, we can explore it to get more insight before building any model.

We start with our offtake (target), stores and sales_in.

  • offtake: stable, moving up and down in a range of roughly 5–12
  • Stores: the trend line was noticeably declining
  • Sales_in: seems stable, but there are 3 strange peaks (investigation needed)
sales_in after removing outliers.

For the 3 peaks of sales_in, I decided to treat them as outliers that needed to be removed, so the chart above shows sales_in after the outliers were removed.
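The exact outlier rule is not shown in the article, so the snippet below is only a hypothetical IQR-based filter on sales_in, one common way such peaks could be removed:

# Hypothetical IQR filter; the author's actual threshold may differ
q1, q3 = df['sales_in'].quantile([0.25, 0.75])
iqr = q3 - q1
keep_mask = df['sales_in'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[keep_mask].reset_index(drop=True)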

For the categorical column ['Pro'], a countplot shows the typical promotion situation: the normal price (RSP) is the most frequent, followed by the other price promotions.

In this article we will cut heavy promotion levels such as BOGO (buy 1 get 1) and buy 2 + 1 free, keeping only the normal price-discount levels.

Heavy promotion is predictable: offtake is going to be high. We need to explore what we cannot expect or assume from intuition, so we cut it off! (That means we need the power of machine learning to help us!)
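A small sketch of that filtering, assuming it runs on the raw Pro column before the dummies are created and that the labels are spelled as below (both are assumptions):

# Drop heavy promotion mechanics, keep only the normal price/discount levels
heavy_promos = ['BOGO', 'buy2+1']    # assumed label spellings
df = df[~df['pro'].isin(heavy_promos)].reset_index(drop=True)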

3. Feature checking

To describe each feature based on statistical measurements, we inspected: 'types', 'counts', 'distincts', 'nulls', 'missing_ratio', 'uniques', 'skewness', 'kurtosis', and the correlation with the target ('corr offtake'), using the code below.

import pandas as pd

obs = df_train.shape[0]
types = df_train.dtypes
counts = df_train.apply(lambda x: x.count())
uniques = df_train.apply(lambda x: [x.unique()]).transpose()   # unique values per column
nulls = df_train.apply(lambda x: x.isnull().sum())
distincts = df_train.apply(lambda x: x.unique().shape[0])
missing_ratio = (df_train.isnull().sum() / obs) * 100
skewness = df_train.skew()
kurtosis = df_train.kurt()

# Correlation of every feature with the target
corr = df_train.corr()['offtake']
corr_col = 'corr offtake'

# Build one summary table (avoid shadowing the built-in `str`)
summary = pd.concat([types, counts, distincts, nulls, missing_ratio,
                     uniques, skewness, kurtosis, corr], axis=1, sort=False)
summary.columns = ['types', 'counts', 'distincts', 'nulls', 'missing_ratio',
                   'uniques', 'skewness', 'kurtosis', corr_col]
summary.sort_values(by=corr_col, ascending=False)

'corr offtake' shows the relation between each feature and the target, called the correlation. It expresses that relation as a number between -1 and 1.

From the feature-checking table, we see that pro_2for has the strongest positive relation with offtake, while pro_RSP has the strongest negative relation. Of our features of interest, sales_in is slightly negative and stores is modestly positive.

This chart shows the correlations in a way that is easy to see and summarize. Our two main variables of focus, sales_in and stores, were 0.03 and 0.20.

The scatterplots show and support the scores of 0.03 and 0.20 for sales_in and stores.
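For reference, a sketch of how the correlation chart and the scatterplots could be produced with matplotlib/seaborn, assuming the column names used above:

import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart of each feature's correlation with the target
df_train.corr()['offtake'].drop('offtake').sort_values().plot(kind='barh')
plt.title('Correlation with offtake')
plt.show()

# Scatterplots for the two features of interest
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=df_train, x='sales_in', y='offtake', ax=axes[0])
sns.scatterplot(data=df_train, x='stores', y='offtake', ax=axes[1])
plt.show()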

Up to this step, we can roughly conclude that offtake goes higher while a promotion is running and drops back at the RSP price. Our features of interest, sales_in and stores, do not affect or relate to offtake much. It seems like a simple conclusion, but it makes sense and is practical in the real world. The question at this step is how to use these features to predict offtake, and which feature is the most important by score.

4. Preprocessing

Any model needs correctly formatted data to train on, so preprocessing is needed.

This article has 2 sets of data, train and test. We import and do all the work on train, and keep the test set as unseen data to evaluate the final model.

Working on the train data, we start by splitting it into train and valid sets. In this project the data was small, just 102 records, so I decided to split it as train[:90] and valid[90:].

train and valid set by author
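A minimal sketch of this sequential split, assuming the target column is named offtake and non-feature columns such as the raw date have already been dropped:

# Keep time order: first 90 rows for training, the remaining 12 for validation
train_df, valid_df = df_train.iloc[:90], df_train.iloc[90:]

X_train, y_train = train_df.drop(columns=['offtake']), train_df['offtake']
X_valid, y_valid = valid_df.drop(columns=['offtake']), valid_df['offtake']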

It is better to check the target (y) distribution in both train and valid; they should be close and behave similarly, as below.

Target(y) check by author

Standardization and normalization

Standardizing centers the features around 0 with a standard deviation of 1, which is important when we compare measurements that have different units. Variables measured at different scales do not contribute equally to the analysis and might end up creating a bias.

Normalization: the goal of normalization is to change the values of numeric columns in the dataset to a common scale without distorting differences in the ranges of values. Not every dataset requires normalization for machine learning; it is needed only when features have different ranges.


For my project, I used MinMaxScaler() for normalization and StandardScaler() for standardization.

Standardization and normalization step, by author
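A sketch of that scaling step; the key point is to fit the scalers on the training split only and reuse them on the validation split (variable names are assumptions):

from sklearn.preprocessing import StandardScaler, MinMaxScaler

std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(X_train)   # fit on train only
X_valid_std = std_scaler.transform(X_valid)       # reuse the fitted scaler

mm_scaler = MinMaxScaler()
X_train_mm = mm_scaler.fit_transform(X_train)
X_valid_mm = mm_scaler.transform(X_valid)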

Again, a distplot to check the distribution of X for both train and valid after standardizing and normalizing. It looked good.

Distribution after standardization and normalization, by author

Now we have X_train_std ready to fit in the model.

5. Model & Measure regression performance

As the headline "Regression" suggests, I will explore regression models to predict the sales. In total I built 5 tree-based models:

  1. DecisionTreeRegressor (as baseline)
  2. DecisionTreeRegressor + TimeSeriesSplit Validation
  3. DecisionTreeRegressor + TimeSeriesSplit Validation + more params
  4. RandomForestRegressor + TimeSeriesSplit Validation +more params
  5. Xgboost + TimeSeriesSplit Validation + more params

For metric evaluation, I chose R-squared as the score (higher is better) and MAE (mean absolute error) as the error (lower is better).

Let me show some interesting parts and compare them for you.

1. DecisionTreeRegressor (as baseline) predicted as shown below.

1. DecisionTreeRegressor prediction plot by author
1. DecisionTreeRegressor model performance plot by author
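As a rough sketch, the baseline and the two metrics could look like this, reusing the scaled arrays from the preprocessing step (exact settings are assumptions):

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_absolute_error

baseline = DecisionTreeRegressor(random_state=42)
baseline.fit(X_train_std, y_train)

# Report R-squared and MAE on both splits
for name, X, y in [('train', X_train_std, y_train), ('valid', X_valid_std, y_valid)]:
    pred = baseline.predict(X)
    print(f"{name}: R2 = {r2_score(y, pred):.3f} | MAE = {mean_absolute_error(y, pred):.3f}")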

2. DecisionTreeRegressor + TimeSeriesSplit Validation.

For the second model we add TimeSeriesSplit validation on top of the baseline. (I decided to split sequentially and not shuffle the data; for me that makes sense and is practical when using and explaining the model to my team and boss.)

cv_ls is a variable holding the train and valid index sets from the split. I set n_splits=5, meaning the data is divided into 5 folds, and test_size=10, meaning 10 data points are held out as the validation set in each fold, stepping forward in the next fold. A minimal sketch follows.
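That split can be expressed with scikit-learn's TimeSeriesSplit (the test_size argument is available from scikit-learn 0.24 onward):

from sklearn.model_selection import TimeSeriesSplit

# 5 sequential folds, each validated on the next 10 rows (no shuffling)
tscv = TimeSeriesSplit(n_splits=5, test_size=10)
cv_ls = list(tscv.split(X_train_std))   # list of (train_idx, valid_idx) pairs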

2. DecisionTreeRegressor + TimeSeriesSplit Validation prediction plot by author
2. DecisionTreeRegressor + TimeSeriesSplit Validation model performance plot by author

For both 1 and 2, the models perform well on the train set with more than 0.90 R-squared, but badly on valid with negative R-squared and a large error. We need a better model.

3. DecisionTreeRegressor + TimeSeriesSplit Validation + more params

parameter adding for GridSearch by author
3. DecisionTreeRegressor + TimeSeriesSplit Validation + more params prediction plot by author
3. DecisionTreeRegressor + TimeSeriesSplit Validation + more params model performance plot by author

It improved, but the error was still high. By eye we can see 2 data points in valid with a big gap, while the other points were good.
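The exact parameter grid is only shown as an image above, so here is a hypothetical version of the grid search, using the TimeSeriesSplit folds as cross-validation:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical grid; the article's actual values may differ
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
}

grid = GridSearchCV(DecisionTreeRegressor(random_state=42),
                    param_grid,
                    cv=tscv,                            # the TimeSeriesSplit folds
                    scoring='neg_mean_absolute_error')
grid.fit(X_train_std, y_train)
print(grid.best_params_)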

4. RandomForestRegressor + TimeSeriesSplit Validation +more params

4. RandomForestRegressor + TimeSeriesSplit Validation +more params prediction plot by author
4. RandomForestRegressor + TimeSeriesSplit Validation + more params model performance plot by author

RandomForest closed one of the 2 big gaps above, and the error dropped below 1.0. That is good, but we explored further.

5. Xgboost + TimeSeriesSplit Validation + more params

5. Xgboost + TimeSeriesSplit Validation + more params prediction plot by author
5. Xgboost + TimeSeriesSplit Validation + more params model performance plot by author

XGBoost was better than RandomForest in both R-squared and error on valid, and better on the train set as well. That means XGBoost is the best model in this project!
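For completeness, a sketch of the winning model under the same setup; the XGBoost grid below is hypothetical, not the article's exact parameters:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

xgb_grid = {
    'n_estimators': [100, 300],
    'max_depth': [3, 5],
    'learning_rate': [0.05, 0.1],
}
xgb_search = GridSearchCV(XGBRegressor(random_state=42),
                          xgb_grid,
                          cv=tscv,
                          scoring='neg_mean_absolute_error')
xgb_search.fit(X_train_std, y_train)
best_xgb = xgb_search.best_estimator_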

Result summary for each model, focusing on the valid set only:

  • model_1: R-squared = -0.218 | MAE = 1.372
  • model_2: R-squared = -0.397 | MAE = 1.527
  • model_3: R-squared = 0.149 | MAE = 1.126
  • model_4: R-squared = 0.248 | MAE = 0.947
  • model_5: R-squared = 0.337 | MAE = 0.866 ~ (Best Model) ><

Feature Importance

The next step is to get the feature importance scores (.feature_importances_) from XGBoost, our best model.
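A small sketch of pairing the importance scores with the feature names; best_xgb and X_train are the assumed names carried over from the earlier sketches:

import pandas as pd

feature_cols = X_train.columns    # feature names before scaling
importances = pd.Series(best_xgb.feature_importances_, index=feature_cols)
importances.sort_values().plot(kind='barh', title='XGBoost feature importance')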

pro_RSP and pro_2for were our top 2 features, as we assumed from their correlations far above. Our 2 main features of interest, sales_in and stores, scored low, which also matches our assumption.

Anyway, it makes sense. Lastly, we will take the best model, predict on the unseen data (test set), and measure model performance with the same metrics.

6. Predict on Unseen data (test set)

unseen data by author

We have 10 records ready for the final test of our best model, XGBoost, but first of all we need to put the data through the same process as the train data. The complete test set is shown below.

complete unseen data by author

Next, the preprocessing step to get the data ready to predict.

unseen data preprocessing by author
unseen data prediction by author
unseen data model performance by author
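A sketch of pushing the unseen test set through the same pipeline; the column names, the fitted std_scaler, feature_cols, and best_xgb are assumptions carried over from the earlier sketches:

from sklearn.metrics import r2_score, mean_absolute_error

# Same feature engineering as the train data
df_test['date'] = pd.to_datetime(df_test['date'])
df_test['year'] = df_test['date'].dt.year
df_test['month'] = df_test['date'].dt.month
df_test['week_of_year'] = df_test['date'].dt.isocalendar().week.astype(int)
df_test = pd.get_dummies(df_test, columns=['pro'])

# Align columns with the training features, then reuse the fitted scaler
X_test = df_test.reindex(columns=feature_cols, fill_value=0)
X_test_std = std_scaler.transform(X_test)

y_pred = best_xgb.predict(X_test_std)
print(f"R2 = {r2_score(df_test['offtake'], y_pred):.3f} | "
      f"MAE = {mean_absolute_error(df_test['offtake'], y_pred):.3f}")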

By eye we can see no big gap between the test data and the predictions, and by score it shows R-squared = 0.635 | MAE = 0.575 on the unseen (test) data. That is good, and better than the train and valid results of model_5.

Conclusion

  • We explored multivariate data (not only a single series): how to predict offtake from sales_in, stores, and promotion data in the retail business domain.
  • Correlation is important to roughly see and test the questions, and to set assumptions based on our domain experience with the data.
  • Validation is important for any model.
  • Tree-based regressions are useful and flexible to use.

Next!! In this project the dataset was very small. To make the model more powerful, more training data is needed (at least 1,000 train records : 100 test). Anyway, these predictions were already used in my real task. That was enough for me from this project ^^

Lastly, I have an idea for the next project: combine and convert the promotion price into a class label, 1 (promotion) or 0 (no promotion), then build a classification model to classify which class each data record belongs to.

Hope to see you in the next project, Classification on Price Promotion Data.

>> full code here https://github.com/MossMojito/Regression

Thank you for reading :)
