Data Analysis on Kroger Inventory Adjustments

A 30-minute invesigation of a month of inventory adjustments

Trachyphyllia

About

The data set examined below contains inventory accuracy records from a number of Kroger Grocery stores. The ‘adjustment unit quantity’ is the unit variance that 3rd party auditors confirmed upon visiting a store, e.g., the first example below (-1) means that the auditing company found one less bowl of tropical fruit than the stores perpetual inventory had on record. The subsequent example (+14) means that the auditors counted fourteen more peach fruit bowls than the stores perpetual inventory.

The objective was to work the data set in R and accomplish the following within 30 minutes:

  • 3 ‘actionable’ insights on inventory accuracy (by store, commodity, variance amounts, etc..)
  • Plots with supporting insights that visualize your story in ggplot
  • Documentation of your R script details

While a fairly straightforward task it felt worthwhile to document this sort of assessment.

The file is an .xlsb file which, when looking up, seemed notorious for being difficult to work with. Most recommendations I saw at the time were to utilize a local-server type setup so it could be accessed with SQL or using the excel.link package. As of this post there seems to have been a package specifically created on 04-13-2020 dedicated to solving this issue in R in a more elegant fashion here. When I had done this write-up this package was unfortunately unavailable, and juggling the file type wasn’t a good use of time.

However, mostly due to not being constrained by memory, my solution was to simply do a manual conversion from .xlsb to .csv with excel itself and was able to continue without any issue.

library(tidyverse)
library(ggplot2)

Exploratory Data Analysis

A quick examiniation of the distribution of the AUQ values:

summary(kroger$`Adjustment Unit Quantity`)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -26384.000     -1.000      0.000      0.074      1.000  11268.000

The minimum and maximum shows some clear signs of there being outliers present in the data when considering the quartiles.

The middle 50% of adjustments being between 1 and -1 shows good signs of inventory following forecasted demand. Furthermore 90% of adjustments are between -5 and 5, and 95% between -10 and 10, and 99% between -30 and 30.

A histogram to help visualize the distribution:

fig.market <- kroger %>%
  ggplot(mapping = aes(x = `Adjustment Unit Quantity`,
                       y = (..count..)/sum(..count..)))
fig.market + 
  geom_histogram(bins = 20000,
                 fill = 'gray',
                 color = 'black') + 
  coord_cartesian(xlim = c(-50,50)) +
  ylab("Frequency (%)") +
  ggtitle("Histogram of overall AUQ")

A quick overview of the extreme AUQ values, both positive and negative:

kroger %>%
  arrange(desc(`Adjustment Unit Quantity`))
## # A tibble: 892,955 x 5
##    `Store #` `Commodity Name` `Base GTIN Numb~ `Base GTIN Desc~ `Adjustment Uni~
##    <chr>     <chr>            <chr>            <chr>                       <dbl>
##  1 00907     202 CANDY - PAC~ 0003400000475    REESES LOVR EGGS            11268
##  2 00934     202 CANDY - PAC~ 0001206594021    PAPAS MILK CHOC~             6826
##  3 00921     202 CANDY - PAC~ 0003400000475    REESES LOVR EGGS             4826
##  4 00935     202 CANDY - PAC~ 0003400000475    REESES LOVR EGGS             4320
##  5 00935     202 CANDY - PAC~ 0001206594010    PAPAS OPERA CRE~             3730
##  6 00929     842 GAMES        0019141855264    JUSTCE LG 3-W S~             2648
##  7 00431     202 CANDY - PAC~ 0001206594012    PAPAS CHERRY CR~             2588
##  8 00907     606 TROPICAL FR~ 0000000004046    AVOCADOS HASS S~             2565
##  9 00412     202 CANDY - PAC~ 0003400000475    REESES LOVR EGGS             2487
## 10 00431     202 CANDY - PAC~ 0001206594011    PAPAS FRENCH CR~             2268
## # ... with 892,945 more rows
kroger %>%
  arrange(`Adjustment Unit Quantity`)
## # A tibble: 892,955 x 5
##    `Store #` `Commodity Name` `Base GTIN Numb~ `Base GTIN Desc~ `Adjustment Uni~
##    <chr>     <chr>            <chr>            <chr>                       <dbl>
##  1 00420     973 BASIC GIRL   0088796136848    BRB CAREER SPOR~           -26384
##  2 00907     790 KITCHEN      0004759656176    BRD BRUSSELS MO~           -20002
##  3 00907     121 CIGARETTES   0001100000494    PYRMD GOLD KG M~           -17821
##  4 00447     923 BABY BASICS  0088530656052    3P FOL TG TRNNG~            -8847
##  5 00405     154 BROTH/DRY S~ 0002970001103    IDHN SH LOADED ~            -8513
##  6 00335     013 SPICES & EX~ 0001111090341    PRSL FINE SEA S~            -7884
##  7 00418     084 DRY NOODLES~ 0001111089916    HEMIS LUMACONI              -7884
##  8 00705     134 CROUTONS/BR~ 0007054000000    ANGO ITAL STYL ~            -7551
##  9 00421     114 PAPER HOUSE~ 0001111089218    KRO HMS COLOR T~            -7419
## 10 00420     098 EUROPEAN FO~ 0073449287005    LYLES GOLDEN SY~            -7293
## # ... with 892,945 more rows

An immediate observation are some commodities in need of drastic adjustments to their inventory quantities for future orders. These being:

rbind(kroger %>% arrange(desc(`Adjustment Unit Quantity`)) %>% head(1),
      kroger %>% arrange(`Adjustment Unit Quantity`) %>% head(3))
## # A tibble: 4 x 5
##   `Store #` `Commodity Name`  `Base GTIN Numb~ `Base GTIN Desc~ `Adjustment Uni~
##   <chr>     <chr>             <chr>            <chr>                       <dbl>
## 1 00907     202 CANDY - PACK~ 0003400000475    REESES LOVR EGGS            11268
## 2 00420     973 BASIC GIRL    0088796136848    BRB CAREER SPOR~           -26384
## 3 00907     790 KITCHEN       0004759656176    BRD BRUSSELS MO~           -20002
## 4 00907     121 CIGARETTES    0001100000494    PYRMD GOLD KG M~           -17821

Another observation worth noting is three of these four extreme adjustments all came from Store #00907.

Taking a broad look at average adjustment per commodity category we can see Tomatoes and Melons are in excess demand overall, whereas Tropical Fruit are at a lower demand overall.

# Sorting Average AUG per commodity from increasing to decreasing
kroger %>% group_by(`Commodity Name`) %>%
  summarise(avgAUQ = mean(`Adjustment Unit Quantity`)) %>% arrange(avgAUQ)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 542 x 2
##    `Commodity Name`             avgAUQ
##    <chr>                         <dbl>
##  1 617 TOMATOES                -1805  
##  2 607 MELONS                  -1112  
##  3 109 WATER CONDITIONING SALT  -103. 
##  4 923 BABY BASICS               -53.9
##  5 435 SEAFOOD - SHRIMP          -46.5
##  6 433 SEAFOOD - CRAB            -43.5
##  7 817 FERTILIZERS               -27.7
##  8 973 BASIC GIRL                -27.2
##  9 098 EUROPEAN FOODS            -26.3
## 10 885 POWER TOOL/ACCSSR         -20.4
## # ... with 532 more rows
# Sorting Average AUG per commodity from decreasing to increasing
kroger %>% group_by(`Commodity Name`) %>%
  summarise(avgAUQ = mean(`Adjustment Unit Quantity`)) %>% arrange(desc(avgAUQ))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 542 x 2
##    `Commodity Name`             avgAUQ
##    <chr>                         <dbl>
##  1 606 TROPICAL FRUIT           824   
##  2 236 BOOKSTORE                 69.5 
##  3 627 NF BULK FOOD              33.3 
##  4 804 SUNGLASSES                23.8 
##  5 751 FIREWOOD/FIRESTARTS       18.3 
##  6 XXX NO COMMODITY DESCRIPTION  18   
##  7 836 BACKPACKS/DUFFELS/TOTES   16.3 
##  8 826 SOFT DECOR                 7.17
##  9 841 CHRISTMAS                  6.82
## 10 830 SEASONAL ACCESSORIES       5.63
## # ... with 532 more rows
# Creating a list of commodity names
kroger$`Commodity Name` %>% unique() %>% length() -> commodnames

Taking a broader examiniation of the spread of AUQ across stores we can clearly identify that some stores are heavier in adjustments:

# Boxplot of AUQ per store
fig.store <- kroger %>% 
  ggplot(aes(x = `Store #`, y = `Adjustment Unit Quantity`)) + 
  ggtitle("Boxplot of AUQ by Store")

fig.store.boxplot <- fig.store + 
  geom_boxplot()

fig.store.boxplot + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

By focusing on which stores have AUQ’s beyond a threshold of units in either direction we could narrow down some forecasting adjustments. Taking a progessively closer examination of the distribution of adjustments around zero we can identify if any stores lean in a particular direction:

fig.store.boxplot + coord_cartesian(ylim = c(-5000, 5000)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + 
  ggtitle("Boxplot of AUQ by Store")

# Boxplot of AUQ per store - zoomed in
fig.store.boxplot + coord_cartesian(ylim = c(-10, 10)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + 
  scale_y_continuous(breaks = seq(-10,10,1)) + ggtitle("Boxplot of AUQ by Store")

From the above graph we can clearly identify store 00384 and 00929 had more under-adjustments and 00390, 00420, 00933 had more over-adjustments.

Actionable Insights

1.) Current forecasts keep 90% of inventory within \(\pm5\) of expected demand, and 99% of inventory within \(\pm30\) of expected demand.

2.) Four identifiable goods that had extreme adjustments and in need of modificaitons for future predictions.

3.) Tomatoes and Melons are in excess demand, whereas Tropical Fruit has fallen in demand.

4.) Five identified stores in need of more large-scale, yet minor, adjustments towards inventory to maintain the middle 50% of inventory being within \(\pm1\) AUQ.

Related