# Data Analysis on Kroger Inventory Adjustments

A 30-minute invesigation of a month of inventory adjustments

Trachyphyllia

The data set examined below contains inventory accuracy records from a number of Kroger Grocery stores. The ‘adjustment unit quantity’ is the unit variance that 3rd party auditors confirmed upon visiting a store, e.g., the first example below (-1) means that the auditing company found one less bowl of tropical fruit than the stores perpetual inventory had on record. The subsequent example (+14) means that the auditors counted fourteen more peach fruit bowls than the stores perpetual inventory.

The objective was to work the data set in R and accomplish the following within 30 minutes:

• 3 ‘actionable’ insights on inventory accuracy (by store, commodity, variance amounts, etc..)
• Plots with supporting insights that visualize your story in ggplot
• Documentation of your R script details

While a fairly straightforward task it felt worthwhile to document this sort of assessment.

The file is an .xlsb file which, when looking up, seemed notorious for being difficult to work with. Most recommendations I saw at the time were to utilize a local-server type setup so it could be accessed with SQL or using the excel.link package. As of this post there seems to have been a package specifically created on 04-13-2020 dedicated to solving this issue in R in a more elegant fashion here. When I had done this write-up this package was unfortunately unavailable, and juggling the file type wasn’t a good use of time.

However, mostly due to not being constrained by memory, my solution was to simply do a manual conversion from .xlsb to .csv with excel itself and was able to continue without any issue.

library(tidyverse)
library(ggplot2)

# Exploratory Data Analysis

A quick examiniation of the distribution of the AUQ values:

summary(kroger$Adjustment Unit Quantity) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -26384.000 -1.000 0.000 0.074 1.000 11268.000 The minimum and maximum shows some clear signs of there being outliers present in the data when considering the quartiles. The middle 50% of adjustments being between 1 and -1 shows good signs of inventory following forecasted demand. Furthermore 90% of adjustments are between -5 and 5, and 95% between -10 and 10, and 99% between -30 and 30. A histogram to help visualize the distribution: fig.market <- kroger %>% ggplot(mapping = aes(x = Adjustment Unit Quantity, y = (..count..)/sum(..count..))) fig.market + geom_histogram(bins = 20000, fill = 'gray', color = 'black') + coord_cartesian(xlim = c(-50,50)) + ylab("Frequency (%)") + ggtitle("Histogram of overall AUQ") A quick overview of the extreme AUQ values, both positive and negative: kroger %>% arrange(desc(Adjustment Unit Quantity)) ## # A tibble: 892,955 x 5 ## Store # Commodity Name Base GTIN Numb~ Base GTIN Desc~ Adjustment Uni~ ## <chr> <chr> <chr> <chr> <dbl> ## 1 00907 202 CANDY - PAC~ 0003400000475 REESES LOVR EGGS 11268 ## 2 00934 202 CANDY - PAC~ 0001206594021 PAPAS MILK CHOC~ 6826 ## 3 00921 202 CANDY - PAC~ 0003400000475 REESES LOVR EGGS 4826 ## 4 00935 202 CANDY - PAC~ 0003400000475 REESES LOVR EGGS 4320 ## 5 00935 202 CANDY - PAC~ 0001206594010 PAPAS OPERA CRE~ 3730 ## 6 00929 842 GAMES 0019141855264 JUSTCE LG 3-W S~ 2648 ## 7 00431 202 CANDY - PAC~ 0001206594012 PAPAS CHERRY CR~ 2588 ## 8 00907 606 TROPICAL FR~ 0000000004046 AVOCADOS HASS S~ 2565 ## 9 00412 202 CANDY - PAC~ 0003400000475 REESES LOVR EGGS 2487 ## 10 00431 202 CANDY - PAC~ 0001206594011 PAPAS FRENCH CR~ 2268 ## # ... with 892,945 more rows kroger %>% arrange(Adjustment Unit Quantity) ## # A tibble: 892,955 x 5 ## Store # Commodity Name Base GTIN Numb~ Base GTIN Desc~ Adjustment Uni~ ## <chr> <chr> <chr> <chr> <dbl> ## 1 00420 973 BASIC GIRL 0088796136848 BRB CAREER SPOR~ -26384 ## 2 00907 790 KITCHEN 0004759656176 BRD BRUSSELS MO~ -20002 ## 3 00907 121 CIGARETTES 0001100000494 PYRMD GOLD KG M~ -17821 ## 4 00447 923 BABY BASICS 0088530656052 3P FOL TG TRNNG~ -8847 ## 5 00405 154 BROTH/DRY S~ 0002970001103 IDHN SH LOADED ~ -8513 ## 6 00335 013 SPICES & EX~ 0001111090341 PRSL FINE SEA S~ -7884 ## 7 00418 084 DRY NOODLES~ 0001111089916 HEMIS LUMACONI -7884 ## 8 00705 134 CROUTONS/BR~ 0007054000000 ANGO ITAL STYL ~ -7551 ## 9 00421 114 PAPER HOUSE~ 0001111089218 KRO HMS COLOR T~ -7419 ## 10 00420 098 EUROPEAN FO~ 0073449287005 LYLES GOLDEN SY~ -7293 ## # ... with 892,945 more rows An immediate observation are some commodities in need of drastic adjustments to their inventory quantities for future orders. These being: rbind(kroger %>% arrange(desc(Adjustment Unit Quantity)) %>% head(1), kroger %>% arrange(Adjustment Unit Quantity) %>% head(3)) ## # A tibble: 4 x 5 ## Store # Commodity Name Base GTIN Numb~ Base GTIN Desc~ Adjustment Uni~ ## <chr> <chr> <chr> <chr> <dbl> ## 1 00907 202 CANDY - PACK~ 0003400000475 REESES LOVR EGGS 11268 ## 2 00420 973 BASIC GIRL 0088796136848 BRB CAREER SPOR~ -26384 ## 3 00907 790 KITCHEN 0004759656176 BRD BRUSSELS MO~ -20002 ## 4 00907 121 CIGARETTES 0001100000494 PYRMD GOLD KG M~ -17821 Another observation worth noting is three of these four extreme adjustments all came from Store #00907. Taking a broad look at average adjustment per commodity category we can see Tomatoes and Melons are in excess demand overall, whereas Tropical Fruit are at a lower demand overall. # Sorting Average AUG per commodity from increasing to decreasing kroger %>% group_by(Commodity Name) %>% summarise(avgAUQ = mean(Adjustment Unit Quantity)) %>% arrange(avgAUQ) ## summarise() ungrouping output (override with .groups argument) ## # A tibble: 542 x 2 ## Commodity Name avgAUQ ## <chr> <dbl> ## 1 617 TOMATOES -1805 ## 2 607 MELONS -1112 ## 3 109 WATER CONDITIONING SALT -103. ## 4 923 BABY BASICS -53.9 ## 5 435 SEAFOOD - SHRIMP -46.5 ## 6 433 SEAFOOD - CRAB -43.5 ## 7 817 FERTILIZERS -27.7 ## 8 973 BASIC GIRL -27.2 ## 9 098 EUROPEAN FOODS -26.3 ## 10 885 POWER TOOL/ACCSSR -20.4 ## # ... with 532 more rows # Sorting Average AUG per commodity from decreasing to increasing kroger %>% group_by(Commodity Name) %>% summarise(avgAUQ = mean(Adjustment Unit Quantity)) %>% arrange(desc(avgAUQ)) ## summarise() ungrouping output (override with .groups argument) ## # A tibble: 542 x 2 ## Commodity Name avgAUQ ## <chr> <dbl> ## 1 606 TROPICAL FRUIT 824 ## 2 236 BOOKSTORE 69.5 ## 3 627 NF BULK FOOD 33.3 ## 4 804 SUNGLASSES 23.8 ## 5 751 FIREWOOD/FIRESTARTS 18.3 ## 6 XXX NO COMMODITY DESCRIPTION 18 ## 7 836 BACKPACKS/DUFFELS/TOTES 16.3 ## 8 826 SOFT DECOR 7.17 ## 9 841 CHRISTMAS 6.82 ## 10 830 SEASONAL ACCESSORIES 5.63 ## # ... with 532 more rows # Creating a list of commodity names kroger$Commodity Name %>% unique() %>% length() -> commodnames

Taking a broader examiniation of the spread of AUQ across stores we can clearly identify that some stores are heavier in adjustments:

# Boxplot of AUQ per store
fig.store <- kroger %>%
ggplot(aes(x = Store #, y = Adjustment Unit Quantity)) +
ggtitle("Boxplot of AUQ by Store")

fig.store.boxplot <- fig.store +
geom_boxplot()

fig.store.boxplot +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

By focusing on which stores have AUQ’s beyond a threshold of units in either direction we could narrow down some forecasting adjustments. Taking a progessively closer examination of the distribution of adjustments around zero we can identify if any stores lean in a particular direction:

fig.store.boxplot + coord_cartesian(ylim = c(-5000, 5000)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
ggtitle("Boxplot of AUQ by Store")

# Boxplot of AUQ per store - zoomed in
fig.store.boxplot + coord_cartesian(ylim = c(-10, 10)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
scale_y_continuous(breaks = seq(-10,10,1)) + ggtitle("Boxplot of AUQ by Store")`

# Actionable Insights

1.) Current forecasts keep 90% of inventory within $$\pm5$$ of expected demand, and 99% of inventory within $$\pm30$$ of expected demand.

2.) Four identifiable goods that had extreme adjustments and in need of modificaitons for future predictions.

3.) Tomatoes and Melons are in excess demand, whereas Tropical Fruit has fallen in demand.

4.) Five identified stores in need of more large-scale, yet minor, adjustments towards inventory to maintain the middle 50% of inventory being within $$\pm1$$ AUQ.