Neural Network Image Classification

Using keras in R on MNIST-style data sets

Keras and beyond

Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. Originally written in Python, the R package keras is one of a family of R interfaces to TensorFlow.

TensorFlow is an open-source machine learning framework from Google, and Keras ships with it as its high-level neural network API.

The goal of this post is to work through some basic examples on MNIST-like data sets as an introduction to applying neural network classification techniques. For the sake of brevity, the post is light on details, with the intention of being revisited.

Here’s a quick list of the packages we’ll be using:

library(keras)
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(imager)

MNIST

The MNIST database is composed of handwritten digits and is a common “Hello, World!” data set for neural networks.

mnist <- dataset_mnist()
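
The object we get back is simply a list of train/test splits, each holding an array x of images alongside a vector y of integer labels. A quick sanity check on the shapes:

# x is a 3D array (image, row, column); y holds the digit labels 0-9
dim(mnist$train$x)
## [1] 60000    28    28
dim(mnist$test$x)
## [1] 10000    28    28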

Taking a look at an example of the images:

# Plotting helper; mnist is global var
mnistplot <- function(picked){
  mnist$train$x[picked,,] %>% 
  # Wrangle
  as_tibble(.name_repair = ~ paste0("V", seq_along(.x))) %>% 
  rowid_to_column("Y") %>% 
  pivot_longer(-Y,names_to = "X", values_to = "PIX") %>% 
  mutate(X = as.numeric(gsub("V","",X))) %>%
  mutate(Y = max(Y) - Y + 1) %>% # Flip so row 1 draws at the top
  # Graph
  ggplot(aes(X,Y, fill = PIX)) + 
  geom_tile() + 
  scale_fill_gradient(low = "white", high = "black", guide = "none") +
  ggtitle(paste("Handwritten number:",mnist$train$y[picked])) +
  theme_void()
}
# Grid graph
pickcount <- 9
mnistex <- lapply(1:pickcount, mnistplot)
mnistexplot <- arrangeGrob(grobs = mnistex,
                           ncol = 3,
                           nrow = 3)
plot(mnistexplot)

So now that we’ve gotten an idea of what it is we’re trying to predict, let’s get started.

The first step is to normalize the pixel values from the 0-255 range down to 0-1, which helps with training time.

mnist_train <- mnist$train
mnist_test  <- mnist$test
# Normalization; Speeds training time
mnist_train$x <- mnist_train$x/255
mnist_test$x  <- mnist_test$x/255

Now we’re ready to put together the actual model. I’ll be using a sequential model, as it’s the easiest to work with in Keras. It gets its name from the way we build the model layer by layer, in sequence.

mnist_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28,28)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10, activation = "softmax")

Here we can look at a summary of the model we’ve put together detailing the parameters and layers.

summary(mnist_model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## flatten (Flatten)                   (None, 784)                     0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 128)                     100480      
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 128)                     0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 10)                      1290        
## ================================================================================
## Total params: 101,770
## Trainable params: 101,770
## Non-trainable params: 0
## ________________________________________________________________________________

From here we need to compile our model. Compiling is where we define the loss function to be optimized and the optimizer that will be used to minimize it.

mnist_model %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )

Oddly enough, when building or compiling with Keras we’re actually modifying our model in place. That’s unusual behavior for R and easy to forget.
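
As a small illustration, consider a throwaway model (hypothetical, not used elsewhere in this post): piping a layer onto it returns the very same object rather than a modified copy.

# Throwaway model purely to demonstrate modify-in-place behavior
m1 <- keras_model_sequential()
m2 <- m1 %>% layer_dense(units = 1, input_shape = c(4))
# m1 gained the layer too; m1 and m2 reference the same underlying model
identical(m1, m2)
## [1] TRUE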

Our next step is to fit the model to our data. Here we set the validation split and the number of epochs, then let Keras train a model to use for predictions.

mnist_model %>%
  fit(
    x = mnist_train$x, y = mnist_train$y,
    epochs = 5,
    validation_split = 0.3,
    verbose = 1
  )
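
As an aside, fit() invisibly returns a per-epoch history object. Had we captured it, plot() would draw the loss and accuracy curves for the training and validation splits. A sketch (note that calling fit() again would continue training the same model):

# Capturing fit()'s return value gives per-epoch metrics
mnist_history <- mnist_model %>%
  fit(
    x = mnist_train$x, y = mnist_train$y,
    epochs = 5,
    validation_split = 0.3,
    verbose = 1
  )
# keras provides a plot method for training histories
plot(mnist_history)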

Once our model is trained we can test it out on the testing data we withheld at the very beginning:

mnist_predictions <- predict(mnist_model, mnist_test$x)
mnist_model %>% 
  evaluate(mnist_test$x, mnist_test$y, verbose = 0)
## $loss
## [1] 0.08473677
## 
## $acc
## [1] 0.9733
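
The mnist_predictions object we created is a matrix with one row per test image and ten columns of class probabilities. A quick sketch of turning those into hard labels and cross-tabulating against the truth:

# Predicted digit = column with the highest probability, minus 1
# (R indexes from 1 but the labels run 0-9)
mnist_pred_labels <- apply(mnist_predictions, 1, which.max) - 1
# Confusion matrix of predicted vs. actual digits
table(predicted = mnist_pred_labels, actual = mnist_test$y)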

To hit around 97% accuracy with so little effort is pretty impressive. The MNIST data set is pretty boring, however, so let’s try something harder.

Fashion MNIST

Fashion-MNIST is a dataset of images consisting of a training set of 60,000 images and a test set of 10,000 images. Each is a 28×28 grayscale image associated with a label from one of 10 categories.

fashion <- dataset_fashion_mnist()
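
The labels themselves are just the integers 0-9; per the Fashion-MNIST documentation they correspond to the class names below, which we can keep in a lookup vector (label 0 maps to index 1 in R):

# Class names for labels 0-9, per the Fashion-MNIST documentation
fashion_labels <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")
fashion_labels[fashion$train$y[1] + 1]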

Taking a look at an example of the images:

# Plotting helper; fashion is global var
fashionplot <- function(picked){
  fashion$train$x[picked,,] %>% 
  # Wrangle
  as_tibble(.name_repair = ~ paste0("V", seq_along(.x))) %>% 
  rowid_to_column("Y") %>% 
  pivot_longer(-Y,names_to = "X", values_to = "PIX") %>% 
  mutate(X = as.numeric(gsub("V","",X))) %>%
  mutate(Y = max(Y) - Y + 1) %>% # Flip so row 1 draws at the top
  # Graph
  ggplot(aes(X,Y, fill = PIX)) + 
  geom_tile() + 
  scale_fill_gradient(low = "white", high = "black", guide = "none") +
  ggtitle(paste("Clothing Category:",fashion$train$y[picked])) +
  theme_void()
}
# Grid graph
pickcount <- 9
fashionex <- lapply(1:pickcount, fashionplot)
fashionexplot <- arrangeGrob(grobs = fashionex,
                           ncol = 3,
                           nrow = 3)
plot(fashionexplot)

fashion_train <- fashion$train
fashion_test  <- fashion$test
# Normalization; Speeds training time
fashion_train$x <- fashion_train$x/255
fashion_test$x  <- fashion_test$x/255

# Same architecture as before: flatten, dense, dropout, softmax output
fashion_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28,28)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10, activation = "softmax")

fashion_model %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )

fashion_model %>%
  fit(
    x = fashion_train$x, y = fashion_train$y,
    epochs = 5,
    validation_split = 0.3,
    verbose = 1
  )

fashion_predictions <- predict(fashion_model, fashion_test$x)

fashion_model %>% 
  evaluate(fashion_test$x, fashion_test$y, verbose = 0)
## $loss
## [1] 0.371763
## 
## $acc
## [1] 0.8702

It’s worth pointing out that handling the Fashion-MNIST dataset required almost no change in setup from the MNIST dataset, and we still achieved 85%+ prediction accuracy.

CIFAR-10

The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 categories, with 6,000 images per category.

cifar10 <- dataset_cifar10()
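
As with Fashion-MNIST, the labels are the integers 0-9; the standard CIFAR-10 category names are:

# Class names for labels 0-9, per the CIFAR-10 documentation
cifar10_labels <- c("airplane", "automobile", "bird", "cat", "deer",
                    "dog", "frog", "horse", "ship", "truck")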

Taking a look at CIFAR-10 examples is a little trickier, as each image is a three-dimensional array (height × width × color channel). For this we’ll just use the imager library:

par(mfrow = c(2,2))
for(i in 1:4){
  plot(cifar10$train$x[i,,,] %>%
         as.cimg() %>%
         imrotate(angle = 90),
       main = cifar10$train$y[i],
       axes = FALSE)
}

cifar10_test  <- cifar10$test
cifar10_train <- cifar10$train

# Normalization; Speeds training time
cifar10_train$x <- cifar10_train$x/255
cifar10_test$x  <- cifar10_test$x/255

As a point of comparison: what if we treated this just like MNIST and Fashion-MNIST and even tried dropping the color dimension? Here we’ll simply keep the red channel (a true grayscale alternative is sketched after the plots below).

cifar10_bw_train <- cifar10_train
cifar10_bw_test  <- cifar10_test
# Keep only the first (red) channel, collapsing 32x32x3 down to 32x32
cifar10_bw_test$x  <- cifar10_bw_test$x[,,,1]
cifar10_bw_train$x <- cifar10_bw_train$x[,,,1]
par(mfrow = c(2,2))
for(i in 1:4){
  plot(cifar10_bw_train$x[i,,] %>%
         as.cimg() %>%
         imrotate(angle = 90),
       main = cifar10_bw_train$y[i],
       axes = FALSE)
}
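
Indexing with [,,,1] above keeps only the red channel rather than truly converting to grayscale. A proper grayscale alternative (not used in what follows) would weight the three channels with the standard luma coefficients:

# Alternative sketch, not used below: true grayscale via luma weights
cifar10_gray_train <- 0.299 * cifar10_train$x[,,,1] +
                      0.587 * cifar10_train$x[,,,2] +
                      0.114 * cifar10_train$x[,,,3]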

cifar10_bw_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(32,32)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10, activation = "softmax")

cifar10_bw_model %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )

cifar10_bw_model %>%
  fit(
    x = cifar10_bw_train$x, y = cifar10_bw_train$y,
    epochs = 5,
    validation_split = 0.3,
    verbose = 1
  )

cifar10_bw_predictions <- predict(cifar10_bw_model, cifar10_bw_test$x)

cifar10_bw_model %>% 
  evaluate(cifar10_bw_test$x, cifar10_bw_test$y, verbose = 0)
## $loss
## [1] 1.957393
## 
## $acc
## [1] 0.2944
cifar10_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(32,32,3)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10, activation = "softmax")

cifar10_model %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )

cifar10_model %>%
  fit(
    x = cifar10_train$x, y = cifar10_train$y,
    epochs = 5,
    validation_split = 0.3,
    verbose = 1
  )

cifar10_predictions <- predict(cifar10_model, cifar10_test$x)

cifar10_model %>% 
  evaluate(cifar10_test$x, cifar10_test$y, verbose = 0)
## $loss
## [1] 1.78544
## 
## $acc
## [1] 0.3588

Unsurprisingly, a more involved model is needed for CIFAR-10. Training such models is a little more intensive than I want to host on my current hardware, so this is as far as I intend to take this post. It is interesting that including color didn’t have as drastic an effect as one might expect. Regardless, hitting 30%+ accuracy in seconds of training is still noteworthy considering the CIFAR-10 images are much less uniform in their characteristics.
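
For reference, a “more involved model” here would typically mean convolutions. A minimal, untuned sketch of what that might look like (defined but not trained in this post; the filter counts and layer sizes are arbitrary):

# Illustrative only: a small convolutional network for CIFAR-10
cifar10_cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = "relu",
                input_shape = c(32,32,3)) %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")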
