Representing inter-relationships in data with Chord diagrams in R

Chord Diagrams: Gotta Catch 'Em All!

Acanthophyllia
library(circlize)
library(chorddiag)
library(tidyverse)

I’ve noticed chord diagrams growing in popularity recently, mostly among people using Python or Flourish. Taking this as inspiration, I wanted to do the same in R.

The circlize package is fairly straightforward, especially with its detailed documentation, but I often prefer the interactivity other methods offer. Chord diagrams quickly become crowded with information, and being able to browse individual chords goes a long way toward an effective presentation. Plotly is another alternative, but it requires a fair amount of effort. I also recently learned that Bokeh can produce these.

For the showcase data I’ll stick with a Pokemon dataset that I’ve seen used for various visualizations. The dataset in question is here, but another equally interesting dataset is here.

The goal is to demonstrate the relationship (flow) between each Pokemon’s primary and secondary types.

library(circlize)
library(chorddiag)
# The following libraries are used for
# quickly producing an adjacency matrix
library(igraph)
library(tidygraph)
# Quick cleaning of the data
# (pokemon_data is assumed to be already loaded)
# Removing Megas and Forme duplications
pokemon <- 
  pokemon_data %>% 
  filter(str_detect(Name,"Mega ",
                    negate = TRUE)) %>%
  filter(!duplicated(`#`))

# To quickly plot in circlize you need
# to form the data into a frequency table
pokecount <-
  pokemon %>%
  count(`Type 1`,`Type 2`, sort = TRUE) %>%
  # Pokemon with a missing Type 2 are
  # linked back to their own Type 1
  mutate(`Type 2` = coalesce(`Type 2`,`Type 1`))

circlize::chordDiagram(pokecount)
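
To make the counting step above more concrete, here is a minimal sketch on a hand-built table. The four rows and the names toy_pokemon and toy_count are only for illustration, not the real dataset. It shows how count() plus coalesce() turns single-type Pokemon into self-links, giving the three-column from/to/value shape that chordDiagram() accepts.

```r
library(dplyr)

# Illustrative rows only, using the same column names as the dataset
toy_pokemon <- tibble(
  Name     = c("Squirtle", "Gyarados", "Psyduck", "Charizard"),
  `Type 1` = c("Water", "Water", "Water", "Fire"),
  `Type 2` = c(NA, "Flying", NA, "Flying")
)

toy_count <-
  toy_pokemon %>%
  # One row per (Type 1, Type 2) pair, most frequent first
  count(`Type 1`, `Type 2`, sort = TRUE) %>%
  # A missing Type 2 becomes a link from the type to itself
  mutate(`Type 2` = coalesce(`Type 2`, `Type 1`))

print(toy_count)
```

The two pure Water rows collapse into a single Water to Water link with n = 2, which is why the real diagram shows chords that loop back onto their own sector.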

Looking at the above, I think the “too much information” problem is demonstrated fairly well.

However, I did find a package to accomplish this that I’ve become rather fond of: chorddiag.

The chorddiag package isn’t hosted on CRAN, but you can find it on GitHub here. The gist of chorddiag is that it allows you to create interactive chord diagrams by way of the JavaScript visualization library D3 (http://d3js.org). It works from within R using the htmlwidgets interfacing framework.
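
Since the package is not on CRAN, installation goes through GitHub. A minimal sketch, assuming the remotes package is acceptable for this (devtools::install_github() works the same way) and that the package still lives in the mattflor/chorddiag repository:

```r
# Install chorddiag from GitHub only if it is not already available.
# Assumes the repository is mattflor/chorddiag.
if (!requireNamespace("chorddiag", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("mattflor/chorddiag")
}
```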

To use the default plotting function we will need an adjacency matrix rather than a frequency table. I did this the manual way at first:

pokecount_mat <- 
  pokecount %>%
  pivot_wider(names_from = 'Type 2',
              values_from = n) %>%
  replace(is.na(.),0)

# Reorder the columns to match the row order so the
# matrix is square with aligned row and column names
pokecount_mat <- 
  pokecount_mat[c("Type 1", pokecount_mat$`Type 1`)] %>%
  column_to_rownames("Type 1") %>%
  as.matrix()


chorddiag::chorddiag(pokecount_mat)
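
If the reshaping above feels opaque, the same idea can be sketched in base R on a toy table. The data and the names toy, types and toy_mat are made up for illustration. The point is that an adjacency matrix here is just a square matrix with the same types on rows and columns, where each cell holds the count for that (row, column) pair.

```r
# Made-up frequency table in the same from/to/count shape as pokecount
toy <- data.frame(
  type1 = c("Water", "Water", "Fire", "Grass"),
  type2 = c("Water", "Flying", "Flying", "Poison"),
  n     = c(5, 2, 1, 3)
)

# Every type that appears on either side gets a row and a column
types <- sort(union(toy$type1, toy$type2))

# Start from an all-zero square matrix with matching dimnames
toy_mat <- matrix(0,
                  nrow = length(types),
                  ncol = length(types),
                  dimnames = list(types, types))

# Fill cells by (row name, column name) using a two-column character index
toy_mat[cbind(toy$type1, toy$type2)] <- toy$n

print(toy_mat)

# With chorddiag installed, this matrix could be plotted with:
# chorddiag::chorddiag(toy_mat)
```

Building from the union of both columns also covers a type that only ever shows up as a secondary type, which the column-reordering approach would silently drop.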

I later found a more automated way using igraph and tidygraph:

# Using igraph and tidygraph to reshape
pokecount %>%
  tidygraph::as_tbl_graph() %>%
  igraph::as_adjacency_matrix(attr = "n") %>%
  # chorddiag requires a base matrix, not the
  # sparse matrix that igraph returns by default
  as.matrix() %>%
  chorddiag::chorddiag()

Talk about an efficient workflow! Comparing the two, I think it becomes strikingly clear how much interactivity adds to this plot. Of course, interactivity isn’t feasible for every medium, and that is where circlize would likely have a distinct advantage.
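
For those static cases, one way to tame the clutter in circlize is to colour only the links you care about and grey out the rest. This is a rough sketch on made-up data rather than the Pokemon table. The toy data frame and the choice of "Water" as the highlighted type are assumptions for illustration. It relies on chordDiagram() accepting a vector of link colours, one per row, when given a data frame.

```r
library(circlize)

# Made-up from/to/count table standing in for pokecount
toy <- data.frame(
  from = c("Water", "Water", "Fire", "Grass", "Bug"),
  to   = c("Flying", "Ground", "Flying", "Poison", "Flying"),
  n    = c(2, 3, 1, 4, 2)
)

# One colour per link: highlight links leaving "Water", mute the others
link_col <- ifelse(toy$from == "Water", "steelblue", "grey85")

chordDiagram(toy, col = link_col)

# circlize keeps global layout state, so reset it after plotting
circos.clear()
```

Producing one such plot per type of interest is a reasonable stand-in for hovering over chords when the output has to be a static image.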