Representing inter-relationships in data with Chord diagrams in R
Chord Diagrams: Gotta Catch 'Em All!
library(circlize)
library(chorddiag)
library(tidyverse)
I’ve noticed chord diagrams growing in popularity recently, mostly among people using Python or Flourish. Taking this as inspiration, I wanted to do the same in R.
The circlize package is fairly straightforward, especially with its detailed documentation, but I often prefer the interactivity other methods offer. Chord diagrams quickly show too much information, and being able to browse individual chords goes a long way toward an effective presentation. Another alternative is Plotly, but it requires a fair amount of effort. I had also recently learned that Bokeh can produce these.
The goal is to show the relationship (flow) between the primary and secondary Pokémon types.
library(circlize)
library(chorddiag)
# The following libraries are used for
# quickly producing an adjacency matrix
library(igraph)
library(tidygraph)

# Quick cleaning of the data (assumes pokemon_data is already loaded):
# removing Mega evolutions and Forme duplications
pokemon <- pokemon_data %>%
  filter(str_detect(Name, "Mega ", negate = TRUE)) %>%
  filter(!duplicated(`#`))

# To quickly plot in circlize you need
# to form the data into a frequency table
pokecount <- pokemon %>%
  count(`Type 1`, `Type 2`, sort = TRUE) %>%
  # Single-typed Pokémon have no Type 2, so link them back to their own type
  mutate(`Type 2` = coalesce(`Type 2`, `Type 1`))

circlize::chordDiagram(pokecount)
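circlize’s chordDiagram() accepts either an adjacency matrix or a data frame whose first two columns name the linked sectors and whose third column gives the link value, which is why a frequency table is enough here. The sketch below builds the same kind of three-column table in base R from a small, made-up stand-in for pokemon_data; the `toy` data and its `primary`/`secondary` columns are hypothetical, not the real dataset. It includes the step that turns a missing secondary type into a self-link.

```r
# Hypothetical toy data standing in for pokemon_data; NOT the real dataset
toy <- data.frame(
  primary   = c("Grass", "Grass", "Fire", "Fire", "Water", "Water", "Water"),
  secondary = c("Poison", "Poison", NA, "Flying", NA, NA, "Ice"),
  stringsAsFactors = FALSE
)

# Single-typed entries have no secondary type: link them back to their own
# primary type, the same job coalesce() does in the tidyverse version
toy$secondary <- ifelse(is.na(toy$secondary), toy$primary, toy$secondary)

# Count every (primary, secondary) pair, keep only pairs that occur,
# and sort by count, similar to count(..., sort = TRUE)
freq <- as.data.frame(
  table(primary = toy$primary, secondary = toy$secondary),
  responseName = "n", stringsAsFactors = FALSE
)
freq <- freq[freq$n > 0, ]
freq <- freq[order(-freq$n), ]
rownames(freq) <- NULL
print(freq)

# With circlize installed, this table could then be drawn with:
# circlize::chordDiagram(freq)
```

Here the missing secondary types are filled in before counting, while the post counts first and coalesces afterwards; the two orders give the same table as long as no entry already lists the same type twice.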
Looking at the above, I think the “too much information” problem is demonstrated fairly well.
However, I did find a package for this that I’ve become rather fond of: chorddiag. The chorddiag package isn’t hosted on CRAN, but it is available on GitHub.
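To use the default plotting function, chorddiag(), we need an adjacency matrix rather than a frequency table: a square matrix whose rows and columns are both the types, with each cell holding the count for that pair. I did this the manual way at first: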
# Spread Type 2 into columns; type pairs that never occur become 0
pokecount_mat <- pokecount %>%
  pivot_wider(names_from = 'Type 2', values_from = n) %>%
  replace(is.na(.), 0)

# Put the columns in the same order as the rows, so the matrix is square
# with matching row and column names
pokecount_mat <- pokecount_mat[c("Type 1", pokecount_mat$`Type 1`)]

pokecount_mat <- pokecount_mat %>%
  column_to_rownames("Type 1") %>%
  as.matrix()

chorddiag::chorddiag(pokecount_mat)
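The manual reshape above relies on every Type 2 value also appearing as a Type 1 value; a type that only ever showed up as a secondary type would be dropped when the columns are reordered. A base R sketch that avoids this builds the square matrix over the union of both columns with tapply(). The frequency table below is made up for illustration, and its `type1`/`type2`/`n` column names are hypothetical.

```r
# Hypothetical frequency table (made-up counts, not the Pokémon data)
freq <- data.frame(
  type1 = c("Grass", "Water", "Fire", "Fire", "Water"),
  type2 = c("Poison", "Water", "Fire", "Flying", "Ice"),
  n     = c(2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Every type that appears in either column becomes both a row and a column
types <- sort(union(freq$type1, freq$type2))
from <- factor(freq$type1, levels = types)
to   <- factor(freq$type2, levels = types)

# Sum n for each (from, to) cell; pairs that never occur are filled with 0
adj <- tapply(freq$n, list(from, to), sum, default = 0)
print(adj)

# With chorddiag installed from GitHub, this square matrix could be drawn with:
# chorddiag::chorddiag(adj)
```

Because the factor levels fix both dimensions, Poison, Flying and Ice get rows even though they never appear as a primary type in this toy table.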
I later found there is also a more automated way using igraph and tidygraph:
# Using igraph and tidygraph to reshape
pokecount %>%
  tidygraph::as_tbl_graph() %>%
  igraph::as_adjacency_matrix(attr = "n") %>%
  # as_adjacency_matrix() returns a sparse matrix by default;
  # chorddiag requires a regular (dense) matrix
  as.matrix() %>%
  chorddiag::chorddiag()
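Since the manual reshape and the igraph/tidygraph pipeline build their matrices independently, their rows and columns may come out in different orders, so comparing them with identical() could fail even when the counts agree. A small helper like the hypothetical same_adjacency() below (not part of any package used here) lines the second matrix up by name before comparing cells.

```r
# Hypothetical helper: TRUE when two adjacency matrices hold the same counts
# for the same named row/column pairs, regardless of row/column order
same_adjacency <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  if (is.null(rownames(a)) || is.null(colnames(a))) {
    stop("both matrices need row and column names")
  }
  if (!identical(dim(a), dim(b))) return(FALSE)
  if (!setequal(rownames(a), rownames(b)) ||
      !setequal(colnames(a), colnames(b))) return(FALSE)
  # Reorder b's rows and columns to match a, then compare cell by cell
  b <- b[rownames(a), colnames(a), drop = FALSE]
  isTRUE(all(a == b))
}

# Small example: the same matrix with its rows and columns shuffled
m <- matrix(c(2, 0, 1, 3), nrow = 2,
            dimnames = list(c("Fire", "Water"), c("Fire", "Water")))
shuffled <- m[c("Water", "Fire"), c("Water", "Fire")]
same_adjacency(m, shuffled)  # TRUE
```

In the post’s setting, the two inputs would be pokecount_mat and the dense matrix produced just before chorddiag() in the pipeline.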
Talk about an efficient workflow! Comparing the two plots, I think it becomes strikingly clear how much interactivity adds. Of course, interactivity isn’t feasible for every medium, and that is where working with circlize would likely have a distinct advantage.