packages:
When do we use these types of graphs?
What kind of data would you need?
As defined by the “Data Visualization Catalogue”, a sunburst plot is:
… shows hierarchy through a series of rings, that are sliced for each category node. Each ring corresponds to a level in the hierarchy, with the central circle representing the root node and the hierarchy moving outwards from it. Rings are sliced up and divided based on their hierarchical relationship to the parent slice. The angle of each slice is either divided equally under its parent node or can be made proportional to a value.
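To make the "proportional to a value" part of that definition concrete, here is a minimal sketch in base R. The tiny two-level hierarchy and its values are made up for illustration; the point is only that each child slice takes a share of its parent's angle.

```r
# Hypothetical two-level hierarchy: each row is a leaf with its parent and a value.
nodes <- data.frame(
  parent = c("A", "A", "B"),
  child  = c("A1", "A2", "B1"),
  value  = c(30, 10, 60),
  stringsAsFactors = FALSE
)

# Inner ring: one slice per parent, angle proportional to the parent's total.
parent_totals <- tapply(nodes$value, nodes$parent, sum)
inner_angles  <- 360 * parent_totals / sum(parent_totals)

# Outer ring: each child takes a share of its parent's slice.
nodes$angle <- as.numeric(
  inner_angles[nodes$parent] * nodes$value / parent_totals[nodes$parent]
)

inner_angles
nodes
```

The child angles under each parent add up to that parent's angle, and every ring adds up to the full 360 degrees.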
A popular example is using a sequence sunburst diagram to display sequence data such as website navigation paths. This makes it easier to see where visits originated and the path taken to reach the final page. The sunburst() function can work with data read from a CSV file.
If you’re looking to reuse the example with your data, there are some things to take into account:
library(sunburstR)
sequence_data <- read.csv(
  paste0(
    "https://gist.githubusercontent.com/kerryrodden/7090426/",
    "raw/ad00fcf422541f19b70af5a8a4c5e1460254e6be/visit-sequences.csv"
  ),
  header = FALSE,
  stringsAsFactors = FALSE
)
sunburst(sequence_data)
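If you want to try this with your own data, here is a rough sketch of how it could be shaped. It assumes, as in the CSV above, that sunburst() wants a first column of paths whose steps are joined with "-" and a second column of counts. The visit paths below are made up, and the plot is only built if sunburstR is installed.

```r
# Hypothetical navigation paths, one element per visit; steps are joined with "-".
visits <- c(
  "home-search-product",
  "home-product",
  "home-search-product",
  "home-account"
)

# Count how often each distinct path occurs.
path_counts <- as.data.frame(table(visits), stringsAsFactors = FALSE)
names(path_counts) <- c("sequence", "count")
path_counts

# Build the plot only if sunburstR is available.
if (requireNamespace("sunburstR", quietly = TRUE)) {
  own_plot <- sunburstR::sunburst(path_counts)
}
```

Because "-" separates the steps, a step name that itself contains a hyphen would be split into two levels, so it is worth cleaning those up first.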
Sunburst plots tend to work well with sports data. In this example, we’re going to use some baseball data from the “pitchRx” package. If you like baseball, this package has tools that collect MLB Gameday data.
What is the scrape() function? Great question. I don't know its internals, but here is what I found out about scraping in general: using a package such as “rvest”, you can take data that is presented in an unstructured format, like the HTML tags found on the web, and convert it into a structured format that is easy to work with. Applications range from scraping user reviews on Amazon to scraping movie ratings to build recommendation engines.
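To give a feel for that unstructured-to-structured step, here is a small sketch using rvest. It illustrates scraping in general, not how pitchRx's scrape() is implemented. The HTML fragment, class names, and reviews are all made up, and minimal_html() is used so that nothing is downloaded.

```r
library(rvest)

# Hypothetical HTML fragment standing in for a downloaded page.
page <- minimal_html('
  <ul>
    <li class="review"><span class="stars">5</span> Great glove</li>
    <li class="review"><span class="stars">3</span> Average bat</li>
  </ul>
')

# Select the review nodes, then pull out the pieces we care about.
reviews <- html_elements(page, "li.review")
ratings <- data.frame(
  stars = as.integer(html_text2(html_elements(reviews, "span.stars"))),
  text  = html_text2(reviews),
  stringsAsFactors = FALSE
)
ratings
```

For a real page you would replace minimal_html() with read_html() and the page's address, and adjust the CSS selectors to match its markup.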
To put the data in a sunburst graph, the example suggests using the runner data to get an idea of the action with a runner on base.
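library(dplyr)

# `baseball` is assumed to hold the list returned by pitchRx's scrape()
# keep the first runner record of each event, then join events per half-inning
action <- baseball$runner %>%
  group_by(event_num) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(gameday_link, inning, inning_side) %>%
  summarize(event = paste(c(event), collapse = "-"))

# count how often each event sequence occurs
sequences <- action %>%
  ungroup() %>%
  group_by(event) %>%
  summarize(count = n())

# number of events in each sequence, used to sort before plotting
sequences$depth <- unlist(lapply(strsplit(sequences$event, "-"), length))

plot1 <- sequences %>%
  arrange(desc(depth), event) %>%
  sunburst()
plot1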
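If the pipeline above feels opaque, it can help to run the same steps on a toy table. The sketch below uses a made-up stand-in for the runner data (the column names match the ones used above, the rows are invented) and a slightly more compact dplyr spelling of the same steps, stopping just before the sunburst() call.

```r
library(dplyr)

# Hypothetical stand-in for baseball$runner: two runner rows can share one event_num.
runner <- data.frame(
  gameday_link = "gid_demo",
  inning       = c(1, 1, 1, 2),
  inning_side  = c("top", "top", "top", "top"),
  event_num    = c(10, 10, 12, 20),
  event        = c("Single", "Single", "Groundout", "Walk"),
  stringsAsFactors = FALSE
)

# One row per event, then one "-"-joined sequence per half-inning.
action <- runner %>%
  group_by(event_num) %>%
  slice(1) %>%
  ungroup() %>%
  group_by(gameday_link, inning, inning_side) %>%
  summarize(event = paste(event, collapse = "-"), .groups = "drop")

# Count each distinct sequence and record how many events it contains.
sequences <- action %>%
  count(event, name = "count") %>%
  mutate(depth = lengths(strsplit(event, "-"))) %>%
  arrange(desc(depth), event)
sequences
```

The result has one row per distinct sequence with its count, which is the shape that gets piped into sunburst() above.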
The D3partitionR package is organized mostly around two groups of functions: add_* functions, such as add_data() and add_title(), and set_* functions, such as set_chart_type().
library(D3partitionR)
library(data.table)
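# `url` must point to the Titanic CSV; its value is not shown here
titanic_data <- fread(url, sep = ",", header = TRUE)

# count passengers for each combination of the four variables
var_names <- c('sex', 'embarked', 'pclass', 'survived')
data_plot <- titanic_data[, .N, by = var_names]

# prefix each value with its variable name, e.g. "sex male"
data_plot[, (var_names) := lapply(var_names, function(x) paste0(x, ' ', data_plot[[x]]))]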
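To see what the aggregation and relabelling steps produce, here is a minimal sketch on a made-up table with only two of the variables. The passenger rows are invented; the real data would come from fread().

```r
library(data.table)

# Hypothetical passenger rows.
passengers <- data.table(
  sex      = c("male", "female", "female", "male"),
  survived = c(0, 1, 1, 0)
)

# One row per combination of the step variables, with its count in N.
steps  <- c("sex", "survived")
counts <- passengers[, .N, by = steps]

# Prefix each value with its column name, e.g. "sex male".
counts[, (steps) := lapply(steps, function(x) paste0(x, " ", counts[[x]]))]
counts
```

The prefix keeps otherwise identical values such as 0 and 1 readable once they appear as slice labels in the chart.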
library(magrittr)
D3partitionR() %>%
  add_data(data_plot,
           count = 'N',
           steps = c('sex', 'embarked', 'pclass', 'survived')) %>%
  add_title('Titanic') %>%
  plot()
You can change the chart type with the set_chart_type() function, as in the following two examples:
D3partitionR() %>%
  add_data(data_plot, count = 'N', steps = c('sex', 'embarked', 'pclass', 'survived')) %>%
  set_chart_type('treemap') %>%
  plot()
D3partitionR() %>%
  add_data(data_plot, count = 'N', steps = c('sex', 'embarked', 'pclass', 'survived')) %>%
  set_chart_type('circle_treemap') %>%
  plot()