class: center, middle, inverse, title-slide

# Network Analysis I
## Working with and Visualizing Network Graphs
### rstudio::conf(2022)

---
class: left, middle, rstudio-logo, bigfont

## Aims of this module

✅ Understand graphs as an analytic tool

- Review the mathematical definition of a graph
- Learn how to construct graphs in R

✅ Review graph visualization options in R

- Static visualization methods
- Dynamic visualization methods

---
class: left, middle, rstudio-logo

# Working with Graphs

---
class: left, middle, rstudio-logo

## Graphs help us model connections between entities

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-1-1.png" width="300" height="350" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

* In People Analytics, the unit of analysis is often a connection between entities rather than a single entity
* To support this, a different form of data structure is needed, known as a *graph*
* This example graph connects four people based on whether they have worked together
* The entities (people) are called *vertices* or *nodes*. The connections are called *edges*.
* For this type of relationship, there is no need to define a direction. The edges are *undirected*.
* Examples: Facebook friends, meetings

---
class: left, middle, rstudio-logo

## Graphs can be undirected or directed, depending on the relationship

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-2-1.png" width="300" height="350" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

* In this graph, the relationship is 'is a manager of'. This relationship has direction.
* This is known as a *directed graph* or *digraph*.
* In a directed graph, it is possible for edges to point in both directions between two vertices.
* However, these would be considered *two different edges*.
* Examples: Twitter follows, emails

---
class: left, middle, rstudio-logo

## Multigraphs are graphs which allow multiple edges between vertices

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-3-1.png" width="400" height="350" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

* Each edge usually represents a different type of connection.
* In this graph each edge represents a different flight number between three US cities on a given day in December 2010.
* It's also possible for an edge to start and finish on the same vertex - this is called a *loop edge*.
* Graphs with at most one edge between any pair of vertices and with no loop edges are called *simple graphs*.

---
class: left, middle, rstudio-logo

## Defining graphs mathematically

A graph `\(G\)` consists of two sets:

1. The vertex set `\(V\)`
2. The edge set `\(E\)`, which consists of pairs of vertices in `\(V\)`

Example (our directed 'is a manager of' graph):

$$
`\begin{aligned}
G &= (V, E) \\
V &= \{\mathrm{Suraya}, \mathrm{David}, \mathrm{Zubin}, \mathrm{Jane}\} \\
E &= \{ \mathrm{Suraya} \longrightarrow \mathrm{David}, \mathrm{David} \longrightarrow \mathrm{Zubin}, \mathrm{David} \longrightarrow \mathrm{Jane} \}
\end{aligned}`
$$

---
class: left, middle, rstudio-logo

## Creating graph objects in R

We use the `igraph` package to create and store graph objects. At a minimum, we need the edge set (also known as an edgelist) as a data frame.
```r
(is_manager_of <- data.frame(
  from = c("Suraya", "David", "David"),
  to = c("David", "Zubin", "Jane")
))
```

```
##     from    to
## 1 Suraya David
## 2  David Zubin
## 3  David  Jane
```

```r
library(igraph)

manager_graph <- igraph::graph_from_data_frame(
  is_manager_of,
  directed = TRUE
)
```

---
class: left, middle, rstudio-logo

## Inside a graph object

```r
# view full graph object
manager_graph
```

```
## IGRAPH 6884c63 DN-- 4 3 -- 
## + attr: name (v/c)
## + edges from 6884c63 (vertex names):
## [1] Suraya->David David ->Zubin David ->Jane
```

```r
# view vertices
V(manager_graph)
```

```
## + 4/4 vertices, named, from 6884c63:
## [1] Suraya David  Zubin  Jane
```

```r
# view edges
E(manager_graph)
```

```
## + 3/3 edges from 6884c63 (vertex names):
## [1] Suraya->David David ->Zubin David ->Jane
```

---
class: left, middle, rstudio-logo

## Exercise - Creating graphs

In this short exercise, we will practice creating graphs in R.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and start **Assignment 05 - Creating and visualizing graphs**.

Let's work on **Exercises 1, 2 and 3**.

---
class: left, middle, rstudio-logo

## Edge properties

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-9-1.png" width="300" height="300" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

* Further information about the relationship can be stored as a *property* or *attribute* of the edges of a graph.
* This graph represents financial transactions between three companies.
* As well as showing the direction of the transaction, the edges also show the amount and currency of the transaction.
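
---
class: left, middle, rstudio-logo

## Edge properties as a table

One way to picture edge properties is as extra columns sitting alongside the edgelist. The sketch below is illustrative only: the `tx_` objects, amounts and currencies are made-up values, not the ones in the figure. It stores two properties on each edge and then reads them back.

```r
library(igraph)

# hypothetical transactions between three companies (illustrative values)
tx_edges <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  amt = c(570000, 15000, 230000),
  cur = c("USD", "USD", "GBP")
)

# columns after the first two are stored as edge properties
tx_graph <- igraph::graph_from_data_frame(tx_edges, directed = TRUE)

# names of the edge properties stored on the graph
igraph::edge_attr_names(tx_graph)

# edges and their properties, returned as a data frame
tx_table <- igraph::as_data_frame(tx_graph, what = "edges")

# keep only the larger transactions
large_tx <- tx_table[tx_table$amt > 200000, ]
large_tx
```

Reading the edges back as a data frame makes it easy to filter or summarize by property with ordinary data frame tools.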
---
class: left, middle, rstudio-logo

## Adding edge properties to a graph

```r
# create company transaction graph
transactions_edgelist <- data.frame(
  from = c("A", "A", "B", "B"),
  to = c("A", "B", "A", "C")
)

# create directed graph (note directed is default)
(transactions_graph <- igraph::graph_from_data_frame(
  transactions_edgelist
))
```

```
## IGRAPH c20c4f2 DN-- 3 4 -- 
## + attr: name (v/c)
## + edges from c20c4f2 (vertex names):
## [1] A->A A->B B->A B->C
```

Add edge attributes in the order in which the edges appear in the graph object:

```r
E(transactions_graph)$cur <- c("USD", "USD", "GBP", "GBP")
E(transactions_graph)$amt <- c(15000, 570000, 230000, 175000)
```

---
class: left, middle, rstudio-logo

## Adding vertex properties to a graph

In a similar way, we can add properties or attributes to the vertices of a graph to store additional information about the entities.

```r
# location of companies
V(transactions_graph)$loc <- c("USA", "UK", "France")
```

When we re-examine our graph, we see that it contains these new vertex and edge properties:

```r
transactions_graph
```

```
## IGRAPH c20c4f2 DN-- 3 4 -- 
## + attr: name (v/c), loc (v/c), cur (e/c), amt (e/n)
## + edges from c20c4f2 (vertex names):
## [1] A->A A->B B->A B->C
```

---
class: left, middle, rstudio-logo

## Weighted edges

The most commonly used type of edge property is a numeric weight, which is often used to indicate the strength of the relationship.

A graph which has a numeric edge property called `weight` will be classified as a *weighted graph*.

Whether a graph is weighted or not will have consequences for some of the algorithms and methods we will learn later this morning.
```r
# add number of years managed as a weight on manager graph
E(manager_graph)$weight <- c(8, 4, 2)
```

Notice that the graph object is now flagged as weighted (`W`):

```r
manager_graph
```

```
## IGRAPH 6884c63 DNW- 4 3 -- 
## + attr: name (v/c), weight (e/n)
## + edges from 6884c63 (vertex names):
## [1] Suraya->David David ->Zubin David ->Jane
```

---
class: left, middle, rstudio-logo

## Loading properties from data frames

Adding edge and vertex properties manually is not particularly efficient. If your edgelist dataframe contains additional columns, they will automatically be added to the graph as edge properties.

```r
# get edgelist of romantic relationships in Mad Men
url <- "https://ona-book.org/data/madmen_edges.csv"
madmen_edgelist <- read.csv(url)
head(madmen_edgelist)
```

```
##          Name1            Name2 Married
## 1 Betty Draper    Henry Francis       1
## 2 Betty Draper       Random guy       0
## 3   Don Draper          Allison       0
## 4   Don Draper Bethany Van Nuys       0
## 5   Don Draper     Betty Draper       1
## 6   Don Draper   Bobbie Barrett       0
```

---
class: left, middle, rstudio-logo

## Loading properties from data frames

If we load this dataframe directly into `igraph`, the `Married` edge property will be automatically captured.

```r
(madmen_graph <- igraph::graph_from_data_frame(
  madmen_edgelist,
  directed = FALSE
))
```

```
## IGRAPH 7fab39b UN-- 45 39 -- 
## + attr: name (v/c), Married (e/n)
## + edges from 7fab39b (vertex names):
##  [1] Betty Draper--Henry Francis    Betty Draper--Random guy
##  [3] Don Draper  --Allison          Don Draper  --Bethany Van Nuys
##  [5] Betty Draper--Don Draper       Don Draper  --Bobbie Barrett
##  [7] Don Draper  --Candace          Don Draper  --Doris
##  [9] Don Draper  --Faye Miller      Don Draper  --Joy
## [11] Don Draper  --Megan Calvet     Don Draper  --Midge Daniels
## [13] Don Draper  --Rachel Menken    Don Draper  --Shelly
## [15] Don Draper  --Suzanne Farrell
## + ... omitted several edges
```

---
class: left, middle, rstudio-logo

## Loading properties from data frames

Similarly, vertex names and properties can be included in a vertex dataframe. The first column is used as the vertex name.

```r
url <- "https://ona-book.org/data/madmen_vertices.csv"
madmen_vertices <- read.csv(url)
head(madmen_vertices)
```

```
##           label Gender Main
## 1  Betty Draper female    1
## 2    Don Draper   male    1
## 3   Harry Crane   male    0
## 4 Joan Holloway female    1
## 5    Lane Pryce   male    0
## 6   Peggy Olson female    1
```

---
class: left, middle, rstudio-logo

## Loading properties from data frames

Vertex properties can be added on loading by using the `vertices` argument in `igraph::graph_from_data_frame()`:

```r
(madmen_graph <- igraph::graph_from_data_frame(
  madmen_edgelist,
  vertices = madmen_vertices,
  directed = FALSE
))
```

```
## IGRAPH ed465ce UN-- 45 39 -- 
## + attr: name (v/c), Gender (v/c), Main (v/n), Married (e/n)
## + edges from ed465ce (vertex names):
##  [1] Betty Draper--Henry Francis    Betty Draper--Random guy
##  [3] Don Draper  --Allison          Don Draper  --Bethany Van Nuys
##  [5] Betty Draper--Don Draper       Don Draper  --Bobbie Barrett
##  [7] Don Draper  --Candace          Don Draper  --Doris
##  [9] Don Draper  --Faye Miller      Don Draper  --Joy
## [11] Don Draper  --Megan Calvet     Don Draper  --Midge Daniels
## [13] Don Draper  --Rachel Menken    Don Draper  --Shelly
## [15] Don Draper  --Suzanne Farrell
## + ... omitted several edges
```

---
class: left, middle, rstudio-logo

## Accessing properties

Vertex and edge properties can be accessed through the vertex and edge sets of the graph:

```r
# get Married edge property
E(madmen_graph)$Married[1:5]
```

```
## [1] 1 0 0 0 1
```

```r
# get Gender vertex property
V(madmen_graph)$Gender[1:5]
```

```
## [1] "female" "male"   "male"   "female" "male"
```

---
class: left, middle, rstudio-logo

## Exercise - Adding properties to graphs

In this short exercise, we will practice adding properties to graphs in R.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and start **Assignment 05 - Creating and visualizing graphs**.

Let's work on **Exercises 4, 5 and 6**.

---
class: left, middle, rstudio-logo

## The importance of graph visualization

Visualization is an important way to understand a network graph. Before we study how to analyze graphs and draw insights from them, we should learn how to visualize them.

There are several options for visualizing graphs in R, and we will look at three of them briefly:

1. `igraph` native plotting: visually basic but easy
2. `ggraph`: uses `ggplot2` grammar, more 'pleasant' look and feel
3. `networkD3`: uses D3 in JavaScript, interactive and responsive

---
class: left, middle, rstudio-logo

## Basic plotting using `igraph`

Calling `plot()` on an `igraph` object generates a quick plot, which usually needs some tweaking.

```r
plot(madmen_graph)
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-22-1.png" height="400" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Using properties to control visual features

`plot()` uses vertex and edge properties to control visual features.
For example, here is how we would show labels only for the main Mad Men characters:

```r
# label main characters (Main == 1) with their name, leave the rest blank
V(madmen_graph)$label <- ifelse(V(madmen_graph)$Main == 1,
                                V(madmen_graph)$name,
                                "")
plot(madmen_graph)
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-23-1.png" height="400" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Using properties to control vertex visual features

Here are a few of the standard vertex properties used to control visual appearance in `plot()`:

* `size`: The size of the vertex
* `color`: The fill color of the vertex
* `frame.color`: The border color of the vertex
* `shape`: The shape of the vertex; multiple shape options are supported including `circle`, `square`, `rectangle` and `none`
* `label`: The text of the label; other label features are controlled by `label.family`, `label.font` and `label.color`

---
class: left, middle, rstudio-logo

## Using properties to control edge visual features

Here are a few of the standard edge properties used to control visual appearance in `plot()`:

* `color`: The color of the edge
* `width`: The width of the edge
* `arrow.size`: The size of the arrow in a directed edge
* `arrow.width`: The width of the arrow in a directed edge
* `arrow.mode`: Whether arrows should point forward (`>`), backward (`<`) or both (`<>`)
* `lty`: Line type of edges, with numerous options including `solid`, `dashed`, `dotted`, `dotdash` and `blank`
* `curved`: The amount of curvature to apply to the edge, with zero (default) as a straight edge, negative numbers bending clockwise and positive bending anti-clockwise

---
class: left, middle, rstudio-logo

## Using layouts

The positioning of vertices in a graph is controlled by layouts. A layout can be stored as a property of the overall graph.
```r
madmen_graph$layout <- igraph::layout_in_circle(madmen_graph)
plot(madmen_graph)
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-24-1.png" height="400" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Layout options

`igraph` offers numerous layout functions to control the layout of your graph.

*Force-directed* layouts are very common and visually appealing graph layouts. They simulate physical forces, pulling connected vertices together and pushing unconnected vertices apart until the positions settle into an equilibrium.

* `layout_with_fr()`: Fruchterman-Reingold - a very common force-directed layout
* `layout_with_kk()`: Kamada-Kawai - another common force-directed layout

Other layouts include:

* `layout_in_circle()`: Circular layout
* `layout_on_sphere()`: 3D-spherical simulation
* `layout_on_grid()`: Rectangular-grid layout
* `layout_with_mds()`: Multidimensional scaling

---
class: left, middle, rstudio-logo

## Randomness in visualization

Running the same layout function twice can often result in different layouts, because random number generation is happening under the hood.

```r
madmen_graph$layout <- igraph::layout_with_fr(madmen_graph)
plot(madmen_graph)
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-26-1.png" width="300" height="300" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-27-1.png" width="300" height="300" style="float:right; padding-left:40px" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Setting a seed to control randomness

Use the `set.seed()` function with the same seed before every layout command to ensure reproducibility.
```r
set.seed(123)
madmen_graph$layout <- igraph::layout_with_fr(madmen_graph)
plot(madmen_graph)
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-29-1.png" width="300" height="300" style="float:left; padding-right:40px" style="display: block; margin: auto;" />

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-30-1.png" width="300" height="300" style="float:right; padding-left:40px" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Exercise - Plotting using `igraph`

In this short exercise, we will practice plotting graphs using `igraph` in R.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and start **Assignment 05 - Creating and visualizing graphs**.

Let's work on **Exercises 7 and 8**.

---
class: left, middle, rstudio-logo

## Visualizing using `ggraph`

The `ggraph` package allows those who prefer the grammar of `ggplot2` to plot network graphs.

```r
library(ggraph)
set.seed(123) # always set seed for static viz
ggraph(madmen_graph, layout = "fr") + # set layout in initial ggraph call
  geom_edge_link(color = "grey") + # basic edge features
  geom_node_point(size = 5, color = "red") + # basic vertex features
  theme_void() # empty/blank background
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-31-1.png" height="300" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Using aesthetics to control features

Vertex and edge properties can be used as `ggplot2` aesthetics. In this example, we color the vertices according to whether the vertex is a main character or not.
```r
library(ggraph)
set.seed(123)
ggraph(madmen_graph, layout = "fr") +
  geom_edge_link(color = "grey") +
  geom_node_point(size = 5, aes(color = factor(Main)), show.legend = FALSE) +
  theme_void()
```

<img src="5-working_with_and_visualizing_graphs_files/figure-html/unnamed-chunk-32-1.png" height="300" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Interactive visualization using `networkD3`

The `networkD3` package provides an API to the D3 JavaScript visualization library, and can be useful for creating simple dynamic and interactive visualizations. `igraph` objects need to be converted before they can be used with `networkD3`.

```r
library(networkD3)

# structure the madmen graph for D3, grouping according to Main characters
madmen_d3 <- networkD3::igraph_to_networkD3(
  madmen_graph,
  group = V(madmen_graph)$Main
)

# this creates a list of links and nodes
head(madmen_d3$links, 2)
```

```
##   source target value
## 1      5     31     0
## 2      1     11     0
```

```r
head(madmen_d3$nodes, 2)
```

```
##           name group
## 1 Betty Draper     1
## 2   Don Draper     1
```

---
class: left, middle, rstudio-logo

## Interactive visualization using `networkD3`

Then we pass the links, the nodes and the relevant column names to the `forceNetwork()` function to generate an interactive force-directed network.

```r
networkD3::forceNetwork(
  Links = madmen_d3$links,
  Nodes = madmen_d3$nodes,
  NodeID = "name",
  Source = "source",
  Target = "target",
  Group = "group"
)
```

---
class: left, middle, rstudio-logo

## Interactive visualization using `networkD3`
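
A minimal, self-contained sketch of the same two steps on a toy graph. The `toy_` objects and the grouping are illustrative assumptions, and the node and link tables are built by hand here rather than with `igraph_to_networkD3()`, to show the structure `forceNetwork()` expects: a nodes table, plus a links table whose `source` and `target` columns are zero-based row positions in that nodes table.

```r
library(igraph)
library(networkD3)

# hypothetical toy edgelist (not the Mad Men data)
toy_edges <- data.frame(
  from = c("Suraya", "David", "David"),
  to = c("David", "Zubin", "Jane")
)
toy_graph <- igraph::graph_from_data_frame(toy_edges, directed = TRUE)

# nodes table: one row per vertex, grouped by whether they manage anyone
out_degree <- unname(igraph::degree(toy_graph, mode = "out"))
toy_nodes <- data.frame(
  name = V(toy_graph)$name,
  group = ifelse(out_degree > 0, "manager", "non-manager")
)

# links table: zero-based positions of each edge's endpoints in toy_nodes
toy_links <- data.frame(
  source = match(toy_edges$from, toy_nodes$name) - 1,
  target = match(toy_edges$to, toy_nodes$name) - 1
)

# build the interactive widget
toy_widget <- networkD3::forceNetwork(
  Links = toy_links,
  Nodes = toy_nodes,
  Source = "source",
  Target = "target",
  NodeID = "name",
  Group = "group",
  opacity = 0.9, # node opacity
  zoom = TRUE    # allow zooming with the mouse wheel
)
toy_widget
```

Printing the widget in an interactive session or an HTML document renders the network, with vertices that can be dragged and hovered over.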

---
class: left, middle, rstudio-logo

## Exercise - Further visualization options

In this short exercise, we will practice plotting graphs using `ggraph` and `networkD3` in R.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and start **Assignment 05 - Creating and visualizing graphs**.

Let's work on **Exercises 9 and 10**.

---
class: left, middle, rstudio-logo

# ☕ Let's have a break! 😌