forked from rich-iannone/DiagrammeR-docs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
03-graph_attributes.Rmd
330 lines (270 loc) · 14.8 KB
/
03-graph_attributes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
```{r load_dgr_03, include=FALSE, results=FALSE}
library(DiagrammeR)
```
# Graph Attributes {#attrs}
A graph can often convey more useful information when aesthetic properties are applied to its rendered form.
## External NDFs and EDFs
One way to add node or edge attributes, using the functions we've already explored, is to include these attributes and their values when generating node and edge data frames. For a node data frame, we can provide our own argument names and values when calling the `create_node_df()` function. In the example below, we provide values for `n` (the number of nodes, `4` in this case), `label`, and `type` but, in addition, we add the columns: `color`, `shape`, and `data`. Note that we either provide a single-length vector (i.e., a single value) or an *n*-length vector. Anything in between results in `NA` values applied to fill in the length of the data frame. The `color` and `shape` node attributes are instantly recognized by the **Graphviz** rendering engine. (Having the correct types of values are important here, more on that later.) The `data` column is ignored by the rendering engine, however, it is useful to have for other reasons (more on this later, as well).
```{r create_ndf_w_color_shape_data}
# Create a node data frame and display it
ndf <-
create_node_df(
n = 4,
label = TRUE,
type = "lower",
color = "lightgreen",
shape = c("circle", "circle",
"rectangle", "rectangle"),
data = c(2.5, 5.6, 7.4, 1.7))
ndf
```
We can include this node data frame in a new graph and then render it using the Graphviz engine (using `render_graph()` with no arguments except `graph`). Let's have a look.
```{r}
# Incorporate the node data frame into a
# new graph and display that graph; note that
# there are styling attributes being used
graph_4n <-
create_graph(
nodes_df = ndf)
render_graph(graph_4n)
```
We can take a very similar approach to adding styling attributes and a data column to an edge data frame with the `create_edge_df()` function. Here, let's add the `penwidth` (line thicknesses for edges) and the `color` styling attributes as well as a `data` attribute column.
```{r create_edf_w_penwidth_color_data}
# Create an edge data frame
edf <-
create_edge_df(
from = c(1, 4, 3),
to = c(2, 1, 1),
rel = "related_to",
penwidth = c(2.5, 0.5, 1.0),
color = c("pink", "red", "lightblue"),
data = c(1.1, 5.2, 3.4))
edf
```
Now that we have the edge data frame (`edf`), let's *add* that to the graph with the `add_edge_df()` function. Then, render the graph to see the styled nodes and edges.
```{r add_edge_df_render_graph}
# Add the edge data frame to the graph
# object, then, render the graph
graph_4n_3e <-
graph_4n %>%
add_edge_df(
edge_df = edf)
render_graph(graph_4n_3e)
```
## Internal NDFs and EDFs
We can use the `data` fields for both the nodes and the edges to create styling attributes. Doing this generally requires scaling of the values (if they are numerical data, and they are in this case) and generating a new column in the internal ndf or edf that's named for the styling attribute. Thankfully, we have some useful functions to help with this: `rescale_node_attrs()` and `rescale_edge_attrs()`. Let's look at an example of how this works for scaling up the sizes of the node shape:
```{r rescale_node_attrs}
# Create the `width` node attribute column
# by scaling the values in the `data` column
# to between 0.5 and 1.5 units
graph_scaled_nodes <-
graph_4n_3e %>%
rescale_node_attrs(
node_attr_from = "data",
to_lower_bound = 0.5,
to_upper_bound = 1.5,
node_attr_to = "width")
# Display the graph with the new styling
# attribute applied
render_graph(graph_scaled_nodes)
```
This looks pretty good and, conceptually, it isn't too difficult to reason about generating new columns with specific attribute names based on data in existing columns. Notice that the circular nodes were scaled up or down evenly whereas the rectangular one was only scaled horizontally (and this makes sense, we made a `width` attribute/column). If we wanted scaling both in `width` and in `height`, we also need a height column with the same scaled values. There is a set of functions to make this process easier: `copy_node_attrs()` and `copy_edge_attrs()`. It simply copies values from one attribute to a new attribute.
```{r copy_node_attrs}
# Copy the `width` attribute values to
# a new `height` attribute; because we
# want square rectangles
graph_square_rects <-
graph_scaled_nodes %>%
copy_node_attrs(
node_attr_from = "width",
node_attr_to = "height")
# Display the graph and observe squares
render_graph(graph_square_rects)
```
Let's move onto some modifications of the edges. We have already applied some color values manually when constructing the edge data frame before its incorporation into the graph. Let's use the `data` values available in the internal edge data frame to create scaled colors. We'll essentially replace the `color` attribute values already present and scale between the `grey85` and `grey25` colors with the `rescale_edge_attrs()` function.
```{r graph_scale_edge_colors}
# Create the `color` edge attribute column
# by scaling the values in the `data` column
# to between the `grey85` to `grey25` colors
graph_grey_edges <-
graph_square_rects %>%
rescale_edge_attrs(
edge_attr_from = "data",
to_lower_bound = "grey85",
to_upper_bound = "grey25",
edge_attr_to = "color")
# Display the graph with newly grayed edges
render_graph(graph = graph_grey_edges)
```
## Specific Nodes or Edges
We can set node or edge attribute values for all nodes or edges in a graph, or, for specific nodes or edges in a graph. These additions can be done with the `set_node_attrs()` and `set_edge_attrs()` functions. Let's create a random graph with 10 nodes and 15 edges and color some nodes and edges.
```{r set_node_edge_attrs_all}
# Create a random graph and set the
# `fillcolor` as `lightgreen` for all
# nodes in the graph, and, set `color`
# as `blue` for all graph edges
graph_1 <-
create_random_graph(
n = 10,
m = 15,
directed = TRUE,
set_seed = 23) %>%
set_node_attrs(
node_attr = "fillcolor",
values = "lightgreen") %>%
set_edge_attrs(
edge_attr = "color",
values = "blue")
# Display the blue and green graph
render_graph(graph = graph_1)
```
We can be more selective with the node and edge coloring (or with any other attributes or data we would like to apply).
```{r set_node_edge_attrs_specfic_to}
# Set attribute `fillcolor` as `orange`
# for nodes `1` and `3` in the graph;
# for edges, set `color` to `red` for
# all those edges leading to nodes
# `7` and `10`
graph_2 <-
graph_1 %>%
set_node_attrs(
node_attr = "fillcolor",
values = "orange",
nodes = c(1, 3)) %>%
set_edge_attrs(
edge_attr = "color",
values = "red",
to = c(7, 10))
# Display that multicolored graph
render_graph(graph = graph_2)
```
Finally, individual edges can be targeted if supplying vectors for both the `from` and `to` arguments.
```{r set_edge_attrs_specfic_from_to}
# Set attribute `color = "black"` for edges
# `7`->`9` and `1`->`6` using the edge
# data frame
graph_3 <-
graph_2 %>%
set_edge_attrs(
edge_attr = "color",
values = "black",
from = c(7, 1),
to = c(9, 6))
# Display the graph
render_graph(graph = graph_3)
```
## Adding Attributes to Selections
There is a large number of functions that store either a selection of nodes or edges within the graph object. Such functions begin with `select_...` and `trav_...`. Complementing this class of functions are functions that take a graph with an active selection and perform some transformation based on the selection of nodes or edges. These functions are identifiable by ending with `...ws` (with selection). We'll devote two entire chapters on the `select...()` and `trav...()` functions but, for now, we'll provide simple use cases with `select_nodes()` and `select_edges()`.
So, if we can make a selection of nodes or edges (but not both at the same time), how do we style these groups? We can do it with the `set_node_attrs_ws()` and `set_edge_attrs_ws()` functions. In this example, let's make a simple graph, select 2 of the 4 nodes using `select_nodes()` and specifying `nodes = c(1, 2)`, add `fillcolor` attribute values of `lightgreen` for those nodes with `set_node_attrs_ws()`, and then clear the selection with `clear_selection()` (it clears any selection of nodes or edges).
```{r set_node_attrs_ws}
# Create a simple graph, select nodes
# `1` and `2`, color them light green;
# clear the selection
graph_4 <-
create_graph() %>%
add_path(4) %>%
select_nodes(
nodes = c(1, 2)) %>%
set_node_attrs_ws(
node_attr = "fillcolor",
value = "lightgreen") %>%
clear_selection()
# As a further aside on functions, we
# can equivalently use the function
# `select_nodes_by_id()` in the place
# of `select_nodes()` in this instance
# and achieve the same result; let's
# now render the graph
render_graph(graph = graph_4)
```
We can follow a similar sequence for modifying edge attributes for some edges with `select_edges()` and `set_edge_attrs_ws()`. Note that we also use `clear_selection()` as before to clear the active selection which is, in this case, a selection of edges.
```{r set_edge_attrs_ws}
# Take `graph_4`, select edges
# `1->2` and `2->3`, color those red
# and then clear the selection
graph_5 <-
graph_4 %>%
select_edges(
from = c(1, 2),
to = c(2, 3)) %>%
set_edge_attrs_ws(
edge_attr = "color",
value = "red") %>%
clear_selection()
render_graph(graph = graph_5)
```
There are a few more `select_...()` functions that are useful and they allow you to make selections based on properties of nodes or edges in a graph. For nodes, these include:
- `select_nodes_by_degree()`: supply some information on degree (number of connections to/from/total for a node) to an `expressions` argument and a selection of nodes satisfying the expressions will be made
- `select_nodes_in_neighborhood()`: create a selection of nodes based on a walk distance from a specified node (supplying a single node ID for `node`)
- `select_last_nodes_created()`: if you just created a large amount of nodes, you can select them straight away with this function and then apply some transformation with a `..._ws()` function like `set_node_attrs_ws()`
The `select_nodes()` function also has some advanced functionality that can be accessed by supplying filtering statements to its `conditions` argument. Moreover, the `select_edges()` function also has selective filtering capabilities. More details are available on this in a later chapter, but for now it is good to know that this is possible.
In addition, edges can be selected with:
- `select_edges_by_edge_id()`: given some edge ID values, select those exact edges
- `select_edges_by_node_id()`: given some node ID values, select all edges associated (i.e., linked to or from) those nodes
- `select_last_edges_created()`: analogous to the `select_last_nodes_created()` function; useful for applying attribute values to just-created edges
These functions are best illustrated with a few examples. First, let's make a random graph (`26` nodes and `52` edges) and select nodes with a total degree (number of edges inbound + number of edges outbound) greater than or equal to `6`. The graph will be generated by the `create_random_graph()` function, the selection will be performed by the `select_nodes_by_degree()` function (using `expressions = "deg >= 6"` to express our filtering of nodes by total degree), and the coloring of the selected nodes will be accomplished using the `set_node_attrs_ws()` function. We'll also do another interesting thing: color all nodes not satisfying our stated condition. This last modification will be done by inverting the node selection with `invert_selection()` and then using `set_node_attrs_ws()` again, this time on the complementary node selection.
```{r select_nodes_by_degree}
# Create a random graph and label
# nodes by letters of the alphabet
graph_26n_52e <-
create_random_graph(
n = 26,
m = 52,
directed = TRUE,
set_seed = 23) %>%
set_node_attrs(
node_attr = "label",
values = LETTERS) %>%
set_node_attrs(
node_attr = "type",
values = "letter")
# Select nodes that have a (total)
# degree >= 6, color these nodes
# `purple` and the rest `lightgreen`
graph_26n_52e <-
graph_26n_52e %>%
select_nodes_by_degree(
expressions = "deg >= 6") %>%
set_node_attrs_ws(
node_attr = "fillcolor",
value = "purple") %>%
invert_selection() %>%
set_node_attrs_ws(
node_attr = "fillcolor",
value = "lightgreen") %>%
clear_selection()
# Render the graph
render_graph(graph = graph_26n_52e)
```
Interesting! And we're starting to visualize something of a useful insight, nodes that have greater connectedness than other nodes (distinguished by level of total degree) stand out by a visual aesthetic. This methodology is not limited to the visual either. We could, for instance, provide information on the nodes that have fulfilled the node degree requirements for reporting purposes. One can always retrieve the selection of nodes with the `get_selection()` function:
```{r select_nodes_by_degree_get_selection}
hi_deg_nodes <-
graph_26n_52e %>%
select_nodes_by_degree(
expressions = "deg >= 6") %>%
get_selection()
# Report the node IDs that have
# exceptionally high degree values
paste(
"Nodes",
paste(hi_deg_nodes, collapse = ", "),
"all have high degree.")
```
Getting a targeted list of node IDs is a great start, however, node ID values aren't usually very meaningful to the end user. Usually, we ascribe labels to nodes to represent entities we'd like to model in the graph. Examples include (unique) user names, gene identifers, project names, etc. So, let's get the values associated with the `label` and `values` node attributes in the selection. One strategy for doing this is involves the following sequence of functions: `select_nodes_by_degree()` (as before), `create_subgraph_ws()` to make a graph *subset* based on the selection of nodes, `get_node_df()` to return a data frame containing the graph's nodes and all attributes (the internal node data frame), and, finally, **dplyr**'s `select()` function to select only those data frame columns we want to keep (`label` and `value`).
```{r select_nodes_by_degree_create_subgraph}
# Load in the dplyr package
library(dplyr)
# Get a data frame of node `label`
# and associated data values for
# nodes with total degree >= 6
hi_deg_node_data <-
graph_26n_52e %>%
select_nodes_by_degree(
expressions = "deg >= 6") %>%
create_subgraph_ws() %>%
get_node_df() %>%
dplyr::select(label, value)
# Display the final data frame
hi_deg_node_data
```
This is more like it! But imagine that these label values could convey more meaning in a selected domain, like city names, species names, user names, or other names for things.