---
title: "Geocoding with R"
output:
  md_document:
    variant: markdown
always_allow_html: yes
---
```{r setup, include=FALSE}
post <- "geocoding-with-r"
source("common.r")
```
In the [previous post](https://jessesadler.com/post/excel-vs-r/) I discussed some reasons to use [R](https://www.r-project.org) instead of Excel to analyze and visualize data and provided a brief introduction to the R programming language. That post used an example of letters sent to the sixteenth-century merchant [Daniel van der Meulen](https://jessesadler.com/project/dvdm-correspondence/) in 1585. One aspect missing from the analysis was the visualization of the geographical aspects of the data. This post will provide an introduction to geocoding and mapping location data using the [ggmap](https://cran.r-project.org/web/packages/ggmap/index.html) package for R, which enables the creation of maps with [ggplot](http://ggplot2.tidyverse.org). There are a number of websites that can help geocode location data and even create maps.[^1] You could also use a full-scale geographic information systems (GIS) application such as [QGIS](https://qgis.org) or [ESRI](https://esri.com). However, an active developer community has made it possible to complete a full range of geographic analysis from geocoding data to the creation of publication-ready maps with R.[^2] Geocoding and mapping data with R instead of a web or GIS application brings the general advantages of using a programming language in analyzing and visualizing data. With R, you can write the code once and use it over and over, while also providing a record of all your steps in the creation of a map.[^3]
This post will merely scratch the surface of the mapping capabilities of R and will not enter into the domain of the more specialized geographic packages available for R.[^4] Instead, it will build on the [dplyr](http://dplyr.tidyverse.org) and [ggplot](http://ggplot2.tidyverse.org) skills discussed in [my brief introduction to R](https://jessesadler.com/post/excel-vs-r/). The example of geocoding and mapping with R will also provide another opportunity to show the advantages of coding. In particular, geocoding is a good example of how code can simplify the workflow for entering data. Instead of dealing with separate spreadsheets, one containing information about the letters and the other containing geographic information, code lets you create the geographic information directly from the contents of the letters data. This has the added advantage that the code to find the longitude and latitude of locations can be saved as an R script and rerun if new data is added, ensuring that the information is always kept up to date.
## Preparing the data with `dplyr` {#preparing-data}
In this example, I will use the same database of letters sent to Daniel van der Meulen in 1585 as I did in the previous post. You can find the data and the R script that goes along with this tutorial on [GitHub](https://github.com/jessesadler/intro-to-r). Before getting into the database of letters and figuring out how to geocode the locations found in the data, it is necessary to set up the environment in R by loading the libraries that we will be using. Here, I load both the `tidyverse` library to import and manipulate the data and the `ggmap` library to do the actual geocoding and mapping.
```{r libraries}
library(tidyverse)
library(ggmap)
```
As in the previous post, the data is loaded through the `read_csv()` function from the `readr` package. It is best to keep the names of objects consistent across scripts. Therefore, I name the object `letters` with the assignment operator. Printing out the contents reveals that `letters` is a [tibble](http://tibble.tidyverse.org) with 114 letters and four columns.
```{r load data}
letters <- read_csv("data/correspondence-data-1585.csv")
letters
```
To do the actual geocoding of the locations I will be using the `mutate_geocode()` function from the `ggmap` package. To geocode a number of locations at one time, the function requires a data frame with a column containing the locations we would like to geocode. The goal, then, is to get a data frame with a column that contains all of the distinct locations found in the `letters` data frame. In the [introduction to R](https://jessesadler.com/post/excel-vs-r/#new-dataframes) post I used the `distinct()` function to get data frames with the unique sources and destinations. We can rerun that code here and look at the results.
```{r distinct sources/destinations}
sources <- distinct(letters, source)
destinations <- distinct(letters, destination)
```
```{r print sources/destinations}
sources
destinations
```
A glance at the two data frames shows that neither provides exactly what we are looking for. Neither the `sources` nor the `destinations` data frame includes all of the locations that we want to geocode. It would be possible to geocode both the `sources` and `destinations` data frames, but this would place the geocoded information in two different data frames, which is less than ideal. Instead, we can join the two data frames together using the [join functions in dplyr](http://r4ds.had.co.nz/relational-data.html). Coming from the world of spreadsheets, the join functions are a revelation, opening up seemingly endless possibilities for data manipulation.
The `dplyr` package includes a number of functions to join two data frames together to create a single data frame. The join functions use overlapping columns of data contained in both data frames, called keys, to match up the data. There are three join functions that are used most often. The `left_join()` keeps all observations from the left, or first, data frame, while dropping all rows from the right, or second, data frame that do not have a match in the left data frame. The `inner_join()` only keeps rows that contain matching keys in both data frames. Conversely, the `full_join()` brings together all rows from both the left and right data frames. The differences between these functions may not be immediately apparent, but you can experiment with them to see the variety of outputs they create.
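If you want to see those differences for yourself, the three joins can be run side by side. The sketch below is not evaluated in this post; the `by` argument, which declares that the differently named key columns are equivalent, is explained in the next paragraph.
``` {.r}
# Compare the three join types on the same pair of data frames
left_join(sources, destinations, by = c("source" = "destination"))   # all rows from sources
inner_join(sources, destinations, by = c("source" = "destination"))  # only rows found in both
full_join(sources, destinations, by = c("source" = "destination"))   # all rows from both
```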
Using `sources` as the left and `destinations` as the right data frames, a `left_join()` creates a new object with 9 rows, an `inner_join()` results in only one row, and a `full_join()` contains 13 observations. Thus, the `full_join()` is what we are looking for. The `full_join()` function, like the other join functions, takes three arguments: the two data frames to join and the key column by which they are to be joined. In this case, some extra work needs to be done to identify the key columns. The key columns are the only columns in the two data frames, but because they have different names, it is necessary to declare that they are equivalent. This is done with the help of the concatenate or combine function, `c()`.[^5] The below command creates a new data frame that I have called `cities`, which brings together the "source" column with the "destination" column and therefore contains all of the locations found in the `letters` data frame.
```{r join locations}
cities <- full_join(sources, destinations, by = c("source" = "destination"))
cities
```
Printing out the `cities` data frame shows that there are 13 distinct locations in the `letters` data. However, the structure of the `cities` object is less than ideal. The name of the column is "source," which was taken over from the `sources` data frame, but this is not an accurate description of the data in the column. This can be fixed with the help of `rename()`, which uses the structure of `new_name = old_name`. Here, I change the name of the "source" column to "place" and then print out the first two rows with the `head()` function to show the change in the column name. Notice that using the `cities` object within the `rename()` function and using the same name for the result overwrites the original object with the new one. Alternatively, you could give the new object a different name such as `cities1`.[^6]
```{r rename source column}
cities <- rename(cities, place = source)
head(cities, n = 2)
```
## Geocoding with `ggmap` {#geocoding}
We now have an object with the basic structure needed to geocode the locations. However, if you run `mutate_geocode()` on the `cities` object as it is, you will receive an error. The error is a good example of a common frustration with coding. Computers are picky, and because humans also write flawed code, bugs exist, making computers picky in odd ways. In this case, we are running into a problem in which the `mutate_geocode()` function will not work on tibbles. As noted in my previous post, [tibbles](http://r4ds.had.co.nz/tibbles.html) are a special kind of data frame used by the `tidyverse` set of packages. It is usually easier to work with tibbles in the [tidyverse](https://www.tidyverse.org) workflow, but here it is necessary to convert the tibble to a standard data frame object with `as.data.frame()`. I give the result a new name to distinguish it from the `cities` tibble.
```{r tibble to df}
cities_df <- as.data.frame(cities)
```
The locations data from the letters sent to Daniel is now ready to be geocoded. The `mutate_geocode()` function uses [Google Maps](https://maps.google.com) to find the longitude and latitude of each location. The two necessary arguments are the data frame and the name of the column with the location data. The function can be used to find more information about each location, including the country and region, but here I just have the function return the longitude and latitude data. The function will query Google Maps, and so you must have an internet connection. This makes running the command relatively slow, especially if your data contains a large number of locations. There is also a limit of 2,500 queries per day, so you may have to find other methods if you are geocoding thousands of locations.
```{r geocode data, eval = FALSE}
locations_df <- mutate_geocode(cities_df, place)
```
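As mentioned above, the geocoder can return more than longitude and latitude. A sketch of what that might look like, assuming the `output` argument of `geocode()` is passed through by `mutate_geocode()`, is shown below; it is not run here because it uses additional queries, and the object name is hypothetical.
``` {.r}
# Hypothetical variant: ask for extra address components (country, region, etc.)
# rather than just lon and lat
locations_more_df <- mutate_geocode(cities_df, place, output = "more")
```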
```{r load locations data, include = FALSE}
locations <- read_csv("data/locations.csv")
locations_df <- as.data.frame(locations)
```
Because `mutate_geocode()` necessitates an internet connection, is somewhat slow to run, and has a daily limit, it is not something that you want to do all the time. It is a good idea to save the results by writing the object out to a csv file. Before doing this, however, we should inspect the data and make sure everything is correct. Let's start by printing out the `locations_df` object and see what we have.
```{r print locations_df}
locations_df
```
At a quick glance, the result looks like what we would expect. `locations_df` is a data frame with three columns called "place," "lon," and "lat", with the latter two representing the longitude and latitude of the location named in the first column. One thing that we might want to change, though this step is not necessary, is to convert the data frame back to a tibble, which can be done with the `as_tibble()` function.
```{r into tibble}
locations <- as_tibble(locations_df)
```
A quick look at the values of the longitude and latitude columns in the `locations` data would seem to indicate that the geocoding process occurred correctly. All of the longitude and latitude values are within a reasonable range given the cities in the data. However, because the data given to the Google Maps query consisted of only the names of the cities, it is always worth double checking the returned values. As anyone who has ever used Google Maps before knows, it is possible that the query could return the location of a different place that has the same name.[^7]
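One quick check for a suspicious result is to geocode a more specific query and compare the coordinates. Below is a sketch using the Naples example from the footnote; it is not run here because each call counts against the daily query limit.
``` {.r}
# An ambiguous place name can resolve to an unexpected location
geocode("Naples")         # returns Naples, Florida
geocode("Naples, Italy")  # adding the country returns the Italian city
```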
One way to check that the geocoding was done correctly is to map the locations with the [mapview](https://cran.r-project.org/web/packages/mapview/index.html) package. Using `mapview` requires converting the `locations` tibble to yet another format. This is done with the [simple features](https://cran.r-project.org/web/packages/sf/) package, which brings us to the world of GIS with R. The first step is to load the `sf` and `mapview` packages.
```{r load spatial packages}
library(sf)
library(mapview)
```
I will not get into the details of the `sf` package and the type of objects it creates here, but the function to transform the `locations` tibble into an `sf` object is understandable even without knowing the details. The function to make an `sf` object takes three main arguments. The first two are the data frame to be converted and the columns that contain the geographic data. This second argument uses the `c()` function to combine the "lon" and "lat" columns. The third argument determines the [coordinate reference system (crs)](https://en.wikipedia.org/wiki/Spatial_reference_system) or projection used for the data. Here, I indicate that I want the longitude and latitude to be plotted using the World Geodetic System 1984 (WGS84) projection, which is referenced as [European Petroleum Survey Group (EPSG)](http://wiki.gis.com/wiki/index.php/European_Petroleum_Survey_Group) 4326. Geographic jargon aside, what matters at this stage is that EPSG 4326 is the projection used by web maps such as Google Maps.[^8]
``` {.r}
locations_sf <- st_as_sf(locations, coords = c("lon", "lat"), crs = 4326)
```
With the data in the correct format, a simple call to the `mapview()` function creates an interactive map with all the locations plotted as points. You can click on a point to see its name and compare it to the locations on the map. With this data, the accuracy of the location only needs to be at the city level. The location within the city is not relevant. Inspecting the data from the map shows that all of the locations were correctly geocoded.
```{r mapview, eval = FALSE}
mapview(locations_sf)
```
I can now save the locations data using the `readr` package and a function similar to that used to load the data. I will use the `write_csv()` function to save the data as a csv file. Here, I save the `locations` tibble, but you could also save the other forms of the locations data. The second argument tells the function where to save the csv and what to call the file. Here, I place the file in the same folder as "correspondence-data-1585.csv" and name the file "locations.csv".
```{r write locations, eval = FALSE}
write_csv(locations, "data/locations.csv")
```
## Mapping with `ggmap` {#mapping}
Now that we have successfully geocoded the locations from which Daniel's correspondents sent letters and in which Daniel received them, we can move on to the task of plotting the locations on a map with `ggmap`.[^9] `ggmap` is essentially an extension of `ggplot`. It enables you to plot a map as the background of a `ggplot` graph. The two main plotting features of `ggmap` are the `get_map()` function, which downloads a map from a specified map provider, and the `ggmap()` function that plots the downloaded map on a `ggplot` plot. Once a map is downloaded and plotted, it is then possible to use the normal grammar of `ggplot` to visually represent the data on the map.
The `get_map()` function can access data from three different map providers: [Google Maps](http://maps.google.com), [Open Street Maps](http://www.openstreetmap.org), and [Stamen Maps](http://maps.stamen.com/). In this example, I will use Google Maps and the `get_googlemap()` function. The challenge with the `get_map()` function is downloading a map with a zoom level and location that shows all of the data with minimal extra space. We need two pieces of information to do this for `get_googlemap()`: a location for the center of the map and a zoom level between 1 and 21.[^10] Figuring out the best center and zoom level for a map may take some trial and error. However, the work that we have already done can help to make an educated first guess. We can rerun `mapview(locations_sf)` and look at the details provided by the map. The map produced by `mapview()` shows the longitude and latitude at your cursor position and tells the zoom level of the map. We can use the cursor to guess a good center of the map or zoom in to find a city, which we can then geocode. After a couple of tries, I found that Mannheim, Germany works as a good center for a map with a zoom level of 6. I can get the coordinates of Mannheim with the generic `geocode()` function and then place the coordinates into the `get_googlemap()` function. The only alteration that needs to be made is to glue together the longitude and latitude values into one object with the concatenate function, `c()`.
```{r geocode mannheim}
geocode("mannheim")
```
```{r google map}
map <- get_googlemap(center = c(8.4, 49.5), zoom = 6)
```
We can look at what the map looks like by calling the `ggmap()` function.
```{r basic-google-map}
ggmap(map)
```
Given the historical nature of the data, some of the normal features of a Google Map are problematic. The modern road system obviously did not exist in the sixteenth century, nor are the modern political boundaries useful for the data. In the below command, I change the aesthetics of the original map using the color argument and turn off features of the map by using commands from the [Google Maps API](https://developers.google.com/maps/documentation/staticmaps/).
```{r styled map}
bw_map <- get_googlemap(center = c(8.4, 49.5), zoom = 6,
                        color = "bw",
                        style = "feature:road|visibility:off&style=element:labels|visibility:off&style=feature:administrative|visibility:off")
```
Now let's see what the map looks like, but this time let's add the location data. This is done using the normal `ggplot` functions. Here, I want to add points for each place in the `locations` data that we created above. The `aes()` function within `geom_point()` tells `ggplot` that the x-axis corresponds to the longitude and the y-axis to the latitude of each place.
```{r locations-map}
ggmap(bw_map) +
  geom_point(data = locations, aes(x = lon, y = lat))
```
The resulting map is rather sparse and does not provide much information, but it gives a good starting point from which to build a more informative map.
## Adding data with `dplyr` {#adding-data}
The above map used the `locations` data to plot Daniel van der Meulen's correspondence in 1585, but because it did not use the `letters` data, the map could not tell us anything about the number of letters sent from or received in each location. Visualizing this information is a two-step process. Firstly, it is necessary to find out how many letters were sent from and received in each location. Because we need to distinguish between sent and received locations, this data will be contained in two data frames. To do this, we can reuse the `per_source` and `per_destination` code discussed in the [previous post](https://jessesadler.com/post/excel-vs-r/#the-pipe), which summarizes the number of letters sent from and to each location. Secondly, we have to join the longitude and latitude information from the `locations` data frame to the `per_source` and `per_destination` data frames.
As a reminder, to create the `per_source` and `per_destination` objects I will use the `group_by()` and `summarise()` workflow. This groups the data by one or more defined variables and creates a new column that counts the number of observations for each value of the variable(s). Below, I make `per_source` and `per_destination` data frames and print out the `per_destination` object to show its form.
```{r per_source/destination}
per_source <- letters %>%
  group_by(source) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

per_destination <- letters %>%
  group_by(destination) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
```
```{r per_destination}
per_destination
```
The `per_source` and `per_destination` data frames are in the basic structure that we want, but we need to add longitude and latitude columns to both of the objects so that they can be plotted on our map. Here, I will use a `left_join()`. As noted above, a `left_join()` only keeps the observations from the first data frame in the function. In other words, the result of a `left_join()` will have the same number of rows as the original left data frame, while adding the longitude and latitude columns from the `locations` data frame. The data frames will be joined by the columns containing the name of the cities. Because these "key" columns have different names, it is again necessary to denote their equivalency with `c()`. Below, I print out the newly created `geo_per_destination` data frame to show its structure.
```{r geo_per_source/destination}
geo_per_source <- left_join(per_source, locations, by = c("source" = "place"))
geo_per_destination <- left_join(per_destination, locations, by = c("destination" = "place"))
```
```{r geo_per_destination}
geo_per_destination
```
I now have the necessary data to create a map that will distinguish between the sources of the letters and their destinations and will allow me to show the quantities of letters sent from and received in each location.
## Mapping the data {#mapping-data}
Creating a quality visualization with `ggplot` involves iteration. Because the different parts of the plot are all written out in code, aspects can be added, subtracted, or modified until a good balance is found. Let's start by creating a basic map using the `geo_per_source` and `geo_per_destination` data. The structure of the command is similar to that used to make the first map, but now that I am using information from two data frames, I need to use two `geom_point()` functions. There is also a small change to the `ggmap()` function to have the map take up the entire plotting area so that the longitude and latitude scales do not show. The only change to the `geom_point()` function is the addition of different colors for the two sets of points. This makes it easier to distinguish between places from which Daniel's correspondents sent letters and places where he received them. Notice that the argument for the color of the points is placed outside of the `aes()` function. This makes all of the points plotted from each data frame a single color as opposed to mapping a change in color to a variable within the data. In this instance, I chose to specify color by name, but it is also possible to use rgb values, hex values, or a number of different color palettes.[^11]
```{r data-map1, fig.width = 8, fig.asp = 0.925}
ggmap(bw_map) +
  geom_point(data = geo_per_destination,
             aes(x = lon, y = lat), color = "red") +
  geom_point(data = geo_per_source,
             aes(x = lon, y = lat), color = "purple")
```
This plot is much better than what we started with, but it still has a couple of issues. In the first place, it does not communicate any information about the quantity of letters. In addition, because the points are opaque, it is not clear that letters were both sent from and received in Haarlem. The former issue can be rectified by using the size argument within the `aes()` function. This will tell `ggplot` to vary the size of each of the points in proportion to the count column. By default, the size aesthetic creates a legend to indicate the scale used. In the `ggmap()` function I place the legend in the top right corner of the map since there are no data points there. The latter issue is solved by adding an alpha argument to the two `geom_point()` functions. This argument is placed outside of the `aes()` function because it is an aspect we want to apply to all points. [Alpha](https://www.w3schools.com/css/css3_colors.asp) describes the translucency of an object and takes values between 1 (opaque) and 0 (fully transparent).
```{r data-map2, fig.width = 8, fig.asp = 0.875}
ggmap(bw_map) +
  geom_point(data = geo_per_destination,
             aes(x = lon, y = lat, size = count),
             color = "red", alpha = 0.5) +
  geom_point(data = geo_per_source,
             aes(x = lon, y = lat, size = count),
             color = "purple", alpha = 0.5)
```
The above map is more informative, but it is hardly a finished product. For instance, there is no explanation for the differences in the color of the points, the smallest points are not easy to see, and there are no labels to indicate the names of the cities. Let's deal with these issues one at a time to create a more fully fleshed out map. This will serve as an opportunity to demonstrate both the flexibility and complexity of `ggplot` code.[^12]
The different colors for sent and received locations are not defined in a legend in the previous plot, because `ggplot` only creates a legend for arguments within an `aes()` function. Even though the color does not change within the data for each `geom_point()` function, it is possible to place the color in the `aes()` function when used in tandem with `scale_color_manual()`. Inside the `aes()` function the color is associated with a name that will appear in the legend. The actual color to be used is defined in a separate `scale_color_manual()` function with the values argument.[^13]
A similar type of scale function also makes it possible to manually control the minimum and maximum size of the points drawn by the `size = count` argument within the `aes()` functions. For this last plot, I decided to make the minimum size of the points 2 and the maximum size 9. The range of the sizes for points is a good example of a plot element that you can play around with, trying out different sizes until you find one that works. Note, too, that the best sizes of plot elements will also depend on the format in which the finished plot will be presented.
In changing the background map from the default Google Map, I took out the city labels, but this makes it difficult to know which cities are represented by the different points. The plot only contains fourteen points, making it possible to label each point without too much clutter. In `ggplot`, [labels are geoms](http://r4ds.had.co.nz/graphics-for-communication.html#annotations); they are distinct elements placed on a plot with separate geom functions. Labeling points can be done with either `geom_text()` or `geom_label()`. `geom_text()` places text at the indicated position, while `geom_label()` places the text within a white text box. In addition to x and y coordinates, the two geoms require a `label` argument, which indicates the variable that should be used for the text. By default `geom_text()` and `geom_label()` are placed exactly on the x and y coordinates given by the data. It is possible to nudge the placement of the labels with the `nudge_x` and `nudge_y` arguments. However, instead of using these, I will take advantage of the `geom_text_repel()` function from the `ggrepel` package. This package automatically chooses the placement of individual labels to ensure that they do not overlap. Note too that I use the `locations` data frame for the data of the geom, since `locations` contains the longitude and latitude of each point on the map.
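For comparison, the nudge approach would look something like the sketch below, with the offset value chosen by trial and error; the finished plot at the end of this section uses `geom_text_repel()` instead.
``` {.r}
# Sketch of fixed-offset labels: the 0.3 nudge is illustrative, not taken from the post
ggmap(bw_map) +
  geom_point(data = locations, aes(x = lon, y = lat)) +
  geom_text(data = locations, aes(x = lon, y = lat, label = place), nudge_y = 0.3)
```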
The final touches to the map can be made by ensuring that all of the elements are clearly noted. [Labels for the plot itself](http://r4ds.had.co.nz/graphics-for-communication.html#label) can be made with the `labs()` function. For this plot, I will add a descriptive title and change the labels for the two legends. By default, `ggplot` uses the name of the indicated column as the label for the legend. This is shown in the above map, where the size of the point is labeled as "count." The default label can be replaced with a more informative one by indicating the aesthetic to be changed. Here, I will rename the size aesthetic as "Letters." In addition, I chose not to have a label for the color aesthetic, which is indicated with `NULL`. Finally, I increased the size of the points drawn in the color legend. [This is done with the `guides()` function](http://r4ds.had.co.nz/graphics-for-communication.html#legend-layout), which changes the scales of different aspects of the plot. In this case, I use the `override.aes` argument to have the red and purple points in the legend be drawn at `size = 6`.
```{r ggrepel}
library(ggrepel)
```
```{r data-map3, fig.width = 8, fig.asp = 0.8}
ggmap(bw_map) +
  geom_point(data = geo_per_destination,
             aes(x = lon, y = lat, size = count, color = "Destination"),
             alpha = 0.5) +
  geom_point(data = geo_per_source,
             aes(x = lon, y = lat, size = count, color = "Source"),
             alpha = 0.5) +
  scale_color_manual(values = c(Destination = "red", Source = "purple")) +
  scale_size_continuous(range = c(2, 9)) +
  geom_text_repel(data = locations, aes(x = lon, y = lat, label = place)) +
  labs(title = "Correspondence of Daniel van der Meulen, 1585",
       size = "Letters",
       color = NULL) +
  guides(color = guide_legend(override.aes = list(size = 6)))
```
## Conclusion {#concluding}
The above map demonstrates the geographic spread of Daniel van der Meulen's correspondence in 1585 and shows the relative significance of the location of his correspondents and the different places in which he received letters throughout the year. More important than the details of the map for the purposes of this post is the process by which it was created. One interesting feature that I would like to emphasize in concluding is that the map uses data from three different data frames or tables: `geo_per_destination`, `geo_per_source`, and `locations`. All three of these data frames derive from the original `letters` data, while they in turn were the product of yet more data frames. By my count, running the commands contained in this post leads to the creation of 12 different data-frame-like objects. The below diagram outlines the workflow. The ability to split, subset, transform, and then join newly created tables in a variety of ways makes for a very powerful and flexible workflow. Because the data frames can be recreated by running the code, there is minimal overhead in managing them, especially in comparison to creating tables within spreadsheets. In this case, I would only recommend that the `locations` data frame be saved for easy access in other R scripts and sessions. The other objects can be created on demand, or even more can be added, while the individual aspects of the map can be endlessly tweaked using the power of `ggplot`.
[^1]: Geocoding can be done with websites such as [geonames](http://www.geonames.org) and the service provided by [Texas A&M](http://geoservices.tamu.edu/Services/Geocode/). There are also a number of [options for creating maps on the web](http://crln.acrl.org/index.php/crlnews/article/view/16772/18314).
[^2]: See the Introduction of [Robin Lovelace, Jakub Nowosad, and Jannes Muenchow, *Geocomputation with R*](https://bookdown.org/robinlovelace/geocompr/) for a good overview of the GIS capabilities of R.
[^3]: [Alex David Singleton, Seth Spielman, and Chris Brunsdon, “Establishing a Framework for Open Geographic Information Science,”](https://www.cdrc.ac.uk/wp-content/uploads/2016/04/AS_EstablishingAFrameworkforOpenGIS.pdf) *International Journal of Geographical Information Science*, 30 (2016): 1507-1521.
[^4]: A full list of geographic packages for R can be found at [R Packages for the Analysis of Spatial Data](https://cran.r-project.org/web/views/Spatial.html). In future posts I will discuss some of these packages and their capabilities.
[^5]: In this command the quotations around "source" and "destination" are necessary. They tell the function that these are strings of characters.
[^6]: It is good practice to stay away from overwriting objects until you know whether the code works. However, if you mess up, it is always possible to recreate the original object by rerunning the code that created it.
[^7]: For instance, geocoding "Naples" will return the longitude and latitude of Naples, Florida and not Naples, Italy.
[^8]: If you want to get an idea of how an `sf` object differs from a normal tibble or data frame, you can print out the `locations_sf` object. This shows that the longitude and latitude columns have been combined into a special type of column called `simple_feature` and named "geometry", while additional information about the CRS is also provided.
[^9]: For a fuller discussion of `ggmap`, see [David Kahle and Hadley Wickham, "ggmap: Spatial Visualization with ggplot2."](https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf)
[^10]: A zoom level of 1 is the most zoomed out, while 21 is the most zoomed in.
[^11]: [W3 Schools has a good discussion on the use of rgb and hex colors](https://www.w3schools.com/colors/default.asp). For the use of colors with `ggplot`, see Hadley Wickham, *ggplot2: Elegant Graphics for Data Analysis*, Second edition (Springer, 2016), 133–145. A full list of the named colors available in R can be found [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf).
[^12]: For a full discussion of the powers of `ggplot`, see [Wickham, *ggplot2*](http://www.springer.com/us/book/9783319242750).
[^13]: Wickham, *ggplot2*, 142–143.