After a long absence today an easy guide to print maps with ggplot.
First we need a map. As example we use a map of Austria on local community (Gemeinden) level.
An ideal data format ist JSON. For Austria one source for this maps is here:
https://github.com/ginseng666/GeoJSON-TopoJSON-Austria
You can choose between three detail levels. For our needs we can stick with the lowest detail level. So the file we are interested in is: "gemeinden_999_geo.json"
What to show? For experimental purposes we use the vaccination rate on local community level, which can be downloaded here. Of course it is crucial to have the same detail level in your data as in your map!:
https://info.gesundheitsministerium.at/opendata
The file we are looking for is here:
https://www.data.gv.at/katalog/dataset/covid-19-schutzimpfungen-impfungen-in-gemeinden/resource/89ad8077-0318-4dfb-953f-0824754b5adc
I am not sure if this link is persistent, so you have to look for a file named impfungen-gemeinden.csv
As we got all of our data it is time to combine the data and show the map.
First the libraries we need:
library(tidyverse)
library(janitor)
library(sf)
Then we read the geo data out of the json file and calculate the center of the geo data. I will not use this calculations for this post, but it is nice to have this information. A typical way of using it would be to print the name (or an abbrevation) at exactly the center of the local community. In this case as there are over 2000 communities it is not very useful.
austria <- sf::st_read(("gemeinden_999_geo.json"), quiet=TRUE, stringsAsFactors=FALSE) %>%
mutate(
center = map(geometry, sf::st_centroid),
centercoord = map(center, sf::st_coordinates),
ccordx = map_dbl(centercoord, 1),
ccordy = map_dbl(centercoord, 2)) %>%
mutate(gkz = as.numeric(iso)) %>%
select(-iso)
Because of practical reasons I rename the column "iso" to "gkz", which is the better name for "Gemeindekennzahl" = community code and transform it into a numeric.
The the vaccination rate:
vac.raw = read.csv("impfungen-gemeinden.csv", sep=";") %>%
janitor::clean_names() %>%
transmute(gkz= gemeindecode,
vip100=vollimmunisierte_pro100)
The vaccination rate, above as "vip100", is measured in number of fully vaccinated people out of 100. I use the janitor package to get standard column names and rename "gemeindecode" to "gkz". Not by chance the same name as the community code in our austria map data.
Finally we combine geo data and vaccination data:
data =
austria %>%
filter(gkz!=90001) %>%
left_join(vac)
I remove one community with the number 90001 because this is a special case. 90001 is the community code for whole Vienna, which can be divided into districts "Bezirke". The geo data file above includes Vienna as a whole and its districts. As the vaccination data is shown on Vienna district level i can remove Vienna as whole. Vienna and its districts are overlaying, so normally you would have to remove either one or the other.
Done! We can now print the map:
data %>%
ggplot() +
geom_sf(aes(fill=vip100),
size=.3)+
scale_color_discrete(guide="none")+
scale_fill_gradient(low="white",high = "darkblue")+
coord_sf(datum=NA) + # datum=NA to supress long and lat labels
theme_minimal()+
labs(x="",y="",
title="Covid 19 vaccination rate in %",
subtitle ="2021-07-30",
caption= "Source: https://github.com/ginseng666/GeoJSON-TopoJSON-Austria, Österreichisches Gesundheitsministerium") +
theme(panel.border = element_blank(),
legend.title = element_blank()) +
list()
geom_sf does the trick and fill=vip100 says what to use as fill value. You can spend a lot of time to find the right fill gradient values for low (the lowest number) and high (the highest number) if you want to.
Have fun, more in Part 2.
Comments
Post a Comment