The wcde package allows R users to easily download data from the
Wittgenstein Centre for Demography and Human Capital Data Explorer.
It also contains a number of helpful functions for working with
education-specific demographic data.
You can install the released version of wcde
from CRAN with:
install.packages("wcde")
Install the development version with:
library(devtools)
install_github("guyabel/wcde", ref = "main")
The get_wcde() function can be used to download data
from the Wittgenstein Centre Human Capital Data Explorer. It takes
three main user inputs:
indicator: a short code for the indicator of interest
scenario: a number referring to an SSP narrative; by default 2 is used (for SSP2)
country_code (or country_name): corresponding to the country of interest
library(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"))
#> # A tibble: 192 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2020-2025 2.16
#> 2 2 Albania 8 No Education 2020-2025 2.31
#> 3 2 Brazil 76 Incomplete Primary 2020-2025 2.16
#> 4 2 Albania 8 Incomplete Primary 2020-2025 2.51
#> 5 2 Brazil 76 Primary 2020-2025 2.16
#> 6 2 Albania 8 Primary 2020-2025 2.17
#> 7 2 Brazil 76 Lower Secondary 2020-2025 1.71
#> 8 2 Albania 8 Lower Secondary 2020-2025 1.88
#> 9 2 Brazil 76 Upper Secondary 2020-2025 1.30
#> 10 2 Albania 8 Upper Secondary 2020-2025 1.61
#> # … with 182 more rows
# download education specific survivorship rates
get_wcde(indicator = "eassr",
country_name = c("Niger", "Korea"))
#> # A tibble: 6,912 × 8
#> scenario name country_code age sex education period eassr
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 2 Niger 562 15--19 Male No Educati… 2020-… 0.987
#> 2 2 Republic of Korea 410 15--19 Male No Educati… 2020-… 0.999
#> 3 2 Niger 562 15--19 Male Incomplete… 2020-… 0.987
#> 4 2 Republic of Korea 410 15--19 Male Incomplete… 2020-… 0.999
#> 5 2 Niger 562 15--19 Male Primary 2020-… 0.989
#> 6 2 Republic of Korea 410 15--19 Male Primary 2020-… 0.999
#> 7 2 Niger 562 15--19 Male Lower Seco… 2020-… 0.990
#> 8 2 Republic of Korea 410 15--19 Male Lower Seco… 2020-… 0.999
#> 9 2 Niger 562 15--19 Male Upper Seco… 2020-… 0.992
#> 10 2 Republic of Korea 410 15--19 Male Upper Seco… 2020-… 0.999
#> # … with 6,902 more rows
The indicator input must match the short code from the indicator
table. The find_indicator()
function can be used to look up
short codes (given in the first column) from the
wic_indicators
data frame:
find_indicator(x = "tfr")
#> # A tibble: 2 × 6
#> indicator description `wcde-v3` wcde-…¹ wcde-…² defin…³
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 etfr Total Fertility Rate by Education projectio… projec… projec… The av…
#> 2 tfr Total Fertility Rate projectio… past-a… past-a… The av…
#> # … with abbreviated variable names ¹`wcde-v2`, ²`wcde-v1`, ³definition_latest
By default, get_wcde() returns data for all available years or
periods. The filter() function in dplyr can
be used to filter data for specific years or periods, for example:
library(tidyverse)
get_wcde(indicator = "e0",
country_name = c("Japan", "Australia")) %>%
filter(period == "2015-2020")
#> # A tibble: 0 × 6
#> # … with 6 variables: scenario <dbl>, name <chr>, country_code <dbl>,
#> # sex <chr>, period <chr>, e0 <dbl>
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020)
#> # A tibble: 44 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.05
#> 2 2 Republic of Korea 410 All 2020 0.999
#> 3 2 China 156 0--4 2020 1.14
#> 4 2 Republic of Korea 410 0--4 2020 1.05
#> 5 2 China 156 5--9 2020 1.16
#> 6 2 Republic of Korea 410 5--9 2020 1.05
#> 7 2 China 156 10--14 2020 1.17
#> 8 2 Republic of Korea 410 10--14 2020 1.07
#> 9 2 China 156 15--19 2020 1.17
#> 10 2 Republic of Korea 410 15--19 2020 1.08
#> # … with 34 more rows
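A filter on a period label that is not in the data returns an empty tibble rather than an error, as in the e0 example above. A minimal sketch of checking the available labels first is given below. It uses base R and a toy data frame (made-up values, not Wittgenstein Centre data) so that it runs without a download.

```r
# Toy stand-in for a get_wcde() result; the values are made up for
# illustration and are not Wittgenstein Centre data.
d <- data.frame(
  name   = c("Japan", "Japan", "Australia", "Australia"),
  period = c("2020-2025", "2025-2030", "2020-2025", "2025-2030"),
  e0     = c(85, 86, 84, 85)
)

# List the period labels that are actually present before filtering.
available <- unique(d$period)
available

# A label that is absent gives zero rows rather than an error.
nrow(d[d$period == "2015-2020", ])

# Filtering on a label taken from `available` returns rows.
first_period <- d[d$period == available[1], ]
first_period
```

The same check should carry over to a real download by inspecting the period (or year) column of the returned tibble before calling filter().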
Past data is only available for selected indicators. These can be viewed using the version column:
wic_indicators %>%
filter(`wcde-v2` == "past-available") %>%
select(1:2)
#> # A tibble: 28 × 2
#> indicator description
#> <chr> <chr>
#> 1 asfr Age-Specific Fertility Rate
#> 2 assr Age-Specific Survival Ratio
#> 3 bmys Mean Years of Schooling by Broad Age
#> 4 bpop Population Size by Broad Age (000's)
#> 5 bprop Educational Attainment Distribution by Broad Age
#> 6 cbr Crude Birth Rate
#> 7 cdr Crude Death Rate
#> 8 e0 Life Expectancy at Birth
#> 9 epop Population Size by Education (000's)
#> 10 ggapedu15 Gender Gap in Educational Attainment (15+)
#> # … with 18 more rows
The filter() function can also be used to restrict
indicators to specific age, sex or education groups:
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020,
age == "All")
#> # A tibble: 2 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.05
#> 2 2 Republic of Korea 410 All 2020 0.999
Country names are guessed using the countrycode package.
get_wcde(indicator = "tfr",
country_name = c("U.A.E", "Espania", "Österreich"))
#> # A tibble: 48 × 5
#> scenario name country_code period tfr
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 United Arab Emirates 784 2020-2025 1.35
#> 2 2 Spain 724 2020-2025 1.19
#> 3 2 Austria 40 2020-2025 1.45
#> 4 2 United Arab Emirates 784 2025-2030 1.39
#> 5 2 Spain 724 2025-2030 1.25
#> 6 2 Austria 40 2025-2030 1.48
#> 7 2 United Arab Emirates 784 2030-2035 1.41
#> 8 2 Spain 724 2030-2035 1.32
#> 9 2 Austria 40 2030-2035 1.51
#> 10 2 United Arab Emirates 784 2035-2040 1.44
#> # … with 38 more rows
The get_wcde() function accepts ISO numeric codes
for countries via the country_code argument:
get_wcde(indicator = "etfr", country_code = c(44, 100))
#> # A tibble: 192 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Bahamas 44 No Education 2020-2025 2.16
#> 2 2 Bulgaria 100 No Education 2020-2025 1.86
#> 3 2 Bahamas 44 Incomplete Primary 2020-2025 2.16
#> 4 2 Bulgaria 100 Incomplete Primary 2020-2025 1.86
#> 5 2 Bahamas 44 Primary 2020-2025 2.16
#> 6 2 Bulgaria 100 Primary 2020-2025 1.86
#> 7 2 Bahamas 44 Lower Secondary 2020-2025 1.71
#> 8 2 Bulgaria 100 Lower Secondary 2020-2025 1.86
#> 9 2 Bahamas 44 Upper Secondary 2020-2025 1.43
#> 10 2 Bulgaria 100 Upper Secondary 2020-2025 1.51
#> # … with 182 more rows
A full list of available countries and region aggregates, and their
codes, can be found in the wic_locations
data frame.
wic_locations
#> # A tibble: 232 × 8
#> name isono conti…¹ region dim wcde-…² wcde-…³ wcde-…⁴
#> <chr> <dbl> <chr> <chr> <chr> <lgl> <lgl> <lgl>
#> 1 World 900 <NA> <NA> area TRUE TRUE TRUE
#> 2 Africa 903 <NA> <NA> area TRUE TRUE TRUE
#> 3 Asia 935 <NA> <NA> area TRUE TRUE TRUE
#> 4 Europe 908 <NA> <NA> area TRUE TRUE TRUE
#> 5 Latin America and the Car… 904 <NA> <NA> area TRUE TRUE TRUE
#> 6 Northern America 905 <NA> <NA> area TRUE TRUE TRUE
#> 7 Oceania 909 <NA> <NA> area TRUE TRUE TRUE
#> 8 Afghanistan 4 Asia South… coun… TRUE TRUE TRUE
#> 9 Albania 8 Europe South… coun… TRUE TRUE TRUE
#> 10 Algeria 12 Africa North… coun… TRUE TRUE TRUE
#> # … with 222 more rows, and abbreviated variable names ¹continent, ²`wcde-v3`,
#> # ³`wcde-v2`, ⁴`wcde-v1`
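As a sketch of how names can be turned into codes for the country_code argument, the lookup below uses a small data frame with four rows copied from the printout above. The object name locs is made up for illustration. With the package loaded, the same match() call could presumably be run against wic_locations itself.

```r
# A few rows copied from the wic_locations printout above
# (name and isono columns only).
locs <- data.frame(
  name  = c("World", "Africa", "Albania", "Algeria"),
  isono = c(900, 903, 8, 12)
)

# Look up the numeric codes for the locations of interest, keeping
# the order in which the names were requested.
wanted <- c("Algeria", "World")
codes <- locs$isono[match(wanted, locs$name)]
codes
```

The resulting vector has the shape expected by country_code, for example get_wcde(indicator = "tfr", country_code = codes).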
By default, get_wcde() returns data for the Medium (SSP2)
scenario. Results for different SSP scenarios can be returned by passing
a different scenario value (or multiple values) to the scenario
argument in get_wcde().
get_wcde(indicator = "growth",
country_name = c("India", "China"),
scenario = c(1:3, 22, 23)) %>%
filter(period == "2095-2100")
#> # A tibble: 10 × 5
#> scenario name country_code period growth
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 India 356 2095-2100 -1.05
#> 2 1 China 156 2095-2100 -1.11
#> 3 2 India 356 2095-2100 -0.545
#> 4 2 China 156 2095-2100 -1.03
#> 5 3 India 356 2095-2100 0.170
#> 6 3 China 156 2095-2100 -0.428
#> 7 22 India 356 2095-2100 -0.545
#> 8 22 China 156 2095-2100 -1.03
#> 9 23 India 356 2095-2100 -0.545
#> 10 23 China 156 2095-2100 -1.03
Set include_scenario_names = TRUE to include columns
with the full and abbreviated names of the scenarios:
get_wcde(indicator = "tfr",
country_name = c("Kenya", "Nigeria", "Algeria"),
scenario = 1:3,
include_scenario_names = TRUE) %>%
filter(period == "2045-2050")
#> # A tibble: 9 × 7
#> scenario scenario_name scenario_abb name countr…¹ period tfr
#> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 Rapid Development (SSP1) SSP1 Kenya 404 2045-… 1.62
#> 2 1 Rapid Development (SSP1) SSP1 Nigeria 566 2045-… 2.62
#> 3 1 Rapid Development (SSP1) SSP1 Algeria 12 2045-… 1.52
#> 4 2 Medium (SSP2) SSP2 Kenya 404 2045-… 2.32
#> 5 2 Medium (SSP2) SSP2 Nigeria 566 2045-… 3.75
#> 6 2 Medium (SSP2) SSP2 Algeria 12 2045-… 2.04
#> 7 3 Stalled Development (SSP3) SSP3 Kenya 404 2045-… 3.02
#> 8 3 Stalled Development (SSP3) SSP3 Nigeria 566 2045-… 4.83
#> 9 3 Stalled Development (SSP3) SSP3 Algeria 12 2045-… 2.66
#> # … with abbreviated variable name ¹country_code
Additional details of the pathways for each numeric scenario code can
be found in the wic_scenarios object. Further background
and links to the corresponding literature are provided in the Data Explorer.
wic_scenarios
#> # A tibble: 9 × 6
#> scenario_name scena…¹ scena…² wcde-…³ wcde-…⁴ wcde-…⁵
#> <chr> <dbl> <chr> <lgl> <lgl> <lgl>
#> 1 Rapid Development (SSP1) 1 SSP1 TRUE TRUE TRUE
#> 2 Medium (SSP2) 2 SSP2 TRUE TRUE TRUE
#> 3 Stalled Development (SSP3) 3 SSP3 TRUE TRUE TRUE
#> 4 Inequality (SSP4) 4 SSP4 TRUE FALSE TRUE
#> 5 Conventional Development (SSP5) 5 SSP5 TRUE FALSE TRUE
#> 6 Medium - Zero Migration (SSP2-ZM) 22 SSP2-ZM TRUE TRUE FALSE
#> 7 Medium - Double Migration (SSP2-DM) 23 SSP2-DM TRUE TRUE FALSE
#> 8 Medium - Constant Enrolment Rate (SSP… 20 SSP2-C… FALSE FALSE TRUE
#> 9 Medium - Fast Track Education (SSP2-F… 21 SSP2-FT FALSE FALSE TRUE
#> # … with abbreviated variable names ¹scenario, ²scenario_abb, ³`wcde-v3`,
#> # ⁴`wcde-v2`, ⁵`wcde-v1`
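Because not every scenario exists in every version, it can help to check a set of scenario codes against the version flags before requesting them. The sketch below copies the scenario codes and TRUE/FALSE flags from the printout above into a toy data frame. The column names v3, v2 and v1 are shortened stand-ins for `wcde-v3`, `wcde-v2` and `wcde-v1`.

```r
# Scenario codes and version flags copied from the wic_scenarios
# printout above (columns renamed v3/v2/v1 for brevity).
scen <- data.frame(
  scenario = c(1, 2, 3, 4, 5, 22, 23, 20, 21),
  v3 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  v2 = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  v1 = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Keep only the requested scenario codes that are flagged as
# available in version 2.
requested <- c(1, 4, 22)
ok_v2 <- requested[requested %in% scen$scenario[scen$v2]]
ok_v2
```

Here scenario 4 (SSP4) is dropped because its `wcde-v2` flag is FALSE in the table.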
Data for all countries can be obtained by not setting
country_name or country_code:
get_wcde(indicator = "mage")
#> # A tibble: 3,876 × 5
#> scenario name country_code year mage
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 2020 40.1
#> 2 2 Myanmar 104 2020 24.6
#> 3 2 Burundi 108 2020 11.5
#> 4 2 Belarus 112 2020 35.9
#> 5 2 Cambodia 116 2020 22.0
#> 6 2 Algeria 12 2020 23.5
#> 7 2 Cameroon 120 2020 13.5
#> 8 2 Canada 124 2020 35.9
#> 9 2 Cape Verde 132 2020 21.8
#> 10 2 Central African Republic 140 2020 10.7
#> # … with 3,866 more rows
The get_wcde() function needs to be called multiple
times to download multiple indicators. This can be done using the
map() function in purrr:
mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi
#> # A tibble: 3 × 2
#> ind d
#> <chr> <list>
#> 1 odr <tibble [3,876 × 5]>
#> 2 nirate <tibble [3,648 × 5]>
#> 3 ggapedu25 <tibble [23,256 × 6]>
mi %>%
filter(ind == "odr") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 3,876 × 5
#> scenario name country_code year odr
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 2020 0.347
#> 2 2 Myanmar 104 2020 0.0930
#> 3 2 Burundi 108 2020 0.0486
#> 4 2 Belarus 112 2020 0.246
#> 5 2 Cambodia 116 2020 0.0790
#> 6 2 Algeria 12 2020 0.0937
#> 7 2 Cameroon 120 2020 0.0505
#> 8 2 Canada 124 2020 0.268
#> 9 2 Cape Verde 132 2020 0.0792
#> 10 2 Central African Republic 140 2020 0.0501
#> # … with 3,866 more rows
mi %>%
filter(ind == "nirate") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 3,648 × 5
#> scenario name country_code period nirate
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 2020-2025 -10.7
#> 2 2 Myanmar 104 2020-2025 7.46
#> 3 2 Burundi 108 2020-2025 28.0
#> 4 2 Belarus 112 2020-2025 -5.95
#> 5 2 Cambodia 116 2020-2025 12.8
#> 6 2 Algeria 12 2020-2025 17.3
#> 7 2 Cameroon 120 2020-2025 27.0
#> 8 2 Canada 124 2020-2025 1.58
#> 9 2 Cape Verde 132 2020-2025 11.8
#> 10 2 Central African Republic 140 2020-2025 33.4
#> # … with 3,638 more rows
mi %>%
filter(ind == "ggapedu25") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 23,256 × 6
#> scenario name country_code year education ggapedu25
#> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 2020 No Education -4.63e- 3
#> 2 2 Myanmar 104 2020 No Education -4.30e- 2
#> 3 2 Burundi 108 2020 No Education 1.47e- 1
#> 4 2 Belarus 112 2020 No Education -5.76e- 4
#> 5 2 Cambodia 116 2020 No Education -1.19e- 1
#> 6 2 Algeria 12 2020 No Education -1.63e- 1
#> 7 2 Cameroon 120 2020 No Education -1.02e- 1
#> 8 2 Canada 124 2020 No Education 1.36e-20
#> 9 2 Cape Verde 132 2020 No Education 2.61e- 2
#> 10 2 Central African Republic 140 2020 No Education -3.13e- 1
#> # … with 23,246 more rows
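An alternative to the nested tibble is to keep each download in a named list, so that a single indicator can be pulled out by its short code. The sketch below uses base R only. toy_get() is a hypothetical offline stand-in for get_wcde(), so the values it returns are placeholders.

```r
# Hypothetical offline stand-in for get_wcde(): returns a tiny data
# frame whose value column is named after the indicator.
toy_get <- function(indicator) {
  out <- data.frame(scenario = 2, name = "Toy", value = 0)
  names(out)[3] <- indicator
  out
}

inds <- c("odr", "nirate", "ggapedu25")

# One call per indicator, stored under the indicator's short code.
res <- setNames(lapply(inds, toy_get), inds)

names(res)
names(res$nirate)
```

Swapping toy_get for get_wcde should give the real data under the same names, for example res$odr.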
Previous versions of projections from the Wittgenstein Centre for
Demography are available by setting the version argument in
get_wcde() to "wcde-v1" or "wcde-v2";
"wcde-v3" has been the default since 2024.
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"),
version = "wcde-v2")
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2015-2020 2.47
#> 2 2 Albania 8 No Education 2015-2020 1.88
#> 3 2 Brazil 76 Incomplete Primary 2015-2020 2.47
#> 4 2 Albania 8 Incomplete Primary 2015-2020 1.88
#> 5 2 Brazil 76 Primary 2015-2020 2.47
#> 6 2 Albania 8 Primary 2015-2020 1.88
#> 7 2 Brazil 76 Lower Secondary 2015-2020 1.89
#> 8 2 Albania 8 Lower Secondary 2015-2020 1.9
#> 9 2 Brazil 76 Upper Secondary 2015-2020 1.37
#> 10 2 Albania 8 Upper Secondary 2015-2020 1.57
#> # … with 194 more rows
Note that not all indicators and scenarios are available in all versions;
see the wic_indicators and wic_scenarios
objects (above) for further details.
If you have trouble connecting to the IIASA server, you can try
backup versions using the server option in
get_wcde(), which can be set to "github" or "1&1".
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"),
version = "wcde-v2", server = "github")
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2015-2020 2.47
#> 2 2 Albania 8 No Education 2015-2020 1.88
#> 3 2 Brazil 76 Incomplete Primary 2015-2020 2.47
#> 4 2 Albania 8 Incomplete Primary 2015-2020 1.88
#> 5 2 Brazil 76 Primary 2015-2020 2.47
#> 6 2 Albania 8 Primary 2015-2020 1.88
#> 7 2 Brazil 76 Lower Secondary 2015-2020 1.89
#> 8 2 Albania 8 Lower Secondary 2015-2020 1.9
#> 9 2 Brazil 76 Upper Secondary 2015-2020 1.37
#> 10 2 Albania 8 Upper Secondary 2015-2020 1.57
#> # … with 194 more rows
You may also set server = "search-available" to search
through the three possible data locations and download the data wherever
it is available.
Population data for a range of age-sex-educational attainment
combinations can be obtained by setting indicator = "pop"
in get_wcde() and specifying the pop_age,
pop_sex and pop_edu arguments. By default, each
of the three population breakdown arguments is set to "total":
get_wcde(indicator = "pop", country_name = "India")
#> # A tibble: 17 × 5
#> scenario name country_code year pop
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 India 356 2020 1389966.
#> 2 2 India 356 2025 1445480.
#> 3 2 India 356 2030 1501725.
#> 4 2 India 356 2035 1548067.
#> 5 2 India 356 2040 1583687.
#> 6 2 India 356 2045 1607695.
#> 7 2 India 356 2050 1620358.
#> 8 2 India 356 2055 1625062.
#> 9 2 India 356 2060 1622572.
#> 10 2 India 356 2065 1612143.
#> 11 2 India 356 2070 1594676.
#> 12 2 India 356 2075 1570024.
#> 13 2 India 356 2080 1539493.
#> 14 2 India 356 2085 1504981.
#> 15 2 India 356 2090 1468261.
#> 16 2 India 356 2095 1430167.
#> 17 2 India 356 2100 1391608.
The pop_age argument can be set to "all" to
get population data broken down into five-year age groups. The
pop_sex argument can be set to "both" to get
population data broken down into female and male groups. The
pop_edu argument can be set to "four", "six" or "eight"
to get population data broken
down into education categorizations with different levels of detail.
get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
#> # A tibble: 85 × 6
#> scenario name country_code year education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <dbl>
#> 1 2 World 900 2020 Under 15 2012336.
#> 2 2 World 900 2020 No Education 756762.
#> 3 2 World 900 2020 Primary 1208824.
#> 4 2 World 900 2020 Secondary 2883491.
#> 5 2 World 900 2020 Post Secondary 943560.
#> 6 2 World 900 2025 Under 15 2002922.
#> 7 2 World 900 2025 No Education 724867.
#> 8 2 World 900 2025 Primary 1212577.
#> 9 2 World 900 2025 Secondary 3114657.
#> 10 2 World 900 2025 Post Secondary 1096623.
#> # … with 75 more rows
The population breakdown arguments can be used in combination to provide further breakdowns, for example sex- and education-specific population totals:
get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
#> # A tibble: 238 × 7
#> scenario name country_code year sex education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <dbl>
#> 1 2 World 900 2020 Male Under 15 1037900.
#> 2 2 World 900 2020 Male No Education 308168.
#> 3 2 World 900 2020 Male Incomplete Primary 197055.
#> 4 2 World 900 2020 Male Primary 426676.
#> 5 2 World 900 2020 Male Lower Secondary 623289.
#> 6 2 World 900 2020 Male Upper Secondary 848609.
#> 7 2 World 900 2020 Male Post Secondary 484476.
#> 8 2 World 900 2020 Female Under 15 974436.
#> 9 2 World 900 2020 Female No Education 448594.
#> 10 2 World 900 2020 Female Incomplete Primary 186376.
#> # … with 228 more rows
The full age-sex-education specific data can also be obtained by
setting indicator = "epop"
in get_wcde()
.
Create population pyramids by setting male population values to their negative equivalents, so that the male and female columns diverge from the y axis.
w <- get_wcde(indicator = "pop", country_code = 900,
pop_age = "all", pop_sex = "both", pop_edu = "four",
version = "wcde-v2")
w
#> # A tibble: 6,510 × 8
#> scenario name country_code year age sex education pop
#> <dbl> <fct> <dbl> <int> <fct> <fct> <fct> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 172362.
#> 2 2 World 900 1950 0--4 Male No Education 0
#> 3 2 World 900 1950 0--4 Male Primary 0
#> 4 2 World 900 1950 0--4 Male Secondary 0
#> 5 2 World 900 1950 0--4 Male Post Secondary 0
#> 6 2 World 900 1950 0--4 Female Under 15 166026.
#> 7 2 World 900 1950 0--4 Female No Education 0
#> 8 2 World 900 1950 0--4 Female Primary 0
#> 9 2 World 900 1950 0--4 Female Secondary 0
#> 10 2 World 900 1950 0--4 Female Post Secondary 0
#> # … with 6,500 more rows
w <- w %>%
mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
pop_pm = pop_pm/1e3)
w
#> # A tibble: 6,510 × 9
#> scenario name country_code year age sex education pop pop_pm
#> <dbl> <fct> <dbl> <int> <fct> <fct> <fct> <dbl> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 172362. -172.
#> 2 2 World 900 1950 0--4 Male No Education 0 0
#> 3 2 World 900 1950 0--4 Male Primary 0 0
#> 4 2 World 900 1950 0--4 Male Secondary 0 0
#> 5 2 World 900 1950 0--4 Male Post Secondary 0 0
#> 6 2 World 900 1950 0--4 Female Under 15 166026. 166.
#> 7 2 World 900 1950 0--4 Female No Education 0 0
#> 8 2 World 900 1950 0--4 Female Primary 0 0
#> 9 2 World 900 1950 0--4 Female Secondary 0 0
#> 10 2 World 900 1950 0--4 Female Post Secondary 0 0
#> # … with 6,500 more rows
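The sign flip and rescaling can be seen on a two-row toy example. The values below are made up for illustration and are in thousands, like the pop column.

```r
# Toy values in thousands (made up for illustration).
toy <- data.frame(
  sex = c("Male", "Female"),
  pop = c(2000, 1500)
)

# Males become negative, then thousands are rescaled to millions.
toy$pop_pm <- ifelse(toy$sex == "Male", -toy$pop, toy$pop) / 1e3
toy$pop_pm

# abs() recovers positive values, which is what labels = abs
# relies on when the axis is drawn.
abs(toy$pop_pm)
```

Male bars are then drawn to the left of zero and female bars to the right, while the axis labels stay positive.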
Use standard ggplot code to create a population pyramid with:
scale_x_symmetric() from the lemon package, to allow for equal male and female x-axes
the wic_col4 object in the wcde package, which contains the names of the colours used in the Wittgenstein Centre Human Capital Data Explorer
Note that wic_col6 and wic_col8 objects also
exist for equivalent plots of population data objects with corresponding
numbers of categories of education.
library(lemon)
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_symmetric(labels = abs) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
theme_bw()
Add male and female labels on the x-axis by faceting on sex, using
geom_blank() to allow for equal x-axes and
additional space at the end of the largest columns.
w <- w %>%
mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin( b = 0, t = 0)))
Animate the pyramid through the past data and projection periods
using the transition_time() function in the gganimate package:
library(gganimate)
ggplot(data = w,
mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
transition_time(time = year) +
labs(x = "Population (millions)", y = "Age",
title = 'SSP2 World Population {round(frame_time)}')