In this vignette, we provide an overview of the list of DHS indicators currently implemented in surveyPrev and the process to add new indicators or create customized indicators.
First, we load the surveyPrev package and any packages used later by the customized indicator-processing function. In our example, these are dplyr and labelled.
library(surveyPrev)
library(dplyr)
library(labelled)
library(kableExtra)
To use getDHSdata() to download the relevant DHS data directly from the DHS website, we need to
- register with DHS to gain data access, and
- set up the DHS account details in R using the rdhs package, i.e.,
rdhs::set_rdhs_config(email = "your_email", project = "your_registered_DHS_project_title")
1 Built-in indicators
Currently, the surveyPrev package supports 22 indicators, listed in Table 1. The list of indicators and their IDs can be found in the surveyPrevIndicators dataset.
data(surveyPrevIndicators)
head(surveyPrevIndicators)
Table 1: Built-in indicators in surveyPrev.

ID | alternative.ID | Description |
---|---|---|
AN_ANEM_W_ANY | womananemia | Percentage of women aged 15-49 classified as having any anemia |
AN_NUTS_W_THN | | Underweight (Prevalence of underweight (BMI < 18.5) women of reproductive age) |
CH_DIAT_C_ORT | | Diarrhea treatment (Children under five with diarrhea treated with either ORS or RHF) |
CH_VACC_C_BAS | | Children with all 8 basic vaccinations (age 12-23 or 24-35 months; see the Guide to DHS Statistics for the different definitions) |
CH_VACC_C_DP1 | | Percentage of children 12-23 (or 24-35) months who had received DPT 1 vaccination |
CH_VACC_C_DP3 | DPT3 | Percentage of children 12-23 (or 24-35) months who had received DPT 3 vaccination |
CH_VACC_C_MSL | | Percentage of children 12-23 (or 24-35) months who had received MCV 1 (measles-containing vaccine) |
CH_VACC_C_NON | | Children with no vaccinations (age 12-23 or 24-35 months; see the Guide to DHS Statistics for the different definitions) |
CM_ECMR_C_NNR | nmr | Probability of dying in the first month of life in the five or ten years preceding the survey |
CN_ANMC_C_ANY | | Children under five with any anemia |
CN_BRFS_C_EXB | | Children exclusively breastfed (Prevalence of exclusive breastfeeding of children under six months of age) |
CN_NUTS_C_HA2 | stunting | Stunting rate (Prevalence of stunted (HAZ < -2) children under five (0-59 months)) |
CN_NUTS_C_WH2 | wasting | Wasting rate (Prevalence of wasted (WHZ < -2) children under five (0-59 months)) |
FP_CUSA_W_MOD | | Modern contraceptive prevalence rate (Married women currently using any modern method of contraception) |
FP_NADA_W_UNT | unmet_family | Women with an unmet need for family planning for spacing and limiting |
HA_HIVP_B_HIV | | HIV prevalence |
ML_NETP_H_IT2 | | Households with at least one insecticide-treated mosquito net (ITN) for every two persons who stayed in the household the previous night |
RH_ANCN_W_N4P | ancvisit4+ | Antenatal visits for pregnancy: 4+ visits |
RH_DELA_C_SKP | | Assistance during delivery from a skilled provider |
WS_SRCE_P_BAS | | Population using a basic water source |
WS_TLET_H_IMP | sanitation | Percentage of households using an improved sanitation facility |
WS_TLET_P_BAS | | Population with access to a basic sanitation service |
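Every indicator has an ID, and some also have a shorter alternative ID. To check which indicator a given key refers to before passing it to getDHSindicator(), one can filter the indicator table on both columns. The sketch below is a hypothetical helper (find_indicator is not part of surveyPrev); it assumes the dataset's columns are named ID, alternative.ID and Description as in Table 1, and uses a few rows copied from Table 1 so that it runs without the package.

```r
# Hypothetical helper: return the row(s) whose ID or alternative.ID equals `key`.
# %in% is used so that a missing alternative.ID (NA) never matches.
find_indicator <- function(indicators, key) {
  hit <- indicators$ID %in% key | indicators$alternative.ID %in% key
  indicators[hit, , drop = FALSE]
}

# A few rows copied from Table 1 (not the full surveyPrevIndicators dataset)
toy_indicators <- data.frame(
  ID = c("FP_NADA_W_UNT", "RH_ANCN_W_N4P", "HA_HIVP_B_HIV"),
  alternative.ID = c("unmet_family", "ancvisit4+", NA),
  Description = c("Women with an unmet need for family planning for spacing and limiting",
                  "Antenatal visits for pregnancy: 4+ visits",
                  "HIV prevalence"),
  stringsAsFactors = FALSE
)

# Look up by alternative ID and by ID
find_indicator(toy_indicators, "unmet_family")$ID          # "FP_NADA_W_UNT"
find_indicator(toy_indicators, "HA_HIVP_B_HIV")$Description
```

With surveyPrev loaded, the same call should work on the real table, e.g. find_indicator(surveyPrevIndicators, "ancvisit4+"), provided the column names match those shown above.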
The indicators in Table 1 can be processed directly within surveyPrev using the getDHSdata() and getDHSindicator() functions. Both the ID and the alternative ID in the surveyPrevIndicators dataset can be used to retrieve an indicator. getDHSindicator() processes the raw survey data into a data.frame in which the column titled value is the indicator of interest. The data.frame also contains the cluster ID, household ID, survey weight and strata information. This data format allows a svydesign object to be defined with the survey package. For example,
indicator <- "unmet_family"
year <- 2018
country <- "Zambia"
dhsData1 <- getDHSdata(country = country, indicator = indicator, year = year)
data1 <- getDHSindicator(dhsData1, indicator = indicator)
head(data1)
## cluster householdID v024 weight strata value
## 1 1 1 eastern 1892890 rural 1
## 2 1 2 eastern 1892890 rural 1
## 3 1 3 eastern 1892890 rural 1
## 4 1 4 eastern 1892890 rural 0
## 5 1 7 eastern 1892890 rural 0
## 6 1 9 eastern 1892890 rural 0
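Because the processed data.frame carries cluster, weight and strata columns, it can be turned into a design object with svydesign() and summarized with svymean() from the survey package. The sketch below uses made-up rows with the same columns as data1 so that it runs offline. Defining the design strata as region (v024) crossed with urban/rural is an assumption made for this illustration; check the sampling design of the survey you analyse before reusing it.

```r
library(survey)

# Made-up rows with the same columns as the getDHSindicator() output above
toy <- data.frame(
  cluster     = rep(1:4, each = 3),
  householdID = rep(1:3, times = 4),
  v024        = rep(c("eastern", "western"), each = 6),
  strata      = rep(c("rural", "urban"), each = 6),
  weight      = rep(c(1892890, 1500000, 2100000, 950000), each = 3),
  value       = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1)
)

# Assumption: design strata = region x urban/rural (two clusters per stratum here)
toy$design_strata <- interaction(toy$v024, toy$strata, drop = TRUE)

# Clusters are the primary sampling units; DHS weights are the sampling weights
design <- svydesign(ids = ~cluster, strata = ~design_strata,
                    weights = ~weight, data = toy, nest = TRUE)

# Weighted prevalence of the indicator with a design-based standard error
est <- svymean(~value, design)
est
```

With real data, toy would be replaced by data1. The point estimate equals the weighted mean of value; the design only changes the standard error.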
If the DHS download through the API fails, you can also download the file manually from the DHS website and read it into R. The getDHSdata() function displays a message specifying which file is used (e.g., the Individual Recode file for the ANC visit indicator).
2 New indicators
Details on how standard DHS indicators are defined can be found by searching for an indicator in the Guide to DHS Statistics: https://dhsprogram.com/data/Guide-to-DHS-Statistics/. Code for creating most standard DHS indicators can be found on the DHS GitHub site: https://github.com/DHSProgram/DHS-Indicators-R. The indicators are organized by chapter, following the Guide to DHS Statistics.
To use surveyPrev to create a new indicator that is not already built into the package, we need to specify
- which DHS data file to download, and
- a customized function to process the indicator from the raw DHS survey data.
2.1 DHS dataset types
The table below lists the different types of DHS data and their naming conventions in surveyPrev. You can find more details on this website.
Name | Recode |
---|---|
MRdata | Men’s Recode |
PRdata | Household Member Recode |
KRdata | Children’s Recode |
BRdata | Births Recode |
CRdata | Couples’ Recode |
HRdata | Household Recode |
IRdata | Individual Recode |
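The short data names (e.g. IRdata) are what the DHS indicator code refers to, while getDHSdata() takes the Recode label through its Recode argument. A small lookup keeps the two in sync. The sketch below is a hypothetical named vector built from the table above; it assumes the Recode argument accepts the labels exactly as written there, with straight apostrophes as in the getDHSdata() call for multiple datasets.

```r
# Hypothetical lookup built from the table above: data name -> Recode label
recode_labels <- c(
  MRdata = "Men's Recode",
  PRdata = "Household Member Recode",
  KRdata = "Children's Recode",
  BRdata = "Births Recode",
  CRdata = "Couples' Recode",
  HRdata = "Household Recode",
  IRdata = "Individual Recode"
)

# Translate the data name found in the DHS code into a Recode label
recode_labels[["IRdata"]]   # "Individual Recode"
```

The result can then be passed as Recode = recode_labels[["IRdata"]] when downloading the data.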
2.2 Function to process the indicator
Let’s take Current use of any modern method of contraception (all women) as an example. For users familiar with the standard indicators defined by the DHS Data Indicator API, the indicator ID is “FP_CUSA_W_MOD”. We will go through the steps to create the customized function below.
Step 1: Search for the indicator ID or keywords in the Guide to DHS Statistics, and then identify which chapter it is in.
For our example, we can search for either “FP_CUSA_W_MOD” or “contraceptive”; it is in Chapter 7: Family Planning.
Step 2: Download IndicatorList.xlsx from the DHS GitHub site and search for the keyword again in the corresponding chapter to find out
- which DHS data recode is used to create this indicator,
- which file contains the code to create this indicator in the DHS GitHub repository, and
- what the corresponding variable name is in the DHS GitHub repository.
For our example, since we are looking for the indicator of all women currently using any modern method of contraception, we identify the cell “currently use any modern method” and find that the code to process this indicator is in the FP_USE.do file (FP_USE.R in the R repository) and that we need IRdata (Individual Recode). We also identify that the variable name used in the R code on the GitHub repository is “fp_cruse_mod”. We will use all three pieces of information in the next step to find the code that processes the indicator.
Step 3: In the GitHub repository, we open the folder for Chapter 7, open the file FP_USE.R, and search for “fp_cruse_mod” to find the relevant code.
We extract the following code chunk for this indicator, which takes the IR data and performs a few data-cleaning steps.
# Currently use modern method
IRdata <- IRdata %>%
  mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
  set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
  set_variable_labels(fp_cruse_mod = "Currently used any modern method")
We can use this code chunk to define a new function to be used in the getDHSindicator() function. The self-defined function should
- take the Recode data.frame as input and return the same data.frame with the indicator added, and
- rename the indicator variable to "value" at the end.
The example below creates a fp_cruse_mod function for “Current use of any modern method of contraception (all women)”, which can be recognized by the getDHSindicator() function later.
fp_cruse_mod <- function(RData) {
  # Recode current use of a modern method (v313 == 3) into a 0/1 indicator
  IRdata <- RData %>%
    mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
    set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
    set_variable_labels(fp_cruse_mod = "Currently used any modern method")
  # getDHSindicator() expects the indicator column to be named "value"
  colnames(IRdata)[colnames(IRdata) == "fp_cruse_mod"] <- "value"
  return(IRdata)
}
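Before downloading any data, it can help to check the recoding on a few made-up rows. The sketch below uses a base-R rewrite of the same logic, under the hypothetical name fp_cruse_mod_base, so that it runs without dplyr or labelled. Unlike fp_cruse_mod, it does not attach value or variable labels; only the 0/1 coding and the renaming to "value" are reproduced.

```r
# Hypothetical base-R equivalent of fp_cruse_mod (no value/variable labels attached)
fp_cruse_mod_base <- function(RData) {
  # v313 == 3 (modern method) -> 1, any other code -> 0; a missing v313 stays NA
  RData$fp_cruse_mod <- ifelse(RData$v313 == 3, 1, 0)
  # Rename the indicator column to "value", as required above
  colnames(RData)[colnames(RData) == "fp_cruse_mod"] <- "value"
  RData
}

# Made-up rows standing in for the Individual Recode
toy_ir <- data.frame(v001 = c(1, 1, 2, 2, 3), v313 = c(0, 3, 2, 3, NA))
out <- fp_cruse_mod_base(toy_ir)
out$value   # 0 1 0 1 NA
```

The last row shows that a respondent with a missing v313 remains missing rather than being counted as a non-user, which is the same behaviour as the ifelse() call in fp_cruse_mod.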
Finally, after creating the function fp_cruse_mod, we can create the indicator as follows:
- Use the getDHSdata() function to download the relevant DHS dataset, using the DHS data type identified in Step 2. In this example we specify Recode = "Individual Recode" and indicator = NULL. Recode should be one of the Recodes listed in the table in Section 2.1; we can also set Recode = NULL, in which case all available DHS data types will be downloaded.
- Use FUN = fp_cruse_mod in the call to getDHSindicator() to process the indicator with the customized function.
Altogether, the following code creates the dataset of the processed indicator.
year <- 2018
country <- "Zambia"
Recode <- "Individual Recode"
dhsData <- getDHSdata(country = country, indicator = NULL, Recode = Recode, year = year)
data <- getDHSindicator(dhsData, indicator = NULL, FUN = fp_cruse_mod)
head(data)
## cluster householdID v024 weight strata value
## 1 1 1 eastern 1892890 rural 0
## 2 1 2 eastern 1892890 rural 0
## 3 1 3 eastern 1892890 rural 0
## 4 1 4 eastern 1892890 rural 1
## 5 1 7 eastern 1892890 rural 1
## 6 1 9 eastern 1892890 rural 1
3 Multiple datasets
Some indicators, such as HIV prevalence, require more than one data file. Take HIV prevalence among the general population as an example. This is a built-in indicator with the ID “HA_HIVP_B_HIV”. It needs three Recode files (Individual, Men’s and HIV Test Results), so both the output of getDHSdata() and the input to getDHSindicator() are a list of three data files.
HIVdhsData <- getDHSdata(country = country, indicator = NULL,
                         Recode = c("Individual Recode", "Men's Recode", "HIV Test Results Recode"),
                         year = year)
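To see why an indicator like this needs several files, note that the test results live in a separate file and must be matched to the interviewed women and men. The sketch below shows the usual DHS linkage on made-up rows: women are keyed by v001/v002/v003, men by mv001/mv002/mv003, and test results by hivclust/hivnumb/hivline (cluster, household, line number). It is a hypothetical illustration, not surveyPrev's internal code, and it treats hiv03 as 1 = positive and 0 = negative in these toy rows, which simplifies the full DHS coding.

```r
# Made-up rows standing in for the three Recode files
ir <- data.frame(v001 = c(1, 1, 2), v002 = c(1, 2, 1), v003 = c(2, 1, 1))
mr <- data.frame(mv001 = c(1, 2), mv002 = c(1, 1), mv003 = c(1, 3))
ar <- data.frame(hivclust = c(1, 1, 1, 2), hivnumb = c(1, 2, 1, 1),
                 hivline = c(2, 1, 1, 3), hiv03 = c(0, 1, 0, 1))

# Harmonise the key names so that women and men can be stacked
women <- data.frame(cluster = ir$v001, household = ir$v002, line = ir$v003,
                    sex = "female")
men <- data.frame(cluster = mr$mv001, household = mr$mv002, line = mr$mv003,
                  sex = "male")
people <- rbind(women, men)

# Test results keyed the same way; hiv03 becomes the indicator "value"
hiv <- data.frame(cluster = ar$hivclust, household = ar$hivnumb,
                  line = ar$hivline, value = ar$hiv03)

# Inner join: respondents without a linked test result are dropped
linked <- merge(people, hiv, by = c("cluster", "household", "line"))
linked
```

In this toy example the woman in cluster 2 has no test result, so four of the five respondents remain after the merge.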