Creating Customized Indicators for surveyPrev

Qianyu Dong and Zehang Richard Li

2024-03-19

In this vignette, we provide an overview of the list of DHS indicators currently implemented in surveyPrev and the process to add new indicators or create customized indicators.

First, we load the surveyPrev package and any packages used later in the customized indicator processing function. In our example, dplyr and labelled are used.

library(surveyPrev)
library(dplyr)
library(labelled)
library(kableExtra)

In order to use getDHSdata() to download the relevant DHS data directly from the DHS website, we need to

  1. register with DHS to gain data access, and
  2. set up DHS account details in R using the rdhs package, i.e.,
rdhs::set_rdhs_config(email = "your_email", project = "your_registered_DHS_project_title")

1 Built-in indicators

Currently, the surveyPrev package supports 22 indicators listed in Table 1. The list of indicators and their IDs can be found in the surveyPrevIndicators dataset.

data(surveyPrevIndicators)
head(surveyPrevIndicators)
Table 1: List of built-in indicators in the surveyPrev package.
ID             alternative.ID  Description
AN_ANEM_W_ANY  womananemia     Percentage of women aged 15-49 classified as having any anemia
AN_NUTS_W_THN                  Underweight (Prevalence of underweight (BMI < 18.5) women of reproductive age)
CH_DIAT_C_ORT                  Diarrhea treatment (Children under five with diarrhea treated with either ORS or RHF)
CH_VACC_C_BAS                  Children with all 8 basic vaccinations (age 12-23 or 24-35 months; see the Guide to DHS Statistics for the different definitions)
CH_VACC_C_DP1                  Percentage of children 12-23 (or 24-35) months who had received DPT 1 vaccination
CH_VACC_C_DP3  DPT3            Percentage of children 12-23 (or 24-35) months who had received DPT 3 vaccination
CH_VACC_C_MSL                  Percentage of children 12-23 (or 24-35) months who had received MCV 1 (measles-containing vaccine)
CH_VACC_C_NON                  Children with no vaccinations (age 12-23 or 24-35 months; see the Guide to DHS Statistics for the different definitions)
CM_ECMR_C_NNR  nmr             Probability of dying in the first month of life in the five or ten years preceding the survey
CN_ANMC_C_ANY                  Children under five with any anemia
CN_BRFS_C_EXB                  Children exclusively breastfed (Prevalence of exclusive breastfeeding of children under six months of age)
CN_NUTS_C_HA2  stunting        Stunting rate (Prevalence of stunted (HAZ < -2) children under five (0-59 months))
CN_NUTS_C_WH2  wasting         Wasting rate (Prevalence of wasted (WHZ < -2) children under five (0-59 months))
FP_CUSA_W_MOD                  Modern contraceptive prevalence rate (Married women currently using any modern method of contraception)
FP_NADA_W_UNT  unmet_family    Women with an unmet need for family planning for spacing and limiting
HA_HIVP_B_HIV                  HIV prevalence
ML_NETP_H_IT2                  Households with at least one insecticide-treated mosquito net (ITN) for every two persons who stayed in the household the previous night
RH_ANCN_W_N4P  ancvisit4+      Antenatal visits for pregnancy: 4+ visits
RH_DELA_C_SKP                  Assistance during delivery from a skilled provider
WS_SRCE_P_BAS                  Population using a basic water source
WS_TLET_H_IMP  sanitation      Percentage of households using an improved sanitation facility
WS_TLET_P_BAS                  Population with access to a basic sanitation service

The indicators above can be processed directly within surveyPrev using the getDHSdata() and getDHSindicator() functions. Either the ID or the alternative ID in the surveyPrevIndicators dataset can be used to retrieve an indicator. getDHSindicator() processes the raw survey data into a data.frame in which the column named value holds the indicator of interest. The data.frame also contains the cluster ID, household ID, survey weight, and strata information. This format allows a svydesign object to be defined with the survey package. For example,

indicator <- "unmet_family"
year <- 2018
country <- "Zambia"
dhsData1 <- getDHSdata(country = country, indicator = indicator, year = year)
data1 <- getDHSindicator(dhsData1, indicator = indicator)
head(data1)
##   cluster householdID    v024  weight strata value
## 1       1           1 eastern 1892890  rural     1
## 2       1           2 eastern 1892890  rural     1
## 3       1           3 eastern 1892890  rural     1
## 4       1           4 eastern 1892890  rural     0
## 5       1           7 eastern 1892890  rural     0
## 6       1           9 eastern 1892890  rural     0
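
As a minimal sketch of how such a data.frame can be passed to the survey package, the example below builds a small toy dataset with the same column names (all values are simulated) and defines a svydesign object from it. Treating the combination of v024 and strata as the sampling strata is an assumption made here for illustration; the stratification appropriate for a real survey should be taken from that survey's documentation.

```r
library(survey)

# Toy data with the same columns as the output of getDHSindicator().
# All values are simulated for illustration only.
set.seed(1)
toy <- data.frame(
    cluster = rep(1:8, each = 5),
    householdID = rep(1:5, times = 8),
    v024 = rep(c("eastern", "western"), each = 20),
    weight = rep(c(1892890, 1500000), each = 20),
    strata = rep(rep(c("rural", "urban"), each = 10), times = 2),
    value = rbinom(40, 1, 0.3)
)

# Assumption: sampling strata are region (v024) crossed with urban/rural.
toy$stratum <- interaction(toy$v024, toy$strata, drop = TRUE)

# Clusters are the primary sampling units; weight holds the survey weights.
design <- svydesign(ids = ~cluster, strata = ~stratum, weights = ~weight,
    data = toy, nest = TRUE)

# Design-based estimate of the prevalence and its standard error.
est <- svymean(~value, design)
print(est)
```

With real data, toy would be replaced by the output of getDHSindicator(), for example data1 above.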

If the download through the DHS API fails, you can also download the file manually from the DHS website and read it into R. The getDHSdata() function returns a message specifying which file is used (e.g., the Individual Recode file for the ANC visit indicator).

2 New indicators

Details on how standard DHS indicators are defined can be found in the Guide to DHS Statistics (https://dhsprogram.com/data/Guide-to-DHS-Statistics/) by searching for an indicator. Code for creating most standard DHS indicators can be found on the DHS GitHub site: https://github.com/DHSProgram/DHS-Indicators-R. The indicators are organized by the chapters of the Guide to DHS Statistics.

To use surveyPrev to create a new indicator not already built into the package, we need to specify

  1. which DHS data file to download,
  2. a customized function to process the indicator from the raw DHS survey data.

2.1 DHS dataset types

The table below lists the different types of DHS data and their naming conventions in surveyPrev. More details can be found on the DHS Program website.

Table 2: DHS data types
Name    Recode
MRdata  Men’s Recode
PRdata  Household Member Recode
KRdata  Children’s Recode
BRdata  Births Recode
CRdata  Couples’ Recode
HRdata  Household Recode
IRdata  Individual Recode
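
For convenience, the rows of Table 2 can be kept in a named character vector. This is only a sketch: recode_types is a hypothetical helper object, not part of surveyPrev, and it covers only the types listed in Table 2. It assumes that recode titles are written with a plain apostrophe, as in the getDHSdata() calls later in this vignette.

```r
# Hypothetical lookup built from Table 2: surveyPrev name -> DHS recode title.
recode_types <- c(
    MRdata = "Men's Recode",
    PRdata = "Household Member Recode",
    KRdata = "Children's Recode",
    BRdata = "Births Recode",
    CRdata = "Couples' Recode",
    HRdata = "Household Recode",
    IRdata = "Individual Recode"
)

# Recode title for a given data name.
Recode <- unname(recode_types["IRdata"])
Recode

# Reverse lookup: data name for a given recode title.
data_name <- names(recode_types)[recode_types == "Men's Recode"]
data_name
```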

2.2 Function to process the indicator

Let’s take “Current use of any modern method of contraception (all women)” as an example. For users familiar with the standard indicators defined by the DHS Data Indicator API, the indicator ID is “FP_CUSA_W_MOD”. We will go through the steps to create the customized function below.

Step 1: Search indicator ID or key words in the Guide to DHS Statistics, and then identify which chapter it is in.

For our example, we can search for either “FP_CUSA_W_MOD” or “contraceptive”; the indicator is in Chapter 7: Family Planning.

Figure 1: Screenshot of Step 1(a): Searching for indicator ID or key words.

Figure 2: Screenshot of Step 1(b): Identifying which chapter the indicator is from.

Step 2: We download IndicatorList.xlsx from the DHS GitHub site and search for the keyword again in the corresponding chapter to find out

  1. which DHS data recode is used to create this indicator,
  2. which file in the DHS GitHub repository contains the code to create this indicator, and
  3. what the corresponding variable name is in the DHS GitHub repository.

For our example, since we are looking for the indicator of all women currently using any modern method of contraception, we identify the cell “currently use any modern method” and find that the code to process this indicator is in the FP_USE.do file and that we need IRdata (Individual Recode). We also find that the variable name used in the R code in the GitHub repository is “fp_cruse_mod”. We will use all three pieces of information in the next step to find the code that processes the indicator.

Figure 3: Screenshot of Step 2: Finding file name and recode name.

Step 3: In the GitHub repository, we find the folder for Chapter 7, open the file FP_USE.R, and search for “fp_cruse_mod” to find the following chunk of code.

Figure 4: Screenshot of Step 3: Finding code.

We extract the following chunk of code for this indicator, which takes the IR data and performs a few data cleaning steps.

# Currently use modern method
IRdata <- IRdata %>%
    mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
    set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
    set_variable_labels(fp_cruse_mod = "Currently used any modern method")

We can use this code chunk to define a new function to be passed to the getDHSindicator function. The self-defined function should:

  • Take the Recode data as input and return the same data.frame with the indicator variable added.
  • Rename the indicator variable to "value" at the end.

The example below creates a fp_cruse_mod function for “Current use of any modern method of contraception (all women)”, which can be recognized by the getDHSindicator function later.

fp_cruse_mod <- function(RData) {
    IRdata <- RData %>%
        mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
        set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
        set_variable_labels(fp_cruse_mod = "Currently used any modern method")
    colnames(IRdata)[colnames(IRdata) == "fp_cruse_mod"] <- "value"
    return(IRdata)
}
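
Before downloading any data, the function can be checked on a toy data.frame. The sketch below repeats the function definition so that it runs on its own. The toy input is made up, id is a hypothetical column, and the coding of v313 (0 to 3, with 3 meaning a modern method) is an assumption based on the condition used in the code above.

```r
library(dplyr)
library(labelled)

# Same function as defined above, repeated so that this block is self-contained.
fp_cruse_mod <- function(RData) {
    IRdata <- RData %>%
        mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
        set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
        set_variable_labels(fp_cruse_mod = "Currently used any modern method")
    colnames(IRdata)[colnames(IRdata) == "fp_cruse_mod"] <- "value"
    return(IRdata)
}

# Toy stand-in for the Individual Recode data (made-up values).
toy_IR <- data.frame(id = 1:5, v313 = c(0, 1, 2, 3, 3))

out <- fp_cruse_mod(toy_IR)

# The output should keep every input row and contain a column named "value".
stopifnot("value" %in% colnames(out), nrow(out) == nrow(toy_IR))

# Only rows with v313 == 3 are coded as 1.
cbind(v313 = toy_IR$v313, value = as.numeric(out$value))
```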

Finally, after we create the function fp_cruse_mod, we can create the indicator in two steps:

  1. Use the getDHSdata function to download the relevant DHS dataset, using the DHS data type identified in Step 2. In this example, we specify Recode = "Individual Recode" and indicator = NULL to download the dataset. The recode should be one of those listed in Table 2. We can also set Recode = NULL, in which case all available DHS data types will be downloaded.
  2. Pass FUN = fp_cruse_mod in the call to getDHSindicator to process the indicator according to the customized function.

Altogether, the following code creates the dataset of the processed indicator.

year <- 2018
country <- "Zambia"
Recode <- "Individual Recode"
dhsData <- getDHSdata(country = country, indicator = NULL, Recode = Recode, year = year)
data <- getDHSindicator(dhsData, indicator = NULL, FUN = fp_cruse_mod)
head(data)
##   cluster householdID    v024  weight strata value
## 1       1           1 eastern 1892890  rural     0
## 2       1           2 eastern 1892890  rural     0
## 3       1           3 eastern 1892890  rural     0
## 4       1           4 eastern 1892890  rural     1
## 5       1           7 eastern 1892890  rural     1
## 6       1           9 eastern 1892890  rural     1

3 Multiple datasets

Some indicators, such as HIV prevalence, require additional data files. Take HIV prevalence among the general population as an example. This is a built-in indicator with the ID “HA_HIVP_B_HIV”. It needs three Recode files (Individual, Men’s, and HIV Test Results), so the output of getDHSdata(), which is also the input for getDHSindicator(), is a list of three data files.

HIVdhsData <- getDHSdata(country = country, indicator = NULL, Recode = c("Individual Recode",
    "Men's Recode", "HIV Test Results Recode"), year = year)