The goal of retroharmonize
is to facilitate
retrospective (ex-post) harmonization of data, particularly survey data,
in a reproducible manner. The package provides tools for organizing the
metadata, standardizing the coding of variables, variable names and
value labels, including missing values, and for documenting all
transformations, with the help of comprehensive S3 classes.
The package is currently being generalized from problems solved in the not-yet-released eurobarometer package (doi).
The package is available on CRAN:
install.packages("retroharmonize")
The development version adds new features, including the
create_codebook()
family of functions. It can be installed from GitHub with:
# install.packages("devtools")
devtools::install_github("rOpenGov/retroharmonize")
You can download the manual in PDF for the 0.2.0 release.
The aim of retroharmonize
is to provide tools for
reproducible retrospective (ex-post) harmonization of datasets that
contain variables measuring the same concepts but coded in different
ways. Ex-post data harmonization enables better use of existing data and
creates new research opportunities. For example, harmonizing data from
different countries enables cross-national comparisons, while merging
data from different time points makes it possible to track changes over
time.
Retrospective data harmonization is associated with challenges
including conceptual issues with establishing equivalence and
comparability, practical complications of having to standardize the
naming and coding of variables, technical difficulties with merging data
stored in different formats, and the need to document a large number of
data transformations. The retroharmonize
package assists
with the latter three components, freeing up the capacity of researchers
to focus on the first.
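To make the practical side of these challenges concrete, here is a minimal base R sketch of what standardizing names and codes, merging, and keeping a record of the original values involves. It does not use retroharmonize itself; the two data frames, the crosswalk vectors, and the `harmonize_trust()` helper are hypothetical and exist only for illustration.

```r
# Two hypothetical survey waves that ask the same question
# but use different variable names and different codes.
wave1 <- data.frame(id = 1:4, trust = c(1, 2, 1, 9))
wave2 <- data.frame(id = 5:8, TRUST_EU = c("yes", "no", "DK", "yes"),
                    stringsAsFactors = FALSE)

# Crosswalks from the original codes to harmonized labels (assumed codings).
crosswalk1 <- c("1" = "trust", "2" = "not_trust", "9" = "declined")
crosswalk2 <- c("yes" = "trust", "no" = "not_trust", "DK" = "do_not_know")

# Hypothetical helper: recode one variable and keep the original value
# next to the harmonized one, so the transformation stays documented.
harmonize_trust <- function(df, var, crosswalk, wave) {
  original <- as.character(df[[var]])
  data.frame(
    id = df$id,
    wave = wave,
    trust = unname(crosswalk[original]),
    trust_orig = original,
    stringsAsFactors = FALSE
  )
}

# After harmonization the two waves share names and codes and can be stacked.
harmonized <- rbind(
  harmonize_trust(wave1, "trust", crosswalk1, "wave1"),
  harmonize_trust(wave2, "TRUST_EU", crosswalk2, "wave2")
)

print(harmonized)
```

Doing this by hand for hundreds of variables across many files is error-prone, which is the bookkeeping the package is meant to take over.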
Specifically, the retroharmonize
package proposes a
reproducible workflow, including a new class for storing data together
with the harmonized and original metadata, as well as functions for
importing data from different formats, harmonizing data and metadata,
documenting the harmonization process, and converting between data
types. See here
for an overview of the functionalities.
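As a rough sketch of the harmonization step, the example below recodes a single labelled vector with harmonize_values(). The input vector and the regular expressions are made up for illustration, and the argument names reflect our reading of the package documentation, so check them against the installed version before relying on them.

```r
library(retroharmonize)

# A hypothetical survey variable with SPSS-style labels and missing codes.
var1 <- haven::labelled_spss(
  x = c(1, 0, 1, 1, 0, 8, 9),
  labels = c("TRUST" = 1, "NOT TRUST" = 0,
             "DON'T KNOW" = 8, "INAP. HERE" = 9),
  na_values = c(8, 9)
)

# Map the original labels (matched by regular expression) to harmonized
# labels and numeric codes; the missing categories get dedicated codes.
harmonized <- harmonize_values(
  x = var1,
  harmonize_labels = list(
    from = c("^tend\\sto|^trust", "^tend\\snot|not\\strust", "^dk|^don", "^inap"),
    to = c("trust", "not_trust", "do_not_know", "inap"),
    numeric_values = c(1, 0, 99997, 99999)
  ),
  na_values = c("do_not_know" = 99997, "inap" = 99999),
  id = "survey_id"
)

print(harmonized)
```

The `id` value is an arbitrary identifier chosen here; in practice it would typically name the source file or survey wave.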
The new labelled_spss_survey()
class is an extension of
haven’s
labelled_spss class. It not only preserves variable and value labels
and the user-defined missing range, but also gives an identifier, for
example, the filename or the wave number, to the vector. Additionally,
it enables the preservation – as metadata attributes – of the original
variable names, labels, and value codes and labels, from the source
data, in addition to the harmonized variable names, labels, and value
codes and labels. This way, the harmonized data also contain the
pre-harmonization record. The stored original metadata can be used for
validation and documentation purposes.
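A minimal sketch of constructing such a vector follows. The values, labels, and the identifier "survey1" are invented for illustration, and the conversion helpers shown at the end (as_numeric() and as_factor()) are used as we understand them from the package documentation.

```r
library(retroharmonize)

# Build a labelled vector that carries an identifier for its source survey.
x1 <- labelled_spss_survey(
  x = c(1, 1, 0, 8, 8, 8),
  labels = c("yes" = 1, "no" = 0, "declined" = 8),
  label = "Do you agree?",
  na_values = 8,   # user-defined missing value
  id = "survey1"   # hypothetical identifier, e.g. a file name or wave number
)

# Check the class and inspect the stored metadata attributes.
is.labelled_spss_survey(x1)
str(attributes(x1))

# Convert to base R types for analysis.
as_numeric(x1)
as_factor(x1)
```

Inspecting the attributes is a quick way to see which parts of the original metadata travel with the vector.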
The vignette Working
With The labelled_spss_survey Class provides more information about
the labelled_spss_survey()
class.
In Harmonize
Value Labels we discuss the characteristics of the
labelled_spss_survey()
class and demonstrate the problems
that using this class solves.
We also provide three extensive case studies illustrating how the
retroharmonize
package can be used for ex-post
harmonization of data from cross-national surveys:
The creators of retroharmonize
are not affiliated with
Afrobarometer, Arab Barometer, Eurobarometer, or the
organizations that design, produce, or archive their surveys.
We have started building experimental APIs that run retroharmonize regularly to improve known statistical data sources. See: Digital Music Observatory, Green Deal Data Observatory, Economy Data Observatory.
Our package has been tested on microdata from three harmonized surveys. Because retroharmonize is not affiliated with any of these data sources, to replicate our tutorials or work with the data, you have to download the data files from these sources and cite them in your work.
Afrobarometer data: cite Afrobarometer.
Arab Barometer data: cite Arab Barometer.
Eurobarometer data: the Eurobarometer raw data and related documentation (questionnaires, codebooks, etc.) are made available by GESIS, ICPSR, and through the Social Science Data Archive networks. You should cite your source; in our examples, we rely on the GESIS data files.
For the main developer and contributors, see the package homepage.
This work can be freely used, modified and distributed under the GPL-3 license:
citation("retroharmonize")
#>
#> To cite package 'retroharmonize' in publications use:
#>
#> Daniel Antal (2021). retroharmonize: Ex Post Survey Data
#> Harmonization. https://retroharmonize.dataobservatory.eu/,
#> https://ropengov.github.io/retroharmonize/,
#> https://github.com/rOpenGov/retroharmonize.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {retroharmonize: Ex Post Survey Data Harmonization},
#> author = {Daniel Antal},
#> year = {2021},
#> note = {https://retroharmonize.dataobservatory.eu/,
#> https://ropengov.github.io/retroharmonize/,
#> https://github.com/rOpenGov/retroharmonize},
#> }
For contact information, see the package homepage.
Please note that the retroharmonize
project is released
with a Contributor
Code of Conduct. By contributing to this project, you agree to abide
by its terms.