This document outlines the One4All package and highlights the main functions for validating data without using the validator app. Users may choose to work in either the validator app or the One4All package. After reading this document, users will have a better understanding of the One4All package development and the main functions to validate, share, and download data. To access the One4All package, go to our GitHub and install it in R on your own device.
The One4All package is the backbone of the validator app. If you are looking for a tutorial on how to use the app, see the Validator App Tutorial.
After installing the R package, load it with 'library(One4All)'.
To run the app, call 'run_app()'.
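The two steps above can be sketched as follows, assuming One4All is already installed. The 'requireNamespace()' and 'interactive()' guards are additions for illustration, so that the snippet does nothing harmful in a session where the package is missing or where no app can be displayed.

```r
# Check whether One4All is installed before trying to load it.
one4all_available <- requireNamespace("One4All", quietly = TRUE)

if (one4all_available) {
  library(One4All)

  # run_app() starts the validator app, so only call it interactively.
  if (interactive()) {
    run_app()
  }
} else {
  message("One4All is not installed; install it before running the app.")
}
```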
The 'validate_data' function validates data using the One4All package. Replace its parameters with your actual values or file paths. Replace 'data_names' with the names of the tables in the rules sheet.
'files_data': A list of file paths for the datasets to be validated (either ‘CSV’ or ‘Excel’ files).
'data_names': (Optional) A character vector of names for the datasets (ex. methodology, samples, particles). If not provided, names will be extracted from the file paths.
'file_rules': A file path for the rules file, in either ‘CSV’ or ‘Excel’ format.
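A minimal sketch of a validation call using the three parameters described above. The file names are hypothetical placeholders, and the existence check is an addition so the call only runs once real files are supplied.

```r
# Hypothetical file paths; replace with your own CSV or Excel files.
files_data <- c("methodology.csv", "samples.csv", "particles.csv")
data_names <- c("methodology", "samples", "particles")
file_rules <- "rules.csv"

# Only attempt validation when the package and all input files are present.
inputs_exist <- all(file.exists(c(files_data, file_rules)))

if (requireNamespace("One4All", quietly = TRUE) && inputs_exist) {
  validation <- One4All::validate_data(
    files_data = files_data,
    data_names = data_names,
    file_rules = file_rules
  )
} else {
  message("Supply real file paths and install One4All to run the validation.")
}
```

Each entry in 'data_names' is paired with the file at the same position in 'files_data', so the two vectors should have the same length.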
The 'check_for_malicious_files' function checks for malicious files. If any of the provided files appear to have a malicious extension, the function will stop and raise an error. The argument 'files' is a character vector of file paths, which can be paths to zip archives or individual files. If any malicious file is found, the code returns ‘TRUE’; otherwise it returns ‘FALSE’.
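The check described above might be called as sketched here. The file names are hypothetical, and the guard is an illustrative addition so the call only runs when the files actually exist.

```r
# Hypothetical upload: one individual file and one zip archive.
files <- c("samples.csv", "upload.zip")

if (requireNamespace("One4All", quietly = TRUE) && all(file.exists(files))) {
  # Store the outcome of the extension check for later use.
  has_malicious <- One4All::check_for_malicious_files(files = files)
  print(has_malicious)
} else {
  message("Supply real file paths and install One4All to run the check.")
}
```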
The 'read_rules' function reads rules from a file or a data frame. Acceptable file formats are ‘CSV’ or ‘Excel’ files.
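A short sketch of reading a rules file. The path is a hypothetical placeholder, and the rules input is passed positionally because the argument name is not stated in this document.

```r
# Hypothetical path to a rules file in CSV format.
rules_path <- "rules.csv"

if (requireNamespace("One4All", quietly = TRUE) && file.exists(rules_path)) {
  rules <- One4All::read_rules(rules_path)

  # Inspect the structure of the rules that were read.
  str(rules)
} else {
  message("Supply a real rules file and install One4All to read the rules.")
}
```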
The 'read_data' function reads and formats data from ‘CSV’ or ‘Excel’ files. The argument 'files_data' is the list of files to be read, and 'data_names' is the optional vector of names for the data frames.
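Reading the data on its own could look like the sketch below, again with hypothetical file names and an illustrative guard around the call.

```r
# Hypothetical data files and the names to give the resulting data frames.
files_data <- c("methodology.csv", "samples.csv", "particles.csv")
data_names <- c("methodology", "samples", "particles")

if (requireNamespace("One4All", quietly = TRUE) && all(file.exists(files_data))) {
  data_formatted <- One4All::read_data(
    files_data = files_data,
    data_names = data_names
  )

  # Show the names attached to the data that were read.
  print(names(data_formatted))
} else {
  message("Supply real file paths and install One4All to read the data.")
}
```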
If any data are invalid, users can view the broken rules using the 'rules_broken' function. Replace the 'results' and 'show_decision' parameters with your values. Ensure that the results are in the format of a data frame by specifying the table (ex. [[1]], [[2]], or [[3]]).
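The sketch below selects one table from a validation result and passes it to 'rules_broken'. It assumes, without confirmation from this document, that the object returned by the validation step stores its per-table results in an element named 'results'. The file names are hypothetical.

```r
# Hypothetical inputs; replace with your own files.
files_data <- c("methodology.csv", "samples.csv", "particles.csv")
file_rules <- "rules.csv"

# Which table to inspect (1, 2, or 3) and the value for show_decision.
table_index <- 1
show_decision <- TRUE

inputs_exist <- all(file.exists(c(files_data, file_rules)))

if (requireNamespace("One4All", quietly = TRUE) && inputs_exist) {
  validation <- One4All::validate_data(
    files_data = files_data,
    file_rules = file_rules
  )

  # Assumption: per-table results live in validation$results.
  broken <- One4All::rules_broken(
    results = validation$results[[table_index]],
    show_decision = show_decision
  )
  print(broken)
} else {
  message("Supply real file paths and install One4All to view broken rules.")
}
```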
The 'remote_download' function allows users to download data from MongoDB, CKAN, and/or Amazon S3. The function retrieves the data based on the 'hashed_data' identifier (the dataset ID from a downloaded certificate) and assumes the data is stored using the same naming conventions provided in the 'remote_share' function.
An example call to 'remote_download' is shown below.
# Placeholder values; replace each with your own identifiers and credentials.
downloaded_data <- remote_download(hashed_data = "example_hash",
                                   ckan_url = "https://example.com",
                                   ckan_key = "your_ckan_key",
                                   ckan_package = "your_ckan_package",
                                   s3_key_id = "your_s3_key_id",
                                   s3_secret_key = "your_s3_secret_key",
                                   s3_region = "your_s3_region",
                                   s3_bucket = "your_s3_bucket",
                                   mongo_key = "mongo_key",
                                   mongo_collection = "mongo_collection")