The neonOS
package is designed to facilitate working
with NEON observational (OS) data. It is under development, and
may change without notice. The functionality planned for
immediate development plans include (1) duplicate checking and (2) table
joining.
NEON OS data are collected and ingested via semi-automated systems that include data validation at several steps. However, in a variety of circumstances it is possible for duplicate records to be entered into the system - for example, an analytical lab may accidentally upload multiple copies of the results for a given sample, or field staff may accidentally make the same measurement twice.
The removeDups()
function uses metadata to identify and
resolve duplicates in OS data. In the variables
file
provided with data downloads, the primaryKey field identifies a set of
fields that should define a unique record; i.e., for every unique
combination of primaryKey fields, there should be a single record of
data. removeDups()
uses these fields to identify
duplicates, then attempts to resolve the duplicates it finds, using
these steps:
Please note that NEON staff are also working to identify and resolve duplicates - if you find duplicates in a data set, and then download the same data later, some or all of the duplicates may have been removed.
NEON OS data are typically available as multiple data tables, representing data collected at different times or different temporal or spatial resolution. These are relational tables, and can be joined into a smaller number of flattened tables. Guidance in performing these joins can be found in the Data Relationships section of the Data Product User Guide for each product. A generic joining function for OS data is under development and will be added to this package when possible.
The neonOS package currently contains only one function,
removeDups()
. To access the package, install from GitHub.
Installation and loading:
library(devtools)
install_github('NEONScience/NEON-OS-data-processing/neonOS')
library(neonOS)
library(neonUtilities)
removeDups()
requires 3 inputs: a NEON OS data table,
the corresponding variables file, and the name of the data table. An
example using the fish data product:
fish <- loadByProduct(dpID='DP1.20107.001',
check.size=F,
startdate='2017-05',
enddate='2019-08',
package='expanded')
fish.pass.dups <- removeDups(data=fish$fsh_perPass,
variables=fish$variables_20107,
table='fsh_perPass')