Advanced search and citation of occurrences

Hannah L. Owens

Cory Merow

Brian Maitner

Jamie M. Kass

Vijay Barve

Robert Guralnick

2024-10-28

Advanced features

This vignette demonstrates more advanced features and customization available in occCite. We recommend you read vignette("Simple.Rmd", package = "occCite") first, if you have not already done so.

Loading data from previous GBIF searches

Querying GBIF can take quite a bit of time, especially for multiple species and/or well-known species. In this case, you may wish to load previously downloaded data sets from your computer by specifying the general location of your downloaded .zip files. occQuery will crawl through your specified GBIFDownloadDirectory to collect all the .zip files contained in that folder and its subfolders, then import the most recent downloads that match your taxon list. If you chose BIEN as well as GBIF, these GBIF data are combined with the BIEN search results just as they would be in a simple real-time search.

checkPreviousGBIFDownload is TRUE by default, but if loadLocalGBIFDownload is TRUE, occQuery will ignore checkPreviousGBIFDownload. Note also that occCite does not currently support mixed data download sources: you cannot run new GBIF queries for some taxa, download previously prepared data sets for others, and load the rest from local data sets on your computer in a single query.
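To make the directory crawl described above more concrete, here is a minimal base R sketch of the general idea: collect every .zip file under a folder and its subfolders, then order them from newest to oldest. This is not occCite's internal code. The folder, the file names, and the use of file modification time as the measure of "most recent" are all assumptions made for illustration.

```r
# Build a throwaway folder tree with empty .zip files standing in for
# GBIF downloads (hypothetical names; real downloads are named differently).
downloadDir <- file.path(tempdir(), "gbif_downloads_demo")
dir.create(file.path(downloadDir, "older", "nested"),
           recursive = TRUE, showWarnings = FALSE)
zipPaths <- file.path(downloadDir,
                      c(file.path("older", "nested", "first.zip"),
                        "second.zip"))
file.create(zipPaths)
file.create(file.path(downloadDir, "notes.txt")) # not a .zip; ignored below

# Give the two archives distinct modification times.
Sys.setFileTime(zipPaths[1], as.POSIXct("2022-03-02 12:00:00", tz = "UTC"))
Sys.setFileTime(zipPaths[2], as.POSIXct("2024-06-20 12:00:00", tz = "UTC"))

# Recursively collect every .zip file in the folder and its subfolders.
foundZips <- list.files(downloadDir, pattern = "\\.zip$",
                        recursive = TRUE, full.names = TRUE)

# Order the archives from newest to oldest by modification time
# (an assumption; occCite may determine recency differently).
zipInfo <- file.info(foundZips)
newestFirst <- foundZips[order(zipInfo$mtime, decreasing = TRUE)]
basename(newestFirst)
```

In practice you only need to pass the top-level folder to GBIFDownloadDirectory; the sketch simply shows why downloads stored in subfolders are still found.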

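# Search that first checks GBIFDownloadDirectory for previous GBIF downloads.
# GBIFLogin must already exist in your session.
myOldOccCiteObject <- occQuery(x = "Protea cynaroides",
                               datasources = c("gbif", "bien"),
                               GBIFLogin = GBIFLogin,
                               GBIFDownloadDirectory =
                                 system.file('extdata/', package='occCite'),
                               checkPreviousGBIFDownload = TRUE)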

Here is the result. Look familiar?

# GBIF search results
head(myOldOccCiteObject@occResults$`Protea cynaroides`$GBIF$OccurrenceTable)
##                name longitude  latitude coordinateUncertaintyInMeters day month
## 1 Protea cynaroides  18.43928 -33.95440                             8  17     2
## 2 Protea cynaroides  22.12754 -33.91561                             4  11     2
## 3 Protea cynaroides  18.43927 -33.95429                             8  17     2
## 4 Protea cynaroides  18.43254 -34.29275                            31   6     2
## 5 Protea cynaroides  18.42429 -34.02934                          2167  10     2
## 6 Protea cynaroides  18.43529 -34.10545                             2   8     2
##   year                           datasetKey dataService
## 1 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
## 2 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
## 3 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
## 4 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
## 5 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
## 6 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7        GBIF
##                               datasetName
## 1 iNaturalist Research-grade Observations
## 2 iNaturalist Research-grade Observations
## 3 iNaturalist Research-grade Observations
## 4 iNaturalist Research-grade Observations
## 5 iNaturalist Research-grade Observations
## 6 iNaturalist Research-grade Observations
#The full summary
summary(myOldOccCiteObject)
##  
##  OccCite query occurred on: 20 June, 2024
##  
##  User query type: User-supplied list of taxa.
##  
##  Sources for taxonomic rectification: GBIF Backbone Taxonomy
##      
##  
##  Taxonomic cleaning results:     
## 
##          Input Name                Best Match Taxonomic Databases w/ Matches
## 1 Protea cynaroides Protea cynaroides (L.) L.         GBIF Backbone Taxonomy
##  
##  Sources for occurrence data: gbif, bien
##      
##                     Species Occurrences Sources
## 1 Protea cynaroides (L.) L.        2334      17
##  
##  GBIF dataset DOIs:  
## 
##                     Species GBIF Access Date           GBIF DOI
## 1 Protea cynaroides (L.) L.       2022-03-02 10.15468/dl.ztbx8c
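Because the occurrence table is an ordinary data frame, it can be explored with standard R tools. The sketch below rebuilds the six rows printed above by hand so that it runs without occCite or a GBIF connection; in a real session you would start from myOldOccCiteObject@occResults$`Protea cynaroides`$GBIF$OccurrenceTable instead. The 100 m uncertainty threshold is an arbitrary value chosen for illustration, not a recommendation from the package.

```r
# Hand-typed copy of the six rows shown in the head() output above
# (a stand-in for the real OccurrenceTable).
occTable <- data.frame(
  name = rep("Protea cynaroides", 6),
  longitude = c(18.43928, 22.12754, 18.43927, 18.43254, 18.42429, 18.43529),
  latitude = c(-33.95440, -33.91561, -33.95429, -34.29275, -34.02934, -34.10545),
  coordinateUncertaintyInMeters = c(8, 4, 8, 31, 2167, 2),
  year = rep(2022, 6),
  datasetName = rep("iNaturalist Research-grade Observations", 6),
  stringsAsFactors = FALSE
)

# How many records does each contributing dataset supply?
recordsPerDataset <- table(occTable$datasetName)
recordsPerDataset

# Keep only records with coordinate uncertainty of at most 100 m
# (arbitrary threshold, for illustration only).
preciseRecords <- occTable[occTable$coordinateUncertaintyInMeters <= 100, ]
nrow(preciseRecords)
```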

Getting citation data works exactly the same way with previously downloaded data as it does with a fresh data set.

#Get citations
myOldOccCitations <- occCitation(myOldOccCiteObject)
print(myOldOccCitations)
## Writing 5 Bibtex entries ... OK
## Results written to file 'temp.bib'
## AFFOUARD A, JOLY A, LOMBARDO J, CHAMP J, GOEAU H, CHOUET M, GRESSE H, BONNET P (2023). Pl@ntNet observations. Version 1.8. Pl@ntNet. https://doi.org/10.15468/gtebaa. Accessed via GBIF on 2022-03-02.
## AFFOUARD A, JOLY A, LOMBARDO J, CHAMP J, GOEAU H, CHOUET M, GRESSE H, BOTELLA C, BONNET P (2023). Pl@ntNet automatically identified occurrences. Version 1.8. Pl@ntNet. https://doi.org/10.15468/mma2ec. Accessed via GBIF on 2022-03-02.
## Chamberlain, S., Barve, V., Mcglinn, D., Oldoni, D., Desmet, P., Geffert, L., Ram, K. (2024). rgbif: Interface to the Global Biodiversity Information Facility API. R package version 3.8.1. https://CRAN.R-project.org/package = rgbif.
## Chamberlain, S., Boettiger, C. (2017). R Python, and Ruby clients for GBIF species occurrence data. PeerJ PrePrints.
## Fatima Parker-Allie, Ranwashe F (2018). PRECIS. South African National Biodiversity Institute. https://doi.org/10.15468/rckmn2. Accessed via GBIF on 2022-03-02.
## MNHN, Chagnoux S (2024). The vascular plants collection (P) at the Herbarium of the Muséum national d'Histoire Naturelle (MNHN - Paris). Version 69.384. MNHN - Museum national d'Histoire naturelle. https://doi.org/10.15468/nc6rxy. Accessed via GBIF on 2022-03-02.
## MNHN. Accessed via BIEN on NA.
## Maitner, B. (2023). . R package version 1.2.6. https://CRAN.R-project.org/package = BIEN.
## Missouri Botanical Garden,Herbarium. Accessed via BIEN on NA.
## Observation.org (2024). Observation.org, Nature data from around the World. https://doi.org/10.15468/5nilie. Accessed via GBIF on 2022-03-02.
## Owens, H., Merow, C., Maitner, B., Kass, J., Barve, V., Guralnick, R. (2024). occCite: Querying and Managing Large Biodiversity Occurrence Datasets. R package version 0.5.9. https://CRAN.R-project.org/package = occCite.
## Ranwashe F (2024). Botanical Database of Southern Africa (BODATSA): Botanical Collections. Version 1.25. South African National Biodiversity Institute. https://doi.org/10.15468/2aki0q. Accessed via GBIF on 2022-03-02.
## Rob Cubey (2022). Royal Botanic Garden Edinburgh Living Plant Collections (E). Royal Botanic Garden Edinburgh. https://doi.org/10.15468/bkzv1l. Accessed via GBIF on 2022-03-02.
## SANBI. Accessed via BIEN on NA.
## Senckenberg (2020). African Plants - a photo guide. https://doi.org/10.15468/r9azth. Accessed via GBIF on 2022-03-02.
## Taylor S (2019). G. S. Torrey Herbarium at the University of Connecticut (CONN). University of Connecticut. https://doi.org/10.15468/w35jmd. Accessed via GBIF on 2022-03-02.
## Team}, {.C. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
## Teisher J, Stimmel H (2024). Tropicos MO Specimen Data. Missouri Botanical Garden. https://doi.org/10.15468/hja69f. Accessed via GBIF on 2022-03-02.
## Tela Botanica. Carnet en Ligne. https://doi.org/10.15468/rydcn2. Accessed via GBIF on 2022-03-02.
## UConn. Accessed via BIEN on NA.
## iNaturalist contributors, iNaturalist (2024). iNaturalist Research-grade Observations. iNaturalist.org. https://doi.org/10.15468/ab3s5x. Accessed via GBIF on 2022-03-02.
## naturgucker.de. naturgucker. https://doi.org/10.15468/uc1apo. Accessed via GBIF on 2022-03-02.
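Most GBIF-derived citations above carry a dataset DOI, while the BIEN entries do not. If you need the DOIs on their own (for example, to check them against a reference manager), a regular expression can pull them out of the citation text. The sketch below is a base R illustration only: it assumes the citations are available as a character vector, which is typed in by hand here from three lines of the output above, and the pattern is a simplified DOI matcher rather than a complete one.

```r
# Three citation strings copied from the printed output above.
citationText <- c(
  "Fatima Parker-Allie, Ranwashe F (2018). PRECIS. South African National Biodiversity Institute. https://doi.org/10.15468/rckmn2. Accessed via GBIF on 2022-03-02.",
  "Senckenberg (2020). African Plants - a photo guide. https://doi.org/10.15468/r9azth. Accessed via GBIF on 2022-03-02.",
  "SANBI. Accessed via BIEN on NA."
)

# Simplified DOI pattern: "10.", a registrant code, "/", then an
# alphanumeric suffix. Stopping at the first non-alphanumeric character
# keeps the sentence-ending period out of the match.
doiPattern <- "10\\.[0-9]{4,9}/[[:alnum:]]+"

# Which citations contain a DOI at all?
hasDoi <- grepl(doiPattern, citationText)

# Extract the first DOI from each citation that has one.
dois <- regmatches(citationText, regexpr(doiPattern, citationText))
dois
```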

Note that you can also load multiple species using either a vector of species names or a phylogeny (provided you have previously downloaded data for all of the species of interest), and you can load occurrences from non-GBIF data sources (e.g. BIEN) in the same query.


occCite with a Phylogeny

Here is an example of how such a search is structured, using an unpublished phylogeny of billfishes.

library(ape)
#Get tree
treeFile <- system.file("extdata/Fish_12Tax_time_calibrated.tre", package='occCite')
phylogeny <- ape::read.nexus(treeFile)
tree <- ape::extract.clade(phylogeny, 22)
#Query databases for names
myPhyOccCiteObject <- studyTaxonList(x = tree, 
                                     datasources = "GBIF Backbone Taxonomy")
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
#Query GBIF for occurrence data
myPhyOccCiteObject <- occQuery(x = myPhyOccCiteObject,
                               datasources = "gbif",
                               GBIFDownloadDirectory = system.file('extdata/', package='occCite'),
                               loadLocalGBIFDownload = TRUE,
                               checkPreviousGBIFDownload = FALSE)
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
# What does a multispecies query look like?
summary(myPhyOccCiteObject)
##  
##  OccCite query occurred on: 28 October, 2024
##  
##  User query type: User-supplied phylogeny.
##  
##  Sources for taxonomic rectification: GBIF Backbone Taxonomy
##      
##  
##  Taxonomic cleaning results:     
## 
##                   Input Name                 Best Match
## 1 Tetrapturus_angustirostris Tetrapturus_angustirostris
## 2         Tetrapturus_belone         Tetrapturus_belone
## 3      Tetrapturus_pfluegeri      Tetrapturus_pfluegeri
##   Taxonomic Databases w/ Matches
## 1                 Not rectified.
## 2                 Not rectified.
## 3                 Not rectified.
##  
##  Sources for occurrence data: gbif
##      
##                      Species Occurrences Sources
## 1 Tetrapturus_angustirostris           0       0
## 2         Tetrapturus_belone           0       0
## 3      Tetrapturus_pfluegeri           0       0
##  
##  GBIF dataset DOIs:  
## 
##                      Species GBIF Access Date GBIF DOI
## 1 Tetrapturus_angustirostris             <NA>     <NA>
## 2         Tetrapturus_belone             <NA>     <NA>
## 3      Tetrapturus_pfluegeri             <NA>     <NA>
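The call ape::extract.clade(phylogeny, 22) in the example above selects a clade by its internal node number, which can be hard to read at a glance. The sketch below shows the same step on a small toy tree, using ape::getMRCA to look up the node number from tip names instead of hard-coding it. The toy tree, its branch lengths, and the two outgroup labels are invented for illustration and have nothing to do with the billfish phylogeny shipped with occCite. The code is wrapped in a requireNamespace check so it is skipped if ape is not installed.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Toy five-tip tree in Newick format (hypothetical topology and lengths).
  toyTree <- ape::read.tree(
    text = paste0("((Tetrapturus_angustirostris:1,",
                  "(Tetrapturus_belone:1,Tetrapturus_pfluegeri:1):1):1,",
                  "(Outgroup_a:1,Outgroup_b:1):1);")
  )

  # Find the node number of the most recent common ancestor of the
  # three focal tips, rather than hard-coding it.
  focalTips <- c("Tetrapturus_angustirostris",
                 "Tetrapturus_belone",
                 "Tetrapturus_pfluegeri")
  focalNode <- ape::getMRCA(toyTree, focalTips)

  # Extract the clade descending from that node.
  focalClade <- ape::extract.clade(toyTree, focalNode)

  # These tip labels are the names a phylogeny-based search would start from.
  print(focalClade$tip.label)
}
```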

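In the run shown here, the warnings above report that GBIF was unreachable and the summary lists zero occurrences for every species, so the plotting and citation calls below end in errors. When a query does return results for multiple species, you can plot the summary figures either for the whole search…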

plot(myPhyOccCiteObject)
## Error in d.tbl[[i]]: subscript out of bounds

or you can plot the results by species!

plot(myPhyOccCiteObject, bySpecies = TRUE, plotTypes = c("yearHistogram", "source"))
## Error in d.tbl[[i]]: subscript out of bounds

You can then print the citations, either separated by species (as in this example) or combined.

#Get citations
myPhyOccCitations <- occCitation(myPhyOccCiteObject)
## Error in strsplit(occResults$GBIF$Metadata$modified, "T"): non-character argument
#Print citations as text with accession dates.
print(myPhyOccCitations, bySpecies = TRUE)
## Error: object 'myPhyOccCitations' not found
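When a query comes back empty, as in the summary above where every species has 0 occurrences, downstream steps such as plotting and citation retrieval stop with an error. In a longer script it can help to wrap those steps so that a failure is reported and the script carries on. The following is a minimal base R sketch of that pattern using tryCatch. The helper name runStep and the stand-in function emptyResultStep are hypothetical; emptyResultStep only mimics the "subscript out of bounds" error and does not call occCite.

```r
# Hypothetical helper: evaluate a step, and on error report it and
# return NULL instead of halting the script.
runStep <- function(expr, label) {
  tryCatch(
    expr,
    error = function(e) {
      message("Step '", label, "' failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a call that fails on an empty result; indexing an empty
# list raises the same kind of error seen in the output above.
emptyResultStep <- function() {
  emptyTables <- list()
  emptyTables[[1]]
}

# A failing step yields NULL and a message rather than stopping.
plotResult <- runStep(emptyResultStep(), label = "plot")

# A succeeding step passes its value through unchanged.
okResult <- runStep(sum(1:3), label = "sum")

is.null(plotResult)
okResult
```

With occCite loaded, the same wrapper could be placed around calls such as plot(myPhyOccCiteObject) or occCitation(myPhyOccCiteObject), and a NULL result checked before printing.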