This is an abridged version of the full firehose vignette. The vignette can be viewed in its entirety at https://github.com/earthlab/cft-CRAN/blob/master/full_vignettes/firehose.md
This vignette provides a walk-through of a common use case of the Firehose function.
The purpose of the Firehose functions is to download the data as quickly as possible by distributing tasks across multiple processors. The more cores you use, the faster this process will go.
Note that the Firehose function is best used for downloading MACA climate model data for multiple climate variables from multiple climate models and emission scenarios in their entirety for a single lat/long location. If you want to download data about multiple variables from multiple models and emission scenarios over a larger geographic region and a shorter time period, you should use the available_data function. A vignette about how to use the available_data function is available at https://github.com/earthlab/cft-CRAN/blob/master/full_vignettes/available-data.md.
Load the cft package and other libraries required for vignette. If you need to install cft, install it from CRAN.
library(cft)
library(future)
library(furrr)
library(sf)
library(tidync)
We will start by setting up our computer to run code on multiple cores instead of just one. The availableCores() function first checks your local computer to see how many cores are available and then subtracts one so that you still have an available core for running your operating system. The plan() function then starts a back-end structure where tasks can be assigned. These backend systems can sometimes have difficulty shutting down after the process is done, especially if you force quite an operation in the works. If you find your code stalling without good explanation, it’s good to restart your computer to clear any of these structures that may be stuck in memory.
n_cores <- availableCores() - 1
plan(multicore, workers = n_cores)
We pull all of our data from the internet. Since internet connections can be a little variable, we try to make a strong link between our computer and the data server by creating an src object. Run this code to establish the connection and then use the src object you created to call on that connection for information. Because this src object is a connection, it will need to be reconnected each time you want to use it. You cannot make it once and then use if forever.
web_link = "https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future"
# Change to "https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_historical" for historical data.
src <- tidync::tidync(web_link)
## not a file:
## ' https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future '
##
## ... attempting remote connection
## Connection succeeded.
After a connection is made to the server, we can run the available_data() function to check that server and see what data it has available to us. The available_data() function produces three outputs:
Here we print that list of variables. This may take up to a minutes as you retrieve the information from the server.
# This is your menu
inputs <- cft::available_data()
## Trying to connect to the USGS.gov API
## not a file:
## ' https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future '
##
## ... attempting remote connection
## Connection succeeded.
## Reading results
## Converting into an R data.table
inputs[[1]]
## # A tibble: 350 × 9
## Available varia…¹ Varia…² Units Model Model…³ Scena…⁴ Varia…⁵ Model…⁶ Scena…⁷
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 huss_BNU-ESM_r1i… Specif… kg k… Beij… r1i1p1 RCP 4.5 huss BNU-ESM rcp45
## 2 huss_BNU-ESM_r1i… Specif… kg k… Beij… r1i1p1 RCP 8.5 huss BNU-ESM rcp85
## 3 huss_CCSM4_r6i1p… Specif… kg k… Comm… r6i1p1 RCP 4.5 huss CCSM4 rcp45
## 4 huss_CCSM4_r6i1p… Specif… kg k… Comm… r6i1p1 RCP 8.5 huss CCSM4 rcp85
## 5 huss_CNRM-CM5_r1… Specif… kg k… Cent… r1i1p1 RCP 4.5 huss CNRM-C… rcp45
## 6 huss_CNRM-CM5_r1… Specif… kg k… Cent… r1i1p1 RCP 8.5 huss CNRM-C… rcp85
## 7 huss_CSIRO-Mk3-6… Specif… kg k… Comm… r1i1p1 RCP 4.5 huss CSIRO-… rcp45
## 8 huss_CSIRO-Mk3-6… Specif… kg k… Comm… r1i1p1 RCP 8.5 huss CSIRO-… rcp85
## 9 huss_CanESM2_r1i… Specif… kg k… Cana… r1i1p1 RCP 4.5 huss CanESM2 rcp45
## 10 huss_CanESM2_r1i… Specif… kg k… Cana… r1i1p1 RCP 8.5 huss CanESM2 rcp85
## # … with 340 more rows, and abbreviated variable names ¹`Available variable`,
## # ²Variable, ³`Model ensemble type (only CCSM4 relevant)`, ⁴Scenario,
## # ⁵`Variable abbreviation`, ⁶`Model abbreviation`, ⁷`Scenario abbreviation`
From the table that was returned, we want to decide which variables we would like to request. If you are using the Firehose function, you likely have a long list of variables you’d like to download in their entirety. Use the filter() function to select the variables you’d like to download. Store those choices in an object called input_variables to pass on to the Firehose function.
input_variables <- inputs$variable_names %>%
filter(Variable %in% c("Minimum Relative Humidity",
"Minimum Temperature",
"Precipitation")) %>%
filter(Scenario %in% c( "RCP 8.5", "RCP 4.5")) %>%
filter(Model %in% c(
"Beijing Climate Center - Climate System Model 1.1",
"Beijing Normal University - Earth System Model")) %>%
pull("Available variable")
You can check the object you created to see that it is a list of formatted variable names ready to send to the server.
# This is the list of things you would like to order off the menu
input_variables
## [1] "pr_BNU-ESM_r1i1p1_rcp45" "pr_BNU-ESM_r1i1p1_rcp85"
## [3] "pr_bcc-csm1-1_r1i1p1_rcp45" "pr_bcc-csm1-1_r1i1p1_rcp85"
## [5] "rhsmin_BNU-ESM_r1i1p1_rcp45" "rhsmin_BNU-ESM_r1i1p1_rcp85"
## [7] "rhsmin_bcc-csm1-1_r1i1p1_rcp45" "rhsmin_bcc-csm1-1_r1i1p1_rcp85"
## [9] "tasmin_BNU-ESM_r1i1p1_rcp45" "tasmin_BNU-ESM_r1i1p1_rcp85"
## [11] "tasmin_bcc-csm1-1_r1i1p1_rcp45" "tasmin_bcc-csm1-1_r1i1p1_rcp85"
The Firehose function downloads all the requested variables, for all available time points, for a SINGLE geographic location. The underlying data are summarized to points on a grid and don’t include every conceivable point you could enter. You need to first suggest a lat/long pair and then find the aggregated download point that is closest to your suggested point. In the code here, we use Open Street Map to download a polygon for Rocky Mountain National Park and then find the centroid as our suggested point.
aoi_name <- "colorado"
bb <- getbb(aoi_name)
my_boundary <- opq(bb, timeout=300) %>%
add_osm_feature(key = "boundary", value = "national_park") %>%
osmdata_sf()
my_boundary
my_boundary$osm_multipolygons
## Simple feature collection with 13 features and 42 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -109.3377 ymin: 36.28472 xmax: -103.4169 ymax: 40.74693
## Geodetic CRS: WGS 84
## First 10 features:
## osm_id name
## 390960 390960 Rocky Mountain National Park
## 395552 395552 Black Canyon of the Gunnison National Park
## 395555 395555 Curecanti National Recreation Area
## 5453168 5453168 Colorado National Monument
## 5725308 5725308 Great Sand Dunes National Park
## 5725309 5725309 Great Sand Dunes National Preserve
## 5749022 5749022 Browns Canyon National Monument
## 5749060 5749060 Florissant Fossil Beds National Monument
## 5749153 5749153 Dinosaur National Monument
## 5749169 5749169 Yucca House National Monument
## attribution boundary boundary.type ele gnis.county_id
## 390960 National Park Service national_park protected_area
## 395552 National Park Service national_park 2045 085
## 395555 National Park Service national_park protected_area
## 5453168 national_park
## 5725308 National Park Service national_park protected_area 2590 109
## 5725309 National Park Service national_park protected_area
## 5749022 national_park protected_area
## 5749060 national_park
## 5749153 national_park
## 5749169 national_park
## gnis.created gnis.feature_id gnis.state_id heritage heritage.operator
## 390960
## 395552 10/13/1978 203271 08
## 395555
## 5453168
## 5725308 08/31/1992 192558 08
## 5725309
## 5749022
## 5749060
## 5749153
## 5749169 202628
## leisure name.ar name.de name.en name.es name.fr name.ja name.nl
## 390960
## 395552 nature_reserve
## 395555 nature_reserve
## 5453168 nature_reserve
## 5725308 nature_reserve
## 5725309 nature_reserve
## 5749022 nature_reserve
## 5749060 nature_reserve
## 5749153 nature_reserve
## 5749169
## name.ru name.zh operator operator.short
## 390960 National Park Service NPS
## 395552 United States National Park Service
## 395555 National Park Service NPS
## 5453168
## 5725308 National Park Service NPS
## 5725309 National Park Service NPS
## 5749022 BLM;USFS
## 5749060 United States National Park Service
## 5749153 United States National Park Service
## 5749169 United States National Park Service
## operator.type operator.wikidata operator.wikipedia ownership
## 390960 public Q308439 en:National Park Service national
## 395552 national
## 395555 public Q308439 national
## 5453168 national
## 5725308 public Q308439 en:National Park Service national
## 5725309 public Q308439 en:National Park Service national
## 5749022 national
## 5749060 national
## 5749153 national
## 5749169 national
## protect_class protect_id protect_title protected
## 390960 2 2 National Park perpetuity
## 395552 2 perpetuity
## 395555 5 perpetuity
## 5453168 3 perpetuity
## 5725308 2 perpetuity
## 5725309 2 perpetuity
## 5749022 3 perpetuity
## 5749060 5
## 5749153 3
## 5749169 22
## protection_title ref.whc
## 390960 National Park
## 395552 National Park
## 395555 National Recreation Area
## 5453168 National Monument
## 5725308 National Park
## 5725309 National Preserve
## 5749022 National Monument
## 5749060 National Monument
## 5749153 National Monument
## 5749169 National Monument
## source
## 390960
## 395552
## 395555 http://science.nature.nps.gov/im/gis/index.cfm
## 5453168 http://science.nature.nps.gov/im/gis/index.cfm;en.wikipedia.org
## 5725308
## 5725309
## 5749022
## 5749060
## 5749153
## 5749169
## source.geometry type website
## 390960 boundary http://www.nps.gov/romo/
## 395552 boundary
## 395555 boundary
## 5453168 boundary https://www.nps.gov/colm/index.htm
## 5725308 boundary
## 5725309 boundary
## 5749022 boundary
## 5749060 boundary https://www.nps.gov/flfo/index.htm
## 5749153 boundary https://www.nps.gov/dino/index.htm
## 5749169 boundary http://www.nps.gov/yuho/
## whc.criteria whc.inscription_date wikidata
## 390960 Q777183
## 395552 Q305880
## 395555
## 5453168 Q1111312
## 5725308 Q609097
## 5725309 Q609097
## 5749022
## 5749060 Q1430179
## 5749153 Q1226698
## 5749169 Q602414
## wikipedia
## 390960 en:Rocky Mountain National Park
## 395552 en:Black Canyon of the Gunnison National Park
## 395555
## 5453168 en:Colorado National Monument
## 5725308 en:Great Sand Dunes National Park and Preserve
## 5725309 en:Great Sand Dunes National Park and Preserve
## 5749022
## 5749060 en:Florissant Fossil Beds National Monument
## 5749153 en:Dinosaur National Monument
## 5749169 en:Yucca House National Monument
## geometry
## 390960 MULTIPOLYGON (((-105.7615 4...
## 395552 MULTIPOLYGON (((-107.8019 3...
## 395555 MULTIPOLYGON (((-107.6507 3...
## 5453168 MULTIPOLYGON (((-108.7384 3...
## 5725308 MULTIPOLYGON (((-105.517 37...
## 5725309 MULTIPOLYGON (((-105.5522 3...
## 5749022 MULTIPOLYGON (((-105.9675 3...
## 5749060 MULTIPOLYGON (((-105.2509 3...
## 5749153 MULTIPOLYGON (((-108.9678 4...
## 5749169 MULTIPOLYGON (((-108.6855 3...
my_boundary$osm_multipolygons[1,]
## Simple feature collection with 1 feature and 42 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -105.9137 ymin: 40.15777 xmax: -105.4936 ymax: 40.55379
## Geodetic CRS: WGS 84
## osm_id name attribution boundary
## 390960 390960 Rocky Mountain National Park National Park Service national_park
## boundary.type ele gnis.county_id gnis.created gnis.feature_id
## 390960 protected_area
## gnis.state_id heritage heritage.operator leisure name.ar name.de name.en
## 390960
## name.es name.fr name.ja name.nl name.ru name.zh operator
## 390960 National Park Service
## operator.short operator.type operator.wikidata operator.wikipedia
## 390960 NPS public Q308439 en:National Park Service
## ownership protect_class protect_id protect_title protected
## 390960 national 2 2 National Park perpetuity
## protection_title ref.whc source source.geometry type
## 390960 National Park boundary
## website whc.criteria whc.inscription_date wikidata
## 390960 http://www.nps.gov/romo/ Q777183
## wikipedia geometry
## 390960 en:Rocky Mountain National Park MULTIPOLYGON (((-105.7615 4...
boundaries <- my_boundary$osm_multipolygons[1,]
pulled_bb <- st_bbox(boundaries)
pulled_bb
## xmin ymin xmax ymax
## -105.91371 40.15777 -105.49358 40.55379
pt <- st_coordinates(st_centroid(boundaries))
## Warning in st_centroid.sf(boundaries): st_centroid assumes attributes are
## constant over geometries of x
So, our suggested point is pt.
pt
## X Y
## 1 -105.6973 40.35543
Now we can check to see which point in the dataset most closely resembles our suggested point.
lat_pt <- pt[1,2]
lon_pt <- pt[1,1]
lons <- src %>% activate("D2") %>% hyper_tibble()
lats <- src %>% activate("D1") %>% hyper_tibble()
known_lon <- lons[which(abs(lons-lon_pt)==min(abs(lons-lon_pt))),]
known_lat <- lats[which(abs(lats-lat_pt)==min(abs(lats-lat_pt))),]
chosen_pt <- st_as_sf(cbind(known_lon,known_lat), coords = c("lon", "lat"), crs = "WGS84", agr = "constant")
The chosen point is relatively close to the suggested point.
chosen_pt
## Simple feature collection with 1 feature and 0 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -105.6891 ymin: 40.3545 xmax: -105.6891 ymax: 40.3545
## Geodetic CRS: WGS 84
## geometry
## 1 POINT (-105.6891 40.3545)
You will need to supply the Firehose function with two things for it to work properly:
Provide those two things to the single_point_firehose() function and it will download your data as fast as possible, organize those data into an sf spatial dataframe, and return that dataframe.
Some notes of caution.
When you use numerous cores to make a lot of simultaneous requests, the server occasionally mistakes you for a DDOS attack and kills your connection. The firehose function deals with this uncertainty, and other forms of network disruption, by conducting two phases of downloads. It first tries to download everything you requested through a ton of independent api requests, it then scans all of the downloads to see which ones failed and organizes a follow-up download run to recapture downloads that failed on your first attempt. Do not worry if you see the progress bar get disrupted during either one of these passes, it will hopefully capture all of the errors and retry those downloads.
This function will run faster if you provide it more cores and a faster internet connection. In our test runs with 11 cores on a 12 core laptop and 800 MB/s internet it took 40-80 minutes to download 181 variables. The variance was largely driven by our network connection speed and error rate (because they require a second try to download).
out <- single_point_firehose(input_variables, known_lat, known_lon )
out
## Simple feature collection with 34333 features and 13 fields
## Attribute-geometry relationship: 13 constant, 0 aggregate, 0 identity
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -105.6891 ymin: 40.3545 xmax: -105.6891 ymax: 40.3545
## Geodetic CRS: WGS 84
## First 10 features:
## pr_BNU-ESM_r1i1p1_rcp85 time pr_bcc-csm1-1_r1i1p1_rcp45
## 1 32.2 38716 0.0
## 2 1.1 38717 1.0
## 3 1.9 38718 0.0
## 4 7.8 38719 0.0
## 5 0.7 38720 0.0
## 6 5.4 38721 0.0
## 7 1.1 38722 0.6
## 8 0.0 38723 0.7
## 9 0.0 38724 3.1
## 10 0.6 38725 1.9
## rhsmin_BNU-ESM_r1i1p1_rcp45 rhsmin_BNU-ESM_r1i1p1_rcp85
## 1 66 65
## 2 40 40
## 3 41 40
## 4 55 55
## 5 54 53
## 6 53 53
## 7 34 35
## 8 38 38
## 9 51 55
## 10 48 50
## rhsmin_bcc-csm1-1_r1i1p1_rcp45 tasmin_BNU-ESM_r1i1p1_rcp45
## 1 59 271.2
## 2 54 269.9
## 3 50 266.6
## 4 54 270.4
## 5 51 274.6
## 6 49 272.9
## 7 31 267.1
## 8 37 265.8
## 9 42 265.8
## 10 33 267.0
## tasmin_BNU-ESM_r1i1p1_rcp85 tasmin_bcc-csm1-1_r1i1p1_rcp45
## 1 271.0 264.2
## 2 269.7 269.7
## 3 266.6 265.3
## 4 269.8 264.7
## 5 274.9 260.9
## 6 272.8 260.3
## 7 267.2 261.0
## 8 265.9 264.3
## 9 265.7 268.8
## 10 266.9 269.5
## pr_BNU-ESM_r1i1p1_rcp45 pr_bcc-csm1-1_r1i1p1_rcp85
## 1 34.9 0.0
## 2 1.2 1.2
## 3 2.1 0.0
## 4 7.7 0.0
## 5 0.5 0.0
## 6 5.0 0.0
## 7 1.3 0.0
## 8 0.0 1.8
## 9 0.0 0.9
## 10 0.4 1.2
## rhsmin_bcc-csm1-1_r1i1p1_rcp85 tasmin_bcc-csm1-1_r1i1p1_rcp85
## 1 56 263.9
## 2 54 269.3
## 3 51 265.2
## 4 56 263.5
## 5 51 261.5
## 6 50 261.4
## 7 35 261.0
## 8 39 263.3
## 9 39 267.0
## 10 34 268.9
## geometry
## 1 POINT (-105.6891 40.3545)
## 2 POINT (-105.6891 40.3545)
## 3 POINT (-105.6891 40.3545)
## 4 POINT (-105.6891 40.3545)
## 5 POINT (-105.6891 40.3545)
## 6 POINT (-105.6891 40.3545)
## 7 POINT (-105.6891 40.3545)
## 8 POINT (-105.6891 40.3545)
## 9 POINT (-105.6891 40.3545)
## 10 POINT (-105.6891 40.3545)