The sapfluxnetr package offers a flexible yet powerful API, based on the
tidyverse packages, to aggregate and summarise the site/s data, in the
form of the sfn_metrics function. All the metrics family of functions
(?metrics) use sfn_metrics under the hood. If you want full control over
the statistics returned and the aggregation periods, we recommend using
this API directly. This vignette shows you how.
The pre-fixed metrics functions are:
- daily_metrics
- monthly_metrics
- predawn_metrics
- midday_metrics
- nightly_metrics
- daylight_metrics
See each function help for a detailed description and examples of use.
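For example, a minimal sketch of the simplest pre-fixed call, using the
ARG_TRE example dataset shipped with the package (assuming the default
arguments documented in ?daily_metrics):

library(sapfluxnetr)
data('ARG_TRE', package = 'sapfluxnetr')

# full set of daily metrics with the default settings
foo_daily <- daily_metrics(ARG_TRE, solar = TRUE)

# results come as a list with 'sapf' and 'env' elements, as in the
# sfn_metrics results shown below
names(foo_daily)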
daily_metrics and the related functions return a complete set of metrics
ready to use, but if you want different metrics you can supply your own
summarising functions through the .funs argument. The correct way of
specifying these functions is described in the summarise_all help
(?dplyr::summarise_all). The recommended way is a list of formulas
containing the function calls:
# libraries
library(sapfluxnetr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
### only mean and sd at a daily scale
# data
data('ARG_TRE', package = 'sapfluxnetr')
data('AUS_CAN_ST2_MIX', package = 'sapfluxnetr') # used later in this vignette

# summarising funs (as a list of formulas)
custom_funs <- list(mean = ~ mean(., na.rm = TRUE), std_dev = ~ sd(., na.rm = TRUE))

# metrics
foo_simpler_metrics <- sfn_metrics(
  ARG_TRE,
  period = '1 day',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"

foo_simpler_metrics[['sapf']]
#> # A tibble: 14 × 9
#> TIMESTAMP ARG_TRE…¹ ARG_T…² ARG_T…³ ARG_T…⁴ ARG_T…⁵ ARG_T…⁶ ARG_T…⁷
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-17 00:00:00 308. 173. 303. 255. 20.7 23.2 14.0
#> 2 2009-11-18 00:00:00 507. 376. 432. 490. 170. 174. 130.
#> 3 2009-11-19 00:00:00 541. 380. 391. 524. 262. 169. 150.
#> 4 2009-11-20 00:00:00 330. 218. 272. 334. 139. 67.2 74.6
#> 5 2009-11-21 00:00:00 338. 219. 278. 351. 190. 108. 113.
#> 6 2009-11-22 00:00:00 384. 243. 310. 383. 268. 172. 184.
#> 7 2009-11-23 00:00:00 492. 300. 390. 513. 327. 200. 228.
#> 8 2009-11-24 00:00:00 573. 389. 497. 626. 313. 222. 261.
#> 9 2009-11-25 00:00:00 601. 400. 484. 644. 193. 133. 170.
#> 10 2009-11-26 00:00:00 502. 360. 450. 613. 277. 233. 308.
#> 11 2009-11-27 00:00:00 544. 411. 506. 740. 271. 221. 285.
#> 12 2009-11-28 00:00:00 573. 451. 589. 840. 180. 169. 249.
#> 13 2009-11-29 00:00:00 371. 285. 357. 547. 233. 220. 197.
#> 14 2009-11-30 00:00:00 386. 293. 381. 602. 273. 209. 288.
#> # … with 1 more variable: ARG_TRE_Nan_Jt_4_std_dev <dbl>, and abbreviated
#> # variable names ¹ARG_TRE_Nan_Jt_1_mean, ²ARG_TRE_Nan_Jt_2_mean,
#> # ³ARG_TRE_Nan_Jt_3_mean, ⁴ARG_TRE_Nan_Jt_4_mean, ⁵ARG_TRE_Nan_Jt_1_std_dev,
#> # ⁶ARG_TRE_Nan_Jt_2_std_dev, ⁷ARG_TRE_Nan_Jt_3_std_dev
When supplying only one function to .funs, variable names are not changed to contain the metric name at the end: the summary returns the same column names as the original data.
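A minimal sketch of this behaviour, reusing ARG_TRE with one single
(unnamed) summarising function:

# one single summarising function: the tree column names should keep
# their original form, as noted above
foo_single <- sfn_metrics(
  ARG_TRE,
  period = '1 day',
  .funs = list(~ mean(., na.rm = TRUE)),
  solar = TRUE,
  interval = 'general'
)
names(foo_single[['sapf']])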
You can also choose whether the “special interest” intervals (predawn, midday, night-time or daylight) are calculated. For example, if you are only interested in the midday interval you can use:
foo_simpler_metrics_midday <- sfn_metrics(
  ARG_TRE,
  period = '1 day',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'midday', int_start = 11, int_end = 13
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "midday data for ARG_TRE"

foo_simpler_metrics_midday[['sapf']]
#> # A tibble: 13 × 9
#> TIMESTAMP_md ARG_TRE…¹ ARG_T…² ARG_T…³ ARG_T…⁴ ARG_T…⁵ ARG_T…⁶ ARG_T…⁷
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-18 00:00:00 685. 665. 614. 719. 23.8 70.1 40.1
#> 2 2009-11-19 00:00:00 879. 594. 626. 664. 193. 67.3 9.82
#> 3 2009-11-20 00:00:00 438. 272. 258. 474. 116. 54.2 63.1
#> 4 2009-11-21 00:00:00 631. 379. 533. 619. 46.8 25.1 7.69
#> 5 2009-11-22 00:00:00 783. 535. 680. 875. 40.9 42.4 194.
#> 6 2009-11-23 00:00:00 841. 478. 618. 1018. 13.3 9.47 9.94
#> 7 2009-11-24 00:00:00 951. 636. 789. 829. 27.0 1.00 168.
#> 8 2009-11-25 00:00:00 907. 602. 789. 913. 22.1 31.0 229.
#> 9 2009-11-26 00:00:00 861. 697. 925. 1265. 100. 97.8 229.
#> 10 2009-11-27 00:00:00 806. 594. 706. 1044. 1.67 42.8 4.14
#> 11 2009-11-28 00:00:00 837. 730. 925. 1313. 11.2 30.3 228.
#> 12 2009-11-29 00:00:00 638. 605. 666. 1333. 40.7 29.5 49.5
#> 13 2009-11-30 00:00:00 548. 371. 444. 961. 44.6 149. 222.
#> # … with 1 more variable: ARG_TRE_Nan_Jt_4_std_dev_md <dbl>, and abbreviated
#> # variable names ¹ARG_TRE_Nan_Jt_1_mean_md, ²ARG_TRE_Nan_Jt_2_mean_md,
#> # ³ARG_TRE_Nan_Jt_3_mean_md, ⁴ARG_TRE_Nan_Jt_4_mean_md,
#> # ⁵ARG_TRE_Nan_Jt_1_std_dev_md, ⁶ARG_TRE_Nan_Jt_2_std_dev_md,
#> # ⁷ARG_TRE_Nan_Jt_3_std_dev_md
The period argument in sfn_metrics is passed to the internal
.collapse_timestamp function, so it accepts the same inputs. A fixed
period is expressed as a frequency plus a time unit, i.e. '7 days' for a
weekly aggregation:
# weekly
foo_weekly <- sfn_metrics(
  ARG_TRE,
  period = '7 days',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"

foo_weekly[['env']]
#> # A tibble: 3 × 19
#> TIMESTAMP ta_mean rh_mean vpd_mean sw_in_m…¹ ws_mean preci…² swc_s…³
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-15 00:00:00 4.81 35.3 0.598 280. 15.5 0.00612 0.365
#> 2 2009-11-22 00:00:00 6.15 35.3 0.656 327. 24.5 0.192 0.348
#> 3 2009-11-29 00:00:00 2.55 40.9 0.453 261. 23.1 0.122 0.374
#> # … with 11 more variables: ppfd_in_mean <dbl>, ext_rad_mean <dbl>,
#> # ta_std_dev <dbl>, rh_std_dev <dbl>, vpd_std_dev <dbl>, sw_in_std_dev <dbl>,
#> # ws_std_dev <dbl>, precip_std_dev <dbl>, swc_shallow_std_dev <dbl>,
#> # ppfd_in_std_dev <dbl>, ext_rad_std_dev <dbl>, and abbreviated variable
#> # names ¹sw_in_mean, ²precip_mean, ³swc_shallow_mean
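Other fixed periods follow the same pattern. For instance, a monthly
sketch (ARG_TRE spans only part of November 2009, so this should yield a
single row):

# monthly
foo_monthly <- sfn_metrics(
  ARG_TRE,
  period = '1 month',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)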
period also accepts a custom function; any extra arguments this function
needs must be supplied in the ... argument of sfn_metrics. Also, this
function must always return a vector of timestamps of the same length as
the original TIMESTAMP. For example, the quarter function from the
lubridate package:

foo_custom <- sfn_metrics(
  AUS_CAN_ST2_MIX,
  period = lubridate::quarter,
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general',
  with_year = TRUE # argument for lubridate::quarter
)
#> [1] "Crunching data for AUS_CAN_ST2_MIX. In large datasets this could take a while"
#> Warning in .period_to_minutes(period, .data$TIMESTAMP, unique(.data$timestep)): when using a custom function as period, coverage calculation
#> can be less accurate
#> Warning in .period_to_minutes(period, .data$TIMESTAMP, unique(.data$timestep)): when using a custom function as period, coverage calculation
#> can be less accurate
#> [1] "General data for AUS_CAN_ST2_MIX"
foo_custom['env']
#> $env
#> # A tibble: 5 × 17
#> TIMESTAMP ta_mean vpd_mean sw_in_mean ws_mean precip…¹ ppfd_…² rh_mean ext_r…³
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2006. 8.48 0.106 25.4 0.161 0.0158 53.7 93.1 166.
#> 2 2006. 10.5 0.278 82.0 0.318 0.0399 173. 86.9 245.
#> 3 2006. 16.6 0.826 219. 0.581 0.0200 463. 72.8 468.
#> 4 2007. 20.9 0.985 197. 0.439 0.0333 416. 75.7 435.
#> 5 2007. 15.8 0.386 110. 0.200 0.0181 231. 86.7 213.
#> # … with 8 more variables: ta_std_dev <dbl>, vpd_std_dev <dbl>,
#> # sw_in_std_dev <dbl>, ws_std_dev <dbl>, precip_std_dev <dbl>,
#> # ppfd_in_std_dev <dbl>, rh_std_dev <dbl>, ext_rad_std_dev <dbl>, and
#> # abbreviated variable names ¹precip_mean, ²ppfd_in_mean, ³ext_rad_mean
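Any function honouring this contract can be used. As an illustration, a
hypothetical year_period helper (not part of the package), built on
lubridate::floor_date, which collapses the timestamps to years:

# hypothetical custom period function: it takes the POSIXct TIMESTAMP
# vector and returns a same-length vector identifying each value's period
year_period <- function(timestamp) {
  lubridate::floor_date(timestamp, unit = 'year')
}

foo_yearly <- sfn_metrics(
  AUS_CAN_ST2_MIX,
  period = year_period,
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)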
sfn_metrics has a ... parameter intended to supply additional arguments
to the internal functions used:

- .collapse_timestamp accepts one extra argument: side.
- dplyr::summarise_all accepts extra arguments intended to be applied to
  the summarising functions provided. They are applied to all of them, so
  every function supplied must accept the argument provided or an error
  will be raised. That is why we recommend the list-of-formulas syntax,
  where the arguments are specified for each individual function.
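To illustrate the difference, a small sketch with plain dplyr on a toy
tibble (not Sapfluxnet data):

mini_tbl <- tibble(a = c(1, NA, 3), b = c(2, 4, NA))

# via ...: na.rm = TRUE reaches both mean() and sd(); any function
# lacking an na.rm argument would raise an error here
summarise_all(mini_tbl, list(mean = mean, std_dev = sd), na.rm = TRUE)

# via the list syntax: each formula carries its own arguments
summarise_all(mini_tbl, list(mean = ~ mean(., na.rm = TRUE), std_dev = ~ sd(., na.rm = TRUE)))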
For example, if we want the TIMESTAMPs after aggregation to show the end of the period instead of the beginning (the default), we can do the following:
foo_simpler_metrics_end <- sfn_metrics(
  ARG_TRE,
  period = '1 day',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general',
  side = "end"
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"

foo_simpler_metrics_end[['sapf']]
#> # A tibble: 14 × 9
#> TIMESTAMP ARG_TRE…¹ ARG_T…² ARG_T…³ ARG_T…⁴ ARG_T…⁵ ARG_T…⁶ ARG_T…⁷
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-18 00:00:00 308. 173. 303. 255. 20.7 23.2 14.0
#> 2 2009-11-19 00:00:00 507. 376. 432. 490. 170. 174. 130.
#> 3 2009-11-20 00:00:00 541. 380. 391. 524. 262. 169. 150.
#> 4 2009-11-21 00:00:00 330. 218. 272. 334. 139. 67.2 74.6
#> 5 2009-11-22 00:00:00 338. 219. 278. 351. 190. 108. 113.
#> 6 2009-11-23 00:00:00 384. 243. 310. 383. 268. 172. 184.
#> 7 2009-11-24 00:00:00 492. 300. 390. 513. 327. 200. 228.
#> 8 2009-11-25 00:00:00 573. 389. 497. 626. 313. 222. 261.
#> 9 2009-11-26 00:00:00 601. 400. 484. 644. 193. 133. 170.
#> 10 2009-11-27 00:00:00 502. 360. 450. 613. 277. 233. 308.
#> 11 2009-11-28 00:00:00 544. 411. 506. 740. 271. 221. 285.
#> 12 2009-11-29 00:00:00 573. 451. 589. 840. 180. 169. 249.
#> 13 2009-11-30 00:00:00 371. 285. 357. 547. 233. 220. 197.
#> 14 2009-12-01 00:00:00 386. 293. 381. 602. 273. 209. 288.
#> # … with 1 more variable: ARG_TRE_Nan_Jt_4_std_dev <dbl>, and abbreviated
#> # variable names ¹ARG_TRE_Nan_Jt_1_mean, ²ARG_TRE_Nan_Jt_2_mean,
#> # ³ARG_TRE_Nan_Jt_3_mean, ⁴ARG_TRE_Nan_Jt_4_mean, ⁵ARG_TRE_Nan_Jt_1_std_dev,
#> # ⁶ARG_TRE_Nan_Jt_2_std_dev, ⁷ARG_TRE_Nan_Jt_3_std_dev
Compared with the foo_simpler_metrics object calculated before, each
period is now identified in the TIMESTAMP by its end (the following
midnight, in this daily case).
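A quick way to check the difference is comparing the first TIMESTAMP
values of both objects created above:

# the default version starts at 2009-11-17, the side = "end" version at
# 2009-11-18 (see the two tables above)
head(foo_simpler_metrics[['sapf']][['TIMESTAMP']], n = 3)
head(foo_simpler_metrics_end[['sapf']][['TIMESTAMP']], n = 3)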
When supplying a custom function as the period argument, the default coverage statistic is not reliable, as there is no way of knowing the period length in minutes beforehand.
The internal aggregation process in sfn_metrics generates some transitory
columns which can be used in the summarising functions:

TIMESTAMP_coll

When aggregating by the declared period (i.e. "daily"), the TIMESTAMP
column collapses to the period start/end value (meaning that all the
TIMESTAMP values for the same day become identical). This makes it
impossible to use summarising functions that obtain the time of day at
which an event happens (i.e. the time of day at which the maximum sap
flow occurs), because all TIMESTAMP values are identical. For that kind
of summarising function, a transitory column called TIMESTAMP_coll is
created. So in this case we can create a function that takes the variable
values for the day and the TIMESTAMP_coll values for the day, returns the
TIMESTAMP at which the maximum sap flow occurs, and use it with
sfn_metrics:
max_time <- function(x, time) {
  # x: vector of values for a day
  # time: TIMESTAMP_coll values for the day

  # If all the values in x are NAs (a daily summarise of a day without
  # measures, for example), this would return a length-0 POSIXct vector,
  # which would crash the dplyr summarise step. So, check if all are NA
  # and, if TRUE, return NA as POSIXct.
  if (all(is.na(x))) {
    return(as.POSIXct(NA, tz = attr(time, 'tz'), origin = lubridate::origin))
  } else {
    time[which.max(x)]
  }
}

custom_funs <- list(max = ~ max(., na.rm = TRUE), max_time = ~ max_time(., TIMESTAMP_coll))
max_time_metrics <- sfn_metrics(
  ARG_TRE,
  period = '1 day',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"

max_time_metrics[['sapf']]
#> # A tibble: 14 × 9
#> TIMESTAMP ARG_TRE_Nan…¹ ARG_T…² ARG_T…³ ARG_T…⁴ ARG_TRE_Nan_Jt_1_…⁵
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dttm>
#> 1 2009-11-17 00:00:00 322. 190. 313. 293. 2009-11-17 22:24:58
#> 2 2009-11-18 00:00:00 778. 715. 679. 948. 2009-11-18 13:24:43
#> 3 2009-11-19 00:00:00 1015. 694. 633. 978. 2009-11-19 12:24:26
#> 4 2009-11-20 00:00:00 648. 401. 442. 636. 2009-11-20 13:24:10
#> 5 2009-11-21 00:00:00 664. 406. 539. 633. 2009-11-21 12:23:52
#> 6 2009-11-22 00:00:00 812. 564. 816. 877. 2009-11-22 12:23:34
#> 7 2009-11-23 00:00:00 1085. 676. 935. 1150. 2009-11-23 13:23:15
#> 8 2009-11-24 00:00:00 992. 736. 1115. 1547. 2009-11-24 17:22:56
#> 9 2009-11-25 00:00:00 976. 646. 951. 1027. 2009-11-25 10:22:36
#> 10 2009-11-26 00:00:00 932. 766. 1087. 1384. 2009-11-26 12:22:15
#> 11 2009-11-27 00:00:00 862. 704. 921. 1193. 2009-11-27 16:21:54
#> 12 2009-11-28 00:00:00 845. 763. 1165. 1706. 2009-11-28 11:21:33
#> 13 2009-11-29 00:00:00 714. 747. 701. 1633. 2009-11-29 13:21:11
#> 14 2009-11-30 00:00:00 875. 646. 919. 1853. 2009-11-30 15:20:48
#> # … with 3 more variables: ARG_TRE_Nan_Jt_2_max_time <dttm>,
#> # ARG_TRE_Nan_Jt_3_max_time <dttm>, ARG_TRE_Nan_Jt_4_max_time <dttm>, and
#> # abbreviated variable names ¹ARG_TRE_Nan_Jt_1_max, ²ARG_TRE_Nan_Jt_2_max,
#> # ³ARG_TRE_Nan_Jt_3_max, ⁴ARG_TRE_Nan_Jt_4_max, ⁵ARG_TRE_Nan_Jt_1_max_time
sfn_metrics also allows sub-daily aggregations by means of the period
parameter. Sapfluxnet datasets usually have sub-daily data at time steps
between 30 minutes and 2 hours, which means that data can be aggregated
to any period above 2 hours. For example, we can easily aggregate to a
3-hour period:
custom_funs <- list(max = ~ max(., na.rm = TRUE))

three_hours_agg <- sfn_metrics(
  ARG_TRE,
  period = '3 hours',
  .funs = custom_funs,
  solar = TRUE,
  interval = 'general'
)
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"

three_hours_agg[['sapf']]
#> # A tibble: 105 × 5
#> TIMESTAMP ARG_TRE_Nan_Jt_1_max ARG_TRE_Nan_Jt_2_max ARG_T…¹ ARG_T…²
#> <dttm> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-17 21:00:00 322. 190. 313. 293.
#> 2 2009-11-18 00:00:00 301. 178. 331. 309.
#> 3 2009-11-18 03:00:00 343. 198. 301. 285.
#> 4 2009-11-18 06:00:00 504. 386. 406. 428.
#> 5 2009-11-18 09:00:00 698. 715. 642. 647.
#> 6 2009-11-18 12:00:00 778. 617. 679. 948.
#> 7 2009-11-18 15:00:00 724. 531. 603. 624.
#> 8 2009-11-18 18:00:00 660. 514. 517. 693.
#> 9 2009-11-18 21:00:00 384. 261. 348. 403.
#> 10 2009-11-19 00:00:00 403. 339. 313. 396.
#> # … with 95 more rows, and abbreviated variable names ¹ARG_TRE_Nan_Jt_3_max,
#> # ²ARG_TRE_Nan_Jt_4_max