Empirical Streamflow Predictions at Thompsons Creek
2021-04-07
##note, need development version of bookdown and gt to properly compile this document
## readr imports data
library(readr)
library(tidyr)
## tibbles are advanced fancy dataframes
library(tibble)
## dplyr for data handling and piping functions
library(dplyr)
## ggplot for plots
library(ggplot2)
## stringr to read and manipulate strings
library(stringr)
## here is a function to ensure file paths are correct
library(here)
## units and ggforce facilitate attaching units to data
library(units)
library(ggforce)
## hrbrtheme is optional, I use it to pretty my plots
library(hrbrthemes)
## patchwork and cowplot support arranging multiple ggplots into one plot
library(patchwork)
library(cowplot)
## lubridate provides functions for handling time and date
library(lubridate)
## purrr lets us use map_ functions as an alternative to loops
library(purrr)
## hydroGOF provide goodness of fit metrics (NSE, RMSE, etc.)
library(hydroGOF)
## tsibble and imputeTS will allow some simple time series interpolation
library(tsibble)
library(imputeTS)
## gt
library(gtsummary)
library(flextable)
## nls.multstart fits non-linear least squares using the Levenberg-Marquardt algorithm with multiple starting values.
library(nls.multstart)
##
library(mgcv)
library(gratia)
## apply drainage area ratio
library(dartx)
## set some options
update_geom_font_defaults(font_rc)
units_options(parse = FALSE)
## some custom functions
<- function(...) {
theme_ms theme_ipsum_rc(plot_margin = margin(10,10,10,10),
axis_title_just = "c") +
theme(legend.position = "bottom",
panel.background = element_rect(fill = "transparent",
colour = NA),
panel.border = element_rect(fill = NA,
colour = "grey20"),
plot.title = element_text(size = 11,
face = "plain"),
...)
}
<- function(x, pow) {
exponent abs(x)^pow)*sign(x)
( }
About
This is an exploratory document investigating the feasibility of estimating mean daily streamflows in the Thompsons Creek watershed using empirical methods. In a previous document mean daily streamflows at three sites in the watershed were developed for March 2020 through March 20221 using measured flows, depths, and rating curves. The one-year record of streamflows will be used to fit and validate drainage area ratio, regression, and semi-parametric regression based methods for estimating streamflow using locally available data.
Introduction
Ungaged streamflow estimation
Statistical information transfer methods
Statistical information transfer and empirical regression are two relatively simple methods for estimating streamflows in poorly gaged watersheds using only measured stremaflow values from nearby gaged watersheds. Statistical transfer procedures simply transfer flow duration curves or daily streamflow values from a gaged watershed to the ungaged watershed using assumed relationships between area and runoff. The most common statistical transfer method is the drainage area ratio. With the drainage area ratio, daily streamflows are transferred from one basin to the other by multiplying the area ratio to daily streamflows:
\[\begin{equation} Q_y^t = Q_x^t\bigg(\frac{A_y}{A_x}\bigg)^\phi \tag{1} \end{equation}\]
Where \(Q_y^t\) is streamflow at ungaged basin \(y\) and time \(t\), \(Q_x^t\) is streamflow at gaged basin \(x\) and time \(t\), and \(\frac{A_y}{A_x}\) is the area ratio of the basins. Parameter \(\phi\) is typically equal to one (Asquith, Roussel, and Vrabel 2006). However, Asquith, Roussel, and Vrabel (2006) provides empirically estimated values of \(\phi\) for use in the drainage area ratio when applied in Texas. With an available short term streamflow record available, various stream gages can be assessed for performance using the drainage area ratio method. However, when that short-term streamflow record is available we extend the simple drainage area ratio to a linear regression for streamflow estimation using one or more gaged watersheds:
\[\begin{equation} Q_y = \beta_0 + \beta_n{Q_{xn}} + \varepsilon \tag{2} \end{equation}\]
where \(Q_y\) is the predicted mean daily streamflow at the ungaged or temporarily gaged site, \(\beta_0\) is the intercept, \(Q_{xn}\) are mean daily streamflows at gaged watershed \(n\), \(\beta_n\) is the regression coefficient, and \(varepsilon\) is the residual error term assumed normally distributed around mean zero. A linear regression of this form still acts as a streamflow transfer methods like the drainage area approach but allows for an easy incorporation of additional model terms such as lagged streamflows which might improve predictive performance.
Semi-parametric rainfall-runoff regression
If nearby gaged watersheds are unavailable or are not reflective of the streamflow responses in the ungaged watershed, locally available weather information can be used to empircally estimate streamflow response. A number of empirically based rainfall-runoff routing models are available that account for soil and land-use conditions to predict streamflow response (SIMHYD, IHACRES, and Sacramento rainfall-runoff models are examples). More complex mechanistic models that simulate hydrologic and water quality responses to landuse and precipitation are also available (SWAT and HSPF are two examples). The mechanistic models have a steep requirement for data and ability of the technician developing the model. The routing models and mechanistic models are outside the scope of work for this particular project. Here we focus on using a semi-parametric regression based approach to predict streamflow using locally available weather data. Empirical regression based approaches require a period of measured streamflow and some predictor variables to estimate the runoff response from. Typically, daily rainfall and temperature data are employed to fit a regression model to measured streamflow response:
\[\begin{equation} Q_i = \beta_0 + \beta_1x_i+ \varepsilon_i \tag{3} \end{equation}\]
where \(Q\) is predicted discharge on day \(i\), \(\beta_0\) is the intercept, \(\beta_1\) is the regression coefficient, \(x\) is the predictor variable value on the \(i\)th day. The error term, \(\varepsilon_i\) is assumed normally distributed around mean zero. Using simple linear or multiple linear regression \(Q\) is typically log transformed prior to estimating the regression coefficients. Generalized linear models (GLMs) can instead be used when the data distribution is skewed (more specifically, when the residuals are not expected to be normally distributed). This is accomplished through the inclusion of a link function that describes how the mean of the response depends on the linear predictor and a variance function the describes how variance depends on the mean. Using the GLM and appropriate link function, transformation of the response variable can be avoided. This approach is highly desirable due to biases in the variance structure that are introduced when estimating the means of log transformed data that is intended to be back transformed to the original scale.
If the relationship between predictor variables and the response are expected to be nonlinear, polynomial terms can be included. However, an extension of GLMs called generalized additive models (GAMs) allows relatively easy fitting of these nonlinear terms to the data. With GAMs, the response variable depends on the sum of smoothing functions applied to each predictor variable:
\[\begin{equation} Q_i = \beta_0 + f(x_1) + \varepsilon \tag{4} \end{equation}\]
where \(Q\) is predicted discharge on day \(i\), \(\beta_0\) is the intercept, and some function \(f(x_1)\) is the linear predictor. In the case of GAMs fit using the mgcv
package in R, \(f\) is a smoothing function fit to the data using generalized cross validation or restricted maximum likelihood. Due to the smoothing function and link functions, GAMs are extremely flexible for fitting regression models to data of different distributions and responses. However, compared to linear regression and GLMs the inclusion of the smoothing functions limits interpretability because traditional regression coefficients are not part of the model structure. Therefore, the effect of each individual smoothing function on the response variable mean is shown graphically.
Method
Data
Two National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network (GHCN) locations provide daily precipitation data for the project area (Figure 1; Figure 2). GHCND daily summaries were downloaded for GHCND:USW00003904 (Easterwood Airport) using the NOAA API services and the rnoaa
package in R (Chamberlain 2019).
<- read_csv("Data/noaa_precip/easterwood_precip.csv",
easterwood_precip col_types = cols(
date = col_datetime(format = ""),
datatype = col_character(),
station = col_character(),
value = col_double(),
fl_m = col_character(),
fl_q = col_character(),
fl_so = col_character(),
fl_t = col_character(),
units = col_character())) %>%
mutate(value = set_units(value/10, "mm")) %>%
select(date, datatype, station, value)
%>%
easterwood_precip mutate(date = as.Date(date),
value = as.numeric(value)) %>%
ggplot() +
geom_line(aes(date, value)) +
geom_point(aes(date, value), alpha = 0) +
labs(y = "Daily precipitation (mm)",
x = "Date") +
facet_wrap(~station, ncol = 1) +
theme_ms(plot.margin = margin(10,0,10,10),
panel.spacing = unit(0, "lines"),
panel.spacing.x = unit(0, "lines"),
panel.spacing.y = unit(2, "lines")) -> p1
%>%
easterwood_precip mutate(date = as.Date(date),
value = as.numeric(value)) %>%
ggplot() +
geom_histogram(aes(value), binwidth = 5) +
facet_wrap(~station, ncol = 1) +
coord_flip() +
theme_ms() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
plot.background = element_blank(),
plot.margin = margin(10,10,10,0),
panel.spacing = unit(0, "lines"),
panel.spacing.x = unit(0, "lines"),
panel.spacing.y = unit(2, "lines"),
strip.text = element_text(color="transparent")) -> p2
plot_grid(p1,p2,align = "h", axis = "bt", rel_widths = c(2, 1))

Figure 1: Daily precipitation and marginal histograms of daily precipitation.
<- read_csv("Data/noaa_precip/easterwood_tmax.csv",
easterwood_tmax col_types = cols(
date = col_datetime(format = ""),
datatype = col_character(),
station = col_character(),
value = col_double(),
fl_m = col_character(),
fl_q = col_character(),
fl_so = col_character(),
fl_t = col_character(),
units = col_character())) %>%
mutate(value = set_units(value/10, "°C")) %>%
select(date, datatype, station, value)
%>%
easterwood_tmax mutate(date = as.Date(date),
value = as.numeric(value)) %>%
ggplot() +
geom_step(aes(date, value), alpha = 0.5) +
labs(y = "Daily Maximum Temperature [°C]",
x = "Date") +
theme_ms(plot.margin = margin(10,0,10,10),
panel.spacing = unit(0, "lines"),
panel.spacing.x = unit(0, "lines"),
panel.spacing.y = unit(2, "lines")) -> p1
%>%
easterwood_tmax mutate(date = as.Date(date),
value = as.numeric(value)) %>%
ggplot() +
geom_density(aes(value),
binwidth = 1.1,
alpha = 0.5) +
coord_flip() +
theme_ms() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
plot.background = element_blank(),
plot.margin = margin(10,10,10,0),
panel.spacing = unit(0, "lines"),
panel.spacing.x = unit(0, "lines"),
panel.spacing.y = unit(2, "lines"),
strip.text = element_text(color="transparent")) -> p2
plot_grid(p1,p2,align = "h", axis = "bt", rel_widths = c(2, 1))

Figure 2: Daily maximum temperature and marginal histograms of daily maximum temperature.
We identified two wastewater treatment facilities located upstream of the lowest discharge point. The Still Creek WWTF (TPDES Permit No. WQ0010426002) is permitted to discharge 4.0 MGD to Still Creek and discharge flows past SWQM sites 16882 and 16396. The Sanderson Farm Inc. facility (TPDES Permit No. WQ0003821000) is permitted to discharge 1.678 MGD to a tributary of Cottonwood Branch and the discharge flows past SWQM site 16396. Mean daily wastewater facility discharges were downloaded from the EPA ECHO database (Figure 3) using the echor
R package and subtracted from the measured mean daily streamflows to better represent naturalized mean daily flows (Schramm 2020).
<- read_csv("Data/EPA_WWTF/mean_daily_discharges.csv",
wwtf col_types = cols(
npdes_id = col_character(),
date = col_date(format = ""),
mgd = col_double(),
cfs = col_double()))
%>%
wwtf ggplot() +
geom_step(aes(date, cfs, color = npdes_id)) +
scale_x_date(date_breaks = "month",
date_labels = "%b-%y") +
scale_color_discrete(labels = c("WQ0010426002, Still Creek WWTF",
"WQ0003821000, Sanderson Farm Inc." )) +
labs(x = "Date", y = "Mean Daily Discharge [cfs]") +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.line.x = element_blank(),
axis.ticks.y = element_line(color = "black"),
axis.ticks.x = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
legend.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank())

Figure 3: Mean daily WWTF discharges
Daily flow estimation
Mean daily streamflows from 2020-03-03 through 2021-01-14 were estimated using rating curves at SWQM stations 16396, 16397, and 16882 (Table 1). Mean daily wastewater discharges were subtracted from the flow record to represent naturalized flows. Mean daily temperature and total daily precipitation were joined to the data record by date. Figure 4 shows the hydrograph of naturalized flows and precipitation at each SWQM station.
Daily naturalized flow was estimated using the drainage area ratio equation (Equation (1)) at USGS stream gages 08065800, 08109800, and 08110100 (Table 1). Values of recommended in Asquith, Roussel, and Vrabel (2006) were used. Linear regression was also used to estimate log transformed flows at each SWQM station using mean daily flows and 1-day lagged mean daily flows from each USGS stream gage as a predictor variable.
<- tibble(Site = c("SWQM-16396", "SWQM-16397", "SWQM-16882", "USGS-08065800", "USGS-08109800", "USGS-08110100"),
dar_table Description = c("Thompsons Creek at Silver Hill Rd",
"Thompsons Creek at Hwy 21",
"Still Creek at Hwy 21",
"Bedias Creek near Madisonville",
"East Yegua Creek near Dime Box",
"Davidson Creek near Lyons"),
Area = c(42.33,24.21,10.03,321,244,195))
kable(dar_table,
caption = "TCEQ SWQM stations and USGS stream gages used to develop flows with drainage area ratio and linear regression methods.")
Site | Description | Area |
---|---|---|
SWQM-16396 | Thompsons Creek at Silver Hill Rd | 42.33 |
SWQM-16397 | Thompsons Creek at Hwy 21 | 24.21 |
SWQM-16882 | Still Creek at Hwy 21 | 10.03 |
USGS-08065800 | Bedias Creek near Madisonville | 321.00 |
USGS-08109800 | East Yegua Creek near Dime Box | 244.00 |
USGS-08110100 | Davidson Creek near Lyons | 195.00 |
##naturalize streamflows
<- read_csv("Data/daily_streamflow/model_df.csv",
df col_types = cols(
Site = col_character(),
date = col_date(format = ""),
mean_daily = col_double()
))
%>%
df bind_rows(tibble(Site = c(rep("16396",5),rep("16397",5),rep("16882",5)),
date = rep(seq.Date(as.Date("2020-02-27"), as.Date("2020-03-02"), by = "day"),3),
mean_daily = NA)) %>%
arrange(date) %>%
left_join(easterwood_precip, by = c("date" = "date")) %>%
mutate(value = as.numeric(value)) %>%
::rename(ewood_precip = value) %>%
dplyrleft_join(easterwood_tmax, by = c("date" = "date")) %>%
mutate(value = as.numeric(value)) %>%
::rename(ewood_tmax = value) %>%
dplyr::select(Site, date, mean_daily, ewood_precip, ewood_tmax) %>%
dplyrleft_join(wwtf %>% pivot_wider(id_cols = date, names_from = npdes_id, values_from = cfs),
by = c("date" = "date")) %>%
## remove WWTF influence from discharge record
mutate(adjusted_flow = case_when(
== 16396 ~ mean_daily - TX0025071 - TX0113603,
Site == 16882 ~ mean_daily - TX0025071,
Site == 16397 ~ mean_daily)) %>%
Site mutate(adjusted_flow = case_when(
< 0 ~ 0,
adjusted_flow >= 0 ~ adjusted_flow)) %>%
adjusted_flow mutate(doy = lubridate::yday(date))-> df
## plot hydrograph of Thompsons @ Silver Hill
<- ggplot(df %>% filter(Site == "16396") %>% mutate(date = as.Date(date))) +
p1 geom_line(aes(date, adjusted_flow, color = "Mean Daily Streamflow")) +
scale_y_continuous(position = "left",
limits = c(0,600),
expand = c(0,0)) +
scale_x_date(date_breaks = "month",
date_labels = "%b-%y") +
labs(y = "Mean daily streamflow [cfs]",
x = "Date",
caption = "16396, Thompsons Creek at Silver Hill Rd") +
scale_color_manual(values = c("dodgerblue")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.text.x = element_text(size = 8),
axis.title.y.left = element_text(hjust = 0),
axis.ticks.y.left = element_line(color = "black"),
axis.ticks.x = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none")
<- ggplot(df %>% filter(Site == "16396")) +
p2 geom_line(aes(date, ewood_precip, color = "Total Daily Preciptitation")) +
scale_y_reverse(position = "right",
limits = c(350,0),
breaks = c(0,25,50),
labels = c(0,25,50),
expand = c(0,0)) +
labs(y = "Total Daily Precipitaiton [mm]") +
scale_color_manual(values = c("dodgerblue4")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.title.y.right = element_text(hjust = 1),
axis.title.x = element_blank(),
axis.ticks.y.right = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none"
) set_null_device("agg")
<- align_plots(p1, p2, align="hv", axis="tblr")
aligned_plots set_null_device("agg")
<- ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
hg_plot1
## plot hydrograph of Thompsons @ Hwy21
<- ggplot(df %>% filter(Site == "16397") %>% mutate(date = as.Date(date))) +
p1 geom_line(aes(date, adjusted_flow, color = "Mean Daily Streamflow")) +
scale_y_continuous(position = "left",
limits = c(0,75),
expand = c(0,0)) +
scale_x_date(date_breaks = "month",
date_labels = "%b-%y") +
labs(y = "Mean daily streamflow [cfs]",
x = "Date",
caption = "16397, Thompsons Creek at Hwy 21") +
scale_color_manual(values = c("dodgerblue")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.text.x = element_text(size = 8),
axis.title.y.left = element_text(hjust = 0),
axis.ticks.y.left = element_line(color = "black"),
axis.ticks.x = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none")
<- ggplot(df %>% filter(Site == "16397")) +
p2 geom_line(aes(date, ewood_precip, color = "Total Daily Preciptitation")) +
scale_y_reverse(position = "right",
limits = c(350,0),
breaks = c(0,25,50),
labels = c(0,25,50),
expand = c(0,0)) +
labs(y = "Total Daily Precipitaiton [mm]") +
scale_color_manual(values = c("dodgerblue4")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.title.y.right = element_text(hjust = 1),
axis.title.x = element_blank(),
axis.ticks.y.right = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none"
) set_null_device("agg")
<- align_plots(p1, p2, align="hv", axis="tblr")
aligned_plots set_null_device("agg")
<- ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
hg_plot2
## plot hydrograph of Sill Creek @ Hwy21
<- ggplot(df %>% filter(Site == "16882") %>% mutate(date = as.Date(date))) +
p1 geom_line(aes(date, adjusted_flow, color = "Mean Daily Streamflow")) +
scale_y_continuous(position = "left",
limits = c(0,75),
expand = c(0,0)) +
scale_x_date(date_breaks = "month",
date_labels = "%b-%y") +
labs(y = "Mean daily streamflow [cfs]",
x = "Date",
caption = "16882, Sill Creek at Hwy 21") +
scale_color_manual(values = c("dodgerblue")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.text.x = element_text(size = 8),
axis.title.y.left = element_text(hjust = 0),
axis.ticks.y.left = element_line(color = "black"),
axis.ticks.x = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none")
<- ggplot(df %>% filter(Site == "16882")) +
p2 geom_line(aes(date, ewood_precip, color = "Total Daily Preciptitation")) +
scale_y_reverse(position = "right",
limits = c(350,0),
breaks = c(0,25,50),
labels = c(0,25,50),
expand = c(0,0)) +
labs(y = "Total Daily Precipitaiton [mm]") +
scale_color_manual(values = c("dodgerblue4")) +
theme_ms() +
guides(x = guide_axis(angle = 90)) +
theme(axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.title.y.right = element_text(hjust = 1),
axis.title.x = element_blank(),
axis.ticks.y.right = element_line(color = "black"),
axis.ticks.length = grid::unit(5, "pt"),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
legend.position = "none"
) set_null_device("agg")
<- align_plots(p1, p2, align="hv", axis="tblr")
aligned_plots set_null_device("agg")
<- ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
hg_plot3
/ hg_plot2 / hg_plot3 hg_plot1

Figure 4: Hydrograph of mean daily streamflows at SWQM 16396, 16397, and 16882 with reported total daily precipitation at Easterwood Airport.
GLMs and GAMs relied on local precipitation and temperature data to estimate streamflows at each SWQM station (Figure 1; Figure 2). The GLM was of form:
\[\begin{multline} Q = P + T + DOY + P_{sum,3} + T_{mean,5} + (P\times T) + (P \times P_{sum,3})\\ + (P \times T_{mean,5}) + (T \times P_{sum,3})\\ + (T \times T_{mean,5}) + (P_{sum,3} \times T_{mean,5}) \tag{5} \end{multline}\]
where \(P\) is total daily precipitation and \(T\) is mean daily temperature, and are assumed to be the main forcing variables influencing streamflow. \(DOY\) is day of the year and included as a seasonal predictor. \(P_{sum,3}\) is the rolling 3-day sum rainfall and included as an indicator of wetness in the watershed. \(T_{mean,5}\) is the rolling 5-day mean temperature and included as an indicator of seasonal temperature condition that is less daily variance than \(T\) and maybe an improved indicator of potential evapotransportation conditions. The GLM error structure was fit with a scaled t distribution and a log link function.
The GAM included the same variables but with smoothing functions:
\[\begin{multline} Q = f(P) + f(T) + f(DOY) + f(P_{sum,3}) + f(T_{mean,5})\\ + f(P,T) + f(P, P_{sum,3}) + f(P,T_{mean,5})\\ + f(T,P_{sum,3}) + (T,T_{mean,5}) + f(P_{sum,3},T_{mean,5}) \tag{6} \end{multline}\]
where \(f()\) is the smoothing function. Where a single covariate is smoothed, a thin plate regression spline was fit to the data using restricted maximum likelihood to estimated the automatically select the optimal smoothing parameters. For smoothing functions applied to two parameters, a tensor product smooth function was applied. The GAM error structure was fit using the scaled t distribution and a log link function.
Performance of predicted streamflows over the period of record for each method was assessed using Nash-Sutcliffe Efficiency (NSE) and Kling-Gupta Efficiency (KGE) goodness-of-fit metrics.
Results
## create a model df for 16396
## additional variables =
## lagPrecip = 1 day lag precipitation
## wetness index = 3 day total precip
## ET index = 5 day avg tmax
%>%
df ::filter(Site == "16396") %>%
dplyrarrange(date) -> df_16396
%>%
df_16396 mutate(lagPrecip = lag(ewood_precip),
wetness = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16396$ewood_precip[.x]}
if(.x - 2 <= 0) {sum(df_16396$ewood_precip[.x],
$ewood_precip[.x-1])}
df_16396if(.x - 2 > 0) {
sum(
$ewood_precip[.x],
df_16396$ewood_precip[.x-1],
df_16396$ewood_precip[.x-2],
df_16396na.rm = TRUE
)
}}),et = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16396$ewood_tmax[.x]}
if(.x - 2 <= 0) {df_16396$ewood_tmax[.x-1]}
if(.x - 3 <= 0) { mean(c(df_16396$ewood_tmax[.x-1],
$ewood_tmax[.x-2]),
df_16396na.rm = TRUE)}
if(.x - 4 <= 0) { mean(c(df_16396$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16396$ewood_tmax[.x-3]),
df_16396na.rm = TRUE)}
if(.x - 5 <= 0) { mean(c(df_16396$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16396$ewood_tmax[.x-3],
df_16396$ewood_tmax[.x-4]),
df_16396na.rm = TRUE)}
mean(c(df_16396$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16396$ewood_tmax[.x-3],
df_16396$ewood_tmax[.x-4],
df_16396$ewood_tmax[.x-5]),
df_16396na.rm = TRUE)})) %>%
unnest(c(wetness, et)) %>%
::filter(!is.na(mean_daily)) -> df_16396
dplyr
## create a model df for 16397
## additional variable =
## wetness index = 3 day total precip
## ET index = 5 day avg tmax
%>%
df ::filter(Site == "16397") %>%
dplyrarrange(date) -> df_16397
%>%
df_16397 mutate(lagPrecip = lag(ewood_precip),
wetness = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16397$ewood_precip[.x]}
if(.x - 2 <= 0) {sum(df_16397$ewood_precip[.x],
$ewood_precip[.x-1])}
df_16397if(.x - 2 > 0) {
sum(
$ewood_precip[.x],
df_16397$ewood_precip[.x-1],
df_16397$ewood_precip[.x-2],
df_16397na.rm = TRUE
)
}}),et = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16397$ewood_tmax[.x]}
if(.x - 2 <= 0) {df_16397$ewood_tmax[.x-1]}
if(.x - 3 <= 0) { mean(c(df_16397$ewood_tmax[.x-1],
$ewood_tmax[.x-2]),
df_16397na.rm = TRUE)}
if(.x - 4 <= 0) { mean(c(df_16397$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16397$ewood_tmax[.x-3]),
df_16397na.rm = TRUE)}
if(.x - 5 <= 0) { mean(c(df_16397$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16397$ewood_tmax[.x-3],
df_16397$ewood_tmax[.x-4]),
df_16397na.rm = TRUE)}
mean(c(df_16397$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16397$ewood_tmax[.x-3],
df_16397$ewood_tmax[.x-4],
df_16397$ewood_tmax[.x-5]),
df_16397na.rm = TRUE)})) %>%
unnest(c(wetness, et)) %>%
::filter(!is.na(mean_daily)) -> df_16397
dplyr
## create a model df for 16882
## additional variable =
## wetness index = 3 day total precip
## ET index = 5 day avg tmax
%>%
df ::filter(Site == "16882") %>%
dplyrarrange(date) -> df_16882
%>%
df_16882 mutate(lagPrecip = lag(ewood_precip),
wetness = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16882$ewood_precip[.x]}
if(.x - 2 <= 0) {sum(df_16882$ewood_precip[.x],
$ewood_precip[.x-1])}
df_16882if(.x - 2 > 0) {
sum(
$ewood_precip[.x],
df_16882$ewood_precip[.x-1],
df_16882$ewood_precip[.x-2],
df_16882na.rm = TRUE
)
}}),et = map(row_number(.$date),
~{if(.x - 1 <= 0) {df_16882$ewood_tmax[.x]}
if(.x - 2 <= 0) {df_16882$ewood_tmax[.x-1]}
if(.x - 3 <= 0) { mean(c(df_16882$ewood_tmax[.x-1],
$ewood_tmax[.x-2]),
df_16882na.rm = TRUE)}
if(.x - 4 <= 0) { mean(c(df_16882$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16882$ewood_tmax[.x-3]),
df_16882na.rm = TRUE)}
if(.x - 5 <= 0) { mean(c(df_16882$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16882$ewood_tmax[.x-3],
df_16882$ewood_tmax[.x-4]),
df_16882na.rm = TRUE)}
mean(c(df_16882$ewood_tmax[.x-1],
$ewood_tmax[.x-2],
df_16882$ewood_tmax[.x-3],
df_16882$ewood_tmax[.x-4],
df_16882$ewood_tmax[.x-5]),
df_16882na.rm = TRUE)})) %>%
unnest(c(wetness, et)) %>%
::filter(!is.na(mean_daily)) %>%
dplyrmutate(non_zero = case_when(
== 0 ~ 0,
adjusted_flow > 0 ~ 1
adjusted_flow -> df_16882 ))
Statistical Information Transfer
DAR
Mean daily streamflows estimated using DAR applied to three different USGS streamgages are displayed in Figure @ref[fig:dar16396results]. All three USGS gages results in biased results at low streamflows (consistently underpredicted streamflows). The hydrographs indicate the timing and magnitude of stormflow events are routinely mismatched using any of the three USGS gages. This visual validation suggests DAR method is not suitable in the Thompsons Creek watershed.
## apply DAR to 16396 using each select USGS gage
<- readr::read_csv("Data/USGS_Streamflow/08065800.csv") %>%
usgs_08065800 ::rename(Flow_08065800 = "133946_00060_00003") %>%
dplyr::select(datetime, Flow_08065800)
dplyr
<- readr::read_csv("Data/USGS_Streamflow/08109800.csv") %>%
usgs_08109800 ::rename(Flow_08109800 = `135356_00060_00003`) %>%
dplyr::select(datetime, Flow_08109800)
dplyr
<- readr::read_csv("Data/USGS_Streamflow/08110100.csv") %>%
usgs_08110100 ::rename(Flow_08110100 = `135389_00060_00003`) %>%
dplyr::select(datetime, Flow_08110100)
dplyr
%>%
df_16396 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
dartx(Flow_08065800, 42.33/321) %>%
::rename(DAR_Q_08065800 = Q,
dplyrQ_percentile_08065800 = Q_percentile,
exp_08065800 = exp) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
dartx(Flow_08109800, 42.33/244) %>%
::rename(DAR_Q_08109800 = Q,
dplyrQ_percentile_08109800 = Q_percentile,
exp_08109800 = exp) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
dartx(Flow_08110100, 42.33/195) %>%
::rename(DAR_Q_08110100 = Q,
dplyrQ_percentile_08110100 = Q_percentile,
exp_08110100 = exp) -> dar_results_16396
%>%
df_16882 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
dartx(Flow_08065800, 24.21/321) %>%
::rename(DAR_Q_08065800 = Q,
dplyrQ_percentile_08065800 = Q_percentile,
exp_08065800 = exp) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
dartx(Flow_08109800, 24.21/244) %>%
::rename(DAR_Q_08109800 = Q,
dplyrQ_percentile_08109800 = Q_percentile,
exp_08109800 = exp) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
dartx(Flow_08110100, 24.21/195) %>%
::rename(DAR_Q_08110100 = Q,
dplyrQ_percentile_08110100 = Q_percentile,
exp_08110100 = exp) -> dar_results_16882
%>%
df_16397 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
dartx(Flow_08065800, 10.03/321) %>%
::rename(DAR_Q_08065800 = Q,
dplyrQ_percentile_08065800 = Q_percentile,
exp_08065800 = exp) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
dartx(Flow_08109800, 10.03/244) %>%
::rename(DAR_Q_08109800 = Q,
dplyrQ_percentile_08109800 = Q_percentile,
exp_08109800 = exp) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
dartx(Flow_08110100, 10.03/195) %>%
::rename(DAR_Q_08110100 = Q,
dplyrQ_percentile_08110100 = Q_percentile,
exp_08110100 = exp) -> dar_results_16397
%>%
dar_results_16396 ::select(Site, date, adjusted_flow, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_point(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"), alpha = 0.3) +
geom_smooth(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"),
method = "lm", se = FALSE, alpha = 0.3) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
facet_wrap(~Source_Site, scales = "free", ncol = 1) +
scale_x_continuous(trans = "pseudo_log") + scale_y_continuous(trans = "pseudo_log") +
labs(x = "DAR Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical") -> p1
%>%
dar_results_16396 ::select(Site, date, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_line(aes(date, Estimated_Flow, color = "DAR Estimated Flow")) +
geom_line(data = dar_results_16396, aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16396"), alpha = 0.4) +
facet_wrap(~Source_Site, scales = "free_y", ncol = 1) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical") -> p2
+ p2 p1
## `geom_smooth()` using formula 'y ~ x'

Figure 5: Estimated streamflows at SWQM 16396 using DAR at three USGS gages plotted against measured streamflows and over time.
%>%
dar_results_16397 ::select(Site, date, adjusted_flow, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_point(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"), alpha = 0.3) +
geom_smooth(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"),
method = "lm", se = FALSE, alpha = 0.3) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
facet_wrap(~Source_Site, scales = "free", ncol = 1) +
scale_x_continuous(trans = "pseudo_log") + scale_y_continuous(trans = "pseudo_log") +
labs(x = "DAR Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical") -> p1
%>%
dar_results_16397 ::select(Site, date, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_line(aes(date, Estimated_Flow, color = "DAR Estimated Flow")) +
geom_line(data = dar_results_16397, aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16397"), alpha = 0.4) +
facet_wrap(~Source_Site, scales = "free_y", ncol = 1) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical") -> p2
+ p2 p1
## `geom_smooth()` using formula 'y ~ x'

Figure 6: Estimated streamflows at SWQM 16397 using DAR at three USGS gages plotted against measured streamflows and over time.
%>%
dar_results_16882 ::select(Site, date, adjusted_flow, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_point(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"), alpha = 0.3) +
geom_smooth(aes(Estimated_Flow, adjusted_flow, color = "DAR Estimates against Naturalized Flow"),
method = "lm", se = FALSE, alpha = 0.3) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
facet_wrap(~Source_Site, scales = "free", ncol = 1) +
scale_x_continuous(trans = "pseudo_log") + scale_y_continuous(trans = "pseudo_log") +
labs(x = "DAR Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical") -> p1
%>%
dar_results_16882 ::select(Site, date, DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100) %>%
dplyrpivot_longer(c(DAR_Q_08065800, DAR_Q_08109800, DAR_Q_08110100),
names_to = "Source_Site",
values_to = "Estimated_Flow") %>%
mutate(Source_Site = case_when(
== "DAR_Q_08065800" ~ "Estimated flow using USGS-08065800",
Source_Site == "DAR_Q_08109800" ~ "Estimated flow using USGS-08109800",
Source_Site == "DAR_Q_08110100" ~ "Estimated flow using USGS-08110100"
Source_Site %>%
)) ggplot() +
geom_line(aes(date, Estimated_Flow, color = "DAR Estimated Flow")) +
geom_line(data = dar_results_16882, aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16882"), alpha = 0.4) +
facet_wrap(~Source_Site, scales = "free_y", ncol = 1) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical") -> p2
+ p2 p1
## `geom_smooth()` using formula 'y ~ x'

Figure 7: Estimated streamflows at SWQM 16882 using DAR at three USGS gages plotted against measured streamflows and over time.
Linear Regression
%>%
df_16396 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
mutate(lag_Flow_08065800 = lag(Flow_08065800),
lag_Flow_08109800 = lag(Flow_08109800),
lag_Flow_08110100 = lag(Flow_08110100),
log_Q = log1p(adjusted_flow)) -> df_16396_lm
<- lm(log_Q ~ Flow_08065800 +
m1.lm +
Flow_08109800 +
Flow_08110100 +
lag_Flow_08065800 +
lag_Flow_08109800
lag_Flow_08110100,data = df_16396_lm)
::as_flextable(m1.lm) %>%
flextableset_caption("Summary of linear regression coefficients at SWQM 16396")
Estimate | Standard Error | t value | Pr(>|t|) | ||
(Intercept) | 1.83e+00 | 4.43e-02 | 41.1913 | 2.052e-126 | *** |
Flow_08065800 | 1.79e-04 | 9.58e-05 | 1.8656 | 6.305e-02 | . |
Flow_08109800 | 1.69e-02 | 2.25e-03 | 7.5366 | 5.578e-13 | *** |
Flow_08110100 | 5.16e-03 | 8.31e-04 | 6.2130 | 1.708e-09 | *** |
lag_Flow_08065800 | 7.85e-06 | 9.50e-05 | 0.0827 | 9.342e-01 | |
lag_Flow_08109800 | -4.97e-03 | 2.23e-03 | -2.2311 | 2.640e-02 | * |
lag_Flow_08110100 | -1.76e-03 | 8.41e-04 | -2.0984 | 3.669e-02 | * |
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
Residual standard error: 0.7083 on 304 degrees of freedom | |||||
Multiple R-squared: 0.4088, Adjusted R-squared: 0.3972 | |||||
F-statistic: 35.04 on 304 and 6 DF, p-value: 0.0000 |
%>%
df_16397 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
mutate(lag_Flow_08065800 = lag(Flow_08065800),
lag_Flow_08109800 = lag(Flow_08109800),
lag_Flow_08110100 = lag(Flow_08110100),
log_Q = log1p(adjusted_flow)) -> df_16397_lm
<- lm(log_Q ~ Flow_08065800 +
m2.lm +
Flow_08109800 +
Flow_08110100 +
lag_Flow_08065800 +
lag_Flow_08109800
lag_Flow_08110100,data = df_16397_lm)
::as_flextable(m2.lm) %>%
flextableset_caption("Summary of linear regression coefficients at SWQM 16397")
Estimate | Standard Error | t value | Pr(>|t|) | ||
(Intercept) | 8.46e-01 | 1.48e-02 | 57.074 | 1.691e-164 | *** |
Flow_08065800 | 2.23e-05 | 3.20e-05 | 0.695 | 4.877e-01 | |
Flow_08109800 | 5.64e-03 | 7.51e-04 | 7.509 | 6.650e-13 | *** |
Flow_08110100 | 2.30e-03 | 2.78e-04 | 8.296 | 3.547e-15 | *** |
lag_Flow_08065800 | 5.52e-05 | 3.17e-05 | 1.739 | 8.304e-02 | . |
lag_Flow_08109800 | -2.96e-03 | 7.45e-04 | -3.970 | 8.990e-05 | *** |
lag_Flow_08110100 | -8.21e-04 | 2.81e-04 | -2.920 | 3.760e-03 | ** |
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
Residual standard error: 0.2368 on 304 degrees of freedom | |||||
Multiple R-squared: 0.4337, Adjusted R-squared: 0.4226 | |||||
F-statistic: 38.81 on 304 and 6 DF, p-value: 0.0000 |
%>%
df_16882 ::select(Site, date, adjusted_flow) %>%
dplyrleft_join(usgs_08065800, by = c("date" = "datetime")) %>%
left_join(usgs_08109800, by = c("date" = "datetime")) %>%
left_join(usgs_08110100, by = c("date" = "datetime")) %>%
mutate(lag_Flow_08065800 = lag(Flow_08065800),
lag_Flow_08109800 = lag(Flow_08109800),
lag_Flow_08110100 = lag(Flow_08110100),
log_Q = log1p(adjusted_flow)) -> df_16882_lm
<- lm(log_Q ~ Flow_08065800 +
m3.lm +
Flow_08109800 +
Flow_08110100 +
lag_Flow_08065800 +
lag_Flow_08109800
lag_Flow_08110100,data = df_16882_lm)
::as_flextable(m3.lm) %>%
flextableset_caption("Summary of linear regression coefficients at SWQM 16882")
Estimate | Standard Error | t value | Pr(>|t|) | ||
(Intercept) | 3.12e-01 | 0.049030 | 6.3536 | 7.497e-10 | *** |
Flow_08065800 | -9.30e-06 | 0.000107 | -0.0868 | 9.309e-01 | |
Flow_08109800 | 1.81e-02 | 0.002513 | 7.1853 | 5.022e-12 | *** |
Flow_08110100 | 3.65e-03 | 0.000929 | 3.9333 | 1.034e-04 | *** |
lag_Flow_08065800 | 6.80e-05 | 0.000106 | 0.6405 | 5.223e-01 | |
lag_Flow_08109800 | -8.79e-03 | 0.002494 | -3.5244 | 4.885e-04 | *** |
lag_Flow_08110100 | -2.54e-03 | 0.000940 | -2.7016 | 7.280e-03 | ** |
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
Residual standard error: 0.7923 on 310 degrees of freedom | |||||
Multiple R-squared: 0.2595, Adjusted R-squared: 0.2452 | |||||
F-statistic: 18.11 on 310 and 6 DF, p-value: 0.0000 |
%>%
df_16396_lm mutate(fits = as.numeric(predict(m1.lm, newdata = .))) %>%
mutate(fits = expm1(fits)) -> df_16396_lm
%>%
df_16397_lm mutate(fits = as.numeric(predict(m2.lm, newdata = .))) %>%
mutate(fits = expm1(fits)) -> df_16397_lm
%>%
df_16882_lm mutate(fits = as.numeric(predict(m3.lm, newdata = .))) %>%
mutate(fits = expm1(fits)) -> df_16882_lm
%>%
df_16396_lm ggplot() +
geom_point(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
alpha =0.4) +
geom_smooth(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "Regression Estimated Flow [cfs]", y = "Naturalized Flow [cfs]",
title = "SWQM-16396") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
coord_equal() +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical",
legend.position = "none")-> p1
%>%
df_16396_lm ggplot() +
geom_line(aes(date, fits, color = "Linear Regression Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16396"), alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical",
legend.position = "none") -> p2
%>%
df_16397_lm ggplot() +
geom_point(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
alpha =0.4) +
geom_smooth(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "Regression Estimated Flow [cfs]", y = "Naturalized Flow [cfs]",
title = "SWQM-16397") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
coord_equal() +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical",
legend.position = "none")-> p3
%>%
df_16397_lm ggplot() +
geom_line(aes(date, fits, color = "Linear Regression Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16397"), alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical",
legend.position = "none") -> p4
%>%
df_16882_lm ggplot() +
geom_point(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
alpha =0.4) +
geom_smooth(aes(fits, adjusted_flow, color = "Regression Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "Regression Estimated Flow [cfs]", y = "Naturalized Flow [cfs]",
title = "SWQM-16882") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
coord_equal() +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical")-> p5
%>%
df_16882_lm ggplot() +
geom_line(aes(date, fits, color = "Linear Regression Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow"), alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical") -> p6
+ p2)/(p3 + p4)/(p5 + p6) (p1

Figure 8: Linear regression estimated mean daily streamflows plotted against measured mean daily streamflows and over time.
GAM
<- gam(adjusted_flow ~
m1.gam s(ewood_precip, bs = "ts", k = 5) +
s(ewood_tmax, bs = "ts", k = 5) +
s(lagPrecip, bs = "ts", k = 5) +
s(doy, bs = "cc", k = 8) +
s(wetness, bs = "ts", k = 5) +
s(et, bs = "ts", k = 5) +
ti(ewood_precip, lagPrecip, k = 5) +
ti(ewood_precip, ewood_tmax, k = 5) +
ti(ewood_precip, wetness, k = 5) +
ti(ewood_precip, et, k = 5) +
ti(ewood_tmax, wetness, k = 5) +
ti(ewood_tmax, et, k = 5) +
ti(wetness, et, k = 5),
data = df_16396,
select = TRUE,
family = scat(link = "log"),
method = "REML",
control = gam.control(nthreads = 2))
::as_flextable(m1.gam) %>%
flextable::set_caption("Summary of GAM coefficients at SWQM 16396") flextable
Estimate | Standard Error | z value | Pr(>|z|) | Signif. | |
s(ewood_precip) | 8.17 | 0.000e+00 | *** | ||
s(ewood_tmax) | 9.40 | 6.247e-06 | *** | ||
s(lagPrecip) | 342.94 | 0.000e+00 | *** | ||
s(doy) | 333.11 | 0.000e+00 | *** | ||
s(wetness) | 50.00 | 0.000e+00 | *** | ||
s(et) | 1.98 | 2.336e-02 | * | ||
ti(ewood_precip,lagPrecip) | 19.35 | 7.282e-07 | *** | ||
ti(ewood_precip,ewood_tmax) | 130.02 | 0.000e+00 | *** | ||
ti(ewood_precip,wetness) | 139.59 | 0.000e+00 | *** | ||
ti(ewood_precip,et) | 779.63 | 0.000e+00 | *** | ||
ti(ewood_tmax,wetness) | 32.16 | 0.000e+00 | *** | ||
ti(ewood_tmax,et) | 40.48 | 0.000e+00 | *** | ||
ti(wetness,et) | 451.08 | 0.000e+00 | *** | ||
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
| |||||
(Dispersion parameter for Scaled t(3.001,1.175) family taken to be 1) | |||||
Null deviance: NULL on NULL degrees of freedom | |||||
Residual deviance: NULL on NULL degrees of freedom |
<- gam(adjusted_flow ~
m2.gam s(ewood_precip, bs = "ts", k = 5) +
s(ewood_tmax, bs = "ts", k = 5) +
s(lagPrecip, bs = "ts", k = 5) +
s(doy, bs = "cc", k = 8) +
s(wetness, bs = "ts", k = 5) +
s(et, bs = "ts", k = 5) +
ti(ewood_precip, lagPrecip, k = 5) +
ti(ewood_precip, ewood_tmax, k = 5) +
ti(ewood_precip, wetness, k = 5) +
ti(ewood_precip, et, k = 5) +
ti(ewood_tmax, wetness, k = 5) +
ti(ewood_tmax, et, k = 5) +
ti(wetness, et, k = 5),
data = df_16397,
select = TRUE,
family = scat(link = "log"),
method = "REML",
control = gam.control(nthreads = 2))
::as_flextable(m2.gam) %>%
flextable::set_caption("Summary of GAM coefficients at SWQM 16397") flextable
Estimate | Standard Error | z value | Pr(>|z|) | Signif. | |
s(ewood_precip) | 11.93 | 1.229e-06 | *** | ||
s(ewood_tmax) | 1.56 | 4.402e-02 | * | ||
s(lagPrecip) | 54.56 | 0.000e+00 | *** | ||
s(doy) | 285.47 | 0.000e+00 | *** | ||
s(wetness) | 32.63 | 0.000e+00 | *** | ||
s(et) | 26.11 | 0.000e+00 | *** | ||
ti(ewood_precip,lagPrecip) | 209.86 | 0.000e+00 | *** | ||
ti(ewood_precip,ewood_tmax) | 161.10 | 0.000e+00 | *** | ||
ti(ewood_precip,wetness) | 53.93 | 0.000e+00 | *** | ||
ti(ewood_precip,et) | 378.14 | 0.000e+00 | *** | ||
ti(ewood_tmax,wetness) | 375.35 | 0.000e+00 | *** | ||
ti(ewood_tmax,et) | 128.74 | 0.000e+00 | *** | ||
ti(wetness,et) | 436.57 | 0.000e+00 | *** | ||
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
| |||||
(Dispersion parameter for Scaled t(3.013,0.081) family taken to be 1) | |||||
Null deviance: NULL on NULL degrees of freedom | |||||
Residual deviance: NULL on NULL degrees of freedom |
# we need a hurdle model here. We will fit logistic regression to predict
# the probabilty of a non-zero flow
# and a scaled-t model with log link to predict the mean of the non-zero data
<- gam(non_zero ~
m3.gam s(ewood_precip, bs = "ts") +
s(ewood_tmax, bs = "ts") +
s(lagPrecip, bs = "ts") +
s(doy, bs = "cc") +
s(wetness, bs = "ts") +
s(et, bs = "ts") +
ti(ewood_precip, ewood_tmax) +
ti(ewood_precip, wetness) +
ti(ewood_precip, et) +
ti(ewood_tmax, wetness) +
ti(ewood_tmax, et) +
ti(wetness, et),
data = df_16882,
select = TRUE,
family = binomial(),
method = "REML",
control = gam.control(nthreads = 2))
<- gam(adjusted_flow ~
m4.gam #s(ewood_precip, bs = "ts", k = 5) +
#s(ewood_tmax, bs = "ts", k = 5) +
s(doy, bs = "cs", k = 6) +
s(lagPrecip, bs = "ts", k = 5) +
#s(wetness, bs = "ts", k = 4) +
#s(et, bs = "ts", k = 4) +
#te(ewood_precip, ewood_tmax, k = 8),
te(ewood_precip, wetness, k = 3)+
te(ewood_precip, et, k = 3) +
te(ewood_tmax, wetness, k = 3)+
te(ewood_tmax, et, k = 3) +
te(wetness, et, k = 3),
data = subset(df_16882, non_zero == 1),
select = TRUE,
family = scat(link = "log"),
method = "REML",
control = gam.control(nthreads = 2))
#draw(m4.gam, n = 1000, dist = 2)
::as_flextable(m3.gam) %>%
flextable::set_caption("Summary of GAMcoefficients at SWQM 16882") flextable
Estimate | Standard Error | z value | Pr(>|z|) | Signif. | |
s(ewood_precip) | 7.67e+00 | 2.431e-03 | ** | ||
s(ewood_tmax) | 2.18e+00 | 6.979e-02 | . | ||
s(lagPrecip) | 4.37e+00 | 2.071e-02 | * | ||
s(doy) | 3.31e+01 | -2.211e-06 | *** | ||
s(wetness) | 1.46e+01 | 2.385e-04 | *** | ||
s(et) | 4.50e-01 | 2.212e-01 | |||
ti(ewood_precip,ewood_tmax) | 1.89e-05 | 4.391e-01 | |||
ti(ewood_precip,wetness) | 2.20e-06 | 8.499e-01 | |||
ti(ewood_precip,et) | 5.52e+00 | 1.581e-02 | * | ||
ti(ewood_tmax,wetness) | 6.19e-07 | 9.684e-01 | |||
ti(ewood_tmax,et) | 8.37e-06 | 4.014e-01 | |||
ti(wetness,et) | 4.32e+00 | 5.795e-02 | . | ||
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
| |||||
(Dispersion parameter for binomial family taken to be 1) | |||||
Null deviance: NULL on NULL degrees of freedom | |||||
Residual deviance: NULL on NULL degrees of freedom |
::as_flextable(m4.gam) %>%
flextable::set_caption("Summary of GAMcoefficients at SWQM 16882") flextable
Estimate | Standard Error | z value | Pr(>|z|) | Signif. | |
s(doy) | 815.535 | 0.0000 | *** | ||
s(lagPrecip) | 65.089 | 0.0000 | *** | ||
te(ewood_precip,wetness) | 65.459 | 0.0000 | *** | ||
te(ewood_precip,et) | 454.991 | 0.0000 | *** | ||
te(ewood_tmax,wetness) | 91.155 | 0.0000 | *** | ||
te(ewood_tmax,et) | 272.787 | 0.0000 | *** | ||
te(wetness,et) | 0.251 | 0.1536 | |||
Signif. codes: 0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1 < '' < 1 | |||||
| |||||
(Dispersion parameter for Scaled t(3,0.952) family taken to be 1) | |||||
Null deviance: NULL on NULL degrees of freedom | |||||
Residual deviance: NULL on NULL degrees of freedom |
##get model fits and confidence intervals
<- family(m1.gam)
gam_fam <- gam_fam$linkinv
ilink
%>%
df_16396 bind_cols(
as_tibble(predict(m1.gam, df_16396, se.fit = TRUE)) %>%
mutate(response = ilink(fit),
upr_ci = ilink(fit + (2*se.fit)),
lwr_ci = ilink(fit - (2*se.fit))) %>%
::select(response, upr_ci, lwr_ci)
dplyr-> df_16396_gam
)
##get model fits and confidence intervals
<- family(m2.gam)
gam_fam <- gam_fam$linkinv
ilink
%>%
df_16397 bind_cols(
as_tibble(predict(m2.gam, df_16397, se.fit = TRUE)) %>%
mutate(response = ilink(fit),
upr_ci = ilink(fit + (2*se.fit)),
lwr_ci = ilink(fit - (2*se.fit))) %>%
::select(response, upr_ci, lwr_ci)
dplyr-> df_16397_gam
)
##get model fits and confidence intervals
<- family(m3.gam)
gam_fam <- gam_fam$linkinv
ilink
%>%
df_16882 bind_cols(
as_tibble(predict(m3.gam, df_16882, se.fit = FALSE,
type = "response"))) %>%
mutate(pred_zero = case_when(
< 0.5 ~ 0,
value > 0.5 ~ 1)) %>%
value bind_cols(
as_tibble(predict(m4.gam, df_16882, se.fit = FALSE,
type = "response")) %>%
::rename(response = value)) %>%
dplyrmutate(response = case_when(
== 0 ~ 0,
pred_zero != 0 ~ response)) -> df_16882_gam
pred_zero
%>%
df_16396_gam ggplot() +
geom_point(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
alpha = 0.4) +
geom_smooth(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "GAM Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical",
legend.position = "none")-> p1
%>%
df_16396_gam ggplot() +
geom_line(aes(date, response, color = "GAM Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16396"),
alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical",
legend.position = "none") -> p2
%>%
df_16397_gam ggplot() +
geom_point(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
alpha = 0.4) +
geom_smooth(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "GAM Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical",
legend.position = "none")-> p3
%>%
df_16397_gam ggplot() +
geom_line(aes(date, response, color = "GAM Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow SWQM 16397"),
alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical",
legend.position = "none") -> p4
%>%
df_16882_gam ggplot() +
geom_point(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
alpha = 0.4) +
geom_smooth(aes(response, adjusted_flow,
color = "GAM Estimates against Naturalized Flow"),
method = "lm", se = FALSE) +
geom_abline(aes(linetype = "1:1 Line", intercept = 0, slope = 1)) +
scale_x_log10() + scale_y_log10() +
labs(x = "GAM Estimated Flow [cfs]", y = "Naturalized Flow [cfs]") +
guides(color = guide_legend(""), linetype = guide_legend("")) +
theme_ms() +
theme(legend.margin = margin(0,0,0,0),
legend.box = "vertical",
legend.box.margin = margin(0,0,0,0),
legend.direction = "vertical")-> p5
%>%
df_16882_gam ggplot() +
geom_line(aes(date, response, color = "GAM Estimated Flow")) +
geom_line(aes(date, adjusted_flow, color = "Naturalized Flow"),
alpha = 0.4) +
labs(x = "Date", y = "Mean Daily Flow [cfs]") +
theme_ms() +
theme(legend.title = element_blank(),
legend.direction = "vertical") -> p6
+p2)/(p3 +p4)/(p5+p6) (p1

Figure 9: GAM estimated streamflows at SWQM 16396 plotted against measured streamflows and over time.
## calculate NSE and KGE as goodness of fit metrics
## for each method
##
tibble(model = c("DAR_08065800", "DAR_08109800", "DAR_08110100",
"Linear Regression",
"GAM"),
'SWQM-16396' = c(
::NSE(dar_results_16396$DAR_Q_08065800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::NSE(dar_results_16396$DAR_Q_08109800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::NSE(dar_results_16396$DAR_Q_08110100, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::NSE(df_16396_lm$fits, df_16396_lm$adjusted_flow),
hydroGOF::NSE(as.numeric(df_16396_gam$response), df_16396_gam$adjusted_flow)),
hydroGOF'SWQM-16397' = c(
::NSE(dar_results_16397$DAR_Q_08065800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::NSE(dar_results_16397$DAR_Q_08109800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::NSE(dar_results_16397$DAR_Q_08110100, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::NSE(df_16397_lm$fits, df_16397_lm$adjusted_flow),
hydroGOF::NSE(as.numeric(df_16397_gam$response), df_16397_gam$adjusted_flow)),
hydroGOF'SWQM-16882' = c(
::NSE(dar_results_16882$DAR_Q_08065800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::NSE(dar_results_16882$DAR_Q_08109800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::NSE(dar_results_16882$DAR_Q_08110100, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::NSE(df_16882_lm$fits, df_16882_lm$adjusted_flow),
hydroGOF::NSE(as.numeric(df_16882_gam$response), df_16882_gam$adjusted_flow))) hydroGOF
## # A tibble: 5 x 4
## model `SWQM-16396` `SWQM-16397` `SWQM-16882`
## <chr> <dbl> <dbl> <dbl>
## 1 DAR_08065800 -6.84 -407. -91.5
## 2 DAR_08109800 0.0293 -0.462 0.151
## 3 DAR_08110100 0.0996 -7.05 -1.35
## 4 Linear Regression -0.603 0.319 -0.237
## 5 GAM 0.601 0.373 -0.109
tibble(model = c("DAR_08065800", "DAR_08109800", "DAR_08110100",
"Linear Regression",
"GAM"),
'SWQM-16396' = c(
::KGE(dar_results_16396$DAR_Q_08065800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::KGE(dar_results_16396$DAR_Q_08109800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::KGE(dar_results_16396$DAR_Q_08110100, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::KGE(df_16396_lm$fits, df_16396_lm$adjusted_flow),
hydroGOF::KGE(as.numeric(df_16396_gam$response), df_16396_gam$adjusted_flow)),
hydroGOF'SWQM-16397' = c(
::KGE(dar_results_16397$DAR_Q_08065800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::KGE(dar_results_16397$DAR_Q_08109800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::KGE(dar_results_16397$DAR_Q_08110100, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::KGE(df_16397_lm$fits, df_16397_lm$adjusted_flow),
hydroGOF::KGE(as.numeric(df_16397_gam$response), df_16397_gam$adjusted_flow)),
hydroGOF'SWQM-16882' = c(
::KGE(dar_results_16882$DAR_Q_08065800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::KGE(dar_results_16882$DAR_Q_08109800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::KGE(dar_results_16882$DAR_Q_08110100, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::KGE(df_16882_lm$fits, df_16882_lm$adjusted_flow),
hydroGOF::KGE(as.numeric(df_16882_gam$response), df_16882_gam$adjusted_flow))) hydroGOF
## # A tibble: 5 x 4
## model `SWQM-16396` `SWQM-16397` `SWQM-16882`
## <chr> <dbl> <dbl> <dbl>
## 1 DAR_08065800 -1.02 -18.4 -9.32
## 2 DAR_08109800 -0.308 0.201 0.182
## 3 DAR_08110100 -0.0809 -1.21 0.0916
## 4 Linear Regression 0.277 0.313 0.138
## 5 GAM 0.636 0.452 0.402
tibble(model = c("DAR_08065800", "DAR_08109800", "DAR_08110100",
"Linear Regression",
"GAM"),
'SWQM-16396' = c(
::pbias(dar_results_16396$DAR_Q_08065800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::pbias(dar_results_16396$DAR_Q_08109800, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::pbias(dar_results_16396$DAR_Q_08110100, as.numeric(dar_results_16396$adjusted_flow)),
hydroGOF::pbias(df_16396_lm$fits, df_16396_lm$adjusted_flow),
hydroGOF::pbias(as.numeric(df_16396_gam$response), df_16396_gam$adjusted_flow)),
hydroGOF'SWQM-16397' = c(
::pbias(dar_results_16397$DAR_Q_08065800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::pbias(dar_results_16397$DAR_Q_08109800, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::pbias(dar_results_16397$DAR_Q_08110100, as.numeric(dar_results_16397$adjusted_flow)),
hydroGOF::pbias(df_16397_lm$fits, df_16397_lm$adjusted_flow),
hydroGOF::pbias(as.numeric(df_16397_gam$response), df_16397_gam$adjusted_flow)),
hydroGOF'SWQM-16882' = c(
::pbias(dar_results_16882$DAR_Q_08065800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::pbias(dar_results_16882$DAR_Q_08109800, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::pbias(dar_results_16882$DAR_Q_08110100, as.numeric(dar_results_16882$adjusted_flow)),
hydroGOF::pbias(df_16882_lm$fits, df_16882_lm$adjusted_flow),
hydroGOF::pbias(as.numeric(df_16882_gam$response), df_16882_gam$adjusted_flow))) hydroGOF
## # A tibble: 5 x 4
## model `SWQM-16396` `SWQM-16397` `SWQM-16882`
## <chr> <dbl> <dbl> <dbl>
## 1 DAR_08065800 58.6 365 581.
## 2 DAR_08109800 -84.1 -55.9 -33.1
## 3 DAR_08110100 -69.8 -12.1 29.3
## 4 Linear Regression -22.7 -7.8 -48.6
## 5 GAM -15.8 -5.8 -12
Model validation
## use k-fold cross validation to estimate
## how well the approach fits to ujnknown data
## randomly split the data 80-20 to fit and validate the GAM.
library(modelr)
##
## Attaching package: 'modelr'
## The following object is masked from 'package:gratia':
##
## add_residuals
## The following objects are masked from 'package:hydroGOF':
##
## mae, mse, rmse
set.seed(123)
%>%
df_16396 ## create a list of data resamples.
## randomly splits the dataset into 80-20
## n number of times
crossv_kfold() %>%
mutate(model = map(train, ~gam(adjusted_flow~
s(ewood_precip, bs = "ts", k = 5) +
s(ewood_tmax, bs = "ts", k = 5) +
s(lagPrecip, bs = "ts", k = 5) +
s(doy, bs = "cc", k = 8) +
s(wetness, bs = "ts", k = 5) +
s(et, bs = "ts", k = 5) +
ti(ewood_precip, lagPrecip, k = 5) +
ti(ewood_precip, ewood_tmax, k = 5) +
ti(ewood_precip, wetness, k = 5) +
ti(ewood_precip, et, k = 5) +
ti(ewood_tmax, wetness, k = 5) +
ti(ewood_tmax, et, k = 5) +
ti(wetness, et, k = 5),
data = as.data.frame(.),
family = scat(link = "log"),
method = "REML",
control = gam.control(nthreads = 2))),
preds = map2(test,model, ~predict(.y, newdata = as.data.frame(.x),
type = "response")),
NSE = map2_dbl(test, preds, ~hydroGOF::NSE(as.numeric(as.data.frame(.x)$adjusted_flow),
as.numeric(.y))),
KGE = map2_dbl(test, preds, ~hydroGOF::KGE(as.numeric(as.data.frame(.x)$adjusted_flow),
as.numeric(.y))),
MAE = map2_dbl(test, preds, ~hydroGOF::mae(as.numeric(as.data.frame(.x)$adjusted_flow),
as.numeric(.y))),
pbias = map2_dbl(test, preds, ~hydroGOF::pbias(as.numeric(as.data.frame(.x)$adjusted_flow),
as.numeric(.y)))) -> df_16396_kfold
## Warning: Problem with `mutate()` input `model`.
## i Fitting terminated with step failure - check results carefully
## i Input `model` is `map(...)`.
## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully
## Warning: Problem with `mutate()` input `model`.
## i Fitting terminated with step failure - check results carefully
## i Input `model` is `map(...)`.
## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully
## Warning: Problem with `mutate()` input `model`.
## i Fitting terminated with step failure - check results carefully
## i Input `model` is `map(...)`.
## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully
df_16396_kfold
## # A tibble: 5 x 9
## train test .id model preds NSE KGE MAE pbias
## <named list> <named list> <chr> <named > <named > <dbl> <dbl> <dbl> <dbl>
## 1 <resample [~ <resample [~ 1 <gam> <dbl [6~ -6.84e-3 -0.489 101. -92.6
## 2 <resample [~ <resample [~ 2 <gam> <dbl [6~ -1.57e-2 -0.647 829. -97.4
## 3 <resample [~ <resample [~ 3 <gam> <dbl [6~ -2.60e-2 -0.515 22.0 -63.3
## 4 <resample [~ <resample [~ 4 <gam> <dbl [6~ -2.13e+1 -3.27 21.0 176.
## 5 <resample [~ <resample [~ 5 <gam> <dbl [6~ -9.27e-1 0.0170 10.4 55.8
#
# m1.gam <- gam(adjusted_flow ~
# s(ewood_precip, bs = "ts", k = 5) +
# s(ewood_tmax, bs = "ts", k = 5) +
# s(lagPrecip, bs = "ts", k = 5) +
# s(doy, bs = "cc", k = 8) +
# s(wetness, bs = "ts", k = 5) +
# s(et, bs = "ts", k = 5) +
# ti(ewood_precip, lagPrecip, k = 5) +
# ti(ewood_precip, ewood_tmax, k = 5) +
# ti(ewood_precip, wetness, k = 5) +
# ti(ewood_precip, et, k = 5) +
# ti(ewood_tmax, wetness, k = 5) +
# ti(ewood_tmax, et, k = 5) +
# ti(wetness, et, k = 5),
# data = df_16396,
# select = TRUE,
# family = scat(link = "log"),
# method = "REML",
# control = gam.control(nthreads = 2))