Dealing with Errors

library(rATTAINS)
library(jsonlite)
library(tibblify)
library(tidyr)

There are a number of errors that you might encounter when using rATTAINS. Here is a list of potential errors and fixes. Feel free to raise an issue if I missed something.

Network Connectivity

If a call like the following stops with an error about the connection, there is likely a problem reaching the EPA server:

state_summary(organization_id = "TCEQMAIN", reporting_cycle = "2022")

Potential issues/fixes:

Check your network connection.
Open attains.epa.gov in a web browser. If you are able to connect, the browser should show a warning notice about accessing U.S. Government information systems.
Proxies used on corporate networks occasionally cause connection issues (see: https://stackoverflow.com/questions/59796178/r-curlhas-internet-false-even-though-there-are-internet-connection). I’ve tried to account for this in the package, but you might still run into occasional issues.
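
The first two checks can also be run from R. The following is a minimal sketch, not how rATTAINS itself tests the connection: can_reach() is a hypothetical helper that uses base R's curlGetHeaders() (which needs an R build with libcurl support), and the URL is assumed to be the https form of the address above.

```r
# Hypothetical helper (not part of rATTAINS): returns TRUE if a request to
# `url` comes back with response headers, FALSE if the request errors out
# (for example DNS failure, refused connection, or timeout).
can_reach <- function(url = "https://attains.epa.gov") {
  tryCatch(
    {
      headers <- curlGetHeaders(url)
      length(headers) > 0
    },
    error = function(e) {
      message("Could not reach ", url, ": ", conditionMessage(e))
      FALSE
    }
  )
}

# Usage (needs a network connection, so it is left commented out):
# can_reach()
```

A FALSE result here points to the network, a proxy, or the server rather than to rATTAINS itself.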

Server Response

The server might also respond with an HTTP error status code. The most common are 404 (Not Found) and 429 (Too Many Requests). rATTAINS generally stops with an error and a short message when this happens:

actions(action_id = "R8-ND-2018-03")
#> Error: Too Many Requests (HTTP 429)

Potential issues/fixes:

Wait until the server is responsive.
Make less frequent requests.
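
Both fixes can be automated by retrying with a growing pause between attempts. The following is a minimal sketch in base R. retry_with_backoff() and its arguments are hypothetical and not part of rATTAINS, and it assumes the error message contains the status code, as in the 429 example above.

```r
# Hypothetical helper (not part of rATTAINS): call `f()` and, if it fails with
# an error whose message matches `retry_on`, wait and try again. The wait
# doubles after each failed attempt (base_wait, 2 * base_wait, ...).
retry_with_backoff <- function(f, max_tries = 4, base_wait = 1, retry_on = "429") {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    retryable <- grepl(retry_on, conditionMessage(result))
    if (!retryable || attempt == max_tries) {
      # Not worth retrying (e.g. a 404) or out of attempts: re-raise the error.
      stop(result)
    }
    wait <- base_wait * 2^(attempt - 1)
    message("Attempt ", attempt, " failed; waiting ", wait, " s before retrying.")
    Sys.sleep(wait)
  }
}

# Usage (needs rATTAINS and a network connection, so it is left commented out):
# retry_with_backoff(function() actions(action_id = "R8-ND-2018-03"))
```

Errors that do not match retry_on, such as a 404, are raised immediately, since waiting will not fix a request for something that does not exist.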

Parsing Errors

The default behavior in rATTAINS is to parse the JSON data downloaded from the API into one or more dataframes, returned as a single dataframe or a list of dataframes depending on the function. rATTAINS also tries to flatten the data as much as possible. This design choice might have been a mistake, because it becomes a source of errors whenever the data returned by the API changes or is inconsistent. Version 1.0.0 of the package added the .unnest argument to most functions. Setting .unnest = FALSE should avoid many of these problems.

Default behavior:

state_summary(organization_id = "TDECWR", 
              reporting_cycle = "2016")
#> # A tibble: 71 × 24
#>    organization_identifer organization_name organization_type_text
#>    <chr>                  <chr>             <chr>                 
#>  1 TDECWR                 Tennessee         State                 
#>  2 TDECWR                 Tennessee         State                 
#>  3 TDECWR                 Tennessee         State                 
#>  4 TDECWR                 Tennessee         State                 
#>  5 TDECWR                 Tennessee         State                 
#>  6 TDECWR                 Tennessee         State                 
#>  7 TDECWR                 Tennessee         State                 
#>  8 TDECWR                 Tennessee         State                 
#>  9 TDECWR                 Tennessee         State                 
#> 10 TDECWR                 Tennessee         State                 
#> # ℹ 61 more rows
#> # ℹ 21 more variables: reporting_cycle <chr>, water_type_code <chr>,
#> #   units_code <chr>, use_name <chr>, fully_supporting <dbl>,
#> #   fully_supporting_count <int>, use_insufficient_information <dbl>,
#> #   use_insufficient_information_count <int>, not_assessed <dbl>,
#> #   not_assessed_count <int>, not_supporting <dbl>, not_supporting_count <int>,
#> #   parameter_group <chr>, parameter_insufficient_information <dbl>, …

Using .unnest = FALSE returns nested list-columns instead of a fully flattened table. The tidyr family of unnest() functions is an easy way to flatten this data:

df <- state_summary(organization_id = "TDECWR", 
                    reporting_cycle = "2016",
                    .unnest = FALSE)
df
#> # A tibble: 1 × 4
#>   organization_identifer organization_name organization_type_text
#>   <chr>                  <chr>             <chr>                 
#> 1 TDECWR                 Tennessee         State                 
#> # ℹ 1 more variable: reporting_cycles <list<tibble[,2]>>

df |>
  tidyr::unnest(reporting_cycles) |> 
  tidyr::unnest(water_types) |> 
  tidyr::unnest(use_attainments)
#> # A tibble: 22 × 16
#>    organization_identifer organization_name organization_type_text
#>    <chr>                  <chr>             <chr>                 
#>  1 TDECWR                 Tennessee         State                 
#>  2 TDECWR                 Tennessee         State                 
#>  3 TDECWR                 Tennessee         State                 
#>  4 TDECWR                 Tennessee         State                 
#>  5 TDECWR                 Tennessee         State                 
#>  6 TDECWR                 Tennessee         State                 
#>  7 TDECWR                 Tennessee         State                 
#>  8 TDECWR                 Tennessee         State                 
#>  9 TDECWR                 Tennessee         State                 
#> 10 TDECWR                 Tennessee         State                 
#> # ℹ 12 more rows
#> # ℹ 13 more variables: reporting_cycle <chr>, water_type_code <chr>,
#> #   units_code <chr>, use_name <chr>, fully_supporting <dbl>,
#> #   fully_supporting_count <int>, use_insufficient_information <dbl>,
#> #   use_insufficient_information_count <int>, not_assessed <dbl>,
#> #   not_assessed_count <int>, not_supporting <dbl>, not_supporting_count <int>,
#> #   parameters <list<tibble[,9]>>
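
To see what each unnest() step does without calling the API, here is a small offline sketch. The nested tibble is made up for illustration: it borrows a few column names from the output above, but the values are not real ATTAINS data.

```r
library(tibble)
library(tidyr)

# Toy data (invented): one organization, a list-column of reporting cycles,
# and inside each cycle a list-column of water types.
nested <- tibble(
  organization_name = "Example",
  reporting_cycles = list(
    tibble(
      reporting_cycle = c("2016", "2018"),
      water_types = list(
        tibble(water_type_code = c("RIVER", "LAKE")),
        tibble(water_type_code = "ESTUARY")
      )
    )
  )
)

# Each unnest() call expands one level of nesting into rows and columns.
flat <- nested |>
  unnest(reporting_cycles) |>
  unnest(water_types)

flat
```

By default unnest() drops rows whose nested tibble is empty. Pass keep_empty = TRUE to keep them as rows filled with NA.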

If the above option doesn’t work, rATTAINS can also provide the raw JSON data from the API. The tibblify and jsonlite packages provide tools to convert JSON to nested lists and then to tibbles pretty easily. First, use the tidy = FALSE argument to return the unparsed JSON string, then use jsonlite to convert that data to a nested list, then tibblify to convert it to a nested dataframe!

raw_data <- state_summary(organization_id = "TDECWR",
                          reporting_cycle = "2016",
                          tidy = FALSE)

list_data <- jsonlite::fromJSON(raw_data,
                                simplifyVector = FALSE,
                                simplifyDataFrame = FALSE,
                                flatten = FALSE)

df <- tibblify::tibblify(list_data$data,
                         unspecified = "drop")
#> The spec contains 1 unspecified field:
#> • reportingCycles->combinedCycles
df$reportingCycles
#> # A tibble: 1 × 3
#>   reportingCycle cycleStatus         waterTypes
#>   <chr>          <chr>       <list<tibble[,3]>>
#> 1 2016           Historical             [4 × 3]
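
The same two steps can be tried offline on a small JSON string. This is only a sketch: the JSON below is invented for illustration (the key names items, id, and cycles are hypothetical), and it is much simpler than a real ATTAINS response.

```r
library(jsonlite)
library(tibblify)

# Toy JSON (invented): an array of objects, each with a nested array of cycles.
json <- '{
  "items": [
    {"id": "A", "cycles": [{"reportingCycle": "2016", "cycleStatus": "Historical"}]},
    {"id": "B", "cycles": [{"reportingCycle": "2018", "cycleStatus": "Historical"},
                           {"reportingCycle": "2020", "cycleStatus": "Historical"}]}
  ]
}'

# Step 1: JSON string to nested list. simplifyVector = FALSE keeps every
# array as a plain list, so jsonlite does not guess at data frames.
list_data <- jsonlite::fromJSON(json, simplifyVector = FALSE)

# Step 2: nested list to tibble. With no spec given, tibblify guesses one;
# the nested arrays of objects become a list-column of tibbles.
df <- tibblify::tibblify(list_data$items)

df
df$cycles[[2]]
```

Because the nested levels stay as list-columns, a change in one deeply nested field is less likely to break the whole conversion, and each level can be unnested separately when needed.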