
data download script does not get all the data

I used docker-compose up to set up my environment and then ran through 00_get_data.Rmd. I was able to download most of the data.

Much of this worked smoothly. However, the docker image does not have some necessary libraries, and some download snippets do not work.

Docker image

docker-compose up runs the image dblodgett/hydrogeoenv-custom:latest. Before the image could run this workflow, I had to run the following in the container:

install.packages("nhdplusTools")
remotes::install_github("DOI-USGS/hyRefactor", upgrade = "never")
install.packages("hyfabric_0.5.7.tar.gz", repos = NULL)
# (not sure if this is the right version, but it was available without having to bind-mount
# [this](https://code.usgs.gov/wma/nhgf/reference-hydrofabric/-/tree/main/hyfabric)
# into the running container)
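A small preflight check at the top of the workflow could report missing packages before any download starts, instead of failing partway through. The sketch below is hypothetical (it is not in 00_get_data.Rmd), and the package list is taken only from this issue, so it may be incomplete.

```r
# Hypothetical preflight check: report which packages this workflow needs
# that are not installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# Package list assumed from this issue; adjust to the workflow's real needs.
needed <- c("nhdplusTools", "hyRefactor", "hyfabric", "sbtools")
absent <- missing_packages(needed)
if (length(absent) > 0) {
  message("Missing packages: ", paste(absent, collapse = ", "))
}
```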

General Notes

  • The check for 7-Zip assumes Windows. It would be good to also support Linux/Docker, where the executable is p7zip and archives are decompressed with p7zip -d.
  • At first, the GFv1.1 snippet failed with an error saying that sb didn't exist. I suspect a bug in how authentication is verified. After I ran sbtools::authenticate_sb() again, it worked fine.
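One way to handle the 7-Zip point is to look up whichever executable is on the PATH and build the arguments to match. This is a sketch under assumptions: the helper names are hypothetical, and it assumes `7z`/`7za` (Windows 7-Zip or p7zip-full) take `x archive` and that the `p7zip` wrapper takes `-d archive`, as described above.

```r
# Hypothetical helper: return the first 7-Zip executable found on the PATH.
# The result is a named character vector; its name is the matched candidate.
find_7zip <- function(candidates = c("7z", "7za", "p7zip")) {
  paths <- Sys.which(candidates)   # "" for executables that are not found
  found <- paths[nzchar(paths)]
  if (length(found) == 0) stop("No 7-Zip executable found on PATH")
  found[1]
}

# Hypothetical helper: build extraction arguments for the detected tool.
extract_7z_args <- function(tool_name, archive) {
  if (tool_name == "p7zip") {
    c("-d", shQuote(archive))       # p7zip wrapper: decompress with -d
  } else {
    c("x", shQuote(archive), "-y")  # 7z/7za: extract with full paths, assume yes
  }
}

# Usage (not run):
# tool <- find_7zip()
# system2(tool, extract_7z_args(names(tool), "NHDPlusV21_NationalData_GageLoc_05.7z"))
```

Preferring `7z` over `p7zip` when both exist keeps the Windows behaviour unchanged while letting the Docker image fall back to the wrapper.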

Snippets with failed download.file() call

The nwm_topology snippet died on a failure to get this file, and the res snippet failed trying to get this file. I strongly suspect both failures were download timeouts: both snippets appear to run with options(timeout = 60). Setting a higher timeout would probably fix this (happy to test this guess if useful). In the meantime, I downloaded both files by hand through my browser.
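If the timeout guess is right, a wrapper like the one below could replace the bare download.file() calls. It is a minimal sketch: the function name is hypothetical and the 600-second default is an arbitrary choice, not a value from 00_get_data.Rmd. It raises the timeout only for the duration of the call and restores the previous setting afterwards.

```r
# Hypothetical wrapper: download with a longer timeout, then restore the
# caller's timeout option. Never lowers a timeout that is already higher.
download_with_timeout <- function(url, destfile, timeout = 600, ...) {
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)  # put the previous timeout back
  # mode = "wb" avoids corrupting binary files (e.g. .7z, .zip) on Windows
  utils::download.file(url, destfile, mode = "wb", ...)
}
```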

Snippets with an inaccessible ScienceBase item

Several snippets failed due to inaccessible ScienceBase items.

SWIM g3gf

Quitting from lines 60-79 (00_get_data.Rmd)
Error in sbtools::item_file_download("5dcd5f96e4b069579760aedb", names = g3gf,  :
Item does not contain all requested files
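The error above does not say which files are missing. A check like the following, run before item_file_download(), would name them. The helper is hypothetical, and the usage comment assumes sbtools::item_list_files() returns a data frame with an `fname` column; `g3gf` is the vector of file names the snippet already passes.

```r
# Hypothetical helper: compare requested file names with the files an item
# actually lists, warn about any that are absent, and return them.
report_missing_files <- function(requested, available) {
  missing <- setdiff(requested, available)
  if (length(missing) > 0) {
    warning("Item is missing: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(missing)
}

# Usage (not run; needs network access and, for restricted items, a login):
# files <- sbtools::item_list_files("5dcd5f96e4b069579760aedb")
# report_missing_files(g3gf, files$fname)
```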

GageLocGFinfo.dbf

This snippet attempts to download from SB ID 5dcd5f96e4b069579760aedb (the same item as SWIM g3gf), which I can't access.

Various sections sharing the same SB ID

Several snippets failed because they could not access ScienceBase item 5dbc53d4e4b06957974eddae. Is this a placeholder ID?

  • Thermoelectric Facilities
  • National Inventory of Dams
  • NHDPlusV21_NationalData_GageLoc_05.7z
  • MERIT HydroDEM
  • AK GF Source data
  • HI GF Source data

Snippet with an outdated URL?

The Islands snippet fails with a "no such bucket" error when trying to download from this link. On the EPA site, it looks like this is the current link.

Edited by Ross, Jesse C