data download script does not get all the data
I ran through `00_get_data.Rmd` and was able to download most of the data. I used `docker-compose up` to get my environment running.
Much of this worked smoothly. However, the docker image does not have some necessary libraries, and some download snippets do not work.
Docker image
`docker-compose up` runs the image `dblodgett/hydrogeoenv-custom:latest`. I had to run the following inside that image before it could run this workflow:
install.packages("nhdplusTools")
remotes::install_github("DOI-USGS/hyRefactor", upgrade = "never")
install.packages("hyfabric_0.5.7.tar.gz", repos = NULL)
# (not sure if this is the right version, but it was available without having to bind-mount
# [this](https://code.usgs.gov/wma/nhgf/reference-hydrofabric/-/tree/main/hyfabric)
# into the running container)
General Notes
- The code to check for 7zip assumes Windows. It would be good to accommodate Linux/docker, where the executable is `p7zip` and you decompress with `p7zip -d`.
- At first, the `GFv1.1` snippet failed with an error saying that `sb` didn't exist. I think there may be a bug with verifying authentication, or something like that. I ran `sbtools::authenticate_sb()` again and then it worked fine.
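A platform-independent 7zip check could look roughly like the sketch below. It is only an illustration: `find_7zip()` is a hypothetical helper, not something in the repository, and the candidate names other than `p7zip` (`7z`, `7za`, `7zr`) are my assumption about what might be installed.

```r
# Hypothetical helper: return the path of the first 7-Zip-style executable
# found on the PATH, or "" if none of the candidates is available.
# The candidate list is an assumption; adjust it to the target systems.
find_7zip <- function(candidates = c("7z", "7za", "7zr", "p7zip")) {
  paths <- Sys.which(candidates)   # named character vector, "" where not found
  found <- paths[nzchar(paths)]
  if (length(found) == 0) {
    return("")
  }
  unname(found[1])
}

exe <- find_7zip()
if (!nzchar(exe)) {
  warning("No 7-Zip executable found on PATH; install 7-Zip or p7zip.")
}
```

`Sys.which()` works on both Windows and Linux, so the same check would cover the docker image. The extraction command would still have to branch on which executable was found, since `p7zip -d` and `7z` take different arguments.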
Snippets with failed `download.file()` call
The `nwm_topology` snippet died on a failure to get this file, and the `res` snippet failed trying to get this file. I strongly suspect this was due to a download timeout; both appear to have had `options(timeout = 60)`. Setting a higher timeout would probably fix this (happy to test this guess if useful). I downloaded them by hand through my browser.
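If the timeout is the cause, a wrapper along these lines might be enough. This is an untested guess: `download_with_timeout()` is a hypothetical name, and the 600 second default is an arbitrary value I picked, not something taken from the workflow.

```r
# Hypothetical wrapper: temporarily raise the download timeout, then restore
# the previous option value when the function exits (even on error).
download_with_timeout <- function(url, destfile, timeout = 600, ...) {
  # Never lower a timeout the user has already set higher.
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # mode = "wb" avoids corrupting binary files (zip, 7z, gpkg) on Windows.
  download.file(url, destfile, mode = "wb", ...)
}

# Usage (URL and destination are placeholders):
# download_with_timeout(url, destfile)
```

Restoring the option with `on.exit()` keeps the change local to each download, so other snippets in the Rmd keep whatever timeout they had before.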
Snippets with an inaccessible ScienceBase item
Several snippets failed due to inaccessible ScienceBase items.
SWIM g3gf
Quitting from lines 60-79 (00_get_data.Rmd)
Error in sbtools::item_file_download("5dcd5f96e4b069579760aedb", names = g3gf, :
Item does not contain all requested files
GageLocGFinfo.dbf
This is attempting to download from SB ID `5dcd5f96e4b069579760aedb` (same as SWIM g3gf), which I can't see.
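To tell an invisible item apart from an item that is visible but missing files, a check like the following could run before the download. It is a sketch only: `missing_sb_files()` is a hypothetical helper, and I am assuming that `sbtools::item_list_files()` returns a data frame with an `fname` column.

```r
# Hypothetical helper: which requested file names are absent from a listing?
# `listing` is assumed to be a data.frame with an `fname` column, as I expect
# sbtools::item_list_files() to return.
missing_sb_files <- function(requested, listing) {
  setdiff(requested, listing$fname)
}

# Usage against ScienceBase (needs network access and the sbtools package;
# `g3gf` is the vector of file names defined in 00_get_data.Rmd):
# sb_id <- "5dcd5f96e4b069579760aedb"
# if (sbtools::item_exists(sb_id)) {
#   print(missing_sb_files(g3gf, sbtools::item_list_files(sb_id)))
# } else {
#   message("Item ", sb_id, " is not visible to this session.")
# }
```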
Various sections sharing the same SB ID
Several snippets failed due to inability to access `5dbc53d4e4b06957974eddae` on SB. Is this a placeholder?
- Thermoelectric Facilities
- National Inventory of Dams
- NHDPlusV21_NationalData_GageLoc_05.7z
- MERIT HydroDEM
- AK GF Source data
- HI GF Source data
Snippet with an outdated URL?
The `Islands` snippet fails with a "no such bucket" error when trying to download from this link. On the EPA site, it looks like this is the current link.