This notebook pulls data from a number of sources and populates the data
directory. Any new data requirements should be added as code chunks here.
Each code chunk should create a path to the file you want to use in a process
step, check if that path exists, and put the data there if it does not. All
paths are stored in a list that is saved to the `cache` directory. If changes
are made to the output of this notebook, they should be checked in.
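
As a minimal sketch of that pattern, assuming a hypothetical helper named `fetch_if_missing()` (it is not part of `R/1_get_data.R`) and a stand-in writer in place of a real download, a chunk could look roughly like this:

```r
# Hypothetical helper for illustration only; not defined in R/1_get_data.R.
# Calls `fetch(path)` only when `path` does not exist yet, then returns `path`.
fetch_if_missing <- function(path, fetch) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    fetch(path)
  }
  path
}

# Work in a scratch directory so the sketch never touches the real `data` folder.
demo_dir <- file.path(tempdir(), "get_data_demo")
unlink(demo_dir, recursive = TRUE)

# Stand-in for a download: writes a tiny CSV and counts how often it runs.
fetch_calls <- 0
write_demo_csv <- function(path) {
  fetch_calls <<- fetch_calls + 1
  write.csv(data.frame(id = 1:3), path, row.names = FALSE)
}

demo_paths <- list(data_dir = demo_dir)

# Running the same chunk twice fetches the file only once.
for (run in 1:2) {
  demo_paths$demo_csv <- fetch_if_missing(file.path(demo_dir, "demo.csv"),
                                          write_demo_csv)
}

# Save the path list as JSON when jsonlite is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  jsonlite::write_json(demo_paths, file.path(demo_dir, "data_paths.json"),
                       pretty = TRUE, auto_unbox = TRUE)
}
```

Because the helper returns the path whether or not it fetched anything, the path list is the same on a first run and on every rerun.
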
**If resources from ScienceBase need to be downloaded, this R Markdown document should be run from RStudio so that username and password authentication will work.**
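
A quick pre-flight check can catch a missing user name before any download starts. This is a sketch only: `sb_ready()` is a hypothetical helper, the `sb_user` environment variable name is taken from the setup chunk, and the assumption that the password prompt needs an interactive session is inferred from the note above rather than from ScienceBase documentation.

```r
# Hypothetical helper for illustration; not part of R/config.R or R/1_get_data.R.
# TRUE only when a user name is set and the session can show a password prompt.
sb_ready <- function(user = Sys.getenv("sb_user"),
                     is_interactive = interactive()) {
  nzchar(user) && is_interactive
}

if (!sb_ready()) {
  message("Set the sb_user environment variable and run interactively ",
          "(for example in RStudio) before downloading from ScienceBase.")
}
```
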
```{r}
source("R/config.R")
source("R/1_get_data.R")
if(!dir.exists("data")) {dir.create("data")}
if(!dir.exists("bin")) {dir.create("bin")}

data_dir <- "data"
out_list <- list("data_dir" = data_dir)
out_file <- file.path("cache", "data_paths.json")
sevenz <- "7z"
check_7z <- try(nhdplusTools:::check7z(), silent = TRUE)
if(is(check_7z, "try-error")) {
message("trying to download 7z -- it's not on your path")
# Download command-line 7-Zip
if(!file.exists("bin/7za.exe")){
download.file("https://www.7-zip.org/a/7za920.zip",
destfile = "bin/7za920.zip")
unzip("bin/7za920.zip", exdir = "bin")
}
sevenz <- "bin/7za.exe"
}
initialize_sciencebase_session(username = Sys.getenv("sb_user"))
# Enable mapview rendering if desired
mapview <- FALSE
```
HUC12 (Hydrologic Unit Code, Level 12) outlets derived from the Watershed
Boundary Dataset indexed to the reference fabric form the baseline and extent of
national modeling fabrics.
```{r}
# Blodgett, D.L., 2022, Mainstem Rivers of the Conterminous United States:
# U.S. Geological Survey data release, https://doi.org/10.5066/P9BTKP3T.
out_list <- c(
out_list,
list(hu12_points_path =
get_sb_file(item = "63cb38b2d34e06fef14f40ad",
item_files = "102020wbd_outlets.gpkg",
out_destination = data_dir)))
if(mapview)(mapview(read_sf(out_list$hu12_points_path)))
```
Streamgage Watershed InforMation (SWIM) includes locations for 12,422 USGS
streamgages as indexed along the network of streams (flowlines) in NHDPlus
Version 2.1 (NHDPlus v2, Moore and Dewald, 2016). The dataset is one of two
datasets developed for the Streamgage Watershed InforMation (SWIM) project. This
dataset, which is referred to as “SWIM streamgage locations,” was created in
support of the second dataset of basin characteristics and disturbance indexes.
```{r}
# Hayes, L., Chase, K.J., Wieczorek, M.E., and Jackson, S.E., 2021,
# USGS streamgages in the conterminous United States indexed to NHDPlus v2.1
# flowlines to support Streamgage Watershed InforMation (SWIM), 2021: U.S.
# Geological Survey data release, https://doi.org/10.5066/P9J5CK2Y.
out_list <- c(
out_list,
list(SWIM_points_path =
get_sb_file(item = "5ebe92af82ce476925e44b8f",
item_files = "all",
out_destination = file.path(data_dir, "SWIM_gage_loc"))))
if(mapview)(mapview(read_sf(out_list$SWIM_points_path)))
```
Sites associated with work by the U.S. Geological Survey (USGS) to estimate
the amount of water that is withdrawn and consumed by thermoelectric power
plants (Diehl and others, 2013; Diehl and Harris, 2014; Harris and Diehl, 2019;
Galanter and others, 2023).
```{r Thermoelectric Facilities}
# Harris, Melissa A. and Diehl, Timothy H., 2017. A Comparison of Three
# Federal Datasets for Thermoelectric Water Withdrawals in the United States
# for 2010. Journal of the American Water Resources Association (JAWRA)
# 53(5): 1062– 1080. https://doi.org/10.1111/1752-1688.12551
#
# Galanter, A.E., Gorman Sanisaca, L.E., Skinner, K.D., Harris, M.A.,
# Diehl, T.H., Chamberlin, C.A., McCarthy, B.A., Halper, A.S.,
# Niswonger, R.G., Stewart, J.S., Markstrom, S.L., Embry, I., and
# Worland, S., 2023, Thermoelectric-power water use reanalysis for the
# 2008-2020 period by power plant, month, and year for the conterminous
# United States: U.S. Geological Survey data release,
# https://doi.org/10.5066/P9ZE2FVM.
TE_points_path <- file.path(data_dir, "TE_points")
dir.create(TE_points_path, recursive = TRUE, showWarnings = FALSE)
get_sb_file("5dbc53d4e4b06957974eddae",
"2015_TE_Model_Estimates_lat.long_COMIDs.7z",
TE_points_path)
get_sb_file("63adc826d34e92aad3ca5af4",
"galanter_and_others_2023.zip",
TE_points_path)
out_list <- c(out_list, list(TE_points_path = TE_points_path))
if(mapview)(mapview(read_sf(out_list$TE_points_path)))
```
Network locations made to improve the routing capabilities
and ancillary hydrologic attributes of NHDPlusV2 to support modeling and other
hydrologic analyses. The resulting enhanced network is named E2NHDPlusV2_us.
This includes the network locations associated with some diversions and
water use withdrawals.
```{r e2nhd supplemental data - USGS}
# Schwarz, G.E., 2019, E2NHDPlusV2_us: Database of Ancillary Hydrologic
# Attributes and Modified Routing for NHDPlus Version 2.1 Flowlines: U.S.
# Geological Survey data release, https://doi.org/10.5066/P986KZEM.
out_list <- c(
out_list,
list(USGS_IT_path =
get_sb_file("5d16509ee4b0941bde5d8ffe",
"supplemental_files.zip",
file.path(data_dir, "USGS_IT"))))
```
Two datasets relate hydro location information from the National Inventory of
Dams to the NHDPlus network. One effort is related to the SPARROW work
(Wieczorek and others, 2018), the other related to work quantifying impacts on
natural flow (Wieczorek and others, 2021).
```{r National Inventory of Dams}
# Wieczorek, M.E., Jackson, S.E., and Schwarz, G.E., 2018, Select Attributes
# for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed
# Upstream Watersheds for the Conterminous United States (ver. 2.0,
# November 2019): U.S. Geological Survey data release,
# https://doi.org/10.5066/F7765D7V.
# Wieczorek, M.E., Wolock, D.M., and McCarthy, P.M., 2021, Dam
# impact/disturbance metrics for the conterminous United States, 1800 to 2018:
# U.S. Geological Survey data release, https://doi.org/10.5066/P92S9ZX6.
NID_points_path <- file.path(data_dir, "NID_points")
get_sb_file("5dbc53d4e4b06957974eddae",
"NID_attributes_20170612.txt",
NID_points_path)
get_sb_file("5fb7e483d34eb413d5e14873",
"Final_NID_2018.zip",
NID_points_path)
out_list <- c(out_list, list(NID_points_path = NID_points_path))
if(mapview)(mapview(read_sf(out_list$NID_points_path)))
```
This next section retrieves NHDPlus datasets related to national modeling
efforts. These include:
1. National Geodatabase
2. Hawaii, Puerto Rico, and Islands Geodatabase
3. GageLoc file of streamgages indexed to NHDPlusv2 flowlines
4. NHDPlusv2 catchment - HUC12 crosswalk.
```{r}
# NHDPlus Seamless National Data - pulled from NHDPlus national data server;
# post-processed to RDS files by NHDPlusTools
# GageLoc - Gages snapped to NHDPlusv2 flowlines (QAQC not verified)
# NHDPlus HUC12 crosswalk
# Moore, R.B., Johnston, C.M., and Hayes, L., 2019, Crosswalk Table Between
# NHDPlus V2.1 and its Accompanying WBD Snapshot of 12-Digit Hydrologic
# Units: U.S. Geological Survey data release,
# https://doi.org/10.5066/P9CFXHGT.
epa_data_root <- "https://dmap-data-commons-ow.s3.amazonaws.com/NHDPlusV21/Data/"
nhdplus_dir <- file.path(data_dir, "NHDPlusNationalData")
nhdplus_gdb <- file.path(nhdplus_dir, "NHDPlusV21_National_Seamless_Flattened_Lower48.gdb")

islands_dir <- file.path(data_dir, "islands")
islands_gdb <- file.path(islands_dir, "NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_HI_PR_VI_PI.gdb/")
rpu <- file.path(nhdplus_dir, "NHDPlusGlobalData", "BoundaryUnit.shp")
get_sb_file("5dbc53d4e4b06957974eddae", "NHDPlusV21_NationalData_GageLoc_05.7z", nhdplus_dir)
get_sb_file("5c86a747e4b09388244b3da1", "CrosswalkTable_NHDplus_HU12_CSV.7z", nhdplus_dir)
# will download the 7z and unzip into the folder structure in nhdplus_gdb path
download_file(paste0(epa_data_root, "NationalData/NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z"),
out_path = data_dir, check_path = nhdplus_gdb)
download_file(paste0(epa_data_root, "NationalData/NHDPlusV21_NationalData_Seamless_Geodatabase_HI_PR_VI_PI_03.7z"),
out_path = islands_dir, check_path = islands_gdb)
# cache the huc12 layer in rds format
hu12_rds <- file.path(nhdplus_dir, "HUC12.rds")
if(!file.exists(hu12_rds)) {
read_sf(nhdplus_gdb, layer = "HUC12") |>
st_make_valid() |>
st_transform(crs = proj_crs) |>
# TODO: convert this to gpkg
saveRDS(file = hu12_rds)
}
get_sb_file("5dcd5f96e4b069579760aedb", "GageLocGFinfo.dbf", data_dir)
download_file(paste0(epa_data_root, "GlobalData/NHDPlusV21_NHDPlusGlobalData_03.7z"),
out_path = nhdplus_dir, check_path = rpu)
out_list <- c(out_list, list(nhdplus_dir = nhdplus_dir,
nhdplus_gdb = nhdplus_gdb,
islands_dir = islands_dir,
islands_gdb = islands_gdb))
```
Reference catchments and flowlines are hydrographic products that are derived
from the USGS National Hydrologic Geospatial Fabric and the National Oceanic
and Atmospheric Administration. The Reference Flowlines include modifications
from the National Water Model and e2nhd networks integrated into NHDPlusv2.1,
and the reference catchments are geometrically-simplified to POLYGON geometry to
improve raster re-gridding efficiency, and have a large number of DEM artifacts
removed.
```{r Reference Fabric}
# Reference Fabric flowpaths and catchments derived by Mike Johnson (NOAA)
ref_fab_path <- file.path(data_dir, "reference_fabric")
ref_cat <- file.path(ref_fab_path, "reference_catchments.gpkg")
ref_fl <- file.path(ref_fab_path, "reference_flowline.gpkg")
nwm_fl <- file.path(ref_fab_path, "nwm_network.gpkg")
for (vpu in c("01", "08", "10L", "15", "02", "04", "05", "06", "07", "09",
"03S", "03W", "03N", "10U", "11", "12", "13", "14", "16",
"17", "18")) {
get_sb_file("6317a72cd34e36012efa4e8a",
paste0(vpu, "_reference_features.gpkg"),
ref_fab_path)
}
get_sb_file("61295190d34e40dd9c06bcd7",
c("reference_catchments.gpkg", "reference_flowline.gpkg", "nwm_network.gpkg"),
out_destination = ref_fab_path)
out_list <- c(out_list, list(ref_fab_path = ref_fab_path,
ref_cat = ref_cat, ref_fl = ref_fl, nwm_fl = nwm_fl))
```
NHDPlus Waterbody and Area Polygons converted to an RDS file for easier
loading within R.
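
The idea of stacking two layers with different attribute columns, tagging each row with its source layer, and caching the result as RDS can be sketched with plain data frames. The column names and values below are made up for illustration, and base R stands in for the `data.table::rbindlist(fill = TRUE)` call that the chunk itself uses.

```r
# Toy stand-ins for the NHDWaterbody and NHDArea layers (made-up values).
waterbody <- data.frame(COMID = c(1, 2), FTYPE = c("LakePond", "Reservoir"))
area <- data.frame(COMID = 3, AREASQKM = 0.5)

# Tag each table with its source layer.
waterbody$layer <- "NHDWaterbody"
area$layer <- "NHDArea"

# Add columns missing from either table as NA so the rows can be stacked.
all_cols <- union(names(waterbody), names(area))
fill_cols <- function(df) {
  df[setdiff(all_cols, names(df))] <- NA
  df[all_cols]
}
stacked <- rbind(fill_cols(waterbody), fill_cols(area))

# Cache as RDS and read it back.
rds_path <- file.path(tempdir(), "waterbodies_demo.rds")
saveRDS(stacked, rds_path)
restored <- readRDS(rds_path)
```

The `layer` column is what lets later steps tell waterbody polygons from area polygons once both sit in one table.
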
```{r NHDPlusV2 Waterbodies}
# Waterbodies - derived after downloading and post-processing
# NHDPlus Seamless National Geodatabase
# Compacted here into a GDB
waterbodies_path <- file.path(nhdplus_dir, "nhdplus_waterbodies.rds")
message("formatting NHDPlus waterbodies...")
data.table::rbindlist(
list(
read_sf(out_list$nhdplus_gdb, "NHDWaterbody") |>
st_transform(proj_crs) |>
mutate(layer = "NHDWaterbody"),
read_sf(out_list$nhdplus_gdb, "NHDArea") |>
st_transform(proj_crs) |>
mutate(layer = "NHDArea")
), fill = TRUE) |>
st_as_sf() |>
saveRDS(waterbodies_path)
out_list <- c(out_list, list(waterbodies_path = waterbodies_path))
```
Formatting a full list of network and non-network catchments for the NHDPlus
domains. This more easily tracks catchments that are off and on the network
when aggregating at points of interest.
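
One way to picture the on/off-network distinction, as a sketch with made-up IDs: a catchment counts as on the network when its ID matches a flowline ID. The column names `featureid` and `comid` are assumptions for illustration, and this is not the logic of `cat_rpu()`, which lives in the sourced R files.

```r
# Made-up catchment and flowline identifiers for illustration.
catchments <- data.frame(featureid = c(101, 102, 103, 104))
flowlines <- data.frame(comid = c(101, 103))

# A catchment is "on network" when a flowline shares its identifier.
catchments$on_network <- catchments$featureid %in% flowlines$comid

# Split the IDs so both groups can be tracked during aggregation.
split(catchments$featureid, ifelse(catchments$on_network, "on", "off"))
```
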
```{r}
fullcat_path <- file.path(nhdplus_dir, "nhdcat_full.rds")
islandcat_path <- file.path(islands_dir, "nhdcat_full.rds")
if(!file.exists(fullcat_path))
saveRDS(cat_rpu(out_list$ref_cat, nhdplus_gdb),
fullcat_path)
if(!file.exists(islandcat_path))
saveRDS(cat_rpu(out_list$islands_gdb, islands_gdb),
islandcat_path)
out_list <- c(out_list, list(fullcats_table = fullcat_path, islandcats_table = islandcat_path))
```
Download NHDPlusV2 FDR and FAC grids for refactoring and catchment splitting.
```{r NHDPlusV2 FDR_FAC}
# NHDPlus FDR/FAC grids available by raster processing unit
# TODO: set this up for a per-region download for #134
out_list <- c(out_list, make_fdr_fac_list(file.path(data_dir, "fdrfac")))
```
Download NHDPlusV2 elevation grids for headwater extensions and splitting
catchments into left and right banks.
```{r}
# NHDPlus elev grids available by raster processing unit
# TODO: set this up for a per-region download for #134
out_list <- c(out_list, make_nhdplus_elev_list(file.path(data_dir, "nhdplusv2_elev")))
```
MERIT Topographic and Hydrographic data for deriving GIS Features of the
National Hydrologic Modeling, Alaska Domain
```{r MERIT HydroDEM}
# MERIT HydroDEM - used for AK Geospatial Fabric, and potentially
# Mexico portion of R13
#-----------------------------------------------------------------------------
# Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., &
# Pavelsky, T. M. ( 2019). MERIT Hydro: a high‐resolution global hydrography
# map based on latest topography dataset. Water Resources Research, 55,
# 5053– 5073. https://doi.org/10.1029/2019WR024873
merit_dir <- file.path(data_dir, "merged_AK_MERIT_Hydro")
get_sb_file("5dbc53d4e4b06957974eddae", "merged_AK_MERIT_Hydro.zip", merit_dir)
# TODO: update to use "6644f85ed34e1955f5a42dc4" when released (roughly Dec 10,)
get_sb_file("5fbbc6b6d34eb413d5e21378", "dem.zip", merit_dir)
get_sb_file("64ff628ed34ed30c2057b430",
c("ak_merit_dem.zip", "ak_merit_fdr.zip", "ak_merit_fac.zip"),
merit_dir)
out_list <- c(
out_list,
list(merit_catchments = file.path(merit_dir,
"merged_AK_MERIT_Hydro",
"cat_pfaf_78_81_82_MERIT_Hydro_v07_Basins_v01.shp"),
merit_rivers = file.path(merit_dir,
"merged_AK_MERIT_Hydro",
"riv_pfaf_78_81_82_MERIT_Hydro_v07_Basins_v01.shp"),
aster_dem = file.path(merit_dir, "dem.tif"),
merit_dem = file.path(merit_dir, "ak_merit_dem.tif"),
merit_fdr = file.path(merit_dir, "ak_merit_fdr.tif"),
merit_fac = file.path(merit_dir, "ak_merit_fac.tif")))
```
Source data for deriving GIS Features of the National Hydrologic Modeling,
Alaska Domain
```{r}
# TODO: fix this citation
# Bock, A.R., Rosa, S.N., McDonald, R.R., Wieczorek, M.E., Santiago, M.,
# Blodgett, D.L., and Norton, P.A., 2024, Geospatial Fabric for National
# Hydrologic Modeling, Hawaii Domain: U.S. Geological Survey data release,
# https://doi.org/10.5066/P9HMKOP8
get_sb_file("5dbc53d4e4b06957974eddae", AK_GF_source, AK_dir)
out_list <- c(out_list, list(ak_source = file.path(AK_dir, "ak.gpkg")))
```
Source data for deriving GIS Features of the National Hydrologic Modeling,
Hawaii Domain
```{r}
# Bock, A.R., Rosa, S.N., McDonald, R.R., Wieczorek, M.E., Santiago, M.,
# Blodgett, D.L., and Norton, P.A., 2024, Geospatial Fabric for National
# Hydrologic Modeling, Hawaii Domain: U.S. Geological Survey data release,
# https://doi.org/10.5066/P9HMKOP8
get_sb_file("5dbc53d4e4b06957974eddae", "hi.7z", islands_dir)
out_list <- c(out_list, list(hi_source = file.path(islands_dir, "hi.gpkg")))
```
GIS Features of the Geospatial Fabric for National Hydrologic Modeling,
version 1.1, Transboundary Geospatial Fabric
```{r}
# Bock, A.E., Santiago, M., Wieczorek, M.E., Foks, S.S., Norton, P.A., and
# Lombard, M.A., 2020, Geospatial Fabric for National Hydrologic Modeling,
# version 1.1 (ver. 3.0, November 2021): U.S. Geological Survey data release,
# https://doi.org/10.5066/P971JAGF.
GFv11_dir <- file.path(data_dir, "GFv11")
out <- list(GFv11_gages_lyr = file.path(data_dir, "GFv11/GFv11_gages.rds"),
GFv11_gdb = file.path(GFv11_dir, "GFv1.1.gdb"),
GFv11_tgf = file.path(GFv11_dir, "TGF.gdb"))
get_sb_file("5e29d1a0e4b0a79317cf7f63", "GFv1.1.gdb.zip", GFv11_dir)
get_sb_file("5d967365e4b0c4f70d113923", "TGF.gdb.zip", GFv11_dir)
cat("", file = file.path(GFv11_dir, "GFv1.1.gdb.zip"))
cat("", file = file.path(GFv11_dir, "TGF.gdb.zip"))
# Extract gages
read_sf(out$GFv11_gdb, "POIs_v1_1") |>
filter(Type_Gage != 0) |>
saveRDS(out$GFv11_gages_lyr)
if(mapview)(mapview(readRDS(out_list$GFv11_gages_lyr)))
# Falcone, J., 2011, GAGES-II: Geospatial Attributes of Gages for Evaluating
# Streamflow: U.S. Geological Survey data release, https://doi.org/10.5066/P96CPHOT.
get_sb_file("631405bbd34e36012efa304a", "gagesII_9322_point_shapefile.zip", SWIM_points_path)
out_list <- c(out_list, list(
gagesii_lyr = file.path(SWIM_points_path, "gagesII_9322_point_shapefile")))
```
HILARRI dataset of Network-indexed Hydropower structures, reservoirs, and
locations
```{r HILARRI}
# Carly H. Hansen and Paul G. Matson. 2023. Hydropower Infrastructure - LAkes,
# Reservoirs, and RIvers (HILARRI), Version 2. HydroSource. Oak Ridge
# National Laboratory, Oak Ridge, Tennessee, USA.
# DOI: https://doi.org/10.21951/HILARRI/1960141
hilarri_dir <- file.path(data_dir, "HILARRI")
hilarri_out <- list(hilari_sites = file.path(hilarri_dir, "HILARRI_v2.csv"))
download_file("https://hydrosource.ornl.gov/sites/default/files/2023-03/HILARRI_v2.zip",
out_path = hilarri_dir, check_path = hilarri_out$hilari_sites)
if(mapview) {
mapview(st_as_sf(read.csv(out_list$hilari_sites),
coords = c("longitude", "latitude"),
crs = 4326))
}
```
```{r Reservoir datasets}
# ResOpsUS
# Steyaert, J.C., Condon, L.E., W.D. Turner, S. et al. ResOpsUS, a dataset of
# historical reservoir operations in the contiguous United States. Sci Data 9,
# 34 (2022). https://doi.org/10.1038/s41597-022-01134-7
# GRanD
# Lehner, B., C. Reidy Liermann, C. Revenga, C. Vörösmarty, B. Fekete,
# P. Crouzet, P. Döll, M. Endejan, K. Frenken, J. Magome, C. Nilsson,
# J.C. Robertson, R. Rodel, N. Sindorf, and D. Wisser. 2011. High-resolution
# mapping of the world’s reservoirs and dams for sustainable river-flow
# management. Frontiers in Ecology and the Environment 9 (9): 494-502.
# https://ln.sync.com/dl/bd47eb6b0/anhxaikr-62pmrgtq-k44xf84f-pyz4atkm/view/default/447819520013
res_path <- file.path(data_dir,"reservoir_data")
# Set Data download links
res_att_url <- "https://zenodo.org/record/5367383/files/ResOpsUS.zip?download=1"
# ISTARF - Inferred Storage Targets and Release Functions for CONUS large reservoirs
istarf_url <- "https://zenodo.org/record/4602277/files/ISTARF-CONUS.csv?download=1"
GRanD_zip <- file.path(res_path, "GRanD_Version_1_3.zip")
download_file(res_att_url, res_path, file_name = "ResOpsUS.zip")
tab_out <- c(out_list, list(res_attributes = file.path(res_path, "ResOpsUS", "attributes",
"reservoir_attributes.csv")))
istarf_csv <- file.path(res_path, "ISTARF-CONUS.csv")
download_file(istarf_url, res_path, istarf_csv, file_name = "ISTARF-CONUS.csv")
out_list <- c(out_list, list(istarf = istarf_csv))
grand_dir <- file.path(res_path, "GRanD_Version_1_3")
if(!dir.exists(grand_dir)) {
if(!file.exists(GRanD_zip))
stop("Download GRanD data from https://ln.sync.com/dl/bd47eb6b0/anhxaikr-62pmrgtq-k44xf84f-pyz4atkm/view/default/447819520013 to ",
res_path)
unzip(GRanD_zip, exdir = res_path)
}
out_list <- c(out_list, list(GRanD = grand_dir))
resops_to_nid_path <- file.path(res_path, "cw_ResOpsUS_NID.csv")
get_sb_file("5dbc53d4e4b06957974eddae", "cw_ResOpsUS_NID.csv", resops_to_nid_path)
out_list <- c(out_list, list(resops_NID_CW = resops_to_nid_path))
```
```{r nldi}
# NLDI feature data sources
# https://www.sciencebase.gov/catalog/item/60c7b895d34e86b9389b2a6c
nldi_dir <- file.path(data_dir, "nldi")
get_sb_file("60c7b895d34e86b9389b2a6c", "all", nldi_dir)
```
```{r}
write_json(out_list, path = out_file, pretty = TRUE, auto_unbox = TRUE)

rm(out_list)
```