Enable downloading subsets for certain operations.
Currently, it's hard to know which data is necessary to run the pipeline. The 00_get_data.Rmd script tries to download a very large amount of data, which is a barrier to entry for running the pipeline. Anybody who doesn't have at least 100 GB to spare will run out of space, and the downloads take a long time.
It would be nice to make it clear how to winnow this down. This could start with a configuration section specifying which types of data and which regions are needed. It would also be nice for regional modelers to be able to download a single VPU for their region, rather than downloading the entire seamless lower 48 hydrofabric. I don't yet know enough about use cases to get into the details, but as we develop the pipeline into a product that modelers can use themselves, we will want to put some thought into it. Just making a note for now.
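
To make the idea a bit more concrete, here is a rough sketch of what a configuration-driven filter could look like. Everything in it is an assumption for illustration: the `config` fields, the `manifest` table, the dataset names, and the `select_downloads()` helper do not exist in the pipeline today, and the VPU codes are only examples.

```r
# Hypothetical configuration: which data types and which regions (VPUs) to fetch.
# These field names are placeholders, not existing pipeline settings.
config <- list(
  data_types = c("hydrofabric", "climate"),
  vpus       = c("01", "02")
)

# Hypothetical manifest of everything the download step might fetch.
# vpu = NA marks a dataset that is not split by region.
manifest <- data.frame(
  name      = c("hydrofabric_01", "hydrofabric_02", "hydrofabric_17",
                "climate_conus", "landcover_conus"),
  data_type = c("hydrofabric", "hydrofabric", "hydrofabric",
                "climate", "landcover"),
  vpu       = c("01", "02", "17", NA, NA),
  stringsAsFactors = FALSE
)

# Keep only rows whose data type was requested and whose VPU is either
# requested or not applicable (NA).
select_downloads <- function(manifest, config) {
  type_ok   <- manifest$data_type %in% config$data_types
  region_ok <- is.na(manifest$vpu) | manifest$vpu %in% config$vpus
  manifest[type_ok & region_ok, , drop = FALSE]
}

to_download <- select_downloads(manifest, config)
print(to_download$name)
```

With something like this, the download step would loop over `to_download` instead of fetching everything, and a regional modeler would only need to edit `config$vpus`. How non-regional datasets should be handled (always download, or subset some other way) is one of the details still to be worked out.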
CC @mewieczo