Incorporate dask to improve multiprocessing capabilities and arbitrarily scale the library! (and make a xarray terrain engine?)
While we are very happy with the new support for N-dimensional parameter grids added in version 2.0, accumulating a parameter across, for example, the 288 5-minute time steps in a day may be computationally intensive. The USGS's supercomputing resources are one way around this, but it is entirely feasible to scale this library to a next-generation level of performance by natively supporting `dask.distributed` parallelization!
For example, if `fcpgtools` was able to check for the presence of `dask.distributed` in the virtual environment, we could chunk our N-dimensional `xarray.DataArray` along the time dimension and then apply the accumulate parameter function at M time steps simultaneously (where M is the number of cores being utilized by a `dask.distributed.Client()`).
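The chunk-then-apply pattern above can be sketched with just the standard library. This is a minimal, dependency-free illustration: `accumulate_step` is a hypothetical toy stand-in for the real accumulate-parameter function, and a `ThreadPoolExecutor` plays the role that a `dask.distributed.Client` with M workers would play.

```python
from concurrent.futures import ThreadPoolExecutor

def accumulate_step(grid):
    """Toy stand-in for an accumulate-parameter call on one 2D time
    slice (hypothetical -- the real function would route values
    through the terrain engine's flow network)."""
    out, running = [], [0.0] * len(grid[0])
    for row in grid:
        running = [r + v for r, v in zip(running, row)]
        out.append(list(running))
    return out

def accumulate_all(time_steps, max_workers=4):
    """Apply the per-step function across time slices concurrently,
    mirroring how a dask.distributed.Client would fan chunks out to
    M cores (threads here just keep the sketch dependency-free)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the time dimension stays sorted.
        return list(pool.map(accumulate_step, time_steps))

# Four identical toy "time steps", each a 2x2 grid.
steps = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
results = accumulate_all(steps)
print(results[0])  # [[1.0, 2.0], [4.0, 6.0]]
```

With dask in place, the same shape of computation would instead be expressed by chunking the `xarray.DataArray` and letting the scheduler distribute slices to workers.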
This could be even more efficient if we supported accumulation functions in `xarray` natively, rather than swapping in and out of custom `pysheds` `Raster`/`Grid` objects. That said, most of the performance gain is possible with the existing `pysheds` terrain engine, given integration of `dask.distributed` and the use of `dask.distributed` futures to rebuild our N-dimensional output in the desired way.
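Rebuilding the N-dimensional output from futures mostly comes down to preserving submission order while workers finish in arbitrary order. A minimal stdlib sketch (with `process_step` as a hypothetical per-time-step computation):

```python
from concurrent.futures import ThreadPoolExecutor

def process_step(t):
    """Hypothetical per-time-step computation; returns a tiny 2D grid
    standing in for one accumulated time slice."""
    return [[float(t), float(t) + 1.0]]

with ThreadPoolExecutor(max_workers=4) as pool:
    # One future per time step. Holding the futures in a list keyed by
    # submission order lets us rebuild the time dimension correctly even
    # if workers complete out of order -- gathering dask.distributed
    # futures works the same way.
    futures = [pool.submit(process_step, t) for t in range(3)]
    stacked = [f.result() for f in futures]  # stack along the time axis

print(stacked)  # [[[0.0, 1.0]], [[1.0, 2.0]], [[2.0, 3.0]]]
```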
Very exciting stuff! We could imagine a world where hundreds of cores process a time series of `xarray` data at once, with only one time step loaded into memory per core.
One limitation is that we can only chunk along the time axis: accumulation algorithms need to cross lat/long chunk boundaries, so chunking spatially would actually slow performance due to communication between processes.
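In dask/xarray terms, that constraint is just a chunking spec that splits only the time dimension and keeps each spatial grid whole. A sketch, assuming a `(time, y, x)` dimension layout (the dimension names here are illustrative):

```python
# Hypothetical chunking spec for a (time, y, x) DataArray: one time step
# per chunk, and -1 (dask's "whole dimension" marker) for the spatial
# axes, since accumulation must see the full lat/long grid at once.
chunk_spec = {"time": 1, "y": -1, "x": -1}

# With xarray/dask installed, this would be applied as:
#   chunked = data_array.chunk(chunk_spec)
```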