pydrip
The Dam Removal Information Portal (DRIP) is an online representation and visualization tool for the USGS Dam Removal Science Database. DRIP provides a map-based set of visualizations of studied dam removals. The database is intended to be updated regularly so that it reflects the most up-to-date information about the scientific studies associated with dam removals. This pydrip Python package helps to manage and update the source data for DRIP.
USGS Software Release Information
The official USGS software release can be found at https://doi.org/10.5066/P92RIOJS. The main branch will have the most up-to-date version of the code. This branch can be cited as follows.
Wieferich, D.J., Alley, J., Aschmann, M. 2021. pydrip Version-1.0.6. U.S. Geological Survey software release. https://doi.org/10.5066/P92RIOJS.
Contributors
- Daniel Wieferich (dwieferich@usgs.gov): U.S. Geological Survey, Science Analytics and Synthesis
- Jeff Alley (jalley@contractor.usgs.gov): Astor Nationwide Joint Venture, contractor to the U.S. Geological Survey, Fort Collins Science Center
- Matt Aschmann: Cherokee Nation Technologies, contractor to the U.S. Geological Survey, Fort Collins Science Center
Description
This pydrip Python package handles retrieval and preparation of the source data for the Dam Removal Information Portal (DRIP) API. Source data currently come from two sources: the USGS Dam Removal Science Database, distributed by USGS in ScienceBase, and a complete list of dam removals, distributed by American Rivers in Figshare. We provide this pydrip Python package to support full transparency about how we merge and prepare the data, to make DRIP easier to update, and to serve as a building block for anyone else who may want to do something similar.
pydrip Modules
drip_sources.py : The drip_sources module contains functions that retrieve and format source data. It uses the ScienceBase and Figshare APIs to retrieve the most current versions of the two data sources: the USGS Dam Removal Science Database and the American Rivers Dam Removal Database. Both sources are currently available in CSV file format.
drip_dam.py : The drip_dam module contains a Python class, 'Dam', which lets us build an object that stores information about a single dam. In some cases the same dam is represented in both datasets (linked by the field AR_ID). When this is the case, we take information from the USGS Dam Removal Science Database first and fill in any missing data using the American Rivers Dam Removal Database.
bis_pipeline.py : The bis_pipeline module contains methods that retrieve source data, process it, and move the processed data into the Biogeographic Information System pipeline, which pushes data to an Elasticsearch index used by the Dam Removal Information Portal API.
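The fill-in rule described for drip_dam.py can be sketched as follows. This is a minimal illustration, not the package's actual Dam class: the function name `merge_dam_records`, every field name other than AR_ID, the sample values, and the choice to treat None and blank strings as missing are all assumptions.

```python
def merge_dam_records(usgs_record, ar_record):
    """Combine two records for the same dam, preferring USGS values.

    Hypothetical helper: values from the USGS record win, and any field
    that is absent or empty there is filled from the American Rivers record.
    """
    def is_missing(value):
        # Assumption: None and blank strings both count as missing data.
        return value is None or (isinstance(value, str) and value.strip() == "")

    # Start from the American Rivers values, then overwrite with USGS values
    # wherever the USGS record actually has something to offer.
    merged = dict(ar_record)
    for field, value in usgs_record.items():
        if not is_missing(value) or field not in merged:
            merged[field] = value
    return merged


# Illustrative records linked by the shared AR_ID field (values are made up).
usgs = {"AR_ID": "AR-001", "dam_name": "Example Dam", "dam_height_ft": ""}
american_rivers = {
    "AR_ID": "AR-001",
    "dam_name": "Example Dam (AR)",
    "dam_height_ft": "12",
    "state": "OR",
}

merged = merge_dam_records(usgs, american_rivers)
print(merged)
```

In this sketch the dam name comes from the USGS record, while the height and state, which the USGS record lacks, come from the American Rivers record.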
Dependencies and Quick Start
The package requires Python >=3.7 along with the specific requirements found in requirements.txt.
It is recommended that you set up a discrete Python >=3.7 environment for this project using your tool of choice. You can install from source with a local clone, or directly from the source repo with:
pip install git+https://code.usgs.gov/sas/bioscience/drip/pydrip.git
Note: Windows users may need to reconfigure the Shapely package. In testing, this was resolved by uninstalling Shapely with pip and then installing it with conda.
pip uninstall Shapely
conda install Shapely==1.7.1
To run the code locally, set up the Python environment, navigate to the pydrip directory, and run the command below. Running 'python drip_pipeline.py' exports local CSV copies of the data tables so that the data can be reviewed before merging into the larger DRIP pipeline(s). The following CSV files are created: Accession.csv, Citation.csv, dam removal science.csv, Dam.csv, dam_removals.csv, DamCitations.csv, Design.csv, Results.csv, source_datasets.csv.
python drip_pipeline.py
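After a local run, a quick check that every expected export exists can save time during review. The sketch below is not part of pydrip: `find_missing_exports` is a hypothetical helper, and it assumes the CSV files are written to the directory you pass in (the current working directory by default), which the package does not guarantee.

```python
from pathlib import Path

# File names taken from the list of exports described above.
EXPECTED_FILES = [
    "Accession.csv",
    "Citation.csv",
    "dam removal science.csv",
    "Dam.csv",
    "dam_removals.csv",
    "DamCitations.csv",
    "Design.csv",
    "Results.csv",
    "source_datasets.csv",
]


def find_missing_exports(directory="."):
    """Return the expected CSV names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the exports land in the current working directory.
    missing = find_missing_exports()
    if missing:
        print("Missing exports:", ", ".join(missing))
    else:
        print("All expected CSV files are present.")
```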
Testing
Tests are built to validate data processing steps. To run the tests, first create and activate a discrete Python environment, then navigate to the repository's main directory. Next, install the packages listed in requirements_dev.txt and run pytest as follows:
pip install -r requirements_dev.txt
python -m pytest
Versioning
Bumpversion is used to version this code. To version, create a development environment that includes bumpversion and use the appropriate command below.
Small adjustments to code, improved documentation, and/or updated tests
bumpversion patch --allow-dirty
Improved methods within modules
bumpversion minor --allow-dirty
New module, or release
bumpversion major --allow-dirty
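For reference, the effect of the three commands on a version string can be sketched as below. The sketch assumes bumpversion's default MAJOR.MINOR.PATCH scheme, in which bumping one part resets the parts to its right to zero; `bump` is a hypothetical helper written for illustration, not part of bumpversion, and the project's own bumpversion configuration may differ.

```python
def bump(version, part):
    """Return `version` with the named part incremented (hypothetical helper)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


# Starting from the cited release number, 1.0.6:
for part in ("patch", "minor", "major"):
    print(part, "->", bump("1.0.6", part))
# patch -> 1.0.7
# minor -> 1.1.0
# major -> 2.0.0
```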
Copyright and License
This USGS product is considered to be in the U.S. public domain, and is licensed under unlicense_
.. _unlicense: https://unlicense.org/
This software is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The software has not received final approval by the U.S. Geological Survey (USGS). No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software.