Review of DataRetrieval.jl
Review checklist for @[DataRetrieval.jl]
Background information for reviewers here
Please check off boxes as applicable, and elaborate in comments below.
- Code location https://code.usgs.gov/water/computational-tools/DataRetrieval.jl
- author @tclements
Conflict of interest
- I confirm that I have no COIs with reviewing this work, meaning that there is no relationship with the product or the product's authors or affiliated institutions that could influence or be perceived to influence the outcome of the review (if you are unsure whether you have a conflict, please speak to your supervisor before starting your review).
Adherence to Fundamental Science Practices
- I confirm that I read and will adhere to the Federal Source Code Policy for Scientific Software and relevant federal guidelines for approved software release as outlined in SM502.1 and SM502.4.
Security Review
- No proprietary code is included
- No Personally Identifiable Information (PII) is included
- No other sensitive information, such as database passwords, is included
General checks
- Repository: Is the source code for this software available?
- License: Does the repository contain a plain-text LICENSE file?
- Disclaimer: Does the repository have the USGS-required provisional Disclaimer?
- Contribution and authorship: Has the submitting author made major contributions to the software? Does the full list of software authors seem appropriate and complete?
- Does the repository have a code.json file?
Documentation
- A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
- Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support? This information could be found in the README, CONTRIBUTING, or DESCRIPTION sections of the documentation.
- References: When present, do references in the text use the proper citation syntax?
Functionality
- Installation: Does installation succeed as outlined in the documentation?
- Functionality: Have the functional claims of the software been confirmed?
- Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
- Automated tests: Do unit tests cover essential functions of the software and a reasonable range of inputs and conditions? Do all tests pass when run locally?
- Packaging guidelines: Does the software conform to the applicable packaging guidelines? R packaging guidelines here; Python packaging guidelines here
Review Comments
DataRetrieval.jl is a Julia language package for requesting and downloading USGS water data from web services. DataRetrieval.jl follows the style of the earlier dataretrieval packages written in R and Python. DataRetrieval.jl is well documented and follows a consistent Julia style. The examples included in the docs page were illustrative and helpful.
I installed DataRetrieval.jl in a clean Julia v1.7.1 environment using the provided instructions. I had a problem running the first example in the README.md page:
julia> using DataRetrieval
julia> df, response = readNWISsite("05114000")
ERROR: UndefVarError: wait_connected not defined
...
I was able to fix this by adding the latest version of HTTP.jl to my environment. After this fix, I was able to run the examples on the docs page, and the test suite passed. I believe this could be fixed on the developer side by adding compat entries for the package's dependencies in the Project.toml file.
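As a rough sketch of that suggestion, a `[compat]` section in Project.toml might look like the following. The version bounds are placeholders chosen for illustration, not the versions DataRetrieval.jl actually requires; the author would need to choose the lowest HTTP.jl release that is known to work.

```toml
# Project.toml (excerpt)
# NOTE: every version bound here is a hypothetical placeholder.
[compat]
HTTP = "1"       # placeholder: lowest HTTP.jl release that works with the package
julia = "1.6"    # placeholder: lowest supported Julia version
```

With bounds like these in place, the package manager would refuse to resolve an environment containing an incompatible HTTP.jl version instead of failing at run time.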
Suggestions
These suggestions are not required but could improve the user experience.
- I suggest removing Plots.jl as a dependency of DataRetrieval.jl. This would have two benefits:
  - Faster installation and startup time. Plots.jl is a heavyweight dependency.
  - It would make DataRetrieval.jl agnostic to the plotting package. As most of DataRetrieval.jl's users are coming from R and Python, they may be interested in using PyPlot.jl, based on the popular pyplot Python package, or Gadfly.jl, based on the popular ggplot2 R package, for plotting.
- If the author(s) consider registering DataRetrieval.jl as an official Julia package in the future, I would suggest changing the name of DataRetrieval.jl to something more specific, such as USGSWaterRetrieval.jl.
- DataRetrieval.jl returns requests in columnar format in `DataFrame` structures. Currently all data are returned as `String` type. The `_readRDB` function could be changed to set column types automatically by passing known types for each request parameter, e.g. `types=Dict(:dec_lat_va => Float64, :site_no => Int, ...)`, to the `CSV.File` function.
- In the documentation I would change the line `discharge = map(x->parse(Float64,x), df."69928_00060")` to `discharge = parse.(Float64, df."69928_00060")` to return a `Vector{Float64}` rather than a `PooledArrays.PooledVector{Float64, UInt32, Vector{UInt32}}`.
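
To illustrate the column-type suggestion, here is a minimal sketch of passing known types to `CSV.File`. It does not reproduce the package's `_readRDB` function: the sample text, the `known_types` dictionary, and the station name and coordinate values are all hypothetical, and a real RDB response also carries comment lines and a format row that are omitted here. `site_no` is pinned to `String` in this sketch, rather than `Int`, only so that the leading zero in the site number is kept.

```julia
using CSV, DataFrames

# Hypothetical tab-delimited sample, loosely modeled on a site request.
# The station name and coordinates are made-up illustration values.
rdb = """
site_no\tstation_nm\tdec_lat_va\tdec_long_va
05114000\tEXAMPLE STATION\t48.99\t-101.96
"""

# Hypothetical lookup of known column types.
# Columns that are not listed fall back to CSV.jl's type detection.
known_types = Dict(
    :site_no => String,       # pinned to String so "05114000" keeps its leading zero
    :dec_lat_va => Float64,
    :dec_long_va => Float64,
)

# Parse the text with the known types and materialize a DataFrame.
df = DataFrame(CSV.File(IOBuffer(rdb); delim='\t', types=known_types))
```

With this approach the numeric columns arrive as `Float64`, so downstream code would not need a separate `parse` step for them.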
Reviewer checklist source statement
This checklist combines elements of the rOpenSci review guidelines and the Journal of Open Source Software (JOSS) review checklist; it has been modified for use with USGS software releases.