The dataRetrievaldemo package was created to simplify the process of loading hydrologic data into the R environment. It is designed to retrieve the major data types of U.S. Geological Survey (USGS) hydrologic data that are available on the Web, as well as data from the Water Quality Portal (WQP), which currently houses water quality data from the Environmental Protection Agency (EPA), U.S. Department of Agriculture (USDA), and USGS. Direct USGS data are obtained from a service called the National Water Information System (NWIS). Much useful information about NWIS can be found here:
\url{http://help.waterdata.usgs.gov/}
For information on getting started in R and installing the package, see (\ref{sec:appendix1}): Getting Started. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
In this section, examples of Web retrievals document how to get raw data. This data includes site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), unit values (which include real-time data but can also include other sensor data stored at regular time intervals) (\ref{sec:usgsRT}), water quality data (\ref{sec:usgsWQP}), groundwater level data (\ref{sec:gwl}), peak flow data (\ref{sec:peak}), rating curve data (\ref{sec:rating}), and surface-water measurement data (\ref{sec:meas}). We will mainly use the Choptank River near Greensboro, MD as an example. Daily discharge data are available as far back as 1948. Additionally, nitrate has been measured since 1964.
The USGS organizes hydrologic data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID (referred to in this document and throughout the dataRetrievaldemo package as \enquote{siteNumber}). Often (but not always), these IDs are 8 digits. The first step to finding data is discovering this siteNumber. There are many ways to do this; one is the National Water Information System: Mapper (\url{http://maps.waterdata.usgs.gov/mapper/index.html}).
Once the siteNumber is known, the next required input for USGS data retrievals is the \enquote{parameter code}. This is a 5-digit code that specifies the measured parameter being requested. For example, parameter code 00631 represents \enquote{Nitrate plus nitrite, water, filtered, milligrams per liter as nitrogen}, with units of \enquote{mg/l as N}. A complete list of possible USGS parameter codes can be found at \url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?help}.
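As a minimal sketch of how a parameter code is used, the chunk below checks that a code has the expected 5-digit form and then looks up its description with \texttt{readNWISpCode}. It assumes that the package is installed, that the function accepts a single code as its first argument, and that the returned data frame has a \texttt{parameter\_nm} column. The chunk is not evaluated because the lookup requires a Web connection.

<<sketchParameterCode, echo=TRUE, eval=FALSE>>=
library(dataRetrievaldemo)

# Parameter codes are 5-digit character strings; keep them as text so
# that leading zeros are not dropped.
parameterCd <- "00631"
stopifnot(grepl("^[0-9]{5}$", parameterCd))

# Look up the description of the code (requires a Web connection).
parameterINFO <- readNWISpCode(parameterCd)
parameterINFO$parameter_nm
@

Storing the code as a character string rather than a number is the important design choice here: \texttt{00631} entered as a number would lose its leading zeros.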
A useful place to discover USGS code information, along with other NWIS information, is:
Two output columns that may not be obvious are \enquote{srsname} and \enquote{casrn}. Srsname stands for \enquote{Substance Registry Services}. More information on the srs name can be found here:
For unit values data (sensor data measured at regular time intervals such as 15 minutes or hourly), knowing the parameter code and siteNumber is enough to make a request for data. For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values. These daily values are statistical summaries of the continuous data, e.g. maximum, minimum, mean, or median. The different statistics are specified by a 5-digit statistics code. A complete list of statistic codes can be found here:
To discover what data is available for a particular USGS site, including measured parameters, period of record, and number of samples (count), use the \texttt{whatNWISdata} function. It is possible to limit the retrieval information to a subset of services, for example \texttt{"}dv\texttt{"}, \texttt{"}uv\texttt{"}, or \texttt{"}qw\texttt{"}. In the following example, we limit the retrieved Choptank data to only daily data. The default for \texttt{"}service\texttt{"} is \enquote{all}, which returns all of the available data for that site. Likewise, there are arguments for parameter code (parameterCd) and statistic code (statCd) to filter the results. The default for both is to return all possible values (\enquote{all}).
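The filtering arguments described above can be sketched as follows. This is an illustration under stated assumptions rather than output from the package: the site number 01491000 (Choptank River near Greensboro, MD), parameter code 00060 (discharge), and statistic code 00003 (mean) are supplied here for the example, and the argument name \texttt{siteNumber} is assumed to be accepted by \texttt{whatNWISdata}. The chunk is not evaluated because it requires a Web connection.

<<sketchWhatData, echo=TRUE, eval=FALSE>>=
library(dataRetrievaldemo)

# Assumed site number for the Choptank River near Greensboro, MD.
siteNumber <- "01491000"

# Daily values only; parameterCd and statCd keep their default ("all").
dailyDataAvailable <- whatNWISdata(siteNumber = siteNumber,
                                   service = "dv")

# Restrict further to daily mean (statCd 00003) discharge
# (parameterCd 00060).
dailyMeanQ <- whatNWISdata(siteNumber = siteNumber, service = "dv",
                           parameterCd = "00060", statCd = "00003")
@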
caption="Daily mean data available at the Choptank River near Greensboro, MD. [Some columns deleted for space considerations]"),
caption="Reformatted version of output from \\texttt{whatNWISdata} function for the Choptank River near Greensboro, MD, and from Seneca Creek at Dawsonville, MD from the daily values service [Some columns deleted for space considerations]"),
caption.placement="top",
size = "\\footnotesize",
latex.environment=NULL,
...
@@ -430,7 +444,10 @@ A specific example piece of information, in this case parameter name, can be obt
<<siteNames, echo=TRUE>>=
parameterINFO$parameter_nm
@
Parameter information can be obtained from \url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes}
The column \texttt{"}datetime\texttt{"} in the returned dataframe is automatically imported as a variable of class \texttt{"}Date\texttt{"} in R. Each requested parameter has a value and remark code column. The names of these columns depend on the requested parameter and stat code combinations. USGS daily value qualification codes are often \texttt{"}A\texttt{"} (approved for publication) or \texttt{"}P\texttt{"} (provisional data subject to revision). A more complete list of daily value qualification codes can be found here:
Daily data is pulled from \url{http://waterservices.usgs.gov/rest/DV-Test-Tool.html}.
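To illustrate how the qualification codes can be used once daily data are in R, the following sketch builds a small data frame by hand and separates approved from provisional values. The column names \texttt{flow} and \texttt{flow\_cd} and all of the numbers are hypothetical and invented for this example; they are not output from a real retrieval, and the chunk uses only base R.

<<sketchQualCodes, echo=TRUE, eval=TRUE>>=
# Hypothetical daily values; the column names and numbers are invented
# for illustration and are not output from a real retrieval.
dailyExample <- data.frame(
  datetime = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
  flow = c(150, 145, 160),
  flow_cd = c("A", "A", "P"),
  stringsAsFactors = FALSE
)

# The datetime column is of class Date.
class(dailyExample$datetime)

# Count approved ("A") and provisional ("P") values.
table(dailyExample$flow_cd)

# Keep only the approved values.
approvedOnly <- dailyExample[dailyExample$flow_cd == "A", ]
@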
The column names can be shortened and simplified, based on the parameter and statistic codes, using the \texttt{renameNWISColumns} function. This is not necessary, but may streamline subsequent data analysis and presentation.
<<label=renameColumns, echo=TRUE>>=
names(temperatureAndFlow)
...
@@ -519,17 +537,30 @@ Any data collected at regular time intervals (such as 15-minute or hourly) are k
The retrieval produces a data frame that contains 96 rows (one for every 15-minute period in the day). They include all data collected from the startDate through the endDate (starting and ending with midnight locally-collected time). The dateTime column is converted to \enquote{UTC} (Coordinated Universal Time), which is 5 hours ahead of EST, so midnight EST appears as 05:00 on the same day in the dateTime column.
<<dischargeData, echo=TRUE>>=
head(dischargeUnit)
@
Note that time now becomes important, so the variable dateTime is of class POSIXct, and the reported time zone is included in a separate column. The dateTime column is converted automatically to \enquote{UTC} (Coordinated Universal Time). To override the UTC time zone, specify a valid time zone in the tz argument. The default is \texttt{""}, which keeps the dateTime column in UTC. Other valid time zones are:
\begin{verbatim}
America/New_York
America/Chicago
America/Denver
America/Los_Angeles
America/Anchorage
America/Honolulu
America/Jamaica
America/Managua
America/Phoenix
America/Metlakatla
\end{verbatim}
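The relationship between UTC and one of the time zones listed above can be checked with base R alone. The sketch below does not call the package; it takes a single hypothetical time stamp stored in UTC and displays the same instant in \texttt{America/New\_York}. A January date is used so that Eastern Standard Time, rather than daylight saving time, applies.

<<sketchTimeZone, echo=TRUE, eval=TRUE>>=
# A hypothetical time stamp stored in UTC.
utcTime <- as.POSIXct("2014-01-15 05:00:00", tz = "UTC")

# Display the same instant in U.S. Eastern time. An explicit format is
# given so that the time of day is always printed.
easternText <- format(utcTime, format = "%Y-%m-%d %H:%M:%S",
                      tz = "America/New_York", usetz = TRUE)
easternText
@

The result is \texttt{2014-01-15 00:00:00 EST}: 05:00 UTC corresponds to midnight Eastern Standard Time on the same day.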
Data are retrieved from \url{http://waterservices.usgs.gov/rest/IV-Test-Tool.html}. There are occasions where NWIS values are not reported as numbers; a common example is \enquote{Ice}. Any value that cannot be converted to a number will be reported as NA in this package.
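The effect of a non-numeric entry can be reproduced with base R. The sketch below is not the package's internal code; it only shows the general R behavior that text which is not a number becomes NA when converted, using hypothetical values invented for the example.

<<sketchIceNA, echo=TRUE, eval=TRUE>>=
# Hypothetical raw values as text, including a non-numeric "Ice" entry.
rawValues <- c("12.3", "Ice", "11.8")

# as.numeric() returns NA (with a warning) for text that is not a
# number; the warning is suppressed here to keep the output short.
numericValues <- suppressWarnings(as.numeric(rawValues))
numericValues

# Summaries then need na.rm = TRUE to skip the missing value.
meanValue <- mean(numericValues, na.rm = TRUE)
meanValue
@

Without \texttt{na.rm = TRUE}, the mean of a vector that contains NA is itself NA, so it is worth checking for missing values before summarizing unit values data.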
Peak flow data are instantaneous discharge or stage data that record the maximum values of these variables during a flood event. They include the annual peak flood event but can also include records of other peaks that are lower than the annual maximum. Peak discharge measurements can be obtained with the \texttt{readNWISpeak} function.
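A minimal sketch of a peak flow request is shown below. It assumes that \texttt{readNWISpeak} takes the site number as its first argument and returns a data frame; the site number 01491000 (Choptank River near Greensboro, MD) is supplied here for illustration. The chunk is not evaluated because it requires a Web connection.

<<sketchPeak, echo=TRUE, eval=FALSE>>=
library(dataRetrievaldemo)

# Assumed site number for the Choptank River near Greensboro, MD.
siteNumber <- "01491000"

# Retrieve the peak flow record for the site and inspect the first rows.
peakData <- readNWISpeak(siteNumber)
head(peakData)
@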
Rating curves are the calibration curves that are used to convert measurements of stage to discharge. Because of changing hydrologic conditions these rating curves change over time.
Rating curves can be obtained with the \texttt{readNWISrating} function.
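A rating curve request might look like the following sketch. It assumes that \texttt{readNWISrating} takes the site number as its first argument and accepts a \texttt{type} argument with the value \texttt{"base"}; both the argument and the site number 01491000 are assumptions made for this example. The chunk is not evaluated because it requires a Web connection.

<<sketchRating, echo=TRUE, eval=FALSE>>=
library(dataRetrievaldemo)

# Assumed site number for the Choptank River near Greensboro, MD.
siteNumber <- "01491000"

# Retrieve the rating table (the "type" argument is an assumption)
# and inspect the first rows.
ratingData <- readNWISrating(siteNumber, type = "base")
head(ratingData)
@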
There are additional water quality data sets available from the Water Quality Data Portal (\url{http://www.waterqualitydata.us/}). These data sets can be housed in the STORET database (data from EPA), the NWIS database (data from USGS), or the STEWARDS database (data from USDA); additional databases are slated to be included in the future. Because only USGS uses parameter codes, a \texttt{"}characteristic name\texttt{"} must be supplied. The \texttt{readWQPqw} function can take either a USGS parameter code, or a more general characteristic name in the parameterCd input argument. The Water Quality Data Portal includes data discovery tools and information on characteristic names. The following example retrieves specific conductance from a DNR site in Wisconsin.
<<label=getQWData, echo=TRUE, eval=FALSE>>=
...
@@ -798,21 +822,15 @@ This section describes the options for downloading and installing the dataRetrie
If you are new to R, you will need to first install the latest version of R, which can be found here: \url{http://www.r-project.org/}.
At any time, you can get information about any function in R by typing a question mark before the function's name. This will open a file (in RStudio, in the Help window) that describes the function and its required arguments, and provides working examples.
<<helpFunc,eval = FALSE>>=
?readNWISpCode
@
This will open a help file similar to Figure \ref{fig:help}.
\FloatBarrier
To see the raw code for a particular function, type the name of the function, without parentheses:
<<rawFunc,eval = TRUE>>=
readNWISpCode
@
\begin{figure}[ht!]
\centering
...
@@ -834,10 +852,7 @@ vignette(dataRetrievaldemo)
The following command installs dataRetrievaldemo and subsequent required packages:
After installing the package, you need to open the library each time you re-start R. This is done with the simple command:
...
@@ -899,12 +914,12 @@ Next, follow the steps below to open this file in Excel:
\item Use the many formatting tools within Excel to customize the table
\end{enumerate}
From Excel, it is simple to copy and paste the tables in other Microsoft\textregistered\ software. An example using one of the default Excel table formats is here. Additional formatting could be required in Excel, for example converting u to $\mu$.