%\VignetteIndexEntry{Introduction to the dataRetrieval package}
%\VignetteEngine{knitr::knitr}
%\VignetteDepends{}
%\VignetteSuggests{xtable,EGRET}
%\VignetteImports{zoo, XML, RCurl}
%\VignettePackage{dataRetrieval}

\documentclass[a4paper,11pt]{article}\usepackage[]{graphicx}\usepackage[]{color}
%% maxwidth is the original width if it is less than linewidth
%% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
  \ifdim\Gin@nat@width>\linewidth
    \linewidth
  \else
    \Gin@nat@width
  \fi
}
\makeatother

\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%

\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
 \def\at@end@of@kframe{}%
 \ifinner\ifhmode%
  \def\at@end@of@kframe{\end{minipage}}%
  \begin{minipage}{\columnwidth}%
 \fi\fi%
 \def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
 \colorbox{shadecolor}{##1}\hskip-\fboxsep
     % There is no \\@totalrightmargin, so:
     \hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
 \MakeFramed {\advance\hsize-\width
   \@totalleftmargin\z@ \linewidth\hsize
   \@setminipage}}%
 {\par\unskip\endMakeFramed%
 \at@end@of@kframe}
\makeatother

\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX

\usepackage{alltt}

\usepackage{amsmath}
\usepackage{times}
\usepackage{hyperref}
\usepackage[numbers, round]{natbib}
\usepackage[american]{babel}
\usepackage{authblk}
\usepackage{subfig}
\usepackage{placeins}
\usepackage{footnote}
\usepackage{tabularx}
\renewcommand\Affilfont{\itshape\small}
\renewcommand{\topfraction}{0.85}
\renewcommand{\textfraction}{0.1}
\usepackage{graphicx}


\textwidth=6.2in
\textheight=8.5in
\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

%------------------------------------------------------------
% newcommand
%------------------------------------------------------------
\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{\textit{#1}}
\newcommand{\Rpackage}[1]{\textit{#1}}
\newcommand{\Rexpression}[1]{\texttt{#1}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}

%------------------------------------------------------------
\title{The dataRetrieval R package}
%------------------------------------------------------------
\author[1]{Laura De Cicco}
\author[1]{Robert Hirsch}
\affil[1]{United States Geological Survey}



\maketitle
\tableofcontents

%------------------------------------------------------------
\section{Introduction to dataRetrieval}
%------------------------------------------------------------ 
The dataRetrieval package was created to simplify the process of getting hydrologic data into the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends (EGRET). See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to analyze water quality data sets using the Weighted Regressions on Time, Discharge and Season (WRTDS) method, and to analyze discharge trends using robust time-series smoothing techniques. Both of these capabilities provide tabular and graphical analyses of long-term data sets.


The dataRetrieval package is designed to retrieve many of the major types of USGS hydrologic data that are available on the web, and it also allows users to make use of other data that they supply from spreadsheets. Section 2 provides examples of how to obtain raw data from USGS sources on the web and ingest them into data frames within the R environment. The functionality described in section 2 is for general use and is not tailored to the specific uses of the EGRET package. The functionality described in section 3 is tailored to obtaining input from the web and structuring it for use in the EGRET package. The functionality described in section 4 is for converting hydrologic data from user-supplied spreadsheets and structuring them for use in the EGRET package.

For information on getting started in R and installing the package, see Appendix \ref{sec:appendix1}: Getting Started.


%------------------------------------------------------------
\section{General USGS Web Retrievals}
%------------------------------------------------------------ 
In this section, we run through five examples that document how to get raw data from the web: site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), real-time (unit) values (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP} or \ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD, as an example. The site ID for this streamgage is 01491000. Daily discharge measurements are available as far back as 1948, and forms of nitrate have been measured since 1964. The functions in this section return raw data as delivered by the web services, which is not always the most convenient form to work with. In the next section, we use functions that retrieve the data and process them into dataframes that are better suited to analysis in R.

\newpage
Quick workflow example:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(dataRetrieval)}
\hlcom{# Site ID for Choptank River near Greensboro, MD}
\hlstd{siteNumber} \hlkwb{<-} \hlstr{"01491000"}
\hlstd{ChoptankInfo} \hlkwb{<-} \hlkwd{getSiteFileData}\hlstd{(siteNumber)}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00060"}
\hlstd{rawDailyData} \hlkwb{<-} \hlkwd{retrieveNWISData}\hlstd{(siteNumber,parameterCd,}
                      \hlstr{"1980-01-01"}\hlstd{,}\hlstr{"2010-01-01"}\hlstd{)}
\hlkwd{head}\hlstd{(rawDailyData)}
\end{alltt}
\begin{verbatim}
  agency_cd  site_no   datetime X02_00060_00003
1      USGS 01491000 1980-01-01             100
2      USGS 01491000 1980-01-02              96
3      USGS 01491000 1980-01-03              92
4      USGS 01491000 1980-01-04              89
5      USGS 01491000 1980-01-05              91
6      USGS 01491000 1980-01-06             100
  X02_00060_00003_cd
1                  A
2                  A
3                  A
4                  A
5                  A
6                  A
\end{verbatim}
\begin{alltt}
\hlstd{Daily} \hlkwb{<-} \hlkwd{getDVData}\hlstd{(siteNumber,parameterCd,}
                      \hlstr{"1980-01-01"}\hlstd{,}\hlstr{"2010-01-01"}\hlstd{)}
\end{alltt}
\begin{verbatim}
There are 10959 data points, and 10959 days.
\end{verbatim}
\begin{alltt}
\hlkwd{head}\hlstd{(Daily)}
\end{alltt}
\begin{verbatim}
        Date     Q Julian Month Day DecYear MonthSeq
1 1980-01-01 2.832  47481     1   1    1980     1561
2 1980-01-02 2.718  47482     1   2    1980     1561
3 1980-01-03 2.605  47483     1   3    1980     1561
4 1980-01-04 2.520  47484     1   4    1980     1561
5 1980-01-05 2.577  47485     1   5    1980     1561
6 1980-01-06 2.832  47486     1   6    1980     1561
  Qualifier i   LogQ Q7 Q30
1         A 1 1.0409 NA  NA
2         A 2 1.0000 NA  NA
3         A 3 0.9575 NA  NA
4         A 4 0.9243 NA  NA
5         A 5 0.9466 NA  NA
6         A 6 1.0409 NA  NA
\end{verbatim}
\begin{alltt}
\hlstd{ChoptankInfo} \hlkwb{<-} \hlkwd{getSiteFileData}\hlstd{(siteNumber)}
\hlkwd{colnames}\hlstd{(ChoptankInfo)}
\end{alltt}
\begin{verbatim}
 [1] "agency.cd"             "site.no"              
 [3] "station.nm"            "site.tp.cd"           
 [5] "lat.va"                "long.va"              
 [7] "dec.lat.va"            "dec.long.va"          
 [9] "coord.meth.cd"         "coord.acy.cd"         
[11] "coord.datum.cd"        "dec.coord.datum.cd"   
[13] "district.cd"           "state.cd"             
[15] "county.cd"             "country.cd"           
[17] "land.net.ds"           "map.nm"               
[19] "map.scale.fc"          "alt.va"               
[21] "alt.meth.cd"           "alt.acy.va"           
[23] "alt.datum.cd"          "huc.cd"               
[25] "basin.cd"              "topo.cd"              
[27] "instruments.cd"        "construction.dt"      
[29] "inventory.dt"          "drain.area.va"        
[31] "contrib.drain.area.va" "tz.cd"                
[33] "local.time.fg"         "reliability.cd"       
[35] "gw.file.cd"            "nat.aqfr.cd"          
[37] "aqfr.cd"               "aqfr.type.cd"         
[39] "well.depth.va"         "hole.depth.va"        
[41] "depth.src.cd"          "project.no"           
[43] "queryTime"            
\end{verbatim}
\begin{alltt}
\hlstd{ChoptankAvail} \hlkwb{<-} \hlkwd{getDataAvailability}\hlstd{(siteNumber)}
\hlkwd{head}\hlstd{(ChoptankAvail)}
\end{alltt}
\begin{verbatim}
  parameter_cd statCd  startDate    endDate count service
1        00001        1974-11-04 1984-08-01   109      qw
2        00004        2013-03-27 2013-03-27     1      qw
3        00008        1972-10-24 1973-12-26    12      qw
4        00009        1974-03-22 1974-03-22     1      qw
5        00010  00001 1988-10-01 2012-05-09   894      dv
6        00010  00002 2010-10-01 2012-05-09   529      dv
  parameter_group_nm
1        Information
2           Physical
3        Information
4        Information
5           Physical
6           Physical
                                                                 parameter_nm
1  Location in cross section, distance from right bank looking upstream, feet
2                                                          Stream width, feet
3                                                    Sample accounting number
4 Location in cross section, distance from left bank looking downstream, feet
5                                         Temperature, water, degrees Celsius
6                                         Temperature, water, degrees Celsius
  casrn                              srsname
1                                           
2       Instream features, est. stream width
3                                           
4                                           
5                         Temperature, water
6                         Temperature, water
  parameter_units
1              ft
2              ft
3              nu
4              ft
5           deg C
6           deg C
\end{verbatim}
\begin{alltt}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00618"} \hlcom{# Nitrate}
\hlstd{parameterINFO} \hlkwb{<-} \hlkwd{getParameterInfo}\hlstd{(parameterCd)}
\hlkwd{colnames}\hlstd{(parameterINFO)}
\end{alltt}
\begin{verbatim}
[1] "parameter_cd"       "parameter_group_nm"
[3] "parameter_nm"       "casrn"             
[5] "srsname"            "parameter_units"   
\end{verbatim}
\end{kframe}
\end{knitrout}
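The Daily dataframe returned by getDVData differs from the raw retrieval: discharge is converted from cubic feet per second to cubic meters per second, and several date-based columns are added. The following base R sketch reproduces some of these columns from the first three raw values. It is an illustration only, not the package's own code, and the conventions it assumes (Julian as days since 1850-01-01, MonthSeq as a month count starting at January 1850, and LogQ as the natural logarithm of Q) are inferred from the output shown above.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# First three raw daily discharge values (cfs) from the output above
raw <- data.frame(
  datetime = as.Date(c("1980-01-01", "1980-01-02", "1980-01-03")),
  value = c(100, 96, 92)
)
cfsToCms <- 0.0283168  # cubic meters in one cubic foot
Daily <- data.frame(Date = raw$datetime)
Daily$Q <- raw$value * cfsToCms
# Assumed convention: days elapsed since 1850-01-01
Daily$Julian <- as.numeric(Daily$Date - as.Date("1850-01-01"))
Daily$Month <- as.integer(format(Daily$Date, "%m"))
year <- as.integer(format(Daily$Date, "%Y"))
# Assumed convention: month count equal to 1 for January 1850
Daily$MonthSeq <- (year - 1850) * 12 + Daily$Month
Daily$LogQ <- log(Daily$Q)  # natural logarithm
print(Daily)
\end{verbatim}
\end{kframe}
\end{knitrout}

Comparing these columns with the Daily output above (for example, 100 cfs becomes a Q of about 2.832) is a quick way to check the assumed conventions.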




%------------------------------------------------------------
\subsection{Introduction}
%------------------------------------------------------------
The United States Geological Survey organizes its hydrologic data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering this ID. There are many ways to do this; one is the National Water Information System: Mapper \url{http://maps.waterdata.usgs.gov/mapper/index.html}.
Once the site ID is known, the next required input for USGS data retrievals is the \texttt{"}parameter code\texttt{"}. This is a 5-digit code that specifies which measured parameter is being requested. A complete list of possible USGS parameter codes can be found at \url{http://go.usa.gov/bVDz}.
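
Because site IDs and parameter codes can begin with zeros, it is safest to store them in R as character strings rather than as numbers. The short illustration below shows what is lost when an ID is converted to a number. The padding step only makes sense when the ID is known to have 8 digits, and isParameterCode is a hypothetical helper written for this example, not a dataRetrieval function.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
siteNumber <- "01491000"  # Choptank River near Greensboro, MD
# Converting the ID to a number silently drops the leading zero
asNumber <- as.numeric(siteNumber)
# formatC can restore an 8-digit ID by padding with zeros
restored <- formatC(asNumber, width = 8, format = "d", flag = "0")
identical(restored, siteNumber)  # TRUE

# Hypothetical helper: TRUE when x looks like a 5-digit parameter code
isParameterCode <- function(x) grepl("^[0-9]{5}$", x)
isParameterCode(c("00060", "0060", "00618"))  # TRUE FALSE TRUE
\end{verbatim}
\end{kframe}
\end{knitrout}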

Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table \ref{tab:params}.


% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Mon Oct 28 16:27:10 2013
\begin{table}[ht]
\begin{tabular}{rll}
 & pCode & shortName \\ 
1 & 00060 & Discharge [cfs] \\ 
  2 & 00065 & Gage height [ft] \\ 
  3 & 00010 & Temperature [C] \\ 
  4 & 00045 & Precipitation [in] \\ 
  5 & 00400 & pH \\ 
   \hline
\end{tabular}
\caption{Common USGS Parameter Codes} 
\label{tab:params}
\end{table}

For real-time data, the parameter code and site ID will suffice.  For most variables that are measured on a continuous basis, the USGS stores the historical data as daily values.  These daily values may be in the form of statistics such as the daily mean values, but they can also include daily maximums, minimums or medians.  These different statistics are specified by a 5-digit \texttt{"}stat code\texttt{"}.  A complete list of stat codes can be found here:

\url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}

Some common stat codes are shown in Table \ref{tab:stat}.
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Mon Oct 28 16:27:10 2013
\begin{table}[ht]
\begin{tabular}{rll}
 & StatCode & shortName \\ 
1 & 00001 & Maximum \\ 
  2 & 00002 & Minimum \\ 
  3 & 00003 & Mean \\ 
  4 & 00008 & Median \\ 
   \hline
\end{tabular}
\caption{Commonly used USGS Stat Codes} 
\label{tab:stat}
\end{table}
%------------------------------------------------------------
\subsection{Site Information}
\label{sec:usgsSite}
%------------------------------------------------------------

%------------------------------------------------------------
\subsubsection{getSiteFileData}
\label{sec:usgsSiteFileData}
%------------------------------------------------------------
Use the getSiteFileData function to obtain all of the information available for a particular USGS site such as full station name, drainage area, latitude, and longitude:


\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(dataRetrieval)}
\hlcom{# Site ID for Choptank River near Greensboro, MD}
\hlstd{siteNumber} \hlkwb{<-} \hlstr{"01491000"}
\hlstd{ChoptankInfo} \hlkwb{<-} \hlkwd{getSiteFileData}\hlstd{(siteNumber)}
\end{alltt}
\end{kframe}
\end{knitrout}


A list of the available columns is found in Appendix \ref{sec:appendix2INFO}: INFO dataframe. A specific piece of information, in this case the station name, can be extracted as follows:

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{ChoptankInfo}\hlopt{$}\hlstd{station.nm}
\end{alltt}
\begin{verbatim}
[1] "CHOPTANK RIVER NEAR GREENSBORO, MD"
\end{verbatim}
\end{kframe}
\end{knitrout}

Site information is obtained from \url{http://waterservices.usgs.gov/rest/Site-Test-Tool.html}
\FloatBarrier
%------------------------------------------------------------
\subsubsection{getDataAvailability}
\label{sec:usgsDataAvailability}
%------------------------------------------------------------
To find out the available data at a particular USGS site, including measured parameters, period of record, and number of samples (count), use the getDataAvailability function:

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Continuing from the previous example:}
\hlstd{ChoptankAvailableData} \hlkwb{<-} \hlkwd{getDataAvailability}\hlstd{(siteNumber)}
\hlkwd{head}\hlstd{(ChoptankAvailableData)}
\end{alltt}
\begin{verbatim}
  parameter_cd statCd  startDate    endDate count service
1        00001        1974-11-04 1984-08-01   109      qw
2        00004        2013-03-27 2013-03-27     1      qw
3        00008        1972-10-24 1973-12-26    12      qw
4        00009        1974-03-22 1974-03-22     1      qw
5        00010  00001 1988-10-01 2012-05-09   894      dv
6        00010  00002 2010-10-01 2012-05-09   529      dv
  parameter_group_nm
1        Information
2           Physical
3        Information
4        Information
5           Physical
6           Physical
                                                                 parameter_nm
1  Location in cross section, distance from right bank looking upstream, feet
2                                                          Stream width, feet
3                                                    Sample accounting number
4 Location in cross section, distance from left bank looking downstream, feet
5                                         Temperature, water, degrees Celsius
6                                         Temperature, water, degrees Celsius
  casrn                              srsname
1                                           
2       Instream features, est. stream width
3                                           
4                                           
5                         Temperature, water
6                         Temperature, water
  parameter_units
1              ft
2              ft
3              nu
4              ft
5           deg C
6           deg C
\end{verbatim}
\end{kframe}
\end{knitrout}


The getDataAvailability function has an additional argument, longNames, which defaults to FALSE. Setting longNames to TRUE causes the function to make a web service call for each parameter and return expanded information on that parameter. Currently, this is a very slow process because each parameter code requires a separate web service call. If the site does not have many measured parameters, setting longNames to TRUE is reasonable.

It is also possible to work with only a subset of the parameters. In the following example, we keep just the daily mean records from the Choptank data availability dataframe (excluding all unit value and water quality records).
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Continuing from the previous example:}
\hlcom{# This pulls out just the daily data:}
\hlstd{ChoptankDailyData} \hlkwb{<-} \hlkwd{subset}\hlstd{(ChoptankAvailableData,}
                            \hlstr{"dv"} \hlopt{==} \hlstd{service)}
\hlcom{# This pulls out the mean:}
\hlstd{ChoptankDailyData} \hlkwb{<-} \hlkwd{subset}\hlstd{(ChoptankDailyData,}
                            \hlstr{"00003"} \hlopt{==} \hlstd{statCd)}
\end{alltt}
\end{kframe}
\end{knitrout}
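
The two subset calls above can also be combined into a single call by joining the conditions with \texttt{\&}. The sketch below runs offline on a small dataframe built from rows of the availability output and daily-mean table in this section (only four of the columns are kept), so it can be tried without a web connection.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Rows taken from the Choptank availability output and table
avail <- data.frame(
  parameter_cd = c("00001", "00010", "00010", "00010", "00060"),
  statCd       = c("",      "00001", "00002", "00003", "00003"),
  count        = c(109,     894,     529,     529,     24042),
  service      = c("qw",    "dv",    "dv",    "dv",    "dv"),
  stringsAsFactors = FALSE
)
# Keep daily-value ("dv") records whose statistic is the mean ("00003")
ChoptankDailyData <- subset(avail, service == "dv" & statCd == "00003")
ChoptankDailyData$parameter_cd  # "00010" "00060"
\end{verbatim}
\end{kframe}
\end{knitrout}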

% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Mon Oct 28 16:27:11 2013
\begin{table}[ht]
\centering
\begin{tabular}{rlllll}
 & shortName & Start & End & Count & Units \\ 
1 & Temperature, water & 2010-10-01 & 2012-05-09 & 529 & deg C \\ 
  2 & Stream flow, mean. daily & 1948-01-01 & 2013-10-27 & 24042 & ft3/s \\ 
  3 & Specific conductance & 2010-10-01 & 2012-05-09 & 527 & uS/cm @25C \\ 
  4 & Suspended sediment concentration (SSC) & 1980-10-01 & 1991-09-30 & 3651 & mg/l \\ 
  5 & Suspended sediment discharge & 1980-10-01 & 1991-09-30 & 3652 & tons/day \\ 
   \hline
\end{tabular}
\caption{Daily mean data available at the Choptank River near Greensboro, MD} 
\label{tab:gda}
\end{table}


See Appendix \ref{app:createWordTable} for instructions on converting an R dataframe to a table in Microsoft Excel or Word.
%------------------------------------------------------------
\subsection{Parameter Information}
\label{sec:usgsParams}
%------------------------------------------------------------
To obtain all of the available information concerning a measured parameter, use the getParameterInfo function:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Using defaults:}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00618"}
\hlstd{parameterINFO} \hlkwb{<-} \hlkwd{getParameterInfo}\hlstd{(parameterCd)}
\hlkwd{colnames}\hlstd{(parameterINFO)}
\end{alltt}
\begin{verbatim}
[1] "parameter_cd"       "parameter_group_nm"
[3] "parameter_nm"       "casrn"             
[5] "srsname"            "parameter_units"   
\end{verbatim}
\end{kframe}
\end{knitrout}


A specific piece of information, in this case the parameter name, can be extracted as follows:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{parameterINFO}\hlopt{$}\hlstd{parameter_nm}
\end{alltt}
\begin{verbatim}
[1] "Nitrate, water, filtered, milligrams per liter as nitrogen"
\end{verbatim}
\end{kframe}
\end{knitrout}

Parameter information is obtained from \url{http://nwis.waterdata.usgs.gov/nwis/pmcodes/}
\FloatBarrier
%------------------------------------------------------------
\subsection{Daily Values}
\label{sec:usgsDaily}
%------------------------------------------------------------
To obtain historical daily records of USGS data, use the retrieveNWISData function. The arguments for this function are siteNumber, parameterCd, startDate, endDate, statCd, and a logical (TRUE/FALSE) interactive. Two arguments have default values: statCd (defaults to \texttt{"}00003\texttt{"}) and interactive (defaults to TRUE). If you want to use the default values, you do not need to list them in the function call. Setting the \texttt{"}interactive\texttt{"} option to TRUE will walk you through the function. For large batch retrievals, it usually makes more sense to set interactive to FALSE.

The dates (start and end) must be in the format \texttt{"}YYYY-MM-DD\texttt{"} (note: the user does need to include the quotes). Setting the start date to \texttt{"}\texttt{"} requests the earliest available date, and setting the end date to \texttt{"}\texttt{"} requests the latest available date.
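
Since the dates must follow this format, it can help to check date strings before calling the retrieval functions. The following sketch uses a hypothetical helper, isValidDateArg (not part of dataRetrieval), that accepts either an empty string or a real calendar date written as YYYY-MM-DD.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical helper: TRUE when x is "" (earliest or latest available
# date) or a valid date written as "YYYY-MM-DD"
isValidDateArg <- function(x) {
  if (identical(x, "")) return(TRUE)
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) return(FALSE)
  # as.Date returns NA when the pieces are not a real date (e.g. month 13)
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}
isValidDateArg("1980-01-01")  # TRUE
isValidDateArg("")            # TRUE
isValidDateArg("01/01/1980")  # FALSE
isValidDateArg("2012-13-01")  # FALSE
\end{verbatim}
\end{kframe}
\end{knitrout}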

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Continuing with our Choptank River example}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00060"}  \hlcom{# Discharge (cfs)}
\hlstd{startDate} \hlkwb{<-} \hlstr{""}  \hlcom{# Will request earliest date}
\hlstd{endDate} \hlkwb{<-} \hlstr{""} \hlcom{# Will request latest date}
\hlstd{discharge} \hlkwb{<-} \hlkwd{retrieveNWISData}\hlstd{(siteNumber,}
                    \hlstd{parameterCd, startDate, endDate)}
\end{alltt}
\end{kframe}
\end{knitrout}


The variable datetime is automatically imported as a Date. Each requested parameter has a value and remark code column.  The names of these columns depend on the requested parameter and stat code combinations. USGS remark codes are often \texttt{"}A\texttt{"} (approved for publication) or \texttt{"}P\texttt{"} (provisional data subject to revision). A more complete list of remark codes can be found here:
\url{http://waterdata.usgs.gov/usa/nwis/help?codes_help}
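
The value columns in the raw output follow a pattern such as \texttt{X02\_00060\_00003}, with a matching remark column ending in \texttt{\_cd}. The sketch below splits such names into their parts and looks up readable names from Tables \ref{tab:params} and \ref{tab:stat}. It assumes the pattern X, then a two-digit number (assumed to identify the time series), then the parameter code and stat code, separated by underscores; this pattern is inferred from the output shown in this document, and decodeNWISColumn is a hypothetical helper, not a dataRetrieval function.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical helper: split a raw NWIS column name into its parts,
# assuming the pattern X<number>_<parameter code>_<stat code>[_cd]
decodeNWISColumn <- function(colName) {
  parts <- strsplit(sub("^X", "", colName), "_")[[1]]
  list(tsNumber = parts[1],
       parameterCd = parts[2],
       statCd = parts[3],
       isRemark = length(parts) == 4 && parts[4] == "cd")
}
# Lookup vectors built from the parameter and stat code tables
paramNames <- c("00060" = "Discharge [cfs]", "00065" = "Gage height [ft]",
                "00010" = "Temperature [C]", "00045" = "Precipitation [in]",
                "00400" = "pH")
statNames <- c("00001" = "Maximum", "00002" = "Minimum",
               "00003" = "Mean", "00008" = "Median")
info <- decodeNWISColumn("X02_00060_00003")
paste(paramNames[[info$parameterCd]], statNames[[info$statCd]])
# "Discharge [cfs] Mean"
decodeNWISColumn("X02_00060_00003_cd")$isRemark  # TRUE
\end{verbatim}
\end{kframe}
\end{knitrout}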

Another example that doesn't use the defaults would be a request for mean and maximum daily temperature and discharge in early 2012:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{parameterCd} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"00010"}\hlstd{,}\hlstr{"00060"}\hlstd{)}  \hlcom{# Temperature and discharge}
\hlstd{statCd} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"00001"}\hlstd{,}\hlstr{"00003"}\hlstd{)}  \hlcom{# Maximum and mean}
\hlstd{startDate} \hlkwb{<-} \hlstr{"2012-01-01"}
\hlstd{endDate} \hlkwb{<-} \hlstr{"2012-05-01"}
\hlstd{temperatureAndFlow} \hlkwb{<-} \hlkwd{retrieveNWISData}\hlstd{(siteNumber, parameterCd,}
        \hlstd{startDate, endDate,} \hlkwc{StatCd}\hlstd{=statCd)}
\hlstd{temperatureAndFlow} \hlkwb{<-} \hlkwd{renameColumns}\hlstd{(temperatureAndFlow)}
\end{alltt}
\end{kframe}
\end{knitrout}


Daily data is pulled from \url{http://waterservices.usgs.gov/rest/DV-Test-Tool.html}. 

An example of plotting the above data (Figure \ref{fig:getNWIStemperaturePlot}):

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{par}\hlstd{(}\hlkwc{mar}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{5}\hlstd{,}\hlnum{5}\hlstd{,}\hlnum{5}\hlstd{,}\hlnum{5}\hlstd{))}
\hlkwd{with}\hlstd{(temperatureAndFlow,} \hlkwd{plot}\hlstd{(}
  \hlstd{datetime, Temperature_water_degrees_Celsius_Max_01,}
  \hlkwc{xlab}\hlstd{=}\hlstr{"Date"}\hlstd{,}\hlkwc{ylab}\hlstd{=}\hlstr{"Temperature [C]"}
  \hlstd{))}
\hlkwd{par}\hlstd{(}\hlkwc{new}\hlstd{=}\hlnum{TRUE}\hlstd{)}
\hlkwd{with}\hlstd{(temperatureAndFlow,} \hlkwd{plot}\hlstd{(}
  \hlstd{datetime, Discharge_cubic_feet_per_second,}
  \hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{,}\hlkwc{type}\hlstd{=}\hlstr{"l"}\hlstd{,}\hlkwc{xaxt}\hlstd{=}\hlstr{"n"}\hlstd{,}\hlkwc{yaxt}\hlstd{=}\hlstr{"n"}\hlstd{,}\hlkwc{xlab}\hlstd{=}\hlstr{""}\hlstd{,}\hlkwc{ylab}\hlstd{=}\hlstr{""}\hlstd{,}\hlkwc{axes}\hlstd{=}\hlnum{FALSE}
  \hlstd{))}
\hlkwd{axis}\hlstd{(}\hlnum{4}\hlstd{,}\hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{,}\hlkwc{col.axis}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlkwd{mtext}\hlstd{(}\hlstr{"Discharge [cfs]"}\hlstd{,}\hlkwc{side}\hlstd{=}\hlnum{4}\hlstd{,}\hlkwc{line}\hlstd{=}\hlnum{3}\hlstd{,}\hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlkwd{title}\hlstd{(}\hlkwd{paste}\hlstd{(ChoptankInfo}\hlopt{$}\hlstd{station.nm,}\hlstr{"2012"}\hlstd{,}\hlkwc{sep}\hlstd{=}\hlstr{" "}\hlstd{))}
\end{alltt}
\end{kframe}\begin{figure}[]

\includegraphics[width=1\linewidth,height=1\linewidth]{figure/getNWIStemperaturePlot} \caption[Temperature and discharge plot of Choptank River in 2012]{Temperature and discharge plot of Choptank River in 2012.\label{fig:getNWIStemperaturePlot}}
\end{figure}

\end{knitrout}

Occasionally, NWIS values are not reported as numbers; instead, there may be text describing a particular event, such as \texttt{"}Ice\texttt{"}. Any value that cannot be converted to a number is reported as NA in this package.

\FloatBarrier
%------------------------------------------------------------
\subsection{Unit Values}
\label{sec:usgsRT}
%------------------------------------------------------------
Any data that are collected at regular time intervals (such as 15-minute or hourly) are known as \texttt{"}Unit Values\texttt{"}. Many of these are delivered on a real-time basis, and very recent data (even less than an hour old in many cases) are available through the function retrieveUnitNWISData. Some of these Unit Values are available for the past several years, and some are only available for a recent time period such as 120 days or a year. Here is an example of a retrieval of such data.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00060"}  \hlcom{# Discharge (cfs)}
\hlstd{startDate} \hlkwb{<-} \hlstr{"2012-05-12"}
\hlstd{endDate} \hlkwb{<-} \hlstr{"2012-05-13"}
\hlstd{dischargeToday} \hlkwb{<-} \hlkwd{retrieveUnitNWISData}\hlstd{(siteNumber, parameterCd,}
        \hlstd{startDate, endDate)}
\end{alltt}
\end{kframe}
\end{knitrout}

This produces the following dataframe:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
  agency     site            dateTime X02_00060_00011
1   USGS 01491000 2012-05-12 00:00:00              83
2   USGS 01491000 2012-05-12 00:15:00              83
3   USGS 01491000 2012-05-12 00:30:00              83
4   USGS 01491000 2012-05-12 00:45:00              83
5   USGS 01491000 2012-05-12 01:00:00              85
6   USGS 01491000 2012-05-12 01:15:00              83
  X02_00060_00011_cd
1                  A
2                  A
3                  A
4                  A
5                  A
6                  A
\end{verbatim}
\end{kframe}
\end{knitrout}


Note that time now becomes important, so the variable dateTime is a POSIXct, and the time zone is included in a separate column. Data are pulled from \url{http://waterservices.usgs.gov/rest/IV-Test-Tool.html}. As with daily values, NWIS values are occasionally reported as text rather than numbers (a common example is \texttt{"}Ice\texttt{"}); any value that cannot be converted to a number is reported as NA in this package.
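
Because the timestamps are POSIXct values, unit values can be summarized by calendar day with base R. The sketch below uses the six 15-minute discharge values printed above and assumes that the site reports in US Eastern time (\texttt{America/New\_York}); both the time zone and the choice of a daily mean are illustrative assumptions. With only these six rows, the resulting mean covers just 00:00 to 01:15. The last line shows how a text entry such as Ice becomes NA.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# The six 15-minute discharge values (cfs) shown above
uv <- data.frame(
  dateTime = as.POSIXct(c("2012-05-12 00:00:00", "2012-05-12 00:15:00",
                          "2012-05-12 00:30:00", "2012-05-12 00:45:00",
                          "2012-05-12 01:00:00", "2012-05-12 01:15:00"),
                        tz = "America/New_York"),
  value = c(83, 83, 83, 83, 85, 83)
)
# Assign each reading to its calendar day in the assumed local time zone
uv$day <- as.Date(uv$dateTime, tz = "America/New_York")
dailyMean <- aggregate(value ~ day, data = uv, FUN = mean)

# Text such as "Ice" cannot be converted to a number and becomes NA
suppressWarnings(as.numeric(c("83", "Ice", "85")))  # 83 NA 85
\end{verbatim}
\end{kframe}
\end{knitrout}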

\newpage


\FloatBarrier
%------------------------------------------------------------
\subsection{Water Quality Values}
\label{sec:usgsWQP}
%------------------------------------------------------------
To get USGS water quality data from water samples collected at the streamgage (as distinct from unit values collected through some type of automatic monitor), we can use the Water Quality Data Portal: \url{http://www.waterqualitydata.us/}. The raw data are obtained from the function getRawQWData, with similar input arguments: siteNumber, parameterCd, startDate, endDate, and interactive. The difference is in parameterCd: in this function, multiple parameters can be queried using a vector, and setting parameterCd to \texttt{"}\texttt{"} will return all of the measured observations. A large amount of data is returned for each observation, and the raw data can be overwhelming (see Appendix \ref{sec:appendix2WQP}); a simplified version of the data can be obtained using getQWData.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Dissolved Nitrate parameter codes:}
\hlstd{parameterCd} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"00618"}\hlstd{,}\hlstr{"71851"}\hlstd{)}
\hlstd{startDate} \hlkwb{<-} \hlstr{"1979-10-11"}
\hlstd{endDate} \hlkwb{<-} \hlstr{"2012-12-18"}
\hlstd{dissolvedNitrate} \hlkwb{<-} \hlkwd{getRawQWData}\hlstd{(siteNumber, parameterCd,}
      \hlstd{startDate, endDate)}
\hlstd{dissolvedNitrateSimple} \hlkwb{<-} \hlkwd{getQWData}\hlstd{(siteNumber, parameterCd,}
        \hlstd{startDate, endDate)}
\hlkwd{names}\hlstd{(dissolvedNitrateSimple)}
\end{alltt}
\begin{verbatim}
[1] "dateTime"        "qualifier.00618" "value.00618"    
[4] "qualifier.71851" "value.71851"    
\end{verbatim}
\end{kframe}
\end{knitrout}

Note that in this dataframe, dateTime is imported as Dates (no times are included), and the qualifier is either blank or \texttt{"}\verb@<@\texttt{"}, signifying a censored value. A plotting example is shown in Figure \ref{fig:getQWtemperaturePlot}.
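Because the qualifier column marks censored values, it can be used to pick them out. A minimal sketch on a small hypothetical data frame, nitrate, whose column names mimic dissolvedNitrateSimple above (the dates and concentrations are made up for illustration):

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical data with the same column layout as dissolvedNitrateSimple
nitrate <- data.frame(
  dateTime = as.Date(c("2001-03-01", "2001-06-12", "2001-09-20")),
  qualifier.00618 = c("", "<", ""),
  value.00618 = c(1.2, 0.04, 0.95),
  stringsAsFactors = FALSE
)
# A "<" qualifier marks a censored (less-than) value
censored <- nitrate$qualifier.00618 == "<"
nitrate[censored, ]   # rows reported as less-than values
sum(censored)         # number of censored values
\end{verbatim}
\end{kframe}
\end{knitrout}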

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{with}\hlstd{(dissolvedNitrateSimple,} \hlkwd{plot}\hlstd{(}
  \hlstd{dateTime, value.00618,}
  \hlkwc{xlab}\hlstd{=}\hlstr{"Date"}\hlstd{,}\hlkwc{ylab} \hlstd{=} \hlkwd{paste}\hlstd{(parameterINFO}\hlopt{$}\hlstd{srsname,}
      \hlstr{"["}\hlstd{,parameterINFO}\hlopt{$}\hlstd{parameter_units,}\hlstr{"]"}\hlstd{)}
  \hlstd{))}
\hlkwd{title}\hlstd{(ChoptankInfo}\hlopt{$}\hlstd{station.nm)}
\end{alltt}
\end{kframe}\begin{figure}[]

\includegraphics[width=\maxwidth]{figure/getQWtemperaturePlot} \caption[Nitrate plot of Choptank River]{Nitrate plot of Choptank River.\label{fig:getQWtemperaturePlot}}
\end{figure}


\end{knitrout}

%------------------------------------------------------------
\subsection{STORET Water Quality Retrievals}
\label{sec:usgsSTORET}
%------------------------------------------------------------
Additional data sets are available on the Water Quality Data Portal (\url{http://www.waterqualitydata.us/}). These data sets can be housed in either the STORET (EPA) or the NWIS (USGS) database. Since STORET does not use USGS parameter codes, a \texttt{"}characteristic name\texttt{"} must be supplied instead. The following example retrieves specific conductance from a Wisconsin Department of Natural Resources (DNR) site.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{specificCond} \hlkwb{<-} \hlkwd{getWQPData}\hlstd{(}\hlstr{'WIDNR_WQX-10032762'}\hlstd{,}
        \hlstr{'Specific conductance'}\hlstd{,} \hlstr{''}\hlstd{,} \hlstr{''}\hlstd{)}
\hlkwd{head}\hlstd{(specificCond)}
\end{alltt}
\begin{verbatim}
    dateTime qualifier.Specific conductance
1 2011-02-14                               
2 2011-02-17                               
3 2011-03-03                               
4 2011-03-10                               
5 2011-03-29                               
6 2011-04-07                               
  value.Specific conductance
1                       1360
2                       1930
3                       1240
4                       1480
5                       1130
6                       1200
\end{verbatim}
\end{kframe}
\end{knitrout}
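The column names returned here include the characteristic name, spaces and all. In R, such names must be quoted with backticks after \$ or accessed with [[ ]]. A minimal sketch, using a hypothetical data frame, specificCondDemo, built from the first two rows of the output above:

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical data frame mimicking the output above;
# check.names = FALSE keeps the space in the column name
specificCondDemo <- data.frame(
  dateTime = as.Date(c("2011-02-14", "2011-02-17")),
  `value.Specific conductance` = c(1360, 1930),
  check.names = FALSE
)
# Names containing spaces need backticks after $ ...
mean(specificCondDemo$`value.Specific conductance`)
# ... or can be given as a string inside [[ ]]
mean(specificCondDemo[["value.Specific conductance"]])
\end{verbatim}
\end{kframe}
\end{knitrout}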


\FloatBarrier
%------------------------------------------------------------
\subsection{URL Construction}
\label{sec:usgsURL}
%------------------------------------------------------------
There may be times when you want to see the URL (web address) that was used to obtain the raw data. The constructNWISURL function returns this URL. Aside from input variables that have already been described, there is a new argument, \texttt{"}service\texttt{"}, which can be \texttt{"}dv\texttt{"} (daily values), \texttt{"}uv\texttt{"} (unit values), \texttt{"}qw\texttt{"} (NWIS water quality values), or \texttt{"}wqp\texttt{"} (general Water Quality Portal values).
 

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# Dissolved Nitrate parameter codes:}
\hlstd{pCode} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"00618"}\hlstd{,}\hlstr{"71851"}\hlstd{)}
\hlstd{startDate} \hlkwb{<-} \hlstr{"1964-06-11"}
\hlstd{endDate} \hlkwb{<-} \hlstr{"2012-12-18"}
\hlstd{url_qw} \hlkwb{<-} \hlkwd{constructNWISURL}\hlstd{(siteNumber,pCode,startDate,endDate,}\hlstr{'qw'}\hlstd{)}
\hlstd{url_dv} \hlkwb{<-} \hlkwd{constructNWISURL}\hlstd{(siteNumber,}\hlstr{"00060"}\hlstd{,startDate,endDate,}
                           \hlstr{'dv'}\hlstd{,}\hlkwc{statCd}\hlstd{=}\hlstr{"00003"}\hlstd{)}
\hlstd{url_uv} \hlkwb{<-} \hlkwd{constructNWISURL}\hlstd{(siteNumber,}\hlstr{"00060"}\hlstd{,startDate,endDate,}\hlstr{'uv'}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
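For illustration only, the sketch below assembles a daily-values request by hand, which can help when reading the strings returned by constructNWISURL. The helper buildDvURL is hypothetical, and the endpoint (\url{https://waterservices.usgs.gov/nwis/dv/}) and query parameter names (format, sites, parameterCd, statCd, startDT, endDT) are assumptions based on the public USGS water services interface; the result is not guaranteed to match constructNWISURL output exactly.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical helper: build an NWIS daily-values query string by hand.
# Endpoint and parameter names are assumptions, not dataRetrieval code.
buildDvURL <- function(siteNumber, parameterCd, startDate, endDate,
                       statCd = "00003") {
  query <- c(format = "rdb",
             sites = siteNumber,
             parameterCd = parameterCd,
             statCd = statCd,
             startDT = startDate,
             endDT = endDate)
  # Join name=value pairs with "&" after the endpoint
  paste0("https://waterservices.usgs.gov/nwis/dv/?",
         paste(names(query), query, sep = "=", collapse = "&"))
}
buildDvURL("01491000", "00060", "1964-06-11", "2012-12-18")
\end{verbatim}
\end{kframe}
\end{knitrout}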

%------------------------------------------------------------
\section{Data Retrievals Structured For Use In The EGRET Package}
%------------------------------------------------------------ 
Rather than returning the raw data as retrieved from the web, the dataRetrieval package also includes functions that return the data in a structure designed to work with the EGRET R package (\url{https://github.com/USGS-R/EGRET/wiki}). In general, these dataframes may be much more ``R-friendly'' than the raw data, and they contain additional date information that allows for efficient data analysis.

In this section, we use 3 dataRetrieval functions to get sufficient data to perform an EGRET analysis.  We will continue analyzing the Choptank River. We will be retrieving essentially the same data that were retrieved in the previous section, but in this case it will be structured into three EGRET-specific dataframes.  The daily discharge data will be placed in a dataframe called Daily.  The nitrate sample data will be placed in a dataframe called Sample.  The data about the site and the parameter will be placed in a dataframe called INFO.  Although these dataframes were designed to work with the EGRET R package, they can be very useful for a wide range of hydrologic studies that don't use EGRET.

%------------------------------------------------------------
\subsection{INFO Data}
%------------------------------------------------------------
The function to obtain metadata (data about the streamgage and measured parameters) is getMetaData. This function combines getSiteFileData and getParameterInfo, producing one dataframe called INFO.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00618"}
\hlstd{INFO} \hlkwb{<-}\hlkwd{getMetaData}\hlstd{(siteNumber,parameterCd,} \hlkwc{interactive}\hlstd{=}\hlnum{FALSE}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}


Column names in the INFO dataframe are listed in Appendix 2 (\ref{sec:appendix2INFO}).

\FloatBarrier
%------------------------------------------------------------
\subsection{Daily Data}
%------------------------------------------------------------
The function to obtain the daily values (discharge in this case) is getDVData. It requires the inputs siteNumber, parameterCd, startDate, endDate, interactive, and convert. Most of these arguments are described in the previous section; however, \texttt{"}convert\texttt{"} is a new argument (defaults to TRUE) that tells the program to convert the values from cubic feet per second (cfs) to cubic meters per second (cms). For EGRET applications with NWIS web retrieval, leave this argument at its default, because EGRET assumes that discharge is always in cubic meters per second. If you don't want this conversion and are not using EGRET, set convert=FALSE in the function call.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{siteNumber} \hlkwb{<-} \hlstr{"01491000"}
\hlstd{startDate} \hlkwb{<-} \hlstr{"2000-01-01"}
\hlstd{endDate} \hlkwb{<-} \hlstr{"2013-01-01"}
\hlcom{# This call will get NWIS (cfs) data, and convert it to cms:}
\hlstd{Daily} \hlkwb{<-} \hlkwd{getDVData}\hlstd{(siteNumber,} \hlstr{"00060"}\hlstd{, startDate, endDate)}
\end{alltt}
\begin{verbatim}
There are 4750 data points, and 4750 days.
\end{verbatim}
\end{kframe}
\end{knitrout}
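The conversion controlled by convert amounts to a constant factor: since 1 ft = 0.3048 m, one cubic foot per second equals $0.3048^3 \approx 0.0283168$ cubic meters per second. A minimal sketch of the same arithmetic on a hypothetical vector of discharges (this is not the package's internal code):

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Exact factor: 1 ft = 0.3048 m, so 1 ft^3 = 0.3048^3 m^3
cfsToCms <- 0.3048^3
qCfs <- c(83, 85, 100)      # hypothetical daily discharges in cfs
qCms <- qCfs * cfsToCms     # same discharges in cms
round(qCms, 3)
# [1] 2.350 2.407 2.832
\end{verbatim}
\end{kframe}
\end{knitrout}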


Details of the Daily dataframe are listed below:

% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Mon Oct 28 16:27:21 2013
\begin{table}[ht]
\centering
\begin{tabular}{rllll}
 & ColumnName & Type & Description & Units \\ 
1 & Date & Date & Date & date \\ 
  2 & Q & number & Discharge in cms & cms \\ 
  3 & Julian & number & Number of days since January 1, 1850 & days \\ 
  4 & Month & integer & Month of the year [1-12] & months \\ 
  5 & Day & integer & Day of the year [1-366] & days \\ 
  6 & DecYear & number & Decimal year & years \\ 
  7 & MonthSeq & integer & Number of months since January 1, 1850 & months \\ 
  8 & Qualifier & string & Qualifying code & character \\ 
  9 & i & integer & Index of days, starting with 1 & days \\ 
  10 & LogQ & number & Natural logarithm of Q & numeric \\ 
  11 & Q7 & number & 7 day running average of Q & cms \\ 
  12 & Q30 & number & 30 day running average of Q & cms \\ 
\end{tabular}
\caption{Daily dataframe} 
\end{table}
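Several of the date columns above can be derived from Date with base R. The sketch below assumes, following the table descriptions, that Julian counts days from January 1, 1850 (that date being day 0) and that MonthSeq numbers January 1850 as month 1; the exact origins used inside the package are an assumption here, and the dates are hypothetical.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
dates  <- as.Date(c("2000-01-01", "2000-02-29", "2012-12-31"))
origin <- as.Date("1850-01-01")
lt <- as.POSIXlt(dates)
julianDay <- as.numeric(dates - origin)  # days since 1850-01-01
month     <- lt$mon + 1                  # POSIXlt months run 0-11
dayOfYear <- lt$yday + 1                 # POSIXlt yday runs 0-365
# Months since 1850, with January 1850 counted as month 1
monthSeq  <- (lt$year + 1900 - 1850) * 12 + month
data.frame(Date = dates, Julian = julianDay, Month = month,
           Day = dayOfYear, MonthSeq = monthSeq)
\end{verbatim}
\end{kframe}
\end{knitrout}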



If there are discharge values of zero, the code will add a small constant to all of the daily discharges. This constant is 0.001 times the mean discharge. The code also reports the number of zero and negative values and the size of the constant. EGRET should only be used if the number of zero values is a very small fraction of the total days in the record (say, less than 0.1\% of the days) and there are no negative discharge values. Columns Q7 and Q30 are the running averages over the 7 or 30 days ending on each date.
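Both the zero adjustment and the running averages can be sketched in base R on a short hypothetical series, q. This is an illustration of the rules described above, not the package's code; stats::filter with sides = 1 averages each value with the preceding six, so the first six entries are NA, and Q30 would use a window of 30 instead.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
q <- c(0, 2, 4, 6, 8, 10, 12, 14)  # hypothetical daily discharge, cms
nZero <- sum(q == 0)               # number of zero values
if (nZero > 0) {
  # Constant is 0.001 times the mean discharge, added to every day
  constant <- 0.001 * mean(q)
  q <- q + constant
}
# 7-day running mean over the 7 days ending on each date
q7 <- stats::filter(q, rep(1 / 7, 7), sides = 1)
round(as.numeric(q7), 3)
\end{verbatim}
\end{kframe}
\end{knitrout}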
%------------------------------------------------------------
\subsection{Sample Data}
%------------------------------------------------------------
The function to obtain USGS sample data from the Water Quality Data Portal is getSampleData. Its arguments are also siteNumber, parameterCd, startDate, endDate, and interactive, the same inputs as getRawQWData or getQWData described in the previous section.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{parameterCd} \hlkwb{<-} \hlstr{"00618"}
\hlstd{Sample} \hlkwb{<-}\hlkwd{getSampleData}\hlstd{(siteNumber,parameterCd,}
      \hlstd{startDate, endDate)}
\end{alltt}
\end{kframe}
\end{knitrout}


The function to obtain STORET sample data from the Water Quality Data Portal is getSTORETSampleData. Its arguments follow the same pattern, except that a STORET site identifier and a characteristic name are supplied in place of the USGS site number and parameter code, as in the example below.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{site} \hlkwb{<-} \hlstr{'WIDNR_WQX-10032762'}
\hlstd{characteristicName} \hlkwb{<-} \hlstr{'Specific conductance'}
\hlstd{Sample} \hlkwb{<-}\hlkwd{getSTORETSampleData}\hlstd{(site,characteristicName,}
      \hlstd{startDate, endDate)}
\end{alltt}
\end{kframe}
\end{knitrout}


\pagebreak

Details of the Sample dataframe are listed below:

\begin{table}[!ht]
\begin{minipage}{\linewidth}
\begin{center}
\caption{Sample dataframe} 
\begin{tabular}{llll}
  \hline
ColumnName & Type & Description & Units \\ 
  \hline
Date & Date & Date & date \\ 
  ConcLow & number & Lower limit of concentration & mg/L \\ 
  ConcHigh & number & Upper limit of concentration & mg/L \\ 
  Uncen & integer & Uncensored data (1=true, 0=false) & integer \\ 
  ConcAve & number & Average of ConcLow and ConcHigh & mg/L \\ 
  Julian & number & Number of days since January 1, 1850 & days \\ 
  Month & integer & Month of the year [1-12] & months \\ 
  Day & integer & Day of the year [1-366] & days \\ 
  DecYear & number & Decimal year & years \\ 
  MonthSeq & integer & Number of months since January 1, 1850 & months \\ 
  SinDY & number & Sine of DecYear & numeric \\ 
  CosDY & number & Cosine of DecYear & numeric \\ 
  Q \footnotemark[1] & number & Discharge & cms \\ 
  LogQ \footnotemark[1] & number & Natural logarithm of discharge & numeric \\ 
   \hline
\end{tabular}
\end{center}
\end{minipage}
\end{table}

\footnotetext[1]{Discharge columns are populated from data in the Daily dataframe after calling the mergeReport function.}
%------------------------------------------------------------
\subsection{Censored Values: Summation Explanation}
%------------------------------------------------------------
In the typical case where none of the data are censored (that is, no values are reported as \texttt{"}less-than\texttt{"} values), ConcLow = ConcHigh = ConcAve, all of which equal the reported value, and Uncen = 1. In the most common form of censoring, where a value is reported as less than the reporting limit, ConcLow = NA, ConcHigh = reporting limit, ConcAve = 0.5 * reporting limit, and Uncen = 0.
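The two cases above can be expressed as a few vectorized assignments. A minimal sketch, assuming the values and remark codes are held in two hypothetical vectors, value and remark, with \texttt{"}\verb@<@\texttt{"} marking a censored value:

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical reported values and remark codes ("<" = less than)
remark <- c("", "<", "")
value  <- c(0.43, 0.05, 0.12)
censored <- remark == "<"
ConcLow  <- ifelse(censored, NA, value)  # unknown lower bound -> NA
ConcHigh <- value                        # value or reporting limit
ConcAve  <- ifelse(censored, 0.5 * value, value)
Uncen    <- as.integer(!censored)        # 1 = uncensored, 0 = censored
data.frame(ConcLow, ConcHigh, ConcAve, Uncen)
\end{verbatim}
\end{kframe}
\end{knitrout}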

As an example to understand how the dataRetrieval package handles a more complex censoring problem, let us say that in 2004 and earlier, we computed a total phosphorus (tp) as the sum of dissolved phosphorus (dp) and particulate phosphorus (pp). From 2005 and onward, we have direct measurements of total phosphorus (tp). A small subset of this fictional data looks like this:

\begin{center}

% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Mon Oct 28 16:27:22 2013
\begin{table}[ht]
\centering
\begin{tabular}{rllrlrlr}
 & cdate & rdp & dp & rpp & pp & rtp & tp \\ 
1 & 2003-02-15 &  & 0.02 &  & 0.50 &  &  \\ 
  2 & 2003-06-30 & $<$ & 0.01 &  & 0.30 &  &  \\ 
  3 & 2004-09-15 & $<$ & 0.005 & $<$ & 0.20 &  &  \\ 
  4 & 2005-01-30 &  &  &  &  &  & 0.43 \\ 
  5 & 2005-05-30 &  &  &  &  & $<$ & 0.05 \\ 
  6 & 2005-10-30 &  &  &  &  & $<$ & 0.02 \\ 
   \hline
\end{tabular}
\caption{Example data (columns beginning with r hold the remark code; $<$ marks a censored value)} 
\end{table}



\end{center}


The dataRetrieval package will \texttt{"}add up\texttt{"} all the values in a given row to form the total for that sample. Thus, you should only enter data that should be added together. For example, we might know the value for dp on 5/30/2005, but we don't want to put it in the table because, under the rules of this data set, we are not supposed to add it to the values in 2005.

For every sample, the EGRET package requires a pair of numbers to define an interval in which the true value lies (ConcLow and ConcHigh). In a simple non-censored case (the reported value is above the detection limit), ConcLow equals ConcHigh and the interval collapses to a single point. In a simple censored case, the value might be reported as \verb@<@0.2; then ConcLow=NA and ConcHigh=0.2. We use NA instead of 0 because the logarithm of zero is undefined, which simplifies later logarithm calculations.

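For the more complex example case, let us say dp is reported as \verb@<@0.01 and pp is reported as 0.3. We know that the total must be at least 0.3 and could be as much as 0.31. Therefore, ConcLow=0.3 and ConcHigh=0.31. Another case would be if dp is reported as \verb@<@0.005 and pp is reported as \verb@<@0.2. We know in this case that the true value could be as low as zero, but could be as high as 0.205. Therefore, in this case, ConcLow=NA and ConcHigh=0.205. Applying these rules to the example data gives the following intervals in the Sample dataframe:

\begin{table}[ht]
\centering
\begin{tabular}{rlrr}
 & Date & ConcLow & ConcHigh \\ 
1 & 2003-02-15 & 0.52 & 0.52 \\ 
  2 & 2003-06-30 & 0.30 & 0.31 \\ 
  3 & 2004-09-15 & NA & 0.205 \\ 
  4 & 2005-01-30 & 0.43 & 0.43 \\ 
  5 & 2005-05-30 & NA & 0.05 \\ 
  6 & 2005-10-30 & NA & 0.02 \\ 
   \hline
\end{tabular}
\caption{ConcLow and ConcHigh for the example data} 
\end{table}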
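The interval rule for summed components can be sketched as a small function. The helper combineComponents is hypothetical, not a dataRetrieval function: uncensored components contribute their values to ConcLow, censored components contribute nothing to ConcLow and their reporting limit to ConcHigh, and ConcLow is set to NA when every component is censored. It reproduces the two cases worked through above.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Hypothetical helper: combine the components of one sample.
# values:  reported values (reporting limits for censored components)
# remarks: "" for an uncensored component, "<" for a censored one
combineComponents <- function(values, remarks) {
  censored <- remarks == "<"
  # Lower bound: censored parts could be zero; NA if all are censored
  low  <- if (all(censored)) NA else sum(values[!censored])
  # Upper bound: every part at its reported value or limit
  high <- sum(values)
  c(ConcLow = low, ConcHigh = high)
}
combineComponents(c(0.01, 0.3), c("<", ""))    # dp <0.01, pp 0.3
combineComponents(c(0.005, 0.2), c("<", "<"))  # dp <0.005, pp <0.2
\end{verbatim}
\end{kframe}
\end{knitrout}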

%------------------------------------------------------------ 
\subsection{User-Generated Data Files}
%------------------------------------------------------------ 
Aside from retrieving data from the USGS web services, the dataRetrieval package includes functions to generate the Daily and Sample dataframes from local files.

%------------------------------------------------------------ 
\subsubsection{getDailyDataFromFile}
%------------------------------------------------------------ 
getDailyDataFromFile loads a user-supplied text file and converts it to the Daily dataframe. The file should have two columns: the first containing dates, the second values. The dates should be formatted either mm/dd/yyyy or yyyy-mm-dd; a 4-digit year is required. This function has the following inputs: filePath, fileName, hasHeader (TRUE/FALSE), separator, qUnit, and interactive (TRUE/FALSE). filePath is a string that defines the path to your file; this can be either a full path or a path relative to your R working directory. The input fileName is a string that defines the file name (including the extension).

Text files that contain this sort of data require a separator between columns. For example, a csv (comma-separated values) file uses a comma to separate the date and value columns, and a tab-delimited file uses a tab (\texttt{"}\verb@\t@\texttt{"}) rather than a comma (\texttt{"},\texttt{"}). The separator can be defined in the \texttt{"}separator\texttt{"} argument of the function call; the default is \texttt{"},\texttt{"}. Another function input is the logical variable hasHeader, which defaults to TRUE. If your file does not have column names, set this variable to FALSE.
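The file layout described above can be checked with base R before calling getDailyDataFromFile. A minimal sketch, assuming a hypothetical comma-separated file with a header row (written to a temporary file here) and accepting either date format; it mirrors the hasHeader and separator settings through the header and sep arguments of read.table, but it is not the package's own reader.

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{verbatim}
# Write a small hypothetical csv file with a header row
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,Qdaily",
             "10/01/1999,107",
             "2000-10-02,85"), tmp)
# header = TRUE matches hasHeader; use sep = "\t" for tab-delimited files
fileData <- read.table(tmp, header = TRUE, sep = ",",
                       colClasses = c("character", "numeric"))
# Try mm/dd/yyyy first, then yyyy-mm-dd for dates that failed
d <- as.Date(fileData$date, format = "%m/%d/%Y")
iso <- is.na(d)
d[iso] <- as.Date(fileData$date[iso], format = "%Y-%m-%d")
unlink(tmp)
data.frame(Date = d, Q = fileData$Qdaily)
\end{verbatim}
\end{kframe}
\end{knitrout}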