From f6a3c7a5174e2a0fc73a80e8569280824e20a402 Mon Sep 17 00:00:00 2001
From: Laura DeCicco <ldecicco@usgs.gov>
Date: Tue, 18 Feb 2014 16:01:03 -0600
Subject: [PATCH] Included all of Jeff's corrections.

---
 vignettes/dataRetrieval.Rnw | 219 +++++++++++++++++++++++++-----------
 1 file changed, 156 insertions(+), 63 deletions(-)

diff --git a/vignettes/dataRetrieval.Rnw b/vignettes/dataRetrieval.Rnw
index 6f67a662..e569c707 100644
--- a/vignettes/dataRetrieval.Rnw
+++ b/vignettes/dataRetrieval.Rnw
@@ -17,6 +17,8 @@
 \usepackage{placeins}
 \usepackage{footnote}
 \usepackage{tabularx}
+\usepackage{threeparttable}
+\usepackage{parskip}
 \renewcommand\Affilfont{\itshape\small}
 
 \renewcommand{\topfraction}{0.85}
@@ -77,10 +79,10 @@ knit_hooks$set(crop = hook_pdfcrop)
 %------------------------------------------------------------
 \section{Introduction to dataRetrieval}
 %------------------------------------------------------------ 
-The dataRetrieval package was created to simplify the process of getting hydrology data in the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends (EGRET). See: \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to provide analysis of water quality data sets using the WRTDS method of data analysis (WRTDS is Weighted Regressions on Time, Discharge and Season) as well as analysis of discharge trends using robust time-series smoothing techniques.  Both of these capabilities provide both tabular and graphical analyses of long-term data sets.
+The dataRetrieval package was created to simplify the process of loading hydrology data into the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends. See: \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to provide analysis of water quality data sets using the Weighted Regressions on Time, Discharge and Season (WRTDS) method as well as analysis of discharge trends using robust time-series smoothing techniques.  Both of these capabilities provide tabular and graphical analyses of long-term data sets.
 
 
-The dataRetrieval package is designed to retrieve many of the major data types of United States Geological Survey (USGS) hydrology data that are available on the web, but also allows users to make use of other data that they supply from spreadsheets.  Section 2 provides examples of how one can obtain raw data from USGS sources on the web and ingest them into data frames within the R environment.  The functionality described in section 2 is for general use and is not tailored for the specific uses of the EGRET package.  The functionality described in section 3 is tailored specifically to obtaining input from the web and structuring them for use in the EGRET package.  The functionality described in section 4 is for converting hydrology data from user-supplied spreadsheets and structuring them specifically for use in the EGRET package.
+The dataRetrieval package is designed to retrieve many of the major data types of United States Geological Survey (USGS) hydrology data that are available on the web. Users may also load data from other sources (text files, spreadsheets) using dataRetrieval.  Section \ref{sec:genRetrievals} provides examples of how one can obtain raw data from USGS sources on the web and ingest them into data frames within the R environment.  The functionality described in section \ref{sec:genRetrievals} is for general use and is not tailored for the specific uses of the EGRET package.  The functionality described in section \ref{sec:EGRETdfs} is tailored specifically to obtaining input from the web and structuring it for use in the EGRET package.  The functionality described in section \ref{sec:summary} is for converting hydrology data from user-supplied files and structuring it specifically for use in the EGRET package.
 
 For information on getting started in R and installing the package, see Section \ref{sec:appendix1} (Getting Started).
 
@@ -92,7 +94,7 @@ siteNumber <- "01491000"
 ChoptankInfo <- getSiteFileData(siteNumber)
 parameterCd <- "00060"
 
-#Raw data:
+#Raw daily data:
 rawDailyData <- retrieveNWISData(siteNumber,parameterCd,
                       "1980-01-01","2010-01-01")
 # Data compiled for EGRET analysis
@@ -115,15 +117,16 @@ Sample <- mergeReport()
 
 %------------------------------------------------------------
 \section{General USGS Web Retrievals}
+\label{sec:genRetrievals}
 %------------------------------------------------------------ 
-In this section, we will run through 5 examples, which document how to get raw data from the web. This includes site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values(\ref{sec:usgsDaily}), unit values (which include real-time data but can also include other sensor data stored at regular time intervals) (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP}) or (\ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD as an example.  The site-ID for this gage station is 01491000. Daily discharge measurements are available as far back as 1948.  Additionally, forms of nitrate have been measured dating back to 1964. The functions/examples in this section are for raw data retrieval.  In the next section, we will use functions that retrieve and process the data in a dataframe that may prove more friendly for R analysis, and which is specifically tailored to EGRET analysis.
+In this section, we will run through five examples that document how to get raw data from the web. These data include site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), unit values (which include real-time data but can also include other sensor data stored at regular time intervals) (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP}) or (\ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD as an example.  The site-ID for this gage station is 01491000. Daily discharge measurements are available as far back as 1948.  Additionally, nitrate has been measured dating back to 1964. The functions/examples in this section are for raw data retrieval.  In the next section, we will use functions that retrieve and process the data into a dataframe that may prove friendlier for R analysis, and is specifically tailored to EGRET analysis.
 
 %------------------------------------------------------------
 \subsection{Introduction}
 %------------------------------------------------------------
 The USGS organizes their hydrology data in a standard structure.  Streamgages are located throughout the United States, and each streamgage has a unique ID.  Often (but not always), these IDs are 8 digits.  The first step to finding data is discovering this 8-digit ID. There are many ways to do this; one is the National Water Information System: Mapper \url{http://maps.waterdata.usgs.gov/mapper/index.html}.
 
-Once the site-ID is known, the next required input for USGS data retrievals is the 'parameter code'.  This is a 5-digit code that specifies what measured parameter is being requested.  A complete list of possible USGS parameter codes can be found at \url{http://go.usa.gov/bVDz}.
+Once the site-ID is known, the next required input for USGS data retrievals is the `parameter code'.  This is a 5-digit code that specifies what measured parameter is being requested.  For example, parameter code 00631 represents `Nitrate plus nitrite, water, filtered, milligrams per liter as nitrogen', with units of `mg/l as N'. A complete list of possible USGS parameter codes can be found at \url{http://go.usa.gov/bVDz}.
 
 Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table \ref{tab:params}.
 
@@ -151,11 +154,11 @@ subset(parameterCdFile,parameter_cd %in% c("00060","00010","00400"))
 @
 
 
-For unit values data (sensor data), knowing the parameter code and site ID is enough to make a request for data.  For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values.  These daily values may be in the form of statistics such as the daily mean values, but they can also include daily maximums, minimums or medians.  These different statistics are specified by a 5-digit \texttt{"}stat code\texttt{"}.  A complete list of stat codes can be found here:
+For unit values data (sensor data), knowing the parameter code and site ID is enough to make a request for data.  For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values.  These daily values are statistical summaries of the continuous data, e.g. maximum, minimum, mean, median. The different statistics are specified by a 5-digit statistic code.  A complete list of statistic codes can be found here:
 
 \url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}
 
-Some common stat codes are shown in Table \ref{tab:stat}.
+Some common codes are shown in Table \ref{tab:stat}.
 <<tableStatCodes, echo=FALSE,results='asis'>>=
 StatCode <- c('00001', '00002', '00003','00008')
 shortName <- c("Maximum","Minimum","Mean", "Median")
@@ -202,7 +205,7 @@ Site information is obtained from \url{http://waterservices.usgs.gov/rest/Site-T
 \subsubsection{getDataAvailability}
 \label{sec:usgsDataAvailability}
 %------------------------------------------------------------
-To find out the available data at a particular USGS site, including measured parameters, period of record, and number of samples (count), use the getDataAvailability function. It is also possible to only request parameter information for a subset of variables. In the following example, we retrieve just the daily mean parameter information from the Choptank data availability dataframe (excluding all unit value and water quality values).
+To discover what data are available for a particular USGS site, including measured parameters, period of record, and number of samples (count), use the getDataAvailability function. It is possible to limit the retrieved information to a subset of variables. In the following example, we limit the retrieved Choptank data to only the daily mean parameters (excluding all unit value and water quality values).
 
 
 <<getSiteExtended, echo=TRUE>>=
@@ -235,7 +238,7 @@ xtable(tableData,label="tab:gda",
 
 @
 
-See \ref{app:createWordTable} for instructions on converting an R dataframe to a table in Microsoft Excel or Word.
+See Section \ref{app:createWordTable} for instructions on converting an R dataframe to a table in Microsoft Excel or Word to display a data availability table similar to Table \ref{tab:gda}.
 
 \FloatBarrier
 
@@ -261,11 +264,11 @@ Parameter information is obtained from \url{http://nwis.waterdata.usgs.gov/nwis/
 \subsection{Daily Values}
 \label{sec:usgsDaily}
 %------------------------------------------------------------
-To obtain historic daily records of USGS data, use the retrieveNWISData function. The arguments for this function are siteNumber, parameterCd, startDate, endDate, statCd, and a logical (TRUE/FALSE) interactive. There are 2 default arguments: statCd (defaults to \texttt{"}00003\texttt{"}), and interactive (defaults to TRUE).  If you want to use the default values, you do not need to list them in the function call. Setting the \texttt{"}interactive\texttt{"} option to TRUE will walk you through the function. It might make more sense to run large batch collections with the interactive option set to FALSE. 
+To obtain daily records of USGS data, use the retrieveNWISData function. The arguments for this function are siteNumber, parameterCd, startDate, endDate, statCd, and a logical (TRUE/FALSE) interactive. There are 2 default arguments: statCd (defaults to \texttt{"}00003\texttt{"}), and interactive (defaults to TRUE).  If you want to use the default values, you do not need to list them in the function call. Setting the \texttt{"}interactive\texttt{"} option to TRUE will walk you through the function. It might make more sense to run large batch collections with the interactive option set to FALSE. 
 
-The dates (start and end) need to be in the format \texttt{"}YYYY-MM-DD\texttt{"} (note: the user does need to include the quotes).  Setting the start date to \texttt{"}\texttt{"} will indicate to the program to ask for the earliest date, setting the end date to \texttt{"}\texttt{"} will ask for the latest available date.
+The dates (start and end) need to be in the format \texttt{"}YYYY-MM-DD\texttt{"} (note: the user must include the quotes).  Setting the start date to \texttt{"}\texttt{"} (an empty string, no space) indicates to the program to request the earliest available date; setting the end date to \texttt{"}\texttt{"} requests the latest available date.
 
-<<label=getNWISDaily, echo=TRUE, eval=FALSE>>=
+<<label=getNWISDaily, echo=TRUE, eval=TRUE>>=
 
 # Continuing with our Choptank River example
 parameterCd <- "00060"  # Discharge (cfs)
@@ -274,9 +277,10 @@ endDate <- "" # Will request latest date
 
 discharge <- retrieveNWISData(siteNumber, 
                     parameterCd, startDate, endDate)
+names(discharge)
 @
 
-The variable datetime is automatically imported as a Date. Each requested parameter has a value and remark code column.  The names of these columns depend on the requested parameter and stat code combinations. USGS remark codes are often \texttt{"}A\texttt{"} (approved for publication) or \texttt{"}P\texttt{"} (provisional data subject to revision). A more complete list of remark codes can be found here:
+The column `datetime' in the returned dataframe is automatically imported as a variable of class `Date' in R. Each requested parameter has a value and remark code column.  The names of these columns depend on the requested parameter and stat code combinations. USGS remark codes are often `A' (approved for publication) or `P' (provisional data subject to revision). A more complete list of remark codes can be found here:
 \url{http://waterdata.usgs.gov/usa/nwis/help?codes_help}
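+
+As a sketch of how one might use these remark codes, the counts of each code can be tabulated, or the data can be filtered to approved values only. The column name below is illustrative (it depends on the parameter and stat codes requested); check names(discharge) for the actual name:
+
+<<remarkCodeSketch, echo=TRUE, eval=FALSE>>=
+# Column name is illustrative; verify with names(discharge):
+table(discharge$X_00060_00003_cd)  # count occurrences of each remark code
+
+# Keep only rows approved for publication ("A"):
+approvedOnly <- discharge[discharge$X_00060_00003_cd == "A",]
+@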
 
 Another example that doesn't use the defaults would be a request for mean and maximum daily temperature and discharge in early 2012:
@@ -290,16 +294,23 @@ endDate <- "2012-05-01"
 temperatureAndFlow <- retrieveNWISData(siteNumber, parameterCd, 
         startDate, endDate, StatCd=statCd)
 
-temperatureAndFlow <- renameColumns(temperatureAndFlow)
-
 @
 
 Daily data is pulled from \url{http://waterservices.usgs.gov/rest/DV-Test-Tool.html}. 
 
+The column names can be automatically adjusted based on the parameter and statistic codes using the renameColumns function. This is not necessary, but may be useful when analyzing the data. 
+
+<<label=renameColumns, echo=TRUE>>=
+names(temperatureAndFlow)
+
+temperatureAndFlow <- renameColumns(temperatureAndFlow)
+names(temperatureAndFlow)
+@
+
 An example of plotting the above data (Figure \ref{fig:getNWIStemperaturePlot}):
 
 <<getNWIStemperaturePlot, echo=TRUE, fig.cap="Temperature and discharge plot of Choptank River in 2012.",out.width='1\\linewidth',out.height='1\\linewidth',fig.show='hold'>>=
-par(mar=c(5,5,5,5))
+par(mar=c(5,5,5,5)) # sets the plot margins (bottom, left, top, right)
 
 with(temperatureAndFlow, plot(
   datetime, Temperature_water_degrees_Celsius_Max_01,
@@ -313,10 +324,12 @@ with(temperatureAndFlow, plot(
 axis(4,col="red",col.axis="red")
 mtext("Mean Discharge [cfs]",side=4,line=3,col="red")
 title(paste(ChoptankInfo$station.nm,"2012",sep=" "))
+legend("topleft", c("Max Temperature", "Mean Discharge"), 
+       col=c("black","red"),lty=c(NA,1),pch=c(1,NA))
 @
 
 
-There are occasions where NWIS values are not reported as numbers, instead there might be text describing a certain event such as \texttt{"}Ice\texttt{"}.  Any value that cannot be converted to a number will be reported as NA in this package.
+There are occasions where NWIS values are not reported as numbers; instead, there might be text describing a certain event such as `Ice'.  Any value that cannot be converted to a number will be reported as NA in this package (not including remark code columns).
 
 \FloatBarrier
 
@@ -324,7 +337,7 @@ There are occasions where NWIS values are not reported as numbers, instead there
 \subsection{Unit Values}
 \label{sec:usgsRT}
 %------------------------------------------------------------
-Any data that are collected at regular time intervals (such as 15-minute or hourly) are known as \texttt{"}Unit Values\texttt{"} - many of these are delivered on a real time basis and very recent data (even less than an hour old in many cases) are available through the function retrieveUnitNWISData.  Some of these Unit Values are available for the past several years, and some are only available for a recent time period such as 120 days or a year.  Here is an example of a retrieval of such data.  
+Any data that are collected at regular time intervals (such as 15-minute or hourly) are known as `unit values'. Many of these are delivered on a real-time basis, and very recent data (even less than an hour old in many cases) are available through the function retrieveUnitNWISData.  Some of these unit values are available for many years, and some are only available for a recent time period such as 120 days.  Here is an example of a retrieval of such data.  
 
 <<label=getNWISUnit, echo=TRUE>>=
 
@@ -415,6 +428,7 @@ url_uv <- constructNWISURL(siteNumber,"00060",startDate,endDate,'uv')
 
 %------------------------------------------------------------
 \section{Data Retrievals Structured For Use In The EGRET Package}
+\label{sec:EGRETdfs}
 %------------------------------------------------------------ 
 Rather than using the raw data as retrieved by the web, the dataRetrieval package also includes functions that return the data in a structure that has been designed to work with the EGRET R package (\url{https://github.com/USGS-R/EGRET/wiki}). In general, these dataframes may be much more 'R-friendly' than the raw data, and will contain additional date information that allows for efficient data analysis.
 
@@ -422,6 +436,7 @@ In this section, we use 3 dataRetrieval functions to get sufficient data to perf
 
 %------------------------------------------------------------
 \subsection{INFO Data}
+\label{INFOsubsection}
 %------------------------------------------------------------
 The function to obtain metadata, or data about the streamgage and measured parameters is getMetaData. This function combines getSiteFileData and getParameterInfo, producing one dataframe called INFO.
 
@@ -435,6 +450,7 @@ INFO <-getMetaData(siteNumber,parameterCd, interactive=FALSE)
 
 %------------------------------------------------------------
 \subsection{Daily Data}
+\label{Dailysubsection}
 %------------------------------------------------------------
 The function to obtain the daily values (discharge in this case) is getDVData.  It requires the inputs siteNumber, ParameterCd, StartDate, EndDate, interactive, and convert. Most of these arguments are described in the previous section; however, \texttt{"}convert\texttt{"} is a new argument (defaults to TRUE) that tells the program to convert the values from cubic feet per second (cfs) to cubic meters per second (cms). For EGRET applications with NWIS web retrieval, leave this argument at its default of TRUE; EGRET assumes that discharge is always in cubic meters per second. If you don't want this conversion and are not using EGRET, set convert=FALSE in the function call. 
 
@@ -451,7 +467,7 @@ Details of the Daily dataframe are listed below:
 <<colNamesDaily, echo=FALSE,results='asis'>>=
 ColumnName <- c("Date", "Q", "Julian","Month","Day","DecYear","MonthSeq","Qualifier","i","LogQ","Q7","Q30")
 Type <- c("Date", "number", "number","integer","integer","number","integer","string","integer","number","number","number")
-Description <- c("Date", "Discharge in cms", "Number of days since January 1, 1850", "Month of the year [1-12]", "Day of the year [1-366]", "Decimal year", "Number of months since January 1, 1850", "Qualifing code", "Index of days, starting with 1", "Natural logarithm of Q", "7 day running average of Q", "30 running average of Q")
+Description <- c("Date", "Discharge in cms", "Number of days since January 1, 1850", "Month of the year [1-12]", "Day of the year [1-366]", "Decimal year", "Number of months since January 1, 1850", "Qualifying code", "Index of days, starting with 1", "Natural logarithm of Q", "7 day running average of Q", "30 day running average of Q")
 Units <- c("date", "cms","days", "months","days","years","months", "character","days","numeric","cms","cms")
 
 DF <- data.frame(ColumnName,Type,Description,Units)
@@ -467,6 +483,7 @@ If there are discharge values of zero, the code will add a small constant to all
 
 %------------------------------------------------------------
 \subsection{Sample Data}
+\label{Samplesubsection}
 %------------------------------------------------------------
 The function to obtain USGS sample data from the water quality portal is getSampleData. The arguments for this function are also siteNumber, ParameterCd, StartDate, EndDate, interactive. These are the same inputs as getRawQWData or getQWData as described in the previous section.
 
@@ -490,11 +507,12 @@ Sample <-getSTORETSampleData(site,characteristicName,

 Details of the Sample dataframe are listed below:

-\begin{table}[!ht]
-\begin{minipage}{\linewidth}
-\begin{center}
-\caption{Sample dataframe} 
-\begin{tabular}{llll}
+\begin{table}
+  \centering
+  \begin{threeparttable}[b]
+  \caption{Sample dataframe}
+  \label{tab:SampleDataframe}
+  \begin{tabular}{llll}
   \hline
 ColumnName & Type & Description & Units \\ 
   \hline
@@ -510,15 +558,16 @@ Date & Date & Date & date \\
   MonthSeq & integer & Number of months since January 1, 1850 & months \\ 
   SinDY & number & Sine of DecYear & numeric \\ 
   CosDY & number & Cosine of DecYear & numeric \\ 
-  Q \footnotemark[1] & number & Discharge & cms \\ 
-  LogQ \footnotemark[1] & number & Natural logarithm of discharge & numeric \\ 
+  Q \tnote{1} & number & Discharge & cms \\ 
+  LogQ \tnote{1} & number & Natural logarithm of discharge & numeric \\ 
    \hline
 \end{tabular}
-\end{center}
-\end{minipage}
-\end{table}
 
-\footnotetext[1]{Discharge columns are populated from data in the Daily dataframe after calling the mergeReport function.}
+  \begin{tablenotes}
+    \item[1] Discharge columns are populated from data in the Daily dataframe after calling the mergeReport function.
+  \end{tablenotes}
+ \end{threeparttable}
+\end{table}
 
 The next section will talk about summing multiple constituents, including how interval censoring is used. Since the Sample data frame is structured to contain only one constituent, when more than one parameter code is requested, the getSampleData function will sum the values of the constituents as described below.
 
@@ -530,11 +579,11 @@ The next section will talk about summing multiple constituents, including how in
 %------------------------------------------------------------
 In the typical case where none of the data are censored (that is, no values are reported as \texttt{"}less-than\texttt{"} values), ConcLow = ConcHigh = ConcAve, all of which are equal to the reported value, and Uncen = 1.  In the typical form of censoring, where a value is reported as less than the reporting limit, ConcLow = NA, ConcHigh = reporting limit, ConcAve = 0.5 * reporting limit, and Uncen = 0.
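+
+This convention can be sketched in a few lines of R. This is an illustrative reimplementation of the rules above, not the package's internal code:
+
+<<censoringSketch, echo=TRUE, eval=TRUE>>=
+# remark "<" flags a left-censored (less-than) value
+remark <- c("", "<")
+value  <- c(0.52, 0.20)
+ConcLow  <- ifelse(remark == "<", NA, value)
+ConcHigh <- value
+ConcAve  <- ifelse(remark == "<", 0.5*value, value)
+data.frame(remark, value, ConcLow, ConcHigh, ConcAve)
+@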
 
-As an example to understand how the dataRetrieval package handles a more complex censoring problem, let us say that in 2004 and earlier, we computed a total phosphorus (tp) as the sum of dissolved phosphorus (dp) and particulate phosphorus (pp). From 2005 and onward, we have direct measurements of total phosphorus (tp). A small subset of this fictional data looks like \ref{tab:exampleComplexQW}.
+As an example of how the dataRetrieval package handles a more complex censoring problem, let us say that in 2004 and earlier, we computed total phosphorus (tp) as the sum of dissolved phosphorus (dp) and particulate phosphorus (pp). From 2005 onward, we have direct measurements of total phosphorus (tp). A small subset of this fictional data is shown in Table \ref{tab:exampleComplexQW}.
 
 
 
-<<label=exampleComplexQW, echo=FALSE, eval=TRUE,results='asis'>>=
+<<label=tab:exampleComplexQW, echo=FALSE, eval=TRUE,results='asis'>>=
 cdate <- c("2003-02-15","2003-06-30","2004-09-15","2005-01-30","2005-05-30","2005-10-30")
 rdp <- c("", "<","<","","","")
 dp <- c(0.02,0.01,0.005,NA,NA,NA)
@@ -545,13 +594,10 @@ tp <- c(NA,NA,NA,0.43,0.05,0.02)
 
 DF <- data.frame(cdate,rdp,dp,rpp,pp,rtp,tp,stringsAsFactors=FALSE)
 
-xtable(DF, caption="Example data")
+xtable(DF, caption="Example data",digits=c(0,0,0,3,0,3,0,3),label="tab:exampleComplexQW")
 
 @
 
-
-
-
 The dataRetrieval package will \texttt{"}add up\texttt{"} all the values in a given row to form the total for that sample. Thus, you only want to enter data that should be added together. For example, we might know the value for dp on 5/30/2005, but we don't want to put it in the table because, under the rules of this data set, we are not supposed to add it to the values in 2005.
 
 For every sample, the EGRET package requires a pair of numbers to define an interval in which the true value lies (ConcLow and ConcHigh). In a simple non-censored case (the reported value is above the detection limit), ConcLow equals ConcHigh and the interval collapses down to a single point. In a simple censored case, the value might be reported as \verb@<@0.2, then ConcLow=NA and ConcHigh=0.2. We use NA instead of 0 as a way to elegantly handle future logarithm calculations.
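+
+As a hypothetical sketch (generic interval arithmetic, not necessarily the package's exact internal rule), summing a censored dp of \verb@<@0.01 with an uncensored pp of 0.30 adds the lower bounds (treating the censored lower bound as zero) and the upper bounds separately:
+
+<<intervalSumSketch, echo=TRUE, eval=TRUE>>=
+# Lower bound of a <0.01 value is 0; its upper bound is 0.01
+ConcLow  <- 0 + 0.30     # lower bounds: 0.30
+ConcHigh <- 0.01 + 0.30  # upper bounds: 0.31
+c(ConcLow=ConcLow, ConcHigh=ConcHigh)
+@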
@@ -585,7 +631,7 @@ Text files that contain this sort of data require some sort of a separator, for
 
 Finally, qUnit is a numeric argument that defines the discharge units used in the input file.  The default is qUnit = 1 which assumes discharge is in cubic feet per second.  If the discharge in the file is already in cubic meters per second then set qUnit = 2.  If it is in some other units (like liters per second or acre-feet per day), the user will have to pre-process the data with a unit conversion that changes it to either cubic feet per second or cubic meters per second.
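+
+If such a pre-processing conversion is needed, it is a single multiplication; for example, cubic feet per second to cubic meters per second (1 cubic foot = 0.0283168 cubic meters). The values below are hypothetical:
+
+<<unitConversionSketch, echo=TRUE, eval=FALSE>>=
+QdailyCFS <- c(107, 85, 76)         # discharge in cubic feet per second
+QdailyCMS <- QdailyCFS * 0.0283168  # now in cubic meters per second
+@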
 
-So, if you have a file called \texttt{"}ChoptankRiverFlow.txt\texttt{"} located in a folder called \texttt{"}RData\texttt{"} on the C drive (this is a Window's example), and the file is structured as follows (tab-separated):
+So, if you have a file called \texttt{"}ChoptankRiverFlow.txt\texttt{"} located in a folder called \texttt{"}RData\texttt{"} on the C drive (this is a Windows example), and the file is structured as follows (tab-separated):
 \begin{verbatim}
 date  Qdaily
 10/1/1999  107
@@ -632,12 +678,12 @@ Sample <- getSampleDataFromFile(filePath,fileName,
 If multiple constituents are going to be summed, the format can be date, remark\_A, value\_A, remark\_b, value\_b, etc... A tab-separated example might look like this, where the columns are remark dissolved phosphate (rdp), dissolved phosphate (dp), remark particulate phosphorus (rpp), particulate phosphorus (pp), remark total phosphate (rtp), and total phosphate (tp):
 \begin{verbatim}
 date  rdp	dp	rpp	pp	rtp	tp
-2003-02-15		0.02		0.5		
-2003-06-30	<	0.01		0.3		
-2004-09-15	<	0	<	0.2		
-2005-01-30						0.43
-2005-05-30					<	0.05
-2005-10-30					<	0.02
+2003-02-15		0.020		0.500		
+2003-06-30	<	0.010		0.300		
+2004-09-15	<	0.005	<	0.200		
+2005-01-30						0.430
+2005-05-30					<	0.050
+2005-10-30					<	0.020
 ...
 \end{verbatim}
 
@@ -674,7 +720,7 @@ head(Sample)
 %------------------------------------------------------------
 \subsection{EGRET Plots}
 %------------------------------------------------------------
-As has been mentioned, the data is specifically formatted to be used with the EGRET package. The EGRET package has powerful modeling capabilities using WRTDS, but also has a variety of graphing and tabular tools to explore the data without using the WRTDS algorithm. See the EGRET vignette, user guide, and/or wiki (\url{https://github.com/USGS-R/EGRET/wiki}) for detailed information. The following figure is an example of one of the plotting functions that can be used directly from the dataRetrieval dataframes.
+As has been mentioned, the Daily, Sample, and INFO data frames whose construction is described in Sections \ref{INFOsubsection}--\ref{Samplesubsection} are specifically formatted to be used with the EGRET package. The EGRET package has powerful modeling capabilities using WRTDS, but also has a variety of graphing and tabular tools to explore the data without using the WRTDS algorithm. See the EGRET vignette, user guide, and/or wiki (\url{https://github.com/USGS-R/EGRET/wiki}) for detailed information. The following figure is an example of one of the plotting functions that can be used directly from the dataRetrieval dataframes.
 
 <<egretEx, echo=TRUE, eval=TRUE, fig.cap="Default multiPlotDataOverview">>=
 # Continuing Choptank example from the previous sections
@@ -682,51 +728,89 @@ library(EGRET)
 multiPlotDataOverview()
 @
 
-
+\FloatBarrier
 \clearpage
 
+
 %------------------------------------------------------------
 \section{Summary}
+\label{sec:summary}
 %------------------------------------------------------------
 
-The following table summarizes the data retrieval functions:
+Tables \ref{tab:dataRetrievalFunctions1} and \ref{tab:dataRetrievalMisc} summarize the data retrieval functions:
 
-\begin{table}[!ht]
-\begin{minipage}{\linewidth}
-\begin{center}
-\caption{dataRetrieval functions} 
+\begin{table}
+  \centering
+  \begin{threeparttable}[b]
+  \caption{dataRetrieval functions}
+  \label{tab:dataRetrievalFunctions1}
 \begin{tabular}{lll}
   \hline
 Data Type & Function Name & Description \\ 
   \hline
   Daily & retrieveNWISData & Raw USGS daily data \\
-  Daily\footnotemark[1] & getDVData & USGS daily values \\
-  Daily\footnotemark[1] & getDailyDataFromFile & User generated daily data \\
+  Daily\tnote{1} & getDVData & USGS daily values \\
+  Daily\tnote{1} & getDailyDataFromFile & User generated daily data \\
   Sample & retrieveNWISqwData & Raw USGS water quality data \\
   Sample & getRawQWData & Raw Water Quality Data Portal data \\
   Sample & getQWDataFromFile & Raw user generated water quality data \\
   Sample & getQWData & USGS Water Quality Portal data \\
   Sample & getWQPData & General Water Quality Portal\\
-  Sample\footnotemark[1] & getSampleData & USGS water quality data\\  
-  Sample\footnotemark[1] & getSTORETSampleData & STORET Water Quality Data Portal data \\
-  Sample\footnotemark[1] & getSampleDataFromFile & User generated sample data \\
+  Sample\tnote{1} & getSampleData & USGS water quality data\\  
+  Sample\tnote{1} & getSTORETSampleData & STORET Water Quality Data Portal data \\
+  Sample\tnote{1} & getSampleDataFromFile & User generated sample data \\
   Unit & retrieveUnitNWISData & Raw USGS instantaneous data \\  
-  Information\footnotemark[1] & getMetaData & USGS station and parameter code information \\
+  Information\tnote{1} & getMetaData & USGS station and parameter code information \\
   Information & getParameterInfo & USGS parameter code information \\
   Information & getSiteFileData & USGS station information \\
   Information & getDataAvailability & Data available at USGS stations \\
    \hline
 \end{tabular}
-\end{center}
-\end{minipage}
+
+  \begin{tablenotes}
+    \item[1] Indicates that the function creates a data frame suitable for use in EGRET software
+  \end{tablenotes}
+ \end{threeparttable}
 \end{table}
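+
+As a brief illustration of how the functions in Table \ref{tab:dataRetrievalFunctions1} fit together, the following sketch (not evaluated) contrasts a raw daily-value retrieval with its EGRET-ready counterpart; the Choptank site and discharge parameter code follow the earlier examples, but the dates are arbitrary:
+
+<<summaryWorkflowSketch, eval=FALSE>>=
+# Example only (not run): raw daily retrieval vs. EGRET-formatted retrieval
+siteNumber <- "01491000"   # Choptank River near Greensboro, MD
+parameterCd <- "00060"     # discharge, cubic feet per second
+# Raw USGS daily data:
+rawDailyData <- retrieveNWISData(siteNumber, parameterCd,
+                                 "2000-01-01", "2010-12-31")
+# Daily data frame formatted for EGRET:
+Daily <- getDVData(siteNumber, parameterCd,
+                   "2000-01-01", "2010-12-31")
+@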
 
-\footnotetext[1]{Indicates that function creates a data frame suitable for use in EGRET software}
+
 
 \begin{table}[!ht]
 \begin{minipage}{\linewidth}
 \begin{center}
 \caption{dataRetrieval miscellaneous functions} 
+\label{tab:dataRetrievalMisc}
 \begin{tabular}{ll}
   \hline
 Function Name & Description \\ 
@@ -745,6 +829,7 @@ Function Name & Description \\
 \end{minipage}
 \end{table}
 
+\FloatBarrier
 \clearpage
 
 
@@ -767,7 +852,7 @@ At any time, you can get information about any function in R by typing a questio
 ?removeDuplicates
 @
 
-To see the raw code for a particular code, type the name of the function:
+To see the raw code for a particular function, type the name of the function without parentheses:
 <<rawFunc,eval = TRUE>>=
 removeDuplicates
 @
@@ -781,14 +866,22 @@ vignette(dataRetrieval)
 %------------------------------------------------------------
 \subsection{R User: Installing dataRetrieval}
 %------------------------------------------------------------ 
-Before installing dataRetrieval, the zoo packages must be installed from CRAN:
+Before installing dataRetrieval, several packages upon which dataRetrieval depends must be installed from CRAN:
 
 <<installFromCran,eval = FALSE>>=
 install.packages(c("zoo","XML","RCurl","plyr"))
 install.packages("dataRetrieval", repos="http://usgs-r.github.com")
 @
 
-It is a good idea to re-start R after installing the package, especially if installing an updated version. Some users have found it necessary to delete the previous version's package folder before installing newer version of dataRetrieval. If you are experiencing issues after updating a package, trying deleting the package folder - the default location for Windows is something like this: C:/Users/userA/Documents/R/win-library/2.15/dataRetrieval, and the default for a Mac: /Users/userA/Library/R/2.15/library/dataRetrieval. Then, re-install the package using the directions above. Moving to CRAN should solve this problem.
+It is a good idea to re-start R after installing the package, especially if installing an updated version. Some users have found it necessary to delete the previous version's package folder before installing a newer version of dataRetrieval. If you are experiencing issues after updating a package, try deleting the package folder. The default location for Windows is something like:
+
+C:/Users/userA/Documents/R/win-library/2.15/dataRetrieval
+
+The default for a Mac is something like:
+
+/Users/userA/Library/R/2.15/library/dataRetrieval
+
+Then, re-install the package using the directions above. Moving to CRAN should solve this problem.
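+
+The clean re-install described above can be sketched as follows (not evaluated); \texttt{remove.packages} deletes the installed copy from the library path so that the fresh install starts from a clean folder:
+
+<<reinstallSketch, eval=FALSE>>=
+# Example only (not run): remove the old copy, then re-install
+remove.packages("dataRetrieval")
+install.packages("dataRetrieval", repos="http://usgs-r.github.com")
+@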
 
 After installing the package, you need to open the library each time you re-start R.  This is done with the simple command:
 <<openLibraryTest, eval=FALSE>>=
@@ -800,7 +893,7 @@ library(dataRetrieval)
 \section{Creating tables in Microsoft from R}
 \label{app:createWordTable}
 %------------------------------------------------------------
-There are a few steps that are required in order to create a table in a Microsoft product (Excel, Word, Powerpoint, etc.) from an R dataframe. There are certainly a variety of good methods, one of which is detailed here. The example we will step through here will be to create a table in Microsoft Word based on the dataframe tableData:
+A few steps are required to create a table in a Microsoft product (Excel, Word, PowerPoint, etc.) from an R data frame. There are certainly a variety of good methods; one of them is detailed here. The example we will step through here will be to create a table in Microsoft Excel based on the data frame tableData:
 
 <<label=getSiteApp, echo=TRUE>>=
 availableData <- getDataAvailability(siteNumber)
-- 
GitLab