Commit 2c50d96f authored by Jessica Thompson

updates to vignette

parent 0d7bd849
@@ -79,10 +79,10 @@ knit_hooks$set(crop = hook_pdfcrop)
%------------------------------------------------------------
\section{Introduction to dataRetrieval}
%------------------------------------------------------------
The dataRetrieval package was created to simplify the process of loading hydrologic data into the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends. See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to provide analysis of water quality data sets using the Weighted Regressions on Time, Discharge and Season (WRTDS) method, as well as analysis of discharge trends using robust time-series smoothing techniques. Both of these capabilities provide tabular and graphical analyses of long-term data sets.
The dataRetrieval package is designed to retrieve many of the major data types of United States Geological Survey (USGS) hydrologic data that are available on the web. Users may also load data from other sources (text files, spreadsheets) using dataRetrieval. Section \ref{sec:genRetrievals} provides examples of how one can obtain raw data from USGS sources on the web and ingest them into data frames within the R environment. The functionality described in section \ref{sec:genRetrievals} is for general use and is not tailored for the specific uses of the EGRET package. The functionality described in section \ref{sec:EGRETdfs} is tailored specifically to obtaining input from the web and structuring it for use in the EGRET package. The functionality described in section \ref{sec:summary} is for converting hydrologic data from user-supplied files and structuring it specifically for use in the EGRET package.
For information on getting started in R and installing the package, see (\ref{sec:appendix1}): Getting Started.
@@ -119,14 +119,14 @@ Sample <- mergeReport()
\section{General USGS Web Retrievals}
\label{sec:genRetrievals}
%------------------------------------------------------------
In this section, we will run through five examples that document how to get raw data from the web. These data include site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), unit values (which include real-time data but can also include other sensor data stored at regular time intervals) (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP}) or (\ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD as an example. The site-ID for this streamgage is 01491000. Daily discharge measurements are available as far back as 1948. Additionally, nitrate has been measured since 1964. The functions/examples in this section are for raw data retrieval. In the next section, we will use functions that retrieve and process the data into a dataframe that may prove friendlier for R analysis and is specifically tailored to EGRET analysis.
%------------------------------------------------------------
\subsection{Introduction}
%------------------------------------------------------------
The USGS organizes hydrologic data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering this 8-digit ID. There are many ways to do this; one is the National Water Information System: Mapper \url{http://maps.waterdata.usgs.gov/mapper/index.html}.
Once the site-ID is known, the next required input for USGS data retrievals is the `parameter code'. This is a 5-digit code that specifies what measured parameter is being requested. For example, parameter code 00631 represents `Nitrate plus nitrite, water, filtered, milligrams per liter as nitrogen', with units of `mg/l as N'. A complete list of possible USGS parameter codes can be found at \url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?help}.
Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table \ref{tab:params}.
@@ -154,7 +154,7 @@ subset(parameterCdFile,parameter_cd %in% c("00060","00010","00400"))
@
For unit values data (sensor data measured at regular time intervals such as 15 minutes or hourly), knowing the parameter code and site ID is enough to make a request for data. For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values. These daily values are statistical summaries of the continuous data, e.g. maximum, minimum, mean, or median. The different statistics are specified by a 5-digit statistics code. A complete list of statistic codes can be found here:
\url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}
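As an illustration, the site number, parameter code, and statistic code together identify a single daily-values series. The web-service URL constructed below is an assumption based on the general form of the NWIS daily-values service, not something documented in this vignette; the retrieval functions build such requests internally.

<<statCdSketch, eval=FALSE>>=
# Illustrative sketch only: the three codes that identify a
# daily-values series (URL format is an assumption).
siteNumber  <- "01491000"  # Choptank River near Greensboro, MD
parameterCd <- "00060"     # discharge
statCd      <- "00003"     # statistic code for the daily mean
url <- paste0("http://waterservices.usgs.gov/nwis/dv/",
              "?format=rdb&sites=", siteNumber,
              "&parameterCd=", parameterCd,
              "&statCd=", statCd)
@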
@@ -363,7 +363,7 @@ Note that time now becomes important, so the variable datetime is a POSIXct, and
\subsection{Water Quality Values}
\label{sec:usgsWQP}
%------------------------------------------------------------
To get USGS water quality data from water samples collected at the streamgage or other monitoring site (as distinct from unit values collected through some type of automatic monitor), we can use the Water Quality Data Portal: \url{http://www.waterqualitydata.us/}. The raw data are obtained from the function getRawQWData, with similar input arguments: siteNumber, parameterCd, startDate, endDate, and interactive. The difference is in parameterCd: in this function, multiple parameters can be queried using a vector, and setting parameterCd to \texttt{"}\texttt{"} will return all of the measured observations. Because there is a large amount of data returned for each observation, the raw data may be overwhelming; a simplified version of the data can be obtained using getQWData.
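A sketch of such a request, using the function names and argument order described above; the parameter-code vector (nitrate plus nitrite, and total phosphorus) and the dates are chosen only for illustration.

<<qwSketch, eval=FALSE>>=
siteNumber  <- "01491000"
parameterCd <- c("00631", "00665")  # multiple parameters via a vector
rawQW <- getRawQWData(siteNumber, parameterCd,
                      startDate = "1985-01-01", endDate = "1985-12-31")
# simplified version of the same data:
simpleQW <- getQWData(siteNumber, parameterCd,
                      startDate = "1985-01-01", endDate = "1985-12-31")
@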
<<label=getQW, echo=TRUE>>=
@@ -454,7 +454,7 @@ INFO <-getMetaData(siteNumber,parameterCd, interactive=FALSE)
\subsection{Daily Data}
\label{Dailysubsection}
%------------------------------------------------------------
The function to obtain the daily values (discharge in this case) is getDVData. It requires the inputs siteNumber, ParameterCd, StartDate, EndDate, interactive, and convert. Most of these arguments are described in the previous section; however, `convert' is a new argument (it defaults to TRUE). The convert argument tells the program to convert the values from cubic feet per second (cfs) to cubic meters per second (cms). For EGRET applications with NWIS web retrieval, leave this argument at its default of TRUE; EGRET assumes that discharge is always stored in units of cubic meters per second. If you don't want this conversion and are not using EGRET, set convert=FALSE in the function call.
<<firstExample>>=
siteNumber <- "01491000"
@@ -479,7 +479,7 @@ xtable(DF, caption="Daily dataframe")
@
If there are negative discharge values or discharge values of zero, the code will set all of these to zero and then add a small constant to all of the daily discharge values. This constant is 0.001 times the mean discharge. The code will also report the number of zero and negative values and the size of the constant. EGRET should only be used if the number of zero values is a very small fraction of the total days in the record (say, less than 0.1\% of the days) and there are no negative discharge values. Columns Q7 and Q30 are the 7- and 30-day running averages for the 7 or 30 days ending on this specific date.
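A minimal sketch of the adjustment just described (illustrative only, not the package's internal code; the sample discharge values are hypothetical):

<<zeroAdjustSketch>>=
# Hypothetical daily discharge values, including a zero and a negative:
Q <- c(3.2, 0.0, -0.5, 7.1, 2.4)
Q[Q < 0] <- 0                # negative values are set to zero
constant <- 0.001 * mean(Q)  # 0.001 times the mean discharge
Qadjusted <- Q + constant    # constant added to every daily value
@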
\FloatBarrier
@@ -579,7 +579,7 @@ The next section will talk about summing multiple constituents, including how in
%------------------------------------------------------------
\subsection{Censored Values: Summation Explanation}
%------------------------------------------------------------
In the typical case where none of the data are censored (that is, no values are reported as `less-than' values), ConcLow = ConcHigh = ConcAve, all of which are equal to the reported value, and Uncen = 1. For the most common type of censoring, where a value is reported as less than the reporting limit, ConcLow = NA, ConcHigh = reporting limit, ConcAve = 0.5 * reporting limit, and Uncen = 0.
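The two cases above can be made concrete with a small hand-built data frame (the concentration values are hypothetical, shown only to illustrate the encoding):

<<censoredEncodingSketch>>=
# Row 1: uncensored value of 0.62.
# Row 2: value censored at a reporting limit of 0.02.
censoredExample <- data.frame(
  ConcLow  = c(0.62, NA),
  ConcHigh = c(0.62, 0.02),
  ConcAve  = c(0.62, 0.01),  # 0.5 * reporting limit for the censored row
  Uncen    = c(1, 0)
)
@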
As an example to understand how the dataRetrieval package handles a more complex censoring problem, let us say that in 2004 and earlier, we computed total phosphorus (tp) as the sum of dissolved phosphorus (dp) and particulate phosphorus (pp). From 2005 and onward, we have direct measurements of total phosphorus (tp). A small subset of this fictional data looks like Table \ref{tab:exampleComplexQW}.
@@ -604,7 +604,7 @@ The dataRetrieval package will \texttt{"}add up\texttt{"} all the values in a gi
For example, we might know the value for dp on 5/30/2005, but we don't want to put it in the table because under the rules of this data set, we are not supposed to add it in to the values in 2005.
For every sample, the EGRET package requires a pair of numbers to define an interval in which the true value lies (ConcLow and ConcHigh). In a simple non-censored case (the reported value is above the detection limit), ConcLow equals ConcHigh and the interval collapses down to a single point. In a simple censored case, the value might be reported as \verb@<@0.2, then ConcLow=NA and ConcHigh=0.2. We use NA instead of 0 as a way to elegantly handle future logarithm calculations.
For the more complex example case, let us say dp is reported as \verb@<@0.01 and pp is reported as 0.3. We know that the total must be at least 0.3 and could be as much as 0.31. Therefore, ConcLow=0.3 and ConcHigh=0.31. Another case would be if dp is reported as \verb@<@0.005 and pp is reported \verb@<@0.2. We know in this case that the true value could be as low as zero, but could be as high as 0.205. Therefore, in this case, ConcLow=NA and ConcHigh=0.205. The Sample dataframe for the example data would be:
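The interval arithmetic for these two summation cases can be sketched as follows (illustrative values only, not package internals):

<<intervalSumSketch>>=
# Case 1: dp reported as <0.01, pp reported as 0.3.
# A censored constituent contributes 0 to the lower bound and its
# reporting limit to the upper bound:
concLow1  <- 0 + 0.3     # 0.3
concHigh1 <- 0.01 + 0.3  # 0.31
# Case 2: dp reported as <0.005, pp reported as <0.2.
# Both constituents censored, so the lower bound is unknown (NA):
concLow2  <- NA
concHigh2 <- 0.005 + 0.2  # 0.205
@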
@@ -120,10 +120,10 @@
%------------------------------------------------------------
\section{Introduction to dataRetrieval}
%------------------------------------------------------------
The dataRetrieval package was created to simplify the process of loading hydrologic data into the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends. See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to provide analysis of water quality data sets using the Weighted Regressions on Time, Discharge and Season (WRTDS) method, as well as analysis of discharge trends using robust time-series smoothing techniques. Both of these capabilities provide tabular and graphical analyses of long-term data sets.
The dataRetrieval package is designed to retrieve many of the major data types of United States Geological Survey (USGS) hydrologic data that are available on the web. Users may also load data from other sources (text files, spreadsheets) using dataRetrieval. Section \ref{sec:genRetrievals} provides examples of how one can obtain raw data from USGS sources on the web and ingest them into data frames within the R environment. The functionality described in section \ref{sec:genRetrievals} is for general use and is not tailored for the specific uses of the EGRET package. The functionality described in section \ref{sec:EGRETdfs} is tailored specifically to obtaining input from the web and structuring it for use in the EGRET package. The functionality described in section \ref{sec:summary} is for converting hydrologic data from user-supplied files and structuring it specifically for use in the EGRET package.
For information on getting started in R and installing the package, see (\ref{sec:appendix1}): Getting Started.
@@ -164,20 +164,20 @@ Quick workflow for major dataRetrieval functions:
\section{General USGS Web Retrievals}
\label{sec:genRetrievals}
%------------------------------------------------------------
In this section, we will run through five examples that document how to get raw data from the web. These data include site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), unit values (which include real-time data but can also include other sensor data stored at regular time intervals) (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP}) or (\ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD as an example. The site-ID for this streamgage is 01491000. Daily discharge measurements are available as far back as 1948. Additionally, nitrate has been measured since 1964. The functions/examples in this section are for raw data retrieval. In the next section, we will use functions that retrieve and process the data into a dataframe that may prove friendlier for R analysis and is specifically tailored to EGRET analysis.
%------------------------------------------------------------
\subsection{Introduction}
%------------------------------------------------------------
The USGS organizes hydrologic data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering this 8-digit ID. There are many ways to do this; one is the National Water Information System: Mapper \url{http://maps.waterdata.usgs.gov/mapper/index.html}.
Once the site-ID is known, the next required input for USGS data retrievals is the `parameter code'. This is a 5-digit code that specifies what measured parameter is being requested. For example, parameter code 00631 represents `Nitrate plus nitrite, water, filtered, milligrams per liter as nitrogen', with units of `mg/l as N'. A complete list of possible USGS parameter codes can be found at \url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?help}.
Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table \ref{tab:params}.
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Fri Apr 11 13:45:43 2014
\begin{table}[ht]
\centering
\begin{tabular}{rll}
@@ -234,13 +234,13 @@ A complete list (as of September 25, 2013) is available as data attached to the
For unit values data (sensor data measured at regular time intervals such as 15 minutes or hourly), knowing the parameter code and site ID is enough to make a request for data. For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values. These daily values are statistical summaries of the continuous data, e.g. maximum, minimum, mean, or median. The different statistics are specified by a 5-digit statistics code. A complete list of statistic codes can be found here:
\url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}
Some common codes are shown in Table \ref{tab:stat}.
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Fri Apr 11 13:45:43 2014
\begin{table}[ht]
\centering
\begin{tabular}{rll}
@@ -329,7 +329,7 @@ To discover what data is available for a particular USGS site, including measure
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Fri Apr 11 13:45:44 2014
\begin{table}[ht]
\centering
\begin{tabular}{rlllll}
@@ -337,7 +337,7 @@ To discover what data is available for a particular USGS site, including measure
 & srsname & startDate & endDate & count & units \\
\hline
1 & Temperature, water & 2010-10-01 & 2012-05-09 & 529 & deg C \\
2 & Stream flow, mean. daily & 1948-01-01 & 2014-04-10 & 24207 & ft3/s \\
3 & Specific conductance & 2010-10-01 & 2012-05-09 & 527 & uS/cm @25C \\
4 & Suspended sediment concentration (SSC) & 1980-10-01 & 1991-09-30 & 3651 & mg/l \\
5 & Suspended sediment discharge & 1980-10-01 & 1991-09-30 & 3652 & tons/day \\
@@ -560,7 +560,7 @@ Note that time now becomes important, so the variable datetime is a POSIXct, and
\subsection{Water Quality Values}
\label{sec:usgsWQP}
%------------------------------------------------------------
To get USGS water quality data from water samples collected at the streamgage or other monitoring site (as distinct from unit values collected through some type of automatic monitor), we can use the Water Quality Data Portal: \url{http://www.waterqualitydata.us/}. The raw data are obtained from the function getRawQWData, with similar input arguments: siteNumber, parameterCd, startDate, endDate, and interactive. The difference is in parameterCd: in this function, multiple parameters can be queried using a vector, and setting parameterCd to \texttt{"}\texttt{"} will return all of the measured observations. Because the raw data can be overwhelming (a large amount of data is returned for each observation), a simplified version of the data can be obtained using getQWData.
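As an illustration of the calls just described, the following sketch assumes the Choptank River example site whose data availability is tabulated earlier in this vignette; the site number, parameter codes, and dates are placeholder assumptions, not a tested retrieval.

\begin{verbatim}
# Sketch only: site number, parameter codes, and dates are
# illustrative assumptions, not a tested retrieval.
library(dataRetrieval)
siteNumber <- "01491000"
parameterCd <- c("00618", "00631")  # multiple parameters via a vector
startDate <- "1985-10-01"
endDate <- "2012-09-30"

rawSample <- getRawQWData(siteNumber, parameterCd, startDate, endDate)

# Simplified version of the same data:
simpleSample <- getQWData(siteNumber, parameterCd, startDate, endDate)

# An empty parameterCd ("") would return all measured observations.
\end{verbatim}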
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{specificCond} \hlkwb{<-} \hlkwd{getWQPData}\hlstd{(}\hlstr{'WIDNR_WQX-10032762'}\hlstd{,}
                              \hlstr{'Specific conductance'}\hlstd{,} \hlstr{''}\hlstd{,} \hlstr{''}\hlstd{)}
\end{alltt}
{\ttfamily\noindent\color{warningcolor}{Warning: No data retrieved}}\begin{alltt}
\hlkwd{head}\hlstd{(specificCond)}
\end{alltt}
\begin{verbatim}
[1] "No data retrieved"
\end{verbatim}
\end{kframe}
\end{knitrout}
\subsection{Daily Data}
\label{Dailysubsection}
%------------------------------------------------------------
The function to obtain the daily values (discharge in this case) is getDVData. It requires the inputs siteNumber, ParameterCd, StartDate, EndDate, interactive, and convert. Most of these arguments are described in the previous section; however, `convert' is a new argument (defaulting to TRUE) that tells the program to convert the values from cubic feet per second (cfs) to cubic meters per second (cms). For EGRET applications with NWIS web retrieval, leave this argument at its default of TRUE: EGRET assumes that discharge is always stored in units of cubic meters per second. If you don't want this conversion and are not using EGRET, set convert=FALSE in the function call.
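A hedged sketch of the call described above follows; the site number and dates are placeholder assumptions (00060 is the standard USGS parameter code for discharge).

\begin{verbatim}
# Sketch only: site number and dates are illustrative assumptions.
siteNumber <- "01491000"
parameterCd <- "00060"  # discharge
startDate <- "1999-10-01"
endDate <- "2012-09-30"

# The default convert=TRUE returns discharge in cubic meters per
# second, as EGRET expects:
Daily <- getDVData(siteNumber, parameterCd, startDate, endDate)

# Outside of EGRET, the original units (cfs) can be kept:
DailyCFS <- getDVData(siteNumber, parameterCd, startDate, endDate,
                      convert = FALSE)
\end{verbatim}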
There are 4750 data points, and 4750 days.
Details of the Daily dataframe are listed below:
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Fri Apr 11 13:46:09 2014
If there are negative discharge values or discharge values of zero, the code will set all of these to zero and then add a small constant to all of the daily discharge values. This constant is 0.001 times the mean discharge. The code will also report the number of zero and negative values and the size of the constant. EGRET should only be used if the number of zero values is a very small fraction of the total days in the record (say, less than 0.1\% of the days) and there are no negative discharge values. Columns Q7 and Q30 are the 7- and 30-day running averages for the 7 or 30 days ending on this specific date.
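The zero- and negative-value adjustment described above can be sketched in plain R; this mirrors the description, not the package's internal code.

\begin{verbatim}
# Mirrors the described behavior; not the package's internal code.
Q <- Daily$Q              # daily discharge values
Q[Q < 0] <- 0             # negative values are set to zero
if (any(Q == 0)) {
  constant <- 0.001 * mean(Q)  # 0.001 times the mean discharge
  Q <- Q + constant            # added to all daily values
}
\end{verbatim}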
\FloatBarrier
The next section will talk about summing multiple constituents, including how censored values are handled.
%------------------------------------------------------------
\subsection{Censored Values: Summation Explanation}
%------------------------------------------------------------
In the typical case where none of the data are censored (that is, no values are reported as `less-than' values), ConcLow = ConcHigh = ConcAve, all of which are equal to the reported value, and Uncen = 1. For the most common type of censoring, where a value is reported as less than the reporting limit, ConcLow = NA, ConcHigh = reporting limit, ConcAve = 0.5 * reporting limit, and Uncen = 0.
As an example to understand how the dataRetrieval package handles a more complex censoring problem, let us say that in 2004 and earlier, we computed total phosphorus (tp) as the sum of dissolved phosphorus (dp) and particulate phosphorus (pp). From 2005 and onward, we have direct measurements of total phosphorus (tp). A small subset of this fictional data looks like Table \ref{tab:exampleComplexQW}.
% latex table generated in R 3.0.2 by xtable 1.7-1 package
% Fri Apr 11 13:46:12 2014
The dataRetrieval package will \texttt{"}add up\texttt{"} all the values in a given row to form the total for that sample.
For example, we might know the value for dp on 5/30/2005, but we don't want to put it in the table because under the rules of this data set, we are not supposed to add it in to the values in 2005.
For every sample, the EGRET package requires a pair of numbers to define an interval in which the true value lies (ConcLow and ConcHigh). In a simple non-censored case (the reported value is above the detection limit), ConcLow equals ConcHigh and the interval collapses down to a single point. In a simple censored case, the value might be reported as \verb@<@0.2, then ConcLow=NA and ConcHigh=0.2. We use NA instead of 0 as a way to elegantly handle future logarithm calculations.
For the more complex example case, let us say dp is reported as \verb@<@0.01 and pp is reported as 0.3. We know that the total must be at least 0.3 and could be as much as 0.31. Therefore, ConcLow=0.3 and ConcHigh=0.31. Another case would be if dp is reported as \verb@<@0.005 and pp is reported as \verb@<@0.2. We know in this case that the true value could be as low as zero, but could be as high as 0.205. Therefore, in this case, ConcLow=NA and ConcHigh=0.205.
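The interval logic for summing a censored and an uncensored value can be sketched as follows; this is a plain-R illustration of the rules above, not the package's internal implementation.

\begin{verbatim}
# Plain-R illustration of the interval rules above.
# Each constituent is a reported value plus a censored ("<") flag.
dp <- list(value = 0.01, censored = TRUE)    # reported as <0.01
pp <- list(value = 0.30, censored = FALSE)   # reported as 0.3

# A censored value contributes 0 to the lower bound and its
# reporting limit to the upper bound.
low  <- (if (dp$censored) 0 else dp$value) +
        (if (pp$censored) 0 else pp$value)
high <- dp$value + pp$value

ConcLow  <- if (low == 0) NA else low  # NA keeps later log calculations clean
ConcHigh <- high                       # 0.31 in this example
\end{verbatim}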
There are a few steps that are required in order to create a table in a Microsoft Word document.
\begin{verbatim}
5 Suspended sediment discharge 1980-10-01
         End Count      Units
1 2012-05-09   529      deg C
2 2014-04-10 24207      ft3/s
3 2012-05-09   527 uS/cm @25C
4 1991-09-30  3651       mg/l
5 1991-09-30  3652   tons/day
\end{verbatim}