Commit f2576a29 authored by Laura A DeCicco

Made some updates to vignette.

parent b6177a36
\Sconcordance{concordance:dataRetrieval.tex:dataRetrieval.Rnw:%
1 49 1 50 0 1 6 11 1 1 5 42 1 4 0 27 1 10 0 16 1 9 0 %
21 1 5 0 6 1 8 0 24 1 4 0 15 1 6 0 16 1 10 0 5 1 8 0 %
20 1 4 0 18 1 3 0 21 1 10 0 20 1 4 0 4 1 18 0 29 1 8 %
0 10 1 10 0 14 1 21 0 19 1 5 0 19 1 5 0 17 1 8 0 14 1 %
15 0 16 1 5 0 9 1 5 0 62 1 6 0 11 1 1 4 36 1 5 0 24 1 %
5 0 20 1 38 0 13 1 10 0 101 1 5 0 5 1 14 0 5 1 5 0 11 %
1 5 0 7 1 5 0 16 1 51 0 15 1 49 0 7 1 32 0 24 1 19 0 %
8 1 5 0 56 1}
The dataRetrieval package was created to simplify the process of getting hydrologic data into the R environment. It has been specifically designed to work seamlessly with the EGRET R package: Exploration and Graphics for RivEr Trends (EGRET). See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET. EGRET is designed to analyze water quality data sets using the Weighted Regressions on Time, Discharge and Season (WRTDS) method, and to analyze discharge trends using robust time-series smoothing techniques. Both capabilities provide tabular and graphical analyses of long-term data sets.
The dataRetrieval package is designed to retrieve many of the major types of United States Geological Survey (USGS) hydrologic data that are available on the web, and it also allows users to make use of other data that they supply from spreadsheets. Section 2 provides examples of how to obtain raw data from USGS sources on the web and ingest them into data frames within the R environment. The functionality described in Section 2 is for general use and is not tailored to the specific uses of the EGRET package. The functionality described in Section 3 is tailored specifically to obtaining data from the web and structuring them for use in the EGRET package. The functionality described in Section 3.5 is for converting hydrologic data from user-supplied spreadsheets and structuring them for use in the EGRET package.
For information on getting started in R and installing the package, see Appendix \ref{sec:appendix1}: Getting Started.
Quick workflow for major dataRetrieval functions:
<<workflow, echo=TRUE,eval=FALSE>>=
library(dataRetrieval)
# Site ID for Choptank River near Greensboro, MD
siteNumber <- "01491000"
ChoptankInfo <- getSiteFileData(siteNumber)
parameterCd <- "00060"  # Discharge
# Raw data:
rawDailyData <- retrieveNWISData(siteNumber,parameterCd,
                      "1980-01-01","2010-01-01")
# Data compiled for EGRET analysis
Daily <- getDVData(siteNumber,parameterCd,
                      "1980-01-01","2010-01-01")
# Sample data Nitrate:
parameterCd <- "00618"
Sample <- getSampleData(siteNumber,parameterCd,
                      "1980-01-01","2010-01-01")
# Metadata on site and nitrate:
INFO <- getMetaData(siteNumber,parameterCd)
# Merge discharge and nitrate data to one dataframe:
Sample <- mergeReport()
@
%------------------------------------------------------------
\section{General USGS Web Retrievals}
%------------------------------------------------------------
In this section, we will run through 5 examples that document how to get raw data from the web. These include site information (\ref{sec:usgsSite}), measured parameter information (\ref{sec:usgsParams}), historical daily values (\ref{sec:usgsDaily}), real-time (unit) values (\ref{sec:usgsRT}), and water quality data (\ref{sec:usgsWQP} or \ref{sec:usgsSTORET}). We will use the Choptank River near Greensboro, MD as an example. The site-ID for this gage station is 01491000. Daily discharge measurements are available as far back as 1948. Additionally, forms of nitrate have been measured dating back to 1964. The functions and examples in this section are for raw data retrieval. In the next section, we will use functions that retrieve the data and process them into dataframes that may be friendlier for R analysis and are specifically tailored to EGRET analysis.
%------------------------------------------------------------
\subsection{Introduction}
%------------------------------------------------------------
The USGS organizes its hydrological data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering this ID. There are many ways to do this; one is the National Water Information System (NWIS) Mapper: \url{http://maps.waterdata.usgs.gov/mapper/index.html}.
Once the site-ID is known, the next required input for USGS data retrievals is the `parameter code'. This is a 5-digit code that specifies which measured parameter is being requested; for example, the Choptank workflow uses 00060 for discharge and 00618 for nitrate. A complete list of possible USGS parameter codes can be found at \url{http://go.usa.gov/bVDz}.
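Because site IDs and parameter codes can begin with zeros (for example, site 01491000 and parameter code 00060), they are best handled as character strings rather than numbers. The sketch below uses two hypothetical helpers, isSiteID and isParameterCode, which are not part of dataRetrieval, to check the formats described above. It assumes a site ID is simply a string of digits (8 digits being common but not required) and a parameter code is exactly 5 digits.

```r
# Hypothetical helpers (NOT part of dataRetrieval) that check the ID formats
# described above. IDs are kept as character strings so leading zeros survive.

# Assumption: a site ID is a string of digits. 8 digits is common but not
# required, so only the digit pattern is checked, not the length.
isSiteID <- function(x) {
  is.character(x) & grepl("^[0-9]+$", x)
}

# Assumption: a parameter code is exactly 5 digits, stored as a string.
isParameterCode <- function(x) {
  is.character(x) & grepl("^[0-9]{5}$", x)
}

siteNumber <- "01491000"
parameterCd <- "00060"
isSiteID(siteNumber)
isParameterCode(parameterCd)

# Writing the code as a number silently drops the leading zeros,
# so the result no longer looks like a parameter code:
as.character(00060)
isParameterCode(as.character(00060))
```

Quoting IDs, as the workflow above does with siteNumber and parameterCd, avoids the leading-zero problem entirely.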
\subsubsection{getDataAvailability}
\label{sec:usgsDataAvailability}
%------------------------------------------------------------
To find out what data are available at a particular USGS site, including measured parameters, period of record, and number of samples (count), use the getDataAvailability function. It is also possible to request parameter information for only a subset of variables. In the following example, we retrieve only the daily mean parameter information from the Choptank data availability dataframe (excluding all unit value and water quality values).
<<getSiteExtended, echo=TRUE>>=
# Continuing from the previous example:
# This pulls out just the daily data:
ChoptankAvailableData <- getDataAvailability(siteNumber)
ChoptankDailyData <- subset(ChoptankAvailableData,
                      "dv" == service)
\end{minipage}
\end{table}
\begin{table}[!ht]
\begin{minipage}{\linewidth}
\begin{center}
\caption{dataRetrieval miscellaneous functions}
\begin{tabular}{ll}
\hline
Function Name & Description \\
\hline
compressData & Converts value/qualifier into ConcLow, ConcHigh, Uncen\\
getRDB1Data & Retrieves and converts RDB data to dataframe\\
getWaterML1Data & Retrieves and converts WaterML1 data to dataframe\\
getWaterML2Data & Retrieves and converts WaterML2 data to dataframe\\
mergeReport & Merges flow data from the daily record into the sample record\\
populateDateColumns & Generates Julian, Month, Day, DecYear, and MonthSeq columns\\
removeDuplicates & Removes duplicated rows\\
renameColumns & Renames columns from raw data retrievals\\
\hline
\end{tabular}
\end{center}
\end{minipage}
\end{table}
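The compressData entry above converts a value and its qualifier into the ConcLow, ConcHigh, and Uncen columns used for censored data. The following base-R sketch illustrates that idea with a hypothetical function, compressSketch; it is not the package's implementation. It assumes that a "<" qualifier marks a left-censored (less-than) result, that ConcLow is NA for such values (the package may use a different placeholder), that ConcHigh holds the reported value or reporting limit, and that Uncen is 1 for uncensored and 0 for censored observations.

```r
# Hypothetical sketch (NOT the dataRetrieval implementation) of converting
# value/qualifier pairs into ConcLow, ConcHigh, and Uncen columns.
# Assumption: "<" marks a left-censored result; any other qualifier,
# including "", is treated as uncensored.
compressSketch <- function(value, qualifier) {
  censored <- !is.na(qualifier) & qualifier == "<"
  data.frame(
    ConcLow  = ifelse(censored, NA, value),  # lower bound unknown when censored (assumed NA)
    ConcHigh = value,                        # reported value or reporting limit
    Uncen    = ifelse(censored, 0L, 1L)      # 1 = uncensored, 0 = censored
  )
}

# Illustrative numbers only (not real Choptank data):
compressSketch(value = c(1.2, 0.05, 0.9), qualifier = c("", "<", ""))
```

For a censored row, the true concentration lies somewhere at or below ConcHigh, which is why only the upper bound is recorded.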
\clearpage
\appendix
%------------------------------------------------------------
\select@language {american}
\contentsline {section}{\numberline {1}Introduction to dataRetrieval}{2}{section.1}
\contentsline {section}{\numberline {2}General USGS Web Retrievals}{3}{section.2}
\contentsline {subsection}{\numberline {2.1}Introduction}{4}{subsection.2.1}
\contentsline {subsection}{\numberline {2.2}Site Information}{5}{subsection.2.2}
\contentsline {subsubsection}{\numberline {2.2.1}getSiteFileData}{5}{subsubsection.2.2.1}
\contentsline {subsubsection}{\numberline {2.2.2}getDataAvailability}{5}{subsubsection.2.2.2}
\contentsline {subsection}{\numberline {2.3}Parameter Information}{6}{subsection.2.3}
\contentsline {subsection}{\numberline {2.4}Daily Values}{6}{subsection.2.4}
\contentsline {subsection}{\numberline {2.5}Unit Values}{9}{subsection.2.5}
\contentsline {subsection}{\numberline {2.6}Water Quality Values}{10}{subsection.2.6}
\contentsline {subsection}{\numberline {2.7}STORET Water Quality Retrievals}{12}{subsection.2.7}
\contentsline {subsection}{\numberline {2.8}URL Construction}{12}{subsection.2.8}
\contentsline {section}{\numberline {3}Data Retrievals Structured For Use In The EGRET Package}{13}{section.3}
\contentsline {subsection}{\numberline {3.1}INFO Data}{13}{subsection.3.1}
\contentsline {subsection}{\numberline {3.2}Daily Data}{13}{subsection.3.2}
\contentsline {subsection}{\numberline {3.3}Sample Data}{14}{subsection.3.3}
\contentsline {subsection}{\numberline {3.4}Censored Values: Summation Explanation}{16}{subsection.3.4}
\contentsline {subsection}{\numberline {3.5}User-Generated Data Files}{17}{subsection.3.5}
\contentsline {subsubsection}{\numberline {3.5.1}getDailyDataFromFile}{17}{subsubsection.3.5.1}
\contentsline {subsubsection}{\numberline {3.5.2}getSampleDataFromFile}{18}{subsubsection.3.5.2}
\contentsline {subsection}{\numberline {3.6}Merge Report}{19}{subsection.3.6}
\contentsline {subsection}{\numberline {3.7}EGRET Plots}{20}{subsection.3.7}
\contentsline {section}{\numberline {4}Summary}{22}{section.4}
\contentsline {section}{\numberline {A}Getting Started in R}{24}{appendix.A}
\contentsline {subsection}{\numberline {A.1}New to R?}{24}{subsection.A.1}
\contentsline {subsection}{\numberline {A.2}R User: Installing dataRetrieval}{24}{subsection.A.2}
\contentsline {section}{\numberline {B}Columns Names}{25}{appendix.B}
\contentsline {subsection}{\numberline {B.1}INFO dataframe}{25}{subsection.B.1}
\contentsline {subsection}{\numberline {B.2}Water Quality Portal}{27}{subsection.B.2}
\contentsline {section}{\numberline {C}Creating tables in Microsoft from R}{30}{appendix.C}