Commit f205f86f authored by Laura A DeCicco

Updating user guide.

parent 13be9025
\Sconcordance{concordance:dataRetrieval.tex:dataRetrieval.Rnw:%
1 78 1 1 3 5 0 1 2 2 1 1 4 6 0 1 2 2 1 1 4 6 0 1 2 8 1 1 2 1 0 1 1 3 0 %
1 2 1 1 1 2 4 0 1 2 15 1 1 4 1 10 16 0 1 2 5 1 1 10 15 0 1 2 10 1 1 3 2 %
0 3 1 1 2 3 0 1 2 1 1 1 2 10 0 2 2 9 0 1 2 3 1 1 3 2 0 4 1 1 2 3 0 1 2 %
4 1 1 3 2 0 3 1 1 2 3 0 2 2 10 0 2 2 10 0 1 2 5 1 1 3 2 0 1 1 3 0 1 2 1 %
1 1 2 21 0 1 2 7 1 1 2 8 0 1 1 18 0 2 1 18 0 2 1 18 0 1 1 5 0 1 2 12 0 %
1 2 45 1}
......@@ -57,27 +57,28 @@
%------------------------------------------------------------
\section{Introduction to dataRetrieval}
%------------------------------------------------------------
The dataRetrieval package was created to simplify the process of getting hydrologic data into the R environment. It is specifically designed to work seamlessly with the EGRET package: Exploration and Graphics for RivEr Trends (EGRET). See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET.
%------------------------------------------------------------
\subsection{What is dataRetrieval?}
%------------------------------------------------------------
The dataRetrieval package was created to simplify hydrologic data retrieval. Data can be loaded either from the web or from user-produced files.
There is a plethora of hydrological data available on the web. This package is designed specifically to load United States Geological Survey (USGS) hydrologic data into the R environment. This includes daily values, real-time values (unit values), site information, and water quality sample data.
\newpage
%------------------------------------------------------------
\section{Getting Started}
%------------------------------------------------------------
This section describes the options for downloading and installing the dataRetrieval package.
%------------------------------------------------------------
\subsection{New to R?}
%------------------------------------------------------------
If you are new to R, you will first need to install the latest version of R, which can be found here: \url{http://www.r-project.org/}.
There are many options for running and editing R code; one nice environment for learning R is RStudio, which can be downloaded here: \url{http://rstudio.org/}. Once R and RStudio are installed, the dataRetrieval package needs to be installed as described in the next section.
%------------------------------------------------------------
\subsection{R User: Installing dataRetrieval from downloaded binary}
%------------------------------------------------------------
The latest dataRetrieval package build is available for download at \url{https://github.com/USGS-R/dataRetrieval/blob/master/dataRetrieval_1.2.1.tar.gz}. If the package's tar.gz file is saved in R's working directory, then the following command will fully install the package:
<<installFromWD,eval = FALSE>>=
install.packages("dataRetrieval_1.2.1.tar.gz",
repos=NULL, type="source")
@
......@@ -85,7 +86,7 @@ If the downloaded file is stored in an alternative location, include the path in
<<installFromFile,eval = FALSE>>=
install.packages(
"C:/RPackages/Statistics/dataRetrieval_1.2.1.tar.gz",
repos=NULL, type="source")
@
......@@ -93,68 +94,185 @@ A Mac example looks like this:
<<macExample,eval = FALSE>>=
install.packages(
"/Users/userA/RPackages/Statistic/dataRetrieval_1.2.1.tar.gz",
repos=NULL, type="source")
@
It is a good idea to restart the R environment after installing the package (that is, restart RStudio), especially when installing an updated version. Some users have found it necessary to delete the previous version's package folder before installing a newer version of dataRetrieval. If you experience issues after updating the package, try deleting the package folder. The default location on Windows is something like C:/Users/userA/Documents/R/win-library/2.15/dataRetrieval, and the default on a Mac is /Users/userA/Library/R/2.15/library/dataRetrieval. Then, re-install the package using the directions above. Moving the package to CRAN should solve this problem.
%------------------------------------------------------------
\subsection{R Developers: Installing dataRetrieval from gitHub}
%------------------------------------------------------------
Alternatively, R developers can install the latest version of dataRetrieval directly from gitHub using the devtools package, which is available on CRAN. Simply type the following commands into R to install the latest version of dataRetrieval available on gitHub. Rtools (for Windows) and LaTeX tools are required.
<<gitInstal,eval = FALSE>>=
library(devtools)
install_github("dataRetrieval", "USGS-R")
@
To then open the library, simply type:
<<openLibrary>>=
library(dataRetrieval)
@
\newpage
%------------------------------------------------------------
\section{Raw Data: USGS Web Retrieval Examples}
%------------------------------------------------------------
In this section, we will run through 4 examples, documenting how to get raw data from the web. This includes historical daily values, real-time current values, site information, and water quality data.
%------------------------------------------------------------
\subsection{USGS Web Retrieval Introduction}
%------------------------------------------------------------
The United States Geological Survey organizes its hydrological data in a fairly standard structure. Gage stations are located throughout the United States, and each station has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering the station ID. One potential tool for discovering data is Environmental Data Discovery and Transformation (EnDDaT): \url{http://cida.usgs.gov/enddat/}. Follow the example in the EnDDaT User's Guide to learn how to discover USGS stations and available data from any location in the United States. Essentially, you can create a Project Location on the map, set a bounding box (in miles), and then search for USGS Time Series and USGS Water Quality Data. Locations, IDs, available data, and available time periods will load on the map and in the appropriate tabs.
Once the site ID is known, the next required input for USGS data retrievals is the `parameter code'. This is a 5-digit code that specifies which measured parameter is being requested. A complete list of possible USGS parameter codes can be found here:
\url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?radio_pm_search=param_group&pm_group=All+--+include+all+parameter+groups&pm_search=&casrn_search=&srsname_search=&format=html_table&show=parameter_group_nm&show=parameter_nm&show=casrn&show=srsname&show=parameter_units}
Not every station will measure all parameters. The following is a list of commonly measured parameters:
<<loadXtable, echo=FALSE>>=
library(xtable)
@
<<label=tableParameterCodes, echo=FALSE,results=tex>>=
pCode <- c('00060', '00065', '00010','00045','00400')
shortName <- c("Discharge [cfs]","Gage height [ft]","Temperature [C]", "Precipitation [in]", "pH")
data.df <- data.frame(pCode, shortName, stringsAsFactors=FALSE)
data.table <- xtable(data.df,
caption="Commonly found USGS Parameter Codes")
print(data.table,
caption.placement="top")
@
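Because parameter codes are 5-digit character strings with leading zeros (as are site IDs such as 01491000), they should always be given in quotes. The chunk below is a minimal sketch of a sanity check that could be run before a request. The helper isValidCode is hypothetical and not part of dataRetrieval; it assumes that several codes may be combined in one comma-separated string, as in the temperature and discharge example later in this guide.
<<codeCheckSketch, eval=FALSE>>=
# Unquoted, 00060 is read as the number 60 and the leading zeros are lost
as.character(00060)  # "60"
# Hypothetical helper: TRUE if every comma-separated code has exactly 5 digits
isValidCode <- function(codes) {
  parts <- strsplit(codes, ",", fixed = TRUE)[[1]]
  length(parts) > 0 && all(grepl("^[0-9]{5}$", parts))
}
isValidCode("00060")        # TRUE
isValidCode("00010,00060")  # TRUE
isValidCode("60")           # FALSE
@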
For real-time data, the parameter code and site ID will suffice. However, the USGS stores historical data as daily values, so the statistic used to summarize each day is the final requirement for daily value retrievals. A 5-digit `stat code' specifies the requested statistic. A complete list of possible USGS stat codes can be found here:
\url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}
The most common stat codes are:
<<label=tableStatCodes, echo=FALSE,results=tex>>=
StatCode <- c('00001', '00002', '00003','00008')
shortName <- c("Maximum","Minimum","Mean", "Median")
data.df <- data.frame(StatCode, shortName, stringsAsFactors=FALSE)
data.table <- xtable(data.df,
caption="Commonly found USGS Stat Codes")
print(data.table,
caption.placement="top")
@
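To show how the site ID, parameter code, and stat code fit together, the chunk below sketches how a daily value request could be written as a web service URL using only base R. This is an illustration, not a description of how dataRetrieval works internally: the endpoint \url{https://waterservices.usgs.gov/nwis/dv/} and the query names (format, sites, parameterCd, statCd, startDT, endDT) are assumptions based on the public USGS daily values web service, and the helper buildDVurl is hypothetical.
<<dvURLSketch, eval=FALSE>>=
# Hypothetical helper; the endpoint and query names are assumptions
buildDVurl <- function(siteNumber, parameterCd, startDate, endDate,
                       statCd = "00003") {
  baseURL <- "https://waterservices.usgs.gov/nwis/dv/"
  query <- c(format = "rdb", sites = siteNumber,
             parameterCd = parameterCd, statCd = statCd,
             startDT = startDate, endDT = endDate)
  # Leave out empty dates instead of sending, e.g., "startDT="
  query <- query[query != ""]
  paste0(baseURL, "?",
         paste(names(query), query, sep = "=", collapse = "&"))
}
buildDVurl("01491000", "00060", "2012-01-01", "2012-06-30")
@
Reading such a URL (for example with readLines) would return tab-delimited RDB text; downloading, error handling, and parsing are left to the dataRetrieval functions described below.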
We will use the Choptank River near Greensboro, MD as an example. The site-ID for this gage station is 01491000. Daily discharge measurements are available as far back as 1948. Additionally, forms of nitrate and nitrogen have been measured dating back to 1964.
%------------------------------------------------------------
\subsection{USGS Daily Value Retrievals}
%------------------------------------------------------------
To obtain historic daily records of USGS data, use the retrieveNWISData function. The arguments for this function are siteNumber, parameterCd, startDate, endDate, statCd, and a logical (TRUE/FALSE) interactive. There are two default arguments: statCd defaults to ``00003'' (mean) and interactive defaults to TRUE. If you want to use the default values, you do not need to list them in the function call. Setting interactive to TRUE will walk you through the function; for large batch collections, it may make more sense to set interactive to FALSE.
The dates (start and end) need to be in the format ``YYYY-MM-DD''. Setting the start date to \texttt{""} requests the earliest available date, and setting the end date to \texttt{""} requests the latest available date.
<<label=getNWISDaily, echo=TRUE>>=
# Using defaults:
siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
parameterCd <- "00060" # Discharge in cubic feet per second
startDate <- ""
endDate <- ""
discharge <- retrieveNWISData(siteNumber, parameterCd, startDate, endDate)
@
A dataframe is returned that looks like the following:
<<dischargeData, echo=FALSE>>=
head(discharge)
@
The structure of the dataframe is:
<<dischargeStructure, echo=FALSE>>=
str(discharge)
@
Note that dateTime is imported as a Date, value is a number, and code is a string. USGS codes are often "A" (approved for publication) or "P" (provisional data subject to revision). A more complete list of qualification codes can be found here:
\url{http://waterdata.usgs.gov/usa/nwis/help?codes_help}
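If only approved data are wanted, the code column can be used to filter the returned data frame. The chunk below is a sketch that runs on a small made-up data frame (toyDV) with the dateTime, value, and code columns described above, so no web request is needed. The column names in real retrieveNWISData output should be checked with str() first, and codes that carry extra qualifiers beyond a single letter are not handled here.
<<approvedOnlySketch, eval=FALSE>>=
# Made-up values for illustration only; not real Choptank River data
toyDV <- data.frame(
  dateTime = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
  value = c(100, 120, 95),
  code = c("A", "A", "P"),
  stringsAsFactors = FALSE)
# Keep only rows whose qualification code is exactly "A" (approved)
approvedDV <- toyDV[toyDV$code == "A", ]
nrow(approvedDV)  # 2
@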
An example that does not use the defaults is a request for daily maximum and mean values of temperature and discharge in the first half of 2012:
<<label=getNWIStemperature, echo=TRUE>>=
# Overriding the default statCd:
siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
parameterCd <- "00010,00060" # Temperature and discharge
statCd <- "00001,00003" # Maximum and mean
startDate <- "2012-01-01"
endDate <- "2012-06-30"
temperature <- retrieveNWISData(siteNumber, parameterCd, startDate, endDate,
                                statCd=statCd)
@
@
Daily data is pulled from \url{http://waterservices.usgs.gov/rest/DV-Test-Tool.html}.
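As described above, startDate and endDate must be written as ``YYYY-MM-DD'' or left empty (\texttt{""}). The chunk below is a minimal sketch of a check that could be run on date strings before a request, using base R's as.Date; the helper checkDate is hypothetical and not part of dataRetrieval.
<<dateCheckSketch, eval=FALSE>>=
# Hypothetical helper: TRUE for "" or a real date written as YYYY-MM-DD
checkDate <- function(dateString) {
  if (identical(dateString, "")) return(TRUE)
  parsed <- as.Date(dateString, format = "%Y-%m-%d")
  # as.Date gives NA when the string cannot be parsed; the round trip
  # through format() also rejects variants such as "2012-1-1"
  !is.na(parsed) && format(parsed, "%Y-%m-%d") == dateString
}
checkDate("2012-01-01")  # TRUE
checkDate("2012-1-1")    # FALSE
checkDate("01/01/2012")  # FALSE
checkDate("")            # TRUE
@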
%------------------------------------------------------------
\subsection{USGS Unit Value Retrievals}
%------------------------------------------------------------
We can also get real-time, instantaneous measurements using the retrieveUnitNWISData function:
<<label=getNWISUnit, echo=TRUE>>=
# Using defaults:
siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
parameterCd <- "00060" # Discharge in cubic feet per second
startDate <- as.character(Sys.Date())
endDate <- as.character(Sys.Date())
dischargeToday <- retrieveUnitNWISData(siteNumber, parameterCd, startDate, endDate)
@
This produces the following dataframe:
<<dischargeTodayData, echo=FALSE>>=
head(dischargeToday)
@
The structure of the dataframe is:
<<dischargeTodayStructure, echo=FALSE>>=
str(dischargeToday)
@
Note that time now becomes important, so the dateTime is a POSIXct, and the time zone is included. Data is pulled from \url{http://waterservices.usgs.gov/rest/IV-Test-Tool.html}.
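Because unit values are timestamped, converting between time zones can be useful, for example when comparing stations in different zones. The chunk below is a base R sketch using a made-up timestamp; the time zone ``America/New\_York'' is an assumption chosen for illustration and should be checked against the time zone returned with the data.
<<timeZoneSketch, eval=FALSE>>=
# A made-up instantaneous timestamp; "America/New_York" is assumed here
obsTime <- as.POSIXct("2013-01-21 16:15:00", tz = "America/New_York")
# The same instant expressed in UTC (Eastern Standard Time is UTC-5)
format(obsTime, tz = "UTC", usetz = TRUE)  # "2013-01-21 21:15:00 UTC"
@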
%------------------------------------------------------------
\subsection{USGS Site Information Retrievals}
%------------------------------------------------------------
To obtain all of the available site information, use the getSiteFileData function:
<<label=getSite, echo=TRUE>>=
# Using defaults:
siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
ChopTankInfo <- getSiteFileData(siteNumber)
@
The site information fields available for USGS sites are:
<<siteColnames, echo=TRUE>>=
colnames(ChopTankInfo)
@
Site information is obtained from \url{http://waterservices.usgs.gov/rest/Site-Test-Tool.html}
%------------------------------------------------------------
\subsection{USGS Water Quality Retrievals}
%------------------------------------------------------------
Finally, we can use the dataRetrieval package to get water quality data that is available on the water quality data portal: \url{http://www.waterqualitydata.us/}. The function is getQWData.
%------------------------------------------------------------
\section{Polished Data: USGS Web Retrieval Examples}
%------------------------------------------------------------
In this example, we use three dataRetrieval functions to get daily streamflow data, inorganic nitrogen sample results, and site information for a USGS gaging station with the ID 06934500. The station is the Missouri River at Hermann, MO (as given in the INFO dataset). Rather than see the raw output from NWIS, we
<<firstExample>>=
Daily <- getDVData("06934500","00060","1970-10-01","2011-09-30")
head(Daily)
Sample <-getSampleData("06934500","00631","1970-10-01","2011-09-30")
head(Sample)
INFO <-getMetaData("06934500","00631", interactive=FALSE)
colnames(INFO)
INFO$station.nm
Sample <- mergeReport()
@
\newpage
......
This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9) (preloaded format=pdflatex 2012.1.6) 3 JAN 2013 15:51
This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9) (preloaded format=pdflatex 2012.1.6) 21 JAN 2013 16:09
entering extended mode
**dataRetrieval.tex
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.tex
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.tex
LaTeX2e <2011/06/27>
Babel <v3.8m> and hyphenation patterns for english, afrikaans, ancientgreek, ar
abic, armenian, assamese, basque, bengali, bokmal, bulgarian, catalan, coptic,
......@@ -384,7 +384,7 @@ LaTeX Font Info: Try loading font information for T1+aer on input line 100.
("C:\Program Files (x86)\MiKTeX 2.9\tex\latex\ae\t1aer.fd"
File: t1aer.fd 1997/11/16 Font definitions for T1/aer.
))))
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.aux)
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.aux)
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 42.
LaTeX Font Info: ... okay on input line 42.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 42.
......@@ -421,8 +421,8 @@ LaTeX Info: Redefining \ref on input line 42.
LaTeX Info: Redefining \pageref on input line 42.
LaTeX Info: Redefining \nameref on input line 42.
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.out)
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.out)
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.out)
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.out)
\@outlinefile=\write4
(C:\Users\ldecicco\AppData\Roaming\MiKTeX\2.9\tex\context\base\supp-pdf.mkii
......@@ -438,11 +438,14 @@ LaTeX Info: Redefining \nameref on input line 42.
\MPnumerator=\count123
\makeMPintoPDFobject=\count124
\everyMPtoPDFconversion=\toks21
)
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval-concordance.tex
) (D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.toc)
) (D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval-concordance.tex)
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.toc)
\tf@toc=\write5
LaTeX Font Info: Try loading font information for T1+aett on input line 60.
("C:\Program Files (x86)\MiKTeX 2.9\tex\latex\ae\t1aett.fd"
File: t1aett.fd 1997/11/16 Font definitions for T1/aett.
)
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
......@@ -453,14 +456,34 @@ Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[2]
LaTeX Font Info: Try loading font information for T1+aett on input line 75.
("C:\Program Files (x86)\MiKTeX 2.9\tex\latex\ae\t1aett.fd"
File: t1aett.fd 1997/11/16 Font definitions for T1/aett.
)
Overfull \hbox (63.21521pt too wide) in paragraph at lines 107--108
\T1/aer/m/n/10.95 library/2.15/dataRetrieval, and the de-fault for a Mac: /User
s/userA/Library/R/2.15/library/dataRetrieval.
[]
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[3]
Overfull \hbox (22.21066pt too wide) in paragraph at lines 141--142
[][]$\T1/aett/m/n/10.95 http : / / nwis . waterdata . usgs . gov / usa / nwis /
pmcodes ? radio _ pm _ search = param _ group&pm _$
[]
Overfull \hbox (23.424pt too wide) in paragraph at lines 141--142
$\T1/aett/m/n/10.95 group = All + -[]-[] + include + all + parameter + groups&p
m _ search = &casrn _ search = &srsname _ search =$
[]
Overfull \hbox (68.32622pt too wide) in paragraph at lines 141--142
$\T1/aett/m/n/10.95 &format = html _ table&show = parameter _ group _ nm&show =
parameter _ nm&show = casrn&show = srsname&show =$
[]
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
......@@ -469,6 +492,20 @@ Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[5]
LaTeX Font Info: Try loading font information for TS1+aett on input line 221
.
(C:/PROGRA~1/R/R-215~1.2/share/texmf/tex/latex\ts1aett.fd
File: ts1aett.fd
)
LaTeX Font Info: Try loading font information for TS1+cmtt on input line 221
.
("C:\Program Files (x86)\MiKTeX 2.9\tex\latex\base\ts1cmtt.fd"
File: ts1cmtt.fd 1999/05/25 v2.5h Standard LaTeX font definitions
)
LaTeX Font Info: Font shape `TS1/aett/m/n' in size <10.95> not available
(Font) Font shape `TS1/cmtt/m/n' tried instead on input line 221.
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
......@@ -477,40 +514,57 @@ Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[7]
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 261.
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[8]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 261.
(D:\LADData\RCode\dataRetrievalGitorious\inst\doc\dataRetrieval.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 261.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 261.
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[9]
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[10]
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[11]
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 458.
Overfull \vbox (21.68121pt too high) has occurred while \output is active []
[12]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 458.
(D:\LADData\RCode\dataRetrieval\inst\doc\dataRetrieval.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 458.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 458.
Package rerunfilecheck Info: File `dataRetrieval.out' has not changed.
(rerunfilecheck) Checksum: AA1F9028E8B7805EFF68E9E6E5486C08;729.
(rerunfilecheck) Checksum: 3FF7CAE86609DE69C211E24EAF10AC22;1129.
)
Here is how much of TeX's memory you used:
7312 strings out of 494045
104581 string characters out of 3145961
185875 words of memory out of 3000000
10452 multiletter control sequences out of 15000+200000
39706 words of font info for 81 fonts, out of 3000000 for 9000
7355 strings out of 494045
105366 string characters out of 3145961
187876 words of memory out of 3000000
10477 multiletter control sequences out of 15000+200000
40004 words of font info for 82 fonts, out of 3000000 for 9000
715 hyphenation exceptions out of 8191
35i,4n,28p,540b,481s stack positions out of 5000i,500n,10000p,200000b,50000s
<C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmbx10.pfb>
<C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmbx12.pfb><C
:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr10.pfb><C:/P
rogram Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr12.pfb><C:/Prog
ram Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr17.pfb><C:/Program
Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr7.pfb><C:/Program Fil
es (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr8.pfb><C:/Program Files (
x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmsltt10.pfb><C:/Program Files (
x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmti10.pfb><C:/Program Files (x8
6)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmtt10.pfb>
Output written on dataRetrieval.pdf (8 pages, 162191 bytes).
35i,8n,28p,866b,483s stack positions out of 5000i,500n,10000p,200000b,50000s
<C:\Users\ldecicco\AppData\Local\MiKTeX\2.9\fonts\pk\ljfour\jknappen\ec\dpi6
00\tctt1095.pk><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/c
m/cmbx10.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/
cmbx12.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cm
r10.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr12
.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr17.pf
b><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr7.pfb><C
:/Program Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr8.pfb><C:/Pr
ogram Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmsltt10.pfb><C:/Pr
ogram Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmti10.pfb><C:/Prog
ram Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmtt10.pfb>
Output written on dataRetrieval.pdf (12 pages, 186676 bytes).
PDF statistics:
149 PDF objects out of 1000 (max. 8388607)
24 named destinations out of 1000 (max. 500000)
85 words of extra memory for PDF output out of 10000 (max. 10000000)
211 PDF objects out of 1000 (max. 8388607)
35 named destinations out of 1000 (max. 500000)
125 words of extra memory for PDF output out of 10000 (max. 10000000)
......@@ -57,26 +57,29 @@
%------------------------------------------------------------
\section{Introduction to dataRetrieval}
%------------------------------------------------------------
The dataRetrieval package was created to simplify the process of getting hydrologic data into the R environment. It is specifically designed to work seamlessly with the EGRET package: Exploration and Graphics for RivEr Trends (EGRET). See \url{https://github.com/USGS-R/EGRET/wiki} for information on EGRET.
%------------------------------------------------------------
\subsection{What is dataRetrieval?}
%------------------------------------------------------------
The dataRetrieval package was created to simplify hydrologic data retrieval. Data can be loaded either from the web or from user-produced files.
There is a plethora of hydrological data available on the web. This package is designed specifically to load United States Geological Survey (USGS) hydrologic data into the R environment. This includes daily values, real-time values (unit values), site information, and water quality sample data.
\newpage
%------------------------------------------------------------
\section{Getting Started}
%------------------------------------------------------------
This section describes the options for downloading and installing the dataRetrieval package.
%------------------------------------------------------------
\subsection{New to R?}
%------------------------------------------------------------
If you are new to R, you will first need to install the latest version of R, which can be found here: \url{http://www.r-project.org/}.
There are many options for running and editing R code; one nice environment for learning R is RStudio, which can be downloaded here: \url{http://rstudio.org/}. Once R and RStudio are installed, the dataRetrieval package needs to be installed as described in the next section.
%------------------------------------------------------------
\subsection{R User: Installing dataRetrieval from downloaded binary}
%------------------------------------------------------------
The latest dataRetrieval package build is available for download at \url{https://github.com/USGS-R/dataRetrieval/blob/master/dataRetrieval_1.2.1.tar.gz}. If the package's tar.gz file is saved in R's working directory, then the following command will fully install the package:
\begin{Schunk}
\begin{Sinput}
> install.packages("dataRetrieval_1.2.1.tar.gz",
+ repos=NULL, type="source")
\end{Sinput}
\end{Schunk}
......@@ -86,7 +89,7 @@ If the downloaded file is stored in an alternative location, include the path in
\begin{Schunk}
\begin{Sinput}
> install.packages(
+ "C:/RPackages/Statistics/dataRetrieval_1.2.1.tar.gz",
+ repos=NULL, type="source")
\end{Sinput}
\end{Schunk}
......@@ -96,26 +99,26 @@ A Mac example looks like this:
\begin{Schunk}
\begin{Sinput}
> install.packages(
+ "/Users/userA/RPackages/Statistic/dataRetrieval_1.2.1.tar.gz",
+ repos=NULL, type="source")
\end{Sinput}
\end{Schunk}
It is a good idea to restart the R environment after installing the package (that is, restart RStudio), especially when installing an updated version. Some users have found it necessary to delete the previous version's package folder before installing a newer version of dataRetrieval. If you experience issues after updating the package, try deleting the package folder. The default location on Windows is something like C:/Users/userA/Documents/R/win-library/2.15/dataRetrieval, and the default on a Mac is /Users/userA/Library/R/2.15/library/dataRetrieval. Then, re-install the package using the directions above. Moving the package to CRAN should solve this problem.
%------------------------------------------------------------
\subsection{R Developers: Installing dataRetrieval from gitHub}
%------------------------------------------------------------
Alternatively, R developers can install the latest version of dataRetrieval directly from gitHub using the devtools package, which is available on CRAN. Simply type the following commands into R to install the latest version of dataRetrieval available on gitHub. Rtools (for Windows) and LaTeX tools are required.
\begin{Schunk}
\begin{Sinput}
> library(devtools)
> install_github("dataRetrieval", "USGS-R")
\end{Sinput}
\end{Schunk}
To then open the library, simply type:
\begin{Schunk}
\begin{Sinput}
......@@ -123,9 +126,203 @@ To then open the library, simpley type:
\end{Sinput}
\end{Schunk}
\newpage
%------------------------------------------------------------
\section{Raw Data: USGS Web Retrieval Examples}
%------------------------------------------------------------
In this section, we will run through 4 examples, documenting how to get raw data from the web. This includes historical daily values, real-time current values, site information, and water quality data.
%------------------------------------------------------------
\subsection{USGS Web Retrieval Introduction}
%------------------------------------------------------------
The United States Geological Survey organizes its hydrological data in a fairly standard structure. Gage stations are located throughout the United States, and each station has a unique ID. Often (but not always), these IDs are 8 digits. The first step to finding data is discovering the station ID. One potential tool for discovering data is Environmental Data Discovery and Transformation (EnDDaT): \url{http://cida.usgs.gov/enddat/}. Follow the example in the EnDDaT User's Guide to learn how to discover USGS stations and available data from any location in the United States. Essentially, you can create a Project Location on the map, set a bounding box (in miles), and then search for USGS Time Series and USGS Water Quality Data. Locations, IDs, available data, and available time periods will load on the map and in the appropriate tabs.
Once the site ID is known, the next required input for USGS data retrievals is the `parameter code'. This is a 5-digit code that specifies which measured parameter is being requested. A complete list of possible USGS parameter codes can be found here:
\url{http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?radio_pm_search=param_group&pm_group=All+--+include+all+parameter+groups&pm_search=&casrn_search=&srsname_search=&format=html_table&show=parameter_group_nm&show=parameter_nm&show=casrn&show=srsname&show=parameter_units}
Not every station will measure all parameters. The following is a list of commonly measured parameters:
% latex table generated in R 2.15.2 by xtable 1.7-0 package
% Mon Jan 21 16:09:09 2013
\begin{table}[ht]
\begin{center}
\caption{Commonly found USGS Parameter Codes}
\begin{tabular}{rll}
\hline
& pCode & shortName \\
\hline
1 & 00060 & Discharge [cfs] \\
2 & 00065 & Gage height [ft] \\
3 & 00010 & Temperature [C] \\
4 & 00045 & Precipitation [in] \\
5 & 00400 & pH \\
\hline
\end{tabular}
\end{center}
\end{table}
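
If you refer to the same few parameters repeatedly (for example, to label plots), a named character vector that mirrors the table above is one simple option. The name pCodeNames is hypothetical and not part of dataRetrieval; the codes are stored as strings so their leading zeros survive.
\begin{Schunk}
\begin{Sinput}
# Hypothetical lookup table (not part of dataRetrieval) mirroring the
# commonly found parameter codes; names are the 5-digit codes as strings.
pCodeNames <- c(
  "00060" = "Discharge [cfs]",
  "00065" = "Gage height [ft]",
  "00010" = "Temperature [C]",
  "00045" = "Precipitation [in]",
  "00400" = "pH"
)

# Look up a label, e.g. for an axis title
pCodeNames[["00060"]]         # "Discharge [cfs]"

# Codes missing from this short list give NA rather than an error
unname(pCodeNames["99999"])   # NA
\end{Sinput}
\end{Schunk}
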
For real-time data, the parameter code and site ID will suffice. However, the USGS stores historical data as daily values, each computed from the day's measurements with a statistic such as the daily mean. That statistic is the final requirement for daily value retrievals: a 5-digit 'stat code' specifies which one is requested. A complete list of possible USGS stat codes can be found here:
\url{http://nwis.waterdata.usgs.gov/nwis/help/?read_file=stat&format=table}
The most common stat codes are:
% latex table generated in R 2.15.2 by xtable 1.7-0 package
% Mon Jan 21 16:09:09 2013
\begin{table}[ht]
\begin{center}
\caption{Commonly found USGS Stat Codes}
\begin{tabular}{rll}
\hline
& StatCode & shortName \\
\hline
1 & 00001 & Maximum \\
2 & 00002 & Minimum \\
3 & 00003 & Mean \\
4 & 00008 & Median \\
\hline
\end{tabular}
\end{center}
\end{table}
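
Parameter codes and stat codes share the same 5-digit format, so a quick check before calling a retrieval function can catch a common mistake: typing a code as a number, which drops its leading zeros. The helper isFiveDigitCode below is a hypothetical sketch, not part of dataRetrieval.
\begin{Schunk}
\begin{Sinput}
# Hypothetical helper (not part of dataRetrieval): TRUE when a code is a
# string of exactly five digits, the format of parameter and stat codes.
isFiveDigitCode <- function(code) {
  is.character(code) & grepl("^[0-9]{5}$", code)
}

isFiveDigitCode("00060")   # TRUE  (discharge)
isFiveDigitCode("00003")   # TRUE  (mean)
isFiveDigitCode(60)        # FALSE (numeric: leading zeros already lost)
isFiveDigitCode("0006")    # FALSE (only four digits)
\end{Sinput}
\end{Schunk}
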
We will use the Choptank River near Greensboro, MD as an example. The site ID for this gage station is 01491000. Daily discharge measurements are available as far back as 1948. Additionally, forms of nitrate and nitrogen have been measured at this site since 1964.
%------------------------------------------------------------
\subsection{USGS Daily Value Retrievals}
%------------------------------------------------------------
To obtain historical daily records of USGS data, use the retrieveNWISData function. Its arguments are siteNumber, parameterCd, startDate, endDate, statCd, and a logical (TRUE/FALSE) interactive. Two arguments have defaults: statCd defaults to "00003" (the daily mean) and interactive defaults to TRUE. If you want to use the default values, you do not need to list them in the function call. Setting interactive to TRUE walks you through the function; for large batch collections, it usually makes more sense to set interactive to FALSE.
Both dates (startDate and endDate) must be in the format "YYYY-MM-DD". Setting startDate to "" requests data from the earliest available date, and setting endDate to "" requests data through the latest available date.
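
Before a long non-interactive run, it can help to confirm that each date argument is either "" or a real "YYYY-MM-DD" date. The helper isValidDateArg below is a hypothetical sketch in base R, not part of dataRetrieval.
\begin{Schunk}
\begin{Sinput}
# Hypothetical helper (not part of dataRetrieval): checks that a start or
# end date is "" (earliest/latest available) or a real "YYYY-MM-DD" date.
isValidDateArg <- function(x) {
  # Reject anything that is not a single, non-missing character string
  if (!is.character(x) || length(x) != 1 || is.na(x)) return(FALSE)
  # "" is allowed: earliest (startDate) or latest (endDate) available date
  if (x == "") return(TRUE)
  # Must look like YYYY-MM-DD ...
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) return(FALSE)
  # ... and be a real calendar date (as.Date gives NA otherwise)
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

isValidDateArg("2012-01-01")   # TRUE
isValidDateArg("")             # TRUE
isValidDateArg("01/01/2012")   # FALSE (wrong format)
isValidDateArg("2012-02-30")   # FALSE (no such day)
\end{Sinput}
\end{Schunk}
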
\begin{Schunk}
\begin{Sinput}
> # Using defaults:
> siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
> parameterCd <- "00060" # Discharge in cubic feet per second
> startDate <- ""
> endDate <- ""
> discharge <- retrieveNWISData(siteNumber, parameterCd, startDate, endDate)
\end{Sinput}
\end{Schunk}
A dataframe is returned that looks like the following:
\begin{Schunk}
\begin{Soutput}
agency site dateTime value code
1 USGS 01491000 1948-01-01 190 A
2 USGS 01491000 1948-01-02 900 A
3 USGS 01491000 1948-01-03 480 A
4 USGS 01491000 1948-01-04 210 A
5 USGS 01491000 1948-01-05 210 A
6 USGS 01491000 1948-01-06 220 A
\end{Soutput}
\end{Schunk}
The structure of the dataframe is:
\begin{Schunk}
\begin{Soutput}
'data.frame': 23762 obs. of 5 variables:
$ agency : chr "USGS" "USGS" "USGS" "USGS" ...
$ site : chr "01491000" "01491000" "01491000" "01491000" ...
$ dateTime: Date, format: "1948-01-01" "1948-01-02" ...
$ value : num 190 900 480 210 210 220 160 130 120 100 ...
$ code : chr "A" "A" "A" "A" ...
\end{Soutput}
\end{Schunk}
Note that dateTime is imported as a Date, value is a number, and code is a string. Common USGS qualification codes are "A" (approved for publication) and "P" (provisional data subject to revision). A more complete list of qualification codes can be found here:
\url{http://waterdata.usgs.gov/usa/nwis/help?codes_help}
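
Once the data are loaded, base R is enough to see which qualification codes are present and to keep only approved values. The sketch below uses a small made-up data frame with the same columns as the output above; in a real session you would use discharge instead. It assumes the code column holds a single code per row; if codes can be combined, the comparison would need adjusting.
\begin{Schunk}
\begin{Sinput}
# Illustrative data frame with the same columns as the retrieveNWISData
# output above; the values and codes are made up for this sketch.
daily <- data.frame(
  agency   = "USGS",
  site     = "01491000",
  dateTime = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03")),
  value    = c(100, 120, 110),
  code     = c("A", "A", "P"),
  stringsAsFactors = FALSE
)

# How many values carry each qualification code?
table(daily$code)

# Keep only values approved for publication ("A")
approved <- daily[daily$code == "A", ]
nrow(approved)   # 2
\end{Sinput}
\end{Schunk}
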
An example that doesn't use the defaults would be a request for maximum daily temperature in early 2012:
\begin{Schunk}
\begin{Sinput}
> # Overriding the default stat code:
> siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
> parameterCd <- "00010" # Temperature [C]
> statCd <- "00001" # Maximum
> startDate <- "2012-01-01"
> endDate <- "2012-06-30"
> temperature <- retrieveNWISData(siteNumber, parameterCd, startDate, endDate, statCd=statCd)
\end{Sinput}
\end{Schunk}
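
As mentioned above, large batch collections are easier with interactive set to FALSE. One way to organize such a run is a small wrapper that calls the retrieval function once per site, keeps going if one site fails, and stacks the results. The wrapper retrieveManySites is a hypothetical sketch, not part of dataRetrieval; it assumes the retrieval function takes the site number as its first argument, as retrieveNWISData does in the examples above.
\begin{Schunk}
\begin{Sinput}
# Hypothetical wrapper (not part of dataRetrieval): run one retrieval per
# site and stack the results. 'fetch' is the retrieval function, e.g.
# retrieveNWISData; any extra arguments are passed through to it.
retrieveManySites <- function(sites, fetch, ...) {
  results <- lapply(sites, function(site) {
    tryCatch(fetch(site, ...),
             error = function(e) {
               # Report the failure but continue with the other sites
               warning("Retrieval failed for site ", site, ": ",
                       conditionMessage(e))
               NULL
             })
  })
  # rbind drops the NULL entries left by failed sites
  do.call(rbind, results)
}

# Usage in a batch job (requires dataRetrieval and a network connection):
# sites <- c("01491000", "06934500")
# allDischarge <- retrieveManySites(sites, retrieveNWISData,
#                                   parameterCd = "00060",
#                                   startDate = "2012-01-01",
#                                   endDate = "2012-12-31",
#                                   interactive = FALSE)
\end{Sinput}
\end{Schunk}
Passing the retrieval function as an argument keeps the wrapper independent of any one function, so the same loop could be reused for other retrievals that take a site number first.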
%------------------------------------------------------------
\subsection{USGS Unit Value Retrievals}
%------------------------------------------------------------
We can also get real-time, instantaneous measurements using the retrieveUnitNWISData function:
\begin{Schunk}
\begin{Sinput}
> # Using defaults:
> siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
> parameterCd <- "00060" # Discharge in cubic feet per second
> startDate <- as.character(Sys.Date())
> endDate <- as.character(Sys.Date())
> dischargeToday <- retrieveUnitNWISData(siteNumber, parameterCd, startDate, endDate)
\end{Sinput}
\end{Schunk}
This produces the following dataframe:
\begin{Schunk}
\begin{Soutput}
agency site dateTime tzone value code
1 USGS 01491000 2013-01-21 00:00:00 EST 231 P
2 USGS 01491000 2013-01-21 00:15:00 EST 231 P
3 USGS 01491000 2013-01-21 00:30:00 EST 234 P
4 USGS 01491000 2013-01-21 00:45:00 EST 231 P
5 USGS 01491000 2013-01-21 01:00:00 EST 231 P
6 USGS 01491000 2013-01-21 01:15:00 EST 228 P
\end{Soutput}
\end{Schunk}
The structure of the dataframe is:
\begin{Schunk}
\begin{Soutput}
'data.frame': 66 obs. of 6 variables:
$ agency : chr "USGS" "USGS" "USGS" "USGS" ...
$ site : chr "01491000" "01491000" "01491000" "01491000" ...
$ dateTime: POSIXct, format: "2013-01-21 00:00:00" "2013-01-21 00:15:00" ...
$ tzone : chr "EST" "EST" "EST" "EST" ...
$ value : num 231 231 234 231 231 228 228 228 228 228 ...
$ code : chr "P" "P" "P" "P" ...
\end{Soutput}
\end{Schunk}
Note that the time of day now matters, so dateTime is a POSIXct (date-time) value, and the time zone is reported in the tzone column.
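
Because dateTime is a POSIXct value, the 15-minute readings can be summarized with base R. The sketch below rebuilds the six rows printed above and computes the mean discharge for each hour; in a real session you would use dischargeToday instead. Note that in this excerpt the 01:00 hour has only two readings, so its mean covers a partial hour.
\begin{Schunk}
\begin{Sinput}
# Rebuild the first rows of the dischargeToday output shown above
uv <- data.frame(
  dateTime = as.POSIXct(c("2013-01-21 00:00:00", "2013-01-21 00:15:00",
                          "2013-01-21 00:30:00", "2013-01-21 00:45:00",
                          "2013-01-21 01:00:00", "2013-01-21 01:15:00"),
                        tz = "EST"),
  value = c(231, 231, 234, 231, 231, 228)
)

# Label each reading with its hour, formatted in the data's own time zone
uv$hour <- format(uv$dateTime, "%Y-%m-%d %H:00")

# Mean discharge per hour: 231.75 for 00:00 and 229.5 for 01:00
hourly <- aggregate(value ~ hour, data = uv, FUN = mean)
hourly
\end{Sinput}
\end{Schunk}
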
%------------------------------------------------------------
\subsection{USGS Site Information Retrievals}
%------------------------------------------------------------
To obtain all of the available site information, use the getSiteFileData function:
\begin{Schunk}
\begin{Sinput}
> # Using defaults:
> siteNumber <- "01491000" # Site ID for Choptank River near Greensboro, MD
> ChopTankInfo <- getSiteFileData(siteNumber)
\end{Sinput}
\end{Schunk}
The site information columns available for USGS sites are:
\begin{Schunk}
\begin{Sinput}
> colnames(ChopTankInfo)
\end{Sinput}
\begin{Soutput}
[1] "agency.cd" "site.no" "station.nm"
[4] "site.tp.cd" "lat.va" "long.va"
[7] "dec.lat.va" "dec.long.va" "coord.meth.cd"
[10] "coord.acy.cd" "coord.datum.cd" "dec.coord.datum.cd"
[13] "district.cd" "state.cd" "county.cd"
[16] "country.cd" "land.net.ds" "map.nm"
[19] "map.scale.fc" "alt.va" "alt.meth.cd"
[22] "alt.acy.va" "alt.datum.cd" "huc.cd"
[25] "basin.cd" "topo.cd" "instruments.cd"
[28] "construction.dt" "inventory.dt" "drain.area.va"
[31] "contrib.drain.area.va" "tz.cd" "local.time.fg"
[34] "reliability.cd" "gw.file.cd" "nat.aqfr.cd"
[37] "aqfr.cd" "aqfr.type.cd" "well.depth.va"
[40] "hole.depth.va" "depth.src.cd" "project.no"
[43] "queryTime"
\end{Soutput}
\end{Schunk}
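
Usually only a few of these columns are needed, for example to map a site. The helper siteSummary below is a hypothetical sketch, not part of dataRetrieval. It picks the station name, decimal coordinates, and drainage area by the column names listed above, and converts through as.character first in case those columns were read as text or factors.
\begin{Schunk}
\begin{Sinput}
# Hypothetical helper (not part of dataRetrieval): pull a few commonly
# used fields out of the getSiteFileData result.
siteSummary <- function(info) {
  # Convert via character so factor or text columns become proper numbers
  toNumber <- function(x) as.numeric(as.character(x))
  data.frame(
    site      = as.character(info$site.no),
    name      = as.character(info$station.nm),
    latitude  = toNumber(info$dec.lat.va),
    longitude = toNumber(info$dec.long.va),
    drainArea = toNumber(info$drain.area.va),
    stringsAsFactors = FALSE
  )
}

# In a real session:
# siteSummary(ChopTankInfo)
\end{Sinput}
\end{Schunk}
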
%------------------------------------------------------------
\subsection{USGS Water Quality Retrievals}
%------------------------------------------------------------
In this example, we use three dataRetrieval functions to get daily streamflow data, inorganic nitrogen sample results, and site information for the USGS gaging station with ID 06934500, the Missouri River at Hermann, MO (the station name is identified in the INFO dataset).
\begin{Schunk}
......
\select@language {american}
\contentsline {section}{\numberline {1}Introduction to dataRetrieval}{2}{section.1}
\contentsline {section}{\numberline {2}Getting Started}{2}{section.2}
\contentsline {subsection}{\numberline {2.1}New to R?}{2}{subsection.2.1}
\contentsline {subsection}{\numberline {2.2}R User: Installing dataRetrieval from downloaded binary}{2}{subsection.2.2}
\contentsline {subsection}{\numberline {2.3}R Developers: Installing dataRetrieval from gitHub}{3}{subsection.2.3}
\contentsline {section}{\numberline {3}Raw Data: USGS Web Retrieval Examples}{4}{section.3}
\contentsline {subsection}{\numberline {3.1}USGS Web Retrieval Introduction}{4}{subsection.3.1}
\contentsline {subsection}{\numberline {3.2}USGS Daily Value Retrievals}{5}{subsection.3.2}
\contentsline {subsection}{\numberline {3.3}USGS Unit Value Retrievals}{6}{subsection.3.3}
\contentsline {subsection}{\numberline {3.4}USGS Site Information Retrievals}{7}{subsection.3.4}
\contentsline {subsection}{\numberline {3.5}USGS Water Quality Retrievals}{8}{subsection.3.5}
\contentsline {section}{\numberline {4}Function Details}{11}{section.4}
\contentsline {subsection}{\numberline {4.1}Daily Value Retrievals}{11}{subsection.4.1}
\contentsline {subsection}{\numberline {4.2}Water Quality Retrievals}{11}{subsection.4.2}
\contentsline {subsection}{\numberline {4.3}Site Information Retrievals}{11}{subsection.4.3}