Problem creating a data frame of all measurements from NJ
Created by: DRosenman
I am trying to use the USGS R package 'dataRetrieval' to create a data frame of all measurements from all NJ stations for the following parameters:
00003 (Sampling depth, feet)
00010 (Temperature, water, degrees Celsius)
00070 (Turbidity, water, unfiltered, Jackson Turbidity)
00075 (Turbidity, water, unfiltered, Hellige turbidimeter, milligrams per liter as silicon dioxide)
00095 (Specific conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius)
00300 (Dissolved oxygen, water, unfiltered, milligrams per liter)
00301 (Dissolved oxygen, water, unfiltered, percent of saturation)
00400 (pH, water, unfiltered, field, standard units)
00633 (Nitrate plus nitrite, bed sediment, total, dry weight, milligrams per kilogram as nitrogen)
32284 (Chlorophyll a, total, in situ, fluorescence excitation at 370, 470, 525, 570, 590, 610 nm, fluorescence emission at 700 nm with correction for CDOM)
32285 (Chlorophyll a, green algae, in situ, fluorescence excitation at 370, 470, 525, 570, 590, 610 nm, fluorescence emission at 700 nm)
63680 (Turbidity, water, unfiltered, monochrome near infra-red LED light, 780-900 nm, detection)
I was able to create a vector of site values for each NJ station using the following R code:
library(dataRetrieval)
nj_sites <- whatNWISsites(stateCD = "NJ") #dataframe of stations in NJ
site_nos <- nj_sites$site_no #vector of station ID numbers
Before trying to get every individual measurement, I figured I would test getting daily mean measurements from each station from January 1st, 2012 to June 1st, 2012. I assumed that, since I used 12 parameters, the following code (combined with the code above) would produce a data.frame with 15 columns (12 for the parameters, plus the agency_cd, site_no, and Date columns). Instead, it produced a data.frame with 55 variables.
parameters <- c('00003', '00010', '00070', '00075', '00095', '00300', '00301', '00400', '00633',
                '32284', '32285', '63680')
stat <- c("00003") # statistic code for the daily mean
startDate <- "2012-01-01"
endDate <- "2012-06-01"
parameter_info <- readNWISpCode(parameters)$parameter_nm # full parameter names
station_data <- readNWISdv(siteNumber = site_nos,
                           parameterCd = parameters,
                           startDate = startDate,
                           endDate = endDate,
                           statCd = stat)
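To compare my 15-column expectation against what actually came back, here is a minimal sketch that sorts the column names by type. The names in `cols` are a small subset copied from the summary output further down, not the full set of 55. The interpretation is my assumption, not something I have confirmed in the package documentation: a trailing `_cd` appears to mark a qualifier column paired with each value column, and any text between `X_` and the parameter code appears to label a separate sensor or location at a site.

```r
# A few column names copied from the summary output (subset, for illustration)
cols <- c("agency_cd", "site_no", "Date",
          "X_00010_00003", "X_00010_00003_cd",
          "X_.from.right.intake_00010_00003",
          "X_.from.right.intake_00010_00003_cd",
          "X_Pennsylvania.side_00095_00003",
          "X_Pennsylvania.side_00095_00003_cd",
          "X_63680_00003", "X_63680_00003_cd")

# Assumed convention: measurement columns start with "X_", and the ones
# ending in "_cd" are qualifier columns rather than values
is_measure   <- grepl("^X_", cols)
is_qualifier <- grepl("^X_.*_cd$", cols)
value_cols   <- cols[is_measure & !is_qualifier]

# Pull out the 5-digit parameter code that sits just before the statistic code
param <- sub(".*_([0-9]{5})_[0-9]{5}$", "\\1", value_cols)

# Number of value columns per parameter code
table(param)
```

Running the same steps on `names(station_data)` instead of `cols` should show how many value columns each parameter contributes, which would at least tell me whether the extra columns come from qualifiers, from multiple sensors per parameter, or both.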
There is clearly something I am not understanding about the dataRetrieval package. Does anyone know what I'm doing wrong?
Also, other than the agency_cd, site_no, Date, X_00010_00003, and X_00010_00003_cd columns, every column has at least 60% NA values, and most have over 90% NA values.
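For reference, this is roughly how the NA share per column can be computed. The sketch uses a made-up toy data frame (`toy`, with invented values) as a stand-in for `station_data`, so that it runs without downloading anything.

```r
# Toy stand-in for station_data; the values are invented for illustration
toy <- data.frame(
  site_no       = c("A", "A", "B", "B"),
  X_00010_00003 = c(4.0, 9.3, NA, 14.3),
  X_00300_00003 = c(NA, NA, NA, 10.7)
)

# Fraction of NA values in each column
na_share <- colMeans(is.na(toy))

# Columns where at least 60% of the rows are NA
sparse_cols <- names(na_share)[na_share >= 0.6]
sparse_cols
```

Replacing `toy` with `station_data` gives the same breakdown for the real result.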
Here's a summary of the data.frame:
summary(station_data)
agency_cd site_no Date X_00010_00003 X_00010_00003_cd X_.from.right.intake_00010_00003
Length:1892 Length:1892 Min. :2012-01-01 Min. :-0.100 Length:1892 Min. : 1.30
Class :character Class :character 1st Qu.:2012-02-15 1st Qu.: 4.000 Class :character 1st Qu.: 4.70
Mode :character Mode :character Median :2012-03-24 Median : 9.300 Mode :character Median :11.20
Mean :2012-03-21 Mean : 9.689 Mean :10.46
3rd Qu.:2012-04-28 3rd Qu.:14.300 3rd Qu.:15.30
Max. :2012-06-01 Max. :26.400 Max. :24.70
NA's :303 NA's :1739
X_.from.right.intake_00010_00003_cd X_.from.middle.intake_00010_00003 X_.from.middle.intake_00010_00003_cd X_.from.left.intake_00010_00003
Length:1892 Min. : 1.00 Length:1892 Min. : 1.3
Class :character 1st Qu.: 4.70 Class :character 1st Qu.: 4.6
Mode :character Median :11.20 Mode :character Median :11.0
Mean :10.43 Mean :10.3
3rd Qu.:15.40 3rd Qu.:15.4
Max. :24.90 Max. :24.2
NA's :1739 NA's :1739
X_.from.left.intake_00010_00003_cd X_.from.right.intake_00095_00003 X_.from.right.intake_00095_00003_cd X_.from.middle.intake_00095_00003
Length:1892 Min. :322 Length:1892 Min. :275.0
Class :character 1st Qu.:463 Class :character 1st Qu.:434.0
Mode :character Median :570 Mode :character Median :482.0
Mean :563 Mean :472.3
3rd Qu.:640 3rd Qu.:517.0
Max. :877 Max. :816.0
NA's :1739 NA's :1739
X_.from.middle.intake_00095_00003_cd X_.from.left.intake_00095_00003 X_.from.left.intake_00095_00003_cd X_.from.right.intake_00300_00003
Length:1892 Min. :225.0 Length:1892 Min. : 3.90
Class :character 1st Qu.:344.0 Class :character 1st Qu.: 7.60
Mode :character Median :411.0 Mode :character Median :11.50
Mean :399.2 Mean :10.07
3rd Qu.:437.0 3rd Qu.:12.30
Max. :784.0 Max. :13.10
NA's :1739 NA's :1739
X_.from.right.intake_00300_00003_cd X_.from.middle.intake_00300_00003 X_.from.middle.intake_00300_00003_cd X_.from.left.intake_00300_00003
Length:1892 Min. : 4.1 Length:1892 Min. : 5.00
Class :character 1st Qu.: 7.6 Class :character 1st Qu.: 8.30
Mode :character Median :12.2 Mode :character Median :12.80
Mean :10.7 Mean :11.34
3rd Qu.:12.9 3rd Qu.:13.40
Max. :14.0 Max. :14.30
NA's :1739 NA's :1739
X_.from.left.intake_00300_00003_cd X_.from.right.intake_00301_00003 X_.from.right.intake_00301_00003_cd X_.from.middle.intake_00301_00003
Length:1892 Min. : 48.00 Length:1892 Min. : 50.00
Class :character 1st Qu.: 72.50 Class :character 1st Qu.: 73.75
Mode :character Median : 92.00 Mode :character Median : 98.00
Mean : 87.33 Mean : 93.02
3rd Qu.: 98.00 3rd Qu.:105.00
Max. :126.00 Max. :130.00
NA's :1740 NA's :1740
X_.from.middle.intake_00301_00003_cd X_.from.left.intake_00301_00003 X_.from.left.intake_00301_00003_cd X_.from.right.intake_32284_00003
Length:1892 Min. : 60.00 Length:1892 Min. : 0.900
Class :character 1st Qu.: 83.00 Class :character 1st Qu.: 4.200
Mode :character Median :101.00 Mode :character Median : 6.000
Mean : 98.57 Mean : 6.323
3rd Qu.:110.00 3rd Qu.: 7.600
Max. :133.00 Max. :17.300
NA's :1740 NA's :1739
X_.from.right.intake_32284_00003_cd X_.from.left.intake_32284_00003 X_.from.left.intake_32284_00003_cd X_.from.right.intake_63680_00003
Length:1892 Min. : 0.10 Length:1892 Min. : 3.300
Class :character 1st Qu.: 2.90 Class :character 1st Qu.: 4.800
Mode :character Median : 4.45 Mode :character Median : 6.500
Mean :13.28 Mean : 7.905
3rd Qu.:16.38 3rd Qu.: 9.800
Max. :66.20 Max. :22.400
NA's :1742 NA's :1739
X_.from.right.intake_63680_00003_cd X_.from.middle.intake_63680_00003 X_.from.middle.intake_63680_00003_cd X_.from.left.intake_63680_00003
Length:1892 Min. : 2.600 Length:1892 Min. : 1.40
Class :character 1st Qu.: 3.700 Class :character 1st Qu.: 2.00
Mode :character Median : 5.500 Mode :character Median : 4.40
Mean : 6.682 Mean : 4.85
3rd Qu.: 8.700 3rd Qu.: 7.00
Max. :21.000 Max. :16.30
NA's :1739 NA's :1739
X_.from.left.intake_63680_00003_cd X_00300_00003 X_00300_00003_cd X_00095_00003 X_00095_00003_cd X_Pennsylvania.side_00010_00003
Length:1892 Min. : 7.00 Length:1892 Min. : 83.0 Length:1892 Min. : 0.100
Class :character 1st Qu.: 9.50 Class :character 1st Qu.:183.0 Class :character 1st Qu.: 3.700
Mode :character Median :10.70 Mode :character Median :239.0 Mode :character Median : 9.800
Mean :11.04 Mean :265.7 Mean : 9.311
3rd Qu.:13.10 3rd Qu.:345.0 3rd Qu.:14.425
Max. :14.80 Max. :803.0 Max. :23.200
NA's :1442 NA's :1175 NA's :1742
X_Pennsylvania.side_00010_00003_cd X_Pennsylvania.side_00095_00003 X_Pennsylvania.side_00095_00003_cd X_Pennsylvania.side_00300_00003
Length:1892 Min. : 94.0 Length:1892 Min. : 8.40
Class :character 1st Qu.:162.2 Class :character 1st Qu.:10.85
Mode :character Median :183.0 Mode :character Median :13.50
Mean :183.6 Mean :12.52
3rd Qu.:205.8 3rd Qu.:14.10
Max. :250.0 Max. :15.00
NA's :1742 NA's :1745
X_Pennsylvania.side_00300_00003_cd X_00301_00003 X_00301_00003_cd X_Pennsylvania.side_63680_00003 X_Pennsylvania.side_63680_00003_cd X_63680_00003
Length:1892 Min. : 68.0 Length:1892 Min. : 0.100 Length:1892 Min. : 1.300
Class :character 1st Qu.: 96.0 Class :character 1st Qu.: 0.800 Class :character 1st Qu.: 2.800
Mode :character Median :101.0 Mode :character Median : 1.400 Mode :character Median : 5.000
Mean :101.1 Mean : 3.666 Mean : 8.732
3rd Qu.:104.0 3rd Qu.: 2.675 3rd Qu.: 11.225
Max. :145.0 Max. :90.100 Max. :131.000
NA's :1298 NA's :1746 NA's :1408
X_63680_00003_cd
Length:1892
Class :character
Mode :character
Thanks in advance for your help.