Data set hosted by the NSF Arctic Data Center (<https://arcticdata.io>)
Field data on shorebird ecology and environmental conditions were collected from 1993-2014 at 16 field sites in Alaska, Canada, and Russia.
Data were not collected in every year at all sites. Studies of the population ecology of these birds included nest-monitoring to determine timing of reproduction and reproductive success; live capture of birds to collect blood samples, feathers, and fecal samples for investigations of population structure and pathogens; banding of birds to determine annual survival rates; resighting of color-banded birds to determine space use and site fidelity; and use of light-sensitive geolocators to investigate migratory movements. Data on climatic conditions, prey abundance, and predators were also collected. Environmental data included weather stations that recorded daily climatic conditions, surveys of seasonal snowmelt, weekly sampling of terrestrial and aquatic invertebrates that are prey of shorebirds, live trapping of small mammals (alternate prey for shorebird predators), and daily counts of potential predators (jaegers, falcons, foxes). Detailed field methods for each year are available in the ASDN_protocol_201X.pdf files. All research was conducted under permits from relevant federal, state and university authorities.
See `01_ASDN_Readme.txt` provided in the `data` folder for full metadata information about this data set.
## Analyzing the bird dataset using csv files (raw data)
Let us import the csv files with the bird species information:
```{r}
library(tidyverse)  # for read_csv(), glimpse(), and the pipe, if not already loaded

# Import the species information
species_csv <- read_csv("data/species.csv")

glimpse(species_csv)
```
Let's explore what is in the `Relevance` attribute/column:
```{r}
species_csv %>%
  group_by(Relevance) %>%
  count()
```
We are interested in the `Study species` because, according to the metadata, these are the species included in the data sets for banding, resighting, and/or nest monitoring. Let us extract those species and sort them in alphabetical order:
```{r}
# List of the bird species included in the study
species_study <- species_csv %>%
  filter(Relevance == "Study species") %>%
  select(Scientific_name, Code) %>%
  arrange(Scientific_name)

species_study
```
We would like to know the average egg size for each of those bird species. How would we do that?
We will need more information than what we have in our species table; we will also need to retrieve information from the nest and egg monitoring tables.
An egg is in a nest, and a nest is associated with a species.
```{r}
# Information about the nests
nests_csv <- read_csv("data/ASDN_Bird_nests.csv")

# Information about the eggs
eggs_csv <- read_csv("data/ASDN_Bird_eggs.csv")
```
How do we join those tables?
```{r}
glimpse(eggs_csv)
```
`Nest_Id` seems promising as a foreign key!!
```{r}
glimpse(nests_csv)
```
`Species` is probably the field we will use to join the nests to the species.
OK, let's do it:
First, let's compute the average volume of an egg for each species. We can use the following formula:
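One common choice (an assumption here, not necessarily the exact formula used in the original lesson) is Hoyt's (1979) approximation, V ≈ 0.51 × L × B², where L is the egg length and B its breadth. Assuming the measurements are stored as `Egg_length` and `Egg_width` (hypothetical column names; check `01_ASDN_Readme.txt`), the per-species averages could be computed like this:

```{r}
# Hoyt (1979) approximation of egg volume: V ~ 0.51 * length * breadth^2
# Egg_length and Egg_width are assumed column names -- check the metadata
species_egg_volume_avg <- eggs_csv %>%
  mutate(volume = 0.51 * Egg_length * Egg_width^2) %>%
  # Nest_Id links eggs to nests; Species comes from the nests table
  left_join(nests_csv %>% select(Nest_Id, Species), by = "Nest_Id") %>%
  group_by(Species) %>%
  summarize(avg_volume = mean(volume, na.rm = TRUE))
```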
Then we can join those per-species averages back to our table of study species. Note the species code is stored in the `Code` column of the species table and in the `Species` column coming from the nests table:

```{r}
species_egg_size_avg <- species_study %>%
  inner_join(species_egg_volume_avg, by = join_by(Code == Species))

species_egg_size_avg
```
## Let's connect to our first database
### Load the bird database
This database has been built from the csv files we just manipulated, so the data should be very similar. Note we did not say identical; more on this in the last section:
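A minimal sketch of the connection step, assuming a DuckDB database (the `shutdown = TRUE` argument used when disconnecting later is DuckDB-specific); the file name `data/bird_database.db` and the table name `species` are assumptions:

```{r}
library(DBI)
library(duckdb)

# Open a connection to the database file
conn <- dbConnect(duckdb::duckdb(), dbdir = "data/bird_database.db")

# List the tables available in the database
dbListTables(conn)

# Create a dbplyr reference to the species table
species <- tbl(conn, "species")
```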
### Let's try to reproduce the analysis we just did
```{r}
species %>%
  group_by(Relevance) %>%
  summarize(num_species = n())
```

Note that those are not dataframes but tables. What `dbplyr` is actually doing behind the scenes is translating your tidyverse code into SQL and running it in the database.
#### How can I get a "real dataframe"?
You add `collect()` to your query.
```{r}
species %>%
  group_by(Relevance) %>%
  summarize(num_species = n()) %>%
  collect()
```
Note this means the full query is going to be run and its result saved in your memory. This might slow things down, so you generally want to collect the smallest data frame you can.
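For example (a sketch reusing the `Study species` filter from earlier), filtering in the database before collecting keeps the amount of transferred data small:

```{r}
# The filter runs in the database; only the matching rows are
# pulled into an R data frame
species %>%
  filter(Relevance == "Study species") %>%
  collect()
```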
#### How can you see the SQL query equivalent to the tidyverse code?
```{r}
species %>%
  head(3) %>%
  show_query()
```
This is a great way to start getting familiar with the SQL syntax because, although you can do a lot with `dbplyr`, you cannot do everything that SQL can do. So at some point you might want to start using SQL directly.
Here is how you could run the query using the SQL code directly:
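A minimal sketch, assuming the table is named `species` in the database; `DBI::dbGetQuery()` sends the SQL to the database and returns the result as a regular data frame:

```{r}
# Run raw SQL against the connection and get a data frame back
DBI::dbGetQuery(conn,
  "SELECT Relevance, COUNT(*) AS num_species
     FROM species
    GROUP BY Relevance")
```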
Does that code look familiar? But this time, here is really the query that was used to retrieve this information:
```{r}
species %>%
  group_by(Relevance) %>%
  summarize(num_species = n()) %>%
  show_query()
```
Another example, this time translating a `mutate()`:

```{r}
species %>%
  mutate(Code = paste("X", Code)) %>%
  show_query()
```
Limitation: there is no way to add or update data; `dbplyr` is view only. If you want to add or update data, you'll need to use the `DBI` package functions.
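As a sketch of what that looks like with `DBI` (the data frame `new_species_df` is hypothetical):

```{r}
# Append rows from an R data frame to an existing table
# (new_species_df is a hypothetical data frame with matching columns)
DBI::dbAppendTable(conn, "species", new_species_df)

# Or modify rows with an SQL statement (the values here are illustrative)
DBI::dbExecute(conn,
  "UPDATE species SET Relevance = 'Study species' WHERE Code = 'xxxx'")
```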
### Disconnecting from the database
Before we close our session, it is good practice to disconnect from the database first:
```{r}
DBI::dbDisconnect(conn, shutdown = TRUE)
```
## How did we create this database?
You might be wondering how we created this database from our csv files. Most databases have functions to help you import csv files. Note that since there are no data modeling constraints (the data do not have to be normalized or tidy) nor data type constraints, a lot of things can go wrong. This is a great opportunity to implement QA/QC on your data and to help keep them clean and tidy moving forward as new data are collected.
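As a sketch of the idea with DuckDB (the database file name matches the assumption used above; `read_csv_auto` is DuckDB's csv reader, which infers column types):

```{r}
conn <- DBI::dbConnect(duckdb::duckdb(), dbdir = "data/bird_database.db")

# Create a table directly from a csv file, letting DuckDB infer the types
DBI::dbExecute(conn,
  "CREATE TABLE species AS SELECT * FROM read_csv_auto('data/species.csv')")

DBI::dbDisconnect(conn, shutdown = TRUE)
```

Inspecting the inferred column types afterwards is a good first QA/QC step.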