hands-on.qmd (15 additions & 9 deletions)
@@ -50,7 +50,9 @@ species_csv %>%
 summarize(num_species = n())
 ```
 
-We are interested in the `Study species` because according to the metadata they are the species that are included in the data sets for banding, resighting, and/or nest monitoring. Let us extract the species and sort them in alphabetical order:
+We are interested in the `Study species` because according to the metadata, they are the species that are included in the data sets for banding, resighting, and/or nest monitoring.
+
+Let us extract the species and sort them in alphabetical order:
 
 ```{r}
 # list of the bird species included in the study
@@ -84,7 +86,7 @@ How do we join those tables?
 glimpse(eggs_csv)
 ```
 
-`Nest_Id` seems promising as a foreign key!!
+`Nest_Id` seems like a promising foreign key!!
 
 ```{r}
 glimpse(nests_csv)
@@ -94,9 +96,13 @@ glimpse(nests_csv)
 
 OK let's do it:
 
-First compute the average of the volume of an egg. we can use the following formula:
+First, we need to compute the average of the volume of an egg. We can use the following formula:
-This database has been built from the csv files we just manipulated, so the data should be very similar - note we did not say identical more on this in the last section:
+This database has been built from the CSV files we just analyzed, so the data should be very similar (note we did not say identical; more on this in the last section):
-Note that those are not data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL to the database, retrieving results, etc.
+Note that those are not data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.
 
 #### How can I get a "real data frame?"
 
-you add `collect()` to your query.
+You add `collect()` to your query.
 
 ```{r}
 species_db %>%
@@ -172,7 +179,7 @@ species_db %>%
 
 Note that this means the full query will be run and its result saved in your environment. This might slow things down, so you generally want to collect on the smallest data frame you can.
 
-#### How can you see the SQL query equivalent to the tidyverse code?
+#### How can you see the SQL query equivalent to the tidyverse code? => `show_query()`
 
 ```{r}
 # Add show_query() to the end to see what SQL it is sending!
You might be wondering how we created this database from our CSV files. Most databases have some function to help you import CSV files. Note that since there are neither data modeling constraints (the data do not have to be normalized or tidy) nor data type constraints, a lot of things can go wrong. This is a great opportunity to implement QA/QC on your data and to help you keep it clean and tidy moving forward as new data are collected.
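
As a rough illustration of that QA/QC idea, here is a minimal sketch of checking a CSV before loading it into a database. It is not how the tutorial's database was built: the in-memory SQLite backend (via `DBI` and `RSQLite`), the toy rows, and the column names `Code`, `Common_name`, and `Study_species` are all assumptions made for this example.

```r
# Sketch only: RSQLite and the toy data below are assumptions,
# not the tutorial's actual database setup.
library(DBI)

# Write a small hypothetical CSV with two deliberate problems
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Code,Common_name,Study_species",
  "amgp,American Golden-Plover,yes",
  "basa,Bairds Sandpiper,yes",
  "amgp,American Golden-Plover,yes",  # duplicated key
  ",Unknown bird,no"                  # missing key
), csv_path)

# Treat empty fields as missing so they can be detected
raw <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# QA/QC: the key column must be present and unique before loading
bad_key  <- is.na(raw$Code) | duplicated(raw$Code)
clean    <- raw[!bad_key, ]
rejected <- raw[bad_key, ]

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Declaring the table first lets the database enforce the key
# and NOT NULL constraints on anything appended later
dbExecute(con, "CREATE TABLE species (
  Code TEXT PRIMARY KEY,
  Common_name TEXT NOT NULL,
  Study_species TEXT
)")
dbWriteTable(con, "species", clean, append = TRUE)

# Count what actually made it into the database
n_loaded <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM species")$n

dbDisconnect(con)
```

Keeping the rejected rows in their own data frame, rather than silently dropping them, makes it easier to report problems back to whoever collected the data.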