
Commit fafd872 ("word smithing"), 1 parent: b12909d

File tree

1 file changed: +15, -9 lines

hands-on.qmd

Lines changed: 15 additions & 9 deletions
@@ -50,7 +50,9 @@ species_csv %>%
   summarize(num_species = n())
 ```
 
-We are interested in the `Study species` because according to the metadata they are the species that are included in the data sets for banding, resighting, and/or nest monitoring. Let us extract the species and sort them in alphabetical order:
+We are interested in the `Study species` because according to the metadata, they are the species that are included in the data sets for banding, resighting, and/or nest monitoring.
+
+Let us extract the species and sort them in alphabetical order:
 
 ```{r}
 # list of the bird species included in the study
@@ -84,7 +86,7 @@ How do we join those tables?
 glimpse(eggs_csv)
 ```
 
-`Nest_Id` seems promising as a foreign key!!
+`Nest_ID` seems like a promising foreign key!
 
 ```{r}
 glimpse(nests_csv)
@@ -94,9 +96,13 @@ glimpse(nests_csv)
 
 OK, let's do it:
 
-First compute the average of the volume of an egg. we can use the following formula:
+First, we need to compute the average volume of an egg. We can use the following formula:
+
+$Volume = \frac{\pi}{6} W^2 L$
+
+where $W$ is the width and $L$ is the length of the egg.
 
-$\frac{\Pi}6W^2L$
+We can use `mutate()` to do so:
 
 ```{r}
 eggs_area_df <- eggs_csv %>%
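As a quick sanity check of the volume formula in the hunk above (the numbers are illustrative, not taken from the data set): for an egg 30 mm wide and 40 mm long,

$V = \frac{\pi}{6} W^2 L = \frac{\pi}{6} \times 30^2 \times 40 \approx 18\,850\ \text{mm}^3$

which is a plausible volume for a medium-sized shorebird egg.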
@@ -114,6 +120,7 @@ species_egg_volume_avg <- left_join(nests_csv, eggs_area_df, by="Nest_ID") %>%
 
 species_egg_volume_avg
 ```
+
 Ideally we would like the scientific names...
 
 ```{r}
@@ -128,7 +135,7 @@ species_egg_area_avg
 
 ### Load the bird database
 
-This database has been built from the csv files we just manipulated, so the data should be very similar - note we did not say identical more on this in the last section:
+This database has been built from the csv files we just analyzed, so the data should be very similar (note we did not say identical; more on this in the last section):
 
 ```{r}
 conn <- dbConnect(duckdb::duckdb(), dbdir = "./data/bird_database.duckdb", read_only = FALSE)
@@ -155,11 +162,11 @@ species_db %>%
 head(3)
 ```
 
-Note that those are not data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL to the database, retrieving results, etc.
+Note that those are not data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.
 
 #### How can I get a "real data frame"?
 
-you add `collect()` to your query.
+You add `collect()` to your query.
 
 ```{r}
 species_db %>%
@@ -172,7 +179,7 @@ species_db %>%
 
 Note this means the full query is going to be run and saved in your environment. This might slow things down, so you generally want to collect on the smallest data frame you can.
 
-#### How can you see the SQL query equivalent to the tidyverse code?
+#### How can you see the SQL query equivalent to the tidyverse code? => `show_query()`
 
 ```{r}
 # Add show_query() to the end to see what SQL it is sending!
@@ -285,4 +292,3 @@ DBI::dbDisconnect(conn, shutdown = TRUE)
 
 You might be wondering how we created this database from our csv files. Most databases have some function to help you import csv files. Note that since there are no data modeling constraints (the data does not have to be normalized or tidy) nor data type constraints, a lot of things can go wrong. This is a great opportunity to implement QA/QC on your data and help you keep it clean and tidy moving forward as new data are collected.
 
-
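The import-plus-QA/QC workflow described in that last paragraph can be sketched in R with `duckdb`'s DBI interface (a minimal sketch, not the authors' actual build script; the csv path and the `Species_Code` column name are illustrative assumptions):

```{r}
# Sketch only: one way to build the DuckDB database from the csv files.
# File, table, and column names here are assumptions, not from the commit.
library(duckdb)   # also loads DBI
library(readr)

conn <- dbConnect(duckdb::duckdb(), dbdir = "./data/bird_database.duckdb")

# Read a csv and copy it into the database as a table
species_csv <- read_csv("./data/species.csv")
dbWriteTable(conn, "species", species_csv, overwrite = TRUE)

# A simple QA/QC check at import time: the key column we plan to join on
# should have no missing values (`Species_Code` is a hypothetical name)
stopifnot(!any(is.na(species_csv$Species_Code)))

DBI::dbDisconnect(conn, shutdown = TRUE)
```

Running a few `stopifnot()`-style checks like this at import time is one way to catch malformed rows before they reach the database.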