
Commit 6c4aff1

Add data modeling info, not done yet
1 parent fafd872 commit 6c4aff1


1 file changed (+26, -1 lines)


hands-on.qmd

Lines changed: 26 additions & 1 deletion

## How did we create this database?

You might be wondering how we created this database from our CSV files. Most databases provide some way to import CSV files directly into tables. Note that a CSV file carries no data modeling constraints (the data do not have to be normalized or tidy) and no data type constraints, so a lot of things can go wrong during import. This is a great opportunity to implement QA/QC on your data and to help keep it clean and tidy moving forward as new data are collected. As an example, here is how the `Bird_eggs` table was declared and loaded:

```{sql eval=FALSE}
-- Declare the table, its column types, and its constraints before loading
CREATE TABLE Bird_eggs (
    Book_page VARCHAR,
    Year INTEGER NOT NULL CHECK (Year BETWEEN 1950 AND 2015),
    Site VARCHAR NOT NULL,
    FOREIGN KEY (Site) REFERENCES Site (Code),
    Nest_ID VARCHAR NOT NULL,
    FOREIGN KEY (Nest_ID) REFERENCES Bird_nests (Nest_ID),
    Egg_num INTEGER NOT NULL CHECK (Egg_num BETWEEN 1 AND 20),
    Length FLOAT NOT NULL CHECK (Length > 0 AND Length < 100),
    Width FLOAT NOT NULL CHECK (Width > 0 AND Width < 100),
    PRIMARY KEY (Nest_ID, Egg_num)
);

-- Load the CSV (its first line holds the column names) into the declared table
COPY Bird_eggs FROM 'ASDN_Bird_eggs.csv' (header TRUE);
```

DuckDB's `COPY` SQL command reads a CSV file into an existing database table. Had we not declared the table first, we could instead have had DuckDB create one directly from the file (e.g., `CREATE TABLE Bird_eggs AS SELECT * FROM read_csv('ASDN_Bird_eggs.csv')`), in which case it would have inferred the column names and data types on its own. By declaring the table explicitly, we are able to characterize the data much more precisely. Notable in the above:

- `NOT NULL` indicates that missing values are not allowed.
- `CHECK` constraints (e.g., `Egg_num BETWEEN 1 AND 20`) express expectations about the data; any row that violates them is rejected.
- A `FOREIGN KEY` declares that each value must refer to an existing value in another table (e.g., every `Site` must match a `Code` in the `Site` table).
- A `PRIMARY KEY` identifies a column, or combination of columns (here, `Nest_ID` and `Egg_num`), whose values must be unique across all rows and therefore serve as a row identifier.
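Because a raw CSV enforces none of these rules, the same expectations can also be run as QA/QC queries against an unconstrained copy of the data, which lists every offending row instead of just failing the load. The sketch below is only an illustration: `Eggs_staging` is a hypothetical staging table filled with a few made-up rows (not the actual ASDN data), and it mirrors only the `NOT NULL`, `CHECK`, and `PRIMARY KEY` rules declared on `Bird_eggs`.

```{sql eval=FALSE}
-- Hypothetical staging table with no constraints, like a raw CSV import
CREATE TABLE Eggs_staging (
    Nest_ID VARCHAR,
    Egg_num INTEGER,
    Length  FLOAT,
    Width   FLOAT
);

-- Made-up example rows, several of which break the Bird_eggs rules
INSERT INTO Eggs_staging VALUES
    ('n1',  1, 40.5, 30.1),   -- clean row
    ('n1',  1, 41.0, 29.8),   -- duplicate (Nest_ID, Egg_num) key
    ('n2', 25, 39.0, 28.0),   -- Egg_num outside 1..20
    (NULL,  2, 38.2, 27.5),   -- missing Nest_ID
    ('n3',  1, -1.0, 27.0);   -- non-positive Length

-- Rows that would violate the NOT NULL and CHECK constraints
SELECT *
FROM Eggs_staging
WHERE Nest_ID IS NULL
   OR Egg_num IS NULL OR Egg_num NOT BETWEEN 1 AND 20
   OR Length  IS NULL OR Length <= 0 OR Length >= 100
   OR Width   IS NULL OR Width  <= 0 OR Width  >= 100;

-- Keys that would violate PRIMARY KEY (Nest_ID, Egg_num)
SELECT Nest_ID, Egg_num, COUNT(*) AS n_rows
FROM Eggs_staging
GROUP BY Nest_ID, Egg_num
HAVING COUNT(*) > 1;
```

The first query returns the three rows with a missing `Nest_ID`, an out-of-range `Egg_num`, or a non-positive `Length`; the second returns the duplicated key `('n1', 1)`.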

Understand that a table declaration serves as more than documentation: the database actually enforces the declared constraints.
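Foreign keys are a good example: loading an egg whose `Nest_ID` does not appear in `Bird_nests` should make the load fail. One way to find such orphaned rows ahead of time is an anti-join. This is a minimal sketch that uses hypothetical tables `Nests_demo` and `Eggs_raw` with made-up rows, standing in for `Bird_nests` and the raw egg data.

```{sql eval=FALSE}
-- Hypothetical stand-ins for Bird_nests and the raw egg data (made-up rows)
CREATE TABLE Nests_demo (Nest_ID VARCHAR PRIMARY KEY);
CREATE TABLE Eggs_raw (Nest_ID VARCHAR, Egg_num INTEGER);

INSERT INTO Nests_demo VALUES ('n1'), ('n2');
INSERT INTO Eggs_raw VALUES ('n1', 1), ('n2', 1), ('n9', 1);  -- 'n9' has no nest

-- Anti-join: keep eggs whose Nest_ID matches no row in Nests_demo
SELECT e.*
FROM Eggs_raw AS e
LEFT JOIN Nests_demo AS n ON e.Nest_ID = n.Nest_ID
WHERE n.Nest_ID IS NULL;
```

Here only `('n9', 1)` is returned. An egg with a `NULL` `Nest_ID` would also show up, since it matches no nest; in `Bird_eggs` such a row is already rejected by `NOT NULL`.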
