-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Learn keys from SQLite databases #352
base: main
Are you sure you want to change the base?
Conversation
Thanks, this is amazing! If roundtrip worked, testing would be easier indeed. It feels like I'm happy to support this to have another example for a data source that can be learned from. It means a bit more work further down -- learning should lead to an intermediate format first (#229) and also work for compound keys (#3). If you would like to take a stab at defining |
It isn't specified anywhere yet: Not sure about the name yet -- do we need another |
Looking at your query, maybe the structure for the "meta dm" can be modeled after SQLite? Also, would it be difficult to rewrite as dbplyr code, perhaps using e.g. |
Yes, primary keys roundtrip, so that's a good start!
Sure, I'll give it a try!
I think this is doable, although we'd have to query SQLite Column Info Sketchlibrary(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
library(purrr)
library(DBI)
library(RSQLite)
tmp_db <- tempfile(fileext = ".db")
con <- dbConnect(SQLite(), tmp_db)
# Create Tables
dbExecute(con, "CREATE TABLE first (id INTEGER PRIMARY KEY)")
#> [1] 0
dbExecute(con, paste(
"CREATE TABLE second (",
"id INTEGER PRIMARY KEY, first_id INTEGER,",
"FOREIGN KEY (first_id) REFERENCES first (id))"
))
#> [1] 0
dbExecute(con, paste(
"CREATE TABLE third (",
"id INTEGER PRIMARY KEY, first_id INTEGER, second_id INTEGER,",
"FOREIGN KEY (first_id) REFERENCES first (id),",
"FOREIGN KEY (second_id) REFERENCES second (id))"
))
#> [1] 0
dbExecute(con, paste(
"CREATE TABLE fourth (",
"id_1 INTEGER, id_2 INTEGER, third_id INTEGER,",
"PRIMARY KEY (id_1, id_2),",
"FOREIGN KEY (third_id) REFERENCES third (id))"
))
#> [1] 0 This next part could by skipped using (tables <-
list(
tbl(con, "sqlite_master"),
tbl(con, "sqlite_temp_master")
) %>%
map(~ filter(.x, type == "table")) %>%
map_dfr(collect))
#> # A tibble: 4 x 5
#> type name tbl_name rootpage sql
#> <chr> <chr> <chr> <int> <chr>
#> 1 table first first 2 CREATE TABLE first (id INTEGER PRIMARY KEY)
#> 2 table second second 3 CREATE TABLE second ( id INTEGER PRIMARY KEY, …
#> 3 table third third 4 CREATE TABLE third ( id INTEGER PRIMARY KEY, f…
#> 4 table fourth fourth 5 CREATE TABLE fourth ( id_1 INTEGER, id_2 INTEG…
table_names <- set_names(tables$name) (columns <-
table_names %>%
map(~ tbl(con, ident_q(sprintf("pragma_table_info('%s')", .x)))) %>%
map_dfr(collect, .id = "table"))
#> # A tibble: 9 x 7
#> table cid name type notnull dflt_value pk
#> <chr> <int> <chr> <chr> <int> <lgl> <int>
#> 1 first 0 id INTEGER 0 NA 1
#> 2 second 0 id INTEGER 0 NA 1
#> 3 second 1 first_id INTEGER 0 NA 0
#> 4 third 0 id INTEGER 0 NA 1
#> 5 third 1 first_id INTEGER 0 NA 0
#> 6 third 2 second_id INTEGER 0 NA 0
#> 7 fourth 0 id_1 INTEGER 0 NA 1
#> 8 fourth 1 id_2 INTEGER 0 NA 2
#> 9 fourth 2 third_id INTEGER 0 NA 0 (foreign_keys <-
table_names %>%
map(~ tbl(con, ident_q(sprintf("pragma_foreign_key_list('%s')", .x)))) %>%
map_dfr(collect, .id = "table"))
#> # A tibble: 4 x 9
#> table id seq table from to on_update on_delete match
#> <chr> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 second 0 0 first first_id id NO ACTION NO ACTION NONE
#> 2 third 0 0 second second_id id NO ACTION NO ACTION NONE
#> 3 third 1 0 first first_id id NO ACTION NO ACTION NONE
#> 4 fourth 0 0 third third_id id NO ACTION NO ACTION NONE Created on 2020-05-11 by the reprex package (v0.3.0) |
Unfortunately, in SQLite it's not possible to modify foreign keys using Instead of implementing the 12-step table schema update method, it might be easier to include the foreign keys in the |
Ouch. We already had to force the We need to sort the tables topologically so that fact tables come before dimension tables. There's code in #319 that does this. I looked into For SQLite it feels we can collect the relevant tables from |
@gadenbuie: Would you like to take another look? The idea would be to implement |
Hi all, I know there are a few things currently in motion with respect to SQLite databases, so I'm submitting this with the complete expectation that you may want to implement this in another way. Please take whatever you find useful.
The primary change I've made is to add an
sqlite_learn_query()
that's wired up throughdb_learn_query()
and that returns a table of table and column information in same structure aspostgres_learn_query()
.I've added a small test to confirm that my changes work, and I know I'm not taking full advantage of your testing infrastructure. I tried to setup a test using
dm_nycflights_small()
but the foreign keys weren't making the round trip and I wasn't sure if this was a bug or a problem with how I was usingcopy_dm_to()
. I'd be happy to revise the tests with a little guidance.