ENH: New internal dm_meta() for learning a data model from the database, for now for SQL Server only (#342) #517

Merged on May 28, 2022 (87 commits; changes shown from 72 commits)
Commits
8f1f849
Draft dm_meta()
krlmlr Apr 29, 2021
2ae1a06
Show <- again in rigg()
krlmlr Apr 29, 2021
e56ee25
Copy from other branch
krlmlr Apr 29, 2021
5682b27
Tweak
krlmlr Apr 29, 2021
82c9624
Keep old behavior for now
krlmlr Apr 29, 2021
aed5884
Add column selection, remove unneeded tables
krlmlr Apr 29, 2021
83eef21
Select later
krlmlr Apr 29, 2021
f254e6e
Postgres, PK and FK
krlmlr Apr 29, 2021
1d5f0bc
Split function
krlmlr Apr 29, 2021
44c9205
key_dm
krlmlr Apr 30, 2021
2551eea
More keys
krlmlr Apr 30, 2021
161e45f
Now try MSSQL
krlmlr Apr 30, 2021
7d19bf8
Special-case Postgres
krlmlr Apr 30, 2021
f569478
Works
krlmlr Apr 30, 2021
a5f37cf
Works now
krlmlr Apr 30, 2021
3820f7a
NA catalog
krlmlr Apr 30, 2021
f8e889b
Forgot
krlmlr Apr 30, 2021
1862cd5
Global
krlmlr Apr 30, 2021
52ce1e9
WAT
krlmlr Apr 30, 2021
92403bb
Bugfix
krlmlr Apr 30, 2021
0a8b96f
Fix PK
krlmlr Apr 30, 2021
271f1a3
Examine
krlmlr Apr 30, 2021
fa51767
Bump
krlmlr Apr 30, 2021
c1d68a2
Example script
krlmlr May 2, 2021
1d489da
Merge branch 'main' into f-learn-compound-mssql
krlmlr May 3, 2021
1668d58
Merge branch 'main' into f-learn-compound-mssql
krlmlr May 11, 2021
37dad4a
Merge branch 'main' into f-learn-compound-mssql
krlmlr Jun 20, 2021
ca81453
Merge branch 'main' into f-learn-compound-mssql
krlmlr Jul 5, 2021
fe8cce8
Get con
krlmlr Jul 5, 2021
5738c31
Merge branch 'f-new-dm2' into f-learn-compound-mssql
krlmlr Jul 5, 2021
e4dfaff
FIXME and formatting
krlmlr Jul 5, 2021
e4a0c3c
Implement filter_dm_meta()
krlmlr Jul 5, 2021
b086df7
Merge branch 'f-new-dm2' (early part) into f-learn-compound-mssql
krlmlr Jul 5, 2021
d1b2091
Merge branch 'f-new-dm2' into f-learn-compound-mssql
krlmlr Jul 6, 2021
5a2a20c
Merge branch 'main' into f-learn-compound-mssql
krlmlr Jul 8, 2021
5e725be
Merge branch 'main' into f-learn-compound-mssql
krlmlr Aug 26, 2021
0f8317e
Merge branch 'main' into f-learn-compound-mssql
krlmlr Sep 15, 2021
0b1d60d
Merge branch 'main' into f-learn-compound-mssql
krlmlr Oct 12, 2021
77c8c85
Tweaks
krlmlr Oct 12, 2021
27ae581
name_format, get tables
krlmlr Oct 13, 2021
fb715e3
pks
krlmlr Oct 13, 2021
98c0ba6
Merge branch 'main' into f-learn-compound-mssql
krlmlr Oct 13, 2021
186f367
Oops
krlmlr Oct 13, 2021
4d5e8bc
MSSQL learns from dm_meta()
krlmlr Oct 13, 2021
8cc9f4d
Merge branch 'main' into f-learn-compound-mssql
krlmlr Oct 13, 2021
42e8e1a
Globals
krlmlr Oct 13, 2021
267ba0e
Merge branch 'main' into f-learn-compound-mssql
krlmlr Oct 25, 2021
a58d96c
Work around trailing comma problem
krlmlr Jan 3, 2022
3f2bdcc
Merge branch 'main' into f-learn-compound-mssql
krlmlr Jan 3, 2022
e397b3c
Merge branch 'main' into f-learn-compound-mssql
krlmlr Mar 9, 2022
5e2a352
Merge remote-tracking branch 'origin/main' into f-learn-compound-mssql
krlmlr Mar 9, 2022
9411f8f
Auto-update from GitHub Actions
krlmlr Mar 9, 2022
240b10b
Merge branch 'main' into f-learn-compound-mssql
krlmlr Mar 14, 2022
739f857
Manual rename
krlmlr Mar 14, 2022
f7f3b51
Fix corner case
krlmlr Mar 14, 2022
3830f87
Temporarily switch to database to be learned
krlmlr Mar 14, 2022
fe545f3
Fix new_dm2() for the case of more than one foreign key to the same t…
krlmlr Mar 14, 2022
a755b16
Fix dbname default -- NA means current db
krlmlr Mar 14, 2022
a1e8cc3
Fix test
krlmlr Mar 14, 2022
974885a
Unrelated: fix warning
krlmlr Mar 14, 2022
cdd4870
Fix Postgres error
krlmlr Mar 14, 2022
c4f94b9
Skip test on Postgres for now
krlmlr Mar 14, 2022
9d6e72f
Fix test
krlmlr Mar 14, 2022
81f7f5a
Simplify
krlmlr Mar 14, 2022
270918d
Remove dead code
krlmlr Mar 14, 2022
bae9f74
Simplify new_dm2()
krlmlr Mar 14, 2022
bf4098a
Remove unused
krlmlr Mar 14, 2022
60d1f24
Move code
krlmlr Mar 14, 2022
662f4e5
Reorder
krlmlr Mar 28, 2022
ab2aada
Use unclassed type for table name
krlmlr Mar 28, 2022
f1ad921
Auto-update from GitHub Actions
krlmlr Mar 28, 2022
bffe96f
Merge branch 'main' into f-learn-compound-mssql
krlmlr Mar 31, 2022
00be1e1
Merge branch 'main' into f-learn-compound-mssql
krlmlr Apr 7, 2022
30a0652
Merge branch 'main' into f-learn-compound-mssql
krlmlr Apr 14, 2022
336fa71
Merge branch 'main' into f-learn-compound-mssql
krlmlr Apr 29, 2022
e450672
Merge branch 'main' into f-learn-compound-mssql
krlmlr Apr 30, 2022
499dbe8
vars argument
krlmlr Apr 29, 2022
b212914
Test: single db with vars
krlmlr Apr 29, 2022
d89da8e
Explicit variable names
krlmlr Apr 29, 2022
5563eba
Remove mssql_sys_all_db()
krlmlr Apr 29, 2022
5dc009e
Later
krlmlr Apr 30, 2022
1f8ea05
Merge branch 'main' into f-learn-compound-mssql
krlmlr May 24, 2022
d9845b8
Revert "Explicit variable names"
krlmlr May 28, 2022
a6ec1f6
Always collect on SQL Server
krlmlr May 28, 2022
9cccf86
Revert "Revert "Explicit variable names""
krlmlr May 28, 2022
5119330
Explicit SELECT
krlmlr May 28, 2022
424701b
Add snapshot test
krlmlr May 28, 2022
10 changes: 7 additions & 3 deletions R/dm-from-src.R
@@ -62,6 +62,8 @@ dm_from_src <- function(src = NULL, table_names = NULL, learn_keys = NULL,
src <- src_from_src_or_con(src)
con <- con_from_src_or_con(src)

# FIXME: Get rid of legacy method once it works for all

if (is.null(learn_keys) || isTRUE(learn_keys)) {
dm_learned <- dm_learn_from_db(src, ...)

@@ -76,12 +78,12 @@ dm_from_src <- function(src = NULL, table_names = NULL, learn_keys = NULL,
inform("Keys queried successfully, use `learn_keys = TRUE` to mute this message.")
}

tbls_in_dm <- src_tbls_impl(dm_learned)

if (is_null(table_names)) {
return(dm_learned)
}

tbls_in_dm <- src_tbls_impl(dm_learned)

if (!all(table_names %in% tbls_in_dm)) {
abort_tbl_access(setdiff(table_names, tbls_in_dm))
}
@@ -123,7 +125,9 @@ dm_from_src <- function(src = NULL, table_names = NULL, learn_keys = NULL,
}

quote_ids <- function(x, con, schema = NULL) {
if (is.null(con)) return(x)
if (is.null(con)) {
return(x)
}

if (is_null(schema)) {
map(
12 changes: 4 additions & 8 deletions R/dm.R
@@ -94,23 +94,19 @@ new_dm <- function(tables = list()) {
}

new_dm2 <- function(tables = list(),
pks = structure(list(), names = character()),
fks = structure(list(), names = character()),
pks_df = tibble(table = character(), pks = list()),
fks_df = tibble(table = character(), fks = list()),
validate = TRUE) {
# Legacy
data <- unname(tables)
table <- names2(tables)

stopifnot(!is.null(names(pks)), all(names(pks) %in% table))
stopifnot(!is.null(names(fks)), all(names(fks) %in% table))
stopifnot(all(pks_df$table %in% table))
stopifnot(all(fks_df$table %in% table))

zoom <- new_zoom()
col_tracker_zoom <- new_col_tracker_zoom()

pks_df <- enframe(pks, "table", "pks")

fks_df <- enframe(fks, "table", "fks")

filters <-
tibble(
table = table,
31 changes: 31 additions & 0 deletions R/global.R
@@ -105,6 +105,37 @@ utils::globalVariables(c(
"orders",
"trans",
#
# information_schema
"catalog",
"catalog_name",
"column_default",
"column_id",
"column_name",
"con",
"constraint_catalog",
"constraint_column_id",
"constraint_column_usage",
"constraint_name",
"constraint_schema",
"constraint_type",
"dbname",
"FIXME",
Review comment (Member): Is this intentional?

"is_nullable",
"key_column_usage",
"object_id",
"ordinal_position",
"schema_id",
"schemata",
"table_catalog",
"table_constraints",
"table_schema",
"table_type",
"tables",
"constraint_column_usage.column_name",
"constraint_column_usage.dm_name",
"key_column_usage.column_name",
"key_column_usage.dm_name",
#
# pixarfilms
"pixar_films",
"pixar_people",
180 changes: 126 additions & 54 deletions R/learn.R
@@ -32,7 +32,7 @@
#' # the `dm` from the SQLite DB
#' iris_dm_learned <- dm_learn_from_db(src_sqlite)
#' }
dm_learn_from_db <- function(dest, dbname = NULL, ...) {
dm_learn_from_db <- function(dest, dbname = NA, ...) {
# assuming that we will not try to learn from (globally) temporary tables, which do not appear in sys.table
con <- con_from_src_or_con(dest)
src <- src_from_src_or_con(dest)
@@ -41,6 +41,130 @@ dm_learn_from_db <- function(dest, dbname = NULL, ...) {
return()
}

if (!is_mssql(con)) {
Review comment (Member): AFAICT, dm_meta() works well for PG too. We could learn from PG using the function as well in this PR.

return(dm_learn_from_db_legacy(con, dbname, ...))
}

dm_learn_from_db_meta(con, catalog = dbname, ...)
}

dm_learn_from_db_meta <- function(con, catalog = NULL, schema = NULL, name_format = "{table}") {
info <- dm_meta(con, catalog = catalog, schema = schema)

df_info <-
info %>%
dm_select_tbl(-schemata) %>%
collect()
Review comment (Member): collect() is already done at the end of dm_meta()


dm_name <-
df_info$tables %>%
select(catalog = table_catalog, schema = table_schema, table = table_name) %>%
mutate(name = glue(!!name_format)) %>%
pull() %>%
unclass() %>%
vec_as_names(repair = "unique")

from <-
df_info$tables %>%
select(catalog = table_catalog, schema = table_schema, table = table_name) %>%
pmap_chr(~ DBI::dbQuoteIdentifier(con, DBI::Id(...)))

df_key_info <-
df_info %>%
dm_zoom_to(tables) %>%
mutate(dm_name = !!dm_name, from = !!from) %>%
dm_update_zoomed() %>%
dm_zoom_to(columns) %>%
arrange(ordinal_position) %>%
select(-ordinal_position) %>%
left_join(tables) %>%
dm_update_zoomed() %>%
dm_select_tbl(constraint_column_usage, key_column_usage, columns)

table_info <-
df_key_info %>%
dm_zoom_to(columns) %>%
group_by(dm_name, from) %>%
summarize(vars = list(column_name)) %>%
ungroup() %>%
pull_tbl()

tables <- map2(table_info$from, table_info$vars, ~ tbl(con, dbplyr::ident_q(.x), vars = .y))
names(tables) <- table_info$dm_name

pks_df <-
df_key_info %>%
dm_zoom_to(key_column_usage) %>%
anti_join(constraint_column_usage) %>%
arrange(ordinal_position) %>%
dm_update_zoomed() %>%
dm_squash_to_tbl(key_column_usage) %>%
select(constraint_catalog, constraint_schema, constraint_name, dm_name, column_name) %>%
group_by(constraint_catalog, constraint_schema, constraint_name, dm_name) %>%
summarize(pks = list(tibble(column = list(column_name)))) %>%
ungroup() %>%
select(table = dm_name, pks)

fks_df <-
df_key_info %>%
dm_zoom_to(key_column_usage) %>%
left_join(columns, select = c(column_name, dm_name, table_catalog, table_schema, table_name)) %>%
dm_update_zoomed() %>%
dm_zoom_to(constraint_column_usage) %>%
left_join(columns, select = c(column_name, dm_name, table_catalog, table_schema, table_name)) %>%
dm_update_zoomed() %>%
dm_select_tbl(-columns) %>%
dm_rename(constraint_column_usage, constraint_column_usage.table_catalog = table_catalog) %>%
dm_rename(constraint_column_usage, constraint_column_usage.table_schema = table_schema) %>%
dm_rename(constraint_column_usage, constraint_column_usage.table_name = table_name) %>%
dm_rename(constraint_column_usage, constraint_column_usage.column_name = column_name) %>%
dm_rename(constraint_column_usage, constraint_column_usage.dm_name = dm_name) %>%
dm_rename(key_column_usage, key_column_usage.table_catalog = table_catalog) %>%
dm_rename(key_column_usage, key_column_usage.table_schema = table_schema) %>%
dm_rename(key_column_usage, key_column_usage.table_name = table_name) %>%
dm_rename(key_column_usage, key_column_usage.column_name = column_name) %>%
dm_rename(key_column_usage, key_column_usage.dm_name = dm_name) %>%
dm_flatten_to_tbl(constraint_column_usage) %>%
select(
constraint_catalog,
constraint_schema,
constraint_name,
ordinal_position,
ref_table = constraint_column_usage.dm_name,
ref_column = constraint_column_usage.column_name,
table = key_column_usage.dm_name,
column = key_column_usage.column_name,
) %>%
arrange(
constraint_catalog,
constraint_schema,
constraint_name,
ordinal_position,
) %>%
select(-ordinal_position) %>%
# FIXME: Where to learn this in INFORMATION_SCHEMA?
group_by(
constraint_catalog,
constraint_schema,
constraint_name,
ref_table,
) %>%
summarize(fks = list(tibble(
ref_column = list(ref_column),
table = if (length(table) > 0) table[[1]] else NA_character_,
column = list(column),
on_delete = "no_action"
))) %>%
ungroup() %>%
select(-(1:3)) %>%
group_by(table = ref_table) %>%
summarize(fks = list(bind_rows(fks))) %>%
ungroup()

new_dm2(tables, pks_df, fks_df)
}

dm_learn_from_db_legacy <- function(con, dbname, ...) {
sql <- db_learn_query(con, dbname = dbname, ...)
if (is.null(sql)) {
return()
@@ -74,7 +198,7 @@ dm_learn_from_db <- function(dest, dbname = NULL, ...) {

schema_if <- function(schema, table, con, dbname = NULL) {
table_sql <- DBI::dbQuoteIdentifier(con, table)
if (is_null(dbname) || dbname == "") {
if (is_null(dbname) || is.na(dbname) || dbname == "") {
if_else(
are_na(schema),
table_sql,
@@ -91,63 +215,11 @@ schema_if <- function(schema, table, con, dbname = NULL) {
}

db_learn_query <- function(dest, dbname, ...) {
if (is_mssql(dest)) {
return(mssql_learn_query(dest, dbname = dbname, ...))
}
if (is_postgres(dest)) {
return(postgres_learn_query(dest, ...))
}
}

mssql_learn_query <- function(con, schema = "dbo", dbname = NULL) { # taken directly from {datamodelr} and subsequently tweaked a little
dbname_sql <- if (is_null(dbname)) {
""
} else {
paste0(DBI::dbQuoteIdentifier(con, dbname), ".")
}
glue::glue(
"select
schemas.name as [schema],
tabs.name as [table],
cols.name as [column],
isnull(ind_col.column_id, 0) as [key],
ref_tabs.name AS ref,
ref_cols.name AS ref_col,
1 - cols.is_nullable as mandatory,
types.name as [type],
cols.max_length,
cols.precision,
cols.scale
from
{dbname_sql}sys.all_columns cols
inner join {dbname_sql}sys.tables tabs on
cols.object_id = tabs.object_id
inner join {dbname_sql}sys.schemas schemas on
tabs.schema_id = schemas.schema_id
left outer join {dbname_sql}sys.foreign_key_columns ref on
ref.parent_object_id = tabs.object_id
and ref.parent_column_id = cols.column_id
left outer join {dbname_sql}sys.indexes ind on
ind.object_id = tabs.object_id
and ind.is_primary_key = 1
left outer join {dbname_sql}sys.index_columns ind_col on
ind_col.object_id = ind.object_id
and ind_col.index_id = ind.index_id
and ind_col.column_id = cols.column_id
left outer join {dbname_sql}sys.systypes [types] on
types.xusertype = cols.system_type_id
left outer join {dbname_sql}sys.tables ref_tabs on
ref_tabs.object_id = ref.referenced_object_id
left outer join {dbname_sql}sys.all_columns ref_cols on
ref_cols.object_id = ref.referenced_object_id
and ref_cols.column_id = ref.referenced_column_id
where schemas.name = {DBI::dbQuoteString(con, schema)}
order by
tabs.create_date,
cols.column_id"
)
}

postgres_learn_query <- function(con, schema = "public", table_type = "BASE TABLE") {
sprintf(
"SELECT