docs: add database warnings information (#131)
* docs: add database warnings information

* fix(linting): code formatting

---------

Co-authored-by: Azory YData Bot <[email protected]>
fabclmnt and azory-ydata authored Sep 17, 2024
1 parent 9421c3d commit 3197cce
Showing 2 changed files with 54 additions and 2 deletions.
56 changes: 54 additions & 2 deletions docs/data_catalog/datasources/warnings.md
@@ -1,6 +1,8 @@
The first technical step in any data science project is to examine the data and understand its quality, value and fitness for purpose. For this reason, Fabric’s Data Catalog includes an **Overview and Warnings** module for a better understanding of the available datasets.

## Overview
## Datasets

### Overview
Clicking on a **Dataset** in the **Data Catalog** opens its details page, which includes an **Overview and Warnings** section.

<figure markdown>
@@ -19,7 +21,7 @@ In the **Overview**, you’ll get an overall perspective of your dataset’s cha
- Main data quality warnings


## Data Quality Warnings
### Data Quality Warnings
To enable data-centric development, Fabric **automatically detects and signals potential data quality warnings**. Warnings highlight certain peculiarities of data that might require further investigation prior to model development and deployment. However, *the validity of each issued warning and whether follow-up mitigation work is needed will depend on the specific use case and on domain knowledge*.

<figure markdown>
@@ -47,3 +49,53 @@ Fabric further enables the **interactive exploration of warnings**, filtering ov
<figure markdown>
![Explore Warnings](../../assets/data_catalog/explore_warnings.png){: style="height:630px;width:700px"}
</figure>

## Databases

When users create a database in **YData Fabric's Data Catalog**, they gain access to tools for managing and understanding
the structure of their data. The **Data Catalog** provides a view of each database, with detailed insights into its schema and data quality.
The key features are:

### Schema Overview with Database-Specific Warnings

As soon as a database is added to the **Fabric Data Catalog**, users are presented with a complete overview of the schema.
This includes a visual representation of the tables, columns, and relationships that exist within the database. In addition to the schema visualization,
Fabric automatically scans the database for potential issues and displays database-specific warnings.

<figure markdown>
![Database schema overview](../../assets/data_catalog/database_schema_overview.png){: style="height:630px;width:700px"}
</figure>

These warnings help users identify problems that could impact data integrity, such as:

- **Self-references** (columns that act as both primary and foreign keys)
    - Self-referencing columns, where a column serves as both a primary key and a foreign key in the same table, can create complexities for synthetic data generation.
    YData Fabric detects self-references and warns users when this relationship might lead to synthetic data inconsistency or improper referential integrity.
    The platform suggests creating a temporary schema or breaking these references into a more manageable structure to ensure clarity and accuracy in data generation.
- **Independent tables** (tables with no defined relations)
    - Tables without relationships to other tables, referred to as independent tables, can result in isolated synthetic data that lacks the interdependencies
    present in the original data. YData Fabric flags such tables to alert users that these isolated data structures may need further inspection or modeling
    to ensure they align with the overall data environment.
- **Schemas with no defined relations** (schemas missing foreign key constraints)
    - When a schema lacks defined relationships between tables, YData Fabric issues warnings to alert users of the absence of foreign key constraints or
    other relational ties. This warning is critical, as generating synthetic data without considering relationships can lead to inaccurate and fragmented datasets.
    Users are encouraged to define necessary relations or clarify dependencies to improve the quality of the synthetic data output.
- **Circular references** (tables involved in a loop of dependencies)
    - Circular references occur when tables are interdependent in a closed loop (e.g., Table A references Table B, and Table B references Table A). These can cause
    significant complications during synthetic data generation, especially in maintaining referential integrity across the cycle.
    YData Fabric detects these loops and provides guidance on how to restructure the schema, such as breaking the cycle or temporarily isolating the tables,
    to avoid generating erroneous data.
- **Indirect relations between tables** (complex chains of relationships)
    - YData Fabric also identifies indirect relationships between tables, where two or more tables are connected via intermediary tables or columns.
    These complex relationships can introduce nuances that might not be immediately obvious during data modeling. The platform issues warnings to ensure that
    indirect relationships are clearly understood and accurately represented in synthetic data generation, preventing the loss of valuable data linkages.

This automatic detection lets users address schema complexities proactively, before they negatively impact data queries or synthetic data generation.
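
To make the five warning types above concrete, here is a minimal Python sketch that derives them from a toy schema description. It is not Fabric's implementation or API: the function `schema_warnings`, its input format, and the example tables are all hypothetical, and the self-reference rule follows one reading of the definition above (a foreign-key column that is also a primary-key column of its own table).

```python
from itertools import combinations


def schema_warnings(tables, primary_keys, foreign_keys):
    """Return warning name -> findings for a toy schema (hypothetical format).

    tables:       list of table names
    primary_keys: dict mapping table -> set of primary-key column names
    foreign_keys: list of (table, column, referenced_table) tuples
    """
    # Directed edges between distinct tables: referencing -> referenced.
    edges = {t: set() for t in tables}
    for table, _column, ref in foreign_keys:
        if table != ref:
            edges[table].add(ref)

    # Self-references (assumed reading): a FK column that is also a PK column.
    self_refs = sorted(
        (t, c) for t, c, _ref in foreign_keys if c in primary_keys.get(t, set())
    )

    # Independent tables: tables that take part in no foreign key at all.
    related = {t for t, _c, _r in foreign_keys} | {r for _t, _c, r in foreign_keys}
    independent = sorted(t for t in tables if t not in related)

    def reachable(start):
        """Every table reachable from `start` by following foreign keys."""
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    reach = {t: reachable(t) for t in tables}

    # Circular references: a table that can reach itself sits on a loop.
    circular = sorted(t for t in tables if t in reach[t])

    # Indirect relations: linked through intermediaries, with no direct FK.
    indirect = sorted(
        (a, b)
        for a, b in combinations(sorted(tables), 2)
        if (b in reach[a] or a in reach[b])
        and b not in edges[a]
        and a not in edges[b]
    )

    return {
        "self_references": self_refs,
        "independent_tables": independent,
        "no_relations": not foreign_keys,
        "circular_references": circular,
        "indirect_relations": indirect,
    }


# Hypothetical example schema.
tables = ["customers", "profiles", "orders", "order_items",
          "departments", "employees", "audit_log"]
primary_keys = {t: {"id"} for t in tables}
foreign_keys = [
    ("profiles", "id", "customers"),          # id is both PK and FK
    ("orders", "customer_id", "customers"),
    ("order_items", "order_id", "orders"),    # reaches customers via orders
    ("departments", "manager_id", "employees"),
    ("employees", "department_id", "departments"),  # closes a loop
]
report = schema_warnings(tables, primary_keys, foreign_keys)
for name, findings in report.items():
    print(name, findings)
```

In this toy schema, `audit_log` takes part in no foreign key, `departments` and `employees` reference each other, and `order_items` relates to `customers` only through `orders`. A real detector would also need to handle composite keys and relations that are not declared as constraints, which this sketch ignores.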

### Table-Level Navigation with Detailed Warnings
As with single-table datasets, YData Fabric lets users navigate to each table in the database individually.
For every table, users can view a detailed overview that includes:

- Column names and types
- Statistical overview
- Warnings (see the previous section for more details)
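
As a rough illustration of what a per-table overview contains, the sketch below lists column names, declared types, and a row count for every table in a database. It uses Python's standard `sqlite3` module as a stand-in, not Fabric's API; the helper `table_overviews` and the example tables are hypothetical, and the row count stands in for a fuller statistical overview.

```python
import sqlite3


def table_overviews(conn):
    """Map each user table to its columns (name, declared type) and row count."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    overview = {}
    for name in names:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [(c[1], c[2]) for c in conn.execute(f'PRAGMA table_info("{name}")')]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        overview[name] = {"columns": columns, "row_count": row_count}
    return overview


# Hypothetical in-memory database with two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 9.5);
    """
)
overview = table_overviews(conn)
for table, info in overview.items():
    print(table, info["columns"], info["row_count"])
```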
