diff --git a/docs/features/collaborative_data_profiling.md b/docs/features/collaborative_data_profiling.md index 2453f62f5..2d6496493 100644 --- a/docs/features/collaborative_data_profiling.md +++ b/docs/features/collaborative_data_profiling.md @@ -1,16 +1,19 @@ -# Data Catalog - A collaborative experience to profile datasets & relational databases +# Data Catalog ** +A collaborative experience to profile datasets & relational databases -!!! note "Data Catalog with data quality profiling" +!!! info "** YData's Enterprise feature" + + This feature is only available for users of [YData Fabric](https://ydata.ai). - [Sign-up Fabric community](http://ydata.ai/register?utm_source=ydata-profiling&utm_medium=documentation&utm_campaign=YData%20Fabric%20Community) to try the **data catalog** - and **collaborative** experience for datasets and database profiling at scale! + [Sign-up Fabric community](http://ydata.ai/register?utm_source=ydata-profiling&utm_medium=documentation&utm_campaign=YData%20Fabric%20Community) to try the **Data catalog** [YData Fabric](https://ydata.ai/products/fabric) is a Data-Centric AI development platform. YData Fabric provides all capabilities of ydata-profiling in a hosted environment combined with a guided UI experience. -[Fabric's Data Catalog](https://ydata.ai/products/data_catalog) +[Fabric's Data Catalog](https://ydata.ai/products/data_catalog), +a scalable and interactive version of ydata-profiling, provides a comprehensive and powerful tool designed to enable data professionals, including data scientists and data engineers, to manage and understand data within an organization. The Data Catalog act as a diff --git a/docs/features/pii_identification_management.md b/docs/features/pii_identification_management.md new file mode 100644 index 000000000..6a630c645 --- /dev/null +++ b/docs/features/pii_identification_management.md @@ -0,0 +1,59 @@ +# Personally identifiable information (PII) identification & management ** + +!!! 
info "** YData's Enterprise feature" + + This feature is only available for users of [YData Fabric](https://ydata.ai). + + [Sign-up Fabric community](http://ydata.ai/register?utm_source=ydata-profiling&utm_medium=documentation&utm_campaign=YData%20Fabric%20Community) and + start your journey into **data management** with automated PII identification. + +Personally Identifiable Information **(PII)** refers to any information that can be used to identify an individual. +This includes, but is not limited to, names, addresses, phone numbers, social security numbers, email addresses, +and financial information. Protecting PII is crucial in today's digital age, where data is extensively collected, stored, +and processed. + +[YData Fabric Data Catalog](https://ydata.ai/products/data_catalog), a scalable and interactive version of ydata-profiling, +integrates into the data profiling experience an advanced machine learning solution based on a Named Entity Recognition (NER) model +combined with traditional rule-based pattern identification, allowing PII to be detected efficiently. + +:fontawesome-brands-youtube:{ .youtube } +See Fabric's Data Catalog PII identification in action. + +## Why Fabric Catalog automated PII identification? + +The relevance of automating the identification of PII lies in the need to protect individuals' privacy and comply +with various data protection regulations. Mishandling PII or allowing unauthorized access to it can lead to severe consequences +such as identity theft, financial fraud, and breaches of privacy. With the increasing volume of data generated, manual +identification of PII becomes impractical and error-prone. + +Additionally, having a robust PII management solution is essential for organizations to establish and maintain +a secure approach to handling sensitive information, fostering trust and adhering to legal requirements. 
+ +## Why Fabric to manage dataset PII identification + +Besides automated PII identification, *Fabric Catalog* offers several key benefits in the context of data governance, +privacy compliance and overall data management, through automated data profiling and metadata management: + +### Compliance with Privacy Regulations: +Many countries and regions have stringent data protection regulations (such as GDPR, CCPA, or HIPAA) +that require organizations to handle PII responsibly. A dedicated platform ensures that PII is correctly classified, +helping organizations comply with legal requirements and avoid potential penalties. + +### Data Profiling for Accuracy: + +Data profiling involves analyzing and understanding the structure and content of data. By incorporating data profiling +capabilities into the platform, organizations can ensure accurate identification and classification of PII. +This helps maintain the integrity of data and reduces the risk of misclassifications. + +### Efficient Management of PII: +As the volume of data continues to grow, manually managing and editing PII classifications becomes impractical. +A platform streamlines this process, making it more efficient and reducing the likelihood of errors. +It allows organizations to keep track of PII across various datasets and systems. + +### Facilitating Data Governance: + +Data governance involves establishing policies and processes to ensure high data quality, security, and compliance. +A PII management solution enhances data governance efforts by providing a centralized hub for overseeing PII classifications, +metadata, and related policies. + + diff --git a/docs/features/sensitive_data.md b/docs/features/sensitive_data.md index 4d9cef233..467ba691c 100644 --- a/docs/features/sensitive_data.md +++ b/docs/features/sensitive_data.md @@ -56,3 +56,7 @@ pd.read_csv("filename.csv", dtype={"phone": str}) Note that the type detection is hard. 
That is why [visions](https://github.com/dylan-profiler/visions), a type system to help developers solve these cases, was developed. + +## Automated PII classification & management + +You can find more details about this feature [here](pii_identification_management.md). \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index f478d67d3..edf187468 100644 --- a/docs/index.md +++ b/docs/index.md @@ -9,14 +9,15 @@ understanding and preparing data for analysis in a single line of code! If you'r !!! tip "Advent of Code - Get featured on ydata-profiling" - *“I want to get into open source, but I don’t know how.”* - Does this sound familiar to you? Have you been wanting to get more involved with open-source software, but no one’s given you an entry point? + *“I want to get into open source, but I don’t know how.”* - Does this sound familiar to you? Have you been wanting to + get more involved with open-source software, but no one’s given you an entry point? That's why we joined [The Advent of code this year](https://zilliz.com/advent-of-code). Contribute to ydata-profiling and win some 🐼🐼 swag! How can you be part of it? - Give us some love with a Github ⭐ - - Write an article or create a tutorial like other [members the communit already did.](https://medium.com/@seckindinc/data-profiling-with-python-36497d3a1261) + - Write an article or create a tutorial like other [members of the community already did.](https://medium.com/@seckindinc/data-profiling-with-python-36497d3a1261) - Feeling adventurous? Contribute with a PR. We have a list of [great issues to get you started.](https://github.com/ydataai/ydata-profiling/issues?q=label%3A%22getting+started+%E2%98%9D%22+) ![ydata-profiling report](_static/img/ydata-profiling.gif) @@ -55,15 +56,16 @@ YData-profiling can be used to deliver a variety of different applications. 
The Check out the [free Community Version](http://ydata.ai/register?utm_source=ydata-profiling&utm_medium=documentation&utm_campaign=YData%20Fabric%20Community). -| Features & functionalities | Description | -|------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------| -| [Comparing datasets](features/comparing_datasets.md) | Comparing multiple version of the same dataset | -| [Profiling a Time-Series dataset](features/time_series_datasets.md) | Generating a report for a time-series dataset with a single line of code | -| [Profiling large datasets](features/big_data.md) | Tips on how to prepare data and configure `ydata-profiling` for working with large datasets | -| [Handling sensitive data](features/sensitive_data.md) | Generating reports which are mindful about sensitive data in the input dataset | -| [Dataset metadata and data dictionaries](features/metadata.md) | Complementing the report with dataset details and column-specific data dictionaries | -| [Customizing the report's appearance](features/custom_report_appearance.md ) | Changing the appearance of the report's page and of the contained visualizations | -| [Profiling Databases **](features/collaborative_data_profiling.md) | For a seamless profiling experience in your organization's databases, check [Fabric Data Catalog](https://ydata.ai/products/data_catalog), which allows to consume data from different types of storages such as RDBMs (Azure SQL, PostGreSQL, Oracle, etc.) and object storages (Google Cloud Storage, AWS S3, Snowflake, etc.), among others. 
| +| Features & functionalities | Description | +|----------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------| +| [Comparing datasets](features/comparing_datasets.md) | Comparing multiple versions of the same dataset | +| [Profiling a Time-Series dataset](features/time_series_datasets.md) | Generating a report for a time-series dataset with a single line of code | +| [Profiling large datasets](features/big_data.md) | Tips on how to prepare data and configure `ydata-profiling` for working with large datasets | +| [Handling sensitive data](features/sensitive_data.md) | Generating reports which are mindful of sensitive data in the input dataset | +| [Dataset metadata and data dictionaries](features/metadata.md) | Complementing the report with dataset details and column-specific data dictionaries | +| [Customizing the report's appearance](features/custom_report_appearance.md ) | Changing the appearance of the report's page and of the contained visualizations | +| [Profiling Relational databases **](features/collaborative_data_profiling.md) | For a seamless profiling experience in your organization's databases, check [Fabric Data Catalog](https://ydata.ai/products/data_catalog), which allows you to consume data from different types of storage, such as RDBMS (Azure SQL, PostgreSQL, Oracle, etc.) and object storage (Google Cloud Storage, AWS S3, Snowflake, etc.), among others. 
| +| [PII classification & management **](features/pii_identification_management.md ) | Automated PII classification and management through a UI experience | ### Tutorials diff --git a/mkdocs.yml b/mkdocs.yml index cab5f2351..b725412d3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -15,6 +15,7 @@ nav: - Dataset metadata: 'features/metadata.md' - Datasets catalog **: 'features/collaborative_data_profiling.md' - Sensitive data: 'features/sensitive_data.md' + - Automated PII classification & management **: 'features/pii_identification_management.md' - Time-series: 'features/time_series_datasets.md' - Comparing datasets: 'features/comparing_datasets.md' - Big data: 'features/big_data.md'