[DOCS] Added reference blogs to hudi docs #12505

Open · wants to merge 3 commits into base: asf-site
5 changes: 5 additions & 0 deletions website/docs/azure_hoodie.md
@@ -48,3 +48,8 @@ This combination works out of the box. No extra config needed.
.format("org.apache.hudi")
.load("/mountpoint/hudi-tables/customer")
```
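
For context, a minimal spark-shell sketch of the full read that the fragment above comes from; the mount point and table path are illustrative.

```scala
// spark-shell sketch: assumes the Azure storage container is already mounted at /mountpoint
// and that a Hudi table was written under hudi-tables/customer (both paths are illustrative)
val customerDF = spark.read
  .format("org.apache.hudi")                  // Hudi Spark datasource
  .load("/mountpoint/hudi-tables/customer")   // base path of the Hudi table

customerDF.show(10)
```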

## Related Resources

<h3>Blogs</h3>
* [How to use Apache Hudi with Databricks](https://www.onehouse.ai/blog/how-to-use-apache-hudi-with-databricks)
4 changes: 4 additions & 0 deletions website/docs/cleaning.md
@@ -148,6 +148,10 @@ cleans run --sparkMaster local --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST
You can find more details and the relevant code for these commands in [`org.apache.hudi.cli.commands.CleansCommand`](https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java) class.
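
Cleaner behavior can also be configured on a regular datasource write. A minimal spark-shell sketch, assuming a DataFrame `df` is already in scope; the table name, key fields, retention value and path are illustrative.

```scala
// spark-shell sketch: cleaner settings on a datasource write
// (table name, key/precombine fields, retention value and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "customer")
  .option("hoodie.datasource.write.recordkey.field", "customer_id")
  .option("hoodie.datasource.write.precombine.field", "updated_at")
  .option("hoodie.clean.automatic", "true")                // run cleaning as part of the write
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")  // cleaner policy to apply
  .option("hoodie.cleaner.commits.retained", "10")         // retain file slices needed by the last 10 commits
  .mode("append")
  .save("/tmp/hudi/customer")
```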

## Related Resources

<h3>Blogs</h3>
* [Cleaner and Archival in Apache Hudi](https://medium.com/@simpsons/cleaner-and-archival-in-apache-hudi-9e15b08b2933)

<h3>Videos</h3>

* [Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs](https://youtu.be/mUvRhJDoO3w)
5 changes: 5 additions & 0 deletions website/docs/cli.md
@@ -753,3 +753,8 @@ table change-table-type COW
║ hoodie.timeline.layout.version │ 1 │ 1 ║
╚════════════════════════════════════════════════╧══════════════════════════════════════╧══════════════════════════════════════╝
```

## Related Resources

<h3>Blogs</h3>
* [Getting Started: Manage your Hudi tables with the admin Hudi-CLI tool](https://www.onehouse.ai/blog/getting-started-manage-your-hudi-tables-with-the-admin-hudi-cli-tool)
5 changes: 5 additions & 0 deletions website/docs/clustering.md
@@ -341,6 +341,11 @@ and execution strategy `org.apache.hudi.client.clustering.run.strategy.JavaSortA
out-of-the-box. Note that, as of now, only linear sort is supported in the Java execution strategy.
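
A minimal spark-shell sketch of enabling inline clustering with a sort column, assuming a DataFrame `df` is already in scope; the table name, fields, trigger frequency and path are illustrative.

```scala
// spark-shell sketch: inline clustering with a sort column
// (table name, fields, trigger frequency and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "events")
  .option("hoodie.datasource.write.recordkey.field", "event_id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.clustering.inline", "true")                         // schedule and execute clustering inline
  .option("hoodie.clustering.inline.max.commits", "4")                // trigger clustering every 4 commits
  .option("hoodie.clustering.plan.strategy.sort.columns", "event_id") // column(s) to sort/cluster by
  .mode("append")
  .save("/tmp/hudi/events")
```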

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Z-Order and Hilbert Space Filling Curves](https://www.onehouse.ai/blog/apachehudi-z-order-and-hilbert-space-filling-curves)
* [Hudi Z-Order and Hilbert Space-filling Curves](https://medium.com/apache-hudi-blogs/hudi-z-order-and-hilbert-space-filling-curves-68fa28bffaf0)

<h3>Videos</h3>

* [Understanding Clustering in Apache Hudi and the Benefits of Asynchronous Clustering](https://www.youtube.com/watch?v=R_sm4wlGXuE)
6 changes: 6 additions & 0 deletions website/docs/compaction.md
@@ -226,3 +226,9 @@ Offline compaction needs to submit the Flink task on the command line. The progr
| `--seq` | `LIFO` (Optional) | The order in which compaction tasks are executed. Executing from the latest compaction plan by default. `LIFO`: executing from the latest plan. `FIFO`: executing from the oldest plan. |
| `--service` | `false` (Optional) | Whether to start a monitoring service that checks for and schedules new compaction tasks at the configured interval. |
| `--min-compaction-interval-seconds` | `600(s)` (Optional) | The checking interval for service mode; 10 minutes by default. |

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Compaction](https://medium.com/@simpsons/apache-hudi-compaction-6e6383790234)
* [Standalone HoodieCompactor Utility](https://medium.com/@simpsons/standalone-hoodiecompactor-utility-890198e4c539)
5 changes: 5 additions & 0 deletions website/docs/concepts.md
@@ -169,4 +169,9 @@ The intention of merge on read table is to enable near real-time processing dire
data out to specialized systems, which may not be able to handle the data volume. There are also a few secondary benefits to
this table type, such as reduced write amplification by avoiding synchronous merging of data, i.e., the amount of data written per byte of data in a batch.

## Related Resources

<h3>Blogs</h3>
* [Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber and Shopee](https://www.onehouse.ai/blog/comparing-apache-hudis-mor-and-cow-tables-use-cases-from-uber-and-shopee)
* [Hudi Metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [File Naming conventions in Apache Hudi](https://medium.com/@simpsons/file-naming-conventions-in-apache-hudi-cd1cdd95f5e7)
5 changes: 5 additions & 0 deletions website/docs/concurrency_control.md
@@ -333,6 +333,11 @@ If you are using the `WriteClient` API, please note that multiple writes to the
It is **NOT** recommended to use the same instance of the write client to perform multi-writing.
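
A minimal spark-shell sketch of one such writer configured for optimistic concurrency control with the ZooKeeper-based lock provider, assuming a DataFrame `df` is already in scope; the ZooKeeper endpoint, lock paths, table name, fields and path are illustrative.

```scala
// spark-shell sketch: one writer configured for optimistic concurrency control
// (ZooKeeper endpoint, lock paths, table name, fields and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "orders")
  .option("hoodie.datasource.write.recordkey.field", "order_id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
  .option("hoodie.cleaner.policy.failed.writes", "LAZY")   // clean up failed writes lazily under multi-writing
  .option("hoodie.write.lock.provider", "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
  .option("hoodie.write.lock.zookeeper.url", "zk-host")
  .option("hoodie.write.lock.zookeeper.port", "2181")
  .option("hoodie.write.lock.zookeeper.lock_key", "orders")
  .option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks")
  .mode("append")
  .save("/tmp/hudi/orders")
```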

## Related Resources

<h3>Blogs</h3>
* [Data Lakehouse Concurrency Control](https://www.onehouse.ai/blog/lakehouse-concurrency-control-are-we-too-optimistic)
* [Multi-writer support with Apache Hudi](https://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6)

<h3>Videos</h3>

* [Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes](https://youtu.be/JP0orl9_0yQ)
5 changes: 5 additions & 0 deletions website/docs/indexes.md
@@ -219,6 +219,11 @@ partition path value could change due to an update e.g users table partitioned b
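
A minimal spark-shell sketch of choosing between a global and a non-global index, assuming a DataFrame `df` is already in scope; the table name, fields and path are illustrative.

```scala
// spark-shell sketch: choosing between a global and a non-global index
// (table name, fields and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "users")
  .option("hoodie.datasource.write.recordkey.field", "user_id")
  .option("hoodie.datasource.write.partitionpath.field", "city")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // GLOBAL_BLOOM enforces key uniqueness across all partitions, so an update that changes
  // the partition path still finds the record; BLOOM only guarantees uniqueness per partition
  .option("hoodie.index.type", "GLOBAL_BLOOM")
  .mode("append")
  .save("/tmp/hudi/users")
```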


## Related Resources

<h3>Blogs</h3>

* [Global vs Non-global index in Apache Hudi](https://medium.com/@simpsons/global-vs-non-global-index-in-apache-hudi-ac880b031cbc)

<h3>Videos</h3>

* [Global Bloom Index: Remove duplicates & guarantee uniqueness - Hudi Labs](https://youtu.be/XlRvMFJ7g9c)
4 changes: 3 additions & 1 deletion website/docs/key_generation.md
@@ -212,4 +212,6 @@ Partition path generated from key generator: "04/01/2020"
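
A minimal spark-shell sketch of configuring a key generator explicitly, assuming a DataFrame `df` is already in scope; the table name, fields, generator choice and path are illustrative.

```scala
// spark-shell sketch: configuring a key generator explicitly
// (table name, fields, generator choice and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "trips")
  .option("hoodie.datasource.write.recordkey.field", "trip_id")
  .option("hoodie.datasource.write.partitionpath.field", "trip_date")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // SimpleKeyGenerator: one record key field and one partition path field;
  // ComplexKeyGenerator handles composite keys, TimestampBasedKeyGenerator derives
  // partition paths such as "04/01/2020" from a timestamp field
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.SimpleKeyGenerator")
  .mode("append")
  .save("/tmp/hudi/trips")
```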

## Related Resources

* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
<h3>Blogs</h3>
* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [Primary key and Partition Generators with Apache Hudi](https://medium.com/@simpsons/primary-key-and-partition-generators-with-apache-hudi-f0e4d71d9d26)
5 changes: 5 additions & 0 deletions website/docs/markers.md
@@ -89,3 +89,8 @@ with direct markers because the file system metadata is efficiently cached in me
| `hoodie.markers.timeline_server_based.batch.num_threads` | 20 | Number of threads to use for batch processing marker creation requests at the timeline server. |
| `hoodie.markers.timeline_server_based.batch.interval_ms` | 50 | The batch interval in milliseconds for marker creation batch processing. |
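
A minimal spark-shell sketch of selecting the marker mechanism on the write path, assuming a DataFrame `df` is already in scope; the table name, fields and path are illustrative.

```scala
// spark-shell sketch: selecting the marker mechanism for a write
// (table name, fields and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "logs")
  .option("hoodie.datasource.write.recordkey.field", "log_id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // TIMELINE_SERVER_BASED batches marker creation through the timeline server;
  // DIRECT creates one marker file per data file on storage
  .option("hoodie.write.markers.type", "TIMELINE_SERVER_BASED")
  .mode("append")
  .save("/tmp/hudi/logs")
```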


## Related Resources

<h3>Blogs</h3>
* [Timeline Server in Apache Hudi](https://medium.com/@simpsons/timeline-server-in-apache-hudi-b5be25f85e47)
2 changes: 1 addition & 1 deletion website/docs/metadata.md
@@ -129,6 +129,6 @@ metadata table across all writers.

## Related Resources
<h3>Blogs</h3>

* [Table service deployment models in Apache Hudi](https://medium.com/@simpsons/table-service-deployment-models-in-apache-hudi-9cfa5a44addf)
* [Multi Modal Indexing for the Data Lakehouse](https://www.onehouse.ai/blog/introducing-multi-modal-index-for-the-lakehouse-in-apache-hudi)
* [How to Optimize Performance for Your Open Data Lakehouse](https://www.onehouse.ai/blog/how-to-optimize-performance-for-your-open-data-lakehouse)
6 changes: 6 additions & 0 deletions website/docs/performance.md
@@ -131,3 +131,9 @@ To enable Data Skipping in your queries make sure to set following properties to
- `hoodie.enable.data.skipping` (to control data skipping, enabled by default)
- `hoodie.metadata.enable` (to enable metadata table use on the read path, enabled by default)
- `hoodie.metadata.index.column.stats.enable` (to enable column stats index use on the read path)
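
A minimal spark-shell sketch of a read with these three properties set explicitly; the table path and predicate are illustrative.

```scala
// spark-shell sketch: a read with data skipping explicitly enabled
// (assumes the writer populated the column stats index; path and predicate are illustrative)
val events = spark.read.format("hudi")
  .option("hoodie.enable.data.skipping", "true")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.metadata.index.column.stats.enable", "true")
  .load("/tmp/hudi/events")

events.filter("event_ts >= '2024-01-01'").show()  // predicate that column stats can use to prune files
```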

## Related Resources

<h3>Blogs</h3>
* [Hudi’s Column Stats Index and Data Skipping feature help speed up queries by orders of magnitude!](https://www.onehouse.ai/blog/hudis-column-stats-index-and-data-skipping-feature-help-speed-up-queries-by-an-orders-of-magnitude)
* [Top 3 Things You Can Do to Get Fast Upsert Performance in Apache Hudi](https://www.onehouse.ai/blog/top-3-things-you-can-do-to-get-fast-upsert-performance-in-apache-hudi)
5 changes: 4 additions & 1 deletion website/docs/precommit_validator.md
@@ -96,6 +96,9 @@ Hudi offers a [commit notification service](platform_services_post_commit_callba
The commit notification service can be combined with pre-commit validators to send a notification when a commit fails a validation. This is possible by passing details about the validation as a custom value to the HTTP endpoint.
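
A minimal spark-shell sketch using the SQL-query single-result validator, assuming a DataFrame `df` is already in scope; the table name, key field, validation query and expected value are illustrative.

```scala
// spark-shell sketch: fail the commit when a validation query returns an unexpected result
// (table name, fields, query and expected value are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "users")
  .option("hoodie.datasource.write.recordkey.field", "user_id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.precommit.validators",
    "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator")
  // query#expected-result: the commit is rolled back unless the count of null keys is 0
  .option("hoodie.precommit.validators.single.value.sql.queries",
    "select count(*) from <TABLE_NAME> where user_id is null#0")
  .mode("append")
  .save("/tmp/hudi/users")
```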

## Related Resources
<h3>Videos</h3>

<h3>Blogs</h3>
* [Apply Pre-Commit Validation for Data Quality in Apache Hudi](https://www.onehouse.ai/blog/apply-pre-commit-validation-for-data-quality-in-apache-hudi)

<h3>Videos</h3>
* [Learn About Apache Hudi Pre Commit Validator with Hands on Lab](https://www.youtube.com/watch?v=KNzs9dj_Btc)
4 changes: 4 additions & 0 deletions website/docs/record_merger.md
@@ -251,3 +251,7 @@ example, [`MySqlDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd1
captured via Debezium for MySQL and PostgresDB. [`AWSDmsAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/AWSDmsAvroPayload.java) provides support for applying changes captured via Amazon Database Migration Service onto S3.
For full configurations, go [here](/docs/configurations#RECORD_PAYLOAD) and please check out [this FAQ](faq_writing_tables/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage) if you want to implement your own custom payloads.
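
A minimal spark-shell sketch of selecting a payload implementation on the write path, assuming a DataFrame `df` is already in scope; the table name, fields, payload choice and path are illustrative.

```scala
// spark-shell sketch: choosing a payload implementation for the write path
// (table name, fields, payload choice and path are illustrative)
df.write.format("hudi")
  .option("hoodie.table.name", "cdc_events")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // swap in MySqlDebeziumAvroPayload / AWSDmsAvroPayload, or a custom class,
  // to change how incoming records merge with records on storage
  .option("hoodie.datasource.write.payload.class",
    "org.apache.hudi.common.model.DefaultHoodieRecordPayload")
  .mode("append")
  .save("/tmp/hudi/cdc_events")
```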

## Related Resources

<h3>Blogs</h3>
* [How to define your own merge logic with Apache Hudi](https://medium.com/@simpsons/how-to-define-your-own-merge-logic-with-apache-hudi-622ee5ccab1e)
5 changes: 5 additions & 0 deletions website/docs/timeline.md
@@ -151,3 +151,8 @@ Flink jobs using the SQL can be configured through the options in WITH clause. T

Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for more details.

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Timeline: Foundational pillar for ACID transactions](https://medium.com/@simpsons/hoodie-timeline-foundational-pillar-for-acid-transactions-be871399cbae)

6 changes: 6 additions & 0 deletions website/docs/writing_tables_streaming_writes.md
@@ -93,3 +93,9 @@ df.writeStream.format("hudi")
</Tabs
>
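
A minimal spark-shell sketch of a structured-streaming write into Hudi, using Spark's built-in rate source as a stand-in; the checkpoint location and table path are illustrative.

```scala
// spark-shell sketch: a minimal structured-streaming write into Hudi using the built-in
// rate source as a stand-in (checkpoint location and table path are illustrative)
import org.apache.spark.sql.streaming.Trigger

val source = spark.readStream.format("rate").load()  // emits `timestamp` and `value` columns

val query = source.writeStream.format("hudi")
  .option("hoodie.table.name", "rate_events")
  .option("hoodie.datasource.write.recordkey.field", "value")
  .option("hoodie.datasource.write.precombine.field", "timestamp")
  .option("checkpointLocation", "/tmp/checkpoints/rate_events")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .outputMode("append")
  .start("/tmp/hudi/rate_events")
```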

## Related Resources

<h3>Blogs</h3>
* [An Introduction to the Hudi and Flink Integration](https://www.onehouse.ai/blog/intro-to-hudi-and-flink)
* [Bulk Insert Sort Modes with Apache Hudi](https://medium.com/@simpsons/bulk-insert-sort-modes-with-apache-hudi-c781e77841bc)
