Evolve "Features" and "Application Domains" sections #53

Merged
merged 67 commits into from
Jul 22, 2024
Commits
92fb6db
Industrial Data and Time Series Data: Improve guidance and structure
amotl Feb 28, 2024
a2d98cb
Raw-Data Analytics: Add use case with Bitmovin
amotl Mar 2, 2024
da365a0
Industrial Data: Add use cases with ABB and TGW
amotl Mar 2, 2024
dbb4297
Industrial Data: Add "SPGo! Insights"
amotl Mar 2, 2024
333da72
Feature: Refactor pages from "Application Domains", add new ones
amotl Mar 3, 2024
039e162
Feature: Add guiding content to "Document Store" page
amotl Mar 3, 2024
b809cb3
Chore: Improve wording and syntax, refactor custom CSS styles
amotl Mar 2, 2024
867499a
Raw-Data Analytics: Add item to main index page
amotl Mar 2, 2024
90986c0
Guidance: Improve main page about "Application Domains"
amotl Mar 2, 2024
1bbf932
Guidance: Improve "Features" and "Application Domains" pages
amotl Mar 3, 2024
6f6f320
Application Domain: Add page about "Metrics and Telemetry Data Store"
amotl Mar 3, 2024
6196480
Domain: Improve page about "Raw-Data Analytics"
amotl Mar 3, 2024
41641a8
Domain: Improve page about "Industrial Data"
amotl Mar 3, 2024
a6df79b
Integrate: Information about "Prometheus Adapter" has been refactored
amotl Mar 3, 2024
0c4b91d
Chore: Update badge color for "Open on GitHub"
amotl Mar 4, 2024
58d00c5
Chore: Improve hyphenation and fix syntax
amotl Mar 4, 2024
43f51c8
Domain: Rename "Metrics Store" to "Telemetry Data Store"
amotl Mar 4, 2024
ac14281
Time Series: Improve "Index" and "Advanced" pages
amotl Mar 4, 2024
4e5db88
Feature: Add page about "Relational / JSON" capabilities
amotl Mar 4, 2024
419f765
Getting Started: Absorb page about "Database Connectivity"
amotl Mar 4, 2024
93c97b3
Feature: Absorb "Connectivity" page from "Getting Started" section
amotl Mar 5, 2024
0dcb5bb
Feature: Improve "Document Store" page
amotl Mar 5, 2024
b211567
Features: Add dedicated pages about...
amotl Mar 5, 2024
59617ed
Feature: Populate page about "SQL"
amotl Mar 5, 2024
6f0e8f9
Chore: s/time-series/time series/
amotl Mar 10, 2024
239e7d4
Chore: s/timeseries/time series/
amotl Mar 10, 2024
891ebd0
Time Series: Use substitutions for link badges
amotl Mar 12, 2024
f4d95dd
Time Series: Improve layout, make it responsive
amotl Mar 12, 2024
e23fc6d
Feature: Start writing about "Advanced Querying"
amotl Mar 13, 2024
ed10681
This and that: Wording improvements, layout adjustments, and cleanups
amotl Mar 13, 2024
6d03f0c
Time Series: Add dedicated page about "Long Term Storage"
amotl Mar 13, 2024
cd75a18
Integrations vs. Time Series: Generalize "Visualization" elements
amotl Mar 13, 2024
4d2334d
Chore: Remove speaker names and exact dates
amotl Mar 13, 2024
5f0c7c7
Features: Hide pages which are not ready yet
amotl Mar 13, 2024
9868535
Feature / Advanced Querying: Add "Time Bucketing" example
amotl Mar 13, 2024
805e269
Feature / Relational: Implement page
amotl Mar 13, 2024
38ee522
Chore: Refactor a few redundant external link references
amotl Mar 13, 2024
ceee618
Feature / Vector Store: Implement page
amotl Mar 13, 2024
7304fbb
Machine Learning: Add video recording of FOSDEM 2024 talk
amotl Mar 14, 2024
cb8d34f
Chore: HTML responsiveness across the board
amotl Mar 14, 2024
f7480da
Feature / Search: Implement page
amotl Mar 14, 2024
7f190c6
Feature / SQL: Reasonably finish page by ...
amotl Mar 15, 2024
8257e4a
Chore: Fixups after integration tutorials from cloud-docs
amotl May 7, 2024
77f03cd
Chore: Fixups after releasing crate-docs-theme 0.32.0
amotl May 7, 2024
bffc75f
Feature / Geospatial: Reasonably finish page
amotl Jul 3, 2024
e763914
BI: Add information about Rill and Tableau
amotl Jul 3, 2024
61ff483
Chore: Generalize / clean up per-page CSS styles
amotl Jul 3, 2024
30a4acd
Feature / BLOB: Add minimal version of page
amotl Jul 3, 2024
9363ab1
Chore: Remove `language` attribute from `conf.py`
amotl Jul 3, 2024
7a91864
Shortcuts: Add `readmore` badge
amotl Jul 3, 2024
91f0306
Chore: Update .gitignore and backlog.md
amotl Jul 3, 2024
529b9d0
Feature / Clustering: Implement page
amotl Jul 5, 2024
e8514f4
Feature / Snapshots: Add minimal version of page
amotl Jul 20, 2024
0e6e5e8
Feature / Cloud Native: Add minimal version of page
amotl Jul 20, 2024
220f5a0
Feature / Storage: Add minimal version of page
amotl Jul 20, 2024
a4ca0dc
Feature / Index: Add minimal version of page
amotl Jul 20, 2024
889eae1
Feature / FDW: Add minimal version of page
amotl Jul 20, 2024
e542458
Feature / Generated Columns: Add minimal version of page
amotl Jul 20, 2024
1d55fdf
Feature / CCR: Add minimal version of page
amotl Jul 20, 2024
44d3d24
Feature / UDF: Add minimal version of page
amotl Jul 20, 2024
316dd11
Feature / Cursor: Add minimal version of page
amotl Jul 20, 2024
f7b47dc
Feature: Improve guidance into new documentation section
amotl Jul 20, 2024
17bf5ed
Theme: Use modernized crate-docs-theme
amotl Jul 20, 2024
1352cf0
Theme: Use modernized crate-docs-theme
amotl Jul 20, 2024
9ee24ef
Features: This and that
amotl Jul 20, 2024
d718397
Clustering: Advertise recommended shard size of 5-50 GB
amotl Jul 20, 2024
b138e84
Feature / Index: Remove unnecessary 3rd party link, shorten quote
amotl Jul 22, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
__pycache__
_out/
.build
.clone
Expand Down
1 change: 1 addition & 0 deletions backlog.md
Expand Up @@ -7,6 +7,7 @@
- Rework container/index
- Update HTTP links, use Sphinx references instead
- Update remaining links from crate.io to cratedb.com
- Gallery: https://python.arviz.org/en/stable/examples/

## Iteration +2
- Render Jupyter Notebooks?
Expand Down
28 changes: 28 additions & 0 deletions docs/_include/card/timeseries-datashader.md
@@ -0,0 +1,28 @@
::::{info-card}

:::{grid-item}
:columns: auto 9 9 9
**Display millions of data points using hvPlot, Datashader, and CrateDB**

[HoloViews] and [Datashader] frameworks enable channeling millions of data
points from your backend systems to the browser's glass.

This notebook plots the venerable NYC Taxi dataset after importing it
into a CrateDB Cloud database cluster.

🚧 _Please note this notebook is a work in progress._ 🚧

{{ '{}[cloud-datashader-github]'.format(nb_github) }} {{ '{}[cloud-datashader-colab]'.format(nb_colab) }}
:::

:::{grid-item}
:columns: 3
{tags-primary}`Time series visualization`

{tags-secondary}`Python`
{tags-secondary}`HoloViews`
{tags-secondary}`hvPlot`
{tags-secondary}`Datashader`
:::

::::
27 changes: 27 additions & 0 deletions docs/_include/card/timeseries-explore.md
@@ -0,0 +1,27 @@
::::{info-card}

:::{grid-item}
:columns: auto 9 9 9
**CrateDB for Time Series Modeling, Exploration, and Visualization**

Access time series data from CrateDB via SQL, load it into pandas DataFrames,
and visualize it using Plotly.

Learn about advanced time series operations in SQL, like aggregations, window
functions, interpolation of missing data, common table expressions, moving
averages, relational JOINs, and the handling of JSON data.

{{ '{}[timeseries-queries-and-visualization-github]'.format(nb_github) }} {{ '{}[timeseries-queries-and-visualization-colab]'.format(nb_colab) }}
:::

:::{grid-item}
:columns: 3
{tags-primary}`Time series visualization`

{tags-secondary}`Python`
{tags-secondary}`pandas`
{tags-secondary}`Plotly`
{tags-secondary}`Dash`
:::

::::
24 changes: 24 additions & 0 deletions docs/_include/links.md
@@ -0,0 +1,24 @@
[cloud-datashader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
[cloud-datashader-github]: https://github.com/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
[Datashader]: https://datashader.org/
[Dynamic Database Schemas]: https://cratedb.com/product/features/dynamic-schemas
[Geospatial Data Model]: https://cratedb.com/data-model/geospatial
[Geospatial Database]: https://cratedb.com/geospatial-spatial-database
[HoloViews]: https://www.holoviews.org/
[Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
[JSON Database]: https://cratedb.com/solutions/json-database
[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[Multi-model Database]: https://cratedb.com/solutions/multi-model-database
[Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
[query DSL based on JSON]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
[Relational Database]: https://cratedb.com/solutions/relational-database
[timeseries-queries-and-visualization-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[timeseries-queries-and-visualization-github]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[Vector Database (Product)]: https://cratedb.com/solutions/vector-database
[Vector Database]: https://en.wikipedia.org/wiki/Vector_database
49 changes: 49 additions & 0 deletions docs/_include/styles.html
@@ -0,0 +1,49 @@
<!--
Custom styles for efficient tile layouts using card elements.

TODO: Upstream to crate-docs-theme.
-->
<style>
/* General */
/*
.sd-card-body {
line-height: 1.1em;
}
*/
.sd-card-footer {
font-size: small;
}

/* No margins for images in tight layouts, e.g. badges */
/* Needed for domain/timeseries/advanced.md */
.wrapper-content-right img {
margin-bottom: 0 !important;
}


/* Document Store */
.wrapper-content-right ul {
margin-left: 0;
}
.rubric-slimmer p.rubric {
margin-bottom: 0.25em;
}
.rubric-slim p.rubric {
margin-bottom: 0;
}
.title-slim .sd-col > * {
margin-top: 0;
margin-bottom: 0;
}
.no-margin > * {
margin-top: 0 !important;
margin-bottom: 0 !important;;
}


/* Cards with Links */
.sd-hide-link-text {
height: 0;
}

</style>
2 changes: 1 addition & 1 deletion docs/admin/clustering/index.rst
@@ -1,4 +1,4 @@
.. _clustering:
.. _admin-clustering:

==========
Clustering
Expand Down
18 changes: 9 additions & 9 deletions docs/admin/sharding-partitioning.rst
Expand Up @@ -64,7 +64,7 @@ partition as a set of shards. For each partition, the number of shards defined
by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific
``partition key`` is inserted.

In the following example - which represents a very simple time-series use-case
In the following example - which represents a very simple time series use-case
- we added another column ``part`` that automatically generates the current
month upon insertion from the ``ts`` column. The ``part`` column is further used
as the ``partition key``.
Expand Down Expand Up @@ -111,7 +111,7 @@ cluster.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

**As a rule of thumb, a single shard should hold somewhere between 5 - 100
**As a rule of thumb, a single shard should hold somewhere between 5 - 50
GB of data.**

To avoid oversharding, CrateDB by default limits the number of shards per
Expand All @@ -129,15 +129,15 @@ benchmarks across various strategies. The following steps provide a general guid
- Calculate the throughput

Then, to calculate the number of shards, you should consider that the size of each
shard should roughly be between 5 - 100 GB, and that each node can only manage
shard should roughly be between 5 - 50 GB, and that each node can only manage
up to 1000 shards.

Time-series example
Time series example
-------------------

To illustrate the steps above, let's use them on behalf of an example. Imagine
you want to create a *partitioned table* on a *three-node cluster* to store
time-series data with the following assumptions:
time series data with the following assumptions:

- Inserts: 1.000 records/s
- Record size: 128 byte/record
Expand All @@ -146,12 +146,12 @@ time-series data with the following assumptions:
Given the daily throughput is around 10 GB/day, the monthly throughput is 30 times
that (~ 300 GB). The partition column can be day, week, month, quarter, etc. So,
assuming a monthly partition, the next step is to calculate the number of shards
with the **shard size recommendation** (5 - 100 GB) and the **number of nodes** in
with the **shard size recommendation** (5 - 50 GB) and the **number of nodes** in
the cluster in mind.

With three shards, each shard will hold 100 GB (300 GB / 3 shards), which is too
close to the upper limit. With six shards, each shard will manage 50 GB
(300 GB / 6 shards) of data, which is closer to the recommended size range (5 - 100 GB).
With three shards, each shard would hold 100 GB (300 GB / 3 shards), which is above
the upper limit. With six shards, each shard will manage 50 GB (300 GB / 6 shards)
of data, which is right on the spot.
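
The arithmetic above can be retraced with a short script. This is only a sketch: it assumes decimal gigabytes and counts raw record payload only, ignoring replicas, indexes, and compression, none of which the text quantifies. The helper names are hypothetical and not part of CrateDB.

```python
# Sizing sketch using the figures quoted in the text.
RECORDS_PER_SECOND = 1_000
RECORD_SIZE_BYTES = 128
SECONDS_PER_DAY = 24 * 60 * 60
GB = 1000 ** 3  # assumption: decimal gigabytes

# Raw daily volume; the text rounds this to "around 10 GB/day".
daily_gb = RECORDS_PER_SECOND * RECORD_SIZE_BYTES * SECONDS_PER_DAY / GB

MONTHLY_GB = 300                    # rounded monthly volume used in the text
MIN_SHARD_GB, MAX_SHARD_GB = 5, 50  # recommended shard size range


def shard_size_gb(total_gb: float, shards: int) -> float:
    """Average shard size when one partition is spread over `shards` shards."""
    return total_gb / shards


def within_recommendation(size_gb: float) -> bool:
    """True if the shard size falls inside the recommended range."""
    return MIN_SHARD_GB <= size_gb <= MAX_SHARD_GB


if __name__ == "__main__":
    print(f"daily volume: {daily_gb:.2f} GB")
    for shards in (3, 6):
        size = shard_size_gb(MONTHLY_GB, shards)
        print(f"{shards} shards: {size:.0f} GB each, ok={within_recommendation(size)}")
```

Running it for three and six shards reproduces the comparison made in the text: 100 GB per shard falls outside the range, 50 GB per shard sits at its upper bound.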

.. code-block:: psql

Expand Down
6 changes: 6 additions & 0 deletions docs/admin/troubleshooting/index.md
Expand Up @@ -72,6 +72,12 @@ infrastructure operations. For example:
- Clean up stale node data.
:::

:::{card} {material-outlined}`wysiwyg;1.6em` About `crash`
:link: https://cratedb.com/docs/crate/crash/en/latest/troubleshooting.html
Troubleshooting the CLI program `crash`.
:::


:::{note}
You can find a lot of troubleshooting guides that explain how to perform
diagnostics on Java applications.
Expand Down
1 change: 1 addition & 0 deletions docs/admin/upgrade/index.md
@@ -1,3 +1,4 @@
(upgrading)=
# Upgrading

Guidelines about upgrading CrateDB clusters.
Expand Down
21 changes: 21 additions & 0 deletions docs/conf.py
Expand Up @@ -34,9 +34,30 @@
]

# Configure intersphinx.
if "sphinx.ext.intersphinx" not in extensions:
    extensions += ["sphinx.ext.intersphinx"]

if "intersphinx_mapping" not in globals():
    intersphinx_mapping = {}

intersphinx_mapping.update({
    'ctk': ('https://cratedb-toolkit.readthedocs.io/', None),
    'matplotlib': ('https://matplotlib.org/stable/', None),
    'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
    'numpy': ('https://numpy.org/doc/stable/', None),
})


# Configure substitutions.
if "myst_substitutions" not in globals():
    myst_substitutions = {}

myst_substitutions.update({
    "nb_colab": "[![Notebook on Colab](https://img.shields.io/badge/Open-Notebook%20on%20Colab-blue?logo=Google%20Colab)]",
    "nb_binder": "[![Notebook on Binder](https://img.shields.io/badge/Open-Notebook%20on%20Binder-lightblue?logo=binder)]",
    "nb_github": "[![Notebook on GitHub](https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)]",
    "readme_github": "[![README](https://img.shields.io/badge/Open-README-darkblue?logo=GitHub)]",
    "blog": "[![Blog](https://img.shields.io/badge/Open-Blog-darkblue?logo=Markdown)]",
    "tutorial": "[![Navigate to Tutorial](https://img.shields.io/badge/Navigate%20to-Tutorial-darkcyan?logo=Markdown)]",
    "readmore": "[![Read More](https://img.shields.io/badge/Read-More-darkyellow?logo=Markdown)]",
})
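
The card includes in this change use these substitutions through expressions such as `{{ '{}[cloud-datashader-github]'.format(nb_github) }}`. The following sketch shows what that expression evaluates to. It uses plain Python string formatting as a stand-in for the template evaluation that MyST performs, so the surrounding rendering pipeline is assumed rather than reproduced.

```python
# Value of the `nb_github` substitution, copied from the conf.py hunk.
nb_github = (
    "[![Notebook on GitHub]"
    "(https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)]"
)

# Same expression as inside the {{ ... }} markers in the card includes.
rendered = '{}[cloud-datashader-github]'.format(nb_github)

if __name__ == "__main__":
    print(rendered)
```

The result is a reference-style Markdown link whose text is the badge image and whose label, `cloud-datashader-github`, is resolved by the matching definition in `docs/_include/links.md`. The badge markup is defined once, and each card only supplies the link label.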
97 changes: 97 additions & 0 deletions docs/domain/analytics/index.md
@@ -0,0 +1,97 @@
(analytics)=
# Raw-Data Analytics

**CrateDB provides real-time analytics on raw data stored for the long term**

This applies to all domains of real-time analytics where you absolutely must
have access to all the records and can't live with down-sampled variants,
because records are unique and need to be accounted for within your analytics
queries.

If you find yourself in such a situation, you need a storage system which
manages all the high-volume data in its hot zone, so that it is available
right at your fingertips for live querying. Batch jobs to roll up raw data into
analytical results are not an option, because users' queries are too
individual, so you need to run them on real data in real time.

With CrateDB, which is compatible with PostgreSQL, you can do all of that using
plain SQL. Besides integrating well with commodity systems using standard
database access interfaces like ODBC or JDBC, it provides a proprietary HTTP
interface on top.

:Tags:
{tags-primary}`Analytics`
{tags-primary}`Long Term Storage`

:Related:
[](#timeseries) •
[](#timeseries-longterm) •
[](#machine-learning)

:Product:
[Real-time Analytics Database]


(bitmovin)=
## Bitmovin Insights

Multi-tenant data analytics on top of billions of records.

> CrateDB enables use cases we couldn't satisfy with other
database systems, also with databases which are even stronger
focused on the time series domain.
>
> CrateDB is not your normal database!
>
> <small>-- Daniel Hölbling-Inzko, Director of Engineering Analytics, Bitmovin</small>

:Industry:
{tags-secondary}`Broadcasting`
{tags-secondary}`Media Transcoding`
{tags-secondary}`Streaming Media`

:Tags:
{tags-primary}`Event Tracking`
{tags-primary}`Real-Time Analytics`
{tags-primary}`Multi Tenancy`
{tags-primary}`SaaS`

:Related:
[CrateDB provides the backbone of Bitmovin's real-time video analytics platform] \
[How Bitmovin uses CrateDB to monitor the biggest live video events]


::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`analytics;2em` &nbsp; **Bitmovin: Real-Time Analytics**

Bitmovin, as a leader in video codec algorithms and as a web-based video
stream broadcasting provider, produces billions of rows of data and stores
them in CrateDB, allowing their customers to do analytics on it.

One of their product's subsystems, a video analytics component, was required
to serve real-time analytics on very large and fast-moving data, so they needed
to find a well-performing database at the right cost.

- [Bitmovin: Improving the Streaming Experience with Real-Time Analytics]

The use case of Bitmovin illustrates why traditional databases weren't capable
of dealing with so many data records while keeping them all available for
querying in real time.
:::

:::{grid-item}
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/4BPApD0Piyc?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
:::

::::


[Bitmovin: Improving the Streaming Experience with Real-Time Analytics]: https://youtu.be/4BPApD0Piyc?feature=shared
[CrateDB provides the backbone of Bitmovin's real-time video analytics platform]: https://cratedb.com/customers/bitmovin
[How Bitmovin uses CrateDB to monitor the biggest live video events]: https://youtu.be/IR6hokaYv5g?feature=shared
[Real-time Analytics Database]: https://cratedb.com/solutions/real-time-analytics-database
23 changes: 0 additions & 23 deletions docs/domain/document/index.md

This file was deleted.
