Commit

Time Series: Improve guidance, structure, layout, and wording
- Expand canonical "time series" entry-point page
- Add dedicated time series sub-pages about:
  - Time Series Basics
  - Advanced Time Series Analysis
  - Connectivity Options
  - Video Tutorials
- Use "time series" 2-gram everywhere
- Improve page about "Industrial Data"
- Improve page about "Document Store"
- ML: Add section about "Exploratory data analysis (EDA)"
amotl committed Feb 28, 2024
1 parent 500ed8d commit 70d84b7
Showing 13 changed files with 761 additions and 34 deletions.
6 changes: 3 additions & 3 deletions docs/admin/sharding-partitioning.rst
@@ -64,7 +64,7 @@ partition as a set of shards. For each partition, the number of shards defined
by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific
``partition key`` is inserted.

-In the following example - which represents a very simple time-series use-case
+In the following example - which represents a very simple time series use-case
- we added another column ``part`` that automatically generates the current
month upon insertion from the ``ts`` column. The ``part`` column is further used
as the ``partition key``.
@@ -132,12 +132,12 @@ Then, to calculate the number of shards, you should consider that the size of each
shard should roughly be between 5 - 100 GB, and that each node can only manage
up to 1000 shards.

-Time-series example
+Time series example
-------------------

To illustrate the steps above, let's apply them to an example. Imagine
you want to create a *partitioned table* on a *three-node cluster* to store
-time-series data with the following assumptions:
+time series data with the following assumptions:

- Inserts: 1,000 records/s
- Record size: 128 bytes/record
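
The sizing arithmetic behind these assumptions can be sketched in a few lines of Python. This is only an illustration, not part of the documentation being changed: the 30-day month, the decimal gigabyte, the 50 GB target shard size, and both function names are assumptions made for the example. It also ignores replicas, indexes, and compression.

```python
import math

# Assumption for this sketch: decimal gigabytes (10^9 bytes).
BYTES_PER_GB = 1000 ** 3


def monthly_volume_gb(records_per_second: int, bytes_per_record: int, days: int = 30) -> float:
    """Raw data volume written into one monthly partition, in GB."""
    seconds = days * 24 * 60 * 60
    return records_per_second * bytes_per_record * seconds / BYTES_PER_GB


def shards_for_partition(volume_gb: float, target_shard_gb: float = 50.0) -> int:
    """Smallest shard count that keeps each shard at or below the target size."""
    return max(1, math.ceil(volume_gb / target_shard_gb))


# Figures from the example: 1,000 records/s at 128 bytes per record.
volume = monthly_volume_gb(1000, 128)
shards = shards_for_partition(volume)
per_shard = volume / shards
print(f"{volume:.1f} GB per month, {shards} shards of about {per_shard:.1f} GB each")
```

With a different target shard size the count changes accordingly. The result should still be checked against the 5 - 100 GB range per shard and the limit of 1000 shards per node.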
3 changes: 3 additions & 0 deletions docs/domain/document/index.md
@@ -9,8 +9,11 @@ Storing documents in CrateDB provides the same development convenience like the
document-oriented storage layer of Lotus Notes / Domino, CouchDB, MongoDB, and
PostgreSQL's `JSON(B)` types.

- [](inv:crate-reference#type-object)
- [](inv:cloud#object)
- [CrateDB Objects]
- [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]


[CrateDB Objects]: https://youtu.be/aQi9MXs2irU?feature=shared
[Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]: https://youtu.be/S_RHmdz2IQM?feature=shared
137 changes: 124 additions & 13 deletions docs/domain/industrial/index.md
@@ -5,7 +5,7 @@
# Industrial Data

Learn how to use CrateDB in industrial / IIoT / Industry 4.0 scenarios within
engineering, manufacturing, and other operational domains.
engineering, manufacturing, production, and other operational domains.

In the realm of Industrial IoT, dealing with diverse data, ranging from
slow-moving structured data, to high-frequency measurements, presents unique
@@ -15,24 +15,110 @@ The complexities of industrial big data are characterized by its high variety,
unstructured features, different data sampling rates, and how these attributes
influence data storage, retention, and integration.

Today's warehouses are complex systems with a very high degree of automation.
The key to the successful operation of these warehouses lies in having a
holistic view on the entire system based on data from various components like
sensors, PLCs, embedded controllers and software systems.

(rauch)=
## Rauch Insights

::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`data_exploration;2em`   **Rauch: High-Speed Production Lines**

_Scaling a high-speed production environment with CrateDB._

Rauch fills 33 cans per second, which adds up to 400 data records per second
that are processed, stored, and analyzed. In total, between one and ten
billion records are persisted in CrateDB.

- [Rauch: High-Speed Production Lines]

The Rauch use-case demonstrates why traditional databases could not cope with
so many data records and with unstructured data. Rauch chose CrateDB over other
databases for its PostgreSQL compatibility, its support for unstructured data,
and its excellent customer support.

:Industry: {tags-secondary}`Food` {tags-secondary}`Packaging` {tags-secondary}`Production`
:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
:::

:::{grid-item}  
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/gJPmJ0uXeVs?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 28 Jun 2022 \
**Speaker:** Arno Breuss
:::

::::


(tgw)=
## TGW Insights


::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`inventory;2em` &nbsp; **TGW: Data acquisition in high-speed logistics**

_Storing, querying, and analyzing industrial IoT data and metadata without
much hassle._

Today's warehouses are complex systems with a very high degree of automation.

TGW Logistics Group treats a holistic view of the entire system as key to
operating these warehouses successfully, acquiring data from various
components like sensors, PLCs, embedded controllers, and software systems.

- [TGW: Fixing data silos in a high-speed logistics environment]

TGW states that all these components can be seen as "data silos",
distributed across the entire site, each storing only some pieces of
information, in various data structures and with different ways to access them.

After trying multiple database systems, TGW Logistics moved to CrateDB for
its ability to aggregate different data formats and ability to query this
information without much hassle.

its ability to aggregate different data formats and the ability to query this
information without further ado.

:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
:::

:::{grid-item} &nbsp;
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/6dgjVQJtSKI?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 22 Jun 2022 \
**Speakers:** Alexander Mann, Jan Weber
:::

::::



::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`dashboard;2em` &nbsp; **TGW: Challenges in storing and analyzing industrial data**

_Not All Time-Series Are Equal: Challenges in Storing and Analyzing Industrial Data._

In the second presentation, you will learn how TGW leverages CrateDB to build
digital twins of physical warehouses around the world.
digital twins of physical warehouses around the world, by using its unique set
of features suitable for storing and querying complex industrial big data with
high variety, unstructured features, and at different data frequencies.

- [Fixing data silos in a high-speed logistics environment]
- [Challenges of Storing and Analyzing Industrial Data]
- [CrateDB: Challenges in industrial data]
- [TGW: Storing and analyzing real-world industrial data]

**What's inside**

@@ -47,6 +133,31 @@ digital twins of physical warehouses around the world.
- Real-World Applications: Exploration of actual customer use cases to
illustrate how CrateDB can be applied in various industrial scenarios.

:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
:Tags: {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`Digital Twin`
:::

:::{grid-item} &nbsp;
:columns: 4

<iframe width="240" class="speakerdeck-iframe" style="border: 0px; background: rgba(0, 0, 0, 0.1) padding-box; margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;" frameborder="0" src="https://speakerdeck.com/player/acb78531a07e4238ac662539b0c23609" title=" Not all time-series are equal ​ Challenges of storing and analyzing industrial data" allowfullscreen="true" data-ratio="1.7777777777777777"></iframe>

**Date:** 23 Nov 2022 \
**Speaker:** Marija Selakovic


<iframe width="240" src="https://www.youtube-nocookie.com/embed/ugQvihToY0k?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 5 Oct 2023 \
**Speakers:** Alexander Mann, Georg Traar
:::

::::




[Challenges of Storing and Analyzing Industrial Data]: https://youtu.be/ugQvihToY0k?feature=shared
[Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
[CrateDB: Challenges in industrial data]: https://speakerdeck.com/cratedb/not-all-time-series-are-equal-challenges-of-storing-and-analyzing-industrial-data
[Rauch: High-Speed Production Lines]: https://youtu.be/gJPmJ0uXeVs?feature=shared
[TGW: Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
[TGW: Storing and analyzing real-world industrial data]: https://youtu.be/ugQvihToY0k?feature=shared