diff --git a/docs/admin/sharding-partitioning.rst b/docs/admin/sharding-partitioning.rst index f525650e..57d0855d 100644 --- a/docs/admin/sharding-partitioning.rst +++ b/docs/admin/sharding-partitioning.rst @@ -64,7 +64,7 @@ partition as a set of shards. For each partition, the number of shards defined by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific ``partition key`` is inserted. -In the following example - which represents a very simple time-series use-case +In the following example - which represents a very simple time series use-case - we added another column ``part`` that automatically generates the current month upon insertion from the ``ts`` column. The ``part`` column is further used as the ``partition key``. @@ -132,12 +132,12 @@ Then, to calculate the number of shards, you should consider that the size of ea shard should roughly be between 5 - 100 GB, and that each node can only manage up to 1000 shards. -Time-series example +Time series example ------------------- To illustrate the steps above, let's use them on behalf of an example. Imagine you want to create a *partitioned table* on a *three-node cluster* to store -time-series data with the following assumptions: +time series data with the following assumptions: - Inserts: 1.000 records/s - Record size: 128 byte/record diff --git a/docs/domain/document/index.md b/docs/domain/document/index.md index a56a1861..51b0dab5 100644 --- a/docs/domain/document/index.md +++ b/docs/domain/document/index.md @@ -9,8 +9,11 @@ Storing documents in CrateDB provides the same development convenience like the document-oriented storage layer of Lotus Notes / Domino, CouchDB, MongoDB, and PostgreSQL's `JSON(B)` types. 
+- [](inv:crate-reference#type-object) - [](inv:cloud#object) +- [CrateDB Objects] - [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL] +[CrateDB Objects]: https://youtu.be/aQi9MXs2irU?feature=shared [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]: https://youtu.be/S_RHmdz2IQM?feature=shared diff --git a/docs/domain/industrial/index.md b/docs/domain/industrial/index.md index 49a5ea26..67daca7a 100644 --- a/docs/domain/industrial/index.md +++ b/docs/domain/industrial/index.md @@ -5,7 +5,7 @@ # Industrial Data Learn how to use CrateDB in industrial / IIoT / Industry 4.0 scenarios within -engineering, manufacturing, and other operational domains. +engineering, manufacturing, production, and other operational domains. In the realm of Industrial IoT, dealing with diverse data, ranging from slow-moving structured data, to high-frequency measurements, presents unique @@ -15,24 +15,110 @@ The complexities of industrial big data are characterized by its high variety, unstructured features, different data sampling rates, and how these attributes influence data storage, retention, and integration. -Today's warehouses are complex systems with a very high degree of automation. -The key to the successful operation of these warehouses lies in having a -holistic view on the entire system based on data from various components like -sensors, PLCs, embedded controllers and software systems. +(rauch)= +## Rauch Insights + +::::{info-card} + +:::{grid-item} +:columns: 8 + +{material-outlined}`data_exploration;2em`   **Rauch: High-Speed Production Lines** + +_Scaling a high-speed production environment with CrateDB._ + +Rauch fills 33 cans per second, which adds up to 400 data records per second +that are processed, stored, and analyzed. In total, they have between one and +ten billion records persisted in CrateDB. 
+ +- [Rauch: High-Speed Production Lines] + +The use-case of Rauch demonstrates why traditional databases weren't capable of +dealing with so many data records and unstructured data. CrateDB's benefits, +such as PostgreSQL compatibility, support for unstructured data, and excellent +customer support, made Rauch choose it over other databases. + +:Industry: {tags-secondary}`Food` {tags-secondary}`Packaging` {tags-secondary}`Production` +:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC` +::: + +:::{grid-item}   +:columns: 4 + + + +**Date:** 28 Jun 2022 \ +**Speaker:** Arno Breuss +::: +:::: + +(tgw)= ## TGW Insights + +::::{info-card} + +:::{grid-item} +:columns: 8 + +{material-outlined}`inventory;2em`   **TGW: Data acquisition in high-speed logistics** + +_Storing, querying, and analyzing industrial IoT data and metadata without +much hassle._ + +Today's warehouses are complex systems with a very high degree of automation. + +TGW Logistics Group considers a holistic view of the entire system key to the +successful operation of these warehouses, based on data acquired from +various components like sensors, PLCs, embedded controllers, and software +systems. + +- [TGW: Fixing data silos in a high-speed logistics environment] + +TGW states that all these components can be seen as "data silos", +distributed across the entire site, each of them storing just some pieces of +information in various data structures and different ways to access it. + After trying multiple database systems, TGW Logistics moved to CrateDB for -its ability to aggregate different data formats and ability to query this -information without much hassle. - +its ability to aggregate different data formats and the ability to query this +information without further ado. 
+ +:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping` +:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC` +::: + +:::{grid-item}   +:columns: 4 + + + +**Date:** 22 Jun 2022 \ +**Speakers:** Alexander Mann, Jan Weber +::: + +:::: + + + +::::{info-card} + +:::{grid-item} +:columns: 8 + +{material-outlined}`dashboard;2em`   **TGW: Challenges in storing and analyzing industrial data** + +_Not All Time-Series Are Equal: Challenges in Storing and Analyzing Industrial Data._ + In the second presentation, you will learn how TGW leverages CrateDB to build -digital twins of physical warehouses around the world. +digital twins of physical warehouses around the world, by using its unique set +of features suitable for storing and querying complex industrial big data with +high variety, unstructured features, and at different data frequencies. -- [Fixing data silos in a high-speed logistics environment] -- [Challenges of Storing and Analyzing Industrial Data] +- [CrateDB: Challenges in industrial data] +- [TGW: Storing and analyzing real-world industrial data] **What's inside** @@ -47,6 +133,31 @@ digital twins of physical warehouses around the world. - Real-World Applications: Exploration of actual customer use cases to illustrate how CrateDB can be applied in various industrial scenarios. 
+:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping` +:Tags: {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`Digital Twin` +::: + +:::{grid-item}   +:columns: 4 + + + +**Date:** 23 Nov 2022 \ +**Speaker:** Marija Selakovic + + + + +**Date:** 5 Oct 2023 \ +**Speakers:** Alexander Mann, Georg Traar +::: + +:::: + + + -[Challenges of Storing and Analyzing Industrial Data]: https://youtu.be/ugQvihToY0k?feature=shared -[Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared +[CrateDB: Challenges in industrial data]: https://speakerdeck.com/cratedb/not-all-time-series-are-equal-challenges-of-storing-and-analyzing-industrial-data +[Rauch: High-Speed Production Lines]: https://youtu.be/gJPmJ0uXeVs?feature=shared +[TGW: Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared +[TGW: Storing and analyzing real-world industrial data]: https://youtu.be/ugQvihToY0k?feature=shared diff --git a/docs/domain/timeseries/advanced.md b/docs/domain/timeseries/advanced.md new file mode 100644 index 00000000..ccc34704 --- /dev/null +++ b/docs/domain/timeseries/advanced.md @@ -0,0 +1,266 @@ +(timeseries-advanced)= +(timeseries-analysis)= + +# Advanced Time Series Analysis + +Learn how to conduct advanced data analysis on large time series datasets +with CrateDB. + +{tags-primary}`Exploratory data analysis` +{tags-primary}`Time series decomposition` +{tags-primary}`Anomaly detection` +{tags-primary}`Forecasting / Prediction` +{tags-primary}`Metadata integration` + + + + + +(timeseries-anomaly-forecasting)= +## Anomaly Detection and Forecasting + +To gain insights from your data in a one-shot or recurring way, based on +machine learning techniques, you may want to look into applying [anomaly] +detection and/or [forecasting] methods. 
+ +**Examples** + + +::::{info-card} + +:::{grid-item} **Use MLflow for time series anomaly detection and timeseries forecasting** +:columns: 9 + +Guidelines and runnable code to get started with [MLflow] and CrateDB, exercising +time series anomaly detection and timeseries forecasting / prediction using +NumPy, Merlion, and Matplotlib. +::: + +:::{grid-item} +:columns: 3 + +[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb) +[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb) + +{tags-primary}`Anomaly Detection` +{tags-primary}`Forecasting / Prediction` + +{tags-secondary}`Python` +{tags-secondary}`MLflow` +::: + +:::: + + +::::{info-card} + +:::{grid-item} **Use PyCaret to train time series forecasting models** +:columns: 9 + +This notebook explores the [PyCaret] framework and shows how to use it +to train various timeseries forecasting models. 
+::: + +:::{grid-item} +:columns: 3 + +[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb) +[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb) + +{tags-primary}`Forecasting / Prediction` + +{tags-secondary}`Python` +{tags-secondary}`PyCaret` +::: + +:::: + + +(timeseries-decomposition)= +## Decomposition + +[Decomposition of time series] is a statistical task that deconstructs a [time +series] into several components, each representing one of the underlying +categories of patterns. + +There are two principal types of decomposition, one based on rates of change, +the other based on predictability. + +You can use this method to dissect a time series into multiple components, +typically including trend, seasonal, and random (or irregular) components. + +This process helps in understanding the underlying patterns of the time series +data, such as identifying any long-term direction (trend), recurring patterns +at fixed intervals (seasonality), and randomness (irregular fluctuations) in +the data. + +Decomposition is crucial for analyzing how these components change over time, +improving forecasts, and developing strategies for addressing each element +effectively. + +**Examples** + +::::{info-card} + +:::{grid-item} **Analyze trend, seasonality, and fluctuations with PyCaret and CrateDB** +:columns: 9 + +Learn how to extract data from CrateDB for analysis in PyCaret, how to +further preprocess it and how to use PyCaret to plot time series +decomposition by breaking it down into its basic components: trend, +seasonality, and residual (or irregular) fluctuations. 
+::: + +:::{grid-item} +:columns: 3 + +[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb) +[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb) + +{tags-primary}`Time series decomposition` + +{tags-secondary}`Python` +{tags-secondary}`PyCaret` +::: + +:::: + + +(timeseries-eda)= +## EDA + +[Exploratory data analysis (EDA)] is an approach of analyzing data sets to +summarize their main characteristics, often using statistical graphics and +other data visualization methods. + +EDA involves visualizing, summarizing, and analyzing data, to uncover +patterns, anomalies, or relationships within the dataset. + +The objective of this step is to gain an understanding and intuition of the +data, identify potential issues, and, in machine learning, guide feature +engineering and model building. + +**Examples** + +::::{info-card} + +:::{grid-item} **Exploratory data analysis (EDA) with PyCaret and CrateDB** +:columns: 9 + +Learn how to access time series data from CrateDB using SQL, and how to apply +exploratory data analysis (EDA) with PyCaret. + +The notebook shows how to generate various plots and charts for EDA, helping +you to understand data distributions, relationships between variables, and to +identify patterns. 
+::: + +:::{grid-item} +:columns: 3 + +[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb) +[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb) + +{tags-primary}`EDA on time series` + +{tags-secondary}`Python` +{tags-secondary}`PyCaret` +::: + +:::: + + +(timeseries-analysis-metadata)= +## Metadata Integration + +CrateDB is particularly effective when you need to combine time-series data +with metadata, for instance, in scenarios where data like sensor readings +or log entries, need to be augmented with additional context for more +insightful analysis. See also [](#document). + +CrateDB supports effective time-series analysis with fast aggregations, a +rich set of built-in functions, and [JOIN](inv:crate-reference#sql_joins) +operations. + +**Examples** + +::::{info-card} + +:::{grid-item} **Analyzing Device Readings with Metadata Integration** +:columns: 9 + +This tutorial illustrates how to augment time-series data with metadata, in +order to enable more comprehensive analysis. It uses a time-series dataset that +captures various device readings, such as battery, CPU, and memory information. +::: + +:::{grid-item} +:columns: 3 + +[![Navigate to Tutorial](https://img.shields.io/badge/Navigate%20to-Tutorial-lightgray?logo=Markdown)](inv:cloud#time-series-advanced) + +{tags-primary}`Rich time series` +{tags-primary}`Metadata` + +{tags-secondary}`SQL` +::: + +:::: + + +(timeseries-visualization)= +## Visualization + +Similar to EDA, just applying [data and information visualization] can yield +significant insights into the characteristics of your data. By using +best-of-breed data visualization tools, initial data exploration is +mostly your first encounter with the data. 
+ +**Examples** + +::::{info-card} + +:::{grid-item} **Display millions of data points using hvPlot, Datashader, and CrateDB** +:columns: 9 + +[HoloViews] and [Datashader] frameworks enable channeling millions of data +points from your backend systems to the browser's glass. + +This notebook plots the venerable NYC Taxi dataset after importing it +into a CrateDB Cloud database cluster. + +🚧 _Please note this notebook is a work in progress._ 🚧 +::: + +:::{grid-item} +:columns: 3 + +[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb) +[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb) + +{tags-primary}`Time series visualization` + +{tags-secondary}`Python` +{tags-secondary}`HoloViews` +{tags-secondary}`hvPlot` +{tags-secondary}`Datashader` +::: + +:::: + + + +[anomaly]: https://en.wikipedia.org/wiki/Anomaly_(natural_sciences) +[Data and information visualization]: https://en.wikipedia.org/wiki/Data_and_information_visualization +[Datashader]: https://datashader.org/ +[Decomposition of time series]: https://en.wikipedia.org/wiki/Decomposition_of_time_series +[Exploratory data analysis (EDA)]: https://en.wikipedia.org/wiki/Exploratory_data_analysis +[forecasting]: https://en.wikipedia.org/wiki/Forecasting +[HoloViews]: https://www.holoviews.org/ +[MLflow]: https://mlflow.org/ +[PyCaret]: https://www.pycaret.org +[Time series]: https://en.wikipedia.org/wiki/Time_series diff --git a/docs/domain/timeseries/basics.md b/docs/domain/timeseries/basics.md new file mode 100644 index 00000000..072d0878 --- /dev/null +++ b/docs/domain/timeseries/basics.md @@ -0,0 +1,45 @@ +(timeseries-basics)= +# Time Series Basics with CrateDB + +## Getting Started + +- 
[](#timeseries-generate) +- [](#timeseries-normalize) +- [Financial data collection and processing using pandas] +- [](inv:cloud#time-series) +- [Load and visualize time series data using CrateDB, SQL, pandas, and Plotly](#plotly) +- [How to Build Time Series Applications with CrateDB] + +## Downsampling and Interpolation + +- [](#downsampling-timestamp-binning) +- [](#downsampling-lttb) +- [](#ni-interpolate) +- [Interpolating missing time series values] +- [](inv:crate-reference#aggregation-percentile) + +## Operations +- [](#sharding-partitioning) +- [CrateDB partitioned table vs. TimescaleDB Hypertable] + + +:::{tip} +For more in-depth information, please visit the documentation pages about +[](#timeseries-connect) and [](#timeseries-advanced). Alternatively, you +may prefer the [](#timeseries-video). +::: + + +:::{toctree} +:hidden: + +generate/index +normalize-intervals +::: + + + +[CrateDB partitioned table vs. TimescaleDB Hypertable]: https://community.cratedb.com/t/cratedb-partitioned-table-vs-timescaledb-hypertable/1713 +[Financial data collection and processing using pandas]: https://community.cratedb.com/t/automating-financial-data-collection-and-storage-in-cratedb-with-python-and-pandas-2-0-0/916 +[How to Build Time Series Applications with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb +[Interpolating missing time series values]: https://community.cratedb.com/t/interpolating-missing-time-series-values/1010 diff --git a/docs/domain/timeseries/connect.md b/docs/domain/timeseries/connect.md new file mode 100644 index 00000000..8abc67c1 --- /dev/null +++ b/docs/domain/timeseries/connect.md @@ -0,0 +1,64 @@ +(timeseries-connect)= +(timeseries-io)= +(timeseries-import-export)= + +# Database / Time Series Connectivity + +CrateDB connectivity options for working with time series data. 
+ +{tags-primary}`Connect` +{tags-primary}`Import` +{tags-primary}`Export` +{tags-primary}`Extract` +{tags-primary}`Load` +{tags-primary}`ETL` + + +## Interfaces and Protocols + +CrateDB supports both the [HTTP protocol] and the [PostgreSQL wire protocol], +which ensures that many clients that work with PostgreSQL will also work with +CrateDB. Through corresponding drivers, CrateDB is compatible with [ODBC], +[JDBC], and other database API specifications. + +By supporting [SQL], CrateDB is compatible with many standard database +environments out of the box. + +- [CrateDB HTTP interface] +- [CrateDB PostgreSQL interface] +- [CrateDB SQL protocol] + +## Drivers and Integrations + +CrateDB provides plenty of connectivity options with database drivers, +applications, and frameworks, in order to get time series data in and +out of CrateDB, and to connect to other applications. + +- [](inv:crate-clients-tools#connect) +- [](inv:crate-clients-tools#df) +- [](inv:crate-clients-tools#etl) +- [](inv:crate-clients-tools#metrics) + +## Tutorials + +Hands-on tutorials about CrateDB data I/O fundamentals, as well as about +properly configuring and connecting relevant 3rd-party software components to +work optimally with CrateDB. 
+ +- [Fundamentals of the COPY FROM statement] +- [](#etl) +- [](#metrics) +- [](#performance) +- [Import weather data using Dask] + + +[CrateDB HTTP interface]: inv:crate-reference:*:label#interface-http +[CrateDB PostgreSQL interface]: inv:crate-reference:*:label#interface-postgresql +[CrateDB SQL protocol]: inv:crate-reference:*:label#sql +[Fundamentals of the COPY FROM statement]: https://community.cratedb.com/t/fundamentals-of-the-copy-from-statement/1178 +[HTTP protocol]: https://en.wikipedia.org/wiki/HTTP +[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb +[JDBC]: https://en.wikipedia.org/wiki/Java_Database_Connectivity +[ODBC]: https://en.wikipedia.org/wiki/Open_Database_Connectivity +[PostgreSQL wire protocol]: https://www.postgresql.org/docs/current/protocol.html +[SQL]: https://en.wikipedia.org/wiki/Sql diff --git a/docs/domain/timeseries/generate/index.rst b/docs/domain/timeseries/generate/index.rst index 2dda29ef..1064abe0 100644 --- a/docs/domain/timeseries/generate/index.rst +++ b/docs/domain/timeseries/generate/index.rst @@ -1,4 +1,4 @@ -.. _timeseries-basics: +.. _timeseries-generate: .. _gen-ts: ========================= diff --git a/docs/domain/timeseries/index.md b/docs/domain/timeseries/index.md index f464ecd8..d7d34e8e 100644 --- a/docs/domain/timeseries/index.md +++ b/docs/domain/timeseries/index.md @@ -3,19 +3,95 @@ Learn how to optimally use CrateDB for time series use-cases. -- [](#timeseries-basics) -- [](#timeseries-normalize) -- [Financial data collection and processing using pandas] -- [](inv:cloud#time-series) -- [](inv:cloud#time-series-advanced) -- [Time-series data: From raw data to fast analysis in only three steps] +CrateDB is a distributed and scalable SQL database for storing and analyzing +massive amounts of data in near real-time, even with complex queries. It is +PostgreSQL-compatible, and based on Lucene. 
+ + + + + +::::{grid} 1 2 2 2 +:margin: 4 4 0 0 +:padding: 0 +:gutter: 2 + + +:::{grid-item-card} {material-outlined}`show_chart;2em` Basics +:link: timeseries-basics +:link-type: ref +:link-alt: Time series basics with CrateDB + +Basic introductory tutorials about using CrateDB with time series data. + + +What's inside: +Getting Started, Downsampling and Interpolation, +Operations: Sharding and Partitioning. + +::: + + +:::{grid-item-card} {material-outlined}`analytics;2em` Advanced +:link: timeseries-analysis +:link-type: ref +:link-alt: About time series analysis + +Advanced time series data analysis with CrateDB. + + +What's inside: +Exploratory data analysis (EDA), time series decomposition, +anomaly detection, forecasting. + +::: + + +:::{grid-item-card} {material-outlined}`sync;2em` Import and Export +:link: timeseries-io +:link-type: ref +:link-alt: About time series data import and export + +Import data into and export data from your CrateDB cluster. + + +What's inside: +Connectivity and integration options with database drivers +and applications, libraries, and frameworks. + +::: + + +:::{grid-item-card} {material-outlined}`smart_display;2em` Video +:link: timeseries-video +:link-type: ref +:link-alt: Video tutorials about time series with CrateDB + +Video tutorials about time series data and CrateDB. + + +What's inside: +Time series introduction. Importing, exporting, +and analyzing. Industrial applications. 
+ +::: + +:::: + + :::{toctree} :hidden: -generate/index -normalize-intervals +Basics +Advanced +Connectivity +video ::: - -[Financial data collection and processing using pandas]: https://community.cratedb.com/t/automating-financial-data-collection-and-storage-in-cratedb-with-python-and-pandas-2-0-0/916 -[Time-series data: From raw data to fast analysis in only three steps]: https://youtu.be/7biXPnG7dY4?feature=shared diff --git a/docs/domain/timeseries/video.md b/docs/domain/timeseries/video.md new file mode 100644 index 00000000..22630557 --- /dev/null +++ b/docs/domain/timeseries/video.md @@ -0,0 +1,148 @@ +(timeseries-video)= +# Video Tutorials + +Video tutorials about time series with CrateDB. + + +## Time Series Data and CrateDB + +::::{info-card} + +:::{grid-item} **A collection of videos about how CrateDB deals with time-series data** +:columns: 9 + + + +-- [Time Series Data and CrateDB] + +CrateDB simplifies the complexity of managing time-series data. +It provides a comprehensive solution for storing, querying, and extracting +insights from large-scale and high-volume time-series datasets. + +Learn more about CrateDB and time-series data here: +https://cratedb.com/solutions/time-series-database +::: + +:::{grid-item}   +:columns: 3 + +{tags-secondary}`Introduction` \ +{tags-primary}`Time Series` \ +{tags-info}`21 Feb 2024` +::: + +:::: + + +## Importing and Exporting Data with CrateDB + +::::{info-card} + +:::{grid-item} **The basics of `COPY FROM` and `COPY TO`** +:columns: 9 + + + +-- [Importing and Exporting Data with CrateDB] + +In this video tutorial, Rafaela will show you how to import JSON and CSV data +to CrateDB using the [`COPY FROM`] statement. Then, she will demonstrate how to +export data from CrateDB to a local file system, using the [`COPY TO`] statement. +Rafaela will use the [Quotes Dataset]. + +For more information about how to import and export +data from/into CrateDB, please refer to [](#timeseries-io). 
+::: + +:::{grid-item}   +:columns: 3 + +{tags-secondary}`Introduction` \ +{tags-primary}`Import and Export` \ +{tags-info}`8 Aug 2022` + +Rafaela Sant'ana +::: + +:::: + + + +## Analyzing Time Series Data with CrateDB + +::::{info-card} + +:::{grid-item} **From raw data to fast analysis in only three steps** +:columns: 9 + + + +-- [Time series data: From raw data to fast analysis in only three steps] + +In this extensive video tutorial, Karyn and Niklas will show you how to use +time-series data and data analysis to help businesses understand patterns, +trends, and causes over time. + +In this webinar, Crate.io's Solution Engineering team guides you +through the implementation steps of a time-series use case - from table layout +to querying. + +Our speakers will also show you how to find the right sharding and partitioning +strategy for your time-series data in CrateDB. +::: + +:::{grid-item}   +:columns: 3 + +{tags-secondary}`Extensive` \ +{tags-primary}`Time Series` \ +{tags-primary}`Modeling` \ +{tags-primary}`Import and Export` \ +{tags-primary}`Querying` \ +{tags-info}`23 Feb 2023` + +Karyn Azevedo, \ +Niklas Schmidtmer +::: + +:::: + + +## CrateDB in Industrial Applications + +::::{info-card} + +:::{grid-item} **High-Speed Production Lines and Logistics** +:columns: 9 + +Learn how Rauch and TGW leverage CrateDB to support high-speed shop-floor +production lines and logistics databases for warehouses around the world. 
+ +- [](#rauch) +- [](#tgw) +::: + +:::{grid-item}   +:columns: 3 + +{tags-secondary}`Extensive` \ +{tags-primary}`Time Series` \ +{tags-primary}`Industrial IoT` \ +{tags-info}`2022/2023` + +Alexander Mann, \ +Arno Breuss, \ +Georg Traar, \ +Jan Weber +::: + +:::: + + + +[`COPY FROM`]: inv:crate-reference#sql-copy-from +[`COPY TO`]: inv:crate-reference#sql-copy-to +[Importing and Exporting Data with CrateDB]: https://youtu.be/xDypaX37XZQ?feature=shared +[Quotes Dataset]: https://www.kaggle.com/datasets/manann/quotes-500k +[Time series data: From raw data to fast analysis in only three steps]: https://youtu.be/7biXPnG7dY4?feature=shared +[Time Series Data and CrateDB]: https://www.youtube.com/playlist?list=PLDZqzXOGoWUKTZwR7zOY8s1sTvZOAa7cy diff --git a/docs/integrate/df.md b/docs/integrate/df.md index 3084c751..eda79aec 100644 --- a/docs/integrate/df.md +++ b/docs/integrate/df.md @@ -7,6 +7,7 @@ How to use CrateDB together with popular open-source dataframe libraries. ## Dask - [Guide to efficient data ingestion to CrateDB with pandas and Dask] - [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy] +- [Import weather data using Dask] - [Dask code examples] ## pandas @@ -22,6 +23,7 @@ How to use CrateDB together with popular open-source dataframe libraries. 
[Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html [Guide to efficient data ingestion to CrateDB with pandas]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas/1541 [Guide to efficient data ingestion to CrateDB with pandas and Dask]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas-and-dask/1482 +[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161 [pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas [Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars diff --git a/docs/integrate/ml/index.md b/docs/integrate/ml/index.md index 493952fb..404cc659 100644 --- a/docs/integrate/ml/index.md +++ b/docs/integrate/ml/index.md @@ -63,6 +63,15 @@ Tutorials and Notebooks about using [PyCaret] together with CrateDB. [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb) [![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb) +- [Exploratory data analysis (EDA) with PyCaret and CrateDB] + + This notebook demonstrates how to access timeseries data from CrateDB using + SQL, and how to apply exploratory data analysis (EDA) with PyCaret. 
It shows + how to generate various plots and charts for EDA, helping you to understand + data distributions, relationships between variables, and to identify patterns. + + [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb) [![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb) + (scikit-learn)= ## scikit-learn @@ -89,6 +98,7 @@ tensorflow [AutoML with PyCaret and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/automl +[Exploratory data analysis (EDA) with PyCaret and CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb [Introduction to Time Series Modeling using Machine Learning]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data [Jupyter Notebook]: https://jupyter.org/ [LangChain]: https://python.langchain.com/ diff --git a/docs/integrate/visualize/index.md b/docs/integrate/visualize/index.md index ef21fe8f..54ff2b9d 100644 --- a/docs/integrate/visualize/index.md +++ b/docs/integrate/visualize/index.md @@ -11,7 +11,7 @@ Guidelines about data analysis and visualization with CrateDB. 
- [Introduction to time series visualization in CrateDB and Apache Superset (Blog)] - [Use CrateDB and Apache Superset for Open Source Data Warehousing and Visualization (Blog)] - [Introduction to time series visualization in CrateDB and Apache Superset (Webinar)] -- [Introduction to Time-Series Visualization in CrateDB and Apache Superset (Preset.io)] +- [Introduction to time series Visualization in CrateDB and Apache Superset (Preset.io)] **Development** - [Set up Apache Superset with CrateDB] @@ -34,6 +34,7 @@ Guidelines about data analysis and visualization with CrateDB. - [Using Grafana with CrateDB Cloud] +(datashader)= ## hvPlot and Datashader The `cloud-datashader.ipynb` notebook explores the [HoloViews] and [Datashader] frameworks @@ -56,10 +57,11 @@ into a CrateDB Cloud database cluster. - [From data storage to data analysis\: Tutorial on CrateDB and pandas] +(plotly)= ## Plotly / Dash - The `timeseries-queries-and-visualization.ipynb` notebook explores how to access - timeseries data from CrateDB via SQL, load it into pandas DataFrames, and visualize + time series data from CrateDB via SQL, load it into pandas DataFrames, and visualize it using Plotly. It includes advanced time series operations in SQL, like aggregations, window functions, @@ -71,6 +73,7 @@ into a CrateDB Cloud database cluster. - Alternatively, you are welcome to explore the canonical [Dash Examples]. 
+ ## R ```{toctree} @@ -88,7 +91,6 @@ metabase ``` - [Dash Examples]: https://plotly.com/examples/ [Data Analysis with Cluvio and CrateDB]: https://community.cratedb.com/t/data-analysis-with-cluvio-and-cratedb/1571 [Datashader]: https://datashader.org/ @@ -97,7 +99,7 @@ metabase [Introduction to Time Series Visualization in CrateDB and Explo]: https://cratedb.com/blog/introduction-to-time-series-visualization-in-cratedb-and-explo [Introduction to time series visualization in CrateDB and Apache Superset (Blog)]: https://community.cratedb.com/t/introduction-to-time-series-visualization-in-cratedb-and-superset/1041 [Introduction to time series visualization in CrateDB and Apache Superset (Webinar)]: https://cratedb.com/resources/webinars/lp-wb-introduction-to-time-series-visualization-in-cratedb-apache-superset -[Introduction to Time-Series Visualization in CrateDB and Apache Superset (Preset.io)]: https://preset.io/blog/timeseries-cratedb-superset/ +[Introduction to time series Visualization in CrateDB and Apache Superset (Preset.io)]: https://preset.io/blog/timeseries-cratedb-superset/ [Real-time data analytics with Metabase and CrateDB]: https://www.metabase.com/community_posts/real-time-data-analytics-with-metabase-and-cratedb [Set up Apache Superset with CrateDB]: https://community.cratedb.com/t/set-up-apache-superset-with-cratedb/1716 [Set up an Apache Superset development sandbox with CrateDB]: https://community.cratedb.com/t/set-up-an-apache-superset-development-sandbox-with-cratedb/1163 diff --git a/docs/performance/selects.rst b/docs/performance/selects.rst index 8d4ab161..4f62aad8 100644 --- a/docs/performance/selects.rst +++ b/docs/performance/selects.rst @@ -116,7 +116,7 @@ Downsampling with ``DATE_BIN`` ============================== For improved downsampling using time-bucketing and resampling, the article -`resampling time-series data with DATE_BIN`_ shares patterns how to +`resampling time series data with DATE_BIN`_ shares patterns how to group 
records into time buckets and resample the values. This technique will improve query performance by reducing the amount of data @@ -257,6 +257,6 @@ and the same PK values, will also have identical ``_id`` values. .. _Largest Triangle Three Buckets: https://github.com/sveinn-steinarsson/flot-downsample .. _Lucene segment: https://stackoverflow.com/a/2705123 .. _normal distribution: https://en.wikipedia.org/wiki/Normal_distribution -.. _resampling time-series data with DATE_BIN: https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009 +.. _resampling time series data with DATE_BIN: https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009 .. _retrieving records in bulk with a list of primary key values: https://community.cratedb.com/t/retrieving-records-in-bulk-with-a-list-of-primary-key-values/1721 .. _using common table expressions to speed up queries: https://community.cratedb.com/t/using-common-table-expressions-to-speed-up-queries/1719