Add import globbing docs #48

Merged
merged 12 commits into from
Jan 8, 2024
Binary file added docs/_assets/img/cluster-import-globbing.png
Binary file added docs/_assets/img/cluster-import-tab-azure.png
61 changes: 56 additions & 5 deletions docs/reference/overview.rst
@@ -1,4 +1,3 @@
.. _overview:
.. _console-overview:

=====================
@@ -322,10 +321,11 @@ Import from private S3 bucket

CrateDB Cloud allows convenient imports directly from S3-compatible storage.
To import a file from a bucket, provide the name of your bucket and the path to
the file. The S3 Access Key ID, and S3 Secret Access Key are also needed. You can
also specify the endpoint for non-AWS S3 buckets. Keep in mind that you may be
charged for egress, depending on your provider. There is also a limit of 10 GiB
for S3 imports. The usual file formats are supported.
the file. The S3 Access Key ID and S3 Secret Access Key are also required. You
can also specify the endpoint for non-AWS S3 buckets. Keep in mind that you may
be charged for egress traffic, depending on your provider. There is also a
volume limit of 10 GiB per file for S3 imports. The usual file formats are
supported - CSV, Json, and Parquet.
Review comment by @SStorm (Contributor), Dec 27, 2023. Suggested change:

-supported - CSV, Json, and Parquet.
+supported - CSV (all variants), JSON (JSON-Lines, JSON Arrays and JSON Documents), and Parquet.


.. image:: ../_assets/img/cluster-import-tab-s3.png
   :alt: Cloud Console cluster upload from S3
@@ -350,6 +350,55 @@ for S3 imports. The usual file formats are supported.
}]
}

.. _overview-cluster-import-azure:

Azure Container Storage
^^^^^^^^^^^^^^^^^^^^^^^

Importing from Azure Storage containers is also supported. In this case, the
secret consists of a secret name and either an Azure Storage connection string
or an Azure SAS token URL.

You can import data from a private Azure Blob Storage container using a stored
secret. Secrets can be added by an admin user at the organization level.
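
The two credential forms mentioned above look quite different, which makes it
easy to check which one you are about to store. The following Python sketch is
illustrative only and is not part of CrateDB Cloud: the function name
``classify_azure_credential`` and the sample values are hypothetical, and the
check relies on the usual shapes of these credentials (a SAS URL carries a
``sig`` query parameter, a connection string is a ``;``-separated list of
``Key=Value`` pairs that includes ``AccountName``).

```python
from urllib.parse import parse_qs, urlsplit


def classify_azure_credential(value):
    """Guess whether a credential is a SAS URL or a connection string.

    Hypothetical helper for illustration; it inspects the text only and
    never contacts Azure.
    """
    parts = urlsplit(value)
    # A SAS token URL is an http(s) URL whose query string carries a signature.
    if parts.scheme in ("http", "https") and "sig" in parse_qs(parts.query):
        return "sas_url"

    # A connection string is a ';'-separated list of Key=Value pairs.
    fields = dict(
        item.split("=", 1) for item in value.strip().split(";") if "=" in item
    )
    if "AccountName" in fields and (
        "AccountKey" in fields or "SharedAccessSignature" in fields
    ):
        return "connection_string"
    return "unknown"


# Placeholder values, not real credentials.
SAMPLE_SAS_URL = (
    "https://exampleaccount.blob.core.windows.net/mycontainer"
    "?sv=2022-11-02&sp=rl&sig=abc123"
)
SAMPLE_CONNECTION_STRING = (
    "DefaultEndpointsProtocol=https;AccountName=exampleaccount;"
    "AccountKey=abc123==;EndpointSuffix=core.windows.net"
)
```

A check like this only catches obvious mix-ups, such as pasting a plain
container URL without a signature. It does not validate that the credential
actually grants access.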

You can specify a secret, a container, a table, and a path in the form
``/folder/my_file.parquet``.

As with other imports, Parquet, CSV, and JSON files are supported. The file
size limit for imports is 10 GiB per file.
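
Before starting an import, it can help to check a file against the format and
size constraints stated above. The following Python sketch is one possible
pre-flight check, not something the Cloud Console provides. The helper name
``check_import_candidate`` and the mapping from formats to file extensions
(``.csv``, ``.json``, ``.parquet``, optionally followed by ``.gz``) are
assumptions made for this example.

```python
from pathlib import PurePosixPath

MAX_IMPORT_BYTES = 10 * 1024**3  # 10 GiB per file, as stated above
# Assumed extension names for the supported formats (CSV, JSON, Parquet).
SUPPORTED_SUFFIXES = {".csv", ".json", ".parquet"}


def check_import_candidate(path, size_bytes):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []

    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    # Assumption: a trailing ".gz" marks a compressed variant of a format.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    suffix = suffixes[-1] if suffixes else ""
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")

    if size_bytes > MAX_IMPORT_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, above the {MAX_IMPORT_BYTES}-byte limit"
        )
    return problems
```

For example, ``check_import_candidate("/folder/my_file.parquet", 1024)``
returns an empty list, while a file with an unrecognized extension or a size
above 10 GiB yields one message per problem.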

.. image:: ../_assets/img/cluster-import-tab-azure.png
   :alt: Cloud Console cluster upload from Azure Storage Container

.. _overview-cluster-import-globbing:

Importing multiple files
~~~~~~~~~~~~~~~~~~~~~~~~

Importing multiple files, also known as import globbing, is supported in any
S3-compatible blob storage. The steps are the same as when importing a single
file from S3: provide the bucket name, the path to the files, and the S3 Access
Key ID and Secret Access Key.

Importing multiple files from Azure Container/Blob Storage is also supported,
for example with the path ``/folder/*.parquet``.

Files to be imported are specified using the well-known `wildcard`_ notation,
also known as "globbing". In computer programming, `glob`_ patterns specify
sets of filenames with wildcard characters. The following example would import
all files from a single specified day.

.. code-block:: console

   /somepath/AWSLogs/123456678899/CloudTrail/us-east-1/2023/11/12/*.json.gz

.. image:: ../_assets/img/cluster-import-globbing.png
   :alt: Cloud Console cluster import globbing

As with other imports, the supported file types are CSV, JSON,
and Parquet.
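
To get a feel for which object keys a pattern would select, you can try it
locally before starting an import. The sketch below uses Python's standard
``fnmatch`` module as a stand-in for the matching done by the import. That is
an assumption: the exact matching rules of CrateDB Cloud are not described
here, and ``fnmatch`` lets ``*`` match ``/`` as well, which may differ from the
service. The key names are made up for illustration.

```python
from fnmatch import fnmatchcase


def match_keys(pattern, keys):
    """Return the keys that match a glob pattern, preserving input order."""
    return [key for key in keys if fnmatchcase(key, pattern)]


PREFIX = "/somepath/AWSLogs/123456678899/CloudTrail/us-east-1/2023/11"
PATTERN = PREFIX + "/12/*.json.gz"

# Hypothetical object keys in the bucket.
KEYS = [
    PREFIX + "/12/a.json.gz",
    PREFIX + "/12/b.json.gz",
    PREFIX + "/13/c.json.gz",  # different day, not matched
    PREFIX + "/12/notes.txt",  # different extension, not matched
]

matched = match_keys(PATTERN, KEYS)
for key in matched:
    print(key)
```

Only the two ``.json.gz`` keys under ``/12/`` are selected. Files from other
days or with other extensions are left out.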

.. _overview-cluster-import-file:

Import from file
@@ -668,10 +717,12 @@ about uncertainties or problems you are having when using our products.
.. _Croud: https://crate.io/docs/cloud/cli/en/latest/
.. _Croud clusters upgrade: https://crate.io/docs/cloud/cli/en/latest/commands/clusters.html#clusters-upgrade
.. _deploy a trial cluster on the CrateDB Cloud Console for free: https://crate.io/lp-free-trial
.. _glob: https://en.wikipedia.org/wiki/Glob_(programming)
.. _HTTP: https://crate.io/docs/crate/reference/en/latest/interfaces/http.html
.. _Microsoft Azure: https://azure.microsoft.com/en-us/
.. _PostgreSQL wire protocol: https://crate.io/docs/crate/reference/en/latest/interfaces/postgres.html
.. _scaling the cluster: https://crate.io/docs/cloud/howtos/en/latest/scale-cluster.html
.. _signup tutorial: https://crate.io/docs/cloud/tutorials/en/latest/sign-up.html
.. _tutorial: https://crate.io/docs/cloud/tutorials/en/latest/cluster-deployment/index.html
.. _user roles: https://crate.io/docs/cloud/reference/en/latest/user-roles.html
.. _wildcard: https://en.wikipedia.org/wiki/Wildcard_character