From 3a4e148bb87f41b80e5142c0b074f5078d957f91 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 04:19:51 -0400 Subject: [PATCH 1/8] Update install-guide --- docs/source/dev-guide/contribute.rst | 2 + .../getting-started/install-guide.rst | 65 +++++++++---------- docs/source/reference/api.rst | 4 +- 3 files changed, 35 insertions(+), 36 deletions(-) diff --git a/docs/source/dev-guide/contribute.rst b/docs/source/dev-guide/contribute.rst index fc2f216d37..80d9a4d43a 100644 --- a/docs/source/dev-guide/contribute.rst +++ b/docs/source/dev-guide/contribute.rst @@ -1,3 +1,5 @@ +.. _contributing: + Contributing ---------------- diff --git a/docs/source/overview/getting-started/install-guide.rst b/docs/source/overview/getting-started/install-guide.rst index 24b63a1791..04c36ebedb 100644 --- a/docs/source/overview/getting-started/install-guide.rst +++ b/docs/source/overview/getting-started/install-guide.rst @@ -5,57 +5,52 @@ Installation Guide EvaDB provides couple different installation options to allow easy extension to rich functionalities. -Default +Use pip ------- -By Default, EvaDB installs only the minimal requirements. +EvaDB supports Python (versions >= 3.8). We recommend installing with `pip` within an `isolated virtual environment `_. -.. code-block:: +.. code-block:: bash + python -m venv evadb-venv + source evadb-venv/bin/activate + pip install --upgrade pip pip install evadb -Vision Capability ------------------ +Install additional packages +--------------------------- -You can install EvaDB with the vision extension. -With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc. +* `evadb[vision]` for vision dependencies. With vision dependencies, we can run queries to do image classification, object detection, and emotion analysis workloads, etc. +* `evadb[document]` for LLM dependencies. 
With LLM dependencies, we can leverage the capability of LLM to summarize or do question answering for documents. +* `evadb[qdrant]` for embedding-based similarity search. +* `evadb[ludwig]` for model training and finetuning. +* `evadb[ray]` for distributed execution on ray. -.. code-block:: +Install from source +------------------- - pip install evadb[vision] +.. code-block:: bash -Documents Summarization with LLM --------------------------------- + git clone https://github.com/georgia-tech-db/evadb.git + cd evadb + pip install -e . -You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents. +.. note:: -.. code-block:: + Check :ref:`Contribution Guide` for more details. - pip install evadb[document] - -Additional Vector Index ------------------------ - -EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature. - -.. code-block:: - - pip install evadb[qdrant] - -Training or Finetuning Model ----------------------------- - -Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension. +Run your first SQL query in EvaDB +---------------------------------- -.. code-block:: +To run SQL query in EvaDB, we need to first create a `cursor` object. The following query lists all the builtin user-defined functions. - pip install evadb[ludwig] +.. code-block:: python -Better Performance and Scalability ----------------------------------- + import evadb + cursor = evadb.connect().cursor() + print(cursor.query("SHOW UDFS;").df()) -EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries. +.. note:: -.. code-block:: + Check :ref:`Python APIs` for connection and cursor-related documentation. 
- pip install evadb[ray] diff --git a/docs/source/reference/api.rst b/docs/source/reference/api.rst index 77caf9484a..0e17a667e6 100644 --- a/docs/source/reference/api.rst +++ b/docs/source/reference/api.rst @@ -1,3 +1,5 @@ +.. _python-api: + Basic API ========== @@ -74,4 +76,4 @@ EvaDBQuery Interface ~evadb.EvaDBQuery.order ~evadb.EvaDBQuery.show ~evadb.EvaDBQuery.sql_query - ~evadb.EvaDBQuery.execute \ No newline at end of file + ~evadb.EvaDBQuery.execute From bb1688397f87c4e19abc394cec1e80cd15083ebe Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 05:38:34 -0400 Subject: [PATCH 2/8] Add use documentation --- docs/_toc.yml | 3 + .../overview/getting-started/data-source.rst | 55 +++++++++++++++++++ docs/source/reference/evaql/load.rst | 2 + docs/source/reference/evaql/select.rst | 2 + docs/source/reference/evaql/use.rst | 36 ++++++++++++ 5 files changed, 98 insertions(+) create mode 100644 docs/source/overview/getting-started/data-source.rst create mode 100644 docs/source/reference/evaql/use.rst diff --git a/docs/_toc.yml b/docs/_toc.yml index 25a7133cb4..3abc616fe9 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -7,6 +7,8 @@ parts: sections: - file: source/overview/getting-started/install-guide title: Installation Guide + - file: source/overview/getting-started/data-source + title: Integrate Data Source - file: source/overview/concepts #- file: source/overview/faq @@ -45,6 +47,7 @@ parts: - file: source/reference/evaql/insert - file: source/reference/evaql/delete - file: source/reference/evaql/rename + - file: source/reference/evaql/use - file: source/reference/udfs/index title: Models diff --git a/docs/source/overview/getting-started/data-source.rst b/docs/source/overview/getting-started/data-source.rst new file mode 100644 index 0000000000..59055fbcb7 --- /dev/null +++ b/docs/source/overview/getting-started/data-source.rst @@ -0,0 +1,55 @@ +Integrate Data Source +===================== + +EvaDB supports an extensive data sources for both 
structured and unstructured data. + +1. Connect to an existing structured data source. + +.. code-block:: python + + cursor.query(""" + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + };""").df() + +The above query connects to an exsiting Postgres database, which allows us to build AI applications in EvaDB without data migration. +For example, the following query previews the available data using :ref:`SELECT`. + +.. code-block:: python + + cursor.query("SELECT * FROM postgres_data.food_review;").df() + +We can also run native queries in the connected database by the :ref:`USE` statement. + +.. code-block:: python + + cursor.query(""" + USE postgres_data { + INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') + };""").df() + + +2. Load unstructured data. EvaDB supports a wide range of type of unstructured data. Below are some example: + +.. code-block:: python + + cursor.query( + "LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;" + ).df() + +We load the local reddit image dataset into EvaDB. + +.. code-block:: python + + cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df() + +We load the MNIST video from s3 bucket into EvaDB. + +.. note:: + + Check :ref:`LOAD statement` for all types of supported unstructured data. + diff --git a/docs/source/reference/evaql/load.rst b/docs/source/reference/evaql/load.rst index 9d53f2b7d4..2772ad7992 100644 --- a/docs/source/reference/evaql/load.rst +++ b/docs/source/reference/evaql/load.rst @@ -1,3 +1,5 @@ +.. _sql-load: + LOAD ==== diff --git a/docs/source/reference/evaql/select.rst b/docs/source/reference/evaql/select.rst index bdd36e28df..bc7034f8dd 100644 --- a/docs/source/reference/evaql/select.rst +++ b/docs/source/reference/evaql/select.rst @@ -1,3 +1,5 @@ +.. 
_sql-select: + SELECT ====== diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst new file mode 100644 index 0000000000..e8401a6c7c --- /dev/null +++ b/docs/source/reference/evaql/use.rst @@ -0,0 +1,36 @@ +.. _sql-use: + +USE +=== + +The USE statement allows us to run arbitary native queries in the connected database. + +.. code:: mysql + + USE [database_connection] { [native_query] }; + +* `database_connection` is an external database connection instanced by the `CREATE DATABASE statement`. +* `native_query` is an arbitary SQL query supprted by the `database_connection`. + +.. limitation:: + + Currently EvaDB only supports single query in one USE statement. The native_query should not end with semicolon. + +Examples +-------- + +.. code:: mysql + + USE postgres_data { + DROP TABLE IF EXISTS food_review + }; + + USE postgres_data { + CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000)) + }; + + USE postgres_data { + INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') + }; + + From a9d6beba1bf40efddafa614877489a76ee0d5670 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 05:42:05 -0400 Subject: [PATCH 3/8] Use warning --- docs/source/reference/evaql/use.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst index e8401a6c7c..69ba808960 100644 --- a/docs/source/reference/evaql/use.rst +++ b/docs/source/reference/evaql/use.rst @@ -12,7 +12,7 @@ The USE statement allows us to run arbitary native queries in the connected data * `database_connection` is an external database connection instanced by the `CREATE DATABASE statement`. * `native_query` is an arbitary SQL query supprted by the `database_connection`. -.. limitation:: +.. warning:: Currently EvaDB only supports single query in one USE statement. The native_query should not end with semicolon. 
From 9d795a8034e14348fc6545b6d66bff300b5c70d2 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 05:46:28 -0400 Subject: [PATCH 4/8] minor fix --- docs/source/reference/evaql/use.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst index 69ba808960..6a4d813fbe 100644 --- a/docs/source/reference/evaql/use.rst +++ b/docs/source/reference/evaql/use.rst @@ -5,16 +5,16 @@ USE The USE statement allows us to run arbitary native queries in the connected database. -.. code:: mysql +.. code:: text USE [database_connection] { [native_query] }; -* `database_connection` is an external database connection instanced by the `CREATE DATABASE statement`. -* `native_query` is an arbitary SQL query supprted by the `database_connection`. +* [database_connection] is an external database connection instanced by the `CREATE DATABASE statement`. +* [native_query] is an arbitary SQL query supprted by the [database_connection]. .. warning:: - Currently EvaDB only supports single query in one USE statement. The native_query should not end with semicolon. + Currently EvaDB only supports single query in one USE statement. The [native_query] should not end with semicolon. 
Examples -------- From 590bf68855b47fa1a4710a58dfc8deb3a09ace63 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 06:23:49 -0400 Subject: [PATCH 5/8] Add data source page --- docs/_toc.yml | 5 +++ .../dev-guide/extend/new-data-source.rst | 2 ++ .../overview/getting-started/data-source.rst | 4 +++ docs/source/reference/databases/index.rst | 9 +++++ docs/source/reference/databases/postgres.rst | 36 +++++++++++++++++++ docs/source/reference/evaql/create.rst | 31 ++++++++++++++++ docs/source/reference/evaql/use.rst | 2 +- 7 files changed, 88 insertions(+), 1 deletion(-) create mode 100644 docs/source/reference/databases/index.rst create mode 100644 docs/source/reference/databases/postgres.rst diff --git a/docs/_toc.yml b/docs/_toc.yml index 3abc616fe9..04471a7927 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -49,6 +49,11 @@ parts: - file: source/reference/evaql/rename - file: source/reference/evaql/use + - file: source/reference/databases/index + title: Data Sources + sections: + - file: source/reference/databases/postgres + - file: source/reference/udfs/index title: Models sections: diff --git a/docs/source/dev-guide/extend/new-data-source.rst b/docs/source/dev-guide/extend/new-data-source.rst index 0c3261f9fc..844262a8cd 100644 --- a/docs/source/dev-guide/extend/new-data-source.rst +++ b/docs/source/dev-guide/extend/new-data-source.rst @@ -1,3 +1,5 @@ +.. _add-data-source: + Structured Data Source Integration ================================== This document details steps involved in adding a new structured data source integration in EvaDB. 
diff --git a/docs/source/overview/getting-started/data-source.rst b/docs/source/overview/getting-started/data-source.rst index 59055fbcb7..b773069256 100644 --- a/docs/source/overview/getting-started/data-source.rst +++ b/docs/source/overview/getting-started/data-source.rst @@ -15,6 +15,10 @@ EvaDB supports an extensive data sources for both structured and unstructured da "port": "5432", "database": "evadb" };""").df() + +.. note:: + + Check :ref:`Create DATABASE statement` for syntax documentation and :ref:`Data Sources` for all supported data source engines. The above query connects to an exsiting Postgres database, which allows us to build AI applications in EvaDB without data migration. For example, the following query previews the available data using :ref:`SELECT`. diff --git a/docs/source/reference/databases/index.rst b/docs/source/reference/databases/index.rst new file mode 100644 index 0000000000..fab16ad548 --- /dev/null +++ b/docs/source/reference/databases/index.rst @@ -0,0 +1,9 @@ +.. _data-sources: + +Data Sources +============= + +Below are all supported data sources for EvaDB. We welcome adding new data source integrations in EvaDB. Check :ref:`add-data-source` for guidance. + + +.. tableofcontents:: diff --git a/docs/source/reference/databases/postgres.rst b/docs/source/reference/databases/postgres.rst new file mode 100644 index 0000000000..b060b9c7a9 --- /dev/null +++ b/docs/source/reference/databases/postgres.rst @@ -0,0 +1,36 @@ +PostgreSQL +========== + +The connection to PostgreSQL is based on the `psycopg2`_ library. + +Dependency +---------- + +* psycopg2 + + +Parameters +---------- + +Required: + +* `user` is the database user. +* `password` is the database password. +* `host` is the host name, IP address, or URL. +* `port` is the port used to make TCP/IP connection. +* `database` is the database name. + + +Create Connection +----------------- + +.. 
code-block:: text + + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + }; + diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst index edbae69e26..586962e006 100644 --- a/docs/source/reference/evaql/create.rst +++ b/docs/source/reference/evaql/create.rst @@ -1,6 +1,37 @@ CREATE ====== +.. _sql-create-database: + +CREATE DATABASE +--------------- + +The CREATE DATABASE statement allows us to connect to an external structured data store in EvaDB. + +.. code:: text + + CREATE DATABASE [database_connection] + WITH ENGINE = [database_engine], + PARAMETERS = [key_value_parameters]; + +* [database_connection] is the name of the database connection. `[database_connection].[table_name]` is used as the table name when composing SQL queries in EvaDB. +* [database_engine] is the supported database engine. Check :ref:`supported data sources` for all supported engines and their available configuration parameters. +* [key_value_parameters] is a list of key-value pairs as arguments to establish a connection. + + +Examples +~~~~~~~~ + +.. code:: text + + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + }; + CREATE TABLE ------------ diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst index 6a4d813fbe..cb10cc6ce9 100644 --- a/docs/source/reference/evaql/use.rst +++ b/docs/source/reference/evaql/use.rst @@ -19,7 +19,7 @@ The USE statement allows us to run arbitary native queries in the connected data Examples -------- -.. code:: mysql +.. 
code:: text USE postgres_data { DROP TABLE IF EXISTS food_review From 3f537970350ce23fbaac635d4d6df289bcd0ce1c Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 06:29:15 -0400 Subject: [PATCH 6/8] Fix link --- docs/source/reference/databases/postgres.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/reference/databases/postgres.rst b/docs/source/reference/databases/postgres.rst index b060b9c7a9..679cada5e0 100644 --- a/docs/source/reference/databases/postgres.rst +++ b/docs/source/reference/databases/postgres.rst @@ -1,7 +1,7 @@ PostgreSQL ========== -The connection to PostgreSQL is based on the `psycopg2`_ library. +The connection to PostgreSQL is based on the `psycopg2 `_ library. Dependency ---------- From 6fe6c7476497764543ff8786e008cf96acfe220d Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 12:24:35 -0400 Subject: [PATCH 7/8] Minor fix --- .../overview/getting-started/install-guide.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/source/overview/getting-started/install-guide.rst b/docs/source/overview/getting-started/install-guide.rst index 04c36ebedb..b415b76888 100644 --- a/docs/source/overview/getting-started/install-guide.rst +++ b/docs/source/overview/getting-started/install-guide.rst @@ -8,7 +8,7 @@ EvaDB provides couple different installation options to allow easy extension to Use pip ------- -EvaDB supports Python (versions >= 3.8). We recommend installing with `pip` within an `isolated virtual environment `_. +EvaDB supports Python (versions >= 3.8). We recommend installing with ``pip`` within an `isolated virtual environment `_. .. code-block:: bash @@ -20,11 +20,11 @@ EvaDB supports Python (versions >= 3.8). We recommend installing with `pip` with Install additional packages --------------------------- -* `evadb[vision]` for vision dependencies. 
With vision dependencies, we can run queries to do image classification, object detection, and emotion analysis workloads, etc. -* `evadb[document]` for LLM dependencies. With LLM dependencies, we can leverage the capability of LLM to summarize or do question answering for documents. -* `evadb[qdrant]` for embedding-based similarity search. -* `evadb[ludwig]` for model training and finetuning. -* `evadb[ray]` for distributed execution on ray. +* ``evadb[vision]`` for vision dependencies. With vision dependencies, we can run queries to do image classification, object detection, and emotion analysis workloads, etc. +* ``evadb[document]`` for LLM dependencies. With LLM dependencies, we can leverage the capability of LLM to summarize or do question answering for documents. +* ``evadb[qdrant]`` for embedding-based similarity search. +* ``evadb[ludwig]`` for model training and finetuning. +* ``evadb[ray]`` for distributed execution on ray. Install from source ------------------- @@ -42,7 +42,7 @@ Install from source Run your first SQL query in EvaDB ---------------------------------- -To run SQL query in EvaDB, we need to first create a `cursor` object. The following query lists all the builtin user-defined functions. +To run SQL query in EvaDB, we need to first create a ``cursor`` object. The following query lists all the builtin user-defined functions. .. 
code-block:: python From 04821ace12da42183e8f33799c8a2a0a9790303b Mon Sep 17 00:00:00 2001 From: xzdandy Date: Thu, 31 Aug 2023 12:26:29 -0400 Subject: [PATCH 8/8] Grammer --- docs/source/overview/getting-started/data-source.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/overview/getting-started/data-source.rst b/docs/source/overview/getting-started/data-source.rst index b773069256..de220bfca3 100644 --- a/docs/source/overview/getting-started/data-source.rst +++ b/docs/source/overview/getting-started/data-source.rst @@ -37,7 +37,7 @@ We can also run native queries in the connected database by the :ref:`USE