diff --git a/docs/_toc.yml b/docs/_toc.yml index d2b094520b..2674879318 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -7,6 +7,8 @@ parts: sections: - file: source/overview/getting-started/install-guide title: Installation Guide + - file: source/overview/getting-started/data-source + title: Integrate Data Source - file: source/overview/concepts #- file: source/overview/faq @@ -50,6 +52,12 @@ parts: - file: source/reference/evaql/insert - file: source/reference/evaql/delete - file: source/reference/evaql/rename + - file: source/reference/evaql/use + + - file: source/reference/databases/index + title: Data Sources + sections: + - file: source/reference/databases/postgres - file: source/reference/udfs/index title: Models diff --git a/docs/source/dev-guide/contribute.rst b/docs/source/dev-guide/contribute.rst index fc2f216d37..80d9a4d43a 100644 --- a/docs/source/dev-guide/contribute.rst +++ b/docs/source/dev-guide/contribute.rst @@ -1,3 +1,5 @@ +.. _contributing: + Contributing ---------------- diff --git a/docs/source/dev-guide/extend/new-data-source.rst b/docs/source/dev-guide/extend/new-data-source.rst index 0c3261f9fc..844262a8cd 100644 --- a/docs/source/dev-guide/extend/new-data-source.rst +++ b/docs/source/dev-guide/extend/new-data-source.rst @@ -1,3 +1,5 @@ +.. _add-data-source: + Structured Data Source Integration ================================== This document details steps involved in adding a new structured data source integration in EvaDB. diff --git a/docs/source/overview/getting-started/data-source.rst b/docs/source/overview/getting-started/data-source.rst new file mode 100644 index 0000000000..de220bfca3 --- /dev/null +++ b/docs/source/overview/getting-started/data-source.rst @@ -0,0 +1,59 @@ +Integrate Data Source +===================== + +EvaDB supports an extensive range of data sources for both structured and unstructured data. + +1. Connect to an existing structured data source. + +.. 
code-block:: python + + cursor.query(""" + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + };""").df() + +.. note:: + + Check :ref:`Create DATABASE statement` for syntax documentation and :ref:`Data Sources` for all supported data source engines. + +The above query connects to an existing Postgres database, which allows us to build AI applications in EvaDB without data migration. +For example, the following query previews the available data using :ref:`SELECT`. + +.. code-block:: python + + cursor.query("SELECT * FROM postgres_data.food_review;").df() + +We can also run native queries in the connected database with the :ref:`USE` statement. + +.. code-block:: python + + cursor.query(""" + USE postgres_data { + INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') + };""").df() + + +2. Load unstructured data. EvaDB supports a wide range of unstructured data types. Below are some examples: + +.. code-block:: python + + cursor.query( + "LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;" + ).df() + +We load the local reddit image dataset into EvaDB. + +.. code-block:: python + + cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df() + +We load the MNIST video from an S3 bucket into EvaDB. + +.. note:: + + Check :ref:`LOAD statement` for all types of supported unstructured data. + diff --git a/docs/source/overview/getting-started/install-guide.rst b/docs/source/overview/getting-started/install-guide.rst index 24b63a1791..b415b76888 100644 --- a/docs/source/overview/getting-started/install-guide.rst +++ b/docs/source/overview/getting-started/install-guide.rst @@ -5,57 +5,52 @@ Installation Guide EvaDB provides couple different installation options to allow easy extension to rich functionalities. 
-Default +Use pip ------- -By Default, EvaDB installs only the minimal requirements. +EvaDB supports Python (versions >= 3.8). We recommend installing with ``pip`` within an `isolated virtual environment `_. -.. code-block:: +.. code-block:: bash + python -m venv evadb-venv + source evadb-venv/bin/activate + pip install --upgrade pip pip install evadb -Vision Capability ------------------ +Install additional packages +--------------------------- -You can install EvaDB with the vision extension. -With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc. +* ``evadb[vision]`` for vision dependencies. With vision dependencies, we can run queries for workloads such as image classification, object detection, and emotion analysis. +* ``evadb[document]`` for LLM dependencies. With LLM dependencies, we can use LLMs to summarize documents or answer questions about them. +* ``evadb[qdrant]`` for embedding-based similarity search. +* ``evadb[ludwig]`` for model training and finetuning. +* ``evadb[ray]`` for distributed execution on Ray. -.. code-block:: +Install from source +------------------- - pip install evadb[vision] +.. code-block:: bash -Documents Summarization with LLM --------------------------------- + git clone https://github.com/georgia-tech-db/evadb.git + cd evadb + pip install -e . -You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents. +.. note:: -.. code-block:: + Check :ref:`Contribution Guide` for more details. - pip install evadb[document] - -Additional Vector Index ------------------------ - -EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature. - -.. 
code-block:: - - pip install evadb[qdrant] - -Training or Finetuning Model ----------------------------- - -Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension. +Run your first SQL query in EvaDB +---------------------------------- -.. code-block:: +To run a SQL query in EvaDB, we first need to create a ``cursor`` object. The following query lists all the built-in user-defined functions. - pip install evadb[ludwig] +.. code-block:: python -Better Performance and Scalability ----------------------------------- + import evadb + cursor = evadb.connect().cursor() + print(cursor.query("SHOW UDFS;").df()) -EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries. +.. note:: -.. code-block:: + Check :ref:`Python APIs` for connection and cursor-related documentation. - pip install evadb[ray] diff --git a/docs/source/reference/api.rst b/docs/source/reference/api.rst index 77caf9484a..0e17a667e6 100644 --- a/docs/source/reference/api.rst +++ b/docs/source/reference/api.rst @@ -1,3 +1,5 @@ +.. _python-api: + Basic API ========== @@ -74,4 +76,4 @@ EvaDBQuery Interface ~evadb.EvaDBQuery.order ~evadb.EvaDBQuery.show ~evadb.EvaDBQuery.sql_query - ~evadb.EvaDBQuery.execute \ No newline at end of file + ~evadb.EvaDBQuery.execute diff --git a/docs/source/reference/databases/index.rst b/docs/source/reference/databases/index.rst new file mode 100644 index 0000000000..fab16ad548 --- /dev/null +++ b/docs/source/reference/databases/index.rst @@ -0,0 +1,9 @@ +.. _data-sources: + +Data Sources +============= + +Below are all supported data sources for EvaDB. We welcome new data source integrations in EvaDB. Check :ref:`add-data-source` for guidance. + + +.. 
tableofcontents:: diff --git a/docs/source/reference/databases/postgres.rst b/docs/source/reference/databases/postgres.rst new file mode 100644 index 0000000000..679cada5e0 --- /dev/null +++ b/docs/source/reference/databases/postgres.rst @@ -0,0 +1,36 @@ +PostgreSQL +========== + +The connection to PostgreSQL is based on the `psycopg2 `_ library. + +Dependency +---------- + +* psycopg2 + + +Parameters +---------- + +Required: + +* `user` is the database user. +* `password` is the database password. +* `host` is the host name, IP address, or URL. +* `port` is the port used for the TCP/IP connection. +* `database` is the database name. + + +Create Connection +----------------- + +.. code-block:: text + + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + }; + diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst index edbae69e26..586962e006 100644 --- a/docs/source/reference/evaql/create.rst +++ b/docs/source/reference/evaql/create.rst @@ -1,6 +1,37 @@ CREATE ====== +.. _sql-create-database: + +CREATE DATABASE +--------------- + +The CREATE DATABASE statement allows us to connect to an external structured data store in EvaDB. + +.. code:: text + + CREATE DATABASE [database_connection] + WITH ENGINE = [database_engine], + PARAMETERS = [key_value_parameters]; + +* [database_connection] is the name of the database connection. `[database_connection].[table_name]` is used as the table name when composing SQL queries in EvaDB. +* [database_engine] is the supported database engine. Check :ref:`supported data sources` for all engines and their available configuration parameters. +* [key_value_parameters] is a list of key-value pairs passed as arguments to establish a connection. + + +Examples +~~~~~~~~ + +.. 
code:: text + + CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { + "user": "eva", + "password": "password", + "host": "localhost", + "port": "5432", + "database": "evadb" + }; + CREATE TABLE ------------ diff --git a/docs/source/reference/evaql/load.rst b/docs/source/reference/evaql/load.rst index 9d53f2b7d4..2772ad7992 100644 --- a/docs/source/reference/evaql/load.rst +++ b/docs/source/reference/evaql/load.rst @@ -1,3 +1,5 @@ +.. _sql-load: + LOAD ==== diff --git a/docs/source/reference/evaql/select.rst b/docs/source/reference/evaql/select.rst index bdd36e28df..bc7034f8dd 100644 --- a/docs/source/reference/evaql/select.rst +++ b/docs/source/reference/evaql/select.rst @@ -1,3 +1,5 @@ +.. _sql-select: + SELECT ====== diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst new file mode 100644 index 0000000000..cb10cc6ce9 --- /dev/null +++ b/docs/source/reference/evaql/use.rst @@ -0,0 +1,36 @@ +.. _sql-use: + +USE +=== + +The USE statement allows us to run arbitrary native queries in the connected database. + +.. code:: text + + USE [database_connection] { [native_query] }; + +* [database_connection] is an external database connection created by the `CREATE DATABASE statement`. +* [native_query] is an arbitrary SQL query supported by the [database_connection]. + +.. warning:: + + Currently EvaDB only supports a single query per USE statement. The [native_query] should not end with a semicolon. + +Examples +-------- + +.. code:: text + + USE postgres_data { + DROP TABLE IF EXISTS food_review + }; + + USE postgres_data { + CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000)) + }; + + USE postgres_data { + INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') + }; + +