Revamp getting started. #1016
Changes from 7 commits
3a4e148
bb16883
a9d6beb
9d795a8
590bf68
3f53797
f5d19d5
6fe6c74
04821ac
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
.. _contributing: | ||
|
||
Contributing | ||
---------------- | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
Integrate Data Source | ||
===================== | ||
|
||
EvaDB supports an extensive range of data sources for both structured and unstructured data. | ||
|
||
1. Connect to an existing structured data source. | ||
|
||
.. code-block:: python | ||
|
||
cursor.query(""" | ||
CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { | ||
"user": "eva", | ||
"password": "password", | ||
"host": "localhost", | ||
"port": "5432", | ||
"database": "evadb" | ||
};""").df() | ||
|
||
.. note:: | ||
|
||
Check :ref:`Create DATABASE statement<sql-create-database>` for syntax documentation and :ref:`Data Sources<data-sources>` for all supported data source engines. | ||
|
||
The above query connects to an existing Postgres database, which allows us to build AI applications in EvaDB without data migration. | ||
For example, the following query previews the available data using :ref:`SELECT<sql-select>`. | ||
|
||
.. code-block:: python | ||
|
||
cursor.query("SELECT * FROM postgres_data.food_review;").df() | ||
|
||
We can also run native queries in the connected database with the :ref:`USE<sql-use>` statement. | ||
|
||
.. code-block:: python | ||
|
||
cursor.query(""" | ||
USE postgres_data { | ||
INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') | ||
};""").df() | ||
|
||
|
||
2. Load unstructured data. EvaDB supports a wide range of unstructured data types. Below are some examples: | ||
|
||
.. code-block:: python | ||
|
||
cursor.query( | ||
"LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;" | ||
).df() | ||
|
||
We load the local reddit image dataset into EvaDB. | ||
|
||
.. code-block:: python | ||
|
||
cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df() | ||
|
||
We load the MNIST video from an S3 bucket into EvaDB. | ||
|
||
.. note:: | ||
|
||
Check :ref:`LOAD statement<sql-load>` for all types of supported unstructured data. | ||
|
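The two LOAD statements above share one shape: a data kind, a quoted path, and a target table. As a minimal sketch, the hypothetical helper below (``build_load`` is not part of EvaDB) composes such a statement as a plain string. It only accepts the two kinds shown in this guide, and it assumes paths never contain a single quote.

```python
# Hypothetical helper, not an EvaDB API: composes a LOAD statement string.
SHOWN_KINDS = ("IMAGE", "VIDEO")  # only the kinds demonstrated in this guide


def build_load(kind, path, table):
    """Return a LOAD statement for the given kind, path, and target table."""
    kind = kind.upper()
    if kind not in SHOWN_KINDS:
        raise ValueError(f"unsupported kind in this sketch: {kind}")
    if "'" in path:
        # The path is wrapped in single quotes, so reject embedded quotes.
        raise ValueError("path must not contain a single quote")
    return f"LOAD {kind} '{path}' INTO {table};"


image_query = build_load("image", "reddit-images/*.jpg", "reddit_dataset")
video_query = build_load("video", "s3://bucket/eva_videos/mnist.mp4", "MNISTVid")
print(image_query)
print(video_query)
```

The resulting strings can be passed to ``cursor.query(...)`` in the same way as the literal statements above.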
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,57 +5,52 @@ Installation Guide | |
|
||
EvaDB provides a couple of different installation options, so that extra functionality can be added easily. | ||
|
||
Default | ||
Use pip | ||
------- | ||
|
||
By Default, EvaDB installs only the minimal requirements. | ||
EvaDB supports Python (versions >= 3.8). We recommend installing with `pip` within an `isolated virtual environment <https://docs.python-guide.org/dev/virtualenvs/>`_. | ||
|
||
.. code-block:: | ||
.. code-block:: bash | ||
|
||
python -m venv evadb-venv | ||
source evadb-venv/bin/activate | ||
pip install --upgrade pip | ||
pip install evadb | ||
|
||
Vision Capability | ||
----------------- | ||
Install additional packages | ||
--------------------------- | ||
|
||
You can install EvaDB with the vision extension. | ||
With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc. | ||
* `evadb[vision]` for vision dependencies. With vision dependencies, we can run queries to do image classification, object detection, and emotion analysis workloads, etc. | ||
* `evadb[document]` for LLM dependencies. With LLM dependencies, we can use LLMs to summarize documents or answer questions about them. | ||
* `evadb[qdrant]` for embedding-based similarity search. | ||
* `evadb[ludwig]` for model training and finetuning. | ||
* `evadb[ray]` for distributed execution on ray. | ||
|
||
.. code-block:: | ||
Install from source | ||
------------------- | ||
|
||
pip install evadb[vision] | ||
.. code-block:: bash | ||
|
||
Documents Summarization with LLM | ||
-------------------------------- | ||
git clone https://github.com/georgia-tech-db/evadb.git | ||
cd evadb | ||
pip install -e . | ||
|
||
You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents. | ||
.. note:: | ||
|
||
.. code-block:: | ||
Check :ref:`Contribution Guide<contributing>` for more details. | ||
|
||
pip install evadb[document] | ||
|
||
Additional Vector Index | ||
----------------------- | ||
|
||
EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature. | ||
|
||
.. code-block:: | ||
|
||
pip install evadb[qdrant] | ||
|
||
Training or Finetuning Model | ||
---------------------------- | ||
|
||
Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension. | ||
Run your first SQL query in EvaDB | ||
Others look good to me. Is showing how to run EvaDB necessary here? Since we already have at getting started page.
The motivation is to show user how to run SQL query in EvaDB since we do not have a |
||
---------------------------------- | ||
|
||
.. code-block:: | ||
To run a SQL query in EvaDB, we first need to create a `cursor` object. The following query lists all the built-in user-defined functions. | ||
|
||
pip install evadb[ludwig] | ||
.. code-block:: python | ||
|
||
Better Performance and Scalability | ||
---------------------------------- | ||
import evadb | ||
cursor = evadb.connect().cursor() | ||
print(cursor.query("SHOW UDFS;").df()) | ||
|
||
EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries. | ||
.. note:: | ||
|
||
.. code-block:: | ||
Check :ref:`Python APIs<python-api>` for connection and cursor-related documentation. | ||
|
||
pip install evadb[ray] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
.. _data-sources: | ||
|
||
Data Sources | ||
============= | ||
|
||
Below are all the data sources that EvaDB supports. We welcome contributions of new data source integrations. Check :ref:`add-data-source` for guidance. | ||
|
||
|
||
.. tableofcontents:: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
PostgreSQL | ||
========== | ||
|
||
The connection to PostgreSQL is based on the `psycopg2 <https://pypi.org/project/psycopg2/>`_ library. | ||
|
||
Dependency | ||
---------- | ||
|
||
* psycopg2 | ||
|
||
|
||
Parameters | ||
---------- | ||
|
||
Required: | ||
|
||
* `user` is the database user. | ||
* `password` is the database password. | ||
* `host` is the host name, IP address, or URL. | ||
* `port` is the port used for the TCP/IP connection. | ||
* `database` is the database name. | ||
|
||
|
||
Create Connection | ||
----------------- | ||
|
||
.. code-block:: text | ||
|
||
CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = { | ||
"user": "eva", | ||
"password": "password", | ||
"host": "localhost", | ||
"port": "5432", | ||
"database": "evadb" | ||
}; | ||
|
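The five required parameters listed above map directly onto the ``PARAMETERS`` clause. As a minimal sketch, the hypothetical helper below (``build_create_database`` is not an EvaDB function) checks that every required key is present and renders the statement as a string. It assumes that JSON-style double-quoted pairs are acceptable, as in the example above, and it does no escaping beyond what ``json.dumps`` provides.

```python
import json

# Required keys, taken from the Parameters section above.
REQUIRED_PARAMS = ("user", "password", "host", "port", "database")


def build_create_database(name, engine, params):
    """Hypothetical helper: render a CREATE DATABASE statement string."""
    missing = [key for key in REQUIRED_PARAMS if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    # Values are rendered as strings, matching the documented example ("port": "5432").
    rendered = json.dumps({key: str(params[key]) for key in REQUIRED_PARAMS}, indent=4)
    return f"CREATE DATABASE {name} WITH ENGINE = '{engine}', PARAMETERS = {rendered};"


statement = build_create_database(
    "postgres_data",
    "postgres",
    {"user": "eva", "password": "password", "host": "localhost", "port": 5432, "database": "evadb"},
)
print(statement)
```

Validating the keys before the statement is sent gives an early, readable error if a parameter is forgotten.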
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
.. _sql-load: | ||
|
||
LOAD | ||
==== | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
.. _sql-select: | ||
|
||
SELECT | ||
====== | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
.. _sql-use: | ||
|
||
USE | ||
=== | ||
|
||
The USE statement allows us to run arbitrary native queries in the connected database. | ||
|
||
.. code:: text | ||
|
||
USE [database_connection] { [native_query] }; | ||
|
||
* [database_connection] is an external database connection created by the `CREATE DATABASE statement`. | ||
* [native_query] is an arbitrary SQL query supported by the [database_connection]. | ||
|
||
.. warning:: | ||
|
||
Currently, EvaDB supports only a single query per USE statement. The [native_query] should not end with a semicolon. | ||
|
||
Examples | ||
-------- | ||
|
||
.. code:: text | ||
|
||
USE postgres_data { | ||
DROP TABLE IF EXISTS food_review | ||
}; | ||
|
||
USE postgres_data { | ||
CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000)) | ||
}; | ||
|
||
USE postgres_data { | ||
INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.') | ||
}; | ||
|
||
|
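The warning above sets two rules for the native query: a single query per USE statement, and no trailing semicolon. As a minimal sketch, the hypothetical helper below (``wrap_native_query`` is not part of EvaDB) applies both rules before composing the statement. Its multi-query check is deliberately naive: it rejects any remaining semicolon, including one inside a string literal.

```python
def wrap_native_query(connection, native_query):
    """Hypothetical helper: wrap one native query in a USE statement."""
    query = native_query.strip()
    # The native query should not end with a semicolon, so drop one if present.
    if query.endswith(";"):
        query = query[:-1].rstrip()
    if not query:
        raise ValueError("native query is empty")
    # Naive single-query check: any remaining semicolon is treated as a second query.
    if ";" in query:
        raise ValueError("only a single native query is supported per USE statement")
    return f"USE {connection} {{\n    {query}\n}};"


use_statement = wrap_native_query("postgres_data", "DROP TABLE IF EXISTS food_review;")
print(use_statement)
```

The returned string has the same layout as the examples above and can be passed to ``cursor.query(...)``.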
Would it be better to use
``evadb[vision]``
. Single ` (italic) is hard to see in the doc.