feat: sync master staging (georgia-tech-db#1050)
Co-authored-by: Joy Arulraj <[email protected]>
Co-authored-by: Jiashen Cao <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
Co-authored-by: Sayan Sinha <[email protected]>
Co-authored-by: Hersh Dhillon <[email protected]>
6 people authored and a0x8o committed Oct 30, 2023
1 parent a9124e1 commit b87af50
Showing 91 changed files with 2,045 additions and 419 deletions.
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -486,7 +486,11 @@ jobs:
source test_evadb/bin/activate
pip install --upgrade pip
pip debug --verbose
<<<<<<< HEAD
pip install ".[dev,ludwig,qdrant,forecasting,pinecone,chromadb]"
=======
pip install ".[dev,ludwig,qdrant,forecasting]"
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
source test_evadb/bin/activate
bash script/test/test.sh -m "<< parameters.mode >>"
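
Review note: this hunk adds literal conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> 2dacff69 ...`) to `.circleci/config.yml`, and the same pattern recurs in the files below. A minimal sketch of a check that could flag such lines before they are committed follows. The function name `find_conflict_markers` and the sample input are hypothetical and not part of this repository. A bare `=======` is only reported while a conflict block is open, so an RST heading underline outside a conflict is not flagged.

```python
from typing import Iterable, List, Tuple


def find_conflict_markers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, marker kind) for each conflict marker."""
    hits: List[Tuple[int, str]] = []
    depth = 0  # number of conflict blocks currently open
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith("<<<<<<< "):
            depth += 1
            hits.append((lineno, "start"))
        elif text.startswith(">>>>>>> "):
            depth = max(depth - 1, 0)
            hits.append((lineno, "end"))
        elif text == "=======" and depth > 0:
            # Only a separator when we are inside an open conflict block.
            hits.append((lineno, "separator"))
    return hits


# Sample input taken from the hunk above.
sample = [
    "pip debug --verbose",
    "<<<<<<< HEAD",
    'pip install ".[dev,ludwig,qdrant,forecasting,pinecone,chromadb]"',
    "=======",
    'pip install ".[dev,ludwig,qdrant,forecasting]"',
    ">>>>>>> 2dacff69 (feat: sync master staging (#1050))",
]
print(find_conflict_markers(sample))
```

To scan a file, the same function could be called on an open file handle, for example `find_conflict_markers(open(path, encoding="utf-8"))`.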

25 changes: 21 additions & 4 deletions README.md
@@ -183,8 +183,17 @@ EvaDB enables software developers to build AI apps in a few lines of code. Its p
<li> 📝 following us on <a href="https://medium.com/evadb-blog">Medium</a>
</ul>
👋 Hey! If you're excited about our vision of bringing AI inside database systems, show some ❤️ by:
<ul>
<li> 🐙 giving a ⭐ on our <a href="https://github.com/georgia-tech-db/evadb">EvaDB repo on Github</a>
<li> 📟 joining our <a href="https://evadb.ai/community">Slack Community</a>
<li> 🐦 following us on <a href="https://twitter.com/evadb_ai">Twitter</a>
<li> 🐦 following us on <a href="https://medium.com/evadb-blog">Medium</a>
</ul>

## Quick Links

<<<<<<< HEAD
<<<<<<< HEAD
- [Quick Links](#quick-links)
- [Documentation](#documentation)
@@ -199,12 +208,20 @@ EvaDB enables software developers to build AI apps in a few lines of code. Its p
- [Star History](#star-history)
- [License](#license)
=======
=======
- [Quick Links](#quick-links)
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
- [Documentation](#documentation)
- [Why EvaDB](#why-evadb)
- [How does EvaDB work](#how-does-evadb-work)
- [Community and Support](#community-and-support)
- [Illustrative Queries](#illustrative-queries)
- [Illustrative Apps](#illustrative-apps)
- [More Illustrative Queries](#more-illustrative-queries)
- [Architecture of EvaDB](#architecture-of-evadb)
- [Community and Support](#community-and-support)
- [Contributing](#contributing)
- [Star History](#star-history)
- [License](#license)

## Documentation

@@ -398,11 +415,11 @@ SELECT ChatGPT('Is this video summary related to Ukraine russia war', text)
* Train an ML model using the <a href="https://ludwig.ai/latest/">Ludwig AI</a> engine to predict a column in a table.

```sql
CREATE UDF IF NOT EXISTS PredictHouseRent FROM
CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM
( SELECT * FROM HomeRentals )
TYPE Ludwig
'predict' 'rental_price'
'time_limit' 120;
PREDICT 'rental_price'
TIME_LIMIT 120;
```
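
Review note: the hunk header (`-398,11 +415,11`) indicates three lines replaced by three lines, so the block above is read here as moving from `CREATE UDF` with quoted `'predict'` / `'time_limit'` keys to `CREATE FUNCTION` with `PREDICT` / `TIME_LIMIT` keywords. A minimal sketch of issuing the updated statement from Python follows. It assumes the `evadb` package is installed and that `evadb.connect().cursor()` and `cursor.query(...).df()` behave as in the EvaDB README. The helper names `ludwig_train_statement` and `run` are hypothetical.

```python
def ludwig_train_statement(function_name: str, table: str,
                           target: str, time_limit_s: int) -> str:
    """Build the updated CREATE FUNCTION statement shown in the hunk above.

    Plain string formatting is used for illustration only; do not pass
    untrusted input through it.
    """
    return (
        f"CREATE FUNCTION IF NOT EXISTS {function_name} FROM\n"
        f"( SELECT * FROM {table} )\n"
        "TYPE Ludwig\n"
        f"PREDICT '{target}'\n"
        f"TIME_LIMIT {time_limit_s};"
    )


def run(statement: str):
    """Execute one statement with EvaDB (assumption: evadb is installed)."""
    import evadb  # third-party; imported lazily so the builder works without it

    cursor = evadb.connect().cursor()
    return cursor.query(statement).df()


stmt = ludwig_train_statement("PredictHouseRent", "HomeRentals", "rental_price", 120)
print(stmt)
```

Calling `run(stmt)` would start training, which needs the `HomeRentals` table to exist and the Ludwig extra to be installed.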

</details>
55 changes: 40 additions & 15 deletions docs/_toc.yml
@@ -41,34 +41,34 @@ parts:
title: Connect to Database
- file: source/overview/concepts
title: Concepts
sections:
- file: source/overview/concepts/data-sources
title: Data Sources
#- file: source/overview/faq

- caption: Use Cases
chapters:
- file: source/usecases/food-review.rst
- file: source/usecases/sentiment-analysis.rst
title: Sentiment Analysis
- file: source/usecases/question-answering.rst
title: Question Answering
- file: source/usecases/text-summarization.rst
title: Text Summarization
- file: source/usecases/image-classification.rst
title: Image Classification
- file: source/usecases/similar-image-search.rst
- file: source/usecases/image-search.rst
title: Image Search
- file: source/usecases/qa-video.rst
title: Video Question Answering
- file: source/usecases/08-chatgpt.ipynb
title: ChatGPT-based Video Question Answering
- file: source/usecases/12-query-pdf.ipynb
title: PDF Question Answering
- file: source/usecases/02-object-detection.ipynb
- file: source/usecases/object-detection.rst
title: Object Detection
- file: source/usecases/03-emotion-analysis.ipynb
title: Emotions Analysis
- file: source/usecases/07-object-segmentation-huggingface.ipynb
title: Image Segmentation
- file: source/usecases/13-privategpt.ipynb
- file: source/usecases/emotion-analysis.rst
title: Emotion Analysis
- file: source/usecases/privategpt.rst
title: PrivateGPT
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))

- caption: User Reference
chapters:
<<<<<<< HEAD
- file: source/reference/evaql
title: Query Language
sections:
@@ -95,8 +95,10 @@ parts:

<<<<<<< HEAD
=======
=======
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
- file: source/reference/evaql
title: Eva Query Language
title: EvaQL
sections:
- file: source/reference/evaql/load
- file: source/reference/evaql/select
@@ -109,7 +111,13 @@ parts:
- file: source/reference/evaql/rename
- file: source/reference/evaql/use

<<<<<<< HEAD
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
- file: source/reference/api
title: Python API

>>>>>>> 2dacff69 (feat: sync master staging (#1050))
- file: source/reference/databases/index
title: Data Sources
sections:
@@ -123,6 +131,7 @@ parts:
=======
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))

<<<<<<< HEAD
- file: source/reference/vector_databases/index
title: Vector Databases
sections:
@@ -141,6 +150,13 @@ parts:
title: Model Training with Sklearn
- file: source/reference/ai/model-train-xgboost
title: Model Training with XGBoost
=======
- file: source/reference/ai/index
title: AI Engines
sections:
- file: source/reference/ai/model-train
title: Model Training
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
- file: source/reference/ai/model-forecasting
title: Time Series Forecasting
- file: source/reference/ai/hf
@@ -149,6 +165,7 @@ parts:
title: OpenAI
- file: source/reference/ai/yolo
title: YOLO
<<<<<<< HEAD
- file: source/reference/ai/stablediffusion
title: Stable Diffusion

@@ -157,6 +174,10 @@ parts:

- file: source/reference/optimizations
title: Optimizations
=======
- file: source/reference/ai/custom
title: Custom Model
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

# - file: source/reference/io
# title: IO Descriptors
@@ -172,7 +193,11 @@ parts:
- file: source/benchmarks/text_summarization.rst
title: Text Summarization

<<<<<<< HEAD
- caption: Contribution Guide
=======
- caption: Developer Reference
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
chapters:
- file: source/dev-guide/contribute
title: Contributing to EvaDB
4 changes: 4 additions & 0 deletions docs/conf.py
@@ -100,6 +100,7 @@
=======
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))


# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "github-dark"

@@ -176,6 +177,7 @@
<<<<<<< HEAD
=======

<<<<<<< HEAD

for i in os.listdir("../tutorials"):
if i in [
@@ -192,6 +194,8 @@
nb_execution_mode = "off"
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))

=======
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
# -- Initialize Sphinx ----------------------------------------------
def setup(app):
warnings.filterwarnings(
89 changes: 88 additions & 1 deletion docs/source/benchmarks/text_summarization.rst
@@ -1,4 +1,5 @@
<<<<<<< HEAD
<<<<<<< HEAD
Text Summarization Benchmark
============================

@@ -12,25 +13,43 @@ Prepare dataset
---------------
=======
Text summarization benchmark
=======
Text Summarization Benchmark
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
============================
In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_.

<<<<<<< HEAD
1. Prepare dataset
------------------
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
In this benchmark, we compare the runtime performance of EvaDB and MindsDB on
a text summarization application operating on a news dataset. In particular,
we focus on the `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_ dataset.

All the relevant files are located in the `text summarization benchmark folder on Github <https://github.com/georgia-tech-db/evadb/tree/staging/benchmark/text_summarization>`_.

Prepare dataset
---------------
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

.. code-block:: bash
cd benchmark/text_summarization
bash download_dataset.sh
<<<<<<< HEAD
<<<<<<< HEAD
Use EvaDB for Text Summarization
--------------------------------
=======
2. Using EvaDB to summarize the CNN DailyMail News
--------------------------------------------------
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
Use EvaDB for Text Summarization
--------------------------------
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

.. note::

@@ -43,6 +62,7 @@ Use EvaDB for Text Summarization
python text_summarization_with_evadb.py
<<<<<<< HEAD
<<<<<<< HEAD
Loading Data Into EvaDB
~~~~~~~~~~~~~~~~~~~~~~~
@@ -98,6 +118,55 @@ Setup SQLite Database
Prepare sqlite database for MindsDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
Loading Data Into EvaDB
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: sql
CREATE TABLE IF NOT EXISTS cnn_news_test(
id TEXT(128),
article TEXT(4096),
highlights TEXT(1024)
);
Creating Text Summarization Function in EvaDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: sql
CREATE UDF IF NOT EXISTS TextSummarizer
TYPE HuggingFace
TASK 'summarization'
MODEL 'sshleifer/distilbart-cnn-12-6'
MIN_LENGTH 5
MAX_LENGTH 100;
Tuning EvaDB for Maximum GPU Utilization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python
cursor._evadb.config.update_value("executor", "batch_mem_size", 300000)
cursor._evadb.config.update_value("executor", "gpu_ids", [0,1])
cursor._evadb.config.update_value("experimental", "ray", True)
Text Summarization Query in EvaDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: sql
CREATE TABLE IF NOT EXISTS cnn_news_summary AS
SELECT TextSummarizer(article) FROM cnn_news_test;
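
Review note: since this page is a runtime benchmark, the SQL steps above could be driven and timed from one script. The sketch below assumes `execute` is any callable that runs a single statement, for example `lambda q: cursor.query(q).df()` with an EvaDB cursor. The names `STEPS` and `time_steps` are hypothetical. The statements are copied from the hunk as written (including `CREATE UDF`), and the executor tuning step is left out because it goes through the private `cursor._evadb` attribute.

```python
import time
from typing import Callable, Dict, List, Tuple

# (step name, statement) pairs, copied from the documentation hunk above.
STEPS: List[Tuple[str, str]] = [
    (
        "create_table",
        "CREATE TABLE IF NOT EXISTS cnn_news_test("
        "id TEXT(128), article TEXT(4096), highlights TEXT(1024));",
    ),
    (
        "create_function",
        "CREATE UDF IF NOT EXISTS TextSummarizer TYPE HuggingFace "
        "TASK 'summarization' MODEL 'sshleifer/distilbart-cnn-12-6' "
        "MIN_LENGTH 5 MAX_LENGTH 100;",
    ),
    (
        "summarize",
        "CREATE TABLE IF NOT EXISTS cnn_news_summary AS "
        "SELECT TextSummarizer(article) FROM cnn_news_test;",
    ),
]


def time_steps(execute: Callable[[str], object],
               steps: List[Tuple[str, str]] = STEPS) -> Dict[str, float]:
    """Run each statement in order and record its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    for name, statement in steps:
        start = time.perf_counter()
        execute(statement)
        timings[name] = time.perf_counter() - start
    return timings


# Dry run with a stand-in executor that only records the statements.
executed: List[str] = []
timings = time_steps(executed.append)
print(list(timings))
```

Injecting the executor keeps the timing logic independent of EvaDB, so the same harness could wrap a MindsDB client for the comparison.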
Use MindsDB for Text Summarization
-----------------------------------

Setup SQLite Database
~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

.. code-block:: bash
@@ -110,6 +179,7 @@ Prepare sqlite database for MindsDB
Install MindsDB
~~~~~~~~~~~~~~~

<<<<<<< HEAD
<<<<<<< HEAD
Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.

@@ -123,20 +193,32 @@ Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-h

At the time of this documentation, we need to manually ``pip install evaluate`` for huggingface model to work in MindsDB.
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.

.. note::

You will need to manually run ``pip install evaluate`` for the ``HuggingFace`` model to work in MindsDB.
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

After installation, use the ``MySQL`` client for connecting to ``MindsDB``. Update the port number if needed.

.. code-block:: bash
mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
<<<<<<< HEAD
<<<<<<< HEAD
Benchmark MindsDB
~~~~~~~~~~~~~~~~~
=======
Run Experiment
~~~~~~~~~~~~~~
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
Benchmark MindsDB
~~~~~~~~~~~~~~~~~
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

Connect ``MindsDB`` to the ``sqlite`` database we created before:

@@ -175,6 +257,7 @@ Use the ``text summarization`` model to summarize the CNN news dataset:
);
<<<<<<< HEAD
<<<<<<< HEAD
Benchmarking Results
--------------------
@@ -183,6 +266,10 @@ Benchmarking Results
---------------------
Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPUs.
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
=======
Benchmarking Results
--------------------
>>>>>>> 2dacff69 (feat: sync master staging (#1050))

Here are the key runtime metrics for the ``Text Summarization`` benchmark.
