feat: sync master staging (georgia-tech-db#1050)

Co-authored-by: Joy Arulraj <[email protected]>
Co-authored-by: Jiashen Cao <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
Co-authored-by: Sayan Sinha <[email protected]>
Co-authored-by: Hersh Dhillon <[email protected]>
alexxx-db · Oct 30, 2023 · b87af50
1 parent a9124e1

Showing 91 changed files with 2,045 additions and 419 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -486,7 +486,11 @@ jobs:
               source test_evadb/bin/activate
               pip install --upgrade pip
               pip debug --verbose
+<<<<<<< HEAD
               pip install ".[dev,ludwig,qdrant,forecasting,pinecone,chromadb]"
+=======
+              pip install ".[dev,ludwig,qdrant,forecasting]"
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
               source test_evadb/bin/activate
               bash script/test/test.sh -m "<< parameters.mode >>"
 

diff --git a/README.md b/README.md
@@ -183,8 +183,17 @@ EvaDB enables software developers to build AI apps in a few lines of code. Its p
   <li> 📝 following us on <a href="https://medium.com/evadb-blog">Medium</a>
 </ul>
 
+👋 Hey! If you're excited about our vision of bringing AI inside database systems, show some ❤️ by: 
+<ul>
+  <li> 🐙 giving a ⭐ on our <a href="https://github.com/georgia-tech-db/evadb">EvaDB repo on Github</a>
+  <li> 📟 joining our <a href="https://evadb.ai/community">Slack Community</a>
+  <li> 🐦 following us on <a href="https://twitter.com/evadb_ai">Twitter</a>
+  <li> 🐦 following us on <a href="https://medium.com/evadb-blog">Medium</a>
+</ul>
+
 ## Quick Links
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 - [Quick Links](#quick-links)
 - [Documentation](#documentation)
@@ -199,12 +208,20 @@ EvaDB enables software developers to build AI apps in a few lines of code. Its p
 - [Star History](#star-history)
 - [License](#license)
 =======
+=======
+- [Quick Links](#quick-links)
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 - [Documentation](#documentation)
 - [Why EvaDB](#why-evadb)
 - [How does EvaDB work](#how-does-evadb-work)
-- [Community and Support](#community-and-support)
 - [Illustrative Queries](#illustrative-queries)
 - [Illustrative Apps](#illustrative-apps)
+- [More Illustrative Queries](#more-illustrative-queries)
+- [Architecture of EvaDB](#architecture-of-evadb)
+- [Community and Support](#community-and-support)
+- [Contributing](#contributing)
+- [Star History](#star-history)
+- [License](#license)
 
 ## Documentation
 
@@ -398,11 +415,11 @@ SELECT ChatGPT('Is this video summary related to Ukraine russia war', text)
 * Train an ML model using the <a href="https://ludwig.ai/latest/">Ludwig AI</a> engine to predict a column in a table.
 
 ```sql
-CREATE UDF IF NOT EXISTS PredictHouseRent FROM
+CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM
 ( SELECT * FROM HomeRentals )
 TYPE Ludwig
-'predict' 'rental_price'
-'time_limit' 120;
+PREDICT 'rental_price'
+TIME_LIMIT 120;
 ```
 
 </details>

diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -41,34 +41,34 @@ parts:
         title: Connect to Database
       - file: source/overview/concepts
         title: Concepts
+        sections:
+          - file: source/overview/concepts/data-sources
+            title: Data Sources
         #- file: source/overview/faq
 
   - caption: Use Cases
     chapters:
-      - file: source/usecases/food-review.rst
+      - file: source/usecases/sentiment-analysis.rst
         title: Sentiment Analysis
+      - file: source/usecases/question-answering.rst
+        title: Question Answering
+      - file: source/usecases/text-summarization.rst
+        title: Text Summarization
       - file: source/usecases/image-classification.rst
         title: Image Classification
-      - file: source/usecases/similar-image-search.rst
+      - file: source/usecases/image-search.rst
         title: Image Search
-      - file: source/usecases/qa-video.rst
-        title: Video Question Answering
-      - file: source/usecases/08-chatgpt.ipynb
-        title: ChatGPT-based Video Question Answering
-      - file: source/usecases/12-query-pdf.ipynb
-        title: PDF Question Answering
-      - file: source/usecases/02-object-detection.ipynb
+      - file: source/usecases/object-detection.rst
         title: Object Detection
-      - file: source/usecases/03-emotion-analysis.ipynb
-        title: Emotions Analysis
-      - file: source/usecases/07-object-segmentation-huggingface.ipynb
-        title: Image Segmentation
-      - file: source/usecases/13-privategpt.ipynb
+      - file: source/usecases/emotion-analysis.rst
+        title: Emotion Analysis
+      - file: source/usecases/privategpt.rst
         title: PrivateGPT
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
 
   - caption: User Reference
     chapters:
+<<<<<<< HEAD
       - file: source/reference/evaql      
         title: Query Language
         sections:
@@ -95,8 +95,10 @@ parts:
 
 <<<<<<< HEAD
 =======
+=======
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
       - file: source/reference/evaql      
-        title: Eva Query Language 
+        title: EvaQL
         sections:
           - file: source/reference/evaql/load
           - file: source/reference/evaql/select
@@ -109,7 +111,13 @@ parts:
           - file: source/reference/evaql/rename
           - file: source/reference/evaql/use
 
+<<<<<<< HEAD
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+      - file: source/reference/api
+        title: Python API
+
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
       - file: source/reference/databases/index
         title: Data Sources
         sections: 
@@ -123,6 +131,7 @@ parts:
 =======
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
 
+<<<<<<< HEAD
       - file: source/reference/vector_databases/index
         title: Vector Databases
         sections: 
@@ -141,6 +150,13 @@ parts:
             title: Model Training with Sklearn
           - file: source/reference/ai/model-train-xgboost
             title: Model Training with XGBoost
+=======
+      - file: source/reference/ai/index
+        title: AI Engines
+        sections:
+          - file: source/reference/ai/model-train
+            title: Model Training
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
           - file: source/reference/ai/model-forecasting
             title: Time Series Forecasting
           - file: source/reference/ai/hf
@@ -149,6 +165,7 @@ parts:
             title: OpenAI 
           - file: source/reference/ai/yolo
             title: YOLO 
+<<<<<<< HEAD
           - file: source/reference/ai/stablediffusion
             title: Stable Diffusion
 
@@ -157,6 +174,10 @@ parts:
 
       - file: source/reference/optimizations
         title: Optimizations
+=======
+          - file: source/reference/ai/custom
+            title: Custom Model
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
       # - file: source/reference/io
       #   title: IO Descriptors
@@ -172,7 +193,11 @@ parts:
       - file: source/benchmarks/text_summarization.rst
         title: Text Summarization
 
+<<<<<<< HEAD
   - caption: Contribution Guide
+=======
+  - caption: Developer Reference
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
     chapters:
       - file: source/dev-guide/contribute
         title: Contributing to EvaDB

diff --git a/docs/conf.py b/docs/conf.py
@@ -100,6 +100,7 @@
 =======
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
 
+
 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = "github-dark"
 
@@ -176,6 +177,7 @@
 <<<<<<< HEAD
 =======
 
+<<<<<<< HEAD
 
 for i in os.listdir("../tutorials"):
     if i in [
@@ -192,6 +194,8 @@
 nb_execution_mode = "off"
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
 
+=======
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 # -- Initialize Sphinx ----------------------------------------------
 def setup(app):
     warnings.filterwarnings(

diff --git a/docs/source/benchmarks/text_summarization.rst b/docs/source/benchmarks/text_summarization.rst
@@ -1,4 +1,5 @@
 <<<<<<< HEAD
+<<<<<<< HEAD
 Text Summarization Benchmark 
 ============================
 
@@ -12,25 +13,43 @@ Prepare dataset
 ---------------
 =======
 Text summarization benchmark 
+=======
+Text Summarization Benchmark 
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 ============================
-In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_.
 
+<<<<<<< HEAD
 1. Prepare dataset
 ------------------
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+In this benchmark, we compare the runtime performance of EvaDB and MindsDB on 
+a text summarization application operating on a news dataset. In particular, 
+we focus on the `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_ dataset.
+
+All the relevant files are located in the `text summarization benchmark folder on Github <https://github.com/georgia-tech-db/evadb/tree/staging/benchmark/text_summarization>`_.
+
+Prepare dataset
+---------------
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 .. code-block:: bash
 
    cd benchmark/text_summarization
    bash download_dataset.sh
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 Use EvaDB for Text Summarization
 --------------------------------
 =======
 2. Using EvaDB to summarize the CNN DailyMail News
 --------------------------------------------------
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+Use EvaDB for Text Summarization
+--------------------------------
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 .. note::
 
@@ -43,6 +62,7 @@ Use EvaDB for Text Summarization
    python text_summarization_with_evadb.py
 
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 Loading Data Into EvaDB
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -98,6 +118,55 @@ Setup SQLite Database
 Prepare sqlite database for MindsDB
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+Loading Data Into EvaDB
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sql
+
+    CREATE TABLE IF NOT EXISTS cnn_news_test(
+        id TEXT(128),
+        article TEXT(4096),
+        highlights TEXT(1024)
+      );
+
+Creating Text Summarization Function in EvaDB
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sql
+
+   CREATE UDF IF NOT EXISTS TextSummarizer
+         TYPE HuggingFace
+         TASK 'summarization'
+         MODEL 'sshleifer/distilbart-cnn-12-6'
+         MIN_LENGTH 5
+         MAX_LENGTH 100;
+
+
+Tuning EvaDB for Maximum GPU Utilization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+   cursor._evadb.config.update_value("executor", "batch_mem_size", 300000)
+   cursor._evadb.config.update_value("executor", "gpu_ids", [0,1])
+   cursor._evadb.config.update_value("experimental", "ray", True)
+
+
+Text Summarization Query in EvaDB
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sql
+
+    CREATE TABLE IF NOT EXISTS cnn_news_summary AS
+    SELECT TextSummarizer(article) FROM cnn_news_test;
+
+Use MindsDB for Text Summarization
+-----------------------------------
+
+Setup SQLite Database 
+~~~~~~~~~~~~~~~~~~~~~~
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 .. code-block:: bash
 
@@ -110,6 +179,7 @@ Prepare sqlite database for MindsDB
 Install MindsDB
 ~~~~~~~~~~~~~~~
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.
 
@@ -123,20 +193,32 @@ Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-h
 
    At the time of this documentation, we need to manually ``pip install evaluate`` for huggingface model to work in MindsDB.
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.
+
+.. note::
+
+   You will need to manually run ``pip install evaluate`` for the ``HuggingFace`` model to work in MindsDB.
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 After installation, use the ``MySQL`` client for connecting to ``MindsDB``. Update the port number if needed.
 
 .. code-block:: bash
 
    mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 Benchmark MindsDB 
 ~~~~~~~~~~~~~~~~~
 =======
 Run Experiment
 ~~~~~~~~~~~~~~
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+Benchmark MindsDB 
+~~~~~~~~~~~~~~~~~
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 Connect ``MindsDB`` to the ``sqlite`` database we created before:
 
@@ -175,6 +257,7 @@ Use the ``text summarization`` model to summarize the CNN news dataset:
    );
 
 
+<<<<<<< HEAD
 <<<<<<< HEAD
 Benchmarking Results
 --------------------
@@ -183,6 +266,10 @@ Benchmarking Results
 ---------------------
 Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPU.
 >>>>>>> 8c5b63dc (release: merge staging into master (#1032))
+=======
+Benchmarking Results
+--------------------
+>>>>>>> 2dacff69 (feat: sync master staging (#1050))
 
 Here are the key runtime metrics for the ``Text Summarization`` benchmark.