Commit

updated reports folder
lisijia6 committed Dec 12, 2023
1 parent 49e4e8b commit 807522d
Showing 4 changed files with 269 additions and 16 deletions.
2 changes: 1 addition & 1 deletion reports/milestone2.md
@@ -11,7 +11,7 @@ In this project we aim to develop an educational application that provides insta

The app pipeline flow is as shown:

<img src="pictures/science_tutor_app_pipeline.png" width="600">
<img src="../pictures/science_tutor_app_pipeline.png" width="600">

## Project Organization

10 changes: 5 additions & 5 deletions reports/milestone3.md
@@ -2,7 +2,7 @@

## Application Pipeline Flow

<img width="1362" alt="image" src="pictures/science_tutor_app_pipeline.png">
<img width="1362" alt="image" src="../pictures/science_tutor_app_pipeline.png">

## Project Organization
.
@@ -84,10 +84,10 @@ V100 unfortunately does not support bf16. We tried fp16 but due to Huggingface i
The images below show the training output from our Weights & Biases page, which tracks our model training process. This tracking is done with the `wandb` library that we included in our `task.py` Python script.

Train Tracking:
<img width="1362" alt="image" src="pictures/wandb_train.png">
<img width="1362" alt="image" src="../pictures/wandb_train.png">

System Tracking:
<img width="1362" alt="image" src="pictures/wandb_system.png">
<img width="1362" alt="image" src="../pictures/wandb_system.png">

## Serverless Training

@@ -102,10 +102,10 @@ sh cli.sh
```

Google Cloud Storage Bucket with our training code stored in `trainer.tar.gz`:
<img width="1362" alt="image" src="pictures/gcs_model_bucket.png">
<img width="1362" alt="image" src="../pictures/gcs_model_bucket.png">

Vertex AI showing our attempts for model training (currently we are still restricted by Vertex AI's GPU quota and cannot load our model into memory):
<img width="1362" alt="image" src="pictures/vertex_ai_model_training.png">
<img width="1362" alt="image" src="../pictures/vertex_ai_model_training.png">


## Code Structure
20 changes: 10 additions & 10 deletions reports/milestone4.md
@@ -1,10 +1,10 @@
# AC215 - ScienceTutor

## Application Pipeline Flow
<img width="1362" alt="image" src="pictures/science_tutor_app_pipeline2.png">
<img width="1362" alt="image" src="../pictures/science_tutor_app_pipeline2.png">

## Vertex AI Pipeline for ML Workflow
<img width="800" alt="image" src="pictures/ml_workflow.png">
<img width="800" alt="image" src="../pictures/ml_workflow.png">

## Project Organization
.
@@ -110,10 +110,10 @@ V100 unfortunately does not support bf16. We tried fp16 but due to Huggingface i
The images below show the training output from our Weights & Biases page, which tracks our model training process. This tracking is done with the `wandb` library that we included in our `task.py` Python script.

Train Tracking:
<img width="1362" alt="image" src="pictures/wandb_train.png">
<img width="1362" alt="image" src="../pictures/wandb_train.png">

System Tracking:
<img width="1362" alt="image" src="pictures/wandb_system.png">
<img width="1362" alt="image" src="../pictures/wandb_system.png">

## Serverless Training

@@ -128,10 +128,10 @@ sh cli.sh
```

Google Cloud Storage Bucket with our training code stored in `trainer.tar.gz`:
<img width="1362" alt="image" src="pictures/gcs_model_bucket.png">
<img width="1362" alt="image" src="../pictures/gcs_model_bucket.png">

Vertex AI showing our attempts for model training (currently we are still restricted by Vertex AI's GPU quota and cannot load our model into memory):
<img width="1362" alt="image" src="pictures/vertex_ai_model_training.png">
<img width="1362" alt="image" src="../pictures/vertex_ai_model_training.png">

## Dataset Evaluation

@@ -176,14 +176,14 @@ docker build -t ui
docker run --gpus all -p 7860:7860 -t ui
```
An example conversation with our model is shown below:
<img width="1362" alt="image" src="pictures/web_server_demo.png">
<img width="1362" alt="image" src="../pictures/web_server_demo.png">

For online deployment, we attempted to deploy our model on a Vertex AI Endpoint via the script in [`src/model_deploy/failed_vertex_ai_script.py`](src/model_deploy/failed_vertex_ai_script.py).
However, we were advised by Shivas that Vertex AI is not suitable for our use case: Vertex AI takes only a model and builds the API endpoint for you, whereas we have our own service and API, like a web server.
We were then advised to try Compute Engine or Cloud Run. However, Cloud Run has no GPU support, so as a workaround we use Compute Engine instead.

In this project we deploy our model, as well as the web UI, on Google Compute Engine, where the instance starts from our customized Docker image:
<img width="1362" alt="image" src="pictures/compute_engine.png">
<img width="1362" alt="image" src="../pictures/compute_engine.png">
By quantizing our model, we reduced its memory usage and are able to deploy it on a T4 GPU with an n1-highmem-2 instance.
Note that an external IP is assigned, so users can go directly to `http://34.125.115.138:7860/` to access our service.
We have stopped the instance to save cost, as keeping it running all day would quickly exhaust our credits. Please contact us if you want to try it out, and we will start the instance for you.
@@ -319,9 +319,9 @@ python3 cli.py -w # Run the ScienceTutor App Pipeline (Data Processor and Model

The Data Processor component processes the ScienceQA dataset from huggingface and converts the data into LLaVA format for training the LLaVA model. The Model Training component takes the packaged model training code and sends job to Vertex AI to finetune the LLaVA model on the science domain. The Vertex AI Pipeline for ML Workflow is shown below (the Compute Engine component is discussed in the Model Deploy Container section).

<img width="800" alt="image" src="pictures/ml_workflow.png">
<img width="800" alt="image" src="../pictures/ml_workflow.png">

<img width="800" alt="image" src="pictures/ml_workflow_pipeline_run.png">
<img width="800" alt="image" src="../pictures/ml_workflow_pipeline_run.png">
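
The ScienceQA-to-LLaVA conversion the Data Processor performs can be sketched roughly as below. This is a minimal illustration, not the actual `convert_scienceqa_to_llava.py` code: the sample fields (`question`, `choices`, `answer`, `image`) follow the public ScienceQA schema, while the output record shape follows the common LLaVA conversation format; exact details in our pipeline may differ.

```python
# Hypothetical sketch of converting one ScienceQA sample into a
# LLaVA-style conversation record (field names assumed).
def to_llava_record(idx, sample):
    choices = sample["choices"]
    # Fold the answer options into the question text.
    question = sample["question"] + "\nOptions: " + ", ".join(choices)
    # LLaVA marks image inputs with an "<image>" token in the human turn.
    human_value = ("<image>\n" if sample.get("image") else "") + question
    return {
        "id": str(idx),
        "image": sample.get("image"),  # image path, or None for text-only
        "conversations": [
            {"from": "human", "value": human_value},
            {"from": "gpt", "value": choices[sample["answer"]]},
        ],
    }

record = to_llava_record(0, {
    "question": "Which of these is a mammal?",
    "choices": ["frog", "whale"],
    "answer": 1,
    "image": "q0.png",
})
```

Records of this shape can then be serialized to JSON and uploaded for training.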


#### (4) Model Deploy Container
253 changes: 253 additions & 0 deletions reports/milestone5.md
@@ -0,0 +1,253 @@
# AC215 - ScienceTutor

## Project Organization
```
.
├── LICENSE
├── README.md
├── notebooks
│   └── AC215_milestone3_model_training.ipynb
├── pictures
│   ├── apidoc.png
│   ├── chatbot.png
│   ├── compute_engine.png
│   ├── gcs_model_bucket.png
│   ├── k8s.png
│   ├── ml_workflow.png
│   ├── ml_workflow_pipeline_run.png
│   ├── postman.png
│   ├── science_tutor_app_pipeline.png
│   ├── science_tutor_app_pipeline2.png
│   ├── solution_architecture.png
│   ├── technical_architecture.png
│   ├── vertex_ai_model_training.png
│   ├── wandb_system.png
│   ├── wandb_train.png
│   └── web_server_demo.png
├── presentations
│   ├── AC215-midterm-demo.mp4
│   └── AC215-midterm.pdf
├── references
├── reports
│   ├── milestone2.md
│   ├── milestone3.md
│   └── milestone4.md
└── src
├── api-service
│   ├── Dockerfile
│   ├── api
│   │   └── model_backend.py
│   ├── docker-shell.sh
│   └── requirements.txt
├── app_deploy
│   ├── Dockerfile
│   ├── deploy-create-instance.yml
│   ├── deploy-docker-images.yml
│   ├── deploy-k8s-cluster.yml
│   ├── deploy-provision-instance.yml
│   ├── deploy-setup-containers.yml
│   ├── deploy-setup-webserver.yml
│   ├── docker-entrypoint.sh
│   ├── docker-shell.sh
│   ├── inventory.yml
│   └── nginx-conf
│       └── nginx
│           └── nginx.conf
├── data_processing
│   ├── Dockerfile
│   ├── ScienceQA-LLAVA.dvc
│   ├── convert_scienceqa_to_llava.py
│   ├── docker-shell.sh
│   ├── requirements.txt
│   ├── upload_to_gcs.py
│   ├── upload_to_hf.py
│   └── utils.py
├── frontend
│   ├── Dockerfile
│   ├── Dockerfile.dev
│   ├── docker-shell.sh
│   ├── index.html
│   ├── node_modules
│   ├── package-lock.json
│   ├── package.json
│   ├── public
│   │   ├── send.png
│   │   ├── student.png
│   │   └── teacher.png
│   ├── src
│   │   ├── App.css
│   │   ├── App.jsx
│   │   ├── index.css
│   │   └── main.jsx
│   └── vite.config.js
├── ml_workflow
│   ├── Dockerfile
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── cli.py
│   ├── docker-entrypoint.sh
│   ├── docker-shell.sh
│   ├── model.py
│   ├── model_deploy.yaml
│   ├── model_training.yaml
│   └── pipeline.yaml
├── model_deploy
│   ├── Dockerfile
│   ├── api_example
│   │   ├── req.json
│   │   └── websocket_streaming.py
│   ├── docker-shell.sh
│   └── failed_vertex_ai_script.py
├── model_inference
│   ├── compute_metric.py
│   └── model_vqa_science.py
└── model_training
├── Dockerfile
├── Pipfile
├── Pipfile.lock
├── cli.sh
├── docker-entrypoint.sh
├── docker-shell.sh
├── download_from_gcs.py
├── download_from_hf.py
├── package
│   ├── PKG-INFO
│   ├── setup.cfg
│   ├── setup.py
│   └── trainer
│       ├── __init__.py
│       ├── task.py
│       └── wandb_api.py
├── package-trainer.sh
├── trainer-yp.tar.gz
├── upload_model_to_gcs.py
└── upload_trainer_to_gcs.py
```

## AC215 - Milestone5 - ScienceTutor

**Team Members** Sijia (Nancy) Li, Ziqing Luo, Yuqing Pan, Jiashu Xu, Xiaohan Zhao

**Group Name** Science Tutor

**Project** In this project we aim to develop an educational application that provides instant and expert answers to science questions that children have in different domains such as natural, social and language science.

### Milestone5
After completing a robust ML pipeline in our previous milestone, we have built a backend API service using Flask and a frontend web app using React.
This will be our user-facing application that ties together the various components built in previous milestones.

## Application Design
Before implementing the app, we created a detailed design document outlining the application's architecture.
We built a Solution Architecture and a Technical Architecture to ensure all our components work together.

### Solution Architecture
<img width="1362" alt="image" src="../pictures/solution_architecture.png">

### Technical Architecture
<img width="1362" alt="image" src="../pictures/technical_architecture.png">

### Backend API
We built a backend API service using Flask to expose model functionality to the frontend.

We provide a `/chat` endpoint with the `POST` method. You can check `/apidocs` for the Swagger UI API docs.
<img width="1362" src="../pictures/apidoc.png">

We also used Postman to test the API.
<img width="1362" src="../pictures/postman.png">
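
A `/chat` endpoint like ours can be sketched as a minimal Flask app. This is an illustrative sketch only: the request/response field names (`text`, `answer`) and the canned reply are assumptions, not the actual contract implemented in `api/model_backend.py`.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    # Field names here are illustrative; the real backend may differ.
    payload = request.get_json(force=True)
    question = payload.get("text", "")
    # The real service would run the finetuned LLaVA model here; we
    # return a canned reply so the sketch stays self-contained.
    return jsonify({"answer": f"(model reply to: {question})"})
```

Running the app (e.g. `app.run(port=5000)`) makes the endpoint reachable at `http://127.0.0.1:5000/chat`.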

### Frontend
A user-friendly React app was built to interact with the Science Tutor chatbot in the web browser, using the LLaVA-7b model finetuned on ScienceQA. In the app, a user can type a question, optionally upload an image, and send the message to the chatbot. The app sends the text and the image (if one is uploaded) to the backend API, which returns the model's answer to the given question. Once the app receives the response, it replies to the user in the chat.

Here is a screenshot of our app:
<img width="1362" alt="image" src="../pictures/chatbot.png">
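
The message the frontend ships to the backend can be sketched as a small payload builder. The `text`/`image` field names and the base64 encoding are assumptions for illustration; our actual frontend request shape may differ.

```python
import base64
import json

# Hypothetical shape of the chat message sent to the backend /chat API.
def build_chat_payload(text, image_bytes=None):
    payload = {"text": text}
    if image_bytes is not None:
        # Binary image data travels in JSON most easily as base64 text.
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)

body = json.loads(build_chat_payload("What animal is this?", b"\x89PNG..."))
```

The backend would then base64-decode the `image` field before handing the bytes to the model.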


### Deployment
We used Ansible and Kubernetes to create, provision, and deploy our frontend and backend to GCP in an automated fashion.

We successfully created the Kubernetes cluster for our app in GCP:
<img width="1362" alt="image" src="../pictures/k8s.png">


## Code Structure

The following are the folders from the previous milestones:
```
- data_processing
- model_training
- model_inference
- model_deploy
- ml_workflow
```

### API Service Container

This container has the Python file `api/model_backend.py` that runs and exposes the backend APIs.

To run the container locally:
* Open a terminal and go to the `src/api-service` directory
* Run `sh docker-shell.sh`
* The backend server is launched at `http://localhost:5000/` and `http://127.0.0.1:5000`
* Go to `http://127.0.0.1:5000/chat` to interact with the endpoint
* Go to `http://127.0.0.1:5000/apidocs` to view the APIs

### Frontend Container
This container contains all the files to develop and build a react app. There are dockerfiles for both development and production.

To run the container locally:
* Open a terminal and go to the `src/frontend` directory
* Run `sh docker-shell.sh`
* Once inside the docker container, run `npm install`
* Once the dependencies are installed, run `npm start`
* Go to `http://localhost:8080` to access the app locally

### Deployment Container
This container helps manage building and deploying all our app containers. This can be achieved with Ansible, with or without Kubernetes.

To run the container locally:
* Open a terminal and go to the `AC215_ScienceTutor/src/app_deploy` directory
* Run `sh docker-shell.sh`

#### Deploy with Ansible and Kubernetes

* Build and Push Docker Containers to GCR
```
ansible-playbook deploy-docker-images.yml -i inventory.yml
```

* Create and Deploy Cluster
```
ansible-playbook deploy-k8s-cluster.yml -i inventory.yml --extra-vars cluster_state=present
```
Once the command completes, go to `http://<YOUR INGRESS IP>.sslip.io`

#### Deploy with Ansible

* Build and Push Docker Containers to GCR
```
ansible-playbook deploy-docker-images.yml -i inventory.yml
```

* Create Compute Instance (VM) Server in GCP
```
ansible-playbook deploy-create-instance.yml -i inventory.yml --extra-vars cluster_state=present
```

* Provision Compute Instance in GCP
Install and set up everything required for deployment.
```
ansible-playbook deploy-provision-instance.yml -i inventory.yml
```

* Setup Docker Containers in the Compute Instance
```
ansible-playbook deploy-setup-containers.yml -i inventory.yml
```

* Setup Webserver on the Compute Instance
```
ansible-playbook deploy-setup-webserver.yml -i inventory.yml
```
Once the command completes, go to `http://<External IP>`

---
