Design Documentation
version 1.0 (2023/10/1): initial version
version 1.1 (2023/10/15): updated backend class diagram and implementation details
version 1.2 (2023/11/5): updated acceptance testing
version 1.3 (2023/11/16): added multiplayer section
version 1.4 (2023/11/28): added deploying animated drawings section
version 1.5 (2023/12/02): added difficulties in building an inference server
The frontend is an Android app written in Java. The backend uses Django with Django REST Framework, which we adopted to expose RESTful APIs as a standard, efficient interface between frontend and backend.
Our server is deployed on an AWS EC2 instance. Drawing and GIF files are stored in an AWS S3 bucket, and the database is MySQL managed through AWS RDS.
For the animated drawings feature, we implemented an API wrapper around the Facebook Research AnimatedDrawings library. AnimatedDrawings is deployed separately on the Bacchus Kubernetes cluster, on an NVIDIA A100 GPU, so that drawings can be animated quickly. The A100 is a data-center GPU, and its compute capability gives quick and reliable inference.
A key feature of our system is multiplayer drawing. We reduced synchronization latency and server load by using a socket channel, which enables real-time, responsive interaction among multiple users. Implementation details are given in a later section, "Making Multiplayer More Reliable".
Our Android project primarily consists of two main layers: the UI Layer and the Data Layer.
The UI Layer is responsible for constructing the interface visible to users. It handles tasks such as displaying information, managing user interactions, and processing inputs. Its role is to ensure seamless interaction between users and the app's functionalities.
On the other hand, the Data Layer is responsible for providing the data required by the UI Layer. This layer performs tasks such as fetching data from servers, processing information, and storing data within the app's local storage. Essentially, it acts as a bridge connecting the UI and the underlying data sources.
By segregating responsibilities between these layers, our Android app maintains a clear separation of concerns, enabling efficient development, testing, and maintenance of the application.
The backend project is divided into the routing layer, view layer, serializer layer, and model layer.
In the routing layer, each URL request is matched to the appropriate view function. Requests are first categorized into user, drawing, and family, and each category's urls.py maps a request to the view function for the business logic it asks for.
The view layer performs the business logic for the request it receives from routing. The logic differs per request: returning gallery images, for example, is handled differently from asking the AI server to generate images.
The serializer layer converts the database data into the form the view needs. It selects and formats the data held in the model according to the view's request.
The model layer provides access to the data in the database through the ORM. The database itself can only be queried with SQL, but models let you handle each table in an object-oriented way.
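The division of labor between the four layers can be illustrated with a framework-free sketch. This is plain Python, not Django or Django REST Framework code; the names `ROUTES`, `dispatch`, `list_drawings`, `get_drawing`, `serialize_drawing`, and the in-memory `DRAWINGS` table are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


# Model layer: an object-oriented view of one table row (stand-in for an ORM model).
@dataclass
class Drawing:
    id: int
    title: str
    host_id: int


# Hypothetical in-memory "table" used here instead of a real database.
DRAWINGS = [Drawing(1, "Sunny day", 10), Drawing(2, "Our cat", 11)]


# Serializer layer: turns model objects into plain data a view can return.
def serialize_drawing(drawing):
    return {"id": drawing.id, "title": drawing.title, "host_id": drawing.host_id}


# View layer: the business logic for one kind of request.
def list_drawings(params):
    return [serialize_drawing(d) for d in DRAWINGS if d.host_id == params["user_id"]]


def get_drawing(params):
    matches = [d for d in DRAWINGS if d.id == params["id"]]
    return serialize_drawing(matches[0]) if matches else None


# Routing layer: maps (method, path) to the view function that handles it.
ROUTES = {
    ("GET", "/drawing"): list_drawings,
    ("GET", "/drawing/{id}"): get_drawing,
}


def dispatch(method, path, params):
    view = ROUTES[(method, path)]
    return view(params)
```

Calling `dispatch("GET", "/drawing", {"user_id": 10})` walks through all four layers in order: route lookup, view logic, model access, and serialization.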
The data model diagram shows the data models of our project. The main tables are Drawing, User, and Family. The Drawing table has image and GIF URL fields that store the S3 addresses of the final image and GIF files, and its type field indicates the current stage of the drawing. The User table contains the information needed for authentication and the user's family role. The Family table groups users into families. The DrawingUser and FamilyUser tables establish many-to-many relationships between Drawing and User, and between Family and User: DrawingUser stores the participants of a drawing, while FamilyUser stores the members of a family.
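A minimal sketch of how a join table such as DrawingUser resolves the many-to-many relationship, written with plain dataclasses rather than Django models. The field names and sample rows are assumptions made for illustration; the real tables carry more columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    id: int
    username: str


@dataclass(frozen=True)
class Drawing:
    id: int
    title: str


@dataclass(frozen=True)
class DrawingUser:
    # One row per (drawing, participant) pair.
    drawing_id: int
    user_id: int


def participants_of(drawing_id: int, users: List[User], links: List[DrawingUser]) -> List[User]:
    """Return every user linked to the given drawing through the join table."""
    member_ids = {link.user_id for link in links if link.drawing_id == drawing_id}
    return [user for user in users if user.id in member_ids]


def drawings_of(user_id: int, drawings: List[Drawing], links: List[DrawingUser]) -> List[Drawing]:
    """Return every drawing the given user participated in."""
    drawing_ids = {link.drawing_id for link in links if link.user_id == user_id}
    return [drawing for drawing in drawings if drawing.id in drawing_ids]


# Hypothetical sample rows.
users = [User(1, "mom"), User(2, "kid")]
drawings = [Drawing(10, "Picnic"), Drawing(11, "Robot")]
links = [DrawingUser(10, 1), DrawingUser(10, 2), DrawingUser(11, 2)]
```

The same pattern applies to FamilyUser, with family and user ids in place of drawing and user ids.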
Description | Endpoint | Method | Details |
---|---|---|---|
List Drawings | /drawing | GET | Retrieve multiple drawings. |
Create Drawing | /drawing | POST | Create a new drawing. |
Get Single Drawing | /drawing/{id} | GET | Retrieve a single drawing by its ID. |
Join Drawing | /drawing/{id}/join | POST | Join a drawing created by a different user. |
Submit Single Drawing | /drawing/{id}/submit | PUT | Submit a drawing after completion. |
Upload Real-Time Drawing | /drawing/{id}/canvas | POST | Upload real-time drawing updates. |
Register User | /user | POST | Register a new user. |
Login User | /user/login | POST | Log in as a user. |
Logout User | /user/logout | POST | Log out the current user. |
List Family Members | /family | GET | Retrieve a list of the user's family members. |
interface DrawingListRequest {
user_id: number;
}
Retrieve multiple drawings.
interface DrawingListResponse {
drawings: Drawing[];
}
interface Drawing {
id: number; // Primary Key
title: string;
description: string;
image_url: string;
ai_image_url: string;
gif_url: string;
type: 'raw' | 'processed' | 'animated';
host_id: number;
participants: FamilyMemberResponse[];
voice_id: number;
created_at: Date;
updated_at: Date;
}
interface FamilyMemberResponse {
username: string;
gender: 'Male' | 'Female' | 'Other';
type: 'Parent' | 'Child';
}
interface DrawingCreateRequest {
host_id: number;
}
Create a new drawing.
interface DrawingCreateResponse {
id: number;
host_id: number;
invitation_code: string; // generated by hashing the drawing's id
created_at: Date;
}
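The invitation code is described as a hash of the drawing's id. One way this could look is sketched below; the choice of SHA-256, the 8-character length, and the function name `invitation_code` are assumptions, since the hash function and code format are not specified here.

```python
import hashlib


def invitation_code(drawing_id: int, length: int = 8) -> str:
    """Derive a short, deterministic code from a drawing id (illustrative only)."""
    digest = hashlib.sha256(str(drawing_id).encode("utf-8")).hexdigest()
    # Keep only a prefix so the code is short enough to type on a phone.
    return digest[:length].upper()
```

Because the code is a pure function of the id, the server can recompute it when a user joins instead of storing it separately. A shorter prefix is easier to type but makes collisions between drawings more likely.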
Retrieve a single drawing by its ID. Returns a Drawing object.
interface Drawing {
id: number; // Primary Key
title: string;
description: string;
image_url: string;
ai_image_url: string;
gif_url: string;
type: 'raw' | 'processed' | 'animated';
user_id: number;
voice_id: number;
created_at: Date;
updated_at: Date;
}
interface DrawingJoinRequest {
user_id: number;
invitation_code: string;
}
interface DrawingSubmitRequest {
file: File;
title: string;
description: string;
host_id: number;
voice_id: number;
}
Submit a drawing after completion.
interface Drawing {
id: number; // Primary Key
title: string;
description: string;
image_url: string;
ai_image_url: string;
gif_url: string;
type: 'RAW' | 'PROCESSED' | 'ANIMATED';
host_id: number;
voice_id: number;
created_at: Date;
updated_at: Date;
}
interface DrawingCanvasRequest {
id: number; // Primary Key
title: string;
description: string;
image_url: string;
ai_image_url: string;
gif_url: string;
type: 'raw' | 'processed' | 'animated';
user_id: number;
voice_id: number;
created_at: Date;
updated_at: Date;
}
interface UserCreateRequest {
username: string;
password: string; // sha256
gender: 'Male' | 'Female' | 'Other';
type: 'Parent' | 'Child';
}
Register a new user.
interface User {
id: number; // Primary Key
username: string;
password: string; // sha256
gender: 'Male' | 'Female' | 'Other';
type: 'Parent' | 'Child';
family_id: number;
created_at: Date;
}
interface UserLoginRequest {
username: string;
password: string;
}
Log in as a user. Returns a User object.
interface User {
id: number; // Primary Key
username: string;
password: string; // sha256
gender: 'Male' | 'Female' | 'Other';
type: 'Parent' | 'Child';
family_id: number;
created_at: Date;
}
Log out the current user.
Retrieve a list of the user's family members.
interface FamilyResponse {
users: User[];
}
interface User {
id: number; // Primary Key
username: string;
password: string; // sha256
gender: 'Male' | 'Female' | 'Other';
type: 'Parent' | 'Child';
family_id: number;
created_at: Date;
}
The main technical challenge we faced when designing this service was multiplayer collaboration on drawings. How can two or more people view the same, synchronized drawing at the same time? How can we minimize the real-time delay while keeping the load on the server light?
The first idea that came to mind was API polling: store the drawing on the server, and have each client request it at regular intervals. When a client modified the drawing, it would send a request to upload the modified part, and the server would combine the layers from different clients into one drawing and store it in an S3 bucket.
However, this method had two critical drawbacks. The first was slow updates: the client has no way of knowing whether a picture has been modified, so it has to poll the server at regular intervals, which inevitably introduces a delay. The second was server load: if we shorten the polling interval to reduce the delay, the number of requests rises accordingly, and it rises further with every participant who joins. In addition, merging several layers of drawings was a very heavy operation, and we did not like that the intermediate stages of the drawings kept piling up in storage. With these clear drawbacks, we looked for a better approach.
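The trade-off between delay and load can be made concrete with a small calculation. The sketch below assumes the simplest possible model, in which every participant polls independently at a fixed interval and a change happens at a uniformly random moment within that interval; both function names are hypothetical.

```python
def polling_requests_per_second(participants: int, interval_seconds: float) -> float:
    """Requests the server receives per second when every client polls at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return participants / interval_seconds


def average_staleness_seconds(interval_seconds: float) -> float:
    """Average wait before a client sees a change, under the uniform-timing assumption."""
    return interval_seconds / 2
```

Under this model, four participants polling every 2 seconds produce 2 requests per second with an average staleness of 1 second. Cutting the interval to 0.5 seconds lowers the staleness to 0.25 seconds but quadruples the request rate, and each of those requests would also return a full drawing.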
To solve the first problem of slow updates, we decided to use sockets. Unlike request-response HTTP communication, a socket keeps a connection open, which allows real-time, two-way communication. The server can push a change to clients as soon as it happens instead of waiting to be polled.
We implemented this using Pusher, a WebSocket library. The workflow is as follows.
- Create a pusher channel on the server
- Client subscribes to channel
- Server delivers event to channel
- All subscribed clients receive the event
Drawing modifications were passed to the server via the POST API.
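The subscribe and deliver steps above follow a publish/subscribe pattern, which can be sketched with an in-memory stand-in. This is not the Pusher API: the class `InMemoryChannels`, its method names, and the channel and event names are all hypothetical, and a real deployment delivers events over WebSocket connections rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, dict], None]


class InMemoryChannels:
    """Toy pub/sub hub: every subscriber of a channel receives every event sent to it."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        # Step 2: a client registers interest in a channel.
        self._subscribers[channel].append(callback)

    def trigger(self, channel: str, event: str, data: dict) -> None:
        # Steps 3 and 4: the server delivers an event and all subscribers receive it.
        for callback in list(self._subscribers[channel]):
            callback(event, data)


# Two hypothetical clients that record what they receive.
hub = InMemoryChannels()
received_by_a: List[tuple] = []
received_by_b: List[tuple] = []
hub.subscribe("drawing-42", lambda event, data: received_by_a.append((event, data)))
hub.subscribe("drawing-42", lambda event, data: received_by_b.append((event, data)))
hub.trigger("drawing-42", "stroke", {"points": [[0, 0], [5, 5]]})
```

After the single `trigger` call, both clients hold the same event without having asked for it, which is the property polling could not provide.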
To solve the second problem, the heavy load on the server, we decided to send only the drawing stroke data instead of the entire drawing file as we originally planned, and not to store the drawing on the server at all until it is complete.
If the client renders every stroke from start to finish, there is no need to store a separate drawing file. If we send and share only the data for one stroke each time it is drawn, instead of polling the API at regular intervals, there is nothing left to process on the server. We then don't have to worry about merging existing drawings and layers in storage, and we don't have to store intermediate drawing files.
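The idea that a drawing is fully determined by its ordered strokes can be sketched as follows. The `Stroke` fields (color, width, points) and the JSON encoding are assumptions made for illustration; the actual stroke payload used by the app is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    color: str
    width: float
    points: List[Tuple[float, float]]


def encode_stroke(stroke: Stroke) -> str:
    """Serialize one stroke into a small JSON payload suitable for an event body."""
    return json.dumps(asdict(stroke))


def decode_stroke(payload: str) -> Stroke:
    """Rebuild a stroke from its JSON payload."""
    raw = json.loads(payload)
    return Stroke(raw["color"], raw["width"], [tuple(point) for point in raw["points"]])


def replay(payloads: List[str]) -> List[Stroke]:
    """Reconstruct the whole canvas by applying the received strokes in order."""
    return [decode_stroke(payload) for payload in payloads]


# Device 1 draws two strokes; device 2 receives only the encoded payloads.
drawn = [
    Stroke("red", 3.0, [(0.0, 0.0), (1.0, 2.0)]),
    Stroke("blue", 1.5, [(4.0, 4.0), (5.0, 6.0), (7.0, 8.0)]),
]
payloads = [encode_stroke(stroke) for stroke in drawn]
reconstructed = replay(payloads)
```

Each payload carries a few coordinates rather than an image, and the receiving device ends up with the same list of strokes as the sender, so no merged drawing file has to exist on the server.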
The final flow we took with this approach is as follows.
- When you draw a stroke, you pass the stroke information to the server via the POST API
- The server passes the drawing stroke information as an event to the pusher channel
- All clients subscribed to the channel receive the stroke information
By making the transferred data light and reducing server operation, you can draw on device 1 and have that modification immediately visible on device 2.
We incorporated Facebook Research's AnimatedDrawings to animate the drawing. The repository provides two parts: 1) a TorchServe Dockerfile for humanoid pose detection, and 2) graphics code that creates motion GIFs from the detected pose.
Instead of having the backend server handle machine learning and graphics workloads, we decided it was a better design to create a separate inference server and have the two communicate. Our team implemented a Flask API wrapper on top of AnimatedDrawings. Because Amazon EC2 was not an efficient option for our AI model, we deployed the inference server on an A100 GPU server provided by Bacchus. We built Docker images of TorchServe and the API wrapper and uploaded them to a Harbor registry.
We faced several difficulties in incorporating AnimatedDrawings into our service. The most critical were in setting up the environment, specifically Kubernetes and the CUDA version.
The first main problem was setting up Kubernetes. Within AnimatedDrawings, the two components communicate over localhost, so we created a single pod with two containers. One container ran the TorchServe image and the other ran the API wrapper image, and the two communicated via port forwarding. External requests were handled by the API wrapper container, which called the TorchServe container for ML inference. After receiving the inference response, the API wrapper container created the GIFs and returned the results.
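The wrapper-to-inference call over localhost can be sketched with the standard library alone. The fake server below stands in for the TorchServe container and returns a canned response; the model name `pose_detector`, the response shape, and all function names are hypothetical. The `/predictions/{model_name}` path follows TorchServe's inference API, but nothing here is the project's actual wrapper code.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeInferenceHandler(BaseHTTPRequestHandler):
    """Stand-in for the inference container: answers every POST with fixed keypoints."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the uploaded image bytes
        body = json.dumps({"keypoints": [[10, 20], [30, 40]]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_fake_inference_server() -> HTTPServer:
    # Port 0 lets the OS pick a free port; a real pod would use a fixed one.
    server = HTTPServer(("127.0.0.1", 0), FakeInferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_pose(image_bytes: bytes, port: int, model_name: str = "pose_detector") -> dict:
    """What the API wrapper container does: call the neighbor container on localhost."""
    url = f"http://127.0.0.1:{port}/predictions/{model_name}"
    request = urllib.request.Request(url, data=image_bytes, method="POST")
    # An empty ProxyHandler makes sure the call never leaves the machine via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

Containers in the same pod share a network namespace, which is why the wrapper can reach the inference process at 127.0.0.1 without any cluster-level routing.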
We received permission to deploy a web server on the university's GPU Kubernetes server. However, this server was not set up for web serving needs. For instance, we had no permission to look up nodes, which made it very difficult to find a node's external IP and open the pod's port for external access.
There was also an issue with exposing the pod's port so that the inference server could be accessed from outside the cluster. To solve this problem, we wrote service and deployment YAML files for Kubernetes and applied them with the following commands.
kubectl apply -f service.yaml
kubectl apply -f deployment.yaml
We configured node port and target port in service.yaml to properly deliver the request to the inference server.
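The relationship between the node port and the target port can be shown by building the Service manifest as data. The sketch below produces the same structure a service.yaml would hold, serialized as JSON (which kubectl also accepts); every name and port number is a hypothetical placeholder, not the project's actual configuration.

```python
import json


def nodeport_service(name: str, app_label: str, port: int, target_port: int, node_port: int) -> dict:
    """Build a Kubernetes Service manifest of type NodePort (illustrative values only)."""
    # 30000-32767 is the default NodePort range; a cluster may configure another.
    if not 30000 <= node_port <= 32767:
        raise ValueError("node_port is outside the default NodePort range")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            # Selects the pods created by the deployment with this label.
            "selector": {"app": app_label},
            "ports": [
                {
                    "port": port,               # port of the Service inside the cluster
                    "targetPort": target_port,  # port the container listens on
                    "nodePort": node_port,      # port opened on every node for outside access
                }
            ],
        },
    }


manifest = nodeport_service("inference-service", "inference", 5000, 5000, 30500)
manifest_json = json.dumps(manifest, indent=2)
```

A request arriving at the node on the node port is forwarded to the Service and then to the target port of a matching pod, which is the path an external client takes to reach the inference server.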
The second main problem was setting up the TorchServe container properly. Despite successfully building a TorchServe image and uploading it to the Harbor registry, we faced persistent issues with TorchServe failing to return accurate inference responses.
Within the containers that use the mmcv-based model, the CUDA compiler is linked through mmcv-full. However, we hit a persistent bug where the system failed to locate the CUDA compiler. The cause turned out to be an incompatibility between the CUDA installation and the NVIDIA driver when running the Docker image in a Kubernetes environment, as detailed in the following link: https://github.com/open-mmlab/mmdetection/issues/4471. Furthermore, CUDA versions that should have fit were incompatible because of minor version differences, and the documentation for the ML model was primarily in Chinese, which added a language barrier. Finally, the Debian base image of the Dockerfile that shipped with the model was very outdated and had many unsupported libraries. In the end, resolving this issue required tedious and time-consuming version troubleshooting across the whole chain: mmcv, the mmcv CUDA compiler, PyTorch, Python, the CUDA toolkit, and the NVIDIA driver.
We first checked the state of the GPU with the following command:
nvidia-smi
The command above displays the GPU's model name and driver version. We then installed versions of CUDA, the driver, PyTorch, mmcv, and Python that were compatible with the GPU and with each other.
conda install cuda -c nvidia/label/cuda-11.6.0
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12/index.html
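After installing a version set like this, it helps to confirm from inside the container what Python actually sees. The sketch below is one way to do that, not the procedure the team used; the function name `report_versions` is hypothetical, and the imports are guarded so the script still runs where torch or mmcv is missing.

```python
import importlib


def report_versions() -> dict:
    """Collect the versions that must agree: PyTorch, its CUDA build, and mmcv."""
    report = {}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        report["torch_cuda"] = torch.version.cuda
        # False usually points to a driver or runtime mismatch.
        report["cuda_available"] = torch.cuda.is_available()
    except Exception as error:
        report["torch"] = f"not importable: {error}"
    try:
        mmcv = importlib.import_module("mmcv")
        report["mmcv"] = mmcv.__version__
    except Exception as error:
        report["mmcv"] = f"not importable: {error}"
    return report


if __name__ == "__main__":
    for name, value in report_versions().items():
        print(f"{name}: {value}")
```

With the versions installed above, one would expect the reported PyTorch CUDA build to read 11.6, matching the `cu116` wheel index used for mmcv-full.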
Finally, there was a problem with PyOpenGL, a computer graphics library for Python. We set error_checker to None within the library file and installed the following packages:
sudo apt-get install libosmesa6 libosmesa6-dev
The final problem was the sheer amount of time graphics generation took. Animation creation took anywhere from 3 to 6 minutes, far longer than we expected. Simple characters tended to finish sooner, while complex characters with a range of colors and lines took longer. The logs showed that model inference finished in 1-2 seconds and that most of the time was spent rendering each graphics frame. To reduce the animation time, we removed redundant preprocessing, reduced the animation frames by a quarter with Blender, and increased the number of gunicorn workers and threads to work around single-threaded serving in Flask. As a result, we cut the animation time to about one sixth of the original, to between 30 and 80 seconds. Our version is now much faster than the original project we forked.
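The worker and thread change could be expressed as a gunicorn configuration file, which is itself a Python module. The values below are illustrative assumptions, not the settings the team used; `bind`, `workers`, `threads`, and `timeout` are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical values)
import multiprocessing

# Address the Flask wrapper listens on inside the container.
bind = "0.0.0.0:5000"

# Several worker processes so one slow animation request does not block the others.
workers = min(4, multiprocessing.cpu_count())

# Threads per worker, so each process can accept requests concurrently.
threads = 2

# Rendering can take minutes, far beyond gunicorn's default 30-second
# worker timeout, so the limit is raised.
timeout = 600
```

Such a file would be passed with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the module and Flask application object of the wrapper.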
The image below shows how we used Blender, an open-source 3D computer graphics tool, to optimize the animation.
The following frameworks will be used for unit tests.
- Android: JUnit
- Django: pytest-django
Integration tests will be conducted through Espresso.
On the client, we will aim for 70% code coverage for both unit and integration tests. On the server, we will aim for 75% code coverage for all API endpoints in integration tests.
We plan to test the following user stories.
1. As an end user, I want to 1) sign up 2) log in, so that I can access and make use of the services provided by LittleStudio.
- Scenario: The end user clicks on the 1) “Sign Up” 2) “Login” button
- Given: The end user is on the 1) sign up 2) login page
- When: The end user 1) completes the registration process 2) inputs the correct credentials
- Then: The end user should be logged in and redirected to the tutorial page
2a. As an end user, I want to create a drawing and draw in real time, so that I can build shared memories and artworks.
- Scenario: The end user is in the “My Gallery” page
- Given: The end user successfully logs in
- When: 1) The end user clicks on the “plus” button via the menu bar, then clicks on the “Create a drawing” button 2) All of the (desired) collaborators join the waiting room and the end user clicks on the “Start Drawing” button
- Then: The end user should be 1) redirected to the waiting room page where they can see a list of collaborators 2) able to draw simultaneously in real time with the collaborators
2b. As an end user, I want to join a drawing and draw in real time, so that I can build shared memories and artworks.
- Scenario: The end user is in the “My Gallery” page
- Given: The end user successfully logs in
- When: 1) The end user clicks on the “plus” button via the menu bar, clicks on the “Join a drawing” button, and inputs the invitation code. 2) All of the (desired) collaborators join the waiting room and the host (the end user who created the drawing; refer to scenario 2a) clicks on the “Start Drawing” button
- Then: The end user should be 1) redirected to the waiting room page where they can see a list of collaborators 2) able to draw simultaneously in real time with the collaborators
3. As an end user, I want to add a title and description to a finished drawing, so that I can provide context and narration for my drawing.
- Scenario: The end user should be able to add context to their drawing
- Given: The end user finishes drawing
- When: The end user 1) clicks on the “Finish” button 2) adds a title and description of the drawing and clicks on the “Submit” button
- Then: The end user should be 1) redirected to the submit drawing page 2) redirected to the view drawing page (refer to the “Then” section of the view drawing scenario) and the submitted drawing should be added to the gallery page
4. As an end user, I want to view a specific drawing, so that I can appreciate the creativity provided by the animated drawing and view the details of that specific drawing.
- Scenario: The end user should be able to see their drawing with the details
- Given: The end user is in the “My Gallery” page
- When: The end user clicks on a specific drawing
- Then: The end user should be able to see the original collaborative drawing, animated versions of the drawing, and the information about the drawing, including the title, date created, participants, and description
5. As an end user, I want to see a list of my family members and log out.
- Scenario: The end user has successfully logged in
- Given: The end user clicks on the “My Family Page” button on the navigation bar
- When: 1) The end user is in the “My Family Page” page 2) clicks on the “Logout” button
- Then: The end user should be 1) able to see a list of their family members 2) redirected to the login page