
Docker Installation

Yuzhang Hu edited this page Sep 2, 2024 · 11 revisions

Preparation

[UI] Create Notion Entry Page

After creating the Notion token, go to Notion, create a page to serve as the main entry (for example, a Readings page), and enable the Notion integration for this page.


[Backend] Create Environment File

Check out the repo, copy .env.template to build/.env, then fill in the environment variables:

  • NOTION_TOKEN
  • NOTION_ENTRY_PAGE_ID
  • OPENAI_API_KEY
  • [Optional] REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET
  • [Optional] Vars with TWITTER_ prefix
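
Before deploying, it can help to confirm that the required variables above are actually filled in. The following is a minimal sketch, not part of auto-news: the helper name check_env is hypothetical, and the default path build/.env is taken from the step above.

```shell
#!/bin/sh
# Hypothetical helper (not part of auto-news): report required variables
# that are missing or empty in an env file.
check_env() {
    env_file="${1:-build/.env}"
    missing=0
    for var in NOTION_TOKEN NOTION_ENTRY_PAGE_ID OPENAI_API_KEY; do
        # Match "VAR=" followed by at least one character at the start of a line.
        if ! grep -Eq "^${var}=.+" "$env_file" 2>/dev/null; then
            echo "missing or empty: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_env build/.env` prints one line per unset variable and returns a non-zero status if any are missing. OPENAI_API_KEY is checked here because the list above names it; if a different LLM_PROVIDER is used, the variable list would need adjusting.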

Double-check LLM_PROVIDER=xxx. The default is openai; it can be switched to Google Gemini or Ollama, in which case fill in the corresponding values, e.g.:

  • LLM_PROVIDER=openai:
    • OPENAI_API_KEY=sk-xxx
    • OPENAI_MODEL=gpt-3.5-turbo-0125
  • LLM_PROVIDER=google:
    • GOOGLE_MODEL=gemini-1.5-flash-latest
    • GOOGLE_API_KEY=xxx
  • LLM_PROVIDER=ollama:
    • OLLAMA_MODEL=llama3
    • OLLAMA_URL=http://<ollama_hostname>:11434

Check EMBEDDING_PROVIDER=xxx. The default is openai; it can be switched to HuggingFace or Ollama, with EMBEDDING_MODEL changed accordingly (Tip: for HuggingFace and Ollama, make sure the embedding models are pre-downloaded and ready to use), e.g.:

  • EMBEDDING_PROVIDER=openai
    • EMBEDDING_MODEL=text-embedding-ada-002
  • EMBEDDING_PROVIDER=hf
    • EMBEDDING_MODEL=all-MiniLM-L6-v2
  • EMBEDDING_PROVIDER=ollama
    • EMBEDDING_MODEL=nomic-embed-text

Note: Replace <ollama_hostname> with the actual Ollama service hostname, and make sure that host is reachable from the machine running the services.
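
For the Ollama case, pre-downloading the models and checking reachability could look roughly like the sketch below. It assumes the ollama CLI is installed on the Ollama host and that curl is available on the auto-news machine; all three helper names are hypothetical and not part of auto-news. The model names are the ones used in the examples above.

```shell
#!/bin/sh
# Sketch only; the helper names are hypothetical and not part of auto-news.

# Run on the Ollama host (assumes the `ollama` CLI is installed there):
# pre-download the models named in OLLAMA_MODEL and EMBEDDING_MODEL.
pull_ollama_models() {
    ollama pull llama3 || return 1
    ollama pull nomic-embed-text
}

# Build the URL of Ollama's model-listing endpoint for a given hostname.
ollama_tags_url() {
    printf 'http://%s:11434/api/tags\n' "$1"
}

# Run on the auto-news machine: succeeds only if the Ollama service answers.
check_ollama() {
    curl -fsS --max-time 5 "$(ollama_tags_url "$1")" >/dev/null
}
```

Calling `check_ollama` with the same hostname used in OLLAMA_URL returns a zero status when the service is reachable on port 11434.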

[Backend] Deploy Services

Note: Run all make commands below from the root of the auto-news source code folder.

make deps && make deploy

[Backend] Start Services

make start

The services are now running and will pull sources every hour.

[UI] Set up Notion Tweet/RSS/Reddit list

Go to the Notion entry page we created before, and we will see the following folder structure has been created automatically:

Readings
├── Inbox
│   ├── Inbox - Article
│   ├── Inbox - YouTube
│   └── Inbox - Journal
├── Index
│   ├── Index - Inbox
│   ├── Index - ToRead
│   ├── RSS_List
│   ├── Tweet_List
│   └── Reddit_List
└── ToRead
    └── ToRead

  • Go to the RSS_List page, and fill in the RSS name and URL
  • Go to the Reddit_List page, and fill in the subreddit names
  • Go to the Tweet_List page, and fill in the Twitter screen names (Tip: paid account only)

[UI] Set up Notion database views

Go to the Notion ToRead database page; all the data will flow into this database later on. Create database views for the different sources (e.g., Tweets, Articles, YouTube, RSS) to keep the flows organized. You may want to watch this video to get an initial idea of how to define your own customized database views.

Now, enjoy and have fun.

Operations

[Monitoring] Control Panel

For troubleshooting, we can use the URLs below to access the services and check the logs and data.

Service   Role              Panel URL
Airflow   Orchestration     http://localhost:8080
Milvus    Vector Database   http://localhost:9100
Adminer   DB accessor       http://localhost:8070

Go to http://localhost:8080 and log in with the default Airflow account and password (airflow). To change them, modify _AIRFLOW_WWW_USER_USERNAME and _AIRFLOW_WWW_USER_PASSWORD in the docker/docker-compose.yaml file.

Go to http://localhost:8070 and sign in to the database accessor with the default account. To change it, modify the MYSQL__ env vars in the build/.env file.

Go to http://localhost:9100 and click the Connect button to log in to the Milvus vector database management UI.
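
To see quickly whether the three panels are responding, something like the following sketch could be used. It assumes curl is installed; panel_urls and check_panels are hypothetical helper names, and the URLs are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of auto-news).

# List the panel names and URLs from the table above, one per line.
panel_urls() {
    cat <<'EOF'
Airflow http://localhost:8080
Milvus http://localhost:9100
Adminer http://localhost:8070
EOF
}

# Print the HTTP status code returned by each panel.
check_panels() {
    panel_urls | while read -r name url; do
        # -o /dev/null discards the body; -w prints only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
        echo "$name $url -> HTTP $code"
    done
}
```

Running `check_panels` prints one line per service. A panel that cannot be reached at all typically shows status 000, which suggests the corresponding container is not up yet.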

Stop/Restart Services

# Stop all services
make stop

# Restart all services
make stop && make start

Redeploy .env and DAGs

Modify build/.env, then:

make stop && make deploy && make start

Upgrade to the latest code

git pull && make stop && make deploy && make start

Rebuild Docker Images

make stop && make build && make start