Improving Self-Hosting and Removing 3rd Party dependencies. #4465

Open
wants to merge 71 commits into
base: main

Podginator
Contributor

@Podginator Podginator commented Oct 30, 2024

This PR aims to improve the self-hosting documentation and to provide a working setup for running Omnivore with Docker and Docker Compose. As far as possible, it removes third-party dependencies and reliance on external infrastructure providers such as GCP.

The aim is to establish feature parity, or near feature parity, with the previously hosted service. This includes RSS support, webhook support, email newsletters, and PDF support.

The list of changes to date is below:

  • Create Dockerfile for Queue processing, which is used for parsing articles, alongside asynchronous tasks.

  • Update and expose ImageProxy and use the latest version with ARM64 support.

  • Create new docker-compose file in self-hosting/docker-compose.

  • Provide a minimal .env file to be able to run the service using docker-compose.

  • Create a guide for using Cloudflare Tunnels to reach an instance running on a device at home.

  • Create an NGINX configuration for those who want to use NGINX as a reverse proxy for the service.

  • Replace the use of Google Cloud Storage with MinIO, an open-source object store compatible with the S3 API that can run on the same device.

    • This also allows other S3-compatible services, such as R2 and S3, to be the storage provider if wanted.
  • Improve content-fetching to minimise instances where articles refuse to parse.

    • Some articles are now parsed from raw HTML instead of going through Puppeteer.
  • Overhaul the way email works to ensure that there is an open-source option. Three options are provided here.

    • Docker Mailserver: A production-ready fullstack but simple containerized mail server. This allows incoming emails to be received, parsed, and then added to Omnivore.
    • Amazon Simple Email Service: A service provided by AWS that has a free tier. Allows emails to be received for a domain. A guide on how to set it up is in the self-hosting readme.
    • Zapier: Used as a way to forward Gmail messages to the self-hosted instance. This could also realistically be achieved using the Gmail APIs.
  • Replace PSPDFKit, which required a license and would display the following when viewing PDFs: image

    • Add an option to use the native browser PDF viewer for PDF files. This removes the highlight functionality, but is stable.
      image
    • Create a new PDF viewer using PDF.js, an open-source PDF library that backs the PDF viewer in Firefox. This option has near feature parity (highlights, reading progress) with PSPDFKit, but may have some bugs.
      image
  • Add some additional fixes to article parsing, such as a Medium parser and a Wired parser.

  • Updated Docker images and software to the latest LTS version of Node (20.12)

To-Do:

  • Re-enable YouTube features, such as extraction of transcripts.
    • Allow both an AI-based feature for this, and a less formatted version.
  • Provide a guide on how to get up and running using Kubernetes.
  • Provide a guide on how to get up and running with Tailscale.
  • Provide a guide on getting email to work with Gmail without the use of an external server.
  • Attempt to provide a lighter-weight queuing system, and removal of Redis/Caching for single-user hosting.


vercel bot commented Oct 30, 2024

@Podginator is attempting to deploy a commit to the omnivore Team on Vercel.

A member of the Team first needs to authorize it.

@KraXen72 KraXen72 mentioned this pull request Nov 1, 2024
@samanthavbarron

samanthavbarron commented Nov 1, 2024

This is awesome, it looks like it's taking shape! I might try it out this weekend.

Are there contributions from the community that you can think of that would be helpful for you?

import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
import type { Readable } from 'stream'

// While this is listed as S3, for self hosting we will use MinIO, which is

@lovebes lovebes Nov 2, 2024


Can we use Cloudflare R2 as well for self hosting? What was the decision behind using MinIO?
Asking because R2 is also S3 compatible.

Contributor Author


I'm not actually familiar with R2, but anything that is S3-compatible should work. Let me take a look later to see whether the storage client I built works with it.

MinIO was chosen because it can be self-hosted along with the rest of the application. There is a Docker image, and it can all run on the same server without relying on anything external.

I'm trying to ensure everything here can be run self-contained without any need for external services.

That said, as with some of the email changes, I am looking into ways to simplify parts of it too, and having some external services is ok with me.
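
As a rough illustration of why "anything that is S3-compatible should work", the sketch below builds client options for MinIO, R2, or AWS S3 from environment variables. The field names (`region`, `endpoint`, `forcePathStyle`, `credentials`) match options accepted by the AWS SDK v3 `S3Client` constructor, but the `STORAGE_*` variable names and the `buildStorageConfig` helper are hypothetical and are not taken from this PR.

```typescript
// Subset of the AWS SDK v3 S3Client options that matter for S3-compatible
// providers. The object returned below could be passed to `new S3Client(...)`.
interface StorageClientConfig {
  region: string
  endpoint: string | undefined
  forcePathStyle: boolean
  credentials: { accessKeyId: string; secretAccessKey: string }
}

type Env = Record<string, string | undefined>

// Hypothetical helper: the STORAGE_* variable names are assumptions made for
// this example, not the names used by the PR's .env file.
function buildStorageConfig(env: Env): StorageClientConfig {
  const accessKeyId = env.STORAGE_ACCESS_KEY
  const secretAccessKey = env.STORAGE_SECRET_KEY
  if (!accessKeyId || !secretAccessKey) {
    throw new Error('STORAGE_ACCESS_KEY and STORAGE_SECRET_KEY must be set')
  }

  // For example http://minio:9000 for a MinIO container; left unset for AWS S3.
  const endpoint = env.STORAGE_ENDPOINT

  return {
    region: env.STORAGE_REGION ?? 'us-east-1',
    endpoint,
    // Assumption: whenever a custom endpoint is configured, address buckets
    // path-style (http://host:9000/bucket/key), which is how MinIO is
    // typically reached.
    forcePathStyle: endpoint !== undefined,
    credentials: { accessKeyId, secretAccessKey },
  }
}
```

Under these assumptions, switching between a local MinIO container and a hosted provider is a matter of changing the endpoint and credentials rather than the application code.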


@Mikilio Mikilio Nov 2, 2024


To find suitable services, I recommend consulting r/self-hosted.
Love the work so far.


@volker-fr volker-fr Nov 2, 2024


S3 is a nice idea, provides various options, including self hosted ones.

How about local storage? This would reduce the required dependencies by one.


oh wow I didn't know Minio can be self-hosted! That sounds like a good idea.

Contributor Author


> S3 is a nice idea, provides various options, including self hosted ones.
>
> How about local storage? This would reduce the required dependencies by one.

The uploads are done via signed URLs, so while local storage would be feasible, it would require a bit more development work.
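
To give a sense of the extra work involved, here is a minimal sketch of how a local-storage backend might issue and verify its own signed upload URLs with an HMAC. This is not how the PR or the S3 presigner works. The function names, the query parameter names, and the signing format are all assumptions made for illustration, and only Node's built-in `crypto` module is used.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical signing format: HMAC-SHA256 over method, path, and expiry.
function sign(secret: string, method: string, path: string, expires: number): string {
  return createHmac('sha256', secret)
    .update(`${method}\n${path}\n${expires}`)
    .digest('hex')
}

// Issue a URL that authorises one HTTP method on one path until it expires.
function createSignedUrl(
  baseUrl: string,
  path: string,
  method: string,
  secret: string,
  ttlSeconds: number,
  now: number = Date.now()
): string {
  const url = new URL(path, baseUrl)
  const expires = Math.floor(now / 1000) + ttlSeconds
  // Sign the normalised pathname so that verification sees the same string.
  url.searchParams.set('expires', String(expires))
  url.searchParams.set('signature', sign(secret, method, url.pathname, expires))
  return url.toString()
}

// Check the signature and expiry of an incoming request URL.
function verifySignedUrl(
  rawUrl: string,
  method: string,
  secret: string,
  now: number = Date.now()
): boolean {
  const url = new URL(rawUrl)
  const expires = Number(url.searchParams.get('expires'))
  if (!Number.isInteger(expires) || expires < Math.floor(now / 1000)) {
    return false // missing, malformed, or expired
  }
  const expected = Buffer.from(sign(secret, method, url.pathname, expires), 'hex')
  const given = Buffer.from(url.searchParams.get('signature') ?? '', 'hex')
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected)
}
```

A local-storage backend would additionally need an HTTP route that calls something like `verifySignedUrl` before writing the request body to disk, which is roughly the development work the comment above refers to.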

@jsifalda

jsifalda commented Nov 3, 2024

does it mean i would be able to deploy open-source omnivore to vercel? and be able to use this great app even after their shut-down? 🙏

@Podginator Podginator changed the title Self-Hosting Changes Improving Self-Hosting and Removing 3rd Party dependencies. Nov 3, 2024
@RayBB

RayBB commented Nov 3, 2024

If this gets worked out I'll add a template for easy self-hosting with Coolify

@alexkotusenko

Are you guys planning to add a docker container to self-host Omnivore?

@CLAassistant

CLAassistant commented Dec 13, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ Podginator
✅ m1xxos
❌ weeebdev

@luca-git

luca-git commented Jan 5, 2025

Will this ever happen?

@Podginator
Contributor Author

@luca-git I haven't got the permission to merge, but this branch is a perfectly viable way of getting the application up and running as is. I'll speak with @jacksonh about when the merge should take place.
