about.html: Include the presence of gVisor as additional security layer #41

@EtiennePerot EtiennePerot commented Jul 28, 2024

This updates about.html to reflect the addition of gVisor as an extra layer of security in the Dangerzone document handling process.

The document sometimes conflated "container" and "sandbox", which is understandable because they were effectively the same thing before adding this extra sandboxing layer in the middle. Now it uses "container" only when talking about containers, otherwise it uses "sandbox". index.html was already using this language, so no update needed there.

Corp shill check: There is only one outgoing link to gvisor.dev which I can remove if you'd prefer. It is marked target="_blank" rel="noopener noreferrer" as are other external links on the page. This mentions the word "gVisor" fewer times than "Linux"; the word "gVisor" is used only when (a) first talking about the use of sandboxing in Dangerzone, and (b) when talking about specific gVisor components like the kernel/syscall filters. Other parts of the page use the unqualified word "sandbox" instead.

$ grep gVisor about.html | wc -l
5
$ grep Linux about.html | wc -l 
8

This PR is built on @apyrgio's 2024-05-see-also branch on the assumption that it will be merged into main, and so that the diff shown on GitHub only shows the difference against that branch. Once #39 is merged, I intend to rebase this PR and retarget it to main.

apyrgio commented Jul 29, 2024

Thanks a lot for the contribution Etienne 🤩. I haven't managed to look into it just yet, because I want to tie up some other loose ends before switching to it (and documenting gVisor's usage in Dangerzone in general). I'll comment on it as soon as possible though.

apyrgio commented Aug 19, 2024

Minor heads-up: I've converted the changes in this PR from HTML to Markdown, and I've done the same for the parent PR.

@@ -48,30 +48,30 @@ I got the idea for Dangerzone from Qubes, an operating system that runs everythi

Dangerzone was inspired by TrustedPDF but it works in non-Qubes operating systems, which is important, because most of the journalists I know use Macs and probably won’t be jumping to Qubes for some time.

It uses Linux containers to sandbox dangerous documents instead of virtual machines. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to sandbox dangerous documents. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.

Suggested change
It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to sandbox dangerous documents. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to open dangerous documents, instead of virtual machines. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.

I wanted to avoid using the word "sandbox" twice here. Also I brought back the comparison with virtual machines, since I think it makes sense in this paragraph, where we compare Dangerzone with TrustedPDF.


How does Dangerzone work?
-------------------------

Dangerzone uses Linux containers (two of them), which are sort of like quick, lightweight virtual machines that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.
Dangerzone uses Linux containers (two of them), and runs a sandbox inside each. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.

Nit: the gVisor sandbox runs only in the first container. The second container (soon to be removed) does not start a gVisor sandbox; it just recreates the PDF locally.

So, I wouldn't introduce gVisor just yet. If you want to make the container description more accurate and not mention virtual machines, we could describe them as follows:

Suggested change
Dangerzone uses Linux containers (two of them), and runs a sandbox inside each. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.
Dangerzone uses Linux containers (two of them), which are isolated application environments that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.


When Dangerzone starts the container that will sanitize the suspicious document, it _disables networking_ and does not mount anything. So if a malicious document hacks the container, it doesn’t have access to your data and it cant use the internet, so there’s not much it could do.
When Dangerzone starts a container, it will first start a gVisor sandbox _inside_ that container, then runs the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.

I think this is the place where we can first mention the gVisor sandbox, given that we are talking here about the first phase of the conversion:

Suggested change
When Dangerzone starts a container, it will first start a gVisor sandbox _inside_ that container, then runs the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.
When Dangerzone starts the container that will sanitize the suspicious document, it will first start a gVisor sandbox _inside_ that container, then run the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.
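
As an illustration of the container configuration described above (networking disabled, nothing mounted from the host), here is a minimal sketch that assembles a `docker run` command line. It is not Dangerzone's actual launch code: the function name and the image name are hypothetical, and the step that starts the gVisor sandbox inside the container is deliberately left out because this thread does not describe how it is invoked.

```python
import shlex


def build_container_command(image: str, runtime: str = "docker") -> list[str]:
    """Build an argv list that starts `image` with no network and no host mounts.

    `image` and this helper are hypothetical; only the flags themselves are
    standard `docker run` options.
    """
    return [
        runtime,
        "run",
        "--rm",            # remove the container once it exits
        "-i",              # keep stdin open so the document can be piped in
        "--network=none",  # disable networking for the container
        # No -v / --mount arguments: nothing from the host filesystem is mounted.
        image,
    ]


# Hypothetical image name, used only to show the resulting command line.
command = build_container_command("example.invalid/converter:latest")
print(shlex.join(command))
```

Passing the document over standard input rather than through a mount is what lets the command omit any volume flags.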


* _Reads the original document from standard input_
* Uses _LibreOffice_ or _PyMuPDF_ to convert original document to a PDF
* Uses _PyMuPDF_ to split PDF into individual pages, and to convert those into RGB pixel data
* _Writes the number of pages and the RGB pixel data to its standard output_
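
The last step above, writing the number of pages and the RGB pixel data to standard output, could be framed as in the following sketch. The framing is an assumption made for illustration (big-endian unsigned 16-bit integers for the page count, width, and height, followed by `width * height * 3` bytes per page); the thread does not specify the real wire format, and the function names are hypothetical.

```python
import io
import struct

# Assumed framing, not taken from the source: big-endian unsigned 16-bit fields.
FIELD = struct.Struct(">H")


def write_pages(out, pages):
    """Write the page count, then (width, height, RGB bytes) for each page."""
    out.write(FIELD.pack(len(pages)))
    for width, height, rgb in pages:
        if len(rgb) != width * height * 3:
            raise ValueError("pixel buffer does not match page dimensions")
        out.write(FIELD.pack(width))
        out.write(FIELD.pack(height))
        out.write(rgb)


def read_exact(stream, size):
    """Read exactly `size` bytes, or raise EOFError on a truncated stream."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    return data


def read_pages(stream):
    """Host side: parse the stream back into (width, height, RGB bytes) tuples."""
    (count,) = FIELD.unpack(read_exact(stream, FIELD.size))
    pages = []
    for _ in range(count):
        (width,) = FIELD.unpack(read_exact(stream, FIELD.size))
        (height,) = FIELD.unpack(read_exact(stream, FIELD.size))
        pages.append((width, height, read_exact(stream, width * height * 3)))
    return pages


# Round trip through an in-memory buffer standing in for the container's stdout.
buffer = io.BytesIO()
original = [(2, 1, bytes([255, 0, 0, 0, 0, 255]))]  # one 2x1 page: red, blue
write_pages(buffer, original)
buffer.seek(0)
assert read_pages(buffer) == original
```

Because the reader derives every length from the declared dimensions, the host only ever interprets the stream as integers and raw pixels, never as a document format.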

Then that container quits. The host then writes the RGB pixel data to a volume. A second container starts and:
Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second sandbox starts and:

Keeping in line with the container/sandbox distinction, and because we don't use gVisor in the second conversion phase, I guess we have to refer to "a second container":

Suggested change
Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second sandbox starts and:
Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second container starts and:

We could also explain a bit here that we don't really need containers in the second phase for their security properties. We just want them for code portability. In TrustedPDF, the second phase happens in the host, for instance.


* _Mounts a volume with the RGB pixel data_
* If OCR is enabled, uses _PyMuPDF_ to convert RGB pixel data into a compressed, **searchable** PDF
* Otherwise uses _PyMuPDF_ to convert RGB pixel data into a compressed, **flat** PDF
* _Stores safe PDF in separate volume_

Then that container quits, and the user can open the newly created safe PDF.
Then that sandbox quits, and the user can open the newly created safe PDF.

Suggested change
Then that sandbox quits, and the user can open the newly created safe PDF.
Then that container quits, and the user can open the newly created safe PDF.

@@ -97,12 +97,13 @@ It’s still possible to get hacked with Dangerzone
Like all software, it’s possible that Dangerzone (and more importantly, the software that it relies on like LibreOffice and Docker) has security bugs. Malicious documents are designed to target a specific piece of software – for example, Adobe Reader on Mac. It’s possible that someone could craft a malicious document that specifically targets Dangerzone itself. An attacker would need to chain these exploits together to succeed at hacking Dangerzone:

* An exploit for either LibreOffice or PyMuPDF
* A container escape exploit in the Linux kernel
* A sandbox escape exploit in the gVisor kernel

Personally, I would be fine with linking to gVisor's security model, since it's important reading material for those who want to understand the security guarantees of gVisor.

@@ -97,12 +97,13 @@ It’s still possible to get hacked with Dangerzone
Like all software, it’s possible that Dangerzone (and more importantly, the software that it relies on like LibreOffice and Docker) has security bugs. Malicious documents are designed to target a specific piece of software – for example, Adobe Reader on Mac. It’s possible that someone could craft a malicious document that specifically targets Dangerzone itself. An attacker would need to chain these exploits together to succeed at hacking Dangerzone:

* An exploit for either LibreOffice or PyMuPDF
* A container escape exploit in the Linux kernel
* A sandbox escape exploit in the gVisor kernel
* A container escape exploit in the Linux kernel that isn't protected by gVisor's syscall filters
* In Mac and Windows, a VM escape exploit for Docker Desktop

I'm itching to remove this line. Once the attacker has access to the VM, they have access to the files of the host (sure, subject to some ACL rules depending on your Docker Desktop installation), and access to the internet. By our own standards, at this point the attacker is not "contained".

* In Mac and Windows, a VM escape exploit for Docker Desktop

If you opened such a malicious document with Dangerzone, it would start the first container and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to hack the container. Then, it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
If you opened such a malicious document with Dangerzone, it would start the first sandbox and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to achieve code execution. Then, it would exploit a vulnerability in the gVisor kernel to escape the sandbox, then it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.

Suggested change
If you opened such a malicious document with Dangerzone, it would start the first sandbox and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to achieve code execution. Then, it would exploit a vulnerability in the gVisor kernel to escape the sandbox, then it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
For example, let's say that you open a malicious `.docx` file that specifically targets Dangerzone. What Dangerzone would do first is to start a Linux container, then start a gVisor sandbox within it, and finally begin the conversion process into a PDF using LibreOffice. If the malicious document wants to escape to the host, it first needs to exploit a vulnerability in LibreOffice to achieve code execution. Once it has control of LibreOffice, it needs to exploit a vulnerability in the gVisor kernel to escape the sandbox. Assuming it finds one, it then needs to find a different vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.

The chain of "Then" made the text a bit difficult to read, so I propose to add some fluff here and there.
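
The exploit chain in the suggested text above can be modeled as a sequence of layers that must all be broken, in order. The sketch below is only a toy model of that reasoning, with hypothetical names; it leaves out the Docker Desktop VM layer, whose inclusion is still under discussion in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    exploit_needed: str


# Layers in the order an attacker meets them, as described in the review text.
LAYERS = [
    Layer("document parser", "exploit for LibreOffice or PyMuPDF"),
    Layer("gVisor sandbox", "sandbox escape exploit in the gVisor kernel"),
    Layer(
        "Linux container",
        "container escape exploit in the Linux kernel that gVisor's "
        "syscall filters do not block",
    ),
]


def first_blocking_layer(broken_layers):
    """Return the first layer the attacker cannot get past, or None if all fall.

    `broken_layers` is the set of layer names the attacker has an exploit for.
    """
    for layer in LAYERS:
        if layer.name not in broken_layers:
            return layer
    return None


# An attacker holding a parser exploit and a container escape, but no gVisor
# escape, is stopped at the sandbox and never reaches the container boundary.
blocked_at = first_blocking_layer({"document parser", "Linux container"})
print(blocked_at.name if blocked_at else "host compromised")
```

The model makes the ordering explicit: a Linux kernel exploit is of no use to the attacker until the gVisor layer in front of it has been escaped.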

@eloquence eloquence deleted the branch freedomofpress:2024-05-see-also August 20, 2024 17:44
@eloquence eloquence closed this Aug 20, 2024
@eloquence
Member

I did not mean to close this, sorry - looks like the merge of #39 automatically did so because this targets a branch that no longer exists. @EtiennePerot, could you re-open targeted to main?

@EtiennePerot
Contributor Author

> @EtiennePerot, could you re-open targeted to main?

Done in #46.
