`about.html`: Include the presence of gVisor as additional security layer #41
Thanks a lot for the contribution Etienne 🤩. I haven't managed to look into it just yet, because I want to tie some other loose ends before switching to it (and documenting gVisor's usage in Dangerzone in general). I'll comment on it as soon as possible though.
Minor heads up, I've converted the changes in this PR from HTML format to Markdown. I've done the same thing for the parent PR as well.
@@ -48,30 +48,30 @@ I got the idea for Dangerzone from Qubes, an operating system that runs everythi

Dangerzone was inspired by TrustedPDF but it works in non-Qubes operating systems, which is important, because most of the journalists I know use Macs and probably won’t be jumping to Qubes for some time.

- It uses Linux containers to sandbox dangerous documents instead of virtual machines. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
+ It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to sandbox dangerous documents. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
- It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to sandbox dangerous documents. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
+ It uses [gVisor](https://gvisor.dev/) sandboxes running in Linux containers to open dangerous documents, instead of virtual machines. And it also adds some features that TrustedPDF doesn’t have: it works with any office documents, not just PDFs; it uses optical character recognition (OCR) to make the safe PDF have a searchable text layer; and it compresses the final safe PDF.
I wanted to avoid using the word "sandbox" twice here. Also I brought back the comparison with virtual machines, since I think it makes sense in this paragraph, where we compare Dangerzone with TrustedPDF.

How does Dangerzone work?
-------------------------

- Dangerzone uses Linux containers (two of them), which are sort of like quick, lightweight virtual machines that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.
+ Dangerzone uses Linux containers (two of them), and runs a sandbox inside each. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.
Nit, the gVisor sandbox runs only on the first container. The second container (soon to be removed) does not start a gVisor sandbox, and just recreates the PDF locally.
So, I wouldn't introduce gVisor just yet. If you want to make the container description more accurate and not mention virtual machines, we could describe them as follows:
- Dangerzone uses Linux containers (two of them), and runs a sandbox inside each. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.
+ Dangerzone uses Linux containers (two of them), which are isolated application environments that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using [Docker Desktop](https://www.docker.com/products/docker-desktop). So when you first install Dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it.

- When Dangerzone starts the container that will sanitize the suspicious document, it _disables networking_ and does not mount anything. So if a malicious document hacks the container, it doesn’t have access to your data and it can’t use the internet, so there’s not much it could do.
+ When Dangerzone starts a container, it will first start a gVisor sandbox _inside_ that container, then runs the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.
I think that's the place where we can first mention the gVisor sandbox, given that we talk here about the first phase of the conversion:
- When Dangerzone starts a container, it will first start a gVisor sandbox _inside_ that container, then runs the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.
+ When Dangerzone starts the container that will sanitize the suspicious document, it will first start a gVisor sandbox _inside_ that container, then run the potentially-dangerous document processing workload inside the sandbox. This ensures that the process dealing with the document is isolated from the Linux kernel. The sandbox and its parent container are also both configured to _disable networking_ and to not mount anything from the host filesystem. So if a malicious document manages to execute arbitrary code, this code doesn’t have access to the host kernel, doesn't have access to your data, and can't use the internet, so there’s not much it could do.
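
As a rough illustration of the "disable networking, mount nothing" part of this paragraph, here is a minimal Python sketch that builds a container command line. The function name, image name, and inner command are hypothetical, and this is not Dangerzone's actual invocation. It only relies on standard `docker run` flags (`--network none`, `--cap-drop all`, `--security-opt no-new-privileges`) and does not show the gVisor sandbox being started inside the container.

```python
import shlex


def build_container_command(image, inner_command, runtime="docker"):
    """Return an argv list for a container with no network and no host mounts.

    All names here are illustrative assumptions, not Dangerzone's real code.
    """
    return [
        runtime, "run",
        "--rm",                                 # remove the container once it exits
        "-i",                                   # keep stdin open so a document can be streamed in
        "--network", "none",                    # no network access from inside the container
        "--cap-drop", "all",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid binaries
        # Deliberately no -v / --mount arguments: nothing from the host is mounted.
        image,
        *inner_command,
    ]


if __name__ == "__main__":
    cmd = build_container_command(
        "example.invalid/sanitizer:latest",  # hypothetical image name
        ["convert-to-pixels"],               # hypothetical entry point
    )
    print(shlex.join(cmd))
```

Building the argument list as data, rather than as one shell string, makes it straightforward to check that no mount or network option slipped in before the command is executed.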

* _Reads the original document from standard input_
* Uses _LibreOffice_ or _PyMuPDF_ to convert original document to a PDF
* Uses _PyMuPDF_ to split PDF into individual pages, and to convert those into RGB pixel data
* _Writes the number of pages and the RGB pixel data to its standard output_
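
The standard input/output hand-off in these steps can be sketched as a simple length-prefixed byte stream. The framing below (a big-endian 16-bit page count, then per-page width, height, and raw RGB bytes) is an assumption chosen for illustration, not necessarily the wire format Dangerzone uses, and `write_pages`, `read_pages`, and `read_exact` are hypothetical names.

```python
import io
import struct


def write_pages(stream, pages):
    """Sandbox side: write the page count, then each page's size and RGB bytes.

    `pages` is a list of (width, height, rgb_bytes) tuples.
    """
    stream.write(struct.pack(">H", len(pages)))
    for width, height, rgb in pages:
        if len(rgb) != width * height * 3:
            raise ValueError("pixel buffer does not match page dimensions")
        stream.write(struct.pack(">HH", width, height))
        stream.write(rgb)


def read_exact(stream, size):
    """Read exactly `size` bytes or fail; the host should not trust short reads."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    return data


def read_pages(stream):
    """Host side: parse the stream produced by write_pages."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    pages = []
    for _ in range(count):
        width, height = struct.unpack(">HH", read_exact(stream, 4))
        pages.append((width, height, read_exact(stream, width * height * 3)))
    return pages


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_pages(buffer, [(2, 1, bytes(6))])  # one 2x1 page of black pixels
    buffer.seek(0)
    print(read_pages(buffer))
```

The point of such a format is that the host only ever parses fixed-size integers and raw pixel bytes, never the original document, so the parsing surface exposed to untrusted data stays small.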

- Then that container quits. The host then writes the RGB pixel data to a volume. A second container starts and:
+ Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second sandbox starts and:
Keeping in line with the container/sandbox distinction, and because we don't use gVisor in the second conversion phase, I guess we have to refer to "a second container":
- Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second sandbox starts and:
+ Then that sandbox quits. The host then writes the RGB pixel data to a volume. A second container starts and:
We could also explain a bit here that we don't really need containers in the second phase for their security properties. We just want them for code portability. In TrustedPDF, the second phase happens in the host, for instance.

* _Mounts a volume with the RGB pixel data_
* If OCR is enabled, uses _PyMuPDF_ to convert RGB pixel data into a compressed, **searchable** PDF
* Otherwise uses _PyMuPDF_ to convert RGB pixel data into a compressed, **flat** PDF
* _Stores safe PDF in separate volume_

- Then that container quits, and the user can open the newly created safe PDF.
+ Then that sandbox quits, and the user can open the newly created safe PDF.
- Then that sandbox quits, and the user can open the newly created safe PDF.
+ Then that container quits, and the user can open the newly created safe PDF.
@@ -97,12 +97,13 @@ It’s still possible to get hacked with Dangerzone
Like all software, it’s possible that Dangerzone (and more importantly, the software that it relies on like LibreOffice and Docker) has security bugs. Malicious documents are designed to target a specific piece of software – for example, Adobe Reader on Mac. It’s possible that someone could craft a malicious document that specifically targets Dangerzone itself. An attacker would need to chain these exploits together to succeed at hacking Dangerzone:

* An exploit for either LibreOffice or PyMuPDF
- * A container escape exploit in the Linux kernel
+ * A sandbox escape exploit in the gVisor kernel
Personally, I would be fine with linking to gVisor's security model, since it's important reading material for those who want to understand the security guarantees of gVisor.
@@ -97,12 +97,13 @@ It’s still possible to get hacked with Dangerzone
Like all software, it’s possible that Dangerzone (and more importantly, the software that it relies on like LibreOffice and Docker) has security bugs. Malicious documents are designed to target a specific piece of software – for example, Adobe Reader on Mac. It’s possible that someone could craft a malicious document that specifically targets Dangerzone itself. An attacker would need to chain these exploits together to succeed at hacking Dangerzone:

* An exploit for either LibreOffice or PyMuPDF
- * A container escape exploit in the Linux kernel
+ * A sandbox escape exploit in the gVisor kernel
+ * A container escape exploit in the Linux kernel that isn't protected by gVisor's syscall filters
* In Mac and Windows, a VM escape exploit for Docker Desktop
I'm itching to remove this line. Once the attacker has access to the VM, they have access to the files of the host (sure, subject to some ACL rules depending on your Docker Desktop installation), and access to the internet. By our own standards, at this point the attacker is not "contained".
* In Mac and Windows, a VM escape exploit for Docker Desktop

- If you opened such a malicious document with Dangerzone, it would start the first container and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to hack the container. Then, it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
+ If you opened such a malicious document with Dangerzone, it would start the first sandbox and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to achieve code execution. Then, it would exploit a vulnerability in the gVisor kernel to escape the sandbox, then it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
- If you opened such a malicious document with Dangerzone, it would start the first sandbox and begin the conversion process. While it was converting the original document (say, a docx file) into a PDF using LibreOffice, it would exploit a vulnerability in LibreOffice to achieve code execution. Then, it would exploit a vulnerability in the gVisor kernel to escape the sandbox, then it would exploit a vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
+ For example, let's say that you open a malicious `.docx` file that specifically targets Dangerzone. What Dangerzone would do first is to start a Linux container, then start a gVisor sandbox within it, and finally begin the conversion process into a PDF using LibreOffice. If the malicious document wants to escape to the host, it first needs to exploit a vulnerability in LibreOffice to achieve code execution. Once it has control of LibreOffice, it needs to exploit a vulnerability in the gVisor kernel to escape the sandbox. Assuming it finds one, it then needs to find a different vulnerability in the Linux kernel to escape the container, and from there attempt to take over the computer.
The chain of "Then" made the text a bit difficult to read, so I propose to add some fluff here and there.
I did not mean to close this, sorry - looks like the merge of #39 automatically did so because this targets a branch that no longer exists. @EtiennePerot, could you re-open targeted to |
Done in #46.
This updates `about.html` to reflect the addition of gVisor as an extra layer of security in the Dangerzone document handling process.

The document sometimes conflated "container" and "sandbox", which is understandable because they were effectively the same thing before adding this extra sandboxing layer in the middle. Now it uses "container" only when talking about containers, otherwise it uses "sandbox". `index.html` was already using this language, so no update needed there.

Corp shill check: There is only one outgoing link to gvisor.dev which I can remove if you'd prefer. It is marked `target="_blank" rel="noopener noreferrer"` as are other external links on the page. This mentions the word "gVisor" fewer times than "Linux"; the word "gVisor" is used only when (a) first talking about the use of sandboxing in Dangerzone, and (b) when talking about specific gVisor components like the kernel/syscall filters. Other parts of the page use the unqualified word "sandbox" instead.

This PR is built on @apyrgio's `2024-05-see-also` branch on the assumption that it will be merged into `main`, and so that the diff shown on GitHub only shows the difference against that branch. My intention is that once #39 is merged, I can rebase and this PR should be edited to have its target branch set to `main`.