Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86 LVI hardening #58

Merged
merged 8 commits into from
May 26, 2020
Merged

Conversation

jethrogb
Copy link

These patches are needed to mitigate the Load Value Injection (LVI) vulnerability in certain Intel processors.

The patches add several off-by-default passes to the x86 backend. The passes may be enabled as follows:
x86 Target features: lvi-cfi, lvi-load-hardening
LLVM argument: -x86-experimental-lvi-inline-asm-hardening

This is cherry-picking commits 080dd10 71e8021 b1d5810 5b519cf f95a67d e97a3e5 8ce078c 08b8b72 from upstream.

cc @raoulstrackx

@jethrogb
Copy link
Author

This is fixing a known, public, security vulnerability so I'd appreciate a quick review.

@jethrogb
Copy link
Author

cc @nikomatsakis

@nikic
Copy link

nikic commented May 14, 2020

@cuviper Does RedHat backport this?

I think this needs to bake upstream some more, this has landed upstream only a couple of days ago (after sitting in review for a few months). I'm also still waiting for some compile-time regression mitigations for this change (llvm@de92dc2 is a partial mitigation).

@jethrogb
Copy link
Author

jethrogb commented May 14, 2020

bake upstream some more

Can you elaborate on the timeline you have in mind?

The changes just landed upstream but the review process was very thorough. The patches were posted publicly in March but they have been developed and tested privately months prior. We've been using some version of these patches on a fork of Rust for a couple of months without issues.

Since these passes are opt-in, most users will not be affected by landing this. On the other hand, users who want to enable these passes need these patches yesterday to patch known, public, security vulnerabilities.

The patch you linked affects a file that doesn't exist in LLVM 9.

@cuviper
Copy link
Member

cuviper commented May 14, 2020

@cuviper Does RedHat backport this?

No. Our official statement on CVE-2020-0551 is that "no additional changes are deemed necessary or practical to address this flaw at the software layer."

But that's specific to our products, and I do see how a project like Fortanix would need extra care.

@cuviper
Copy link
Member

cuviper commented May 25, 2020

Please note that we've moved to LLVM 10 since rust-lang/rust#67759, so we'll need this PR rebased onto rustc/10.0-2020-05-05 to reach rust master.

@nikic
Copy link

nikic commented May 25, 2020

Nightly has recently updated to LLVM 10, so this would need to go into the https://github.com/rust-lang/llvm-project/tree/rustc/10.0-2020-05-05 branch now.

@cuviper Does RedHat backport this?

No. Our official statement on CVE-2020-0551 is that "no additional changes are deemed necessary or practical to address this flaw at the software layer."

But that's specific to our products, and I do see how a project like Fortanix would need extra care.

That's about what I expected. Can you check with Tom Stellard whether there's any plans to backport this upstream for the LLVM 10.0.1 release? (Independently of RHs own policy on this issue.)

Apart from that, I guess it would be fine to merge this. This has been in LLVM master for a while now without any issues I'm aware of, and most people will never run this code. The only impact for most people would be a slight compile-time regression for debug builds.

scottconstable and others added 8 commits May 25, 2020 16:10
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.

Based on a previous patch by Krzysztof Parzyszek

Differential Revision: https://reviews.llvm.org/D75932
…de to "Indirect Thunks"

There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g.,

https://software.intel.com/security-software-guidance/software-guidance/load-value-injection

Therefore it makes sense to refactor X86RetpolineThunks as a more general capability.

Differential Revision: https://reviews.llvm.org/D76810
… than Retpoline

Introduce a ThunkInserter CRTP base class from which new thunk types can inherit, e.g., thunks to mitigate https://software.intel.com/security-software-guidance/software-guidance/load-value-injection.

Differential Revision: https://reviews.llvm.org/D76811
…ion (LVI)

This pass replaces each indirect call/jump with a direct call to a thunk that looks like:

lfence
jmpq *%r11

This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity)
The feature can be added via clang CLI using -mlvi-cfi.

This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code.

Differential Revision: https://reviews.llvm.org/D76812
Adding a pass that replaces every ret instruction with the sequence:

pop <scratch-reg>
lfence
jmp *<scratch-reg>

where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.

Differential Revision: https://reviews.llvm.org/D75935
… (LVI) Gadgets

Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.

More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).

Also adds a new target feature to X86: +lvi-load-hardening

The feature can be added via the clang CLI using -mlvi-hardening.

Differential Revision: https://reviews.llvm.org/D75936
… (LVI)

After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
  -SOURCE instructions (also includes function arguments)
  -SINK instructions
  -Basic block entry points
  -Basic block terminators
  -LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Differential Revision: https://reviews.llvm.org/D75937
…jection (LVI)

Added code to X86AsmParser::emitInstruction() to add an LFENCE after each instruction that may load, and emit a warning if it encounters an instruction that may be vulnerable, but cannot be automatically mitigated.

Differential Revision: https://reviews.llvm.org/D76158
@jethrogb jethrogb changed the base branch from rustc/9.0-2019-12-19 to rustc/10.0-2020-05-05 May 25, 2020 17:20
@jethrogb
Copy link
Author

Updated. Nice bonus is that LLVM 10 is on C++14 so the backport was easier.

@@ -138,9 +138,11 @@
; CHECK-NEXT: Machine Loop Invariant Code Motion
; CHECK-NEXT: Bundle Machine CFG Edges
; CHECK-NEXT: X86 FP Stackifier
; CHECK-NEXT: MachineDominator Tree Construction
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from a couple of lines down, so I don't think something like de92dc2 is needed.

@jethrogb
Copy link
Author

So can I get a review then?

@tstellar
Copy link

@nikic If you file a bug requesting this backport at bugs.llvm.org and mark it as a blocker of release-10.0.1, I will add it to my review queue.

@tstellar
Copy link

@nikic I should add this doesn't guarantee I'll merge it. The patch needs to be reviewed to make sure it meets the release criteria and get the necessary approvals. But step 1 for this process is filing the bug.

@nikic nikic merged commit e6e8011 into rust-lang:rustc/10.0-2020-05-05 May 26, 2020
nikic pushed a commit to nikic/llvm-project that referenced this pull request May 14, 2023
…callback

The `TypeSystemMap::m_mutex` guards against concurrent modifications
of members of `TypeSystemMap`. In particular, `m_map`.

`TypeSystemMap::ForEach` iterates through the entire `m_map` calling
a user-specified callback for each entry. This is all done while
`m_mutex` is locked. However, there's nothing that guarantees that
the callback itself won't call back into `TypeSystemMap` APIs on the
same thread. This lead to double-locking `m_mutex`, which is undefined
behaviour. We've seen this cause a deadlock in the swift plugin with
following backtrace:

```

int main() {
    std::unique_ptr<int> up = std::make_unique<int>(5);

    volatile int val = *up;
    return val;
}

clang++ -std=c++2a -g -O1 main.cpp

./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b
```

```
frame rust-lang#4: std::lock_guard<std::mutex>::lock_guard
frame rust-lang#5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock rust-lang#2
frame rust-lang#6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage
frame rust-lang#7: lldb_private::Target::GetScratchTypeSystemForLanguage
...
frame rust-lang#26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths
frame rust-lang#27: lldb_private::SwiftASTContext::LoadModule
frame rust-lang#30: swift::ModuleDecl::collectLinkLibraries
frame rust-lang#31: lldb_private::SwiftASTContext::LoadModule
frame rust-lang#34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl
frame rust-lang#35: lldb_private::SwiftASTContext::PerformCompileUnitImports
frame rust-lang#36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext
frame rust-lang#37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState
frame rust-lang#38: lldb_private::Target::GetPersistentSymbol
frame rust-lang#41: lldb_private::TypeSystemMap::ForEach                 <<<< Lock #1
frame rust-lang#42: lldb_private::Target::GetPersistentSymbol
frame rust-lang#43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols
frame rust-lang#44: lldb_private::IRExecutionUnit::FindSymbol
frame rust-lang#45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence
frame rust-lang#46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame rust-lang#47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame rust-lang#48: llvm::LinkingSymbolResolver::findSymbol
frame rust-lang#49: llvm::LegacyJITSymbolResolver::lookup
frame rust-lang#50: llvm::RuntimeDyldImpl::resolveExternalSymbols
frame rust-lang#51: llvm::RuntimeDyldImpl::resolveRelocations
frame rust-lang#52: llvm::MCJIT::finalizeLoadedModules
frame rust-lang#53: llvm::MCJIT::finalizeObject
frame rust-lang#54: lldb_private::IRExecutionUnit::ReportAllocations
frame rust-lang#55: lldb_private::IRExecutionUnit::GetRunnableInfo
frame rust-lang#56: lldb_private::ClangExpressionParser::PrepareForExecution
frame rust-lang#57: lldb_private::ClangUserExpression::TryParse
frame rust-lang#58: lldb_private::ClangUserExpression::Parse
```

Our solution is to simply iterate over a local copy of `m_map`.

**Testing**

* Confirmed on manual reproducer (would reproduce 100% of the time
  before the patch)

Differential Revision: https://reviews.llvm.org/D149949
nikic pushed a commit to nikic/llvm-project that referenced this pull request Aug 26, 2024
)

Currently, process of replacing bitwise operations consisting of
`LSR`/`LSL` with `And` is performed by `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider following case:
```
        lsr x8, x8, rust-lang#56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the target of `LDR`
to `[X2, X8, LSL rust-lang#2]` and right-shifting amount change to 56 to 58.

after changed:
```
        lsr x8, x8, rust-lang#58
        ldr w0, [x2, x8, lsl rust-lang#2]
        ret
```

This patch checks to see if the `SHIFTING` + `AND` operation on load
target can be optimized and optimizes it if it can.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants