This post describes and analyses some SGX attacks and discusses the consequences of TEE attacks for shielded state sync in general. I’m planning to look into attacks on other TEEs as well.
SGX attack basics
Most of the SGX attacks are based on two facts:
- memory and compute are scarce and shared between protected and unprotected execution contexts
- even though the shared data structures are often cleared when switching contexts, there are ways to extract the protected data, either through side channels or by exploiting vulnerabilities
Often, SGX attacks involve the following techniques and data structures:
- cache
- hyper-threading
- page faults and page tables
- speculative (transient) execution
- timing
- etc
Cache
Cache is a smaller, faster memory located close to a CPU that is used to reduce data access time by storing the frequently accessed data. Cache is usually organised hierarchically and split into levels (L1, L2, L3, etc).
Cache levels are enumerated from faster to slower (i.e., L1 is faster than L2). The faster the cache is, the smaller it is. It is common to talk about three cache levels:
- First-level cache (L1), split into a data cache and an instruction cache. Usually not shared between cores
- Mid-level cache (MLC)
- Last-level cache (LLC). Most often shared between cores
A cache miss is a failed attempt to read data from the cache, which results in fetching the requested data from main memory.
When the cache is full and new data needs to be added, some old entry has to be evicted. Which entry to evict is decided by the replacement policy.
Note: Enclave data that is residing in cache is not encrypted - it is only encrypted in the main memory.
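The reason cache behavior leaks information is the measurable latency gap between a hit and a miss. The following toy simulation sketches a Flush+Reload-style probe; the addresses, latencies, and class names are all made up for illustration (real attacks flush with `clflush` and time accesses with `rdtsc`):

```python
# Toy model: a shared cache as a set of line addresses, with a large
# latency gap between a hit and a miss.
HIT_LATENCY, MISS_LATENCY = 1, 100

class Cache:
    def __init__(self):
        self.lines = set()
    def flush(self, addr):            # models clflush: evict a line
        self.lines.discard(addr)
    def access(self, addr):           # returns the observed latency
        latency = HIT_LATENCY if addr in self.lines else MISS_LATENCY
        self.lines.add(addr)
        return latency

def victim(cache, secret_bit):
    # The victim touches address 0xA only when its secret bit is 1.
    if secret_bit:
        cache.access(0xA)

def attacker_probe(cache):
    # Flush, let the victim run, then reload and time the access.
    return cache.access(0xA) == HIT_LATENCY

cache = Cache()
recovered = []
for bit in [1, 0, 1, 1, 0]:
    cache.flush(0xA)
    victim(cache, bit)
    recovered.append(int(attacker_probe(cache)))

print(recovered)  # [1, 0, 1, 1, 0]: the secret bits, recovered via timing
```

This is also why unencrypted enclave data in the cache matters: the attacker never reads the data directly, only the access pattern, and that is already enough.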
Hyper-threading
Hyper-threading is a technology that allows a single physical core to act as two logical cores. This implies that the logical cores share physical resources like the L1 and L2 caches, floating-point units, etc, so having control over one of the threads allows the attacker to infer information about the enclave thread. Hyper-threading is enabled by default.
Note: multithreading splits the work between multiple physical cores; hyper-threading splits one physical core into multiple logical cores.
Disabling hyper-threading could be an option, but there is no way for the enclave to verify that it is disabled.
Page faults
Talking about main memory, we differentiate between physical and virtual memory. Virtual memory is an abstraction over physical memory and is divided into pages. A page is the smallest unit of data that can be managed in virtual memory. Pages are described in page tables.
Asynchronous Enclave Exit (AEX) is an exit procedure executed in case of a fault or interrupt. When AEX happens, the execution context is saved in the State Save Area (SSA) and control is handed over to the operating system. The base address of the faulting page is communicated as well.
Manipulating page-table properties, causing page faults that trigger AEX, and monitoring page access patterns may allow an attacker to learn enclave-protected information.
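A toy sketch of this controlled-channel idea: the OS can observe the page-aligned address of every fault, so secret-dependent control flow that touches different pages leaks the secret even though the page contents stay opaque. Everything here (addresses, function names) is invented for illustration:

```python
# Toy controlled-channel model: the OS unmaps enclave pages, so every access
# faults, and the fault reveals the page base address (not the data).
PAGE = 4096

class PageTable:
    def __init__(self):
        self.fault_log = []
    def access(self, addr):
        # AEX hands control to the OS, which sees the faulting page base.
        self.fault_log.append(addr & ~(PAGE - 1))

def enclave_lookup(pt, secret_bit):
    # Secret-dependent control flow: the branches touch different pages.
    if secret_bit:
        pt.access(0x1000)   # page A
    else:
        pt.access(0x2000)   # page B

pt = PageTable()
for bit in [0, 1, 1, 0]:
    enclave_lookup(pt, bit)

# The OS recovers the secret purely from the page-fault trace.
recovered = [1 if base == 0x1000 else 0 for base in pt.fault_log]
print(recovered)  # [0, 1, 1, 0]
```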
Speculative execution / transient execution
Processors execute instructions out of order to improve efficiency. Because of that, some values are not known in advance, and the processor can try to predict (speculate) what a value will be and proceed with this speculative value. This is called speculative execution. If the actual value turns out to be the same as the speculative value, the speculatively executed instructions “officially” become part of the program state. If the actual and speculative values differ, the speculative (also called transient) part gets squashed.
Even though these transient instructions do not become a part of the program state, they do have side-effects, e.g., if they loaded something to cache, the value will stay in cache even if the transient part is squashed.
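A toy model of that side effect, in the spirit of Spectre/Meltdown-style leaks: the architectural state is rolled back after the squash, but the cache footprint is not, so transient code can encode a secret as a cache-line address. All names and addresses below are made up:

```python
# Toy model of a transient load: the architectural result is squashed,
# but the cache line it touched stays resident and can be probed later.
class CPU:
    def __init__(self):
        self.cache = set()
        self.registers = {}

    def speculative_load(self, addr):
        # Executed under a mispredicted branch: brings the line into cache.
        self.cache.add(addr)
        return addr * 2  # transient value, never committed

    def run_mispredicted(self, secret):
        # Transient code leaks `secret` by encoding it as a line address.
        leak_addr = 0x1000 + secret
        self.speculative_load(leak_addr)
        # Squash: architectural state (registers) is rolled back...
        self.registers.clear()
        # ...but self.cache is NOT rolled back.

cpu = CPU()
cpu.run_mispredicted(secret=7)
# The attacker probes which line is hot to recover the secret.
recovered = next(a - 0x1000 for a in cpu.cache if 0x1000 <= a < 0x1100)
print(recovered)  # 7
```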
SGX memory
SGX is assigned a portion of physical memory called Processor Reserved Memory (PRM), which is divided into the Enclave Page Cache (EPC) and the Enclave Page Cache Map (EPCM). The EPC contains the data, the EPCM contains the metadata. When there is not enough space in the EPC, the data is encrypted and moved to main memory. When a page is moved back from main memory to the EPC, it is decrypted, authenticated, and put in the L1 cache.
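The paging flow above can be sketched as follows. This is only a toy mirror of the encrypt-on-eviction / verify-and-decrypt-on-page-in logic; real SGX does this in dedicated hardware with its own ciphers, and the hash-based stream cipher here is purely illustrative:

```python
import hashlib, hmac, os

# Toy sketch of EPC paging: on eviction the page is encrypted and
# authenticated before it leaves PRM; on page-in it is verified and
# decrypted. (Not SGX's actual cryptography.)
KEY = os.urandom(32)

def keystream(nonce, length):
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def page_out(page: bytes):
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(page, keystream(nonce, len(page))))
    tag = hmac.new(KEY, nonce + ct, hashlib.sha256).digest()
    return nonce, ct, tag          # stored in untrusted main memory

def page_in(nonce, ct, tag):
    expected = hmac.new(KEY, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("page was tampered with")  # authentication failure
    return bytes(a ^ b for a, b in zip(ct, keystream(nonce, len(ct))))

page = b"enclave secret page".ljust(4096, b"\0")
evicted = page_out(page)
assert page_in(*evicted) == page   # round-trips; tampering raises
```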
Description of some existing attacks
SGX.fail provides analysis and a couple of very helpful tables describing the existing SGX attacks. About 2/3 of the known SGX attacks can be mitigated by the developer, and more than half of those mitigations amount to writing constant-time code. More annoying are the attacks that require Intel updates, because everyone involved in that process is slow. Below I describe some of the attacks that require Intel updates (the attacks with developer-side mitigations we can fix by following the guidelines).
xAPIC
APIC is a CPU component that can operate in legacy xAPIC mode, in which the APIC configuration registers are exposed through an MMIO page. The APIC registers are not properly initialised: bytes 4-15 may contain stale cached data previously processed by the CPU, and that data may come from SGX.
So it goes roughly like this: SGX loads data from main memory and puts it in the cache. The attacker reads the APIC registers and sees the SGX cached data.
This attack is only possible if the attacker has access to the registers, i.e., ring 0 privilege level.
Expose: enclave memory and registers
MMIO
A class of vulnerabilities that exposes stale [enclave] data. An application that uses SGX enclaves has access to a memory region used to communicate with the enclave. If an attacker maps MMIO to that region, the enclave’s stale data might leak into the uncore [1] when the enclave writes to or reads from that memory. This attack requires the adversary to have access to MMIO.
Expose: Enclave data, randomness, attestation key
CacheOut + SGAxe
CacheOut is a speculative execution attack capable of leaking data from Intel CPUs (the L1-D cache) via cache eviction. SGAxe uses CacheOut to specifically break SGX and leak the attestation keys. CacheOut for writes can run without hyper-threading (but, it seems, not against SGX); SGAxe exploits hyper-threading.
Note: it turns out that the Line Fill Buffer, which is used to fetch data when a cache miss occurs, sometimes actually ends up containing evicted cache lines (so, a backward data flow).
Expose: Enclave memory, CPU registers, attestation keys
Foreshadow
Foreshadow is a transient execution attack similar to Meltdown. The idea is to load data into the L1-D cache, evict it, and use timing measurements to determine what the data is.
Loading the enclave data to L1-D requires root privileges, but leaking the existing L1-D content can be performed in user space.
Expose: CPU registers, enclave memory, attestation keys
Analysis of the attacks
All of the existing attacks imply root privileges for consistent results, i.e., the attacker is the untrusted host OS. As a result of these attacks, the host OS might break SGX’s confidentiality and sometimes integrity, which essentially brings us back to the TEE-free situation where the host is trusted with the detection keys. The difference is that now the host has to put in the effort of mounting an attack.
If confidentiality is broken, the host gets access to the detection keys and the detection output. This still doesn’t leak the content of the transactions and preserves some false-positive rate, but it allows the attacker to misuse the detection keys and the detection results, including sharing them, which temporarily reduces the anonymity-set size.
If SGX’s integrity is broken, the OS can produce fake remote attestations. This can allow the attacker to execute different code in the enclave, which might lead to:
- mixing in negatives. This can be easily detected by the receiver, since it affects the false-positive rate
- discarding positives. The attacker must keep at least total_messages / 2^fp_rate entries, which corresponds to the result when there are no messages for the receiver. If the receiver knows there is a message for them that wasn’t delivered (e.g., push-based SSS), this kind of attack is easy to detect.
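To make that bound concrete, here is a small sketch, assuming fp_rate denotes the number of bits of the FMD false-positive parameter p = 2^-fp_rate (my reading of the formula above): with no real messages, an honest detector still flags about total_messages / 2^fp_rate entries, so a much smaller output betrays discarded positives.

```python
import random

# Assumption: fp_rate is the bit-length n of the FMD false-positive
# probability p = 2^-n. With no real messages for the receiver, honest
# detection still returns ~ total_messages / 2^n entries.
def expected_entries(total_messages, fp_bits):
    return total_messages / 2 ** fp_bits

random.seed(1)
total, fp_bits = 100_000, 5          # p = 1/32
# Honest enclave: each message matches with probability 2^-n.
observed = sum(random.random() < 2 ** -fp_bits for _ in range(total))
expected = expected_entries(total, fp_bits)
print(expected, observed)            # observed should be close to 3125
assert abs(observed - expected) / expected < 0.1
```

An enclave that discards positives would have to pad its output back up to roughly this count to avoid detection, which is exactly the “verify the false-positive rate” check suggested below.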
Mitigations
Storing as little as possible
Most of the attacks leak data stored in the cache; however, the attacks also provide techniques for loading data into the cache. If the data is not stored long-term in enclave memory, there is nothing to load, and the attacker would have to execute the attack precisely when the enclave is performing detection for the user.
Integrity breaks recovery
There is nothing we as developers can do to protect the sealing and attestation keys against the attacks that require a microcode update (MCU) [2]. What we can do is:
- use host platforms that have the freshest MCU
- watch out for the integrity violation signs:
- Intel detects malicious behaviour of the attesting enclaves and marks them as not trusted
- verify the false-positive rate
- verify the detection of the known messages
- if the integrity break happens, change the FMD keys, stop using the compromised SGX, and publicly blame the host.
- periodically updating the FMD keys would be a good measure to ensure the affected set is as small as possible, but the trade-off between the update complexity and security benefits is up to the user
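The “verify the detection of the known messages” sign from the list above can be sketched as a canary check: the receiver plants a self-addressed message that must match their detection key and verifies it appears in the enclave’s output. This is a hypothetical sketch; the detector stand-in and all names are invented, not an actual FMD API:

```python
# Hypothetical canary check: a self-addressed message must show up in the
# detection output, or the enclave is dropping positives.
def fmd_detect(messages, detection_key, discard=False):
    # Stand-in for the enclave's FMD detection: flags tagged messages.
    flagged = [m for m in messages if m["tag"] == detection_key]
    return [] if discard else flagged   # `discard` models a broken enclave

def canary_check(detection_key, inbox, run_detection):
    canary = {"tag": detection_key, "body": "self-addressed probe"}
    inbox.append(canary)
    return canary in run_detection(inbox)

inbox = [{"tag": "other", "body": "noise"}]
key = "my-detection-key"
honest = canary_check(key, inbox, lambda msgs: fmd_detect(msgs, key))
broken = canary_check(key, inbox, lambda msgs: fmd_detect(msgs, key, discard=True))
print(honest, broken)  # True False
```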
Costs and practical implications
Attacks require resources, and if the required resources are greater than the benefits from the attack, nobody will perform it. Academic papers that break things in a vacuum do not always translate into real-world attacks, because they assume a very constrained environment that is not achievable in real life. Just like in theoretical cryptography: reducing the security of a system from 128 bits (the desired level) to 127 bits (below the threshold) doesn’t imply that the system is compromised in practice.
What do we use TEEs for?
The main benefits of TEEs compared to theoretical solutions are that TEEs are cheaper (in space and time) and simpler. When our priority is efficiency rather than privacy (which I believe is the case for SSS), TEEs don’t compete with the theoretical solutions. We are not choosing between TEEs and better cryptography; it is either TEE or no TEE. So what we get is:
- extra protection against third-party adversaries that get control over the host OS
- verifiable data (including detection keys) usage and deletion
And in case TEE is compromised, we get FMD without TEE, which is a reasonable choice too.
Resources:
- SGX.fail
- x86 Instructions Description
- MMIO class of attacks description by Intel
- xAPIC attack description by Intel
- CacheOut attack whitepaper
- SGAxe attack whitepaper
- Foreshadow attack whitepaper
- ÆPIC Leak: Architecturally Leaking Uninitialized Data from the Microarchitecture
- Intel terminology for speculative execution
- Page definition on Wiki
- Page table-based attacks
- Intel Hyper-threading
- Hyper-threading side-channels mitigation
- Cache definition on Wiki