Skip to content

Commit

Permalink
[lldb][AArch64] Add SME's Array Storage (ZA) register
Browse files Browse the repository at this point in the history
Note: This requires later commits for ZA to function properly,
it is split for ease of review. Testing is also in a later patch.

The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However it is not made
of the existing scalable vector registers, it is its own register.
Meaning that the size of ZA is the vector length in bytes * the vector
length in bytes.

https://developer.arm.com/documentation/ddi0616/latest/

It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, it's register data header always
includes the streaming vector length.

Due to it's size I've changed kMaxRegisterByteSize to the maximum
possible ZA size and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non ZA registers will perform exactly as before.

ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace is different to SVE versus streaming SVE. Writing
NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).

https://kernel.org/doc/html/v6.2/arm64/sme.html

LLDB does not handle registers that can appear and dissappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.

The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth
pursuing.

Users should refer to the SVCR register (added in later patches)
for the final word on whether ZA is active or not.

Writing to "VG" during streaming mode will change the size of the
streaming sve registers and ZA. LLDB will not attempt to preserve
register values in this case, we'll just read back the undefined
content the kernel shows. This is in line with, as stated, the
kernel ABIs and the prospective software ABIs look like.

ZA is defined as a vector of size SVL*SVL, so the display in lldb
is very basic. A giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D159502
  • Loading branch information
DavidSpickett committed Sep 19, 2023
1 parent e0919b1 commit d99d9d8
Show file tree
Hide file tree
Showing 10 changed files with 336 additions and 8 deletions.
4 changes: 3 additions & 1 deletion lldb/include/lldb/Utility/RegisterValue.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,9 @@ class RegisterValue {
// byte AArch64 SVE.
kTypicalRegisterByteSize = 256u,
// Anything else we'll heap allocate storage for it.
kMaxRegisterByteSize = kTypicalRegisterByteSize,
// 256x256 to support 256 byte AArch64 SME's array storage (ZA) register.
// Which is a square of vector length x vector length.
kMaxRegisterByteSize = 256u * 256u,
};

typedef llvm::SmallVector<uint8_t, kTypicalRegisterByteSize> BytesContainer;
Expand Down
235 changes: 231 additions & 4 deletions lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@
0x40b /* ARM Scalable Matrix Extension, Streaming SVE mode */
#endif

#ifndef NT_ARM_ZA
#define NT_ARM_ZA 0x40c /* ARM Scalable Matrix Extension, Array Storage */
#endif

#ifndef NT_ARM_PAC_MASK
#define NT_ARM_PAC_MASK 0x406 /* Pointer authentication code masks */
#endif
Expand Down Expand Up @@ -90,6 +94,16 @@ NativeRegisterContextLinux::CreateHostNativeRegisterContextLinux(
opt_regsets.Set(RegisterInfoPOSIX_arm64::eRegsetMaskSSVE);
}

sve::user_za_header za_header;
ioVec.iov_base = &za_header;
ioVec.iov_len = sizeof(za_header);
regset = NT_ARM_ZA;
if (NativeProcessLinux::PtraceWrapper(PTRACE_GETREGSET,
native_thread.GetID(), &regset,
&ioVec, sizeof(za_header))
.Success())
opt_regsets.Set(RegisterInfoPOSIX_arm64::eRegsetMaskZA);

NativeProcessLinux &process = native_thread.GetProcess();

std::optional<uint64_t> auxv_at_hwcap =
Expand Down Expand Up @@ -314,6 +328,31 @@ NativeRegisterContextLinux_arm64::ReadRegister(const RegisterInfo *reg_info,
offset = reg_info->byte_offset - GetRegisterInfo().GetMTEOffset();
assert(offset < GetMTEControlSize());
src = (uint8_t *)GetMTEControl() + offset;
} else if (IsSME(reg)) {
error = ReadZAHeader();
if (error.Fail())
return error;

// If there is only a header and no registers, ZA is inactive. Read as 0
// in this case.
if (m_za_header.size == sizeof(m_za_header)) {
// This will get reconfigured/reset later, so we are safe to use it.
// ZA is a square of VL * VL and the ptrace buffer also includes the
// header itself.
m_za_ptrace_payload.resize(((m_za_header.vl) * (m_za_header.vl)) +
GetZAHeaderSize());
std::fill(m_za_ptrace_payload.begin(), m_za_ptrace_payload.end(), 0);
} else {
// ZA is active, read the real register.
error = ReadZA();
if (error.Fail())
return error;
}

// ZA is part of the SME set but uses a seperate member buffer for storage.
// Therefore its effective byte offset is always 0 even if it isn't 0 within
// the SME register set.
src = (uint8_t *)GetZABuffer() + GetZAHeaderSize();
} else
return Status("failed - register wasn't recognized to be a GPR or an FPR, "
"write strategy unknown");
Expand Down Expand Up @@ -420,8 +459,12 @@ Status NativeRegisterContextLinux_arm64::WriteRegister(
SetSVERegVG(vg_value);

error = WriteSVEHeader();
if (error.Success())
if (error.Success()) {
// Changing VG during streaming mode also changes the size of ZA.
if (m_sve_state == SVEState::Streaming)
m_za_header_is_valid = false;
ConfigureRegisterContext();
}

if (m_sve_header_is_valid && vg_value == GetSVERegVG())
return error;
Expand Down Expand Up @@ -494,6 +537,20 @@ Status NativeRegisterContextLinux_arm64::WriteRegister(
::memcpy(dst, reg_value.GetBytes(), reg_info->byte_size);

return WriteTLS();
} else if (IsSME(reg)) {
error = ReadZA();
if (error.Fail())
return error;

// ZA is part of the SME set but not stored with the other SME registers.
// So its byte offset is effectively always 0.
dst = (uint8_t *)GetZABuffer() + GetZAHeaderSize();
::memcpy(dst, reg_value.GetBytes(), reg_info->byte_size);

// While this is writing a header that contains a vector length, the only
// way to change that is via the vg register. So here we assume the length
// will always be the current length and no reconfigure is needed.
return WriteZA();
}

return Status("Failed to write register value");
Expand All @@ -503,8 +560,10 @@ enum RegisterSetType : uint32_t {
GPR,
SVE, // Used for SVE and SSVE.
FPR, // When there is no SVE, or SVE in FPSIMD mode.
// Pointer authentication registers are read only, so not included here.
MTE,
TLS,
SME, // ZA only, SVCR and SVG are pseudo registers.
};

static uint8_t *AddRegisterSetType(uint8_t *dst,
Expand Down Expand Up @@ -533,6 +592,24 @@ NativeRegisterContextLinux_arm64::CacheAllRegisters(uint32_t &cached_size) {
if (error.Fail())
return error;

// Here this means, does the system have ZA, not whether it is active.
if (GetRegisterInfo().IsZAEnabled()) {
error = ReadZAHeader();
if (error.Fail())
return error;
// Use header size here because the buffer may contain fake data when ZA is
// disabled. We do not want to write this fake data (all 0s) because this
// would tell the kernel that we want ZA to become active. Which is the
// opposite of what we want in the case where it is currently inactive.
cached_size += sizeof(RegisterSetType) + m_za_header.size;
// For the same reason, we need to force it to be re-read so that it will
// always contain the real header.
m_za_buffer_is_valid = false;
error = ReadZA();
if (error.Fail())
return error;
}

// If SVE is enabled we need not copy FPR separately.
if (GetRegisterInfo().IsSVEEnabled() || GetRegisterInfo().IsSSVEEnabled()) {
// Store mode and register data.
Expand Down Expand Up @@ -583,6 +660,45 @@ Status NativeRegisterContextLinux_arm64::ReadAllRegisterValues(
dst = AddSavedRegisters(dst, RegisterSetType::GPR, GetGPRBuffer(),
GetGPRBufferSize());

// Streaming SVE and the ZA register both use the streaming vector length.
// When you change this, the kernel will invalidate parts of the process
// state. Therefore we need a specific order of restoration for each mode, if
// we also have ZA to restore.
//
// Streaming mode enabled, ZA enabled:
// * Write streaming registers. This sets SVCR.SM and clears SVCR.ZA.
// * Write ZA, this set SVCR.ZA. The register data we provide is written to
// ZA.
// * Result is SVCR.SM and SVCR.ZA set, with the expected data in both
// register sets.
//
// Streaming mode disabled, ZA enabled:
// * Write ZA. This sets SVCR.ZA, and the ZA content. In the majority of cases
// the streaming vector length is changing, so the thread is converted into
// an FPSIMD thread if it is not already one. This also clears SVCR.SM.
// * Write SVE registers, which also clears SVCR.SM but most importantly, puts
// us into full SVE mode instead of FPSIMD mode (where the registers are
// actually the 128 bit Neon registers).
// * Result is we have SVCR.SM = 0, SVCR.ZA = 1 and the expected register
// state.
//
// Restoring in different orders leads to things like the SVE registers being
// truncated due to the FPSIMD mode and ZA being disabled or filled with 0s
// (disabled and 0s looks the same from inside lldb since we fake the value
// when it's disabled).
//
// For more information on this, look up the uses of the relevant NT_ARM_
// constants and the functions vec_set_vector_length, sve_set_common and
// za_set in the Linux Kernel.

if ((m_sve_state != SVEState::Streaming) && GetRegisterInfo().IsZAEnabled()) {
// Use the header size not the buffer size, as we may be using the buffer
// for fake data, which we do not want to write out.
assert(m_za_header.size <= GetZABufferSize());
dst = AddSavedRegisters(dst, RegisterSetType::SME, GetZABuffer(),
m_za_header.size);
}

if (GetRegisterInfo().IsSVEEnabled() || GetRegisterInfo().IsSSVEEnabled()) {
dst = AddRegisterSetType(dst, RegisterSetType::SVE);
*(reinterpret_cast<SVEState *>(dst)) = m_sve_state;
Expand All @@ -593,6 +709,12 @@ Status NativeRegisterContextLinux_arm64::ReadAllRegisterValues(
GetFPRSize());
}

if ((m_sve_state == SVEState::Streaming) && GetRegisterInfo().IsZAEnabled()) {
assert(m_za_header.size <= GetZABufferSize());
dst = AddSavedRegisters(dst, RegisterSetType::SME, GetZABuffer(),
m_za_header.size);
}

if (GetRegisterInfo().IsMTEEnabled()) {
dst = AddSavedRegisters(dst, RegisterSetType::MTE, GetMTEControl(),
GetMTEControlSize());
Expand Down Expand Up @@ -685,6 +807,8 @@ Status NativeRegisterContextLinux_arm64::WriteAllRegisterValues(
return error;

// SVE header has been written configure SVE vector length if needed.
// This could change ZA data too, but that will be restored again later
// anyway.
ConfigureRegisterContext();

// Write header and register data, incrementing src this time.
Expand All @@ -707,6 +831,33 @@ Status NativeRegisterContextLinux_arm64::WriteAllRegisterValues(
GetTLSBuffer(), &src, GetTLSBufferSize(), m_tls_is_valid,
std::bind(&NativeRegisterContextLinux_arm64::WriteTLS, this));
break;
case RegisterSetType::SME:
// To enable or disable ZA you write the regset with or without register
// data. The kernel detects this by looking at the ioVec's length, not the
// ZA header size you pass in. Therefore we must write header and register
// data (if present) in one go every time. Read the header only first just
// to get the size.
::memcpy(GetZAHeader(), src, GetZAHeaderSize());
// Read the header and register data. Can't use the buffer size here, it
// may be incorrect due to being filled with dummy data previously. Resize
// this so WriteZA uses the correct size.
m_za_ptrace_payload.resize(m_za_header.size);
::memcpy(GetZABuffer(), src, GetZABufferSize());
m_za_buffer_is_valid = true;

error = WriteZA();
if (error.Fail())
return error;

// Update size of ZA, which resizes the ptrace payload potentially
// trashing our copy of the data we just wrote.
ConfigureRegisterContext();

// ZA buffer now has proper size, read back the data we wrote above, from
// ptrace.
error = ReadZA();
src += GetZABufferSize();
break;
}

if (error.Fail())
Expand Down Expand Up @@ -734,6 +885,10 @@ bool NativeRegisterContextLinux_arm64::IsSVE(unsigned reg) const {
return GetRegisterInfo().IsSVEReg(reg);
}

bool NativeRegisterContextLinux_arm64::IsSME(unsigned reg) const {
return GetRegisterInfo().IsSMEReg(reg);
}

bool NativeRegisterContextLinux_arm64::IsPAuth(unsigned reg) const {
return GetRegisterInfo().IsPAuthReg(reg);
}
Expand Down Expand Up @@ -887,11 +1042,13 @@ void NativeRegisterContextLinux_arm64::InvalidateAllRegisters() {
m_fpu_is_valid = false;
m_sve_buffer_is_valid = false;
m_sve_header_is_valid = false;
m_za_buffer_is_valid = false;
m_za_header_is_valid = false;
m_pac_mask_is_valid = false;
m_mte_ctrl_is_valid = false;
m_tls_is_valid = false;

// Update SVE registers in case there is change in configuration.
// Update SVE and ZA registers in case there is change in configuration.
ConfigureRegisterContext();
}

Expand Down Expand Up @@ -1057,6 +1214,62 @@ Status NativeRegisterContextLinux_arm64::WriteTLS() {
return WriteRegisterSet(&ioVec, GetTLSBufferSize(), NT_ARM_TLS);
}

Status NativeRegisterContextLinux_arm64::ReadZAHeader() {
Status error;

if (m_za_header_is_valid)
return error;

struct iovec ioVec;
ioVec.iov_base = GetZAHeader();
ioVec.iov_len = GetZAHeaderSize();

error = ReadRegisterSet(&ioVec, GetZAHeaderSize(), NT_ARM_ZA);

if (error.Success())
m_za_header_is_valid = true;

return error;
}

Status NativeRegisterContextLinux_arm64::ReadZA() {
Status error;

if (m_za_buffer_is_valid)
return error;

struct iovec ioVec;
ioVec.iov_base = GetZABuffer();
ioVec.iov_len = GetZABufferSize();

error = ReadRegisterSet(&ioVec, GetZABufferSize(), NT_ARM_ZA);

if (error.Success())
m_za_buffer_is_valid = true;

return error;
}

Status NativeRegisterContextLinux_arm64::WriteZA() {
// Note that because the ZA ptrace payload contains the header also, this
// method will write both. This is done because writing only the header
// will disable ZA, even if .size in the header is correct for an enabled ZA.
Status error;

error = ReadZA();
if (error.Fail())
return error;

struct iovec ioVec;
ioVec.iov_base = GetZABuffer();
ioVec.iov_len = GetZABufferSize();

m_za_buffer_is_valid = false;
m_za_header_is_valid = false;

return WriteRegisterSet(&ioVec, GetZABufferSize(), NT_ARM_ZA);
}

void NativeRegisterContextLinux_arm64::ConfigureRegisterContext() {
// ConfigureRegisterContext gets called from InvalidateAllRegisters
// on every stop and configures SVE vector length and whether we are in
Expand Down Expand Up @@ -1097,15 +1310,28 @@ void NativeRegisterContextLinux_arm64::ConfigureRegisterContext() {
if (m_sve_state == SVEState::Full || m_sve_state == SVEState::FPSIMD ||
m_sve_state == SVEState::Streaming) {
// On every stop we configure SVE vector length by calling
// ConfigureVectorLength regardless of current SVEState of this thread.
// ConfigureVectorLengthSVE regardless of current SVEState of this thread.
uint32_t vq = RegisterInfoPOSIX_arm64::eVectorQuadwordAArch64SVE;
if (sve::vl_valid(m_sve_header.vl))
vq = sve::vq_from_vl(m_sve_header.vl);

GetRegisterInfo().ConfigureVectorLength(vq);
GetRegisterInfo().ConfigureVectorLengthSVE(vq);
m_sve_ptrace_payload.resize(sve::PTraceSize(vq, sve::ptrace_regs_sve));
}
}

if (!m_za_header_is_valid) {
Status error = ReadZAHeader();
if (error.Success()) {
uint32_t vq = RegisterInfoPOSIX_arm64::eVectorQuadwordAArch64SVE;
if (sve::vl_valid(m_za_header.vl))
vq = sve::vq_from_vl(m_za_header.vl);

GetRegisterInfo().ConfigureVectorLengthZA(vq);
m_za_ptrace_payload.resize(m_za_header.size);
m_za_buffer_is_valid = false;
}
}
}

uint32_t NativeRegisterContextLinux_arm64::CalculateFprOffset(
Expand Down Expand Up @@ -1135,6 +1361,7 @@ std::vector<uint32_t> NativeRegisterContextLinux_arm64::GetExpeditedRegisters(
ExpeditedRegs expType) const {
std::vector<uint32_t> expedited_reg_nums =
NativeRegisterContext::GetExpeditedRegisters(expType);
// SVE, non-streaming vector length.
if (m_sve_state == SVEState::FPSIMD || m_sve_state == SVEState::Full)
expedited_reg_nums.push_back(GetRegisterInfo().GetRegNumSVEVG());

Expand Down
Loading

0 comments on commit d99d9d8

Please sign in to comment.