Try launching unit tests on TPUs from CI #596

Merged
31 commits merged into main from tpu_ci on May 22, 2024

Conversation

@dlwh (Member) commented May 20, 2024

No description provided.

dlwh merged commit 57bbadf into main on May 22, 2024 (5 checks passed)
dlwh deleted the tpu_ci branch on May 22, 2024 at 06:35
@dlwh (Member, Author) commented May 22, 2024

cc @rjpower since I fixed up a few tests on TPU (mostly just scaled down some initializers?)

@rjpower (Collaborator) commented May 22, 2024

Curious... it's not surprising that the numerics would be slightly different for splash attention, but it's interesting that changing the initialization helps that much. We do expect a lot of noise through the matmul unit, so this is probably what we're seeing.

I wonder if the scaling is implicitly adjusting the tolerance here, if that makes sense? That is, in the original, our output might be on the scale of 1000.00 ± 0.5, and now it's 1.00000 ± 0.5. We're accurate to the same number of digits in both cases; it's just that the digits we're testing against are different. As a dumb example, if one attention went through bf16 and the other didn't, we'd see something like this through the matmuls:

import numpy as np
from paddle_bfloat import bfloat16

# 1000 samples (mean -1, std 1), plus a copy scaled down by 0.02
x = np.random.normal(-1, 1, 1000)
xs = x * 0.02

# Round both through bfloat16
b = x.astype(bfloat16)
bs = xs.astype(bfloat16)

# Compare dot products of the rounded arrays against the full-precision ones
print(x @ x - b @ b)
print(xs @ xs - bs @ bs)

# Output:
# -0.1878888127475875
# 8.110935221994353e-05
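
A quick way to check the "same number of digits" framing is to look at relative rather than absolute error. Below is a small standalone sketch of that idea; it uses numpy's float16 as the low-precision stand-in (so it runs without paddle_bfloat), and the seed and sizes are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
xs = x * 0.02  # scaled-down copy, like the smaller initializer

# Round-trip through float16 (standing in for bf16), then accumulate in float64
b = x.astype(np.float16).astype(np.float64)
bs = xs.astype(np.float16).astype(np.float64)

abs_err = abs(x @ x - b @ b)
abs_err_s = abs(xs @ xs - bs @ bs)

print("absolute:", abs_err, abs_err_s)  # the scaled case looks ~0.02**2 smaller
print("relative:", abs_err / abs(x @ x), abs_err_s / abs(xs @ xs))  # roughly comparable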

@dlwh (Member, Author) commented May 22, 2024

One of the fixed tests is actually my "pure JAX" flash attention vs. vanilla dot-product attention, so it's the same algorithm on both CPU and TPU, but the matmuls are different enough on TPU that it matters, even with precision set to HIGHEST.
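
For reference on the precision knob: JAX matmuls accept a precision argument, and on TPU the default setting routes the multiply through bf16 passes on the matrix unit. The sketch below (shapes and seeds are made up, purely illustrative) shows how default and HIGHEST results can be compared on whatever backend is active:

import jax
import jax.numpy as jnp

# A single attention-style score matmul at two precision settings
q = jax.random.normal(jax.random.PRNGKey(0), (128, 64), dtype=jnp.float32)
k = jax.random.normal(jax.random.PRNGKey(1), (128, 64), dtype=jnp.float32)

scores_default = jnp.matmul(q, k.T)
scores_highest = jnp.matmul(q, k.T, precision=jax.lax.Precision.HIGHEST)

# On CPU these typically agree; on TPU the default path uses bf16 matmul
# units, so the gap is visible.
print(jnp.max(jnp.abs(scores_default - scores_highest)))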

I mostly agree with your hypothesis, though I framed it in terms of "floating point numbers are more precise near 0".
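
In absolute terms, the spacing between adjacent representable floats grows with the magnitude of the value, so numbers near 0 carry more absolute precision for the same relative error. A tiny float32 illustration of that (the sample values are arbitrary):

import numpy as np

# Distance from each value to the next representable float32
for v in [1000.0, 1.0, 0.02]:
    print(v, np.spacing(np.float32(v)))
# roughly 6.1e-05 at 1000, 1.2e-07 at 1, 1.9e-09 at 0.02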

Floating point is so annoying
