Investigating serialisation refhooks #5

Open
shikokuchuo opened this issue Jul 13, 2024 · 2 comments
@shikokuchuo

I have an instrumented version of nanonext available at shikokuchuo/nanonext@instrumented which just prints to the console each time it executes a hook.

This should hopefully make things easier to reason about.

@sebffischer

@shikokuchuo

From quick initial tests with torch tensors, a list of identical tensors runs the inhooks only once (as designed), so it is efficient in terms of serialisation. However, the outhooks run every time, which I think is why identical() fails after unserialization.

I've not yet tested with Arrow or Polars objects as I'm at the airport about to board a flight. Those will be better examples to use because of the special nature of torch serialisation.
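For anyone who wants to poke at this without torch, here is a minimal sketch using base R's `refhook` arguments to `serialize()` and `unserialize()` as a stand-in. This is an assumption for illustration only: it is not nanonext's hook mechanism, an environment stands in for a tensor, and the names `inhook`, `outhook`, `n_in` and `n_out` are made up here. The counts are printed rather than asserted, so the behaviour can simply be observed.

```r
# Sketch with base R refhooks (NOT nanonext's hooks); names are hypothetical.
n_in <- 0L
n_out <- 0L

# serialize() calls this for reference objects (e.g. environments).
# Returning a character vector stores that string in place of the object.
inhook <- function(obj) {
  if (!is.environment(obj)) return(NULL)
  n_in <<- n_in + 1L
  cat("Inhook", n_in, "\n")
  "placeholder"
}

# unserialize() calls this with the stored string and uses the return value.
outhook <- function(chr) {
  n_out <<- n_out + 1L
  cat("Outhook", n_out, "\n")
  new.env()
}

x <- new.env()  # stand-in for a reference object such as a tensor
bytes <- serialize(list(a = x, b = x), NULL, refhook = inhook)
res <- unserialize(bytes, refhook = outhook)

# Each outhook call builds a fresh object, so sharing between res$a and
# res$b depends on how many times the outhook ran.
print(identical(res$a, res$b))
```

Because the outhook constructs a new object on every call, any element it rebuilds separately cannot be identical to another, which is the effect described above.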

@shikokuchuo

Can confirm the same for Arrow and Polars objects, using minimally modified examples from the mirai vignette: https://shikokuchuo.net/mirai/articles/mirai.html#serialization-arrow-polars-and-beyond

Output should be like the below:

> m <- mirai(list(a = x, b = x), x = x)
Inhook 1
> m[]
Outhook 1
Outhook 2
# < ...object output omitted... >

compared to where y is a distinct object:

> m <- mirai(list(a = x, b = y), x = x, y = y)
Inhook 1
Inhook 2
> m[]
Outhook 1
Outhook 2
# < ...object output omitted... >

Was chatting to @lionel- earlier about this. Apparently copy-on-write semantics do not survive a round-trip through serialization / unserialization in any case. The fact that only one copy of the object is serialized is already an improvement over non-reference objects:

> x <- 1
> .Internal(inspect(x))
@57e435a8b400 14 REALSXP g1c1 [MARK,REF(12)] (len=1, tl=0) 1
> .Internal(inspect(list(x)))
@57e43809ee78 19 VECSXP g0c1 [] (len=1, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(13)] (len=1, tl=0) 1
> .Internal(inspect(list(x, x)))
@57e437d55858 19 VECSXP g0c2 [] (len=2, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(15)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(15)] (len=1, tl=0) 1
> .Internal(inspect(list(x, x, x)))
@57e438975f98 19 VECSXP g0c3 [] (len=3, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1

> serialize(list(x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 01 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00
> serialize(list(x, x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 02 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00
> serialize(list(x, x, x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 03 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00
[79] 00
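The same comparison can be made by measuring payload sizes instead of reading the hex dumps. This is a rough sketch in base R only (no hooks involved); the variable names are illustrative, and an empty environment is used as an assumed example of a reference object.

```r
x <- 1

# Ordinary vector: every occurrence is written out in full, so each extra
# copy adds the same fixed number of bytes (as in the dumps above).
vec_sizes <- vapply(
  1:3,
  function(n) length(serialize(rep(list(x), n), NULL)),
  integer(1)
)
print(diff(vec_sizes))

# Reference object (an environment): written once, with later occurrences
# stored as short back-references.
e <- new.env(parent = emptyenv())
env_sizes <- vapply(
  1:3,
  function(n) length(serialize(rep(list(e), n), NULL)),
  integer(1)
)
print(diff(env_sizes))

# Within one round trip the sharing is restored, but the result is a new
# object rather than the original.
res <- unserialize(serialize(list(e, e), NULL))
print(identical(res[[1]], res[[2]]))  # TRUE
print(identical(res[[1]], e))         # FALSE
```

The per-element growth for the plain vector should match the 16 bytes visible between the dumps above, while the environment case grows by much less per repeated element.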
