TEP: TVM update #88
## TVM update
### New stack type
This proposal suggests introducing a new TVM stack type, `Hasher`: an opaque type used to hash data chunks sequentially. This type behaves the same way as the others and can be considered immutable: `HASHAPP*` ops (described below) "consume" the old hasher and return an updated one (so if the old hasher was copied, its copies are completely independent). This object is not serializable and thus cannot be returned from TVM (for instance, as the result of a get-method).
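The "consume and return an updated one" semantics can be illustrated with a minimal Python sketch. It is only a model of the described behaviour, not the TVM implementation: the `Hasher` class and its `start`/`append`/`end` methods are hypothetical names, and `hashlib` stands in for the real hash state.

```python
import hashlib


class Hasher:
    """Hypothetical immutable hasher: append() returns a new object."""

    def __init__(self, state):
        # A hashlib object that is never mutated after construction.
        self._state = state

    @classmethod
    def start(cls, algo: str = "sha256") -> "Hasher":
        return cls(hashlib.new(algo))

    def append(self, chunk: bytes) -> "Hasher":
        # Copy first, so the hasher we were called on stays untouched.
        new_state = self._state.copy()
        new_state.update(chunk)
        return Hasher(new_state)

    def end(self) -> bytes:
        # Finalize on a copy, so end() does not disturb the stored state.
        return self._state.copy().digest()


h0 = Hasher.start("sha256")
h1 = h0.append(b"hello ")
a = h1.append(b"world")  # two independent continuations of h1
b = h1.append(b"there")
```

Because every `append` works on a copy, `a` and `b` do not influence each other or `h1`, which mirrors the statement that copies of a hasher are completely independent.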
I think it's better to use the name "Digest" as in popular languages like Java or GoLang, because this is a more abstract name for hashing/checksum calculation algorithms.
It looks like "digest" refers to the result of hashing; here we use Hasher for an intermediate object with some internal state.
Couldn't this state be implemented via a generic slice?
Well, any state can be represented as some state. The question is: do we really need it?
`HASHENDST` - `0xF905` - `b h - b'` - calculate hash from hasher and store it into the builder.
`HASHINFO` - `0xF906` - `h - i` - return hash type of hasher (`sha256` - `0`, `sha512` - `1`, `blake2b` - `2`, `keccak256` - `3`, `keccak512` - `4`).
I think that hardcoding magic numbers is always bad practice. There are many algorithms with different names, and the list will surely grow in the future. If the ID for a hasher is chosen purely for "historical reasons" at this stage, many people will find it confusing. Instead of ordinal numbers, I suggest using the same function that determines get-method IDs in smart contracts.
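As a rough illustration of that suggestion: FunC derives a get-method ID as `(crc16(name) & 0xffff) | 0x10000`, using CRC-16/XMODEM. The sketch below applies the same function to the hash names from `HASHINFO`. The resulting hash IDs are purely hypothetical and not part of the proposal; `name_to_id`, `HASH_NAMES` and `HASH_IDS` are names made up for this example.

```python
import binascii


def name_to_id(name: str) -> int:
    # binascii.crc_hqx uses the CRC-CCITT polynomial 0x1021; with an
    # initial value of 0 this is the CRC-16/XMODEM variant.
    crc = binascii.crc_hqx(name.encode("ascii"), 0)
    return (crc & 0xFFFF) | 0x10000


# Hypothetical: derive hash-type IDs from names instead of ordinals 0..4.
HASH_NAMES = ["sha256", "sha512", "blake2b", "keccak256", "keccak512"]
HASH_IDS = {name: name_to_id(name) for name in HASH_NAMES}
```

One trade-off of such a scheme is that a 16-bit checksum can collide, so any newly added algorithm name would need to be checked against the existing IDs.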
I don't think a new Hasher object is such a good idea.
IMO the ideology of TON is to keep everything as universal as possible, e.g. everything is a cell, wallets are smart contracts, etc.
Maybe we can implement other hashes in the same way as HASHCU/HASHSU works?
I don't actually think it will be less efficient: in most use cases we will already have a data cell/slice to hash as input.
Also, I have an alternative approach that would be as efficient as Hasher, but without introducing a new type.
Let's imagine we have an opcode HASH_SHA512 which accepts the number of stack elements to hash as the first stack element (s0).
So if we want to hash 1 uint256 and 1 slice, under the current proposal we would do something like:
HASHSTART_SHA512
PUSHINT 7777777777
HASHAPPU
PUSHSLICE {001010010101}
HASHAPPS
HASHEND
(stack ops are omitted)
But we can do it without hasher type, like:
PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512
So we read the first stack value; it is 2, which means we need to read 2 more values and hash them in the reverse of the order they were pushed.
I think this way is much clearer, keeps the architecture cleaner, and may even be faster :)
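A small Python model of this count-prefixed variant may make the stack behaviour easier to follow. It is a sketch under loud assumptions: the stack is a Python list whose last element is `s0`, the integer is assumed to be hashed as a 256-bit big-endian value, and the slice is replaced by a byte-padded placeholder (real TVM slices are bit strings). `hash_sha512_n` is a hypothetical helper, not an existing opcode implementation.

```python
import hashlib


def hash_sha512_n(stack: list) -> None:
    """Pop a count n, then n byte strings; push SHA-512 of their concatenation."""
    n = stack.pop()
    if not isinstance(n, int) or n < 0 or n > len(stack):
        raise ValueError("stack underflow or bad element count")
    h = hashlib.sha512()
    for _ in range(n):
        # The top of the stack is hashed first, i.e. reverse push order.
        h.update(stack.pop())
    stack.append(h.digest())


# Assumption: the uint256 operand is encoded as 32 big-endian bytes.
int_bytes = (7777777777).to_bytes(32, "big")
# Placeholder for the 12-bit slice 001010010101, padded to whole bytes.
slice_bytes = bytes([0b00101001, 0b01010000])

stack = [slice_bytes, int_bytes, 2]  # PUSHSLICE, PUSHINT value, PUSHINT 2
hash_sha512_n(stack)
```

After the call the stack holds a single 64-byte digest of the integer followed by the slice, which is the same input order as the `HASHAPPU` / `HASHAPPS` sequence.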
Not sure which "universality" is broken here.
We are looking for methods that can be used to hash large chunks of data, like 100 kB. The approach with hasher objects allows copying partially hashed data and getting digests of data with the same prefix more cheaply (not sure that we practically need it, though).
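The shared-prefix saving mentioned here can be sketched with Python's `hashlib`, whose hash objects provide a `copy()` method. This only illustrates the idea and says nothing about TVM gas costs; the 100 kB prefix and the suffixes are made-up data.

```python
import hashlib

prefix = b"\x00" * 100_000            # large common prefix (made-up data)
suffixes = [b"alpha", b"beta", b"gamma"]

base = hashlib.sha256()
base.update(prefix)                   # the expensive part, done only once

digests = []
for s in suffixes:
    h = base.copy()                   # fork the partially hashed state
    h.update(s)                       # hash only the short suffix
    digests.append(h.digest())
```

Each digest equals the hash of `prefix + suffix`, but the prefix is processed once rather than three times.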
By universality I mean that we are introducing a new type which cannot be returned or serialized to a cell.
I am also not sure where we could use the hasher-copying feature during a single contract call, since the hasher cannot be saved and reused later.
I could propose 3 options here:
- Do it without a hasher object and without the copy feature. The HASH_SHA512 approach proposed one message earlier looks more efficient and smaller (fewer stack ops needed).
- Add cell [de]serialization to the hasher; then it could be saved, returned and accepted as input. This way the prefix feature has more potential, I think. And it is actually not a problem to serialize the mid state of a hasher and unpack it later.
- Combine the 2 approaches: add something like HASH_SHA512_STATE, which will put a cell with the hasher state on the stack, and add HASH_SHA512 with a flag in the opcode saying whether it accepts a hasher or not. Example:
PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512_STATE <- put cell with mid state of hasher to stack
PUSHSLICE {11110}
PUSHINT 1
PUSH s2 <- push hasher state
HASH_SHA512_END <- computes final hash which consists from 3 values, 2 of which was in state
This way it will be smaller and consume fewer stack ops, I think. Maybe it could even accept a tuple instead of separate args; then it would be 1 push, without pushing the length.
> Not sure which "universality" is broken here.
> We are looking for methods that can be used to hash large chunks of data, like 100 kB. The approach with hasher objects allows copying partially hashed data and getting digests of data with the same prefix more cheaply (not sure that we practically need it, though).

How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.
> How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.

Currently yes, but that doesn't look like intended behavior: things which are cheap for nodes (and hashing is cheap) should not be prohibitively expensive for contracts.
This proposal suggests to
1. Extend **c7** tuple from 10 to 14 elements:
   - **10**: code of the smart contract.
   - **11**: value of the incoming message.
Currently we have it on the stack when processing an internal message, and that is OK. I don't think we need to move it to c7, if only because we have it only for internal messages; the stack looks pretty fine. Less universal and not so useful, IMO.
1. Extend **c7** tuple from 10 to 14 elements:
   - **10**: code of the smart contract.
   - **11**: value of the incoming message.
   - **12**: fees collected in the storage phase.
Can we just adjust the balance by this fee?
Not sure what you mean here. The contract balance which can be retrieved by get_balance() (and is stored in the 7th element of c7) is already the balance after the storage fee has been deducted. Here we want to give the contract an option to account for the storage fee if necessary.
Probably it is a good idea to add additional opcodes:
I propose to also add opcodes for modular arithmetic & working with elliptic curves, so:
To further simplify long arithmetic operations it is suggested to define
Strongly recommend adding anything EC-related as Ristretto API with Merlin: Ristretto uses the same Curve25519 and its privkeys can be mapped to compatible Ed25519/X25519 pubkeys.
Would be nice to have perfect encapsulation for recursive execution of continuations. This would allow us to cheaply execute untrusted code in order to get a simple "yes/no" or numeric result while providing asset safety guarantees to the users. E.g. we should bring Use-case: predictable and safe standardized Jetton code that allows embedding policy as executable code. That code has static (known AOT) gas limits and can only reply "allow/deny", that is, it cannot mess up accounting.
Yes, having something like "RUNVM" that would isolate the contract and its stack is very, very important since, in the TACT compiler, we are executing "init" functions that could theoretically become a vulnerability.
Please, can we just avoid esoteric zkp in ton. There are a lot of tools for ETH and we don't need to slow down innovation here.
I disagree on that take. Ristretto is the opposite of "esoteric zkp". It is a rectification of the original Curve25519 design. Let me clarify. Curve25519 is designed for speed, but does not give prime-order group out of the box. Since it was needed for only two protocols at the time — Schnorr signatures and Diffie-Hellman key exchange, — its author used a couple of hacks to make those specific protocols work. One of them is crude multiplication by 8 that kills the cofactor and eliminates specific risks of small subgroup attacks. Due to these hacks, a whole lot of things relevant to blockchains are ±broken with Ed25519/X25519 protocols:
What is Ristretto?
Compared to the rest of the ZKP scene, Curve25519 uses conservative assumptions (no pairings, pure discrete log hardness), its design has been battle-tested for 11 years, and it has been reviewed by possibly more eyes than all other ZKP projects combined.
To give feedback on suggestions:
Discussion of special curve operations (except signature checks) is currently postponed until we understand how cheap those operations can be on the current TVM.
Almost nothing makes sense in this message. Why use Curve25519 if we need to integrate with existing ZKP tools and research? There is no reason not to support other curves.
zkp and zk-snark are very important in order to be able to implement private transactions and thus private workchains. Edit: ristretto vs libsecp vs monero benchmark and zk-snark vs zk-stark vs bulletproof.
There is a need for committing
Current (and, it is expected, final) list of opcode additions to TVM. Gas prices are preliminary.
Opcodes to work with new c7 values
26 gas for each.
Gas
Arithmetics
26 gas for each, except quiet and shift operations, which cost 34 gas.
Stack operations
Remove the limitation of working with only the top 256 elements of the stack.
Hashes
The following hashes are added:
Crypto
Ristretto: Libsodium implementation.
BLS: operations on the pairing-friendly BLS12-381 curve. The BLST implementation is used. Also ops for the BLS (another one) signature scheme.
RUNVM
Sending messages
In the recent TVM Upgrade (#686) a new useful opcode was added:
which actually replaces the opcode:
Which is still described in the document tvm.pdf but has not yet been implemented. Please reveal the details of the reasoning behind the decision. In my research, I realized that this code is implemented in an alternative implementation of TVM. I understand that there are no obvious reasons to strive for full compatibility of all possible implementations, especially since the process should be arranged in reverse, given that the reference implementation is here. At the same time, we have the development of TVM technology in more than two third-party projects, it may be useful to take care of compatibility, thus supporting the spread of the new general term TVM-based blockchain. I'm interested in learning more about the strategic plans for the development of the virtual machine and the criteria on the basis of which decisions are made. I also think there should be one source of truth, for example tvm.pdf. in the case of the
These two instructions do not replace each other (well, they can even help each other, actually). However, in contrast to
@EmelyanenkoK Thanks for the detailed explanation.
@ilyar
This proposal is still a work in progress and is not yet scheduled even for testing in the public testnet: we are figuring out what other TVM updates may be needed in the future.