TEP: TVM update #88

Draft

Member

@EmelyanenkoK EmelyanenkoK commented Sep 7, 2022

This proposal is still a work in progress and is not yet scheduled even for testing in the public testnet: we are still figuring out what other TVM updates may be needed in the future.


## TVM update
### New stack type
This proposal suggests introducing a new TVM type, `Hasher`: an opaque type used to hash data chunks sequentially. This type behaves the same way as the others and can be considered immutable: `HASHAPP*` ops (described below) "consume" the old hasher and return an updated one (so if the old hasher was copied, its copies are completely independent). This object is not serializable and thus cannot be returned from TVM (for instance, as the result of a get-method).
Member

I think it's better to use the name "Digest" as in popular languages like Java or GoLang, because this is a more abstract name for hashing/checksum calculation algorithms.

Member Author

It looks like "digest" refers to the result of hashing; here we use Hasher for an intermediate object with some internal state.

Copy link

Couldn't this state be implemented via a generic slice?

Member Author

Well, any state can be represented as some state. The question is, do we really need it?

`HASHENDST` - `0xF905` - `b h - b'` - calculate the hash from the hasher and store it into the builder.


`HASHINFO` - `0xF906` - `h - i` - return hash type of hasher (`sha256` - `0`, `sha512` - `1`, `blake2b` - `2`, `keccak256` - `3`, `keccak512` - `4`).
Member

I think that hardcoding magic numbers is always bad practice. There are many algorithms with different names, and the list will surely be extended in the future. If at this stage the hasher IDs are chosen purely for "historical reasons", many people will find them confusing. Instead of ordinal numbers, I suggest deriving the IDs with the same function that is used to determine get-method IDs in smart contracts.
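
As a rough illustration of that suggestion, the sketch below derives an ID from an algorithm name instead of assigning ordinals. It assumes the get-method convention is a CRC16 (XMODEM variant) over the name with bit 16 set; the function name `name_to_id` and the reuse of that convention for hashers are hypothetical and not part of the proposal.

```python
import binascii

def name_to_id(name: str) -> int:
    """Derive a numeric ID from a name (assumed get-method-style scheme)."""
    # binascii.crc_hqx with initial value 0 is CRC-16/XMODEM (polynomial 0x1021).
    crc = binascii.crc_hqx(name.encode("ascii"), 0)
    # Assumption: as for get-method IDs, set bit 16 on top of the 16-bit CRC.
    return (crc & 0xFFFF) | 0x10000

# Hypothetical use: IDs for the hash names listed under HASHINFO.
for algo in ("sha256", "sha512", "blake2b", "keccak256", "keccak512"):
    print(algo, name_to_id(algo))
```

The trade-off is that name-derived IDs are stable when new algorithms are added, at the cost of being less readable than small ordinals and of a (small) chance of collisions.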


Contributor

@xssnick xssnick Sep 7, 2022

I don't think the new Hasher object is such a good idea.

IMO the ideology of TON is to keep everything as universal as possible, e.g. everything is a cell, wallets are smart contracts, etc.

Maybe we can implement other hashes in the same way as HASHCU/HASHSU work?

I don't actually think it will be less efficient; in most use cases we will already have the data as a cell/slice to hash.

Contributor

@xssnick xssnick Sep 7, 2022

Also, I have an alternative approach that would be as efficient as Hasher, but without introducing a new type.

Let's imagine we have an opcode HASH_SHA512 that takes the number of stack elements to hash as the first stack element (s0).

So if we want to hash 1 uint256 and 1 slice, with the current proposal we would do something like:

HASHSTART_SHA512
PUSHINT 7777777777
HASHAPPU
PUSHSLICE {001010010101}
HASHAPPS
HASHEND

(stack ops are omitted)

But we can do it without hasher type, like:

PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512

So we read the first stack value; it is 2, which means we need to read 2 more values and hash them, in reverse order.

This way, I think, it will be much clearer, keep the architecture cleaner, and even be faster :)
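
A minimal Python sketch of this count-prefixed variant, modelling the stack as a list whose last element is s0. Assumptions: stack values are already byte strings, SHA-512 comes from `hashlib`, and "reverse order" means values are fed to the hash in the order they are popped. The helper name and the placeholder slice bytes are hypothetical.

```python
import hashlib

def hash_sha512_from_stack(stack: list) -> list:
    """Pop a count n (s0), then n byte strings, and push their SHA-512 digest."""
    if not stack:
        raise IndexError("stack underflow: missing element count")
    n = stack.pop()
    if not isinstance(n, int) or n < 0 or n > len(stack):
        raise IndexError("stack underflow: not enough elements to hash")
    hasher = hashlib.sha512()
    for _ in range(n):
        hasher.update(stack.pop())  # values are consumed top of stack first
    stack.append(hasher.digest())
    return stack

# Mirrors the example above: push the slice, then the integer, then the count.
slice_bytes = b"\x29\x50"                      # placeholder bytes for the slice
int_bytes = (7777777777).to_bytes(32, "big")   # the integer as a uint256
result = hash_sha512_from_stack([slice_bytes, int_bytes, 2])
```

With this ordering the digest equals the one produced by appending the integer first and the slice second, as in the hasher-based sequence.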

Member Author

@EmelyanenkoK EmelyanenkoK Sep 8, 2022

Not sure which "universality" is broken here.
We are looking for methods that can be used to hash large chunks of data, like 100 kB. The approach with hasher objects allows copying partially hashed data, which makes it cheaper to get digests of data with the same prefix (not sure that we practically need it, though).
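
The prefix-sharing idea can be illustrated with Python's `hashlib`, whose hash objects have a `copy()` method. This is only an analogy for the proposed `Hasher` type, not TVM code; the data and suffixes are made up.

```python
import hashlib

def digests_with_shared_prefix(prefix: bytes, suffixes: list) -> list:
    """Hash the prefix once, then fork the partial state for each suffix."""
    base = hashlib.sha256()
    base.update(prefix)              # the expensive part is done once
    digests = []
    for suffix in suffixes:
        fork = base.copy()           # independent copy of the partial state
        fork.update(suffix)
        digests.append(fork.digest())
    return digests

prefix = b"\x00" * 100_000           # roughly 100 kB of shared data
out = digests_with_shared_prefix(prefix, [b"a", b"b"])
```

Each fork yields the same digest as hashing `prefix + suffix` from scratch, but the prefix is processed only once.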

Contributor

@xssnick xssnick Sep 8, 2022

By universality I mean that we are introducing a new type which cannot be returned or serialized to a cell.

I'm also not sure where we could use the hasher-copying feature during a single contract call, since the hasher cannot be saved and reused later.

I could propose 3 options here:

  1. Do it without the hasher object and without the copy feature: the HASH_SHA512 approach proposed one message earlier looks more efficient and smaller (fewer stack ops needed).
  2. Add cell [de]serialization to the hasher, so that it can be saved, returned and accepted as input; this way the prefix feature has more potential, I think. And it is actually not a problem to serialize the mid-state of a hasher and unpack it later.
  3. Combine the 2 approaches: add something like HASH_SHA512_STATE, which puts a cell with the hasher state on the stack, and add HASH_SHA512 with a flag in the opcode saying whether it accepts a hasher state or not. Example:
PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512_STATE  <- put cell with mid state of hasher to stack

PUSHSLICE {11110}
PUSHINT 1
PUSH s2 <- push hasher state
HASH_SHA512_END <- computes the final hash over 3 values, 2 of which were in the state

This way it will be smaller and consume fewer stack ops, I think. Maybe it can even accept a tuple instead of args; then it will be 1 push, without pushing the length.

Copy link

> Not sure which "universality" is broken here.
> We are looking for methods that can be used to hash large chunks of data, like 100 kB. The approach with hasher objects allows copying partially hashed data, which makes it cheaper to get digests of data with the same prefix (not sure that we practically need it, though).

How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.

Member Author

@EmelyanenkoK EmelyanenkoK Sep 8, 2022

> How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.

Currently yes, but it doesn't look like intended behavior: things which are cheap to nodes (and hashing is cheap) should not be prohibitively expensive for contracts.

This proposal suggests to
1. Extend **c7** tuple from 10 to 14 elements:
- **10**: code of the smart contract.
- **11**: value of the incoming message.
Contributor

Currently we have it on the stack when processing an internal message, and that is OK. I don't think we need to move it to c7, not least because we have it only for internal messages; the stack looks perfectly fine. Less universal and not so useful, IMO.

1. Extend **c7** tuple from 10 to 14 elements:
- **10**: code of the smart contract.
- **11**: value of the incoming message.
- **12**: fees collected in the storage phase.
Copy link

Can we just adjust the balance by this fee?

Member Author

Not sure what you mean here. The contract balance, which can be retrieved by get_balance() (and is stored in the 7th element of c7), is already the balance after the storage fee has been deducted. Here we want to give the contract an option to account for the storage fee if necessary.

@EmelyanenkoK
Member Author

Probably it is a good idea to add additional opcodes:

  • GLOBALID (can be added to c7), this will allow to implement cross-chain replay protection
  • SETBOUNCEONACTIONPHASEFAIL - set/unset transaction executor mode that will bounce on ActionPhase fail

@dariotarantini

dariotarantini commented Oct 27, 2022

I propose also adding opcodes for modular arithmetic and for working with elliptic curves:

  • MULMOD, ADDMOD, SUBMOD, EXPMOD
  • ECADD, ECMUL

This would make an ECDSA implementation much easier and cheaper than doing everything in FunC.

@EmelyanenkoK
Member Author

To further simplify long arithmetic operations, it is suggested to define the d=0 mode of the A9mscdf opcode as follows:
it is equivalent to d=3, except that prior to the division or shift an additional integer w from the stack is added to the intermediate value.
For instance:

  • MULDIV (xy)/z transforms to (xy+w)/z - MULADDDIV
  • MULDIVMOD transforms to MULADDDIVMOD - (x y w z -- q r), q = floor((xy+w)/z), r = xy+w-zq
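
The MULADDDIVMOD semantics above can be written out in Python, whose integers are arbitrary-precision and whose `//` operator is floor division. This is a sketch of the arithmetic only: TVM's integer range checks and the rounding variants are not modelled, and the function name is illustrative.

```python
def muladddivmod(x: int, y: int, w: int, z: int) -> tuple:
    """(x y w z -- q r) with q = floor((x*y + w) / z), r = x*y + w - z*q."""
    if z == 0:
        raise ZeroDivisionError("division by zero")
    t = x * y + w          # intermediate value, computed without overflow
    q = t // z             # floor division
    r = t - z * q
    return q, r

print(muladddivmod(7, 3, 4, 5))    # (5, 0)
print(muladddivmod(-7, 3, 4, 5))   # (-4, 3)
```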

@oleganza

oleganza commented Dec 20, 2022

Strongly recommend adding anything EC-related as Ristretto API with Merlin:
https://ristretto.group/
https://merlin.cool/
Ristretto uses the same Curve25519 and its privkeys can be mapped to compatible Ed25519/X25519 pubkeys.

@oleganza

Would be nice to have perfect encapsulation for recursive execution of continuations. This would allow us to cheaply execute untrusted code in order to get simple "yes/no" or numeric result while providing asset safety guarantees to the users.

E.g. we should bring RUNVM (or something like that) to TVM that remembers the entire state of the VM on a separate stack, restores them after execution and then returns the nested VM results for introspection in the outer program.

Use-case: predictable and safe standardized Jetton code that allows embedding policy as executable code. That code has static (known AOT) gas limits and can only reply "allow/deny" — that is, cannot mess up accounting.

@ex3ndr

ex3ndr commented Dec 20, 2022

Yes, having something like "RUNVM" that would isolate the contract and its stack is very, very important since, in the TACT compiler, we are executing "init" functions that could theoretically become a vulnerability.

@ex3ndr

ex3ndr commented Dec 20, 2022

> Strongly recommend adding anything EC-related as Ristretto API with Merlin:
> https://ristretto.group/
> https://merlin.cool/
> Ristretto uses the same Curve25519 and its privkeys can be mapped to compatible Ed25519/X25519 pubkeys.

Please, can we just avoid esoteric zkp in ton. There are a lot of tools for ETH and we don't need to slow down innovation here.

@oleganza

oleganza commented Dec 21, 2022

> Please, can we just avoid esoteric zkp in ton. There are a lot of tools for ETH and we don't need to slow down innovation here.

I disagree on that take. Ristretto is the opposite of "esoteric zkp". It is a rectification of the original Curve25519 design.

Let me clarify.

Curve25519 is designed for speed, but does not give prime-order group out of the box. Since it was needed for only two protocols at the time — Schnorr signatures and Diffie-Hellman key exchange, — its author used a couple of hacks to make those specific protocols work. One of them is crude multiplication by 8 that kills the cofactor and eliminates specific risks of small subgroup attacks.

Due to these hacks, a whole lot of things relevant to blockchains are ±broken with Ed25519/X25519 protocols:

  • Deterministic key derivation (BIP32) is not possible due to non-linear bit-twiddling in Ed25519.
  • Batch signature verification is inconsistently defined.
  • DH scheme uses different format for the keys than the Schnorr scheme.
  • Performing ElGamal encryption and Schnorr proofs on it can't be cleanly composed.
  • Advanced ZKP like Bulletproofs won't work.

What is Ristretto?

  • It is a mathematically clean prime-order group.
  • It is a pair of short encoding/decoding functions (20 LOC each) over Curve25519 that uses the same point format as in Ed25519 under-the-hood.
  • All the performance gains and under-the-hood implementations of Curve25519 remain.
  • You can use the same key pair cleanly for signatures, D-H, ElGamal, and key derivation.
  • Things compose without hacks.
  • You can always convert Ristretto keypair to the existing Ed25519 keys for compatibility.

Compared to the rest of the ZKP scene, Curve25519 uses conservative assumptions (no pairings, pure discrete log hardness); its design has been battle-tested for 11 years and has been reviewed by possibly more eyes than all other ZKP projects combined.

@EmelyanenkoK
Member Author

To give feedback on the suggestions:
as it stands now, we plan to

  1. discard "hashers" in favor of hash operations which pull data elements from the stack
  2. add global_id and bounce-on-action-fail opcodes
  3. add ADDDIV operations
  4. add a runvm opcode with a signature close to 'runvmx' in Fift

Discussion of special curve operations (except signature checks) is currently postponed until we understand how cheap those operations can be on the current TVM.

@ex3ndr

ex3ndr commented Dec 21, 2022

> > Please, can we just avoid esoteric zkp in ton. There are a lot of tools for ETH and we don't need to slow down innovation here.
>
> I disagree on that take. Ristretto is the opposite of "esoteric zkp". It is a rectification of the original Curve25519 design.
>
> Let me clarify.
>
> Curve25519 is designed for speed, but does not give prime-order group out of the box. Since it was needed for only two protocols at the time (Schnorr signatures and Diffie-Hellman key exchange), its author used a couple of hacks to make those specific protocols work. One of them is crude multiplication by 8 that kills the cofactor and eliminates specific risks of small subgroup attacks.
>
> Due to these hacks, a whole lot of things relevant to blockchains are ±broken with Ed25519/X25519 protocols:
>
> • Deterministic key derivation (BIP32) is not possible due to non-linear bit-twiddling in Ed25519.
> • Batch signature verification is inconsistently defined.
> • DH scheme uses different format for the keys than the Schnorr scheme.
> • Performing ElGamal encryption and Schnorr proofs on it can't be cleanly composed.
> • Advanced ZKP like Bulletproofs won't work.
>
> What is Ristretto?
>
> • It is a mathematically clean prime-order group.
> • It is a pair of short encoding/decoding functions (20 LOC each) over Curve25519 that uses the same point format as in Ed25519 under-the-hood.
> • All the performance gains and under-the-hood implementations of Curve25519 remain.
> • You can use the same key pair cleanly for signatures, D-H, ElGamal, and key derivation.
> • Things compose without hacks.
> • You can always convert Ristretto keypair to the existing Ed25519 keys for compatibility.
>
> Compared to the rest of the ZKP scene, Curve25519 uses conservative assumptions (no pairings, pure discrete log hardness); its design has been battle-tested for 11 years and has been reviewed by possibly more eyes than all other ZKP projects combined.

Almost nothing makes sense in this message. Why use Curve25519 if we need to integrate with existing ZKP tools and research? There is no reason not to support other curves.

@alfredonodo

alfredonodo commented Dec 22, 2022

zkp and zk-snark are very important in order to be able to implement private transactions and thus private workchains.

Edit: ristretto vs libsecp vs monero benchmark and zk-snark vs zk-stark vs bulletproof.

@ex3ndr

ex3ndr commented Feb 22, 2023

There is a need for committing c4 (contract data) without committing c5, and for bouncing a message in case of a failure after the commit. This is required for lazy deployment, to make it transparent for users.

@EmelyanenkoK
Member Author

The current, and expected to be final, list of opcode additions to TVM. Gas prices are preliminary.

Opcodes to work with new c7 values

26 gas for each

  • MYCODE - retrieve code of smart-contract from c7
  • INCOMINGVALUE - retrieve value of incoming message from c7
  • STORAGEFEES - retrieve value of storage phase fees from c7
  • PREVBLOCKSINFOTUPLE - retrieve PrevBlocksInfo: [last_mc_blocks, prev_key_block] from c7
  • PREVMCBLOCKS - retrieve only last_mc_blocks
  • PREVKEYBLOCK - retrieve only prev_key_block
  • GLOBALID - retrieve global_id from network config parameter 19

Gas

  • GASCONSUMED - returns gas consumed by VM so far - 26 gas

Arithmetics

26 gas for each, except quiet and shift operations, which cost 34 gas

  • ADDDIVMOD, ADDDIVMODR, ADDDIVMODC, MULADDDIVMOD, MULADDDIVMODR, MULADDDIVMODC, LSHIFTADDDIVMOD, LSHIFTADDDIVMODR, LSHIFTADDDIVMODC, LSHIFT#ADDDIVMOD, LSHIFT#ADDDIVMODR, LSHIFT#ADDDIVMODC, QADDDIVMOD, QADDDIVMODR, QADDDIVMODC - flavours of the division instructions that add a number to the intermediate value before the division (e.g. (xy+w)/z)

Stack operations

The limitation of working only with the top 256 elements of the stack is removed

  • Arguments of PICK, ROLL, ROLLREV, BLKSWX, REVX, DROPX, XCHGX, CHKDEPTH, ONLYTOPX, ONLYX are now unlimited.
  • ROLL, ROLLREV, BLKSWX, REVX, ONLYTOPX consume more gas when arguments are big: the additional gas consumed is max(arg-255,0) (for arguments less than 256 the gas consumption is constant and corresponds to the current behavior)
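
The surcharge rule for large arguments is simple enough to state as code. A minimal sketch, assuming the surcharge is added on top of the instruction's unchanged base price; the function name is illustrative.

```python
def extra_stack_op_gas(arg: int) -> int:
    """Additional gas for ROLL, ROLLREV, BLKSWX, REVX, ONLYTOPX: max(arg - 255, 0)."""
    if arg < 0:
        raise ValueError("stack operation argument must be non-negative")
    return max(arg - 255, 0)

for arg in (0, 255, 256, 1000):
    print(arg, extra_stack_op_gas(arg))
```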

Hashes

The following hashes are added:

  • SHA256 - openssl implementation, 1/33 gas per byte
  • SHA512 - openssl implementation, 1/16 gas per byte
  • BLAKE2B - openssl implementation, 1/19 gas per byte
  • KECCAK256 - ethereum compatible implementation http://keccak.noekeon.org/, 1/11 gas per byte
  • KECCAK512 - ethereum compatible implementation http://keccak.noekeon.org/, 1/6 gas per byte
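
To get a feel for these per-byte prices, the sketch below computes only the per-byte component for a given input size. The rounding rule and any fixed per-instruction cost are not stated above, so the result is kept as an exact fraction; the table and function names are illustrative.

```python
from fractions import Fraction

# Bytes hashed per unit of gas, taken from the preliminary list above.
BYTES_PER_GAS = {
    "SHA256": 33,
    "SHA512": 16,
    "BLAKE2B": 19,
    "KECCAK256": 11,
    "KECCAK512": 6,
}

def per_byte_hash_gas(algo: str, n_bytes: int) -> Fraction:
    """Per-byte gas component for hashing n_bytes (no rounding, no base cost)."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return Fraction(n_bytes, BYTES_PER_GAS[algo])

for algo in BYTES_PER_GAS:
    print(algo, float(per_byte_hash_gas(algo, 100_000)))
```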

Crypto

Ristretto: the libsodium implementation is used.

  • RIST255_FROMHASH - deterministic generation of valid point from some hash, 600 gas
  • RIST255_VALIDATE - checking that integer is valid x-coordinate of the curve point, 200 gas
  • RIST255_ADD, RIST255_SUB - addition/subtraction of points on curve, 600 gas
  • RIST255_MUL - scalar multiplication, 2000 gas
  • RIST255_MULBASE - scalar multiplication on base point, 750 gas
  • RIST255_PUSHL - push order of Ristretto group (specific constant), 26 gas

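BLS: operations on the pairing-friendly BLS12-381 curve. The BLST implementation is used. Also included are ops for the BLS signature scheme (a different "BLS": Boneh-Lynn-Shacham signatures, as opposed to the Barreto-Lynn-Scott curve family).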

  • BLS_VERIFY - (slice key, slice msg, slice signature -> bool) - checks BLS signature - 61300 gas
  • BLS_AGGREGATE - (slice sign_n, ... slice sign_1, int n -> slice sign_aggr) - aggregate signatures - gas = -2645 + n * 4355; (n>0)
  • BLS_FASTAGGREGATEVERIFY- (slice key_n, ..., slice key_1, int n, slice msg, slice signature -> bool) - checks BLS signature for set of keys - gas = 58400 + n * 2990
  • BLS_AGGREGATEVERIFY - (slice key_n, ..., slice key_1, slice msg_n, ..., slice msg_1, int n, slice signature -> bool) - checks BLS signature for set of keys - gas = 37275 + n * 22290
  • BLS_G1ADD/BLS_G1_SUB - 3925 gas
  • BLS_G1_NEG - 765 gas
  • BLS_G1MUL - 5180 gas
  • BLS_G1MULTIEXP - gas = 11375 + n * 630 + n/floor(max(log2(n), 4)) * 8820
  • BLS_G1_ZERO - 34 gas
  • BLS_MAP_TO_G1 - 2330 gas
  • BLS_G1_INGROUP - 2930 gas
  • BLS_G1_ISZERO - 34 gas
  • BLS_G2ADD/BLS_G2_SUB - 6100 gas
  • BLS_G2_NEG - 1550 gas
  • BLS_G2MUL - 10530 gas
  • BLS_G2MULTIEXP- gas = 30388 + n * 1280 + n/floor(max(log2(n), 4)) * 22840
  • BLS_G2_ZERO - 34 gas
  • BLS_MAP_TO_G2 - 7970 gas
  • BLS_G2_INGROUP - 4255 gas
  • BLS_G2_ISZERO - 34 gas
  • BLS_PAIRING - gas = 20000 + n * 11770
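
The linear gas formulas above can be tabulated with a few helper functions. This sketch covers only the formulas whose form is unambiguous in the list (the MULTIEXP formulas are left out because their grouping is not clear from the text); the function names are illustrative and the prices are the preliminary ones quoted above.

```python
BLS_VERIFY_GAS = 61300

def bls_aggregate_gas(n: int) -> int:
    """Gas for BLS_AGGREGATE over n signatures (n > 0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return -2645 + n * 4355

def bls_fastaggregateverify_gas(n: int) -> int:
    """Gas for BLS_FASTAGGREGATEVERIFY with n keys and one message."""
    return 58400 + n * 2990

def bls_aggregateverify_gas(n: int) -> int:
    """Gas for BLS_AGGREGATEVERIFY with n (key, message) pairs."""
    return 37275 + n * 22290

def bls_pairing_gas(n: int) -> int:
    """Gas for BLS_PAIRING with n pairs."""
    return 20000 + n * 11770

# Ten signers of the same message: one aggregate check vs. ten separate checks.
print(bls_fastaggregateverify_gas(10), 10 * BLS_VERIFY_GAS)
```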

RUNVM

40 gas units

  • RUNVM, RUNVMX - the signature is equivalent to Fift's RUNVM(X): run the VM according to the flags. RUNVM takes the flags from the instruction, RUNVMX takes them from the stack. Flags:
// Mode:
// +1 = same_c3 (set c3 to code)
// +2 = push_0 (push an implicit 0 before running the code)
// +4 = load c4 (persistent data) from stack and return its final value
// +8 = load gas limit from stack and return consumed gas
// +16 = load c7 (smart-contract context)
// +32 = return c5 (actions)
// +64 = pop hard gas limit (enabled by ACCEPT) from stack as well
// +128 = isolated gas consumption (separate set of visited cells, reset chksgn counter)
// +256 = pop number N, return exactly N values from stack (only if res=0 or 1; if not enough then res=stk_und)
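
Since the mode is a sum of independent bit flags, it can be decoded mechanically. Below is a small Python sketch using `enum.IntFlag`; the member names are paraphrases of the comments above and are not official identifiers.

```python
from enum import IntFlag

class RunVmMode(IntFlag):
    SAME_C3 = 1               # set c3 to code
    PUSH_0 = 2                # push an implicit 0 before running the code
    LOAD_C4 = 4               # load c4 from stack and return its final value
    LOAD_GAS_LIMIT = 8        # load gas limit from stack, return consumed gas
    LOAD_C7 = 16              # load c7 (smart-contract context)
    RETURN_C5 = 32            # return c5 (actions)
    POP_HARD_GAS_LIMIT = 64   # pop hard gas limit from stack as well
    ISOLATED_GAS = 128        # isolated gas consumption
    RETURN_N_VALUES = 256     # pop N, return exactly N values from stack

def describe_mode(mode: int) -> list:
    """List the names of the flags set in a RUNVM mode value."""
    if mode < 0 or mode >= 512:
        raise ValueError("mode must fit in the 9 flag bits listed above")
    return [flag.name for flag in RunVmMode if mode & flag]

print(describe_mode(1 + 4 + 8))
```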

Sending messages

  • SENDMSG takes a cell and a mode as input. It creates an output action and returns the fee for creating the message. The mode has the same effect as for SENDRAWMSG. Additionally, +1024 means: do not create an action, only estimate the fee. Other modes affect the fee calculation as follows: +64 substitutes the entire balance of the incoming message as the outgoing value (slightly inaccurate, since gas expenses that cannot be estimated before the computation is completed are not taken into account); +128 substitutes the entire balance of the contract before the start of the computation phase (slightly inaccurate, for the same reason).
  • SENDRAWMSG, RAWRESERVE, SETLIBRARY - a +16 flag is added, which means: bounce the transaction if the action fails. It has no effect if +2 is used.
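
The interaction between these mode bits can be summarised in a few lines of Python. The constant and function names are hypothetical; only the numeric values and the two rules (+1024 suppresses the action, +16 is ineffective together with +2) come from the description above.

```python
ESTIMATE_ONLY = 1024            # SENDMSG: only estimate the fee, create no action
CARRY_CONTRACT_BALANCE = 128    # use the whole contract balance as the value
CARRY_INCOMING_VALUE = 64       # use the incoming message balance as the value
BOUNCE_ON_ACTION_FAIL = 16      # bounce the transaction if the action fails
FLAG_PLUS_2 = 2                 # the existing +2 flag that disables +16

def creates_action(mode: int) -> bool:
    """SENDMSG creates an output action unless +1024 is set."""
    return not (mode & ESTIMATE_ONLY)

def bounces_on_action_fail(mode: int) -> bool:
    """+16 takes effect only when +2 is not set."""
    return bool(mode & BOUNCE_ON_ACTION_FAIL) and not (mode & FLAG_PLUS_2)

print(creates_action(ESTIMATE_ONLY | CARRY_INCOMING_VALUE))
print(bounces_on_action_fail(BOUNCE_ON_ACTION_FAIL | FLAG_PLUS_2))
```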

@ex3ndr

ex3ndr commented Apr 27, 2023

  1. Doesn't sha256 already exist? What's the difference?
  2. Since we are adding more curves, is it possible to add any classical ZKP one?

@EmelyanenkoK
Member Author

  1. The current sha256 is limited to 127 bytes. The new opcodes allow hashing of arbitrarily long bytestrings consumed from the stack as slices
  2. We consider BLS12-381 the most common now (except maybe bn254?). BLS12-381 is more secure and performant than bn254, while it can be used the same way
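
The first point can be illustrated in Python: a long bytestring is split into pieces of at most 127 bytes (the most whole bytes that fit into a single slice), and hashing the pieces in order gives the same digest as hashing the whole string. This models only the idea, with `hashlib` standing in for the new opcodes; the helper names are illustrative.

```python
import hashlib

MAX_SLICE_BYTES = 127   # a cell carries at most 1023 data bits, i.e. 127 whole bytes

def split_into_slices(data: bytes) -> list:
    """Split data into consecutive chunks of at most MAX_SLICE_BYTES bytes."""
    return [data[i:i + MAX_SLICE_BYTES] for i in range(0, len(data), MAX_SLICE_BYTES)]

def sha256_of_slices(chunks: list) -> bytes:
    """Hash the concatenation of the chunks without joining them first."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()

data = bytes(range(256)) * 4            # 1024 bytes of sample data
chunks = split_into_slices(data)
digest = sha256_of_slices(chunks)
```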

@ilyar

ilyar commented Sep 15, 2023

In the recent TVM Upgrade (#686) a new useful opcode was added:

x{F802} @Defop GASCONSUMED
// ( -- g_c) returns gas consumed by VM so far 26 gas

which actually replaces the opcode:

x{F802} @Defop BUYGAS
// (x -- ) computes the amount of gas that can be bought for x nanograms,
// and sets g_l accordingly in the same way as SETGASLIMIT

Which is still described in the document tvm.pdf but has not yet been implemented.

Please reveal the details of the reasoning behind the decision.

In my research, I realized that this code is implemented in an alternative implementation of TVM.

I understand that there are no obvious reasons to strive for full compatibility of all possible implementations, especially since the process should be arranged in reverse, given that the reference implementation is here.

At the same time, we have the development of TVM technology in more than two third-party projects, it may be useful to take care of compatibility, thus supporting the spread of the new general term TVM-based blockchain.

I'm interested in learning more about the strategic plans for the development of the virtual machine and the criteria on the basis of which decisions are made.

I also think there should be one source of truth, for example tvm.pdf. In the case of the GASCONSUMED opcode, it should not take the place of BUYGAS, because the latter took that code earlier in the specification, which other implementations may follow.

@EmelyanenkoK
Member Author

EmelyanenkoK commented Sep 18, 2023

GASCONSUMED and the not-yet-implemented BUYGAS serve different purposes.
GASCONSUMED allows you to find out how much gas has already been used. It doesn't change gas limits or anything else; it just pushes an integer onto the stack.

BUYGAS increases the gas limit by the amount of gas equivalent to some amount of TON.

These two instructions do not replace each other (they can even complement each other, actually). However, in contrast to GASCONSUMED, which cannot be implemented with the current TVM's existing operations, BUYGAS can be emulated by a slightly more expensive sequence of existing operations, so implementing BUYGAS was not a high priority.

@ilyar

ilyar commented Sep 18, 2023

@EmelyanenkoK Thanks for the detailed explanation.
There is also the question of why the GASCONSUMED opcode uses code x{F802}. This code is described in the specification as the BUYGAS opcode; it would be logical for a new opcode to use a new code, for example x{F807} from the range reserved for gas-related primitives.

@EmelyanenkoK
Member Author

@ilyar GASCONSUMED was changed to x{F807} for compatibility reasons: ton-blockchain/ton@030ebaf
Thank you.
