diff --git a/agenda/agenda_3.md b/agenda/agenda_3.md index 4373226..01f464f 100644 --- a/agenda/agenda_3.md +++ b/agenda/agenda_3.md @@ -1,13 +1,419 @@ -**Meeting Info** -- February 17, 2023 19:00 UTC -- Duration: 30 minutes -- Zoom: To be shared in the #core-community-call channel on Solana Tech Discord +Solana Core Community Call - Feb 17th, 2023

-**Agenda** +Time: Feb 17th, 2023

-- [SIMD-16 - Application account write fees](https://github.com/solana-foundation/solana-improvement-documents/pull/16) - [PR](https://github.com/solana-labs/solana/pull/30137) -- Timely Vote Credits - * [SIMD](https://github.com/solana-foundation/solana-improvement-documents/pull/33) - * [First PR](https://github.com/solana-labs/solana/pull/29524) +Video link:
https://www.youtube.com/watch?v=XkkxQAF-HhE&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=1

Speaker profiles

Galactus from the Mango team: https://github.com/godmodegalactus

Topic: application account write fees - the introduction of a new kind of fee paid when interacting with dApps.

Philip from Firedancer: https://twitter.com/jump_firedancer

Asked who can call the rebate for application fees and discussed the ownership of PDAs (Program Derived Addresses).

Jerry from Ellipsis: https://www.linkedin.com/in/jerry-francis-8033a8225?originalSubdomain=tz

Contributed to the discussion about who can call the rebate for application fees and clarified how signed instructions are invoked for PDAs.

Zentatsu:

Presented SIMD-33, "timely vote credits," which addresses the issue of some validators achieving higher vote credits by delaying votes and surveying forks before voting.

Richard from the Jump team: https://www.linkedin.com/in/rman/

Ashwin from Solana Labs: https://github.com/AshwinSekar

Key concepts mentioned:

SIMD-16 Application Write Fees - https://github.com/solana-foundation/..
SIMD-33 Timely Vote Credits - https://github.com/solana-foundation/...

Solana runtime - https://docs.solana.com/developing/programming-model/runtime

Welcome, all, to the Core Community Call this month. Today we'll have Galactus from the Mango team talking about SIMD-16, which is application account write fees, and then a discussion of SIMD-33, which is timely vote credits, keeping time in mind.

I'd like to cap each discussion at about 15 minutes to give each group the ability to present and discuss with the rest of the core team. Galactus, would you like to go ahead and start?

Can you guys hear me well? Okay, that's great! Thank you, Jacob.

So today I will present the application fees proposal. I'll just share my screen. I'm Galactus from Mango, and today I will present application fees. We want to introduce a new kind of fee that is paid when you interact with a dApp, and it will go to the dApp authority - the account controlled by the dApp - instead of going to the validator itself. There will be a rebate mechanism where the dApp can give the fees back to the user, so that the overall cost of transacting on the cluster won't increase. These fees are paid even if the transaction eventually fails.

So why do we want to introduce this application fee? Right now in the cluster, the Solana transaction scheduler puts transactions into batches, and each batch is executed in parallel. A batch takes locks on its accounts - for example, write locks - so no other batch can use the same account in the same way at the same time. These transactions are retried or even forwarded to the next validator, so this actually creates an incentive to spam the whole cluster. For example, high-frequency market makers will just keep creating new transactions, spamming the network.
This creates more and more transactions touching the same accounts, and eventually it chokes the validator. We want to discourage this kind of spamming behavior; there are even custom programs that will only CPI into the dApp if they can extract a profit.

To discourage this behavior and encourage creating proper, valid transactions, we want to introduce this fee. Eventually this will reduce congestion in the Solana network, and dApps will also gain the ability to collect extra fees with this proposal. These are the resources we have, like SIMD-16. We have also created a PoC and adapted the proposal based on it. We want to introduce a new native Solana program, the application fees program, with this program ID. I will talk a little bit about rebates: the authority of the account can issue a partial or full rebate by CPI into a special instruction, and the rebate will be transferred back to the payer at the end of the transaction. If the authority issues the rebate but the transaction eventually fails, there is no rebate. We had multiple ways to implement this feature: the first was storing the application fees on the ledger. That is the implementation done in the PoC, but there were some complications, like calculating the fees, because we would have to load all the accounts first.

So we have decided to go with setting the application fee by instruction, and I will explain more about this.
We have three instructions, principally: first, pay application fee, in which the payer agrees to pay an application fee for certain accounts of a dApp; second, check application fee, which checks whether the fee for an account has been paid; and third, rebate, where the dApp can rebate the application fees back to the user. The pay application fee instruction can take multiple accounts: you pass a list of accounts and the list of fees you want to pay for each account as an array of amounts.

The transaction must include this instruction when interacting with dApps that have implemented the application fee feature.

Even if the transaction fails, the payer ends up paying these application fees. The payer could even pay the application fee when it was not needed, or overpay, if they include this instruction anyway. This instruction is processed by the Solana runtime, so you cannot CPI into it from another smart contract, and you can include multiple accounts belonging to different dApps - that won't be an issue.

The second instruction is the check application fees instruction, which dApps can CPI into to check whether their application fees have been paid. It takes one account, on which we expect an application fee, and the expected fee amount. If the application fee is partially paid or not paid, the instruction returns an "application fee not paid" error. If it is fully paid or overpaid, it returns Ok. In the case of a partial payment, the user loses the amount he has paid as the application fee. The instruction can be called multiple times across multiple instructions - that won't be an issue.

Galactus, just being mindful of time - we have five minutes, you'll have to cut it down.
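The three application-fee instructions described above could be sketched roughly as follows. This is an illustrative model only: the enum layout, field names, and the `check_fee` helper are assumptions for exposition, not the actual interface from the SIMD or the PoC.

```rust
// Hypothetical sketch of the three application-fee instructions
// discussed above (SIMD-16). Names and layouts are illustrative,
// not the real on-chain interface.

/// 32-byte account address, stand-in for a Solana Pubkey.
type Pubkey = [u8; 32];

#[derive(Debug)]
enum ApplicationFeeInstruction {
    /// Payer agrees to pay fees (in lamports) for each listed account.
    /// Only the runtime processes this; it cannot be reached via CPI.
    PayApplicationFees {
        accounts: Vec<Pubkey>,
        fees_lamports: Vec<u64>,
    },
    /// A dApp CPIs into this to assert that `expected_lamports` was
    /// paid for `account`; errors if unpaid or only partially paid.
    CheckApplicationFee {
        account: Pubkey,
        expected_lamports: u64,
    },
    /// The account's authority returns part or all of the fee to the
    /// payer at the end of the transaction (no rebate if the tx fails).
    Rebate {
        account: Pubkey,
        rebate_lamports: u64,
    },
}

/// Semantics of the check as described: full payment or overpayment
/// passes; partial or missing payment fails, and a partial payment is
/// still kept (not refunded).
fn check_fee(paid: u64, expected: u64) -> Result<(), &'static str> {
    if paid >= expected {
        Ok(())
    } else {
        Err("application fee not paid")
    }
}
```

A dApp would CPI into the check at the top of its own instruction, so spam transactions that skipped the pay instruction fail early.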
I know that there's been a decent amount of discussion on this SIMD, and I'd just like to open the floor for anybody with questions who has read the SIMD or followed the presentation.

This is Philip from Firedancer. I still wanted to ask you about this: who exactly can call the rebate?

Actually, I understood your question - it was more about, in the case of PDAs, who really calls the rebate, right? Was that your question initially?

So suppose it's a PDA. I have two different interpretations of what you've been saying. One is that if it's a PDA, then the program that can sign for the PDA can call rebate; at other times you've talked about the owner of the account being able to call the rebate. Those are two different things, so I have two different understandings, and I still don't feel I have full clarity on which one you mean.

Okay, so if you check the metadata of an account, you always have an owner associated with each account. Usually, when it's a normal keypair, it's the same owner, even for a PDA. But with the assign instruction you can transfer ownership of the account to some other entity. In that case, they have to sign this instruction to initiate the rebate.

If you have a PDA and the PDA is a token account, then the owner of the PDA is the token program, so the only one that can sign for it is the token program?

No, it's actually the token program, because the token program has to use invoke_signed to sign as the PDA - you cannot sign on behalf of the token program.

I don't think that's right, but maybe I'm mistaken. I think it's the program whose ID you derive the PDA from that can invoke_signed, but I'll double-check on this and get back to you.
We can discuss it more. This is Jerry from Ellipsis. Hey guys, on the signing topic, I think Philip is correct: when an invoke_signed occurs, it's with a PDA of the calling program, not of whatever program is being invoked.

Let me think. I actually think that when you create a PDA, you derive it from a program ID, and then you can just call invoke_signed using it. Let me get back on this.

Or maybe I can explain how it works - I've explained this a number of times before. Just to make sure we're all on the same page semantically: you have a program, and you create a new address that is derived from that program ID but doesn't live on the curve. From that program, you're able to call invoke_signed into a downstream program with the PDA as a signer. I just wanted to make sure there's no confusion around how that process works.

So we have the owner, right - the owner, which is the program that created this PDA. So in the end we have a fixed owner of the PDA.

When you say fixed owner, do you mean the system-level owner, or the owner of this PDA account that has a certain structure tied to it? Because when we're talking about the owner of most PDAs - if you have a PDA from any arbitrary program - the owner is always going to start as the system program. It's never going to be something else unless you explicitly allocate and assign.

Okay, I agree with you, the same program is the owner.

So what are you saying when you say "fixed owner" of the PDA? I'm not quite following.

I guess let me look into it more, and I will write more about that in the SIMD. Is that okay with you guys? Alright, sounds good.
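The distinction being worked out above - the program a PDA is derived from can sign for it via invoke_signed, while its system-level owner starts as the system program until explicitly assigned - can be captured in a toy model. Everything here is an illustrative simplification (real PDAs are derived with seeds and a hash, not stored fields):

```rust
// Toy model of PDA ownership vs. signing authority. The program a PDA
// is *derived from* can sign for it via invoke_signed; its system-level
// *owner* starts as the system program until allocate/assign. All types
// are stand-ins, not real Solana data structures.

#[derive(Clone, Copy, PartialEq, Debug)]
struct ProgramId(u64); // stand-in for a 32-byte program address

const SYSTEM_PROGRAM: ProgramId = ProgramId(0);

struct Pda {
    derived_from: ProgramId, // program whose ID seeded the derivation
    owner: ProgramId,        // system-level owner of the account
}

impl Pda {
    /// A freshly derived PDA is owned by the system program.
    fn new(derived_from: ProgramId) -> Self {
        Pda { derived_from, owner: SYSTEM_PROGRAM }
    }

    /// `allocate`/`assign` can hand ownership to another program
    /// (e.g. a PDA token account becomes owned by the token program).
    fn assign(&mut self, new_owner: ProgramId) {
        self.owner = new_owner;
    }

    /// Only the deriving program can sign for the PDA via
    /// invoke_signed, regardless of who currently owns the account.
    fn can_invoke_signed(&self, caller: ProgramId) -> bool {
        self.derived_from == caller
    }
}
```

In this model, "who can call the rebate" for a PDA is its deriving program, which is a different question from who owns it.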
I would love to see the exact details for this. Being mindful of time, we'll move on to the next person - thank you very much. For further discussion on this SIMD, you can find the link I have posted in the chat; I'll post it again. We can discuss more there, as well as in Discord under core technology.

Moving on to the next one: let's go to Zentatsu with SIMD-33.

Hi, thanks. I'm going to share my screen if that's okay. Can you please tell me how much time I should expect to have - another 15 minutes, more than that, or until the call ends?

Roughly 13-14 minutes.

Okay, thank you, so I will share my screen. Oh shoot - well, I can't share my screen, because this is the first time I've used this program on my new computer, and I would have to go into some permission settings to do that. So I hope I can just reference what I'm talking about and people will be able to follow. I made a proposal quite a while ago for this thing I call "timely vote credits." The reason I'm speaking about it today is that I really just want to make sure that anybody at Solana Labs, or any other Solana developer who is unfamiliar with the topic, can get some familiarity with it and can ask me questions. In addition, I was hoping that after I make my case for the idea, I can maybe get somebody at Solana Labs to champion the change, to help me get it pushed through the various processes required to get it there. I've had a pull request open, and it has received attention sometimes, but I don't think very consistently, and I think that...
I don't know what the existing mechanism is for community contributions to the Solana codebase, but I hope there can be a process by which a champion from Solana Labs can act on a contributor's behalf, because from the outside it's hard to get traction without that. Okay, so speaking about the change specifically: for a long time, we in the validator set have noticed - since a lot of the profit we try to make comes from our vote credit achievements - that some validators appear to be achieving higher vote credits by delaying their votes and surveying the state of forks before casting them. Of course, all validators do this to some degree, because built into the Solana codebase, and into sane voting practice, is the idea of waiting a little if you haven't seen consensus achieved on the fork you're voting on, so that you don't get too far ahead of consensus and then get locked out for a very long time should you end up on the wrong fork. Some natural amount of that occurs. But within the domain of making that choice, there's also explicitly waiting a long time before voting, to try to make the most accurate vote. We don't want validators to wait on votes, because that slows down consensus, which in turn slows down user perception of transaction completion. So I propose this idea where vote credits would be calculated based on the latency of the vote - that is, how long it takes, from the slot being voted on, for that vote to land on the chain.
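The latency-based credit idea just described can be sketched in a few lines. The specific numbers below (maximum credits, grace period, one credit lost per slot of excess latency) are assumptions for illustration, not values taken from the SIMD:

```rust
// Illustrative sketch of latency-based vote credits: a vote that lands
// quickly earns full credit, and each slot of extra latency beyond a
// small grace period costs a credit, down to a floor of 1. Constants
// are assumed for illustration only.

const MAX_CREDITS: u64 = 16; // credits for a promptly landed vote (assumed)
const GRACE_SLOTS: u64 = 2;  // latency forgiven before decay starts (assumed)

/// Latency = slot in which the vote landed on chain minus the slot
/// being voted on.
fn credits_for_vote(voted_slot: u64, landed_slot: u64) -> u64 {
    let latency = landed_slot.saturating_sub(voted_slot);
    let excess = latency.saturating_sub(GRACE_SLOTS);
    // Lose one credit per slot of excess latency, never below 1, so a
    // very late vote that still lands counts for something but far less.
    MAX_CREDITS.saturating_sub(excess).max(1)
}
```

Under a schedule like this, the "vote only on finalized blocks" strategy discussed later earns close to the minimum credit per vote instead of the maximum, which is exactly the incentive change the proposal is after.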
I've written that up in SIMD-33, I've implemented a pull request for it, and I've gotten lots of great feedback from Solana Labs over months, which I've tried to incorporate into the change I've proposed. The current proposal is that it be done in stages, in several changes, because it requires updating the vote account state for all vote accounts: extra data needs to be stored in vote accounts to track this latency, which requires new space in vote accounts. That adds complexity, because it obviously has to be done very carefully so that nobody is prevented from voting and vote transactions don't fail for validators because of bugs. Also, expanding the size of vote accounts requires adding lamports to reach the new rent-exemption level, and there's a consideration there. All of that is written up in the SIMD, I think.

Sometimes these long-running changes have a lot of discussion that can be hard to follow, so I hope that's not the case with the change I have open right now. I'm more than happy to answer questions about the change and about the proposal. If I had been able to share my screen, I'd show the chart I've been creating periodically, where I look at the historical vote state over many epochs and compute the vote credits validators actually achieved versus the credits they would have achieved if this proposal were implemented. From that table, I believe you can see who the laggers are: the validators with, say, the top four or five achieved vote credits every epoch would each have significantly reduced credits, because a significant fraction of their votes are delayed, and I think that is evidence that this is what's happening.
In fact, I think that, economically speaking, the ideal voting pattern right now is to vote only on finalized blocks, because you'll get the maximum possible credits. If you're a private validator, none of your votes will ever be on a fork that has a chance of dying, so you'll never get locked out and you'll always achieve maximum credits. And the validator codebase and the protocol will accept all those votes, because they accept votes up to, I think, two or three hundred or four hundred slots old, so it's very likely that you'll be able to land every vote if you vote in that manner. But of course, that's a strategy that would halt the cluster for everyone in it, so we don't want people doing that, and this proposal is a way to ensure that that strategy, and strategies that artificially delay by any amount, are less profitable or potentially not profitable at all. So that's my presentation on the topic - I'm sorry I didn't share anything on the screen. Are there any questions, and is there anybody who would be willing to partner up with me on this? This topic has been somewhat beaten to death; I know I've proposed it many times and talked about it in Discord many times, so maybe everyone's already familiar with it, and if so, great.

Then I guess the only thing left is: is there anybody who would... I don't know how this works with Solana Labs, how your organization works with the community. And I'm not demanding anything - I don't want to take anybody's time from Solana Labs beyond what the organization thinks is proper for that person. I'm just hoping there can be some kind of collaboration here.

Just to touch on that real quick: in SIMD-1 we outlined that the best way to get your proposal pushed into the codebase is to be able to write it yourself, so that's step one.
I know that you've made some PRs - that's great. The second step is that there's now a contributor access policy, specifically for the Solana Labs monorepo, and you could request triage access to start working with other people to get this through. Other than that, I'll let people from Solana Labs speak to it, since your target is the Solana Labs client for the first implementation, though overall, in the long term, it would be multiple client implementations.

Hey, this is Ashwin from Solana Labs. I mean, the first pull request looks almost done - just keep pinging me. GitHub notifications are pretty hard, so just keep pinging me and Trent every time you have an update. I know it takes a while, because we move fast and you have to resolve all the merge conflicts, but I think you're doing everything right to get it on track. If we're slow to respond, just keep DMing me on Discord.

I didn't know that was an option. I didn't want to pester anybody, but I'm happy to pester.

I mean, I think you did it a couple of times in the past, and I've looked at it every time, so just keep pinging me if you're not getting a response. And I didn't see the SIMD - maybe we should add some reviewers. I'll add myself and Carl, and hopefully we can get that merged too. Great, if there aren't any other obvious questions.

There was a question raised by someone at Jump last time. I sort of brought this up in a validator meeting, and that question, I think, was: does this create any kind of perverse voting incentives for validators? Does it disturb the possibility of achieving consensus, because validators may decide that it's more profitable to commit harder to forks and then potentially get locked out a lot more? And what does that do - does it make the blockchain brittle? That's an interesting discussion to have, and I'm happy to have it.
I've thought about it a lot, and I don't think it does, because right now it can only make things better. Right now, like I said, the best incentive is to simply not contribute to consensus and vote only on finalized blocks. Or, if you don't have the stomach for something that anti-social, you can choose how far you want to wait before voting when you otherwise could vote - and that's what the current laggers, as I call them, are doing. Different ones do it to different degrees; some of them are, I think, much better at it, and as a result they'd be much less affected by this timely vote credits proposal, but all of them will be to some degree.

Hey, Richard from the Jump team here. I'm not sure where that concern was specifically raised, but if anything it was probably more out of caution, since, as far as we know, Solana's consensus protocol has not been formally analyzed. At least internally at Jump, we're having our security auditors look at the current implementation first of all and trying to achieve some formal proofs about how the network reaches consensus. But certainly, I think the Firedancer team gave you the commitment in Discord that we will implement your proposal. It's just a question of how the SIMD gets accepted and how all this lands on mainnet - I think that is mostly out of our control. For now, we haven't worked on implementing Tower BFT for our current client yet, so it's a bit tricky to get started on the work now, but as soon as we do work on it, we can commit to also implementing your proposal, and we really appreciate the work on your side.
No problem - I didn't mean to point a finger at anyone. It was a DM after one of the meetings, and they expressed very valid concerns. I'm not trying to say it was in any way inappropriate - they were good concerns, and I just think that if someone wants to talk about them, I'm happy to, that's all. Because, like you said, it hasn't been formally analyzed. I think there's some math you can do - I'm not that great at math - to say: given the chance of a fork if I make this vote, what is the chance that I'm on the wrong fork versus the potential credit I earn by making the vote? I think there's some self-referential function or something that decides what the benefit is of making a vote, and of course we want that to be structured so that it's best for the cluster, and I believe what I'm proposing is closer to that. But, like you say, until formal analysis is done, I guess I can't prove it.

Yes - sorry - there's also the threshold check, which stops you from voting too far ahead consecutively, so you're still within a balance of not locking yourself out and not being detrimental to the cluster.

And may I point out - I'm sure a lot of people know this already - there are some of us validators who are voting eagerly, committing harder to forks and earning more vote credits as a result, with, of course, the potential of longer lockouts should there be a fork. That's the risk we take, and whether that's a better risk for the cluster overall, I guess I can't say, although thus far, in the past year and a half or two, it hasn't seemed to be a problem. At any rate, the strategy I just described will be worse after this change... well, actually, I guess I'll have to think about it.
I believe it will reduce the benefit of that, because - sorry, go ahead.

We see this kind of change being implemented in other protocols - most notably Ethereum 2, which has an exponential drop-off in rewards for late attestations - so I don't see any reason against implementing timely vote credits either. There's a bit more nuance to how the Solana Labs code works, which might in some cases be counter-intuitive to how BFT should resolve. I do hope that we get this proposal through - I don't know about timelines.

Okay, we've reached time. Thank you all for coming today. For further discussion, the two SIMDs - I'll post them again in the chat - are SIMD-16 and SIMD-33: Zentatsu's was 33, which is timely vote credits, and 16 is what Galactus went over, the application write fees. We can continue the discussion there, and if you want something more ad hoc, there's the core technology channel in the Solana Tech Discord. Thank you everyone for coming today.
diff --git a/agenda/agenda_4.md b/agenda/agenda_4.md index f0bbe1b..a55b7ac 100644 --- a/agenda/agenda_4.md +++ b/agenda/agenda_4.md @@ -1,8 +1,259 @@ -**Meeting Info** -- March 17, 2023 19:00 UTC -- Duration: 30 minutes -- Zoom: To be shared in the #core-community-call channel on Solana Tech Discord +# **[AGENDA 4 - Core Community Call - March 17, 2023 - Epoch Rewards V2](https://www.youtube.com/watch?v=IbviAInuSHk&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=4&t=2s)**

-**Agenda** +- Time: March 17, 2023

-- [SIMD: Partitioned Epoch Reward Distribution](https://github.com/solana-foundation/solana-improvement-documents/pull/15) +- Link: https://www.youtube.com/watch?v=IbviAInuSHk&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=4&t=2s

- **Speaker profiles**

- Haoran Yi (works for Solana Labs)

- **Key concepts mentioned:**

- SIMD 15 Epoch Rewards V2 - https://github.com/solana-foundation/...

Okay, welcome everyone to today's core community call. On the agenda we have Haoran going through epoch rewards v2. With that, I'll let you take it away. Actually - Anthony, did you want to say anything first, or do you want to just skip ahead? Let's go, let's go. Go ahead.

Okay, I'll share my screen. Okay, alright, can we see everything? Good. Okay, so good afternoon, everyone. My name is Haoran, I work for Solana Labs, and in this talk I'm going to give a brief introduction to the new epoch rewards proposal that we are discussing at the following link as a SIMD pull request.

So here is some background information.
We are experiencing a very long block time at the epoch boundary, and the following table shows an example of the block times around epoch 400. We can see that for 90 percent of validators on mainnet, this particular block at the epoch boundary takes around 38 seconds.

Haoran, I believe you're not showing the presentation - you're just showing your code right now. We see Visual Studio Code instead of the PDF. Okay, let me reshare. How about that - can you see it now? Sorry about that.

So let's go back. Basically, we are having a very long block time at the epoch boundary, and this long time is because of paying out the rewards in the epoch boundary block. The more stake accounts we have, the longer it takes, so the current approach won't scale. That's why we are proposing a change to how the epoch rewards are paid out. Instead of paying out the rewards in just one block at the epoch boundary, the new proposal is to spread the reward distribution over multiple blocks, and the new approach divides the distribution into two phases. The first is the reward calculation phase, which computes the rewards that are going to be paid out during the epoch. After the reward calculation phase, there is another phase called the reward distribution phase, in which the actual rewards are credited to the stake accounts.
As I mentioned earlier, reward calculation just computes the rewards to be distributed. Based on the timing we have seen, we think 40 seconds may be a good budget for the reward calculation period; given that each block is 400 milliseconds, we estimate that it will take a thousand blocks for the reward calculation result to become available. The calculation is done by a background service started at the epoch boundary and lasting a thousand blocks. Validators that are faster may finish the computation before the thousand-block height; in that case, the validator needs to cache the result and keep it available until the thousand blocks have passed. Validators that are slower will have to wait until the thousand-block height for the result to be available before they can enter the next phase, reward distribution.

Following reward calculation is reward distribution. Reward distribution happens over M blocks, and to minimize the impact on block production and other transaction processing during the rewarding period, we are targeting rewarding 64 accounts per tick; since each block has 64 ticks, that gives us about 4K total rewards that can be distributed in one block. Reward distribution happens in the block before any transaction processing. Now, since with the new approach the rewards are not distributed in one particular block but spread over multiple blocks, we need to track the progress of the reward distribution, so two system accounts are added to help track it.
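The throughput arithmetic just quoted (64 accounts per tick, 64 ticks per block) can be made concrete. The function names are illustrative, but the numbers come from the talk:

```rust
// Back-of-the-envelope sketch of the distribution partitioning
// described above. Numbers are from the talk (64 ticks per block,
// 64 reward credits per tick); function names are illustrative.

const ACCOUNTS_PER_TICK: u64 = 64;
const TICKS_PER_BLOCK: u64 = 64;

/// Rewards creditable per block: 64 * 64 = 4096 (~"4K").
fn rewards_per_block() -> u64 {
    ACCOUNTS_PER_TICK * TICKS_PER_BLOCK
}

/// Number of distribution blocks M needed for a given number of
/// rewarded stake accounts (ceiling division).
fn distribution_blocks(stake_accounts: u64) -> u64 {
    let per_block = rewards_per_block();
    (stake_accounts + per_block - 1) / per_block
}
```

So, for example, on the order of a million stake accounts would need a few hundred distribution blocks - around two minutes of block time - rather than stalling one boundary block.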
The first one is the epoch rewards history. It is a fixed-size array - to be consistent with the stake history, we chose the size to be 512 - and each entry in the array contains three fields. The first is the total reward lamports for the epoch; the second is how much reward has already been distributed, in lamports - that's the progress; and the last one is the hash of all the rewards that are going to be paid out. That hash is introduced to verify the reward distribution; it's similar in role to the accounts hash, but more specific - it is a hash of only the rewards that are going to be paid out.

The second account to introduce is called the epoch reward reserve. As I mentioned earlier, in reward distribution the rewards are going to be distributed over M blocks, so we introduce a second set of system accounts - basically M reserves - each of which keeps track of the rewards that are going to be distributed in one block. Each has a balance field that describes how much reward is going to be distributed in that particular block, and a hash of all the rewards going to be distributed in that block. Since we have M system accounts, the address of the reserve for a particular block is determined by hashing the base ID with the block height, which gives a unique address for the rewards to be distributed in that block.

As we saw earlier, the root hash is computed by accumulating the hashes from all the reserves. After we credit all the rewards from the reserves, we compare the accumulated hash against the root hash and make sure they match, so that we can be sure the reward distribution is correct.
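The verification scheme just described can be modeled in miniature. Everything here is a sketch: `DefaultHasher` stands in for whatever hash function the proposal actually uses, and the structs are not the real on-chain layouts.

```rust
// Illustrative model of the scheme above: each per-block reserve
// carries a hash of its rewards, the root hash accumulates the reserve
// hashes, and after crediting, the recomputed value must match the
// recorded root. std's DefaultHasher stands in for the real hash.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of(data: &impl Hash) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Reserve address for a block: derived from (base id, block height),
/// giving each distribution block a unique, computable address.
fn reserve_address(base_id: u64, block_height: u64) -> u64 {
    hash_of(&(base_id, block_height))
}

struct Reserve {
    rewards: Vec<(u64, u64)>, // (stake account id, lamports) to credit
}

impl Reserve {
    fn reward_hash(&self) -> u64 {
        hash_of(&self.rewards)
    }
}

/// Root hash accumulates the hashes of all M reserves.
fn root_hash(reserves: &[Reserve]) -> u64 {
    hash_of(&reserves.iter().map(|r| r.reward_hash()).collect::<Vec<_>>())
}

/// After crediting, recompute and compare against the recorded root.
fn verify(reserves: &[Reserve], recorded_root: u64) -> bool {
    root_hash(reserves) == recorded_root
}
```

The point of the two-level structure is that any tampering with a single block's payout changes that reserve's hash, which changes the accumulated root and fails the final comparison.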
Those are the main changes of the proposal, and there are a few differences between the new reward scheme and the older one. The first one is that we restrict stake account access during the rewarding period; that means any withdraw, merge, split, stake, or other stake-account manipulation has to wait until the reward distribution has finished. If any transactions involving those operations are submitted during the reward payout period, they will get a transaction error; we will introduce a new transaction error for it that locks the stake accounts during epoch rewards. That's the first impact. The second impact is that, since the rewards are now going to be paid out over multiple blocks, there will be changes for snapshots and cluster restarts during those reward periods. To accommodate reward distribution over multiple blocks, any snapshot taken during the reward period will include a new field that stores the reward calculation result, that is, how much reward is going to be distributed. When the cluster restarts from a snapshot taken during the reward period, it will have to load the result and resume the reward distribution process as if it was going on before the restart. Yeah, that's the overview of the new rewards proposal; more details can be found at this link, the pull request here.

Yeah, let's go into the Q&A session if you have any questions.
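The stake-account restriction described above amounts to a simple gate on block height. Here is a minimal sketch; the error name is an assumption, and the actual error introduced by the proposal may differ.

```rust
/// Hypothetical error mirroring the new transaction error described above.
#[derive(Debug, PartialEq)]
enum StakeOpError {
    EpochRewardsActive,
}

/// Reject withdraw/merge/split/etc. while rewards are still being paid out.
/// `reward_blocks` is the payout window (M blocks) after the epoch boundary.
fn check_stake_op(
    block_height: u64,
    epoch_start_height: u64,
    reward_blocks: u64,
) -> Result<(), StakeOpError> {
    if block_height < epoch_start_height + reward_blocks {
        // Still inside the reward payout window: stake ops must wait.
        Err(StakeOpError::EpochRewardsActive)
    } else {
        Ok(())
    }
}
```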
Well, if anybody wants to talk, feel free to raise your hand and I will give you access to ask your question. Yeah, thank you for presenting this. Um, I do remember a brief discussion about this proposal in the Discord maybe three or four or five months ago, and I do remember asking this question, so I'm sorry if it's being repeated here. But there are some dashboards that read rewards to be able to, you know, provide information to end users about rewards across the entirety of Solana, and up to this point it's fairly easy to do: you just look at the first successful block of an epoch, and you get all the rewards there. How would a mechanism like that work in this new system?

Would you have to be constantly watching for rewards blocks and sort of, you know, building up that information incrementally, or will there be some point where it'll be obvious that it's done and you can do one query to an RPC server to get all the information? If you just care about the total number of rewards, I think you can still do it at the first block of the epoch boundary. I can address a little bit of this: since the reward slots, or blocks that are going to be paid out, are going to be deterministic, you're actually going to be able to generate a query based on which stake account or validator you want to know the rewards for. So we can actually make RPC queries that function the same; it will need some extension, but it'll be slightly different from what we have today. Per validator? So you'd have to make like 2,000 queries or something, is that what you're saying?
I mean, you can just query all of the blocks in that case. Like, if you need a large range of rewards, you just get the entire payout or distribution block range for those slots: basically a getBlocksWithLimit call covering the duration of the distribution. So you wait until all the distribution is completed, and then make that call, is that what you're suggesting? Yeah. Today, you know, it's over by the thousandth block after the epoch, so you can just make the call right there instead. Right, so today you just know that it's, you know, the first block in the epoch; now it's going to be, you know, the thousandth block, or however many blocks; I don't think any of this is fully parametrized yet. Okay, so someone would have to have code that understands that to be able to compute what that block would be, or there'll be an RPC call to ask that at some point; is that how you anticipate it going?

Yeah, basically. It'll depend on how much complexity there ends up being in the final thing. I don't anticipate it being that difficult; this could probably be an RPC helper rather than an actual RPC endpoint, but I think it'll be fairly trivial to put together a way to get the equivalent. I mean, admittedly, it's going to be a little heavier because you have to get multiple blocks, right. Okay, and I do have another question, but I don't want to monopolize this, so I'll wait to see if other people have a question first. All right, the next question is from Jeff, so go ahead; Jeff, you should be able to unmute to ask your question.
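The RPC-helper idea discussed just above could be as small as computing the slot range a dashboard would scan with getBlocksWithLimit. This is a sketch under the assumption that the calculation and distribution lengths eventually become known parameters; as noted in the discussion, none of this is parametrized yet, so both arguments here are placeholders.

```rust
/// Inclusive slot range a dashboard would scan to collect reward payouts:
/// calculation runs for `calc_blocks` after the epoch boundary, then the
/// distribution spans the next `dist_blocks` blocks. Parameter values are
/// assumptions; the proposal has not fixed them yet.
fn reward_slot_range(epoch_first_slot: u64, calc_blocks: u64, dist_blocks: u64) -> (u64, u64) {
    let first = epoch_first_slot + calc_blocks;
    (first, first + dist_blocks - 1)
}
```

For example, with a thousand calculation blocks and a 100-block payout, an epoch starting at slot 0 would have its rewards in slots 1000 through 1099.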
Just wanted to ask: if there's a cluster restart in the first thousand blocks, is there any change to that procedure, because the snapshot won't have the result in it? Yeah, we have talked about this. In that case, when the snapshot is taken during the reward computation phase, the snapshot has to wait until the result is available and then store the result there. Okay. Zentatsu, you're welcome to ask your question. Okay, so: is it the case that validators will only validate a block if the leader producing that block has done the appropriate amount of either computation work or reward payout work in that block? Or is it the case that a validator can just sort of decide not to do it on their block and be like, "I've modified the code to not do this because I don't want to burn the CPU on this and it doesn't give me any rewards, so I'm just not going to do it"? Is that even a possibility, or is it the case that the cluster won't even accept a block unless it includes these details, because they're expected, I mean they're sort of built into the protocol, so this has to happen in this order? The order is deterministic for the current epoch; there will be some randomness across different epochs.
For example, it doesn't matter whether it's the leader or not: in fact, if the validator doesn't include the expected set of rewards in the block, the hash would mismatch, and its block would be rejected. Okay, so this part of the block does go into the hash, so validators have to agree on it to vote on it. So I guess that answers my question, and we just kind of accept that the first leaders kind of have to do this work without any compensation; that's just built into the protocol. There's no, you know, need to add any compensation for doing this work; it's not expected to impact the transactions that could fit into a block, like it doesn't take up extra block space or extra compute time that could prevent a validator from including as many transactions as it otherwise would. It's the same problem as we have right now, where whatever that first leader, or whatever that first block is, kind of gets screwed on including transactions because everyone is computing this; so this is just spreading out the computation to make it more feasible.
I was just wondering if there's an opportunity to address that even more by, sort of, paying a validator for having to do this; just putting it out there. Um, it spans a thousand slots, right, so it kind of spreads the work out over everyone; you'd be kind of paying everyone equally, except the ones that are unfortunate and only get one block in the first, well, whatever it takes. That's cool. All right, are there any other questions about this, the epoch reward changes? Okay, cool. So this SIMD is still open; I put the link in the chat for everyone to add to the discussion later, SIMD 15. You're welcome to discuss it for the rest of the time on this call. I'd like to open the floor for anybody that has just general Q&A questions; feel free to raise your hand and I'll give you access to speak and you can ask your question. If there's nobody with questions, we can end this early, so I'll give it a few more moments. Did you guys have a, you know, third-year anniversary party or anything like that?

Thank you, everyone, for joining today. Um, there will be a space afterward, so if anybody has any questions that they didn't ask here, you're welcome to join; it will be on Twitter, you can find it via the @solana_devs account, and we'll be happy to chat there as well. Thank you, guys, for today.
diff --git a/agenda/agenda_5.md b/agenda/agenda_5.md
index ecc783b..93aebd5 100644
--- a/agenda/agenda_5.md
+++ b/agenda/agenda_5.md
@@ -1,9 +1,326 @@
-Meeting Info
-April 17, 2023 18:00 UTC
-Duration: 30 minutes
-Zoom: To be shared in the #core-community-call channel on Solana Tech Discord
-Agenda
-- Retroactive proposal for QUIC connection handling in TPU
-- [SIMD-0047](https://github.com/solana-foundation/solana-improvement-documents/pull/47) Syscall to get the last restart slot
\ No newline at end of file

# AGENDA 5 - [Core Community Call - April 21, 2023 - QUIC, Syscall for Restart Slots](https://www.youtube.com/watch?v=AEnkivbha0k&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=6)

- Time: April 21, 2023
- Link Video: https://www.youtube.com/watch?v=AEnkivbha0k&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=6

**Speakers profiles**

- Max from Mango: https://twitter.com/m_schneider

**Key concepts mentioned:**

- Syscall to get the last restart slot - https://github.com/solana-foundation/...
- QUIC Tuning Parameters

Welcome to this month's core Community Call. I have posted the agenda in the chat. So, Max from Mango will be presenting on SIMD 47, the new syscall for the last restart slot, and then he also wants to talk about how the different tuning parameters were chosen for QUIC. And Max just joined; I will promote you as a panelist. Go ahead, Max. Sorry for being late.

So, the floor is yours, Max.
You can go ahead and get started with whichever one you want to start with: either the QUIC tuning parameters or the SIMD that you've been working on. For the QUIC tuning parameters, the thing that was, like, my request, what I was hoping for, was that we have someone who worked on it, you know, lead the discussion. I don't know if someone is here that worked on it; I can at least give the outside perspective of reverse engineering some of those things, but, you know, reverse engineering requirements is very difficult sometimes. That was the main request, that we get someone who knows all the requirements, maybe, to present them. If there's no one here, then maybe we put it off for next time.

Could you talk a little bit about how y'all chose the tuning for QUIC? I can talk a little bit. We had a somewhat specific performance idea in mind. We wanted it to be in that 50 to 100K TPS (transactions per second) range on the front end, or, like, at most 100K TPS. So we tried to split the streams that way and allow enough streams for each client to be able to fit within that. Our approach was mainly benchmarking in the test environment and seeing what bandwidth we could obtain under a somewhat optimal connection, like the worst-case TPS in a real validator, and trying to calibrate that to what we thought the back end behind QUIC could sustain or handle. Also, we had to consider keeping the memory within something that was feasible for a normal validator. Obviously, if you have more streams, you need more memory to hold all the packets that are in flight and potentially reconstruct a transaction that might cross multiple packets.

So, the more outstanding streams you allow, the more memory you need to frame all those incoming streams. There was one limit we found, and we were curious about it: I think it's like 2,000 transactions every 50 milliseconds, like a leaky bucket model.
So that would mean that's just sigverify's rate, correct? I mean, I think we were talking about it in Discord, right? That's not like a hard limit. The 50 milliseconds is kind of a target for that stage: it's spending time in sigverify, and things coming in behind it, into the channel, can get backed up. So we want to make sure that we're draining the channel pretty frequently, like within 100 milliseconds, to ensure that the channel can't back up faster than the incoming packet flow.

So, I mean, it isn't a hard requirement. Like, if you have a machine that can verify 10,000 transactions in 50 milliseconds, it will clear all of those, bring up the new batch, and verify it. It's just a way to keep the queue from filling up exponentially. Right, if you have a large batch and you let it fill in behind you, and that's even larger than the next batch, then you have, you know, an unbounded condition, and that's not good for the memory in the validator. There are a couple of different solutions to that, but, you know, we didn't like the bounded-channel solution because you can't really have good visibility into the channel; you can't really drop intelligently inside the channel. So we felt it was better to just pull everything out of the channel, and we have that kind of, you know, random-drop thing for when we're getting an extreme amount of packets and really don't have time to look at anything in the packet list at all. But if we're in a less extreme scenario where the machine can handle the flow, you know, we do a round-robin between senders, kind of dropping. But that algorithm has to handle cases like a really wide pipe coming in with a very small amount of compute, and then a very narrow pipe coming in with a large amount of compute, right?
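The "2,000 transactions every 50 milliseconds" leaky-bucket pacing mentioned above can be sketched as follows. This is illustrative only, not the validator's actual code; timestamps are passed in rather than read from a clock so the behavior is easy to see.

```rust
/// Minimal leaky-bucket sketch of per-window transaction pacing
/// (illustrative; the real validator logic is more involved).
struct LeakyBucket {
    capacity: u64,     // max transactions per window, e.g. 2_000
    window_ms: u64,    // window length, e.g. 50
    level: u64,        // transactions admitted in the current window
    window_start: u64, // timestamp (ms) when the current window began
}

impl LeakyBucket {
    fn new(capacity: u64, window_ms: u64) -> Self {
        Self { capacity, window_ms, level: 0, window_start: 0 }
    }

    /// Returns true if a transaction arriving at `now_ms` is admitted.
    fn try_admit(&mut self, now_ms: u64) -> bool {
        if now_ms.saturating_sub(self.window_start) >= self.window_ms {
            // The window has elapsed: drain and start a fresh one.
            self.window_start = now_ms;
            self.level = 0;
        }
        if self.level < self.capacity {
            self.level += 1;
            true
        } else {
            false // over the target rate for this window: shed load
        }
    }
}
```

As the speaker notes, this is a pacing target rather than a hard cap: a machine that clears its batch early simply starts admitting from the next window.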
So, all those variables can be different across machines; trying to handle any kind of configuration, I guess, makes sense. I think this limit is a bit surprising, right? Because, at least for us, it was a bit surprising.

There's another one, I think it's the receive window size, that we discussed a few times, and I think the idea was that you have, basically, based on the stake, different receive window sizes, which limits the number of parallel streams there can be per connection. It's like eight connections per identity, and then, depending on the stake, each connection can have a certain number of streams. The eight connections are just so that, if you had clients behind a router, or you had a race condition where you got disconnected and needed to reconnect, you might have, you know, some connections overlapping; so it's just not to kick you out immediately if you had a stale connection in the connection pool.

But we're using the streams to throttle based on stake, and then, of course, a budget for unstaked as well, and those are subject to tweaking, I think. We wanted to roll out the first version and then, kind of, you know, monitor the metrics and update them potentially as we see, you know, the use in the validator. I mean, they seem to have been fairly reasonable defaults, but open to tweaking, I think. Just a question for any of the people from the Firedancer team: I know that you all have been working on QUIC. Did y'all choose similar receive window parameters in tuning, or have you all not gotten to that part yet? I don't think Nick is on this call; he'd be the best person to answer it. Let me go see if I can find Nick. Okay, thank you. I think he's on vacation today. Oh, then I probably won't be finding Nick. I don't have the answer to that off the top of my head.
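The stake-based stream throttling described above, with a separate budget for unstaked connections, could be sketched like this. The proportional-allocation rule and all numbers here are assumptions for illustration, not the validator's real parameters.

```rust
/// Hypothetical stake-weighted stream allocation: each staked peer gets a
/// share of `total_streams` proportional to its stake, while unstaked
/// connections fall back to a flat `unstaked_budget`.
fn streams_for_peer(
    peer_stake: u64,
    total_stake: u64,
    total_streams: u64,
    unstaked_budget: u64,
) -> u64 {
    if peer_stake == 0 || total_stake == 0 {
        return unstaked_budget;
    }
    // Streams left for staked peers after reserving the unstaked budget.
    let staked_pool = total_streams.saturating_sub(unstaked_budget);
    // u128 intermediate avoids overflow for lamport-scale stake values.
    ((staked_pool as u128 * peer_stake as u128) / total_stake as u128) as u64
}
```

This also illustrates the later point in the discussion: an operator-supplied bandwidth hint would only rescale `total_streams`, while the stake-proportional split across peers would stay the same.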
Okay, maybe we should sync as we go further into the implementation on what the different parameters are that we have seen, because, as you said, Steven, these were good defaults to start with, but knowing what gives better performance in the long run would be nice. So, a quick question about the stake-based variables, right? So, more stake generally gets more bandwidth and more resources and, you know, more say in getting transactions into the validator; so that's the idea, which is really cool and, like, novel; it's like a Solana innovation, you know, you haven't seen that before. Zentatsu, did you have a question on this? I just wanted to expand a little bit on that. Is it because larger stake is assumed to have larger pipes, or is it because larger stake actually, sort of, does more in some way that requires different variables? The only reason I ask is that if larger stake is assumed as a proxy for larger pipes, then maybe it would be better to have larger pipes actually be something that one could specify on the command line or something. You could say how much bandwidth you expect to be able to utilize, and then smaller validators can do more if they're able to, and larger validators could do less, if that makes sense. Did I come through there? It's not necessarily a hardware thing, although higher-stake validators should have more, you know, hardware, more resources, I guess, more hardware to handle a higher load. I have 10 gigabits, but maybe JP has 100 gigabits or something, and they may have more or less stake than I have. So I was just wondering if stake is assumed as a proxy for bandwidth, or if it's something else that causes these variables to need to be different.
Stake is the proxy; it's just that stake is really the only, you know, Sybil-resistant identifier that we have in the network, right, to determine how many resources and things you should be in control of, essentially, like how much block space you should have. I mean, I would say, you know, those overall bandwidths could be a scaling thing; if the validator had customized hooks for scaling the overall numbers, right, you could do that. So, let's say the validator has, you know, 100,000 connections or outstanding streams now; maybe if you had a 10-gig or 100-gig pipe you would want to make it, like, you know, 10,000 or something; but you would still want to distribute those streams across the stake.

That's kind of what I'm saying: if there's any opportunity for an operator to custom-tune their numbers in ways that better match their actual configuration, regardless of their stake, that's kind of what I was getting at. There isn't today, in terms of a command-line flag or anything, but that potentially could be added if it seemed to be a limiter, or helpful.

In the interest of time, we probably should move on to the SIMD that you wanted to talk about, Max. So, if you want to go ahead and chat about that, we can get started there.

"Okay, let's open this file. This is the newest one we wanted to propose. Basically, the motivation here is, on a high level: what do DeFi protocols do when the cluster restarts? I think some protocols have built in certain slot limits for orders or things like that. But, especially in lending protocols, there's a lot of 'first come, first served.' Basically, when the network starts up, we need to assess that not necessarily all RPC nodes are already at the right state. That's something we've seen before: sometimes a lot of things will not work on a coordinated restart.
So the idea here is: right now, it's just a little bit random and uncontrolled how things behave, and we can expose to the application developer that there was a controlled restart. This is currently tracked as the last hard fork on the bank in the reference client. So this is basically just a very simple syscall that allows you to access that data and then implement custom logic. Maybe you want to lock down liquidations for another 100 slots until the oracles have time to update, etc. It could be that the chain, or the prices of certain assets, were in a very different state before the restart occurred. So this is just more programmability. The proposal goes into detail about what exactly we want to expose, and I think there was some feedback already that this would also be a good mechanism to expose other things. So, just curious, you know, before we touch it and create an implementation: if other data is needed beyond just the slot, it would be good to add it to the proposal now. So we want to do one pass over it and make sure it's everything we need."

"I think there's a question in the chat for you, Max. The question is: if the protocol allowed validators to set the slot timestamp to the current time on restart, instead of having to catch up gradually, would that be another solution?"

"Oh, so I did some analysis on how the timestamp works right now after the last restart. I'm trying to find it, but I'll post the data later. Basically, you have the restart slot, and then usually it's a few more slots to go, and only once those votes come in does the cluster time actually get updated. So, I think around three slots after restart you'll actually have a correct, or somewhat updated, cluster time. So suddenly you'll have jumps from before the restart to after the restart.
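The application-side pattern described above, e.g. locking down liquidations for another 100 slots after a restart so oracles can catch up, might look roughly like this. The last restart slot would come from the proposed syscall; the constant and function names here are illustrative, not from the proposal.

```rust
/// Cooldown after a cluster restart during which liquidations stay locked,
/// giving oracles time to update (the 100-slot figure is from the talk's
/// example, not a fixed parameter).
const LIQUIDATION_COOLDOWN_SLOTS: u64 = 100;

/// `last_restart_slot` would be fetched via the proposed syscall/sysvar.
fn liquidations_enabled(current_slot: u64, last_restart_slot: u64) -> bool {
    current_slot.saturating_sub(last_restart_slot) > LIQUIDATION_COOLDOWN_SLOTS
}
```

A lending protocol could consult this gate at the top of its liquidation instruction and reject the transaction during the cooldown, rather than relying on cluster time, which, as discussed next, jumps around a restart.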
But these are the things that can cause actual issues in protocols, right? A lot of the time, the cluster time is supposed to move forward fairly smoothly, but in those moments it doesn't; there's a real disconnect there for a few slots, and then transactions usually start piling in around that time as well, as the cluster time starts moving again due to user transactions, not bot transactions."

"You know, I think this is just one of the effects that we're seeing, right? The one you're describing is one of the symptoms that we can circumvent with this measure. But I think, in general, this information is a little bit richer. There were some ideas about lending protocols blocking liquidations and allowing people to deposit more collateral, kind of like a margin-call scenario where you have at least two minutes to top up your balance because there's a freaking cluster restart. You know, it's not happening every day, and it's maybe not the user's fault that they couldn't manage their position. I think that was one of the main reasons why we wanted it to be as configurable as possible and as exposed as possible to the application developer, rather than just fixing one particular issue about cluster time."

"Okay, cool. So if anybody has any questions, they can voice them now. If not, there's also a link to the PR for the SIMD in the chat, so if you can't get to it here, we can have the discussion on the SIMD as well. I'll allow some time for anybody that has questions on this excellent idea for proper runtime development."

"Two things. One is terminology: I think it's also written in the comments that restart slots and hard forks are two different things.
So a hard fork would be a scenario in which part of the cluster on purpose tries to diverge from the rest and make up their own cluster, essentially, which is probably not what is meant here. The other thing is that we are trying to move away from syscalls specifically, and in the future we will be replacing them completely with builtin programs for all the things which actually compute anything. But in this case, this is just a lookup of some global value without any computation behind it, so it should probably go into a sysvar. I can't say what the JP team is going to do, but in our implementation we feed all the syscalls from sysvars anyway, so that would only be, like, a trivial identity function, which is kind of useless in that sense. Just saying: it's probably easier to start off with the sysvar and think about what else you want to put in there. Does anyone have proposals for what else to put in?"

"Well, I think if no one has any other proposals, we can end here. We can continue the discussion on the SIMD, though. You have a question? Go ahead."

"Is there ever value in knowing more than the last restart? Like, say there were two restarts within, I don't know, two hours, because there was some problem that recurred. Would that be useful to know, or is it always only the last restart that is useful to know? In other words, does it need to be a single value, or do you want it to be, like, a set of the n most recent restarts?"

"I think I would rather see what adoption looks like on the single value and then go from there. Having a new feature with historic values... I find it hard to reason about historic values. Well, but I mean, in particular, you're identifying a use case that you believe covers something that's important to you.

I'm asking you: if you had two restarts that happened within four hours, would that change what you would want to know, or not?
No, probably not, but I think you want to get back to normal operation as quickly as possible, and I think the delays, probably within 100 slots for most applications, are just very, very short as far as the time frame is concerned. I wouldn't know what a restart four hours ago would change about the current situation, right? Okay, thank you; that answers my question, thank you.

I had a follow-up for Alex. You mentioned that the current hard-fork slot is currently synonymous with a restart, but there are other ways to trigger a restart, and there's no consensus-stage data shared about those restarts, right? Or is there another way to get those? That's kind of the other problem, the security issue: how are you going to reach consensus about the restarts, right? But I was really talking just about the terminology: a hard fork means that you are diverging on purpose, because part of the cluster wants to do something else; otherwise, it's just a fork like any other fork that will get removed at some point. It's a slow block, right?

Basically, I think the differentiation between a slow block and a restart, on the application side, doesn't really make a difference, right? Whether you have a one-hour slow block or you have a hard fork, it's probably the same; whether it took, like, an hour to organize or 24 hours, so what? I mean, there's no official terminology for this or anything, but when people, for example, say Bitcoin and Bitcoin Gold, these are hard forks; that's what they did there back in the day, because the community actually switched to an entirely different protocol version at the restart slot. And this is not...
I mean, this obviously depends on the exact restart scenario, but if you are just rerunning the same version again, or only minor bug fixes and all that, and we all do the same thing, then this is not really a hard fork, because you're not forking off from anything else.

But what people understand by that term, a hard fork, means that there are then two blockchains and two sets of validators and two networks, essentially, which do different things. And, I mean, usually one tries to avoid that scenario of hard forking and instead come back to one consensus, one global network.

All right, interesting. I'm going to cut in here, because we are one minute overtime. Let's bring this discussion further into the SIMD; I will post it again in the chat, and then we can continue the discussion on this specific SIMD, 47, with Max and Alexander and more. But thank you all for coming to this month's core Community call. Thanks for hosting us, Jacob.

diff --git a/agenda/agenda_6.md b/agenda/agenda_6.md
index 6fd4bab..800cbe2 100644
--- a/agenda/agenda_6.md
+++ b/agenda/agenda_6.md
@@ -1,8 +1,337 @@
-**Meeting Info**
-- May 19, 2023 18:00 UTC
-- Duration: 30 minutes
-- Zoom: To be shared in the #core-community-call channel on Solana Tech Discord
-**Agenda**
-- [SIMD-0046](https://github.com/solana-foundation/solana-improvement-documents/pull/46) Optimistic cluster restart automation

# AGENDA 6 - [Core Community Call - May 19, 2023 - Optimistic Cluster Restart Automation](https://www.youtube.com/watch?v=GA5AVg_svj8&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=5)

- Time: May 19, 2023
- Link Video: https://www.youtube.com/watch?v=GA5AVg_svj8&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=5

**Speakers profiles**

- **Will:** Mentioned at the beginning of the call, Will is going to
talk about the upcoming release changes.
- **Wen:** Wen has been part of Solana Labs since last September and is working on consensus.

**Key concepts mentioned:**

- Optimistic Cluster Restart Automation - https://github.com/solana-foundation/...

Welcome, everyone, to this month's core Community call. There are a couple of things on the agenda. The first thing we're going to hear, from Will, will be about the upcoming release changes that they're pushing, and then we'll hear from Wen on SIMD 46.

Before we get started, I wanted to first call out that if you ever want to see the agenda for the meeting, you can always go to the core Community call repo, which I just linked in the chat. And then, there are two PRs that are getting close to consensus, so it'd be great if people could find time to give their review on them. The first one is PR #33, which is the timely vote credits. And the other one, which we actually already merged in but would love to have any extra eyes on, is PR #15, which is partitioned epoch rewards.

Go ahead, Will, you can give your spiel real quick. Thanks. So, as you probably all know, the upgrade of mainnet to 1.14.17 is underway. We asked for 25 percent at the beginning of this week, and we're currently at 27 percent. Thank you; we appreciate the overachievement there. Everything's looking good. Our plan is to ask for general adoption at the start of next week, so you can anticipate that request. In case anyone hasn't seen it yet, I would encourage you to read the outage report from the February outage. At the time, it was widely believed that the outage was caused by the 1.14 upgrade; that was definitely not the case. I'll drop a link to that report in just a moment in the chat here. It's also linked in the mainnet-beta announcements channel on Discord.
We also have some audit reports that we'll be publishing later today related to 1.14. Feel free to peruse those. There's nothing scary in them; if there were, we wouldn't be shipping the release. But, you know, they might give you some peace of mind and some interesting reading. So, thanks to everyone who's already upgraded, and looking forward to getting the rest of the cluster on 1.14 soon.

All right, Wen, you're up. Oh, okay, let me share my slides first so I can present. Sorry. So hello; to those who I haven't met yet, I'm Wen, and I joined Solana Labs last September. I'm working on consensus, and I'm interested in high-performance and high-reliability system design.

Hmm, why don't I see my slides to share? I had them previously; let me do it again, sorry for wasting time. I can share my whole screen; maybe that'll work. Okay, do you see the slide now? Yep.

Okay, sounds good. So, motivation and requirements. We are interested in making Solana more reliable, of course. For high reliability, there are many things you can do. First of all, you can write very high-quality code; then you have fewer outages. But no code is ever perfect, so you need testing to improve it. But testing can't catch all problems, so you need monitoring to tell you what problems you have and when.

When monitoring tells you there is a big outage, you need outage handling to get the cluster back to a sane state. So, there are a lot of efforts inside Solana to improve code quality, testing, monitoring, and other things like auditing and formal verification. Today, the proposal is only about outage handling.

So, first of all, let's look at how we handle outages now.
When I'm talking about outages, I'm talking about the kind of outage
where the whole cluster just couldn't make progress anymore. What we
need to do is restart the validators and get them to a sane start
state so the cluster can continue functioning again. This is called a
cluster restart, which I have a link for here. It's very different
from a sporadic single-validator restart, which happens all the time
and normally doesn't impact reliability at all, because you still have
a lot of validators functioning.

So, what we do now: first of all, we try to find the highest
optimistically confirmed block. Optimistically confirmed means a block
which got the votes of a majority of the validators; we use two-thirds
here. When a block is optimistically confirmed, ideally we shouldn't
roll it back, because it may contain user transactions. If you roll it
back, it will have very big economic impacts.

So the whole design goal in this proposal is to try not to roll back
optimistically confirmed blocks if possible. In reality, today we also
try to do the same. But since today we don't really have an automated
process to do this, we use what's called social consensus. When there
is an outage, people gather in the Discord channel. They first
confirm, "Yes, there is an outage. The validators are not making
progress." Then they decide to go for a restart, because the cluster
doesn't seem to be recovering. Then they try to see where to restart
from.

So, we need one block which everyone restarts from. And to make sure
we're not rolling back user transactions, we need to find this highest
optimistically confirmed block where we agree we will start.
And what we do now is, we would normally do a vote in the Discord
channel, saying, "My local confirmed block is X; what do you see?" If
most people vote for X, then we go with X. That's how we do it now.

After we decide which block to start from, the validator operators
would stop their validators, and sometimes, if there's a bug that
could lead to an outage again, they might install a new binary. But
that doesn't happen very often. Next, because we decided we need to
start from this block, everyone needs to have the same block. So you
would create a snapshot with a hard fork at the block we decided on.
And if you don't have that block locally, you would download it from a
trusted source.

After that, you would restart validators with two arguments: one is
"wait for supermajority at slot X," and the other is "I expect the
bank hash at this slot to be such-and-such." This makes sure you
restart with the correct snapshot and that you have the correct block
and correct hash. Then you wait for 80 percent of the people to reach
the same state as you, and the whole cluster begins to function. You
start to make new blocks again, you start to vote, and everything goes
back to normal.

There are a lot of problems with this current restart process. Maybe
the biggest problem is that it takes a long time; the whole cluster
restart can take several hours, which makes your reliability not very
good. In the future, we might have other efforts to make it faster.

Today, we're focused on solving a small problem, which is improving
the process of finding the highest confirmed block. It's quite
challenging for a human to manually review the votes of two thousand
to three thousand validators to determine the block.
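The manual tally described above, which the protocol aims to automate, can be sketched roughly as follows. This is an illustrative sketch only: the function name, data shapes, and the flat (ancestry-ignoring) tally are assumptions for exposition, not actual validator code.

```python
# Hypothetical sketch of the manual tally: given each validator's
# reported highest locally confirmed (slot, bank_hash) and its stake,
# pick the highest block backed by at least 2/3 of total stake.
# Names and structure are illustrative, not the validator's code.

def highest_confirmed_block(reports, stakes, threshold=2 / 3):
    """reports: {validator_id: (slot, bank_hash)}; stakes: {validator_id: stake}."""
    total = sum(stakes.values())
    tally = {}  # (slot, bank_hash) -> accumulated supporting stake
    for v, block in reports.items():
        tally[block] = tally.get(block, 0) + stakes[v]
    # A validator confirming slot S implicitly confirms S's ancestors too;
    # this flat tally ignores ancestry for simplicity.
    confirmed = [b for b, s in tally.items() if s / total >= threshold]
    return max(confirmed, default=None)  # highest slot wins

reports = {"a": (100, "h1"), "b": (100, "h1"), "c": (99, "h0")}
stakes = {"a": 40, "b": 30, "c": 30}
print(highest_confirmed_block(reports, stakes))  # 70% of stake -> (100, 'h1')
```

With two to three thousand validators reporting, doing this by hand in Discord is exactly the bottleneck the proposal targets.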
Thus, we aim to develop a protocol that enables machines to
automatically find the highest confirmed block without human
intervention. Here are the design goals:

Avoid false negatives: if a block was confirmed before the restart, we
should not discard it. Doing so would have severe consequences.

Allow false positives: it's acceptable if some blocks were not
confirmed before the restart but are mistakenly considered confirmed.

Let's consider a scenario with a 67% threshold. If a slot receives 66%
of the votes and no competing blocks exist, it's fine for 80% of the
validators in the restart to decide to start from there, even if it's
not technically confirmed. Confirming more is acceptable, but rolling
back confirmed blocks is not allowed. We'll prioritize avoiding false
negatives in all design choices, and false positives may be tolerated.

The proposed approach incorporates a silent repair phase where
validators negotiate amongst themselves to determine the block they
should restart from. During this phase, no new blocks will be created,
and the aim is to converge quickly. All validators should stick to
their votes from before the restart; no changes are allowed.

To achieve this, we'll use gossip to exchange most information during
the repair phase. To prevent interference between validators who have
restarted and those who haven't, we'll use a new shred version to form
separate gossip groups. Two new gossip messages will be sent: one
containing last voted fork slots and another to ensure everyone shares
the same metadata. This ensures that all validators have the same data
and metadata by the end of the silent repair phase, allowing them to
make unanimous decisions and start from the same block.

So, last voted fork slots is where we share the last vote before the
restart, because the validators might be on different slots.
Some people vote faster, some people vote slower, so just the one last
vote slot is normally not enough. So we would send the slots on the
same fork, covering roughly the last several hours, so that people can
get the whole fork and a better view of the metadata. Then, after you
have repaired everything and you have all the metadata, you send out
your new vote. We don't call this a vote, to distinguish it from a
normal vote; here we call it the heaviest fork, but it's effectively
just a vote. So after all the repairs are done and you have all the
data, you send out where you think we should restart from. You also
send out how many heaviest fork messages you received from other
people, because we need to decide when to exit this silent repair
phase. Once we see that 80 percent of the peers received this heaviest
fork message from 80 percent of people, we check whether everyone
agrees on the same block, same slot and hash. If yes, then we exit
this phase and proceed to a real restart, doing what we currently do.
Otherwise, if any check fails, we just stop, print all the debugging
information so a human can intervene, or maybe switch back to the old
restart method. How much time do I have? I'm probably a little bit
slow.

We have 11 minutes for both finishing and questions, but we can see
how far we can get. Okay, that's fine.

You are welcome to read the slide. I'm just going to generally
introduce the silent repair phase, and then we can proceed to
questions. So first of all, when you restart a validator with this new
arg, we immediately send the last voted fork slots, which is what you
voted for last and all the slots on this fork. After that, everyone
aggregates the last voted fork slots from all the other restarted
validators, and you start repairing a slot if you think this slot
could potentially have been optimistically confirmed before the
restart.
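The aggregation step just described can be sketched as below: each restarted validator gossips the slots on its last-voted fork, and we repair only slots with enough supporting stake to plausibly have been confirmed. The 42% "line" here is an illustrative assumption, not a number stated in the talk or the SIMD.

```python
# Hedged sketch: aggregate everyone's last voted fork slots and keep
# the slots whose supporting stake is above some line; anything below
# the line cannot have been optimistically confirmed, so we don't
# repair it. The 0.42 threshold is an assumption for illustration.

def repair_candidates(last_voted_fork_slots, stakes, line=0.42):
    """last_voted_fork_slots: {validator_id: set of slots on its fork}."""
    total = sum(stakes.values())
    support = {}  # slot -> accumulated stake that voted on it
    for v, slots in last_voted_fork_slots.items():
        for s in slots:
            support[s] = support.get(s, 0) + stakes[v]
    return sorted(s for s, st in support.items() if st / total >= line)

votes = {"a": {10, 11, 12}, "b": {10, 11}, "c": {10}}
stakes = {"a": 34, "b": 33, "c": 33}
print(repair_candidates(votes, stakes))  # slot 12 has only 34% support
```

In the real protocol this runs over gossip among restarted validators only, separated from the old cluster by the new shred version.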
And we could draw a line somewhere to say these are the candidates
that could have been confirmed before the restart, and the other
blocks we don't care about. So you repair all the blocks you care
about, and after we repair all of them, you aggregate all the last
voted fork slots and choose your heaviest fork. So I think we are at
20 minutes now. I don't know whether I've introduced the new method
well enough that everyone has a good grasp, but we could see if anyone
has any questions at this point.

I can proceed if no one has questions. Does anybody have any questions
on the current approach, or should we just continue? Alright, go ahead
and continue then.

Wen? Okay, so... exiting the silent repair phase. To exit the silent
repair phase: I think the most important thing in outage handling is
to make sure everyone is on the same page. Otherwise, if you think
everyone's on the same page and you single-handedly enter the restart
and start making new blocks while everyone else is still repairing
blocks, that would be disastrous, and another outage might happen. So
here we are very careful when we exit the silent repair phase. We
count whether enough people are ready for action.

The current check is whether 80% of the validators received the
heaviest fork message from 80% of people. We chose 80 here because
this is the current line we draw when we do a restart: we wait for 80%
to join the restart, and then we proceed. And even after we see that
80% of people responded, we also linger for maybe two minutes, because
gossip message propagation takes time, and my heaviest fork message
contains how many responses I got. Then everyone can perform security
checks.

Check one is: did everyone agree on the same block, meaning the same
slot and the same hash? Also: was there a local optimistically
confirmed block before the restart?
So because a local confirmed block means that before the restart I saw
two-thirds of the votes on this block already, you would imagine this
block should be on the selected fork: it should be an ancestor of, or
be, the selected block. If that's not the case, then the security
check fails, because something's wrong; my local confirmed block is
getting rolled back. Then you would also exit and halt and wait for an
inspection. And if every check succeeds, then we perform the current
restart logic. We will probably clear the gossip CRDS table so that we
don't carry the old restart messages into the new environment, and
then we would automatically start snapshot creation.

In contrast to what we're doing today, where we manually ask people to
do it using the ledger tool, here we might start earlier, once we find
out everyone agrees on the same block. Then we exit and execute the
same logic we are doing now.

I have a few links. One is a Google doc from when I started this
proposal; it contains many details and a lot of designs we rejected
and why. Because I'm currently modifying both the SIMD draft and this
Google doc, the Google doc might be outdated, and some design choices
might be different; if that's the case, the SIMD is the newest
proposal. The actual SIMD draft is here as well.

I also have some description here. Let me know if you have questions.
Do you have questions? Can you hear me? Yes, we can hear you. Yep, so
actually, I'm from the Mango team, and we are currently implementing
SIMD 47, which is the last restart slot. So I'm wondering if this
silent repair stage should also be considered a restart slot. Sorry, I
haven't checked that doc in detail, so I think your proposal, correct
me if I'm wrong, is to expose the last restarted slot somehow through
the ledger, right?
Exactly. So here I think it's orthogonal, but once the silent repair
phase is over and we know where we are restarting from, we could also
connect to your code and expose this information somewhere, right? So,
when we're in the silent repair phase, we don't know for sure it's a
restart. The validators are negotiating, but they don't know yet
whether they can decide on the same block. In that case, I don't think
we will expose anything; once that phase is over and we know we are
really entering the restart, then we can expose that restart slot. Did
that answer the question? Okay, it makes sense. There shouldn't
actually be any changes to that proposal, because once we've committed
to restarting at the coordinated restart slot, there's going to be a
hard fork anyway, and that'll update this sysvar. Okay, that's cool.
So that works nicely together.

Any other questions? We have a question. I... I muted you. Say yes, so
I just... Can you hear me? Okay, yep, okay, sure. So, as a validator
operator, I want to say it's very encouraging to see how much really
good thought and design seems to have gone into what you're proposing
here.
I predict that during another restart event there would still be
hiccups that trigger alerting, and there'll be a lot of confusion and
talk. So if we're kind of racing some automated process, just keep in
mind that you may want controls there, and certainly messaging from
the validator, to let us be very aware of what's going on so that we
can make decisions. Because if someone believes there's a bug, or a
reason the restart shouldn't proceed, we don't want something to run
away with a whole cluster restart that we want to pause. Just having
controls and informational messages to help us understand what's
happening is very important. I just want to emphasize that, that's
all.

Yes, I think maybe later I will try to get more feedback from the
operators, because this really impacts how you operate during an
outage, right? So it helps to get more feedback there. First of all, I
totally agree, and the current approach is opt-in. If you don't
restart your validator with that flag, nothing will happen; we keep
the current approach if you feel more comfortable with that. And of
course, the outage handling is mostly to assist people, not to replace
people.

And we'll also, of course, give you ways to inspect what's happening
inside and ways to decide, "No, this automatic restart is not working;
I should do something else." That's totally doable. It will be all
command-line controlled. Does that answer the question? Yes, it did.
If there's anybody else that wants to ask further questions on this
SIMD, which I posted in the chat, we can take the discussion there.
And thank you all for joining another Core Community call. You all
have a good month. Thanks.
diff --git a/agenda/agenda_7.md b/agenda/agenda_7.md index fd6f5e5..7a95e3e 100644 --- a/agenda/agenda_7.md +++ b/agenda/agenda_7.md @@ -1,8 +1,376 @@ -**Meeting Info** -- June 16, 2023 18:00 UTC -- Duration: 30 minutes -- Zoom: To be shared in the #core-community-call channel on Solana Tech Discord +# **[AGENDA 7: Core Community Call - June 16, 2023 - Light Clients](https://www.youtube.com/watch?v=9m_M8zEw1cE&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=7)** -**Agenda** +- Time: June 16, 2023 -- [SIMD 0052](https://github.com/solana-foundation/solana-improvement-documents/pull/52) +- Video:
  > [https://www.youtube.com/watch?v=9m_M8zEw1cE&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=7](https://www.youtube.com/watch?v=9m_M8zEw1cE&list=PLilwLeBwGuK7e_mH_sFwTytYQxalh7xd5&index=7)

- **Speakers profiles**

- [https://www.tinydancer.io/](https://www.tinydancer.io/)
  > (Tiny Dancer)

- **Key concepts mentioned:**

- SIMD-0052: Add Transaction Proof and Block Merkle for Light
  > Clients -
  > [https://github.com/solana-foundation/...](https://github.com/solana-foundation/%E2%80%A6)

Welcome, everyone, to this month's Core Community call. Today we have
light clients up for discussion, and we have the Tiny Dancer team,
Anoushk and Hirsch, talking about the different ways they have been
thinking about implementing them. The agenda can be found, as usual,
on the Core Community call repository, and the specific SIMD is 52,
which I have just added to the chat. Anoushk, you're welcome to take
it away.

Thanks for the intro, Jacob. Let me just share my screen. Thanks,
everyone, for joining in and giving your time. I'm Anoushk from the
Tiny Dancer team, and we've been working on implementing the light
client for Solana. It's SIMD draft 0052, for adding transaction proof
verification. So today we're going to give a brief overview of our
research.
So, I'd like to start with why we're doing this. The motivation behind
this SIMD is primarily to implement light clients for Solana. The
reason we need light clients is that users need to be able to verify
the queries they make to the RPC, and right now they cannot do that,
which is why they have to trust the RPC to give them correct data.
These light clients need to be low-hardware pieces of software; they
need to be able to run on a phone or in a browser. So this is really
important for the security of the blockchain network. As we already
know, more mature networks like Ethereum that have been around longer
already have light clients, and this was a glaring problem in Solana.
So, the crux of this SIMD is adding something called a transaction
proof. On a high level, it's just a Merkle proof saying that the
transaction you sent was included in the block, and that it succeeded
or failed, however it should have. The change that we're making here
requires some way of making sure that a particular signature and a
status, whether success or failure, was included in that block. The
user needs to be able to verify the inclusion of the transaction and
its execution status. We would also add an RPC method that would allow
anyone to call it and get the proof for a particular transaction. So,
as I mentioned, the reason we need this is that RPCs could give you
incorrect information, saying that the transaction succeeded or was
included in the block when it actually wasn't, and that's an attack.
You can still verify this data if you have a snapshot, but obviously
snapshots are, you know, 30, 40 gigabytes.

And that's a lot of data for the end user.
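The transaction-proof idea described above, verifying that a (signature, status) pair is included under a known Merkle root, could look roughly like the sketch below. The leaf encoding, the hash function, and the odd-level duplication rule are assumptions for illustration, not what SIMD-0052 actually specifies.

```python
# Illustrative Merkle inclusion proof for (signature, status) leaves.
# Encoding choices here are assumptions, not the SIMD-0052 format.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(signature: bytes, succeeded: bool) -> bytes:
    # Commit to the signature plus a one-byte success/failure status.
    return h(signature + (b"\x01" if succeeded else b"\x00"))

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify(signature, succeeded, proof, root):
    """proof: list of (sibling_hash, sibling_is_left) up to the root."""
    node = leaf(signature, succeeded)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

# Tiny example: a "block" with two transactions.
l0, l1 = leaf(b"sig0", True), leaf(b"sig1", False)
root = merkle_root([l0, l1])
print(verify(b"sig0", True, [(l1, False)], root))  # True
```

The point of the proposal is that the client only needs the root (from consensus) and a logarithmic-size proof (from the RPC), instead of a multi-gigabyte snapshot.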
Also, relating to a different SIMD about stake-weighted attestations:
you can actually use transaction proofs to verify that a particular
validator has a certain stake off-chain. As mentioned in the SPV
proposal, a validator could technically use transaction proofs as a
checkpoint to verify a certain state, and as part of a different
proposal that uses SPV, they can also be used for interchain
verification. Here's a list of important resources you could take a
look at: the SIMD that we wrote, the SPV proposal, an open issue that
talks about adding statuses to the bank hash, and the interchain and
SPV proposal. So, this is the part there's been a lot of discussion
on, and we think there needs to be more discussion, from different
parts of the core community, on the different ways to implement it. In
our SIMD right now we are proposing the first way, which is modifying
the block hash to include statuses, but over the last week, after
discussing with some members of the Jump team and other members of
Solana Labs, we also found different ways to implement it, to tackle
different issues that may arise while implementing the first method.
So I'm going to dive into each of these briefly. Modifying the block
hash is pretty straightforward: all it does is add the transaction
status, along with the transaction signature, into the Merkle tree of
transactions that is part of each entry, and that gets Merkleized into
the block hash and becomes part of the bank hash. Currently the bank
hash is a sequential hash of all the entries, but making it a Merkle
tree would be better in terms of verification size, and obviously
having the status is important for verifying that the transaction
actually succeeded. Here's a quick overview of the pros and cons.
So, the pros being that on the client side the verification is less
computationally heavy, which is good for low-hardware devices, and
it's part of the core consensus protocol, which means that all
validators have to implement it. The downside is that it's quite a
major change and would generally require feature-flag activation,
which means that the round trip from implementing this to actually
going live on mainnet would be quite long, and there is also a
computational overhead of Merkleizing entries that has to be taken
into account.

Moving on to the second method, which is adding a separate transaction
tree that would basically be a tree of all the receipts of each
transaction. This would be part of the bank hash: basically just a
Merkle tree of all the transaction signatures and statuses, which
would be hashed into the bank hash. The pro of this is mainly that it
doesn't come in the way of bankless leaders: a leader could basically
just create the entries, create the block, propagate the block, and
then asynchronously update the bank hash with the statuses, so they
don't need to have them before propagating the block. Again, this is
also a major change, so it comes with the same challenges as the first
way to implement it. And this is the reason for the second method: we
were told that changing the bank hash would otherwise come in the way
of bankless leaders. So if we implement this, where you basically have
the state of block n in block n plus 10, you don't actually need to
execute the transactions of block n to get the statuses before you
compute the bank hash; you could do that in an async way. But this
would also be a pretty major protocol change, and that is something
that needs to be heavily considered if you're deciding to go with
this.
The last one is probably the most flexible and easy to work with,
because we are not really changing any core part of the validator like
the consensus. We are just using gossip, or it could be a different
network.

We can asynchronously push a Merkle commitment of all the transaction
receipts, and this could be done, let's say, every 10 blocks or so, so
you're not even doing it every block. You could then have validators
that pull this commitment, verify it against their state, and push an
attestation saying, "Hey, this checks out with the data that I have."
And a light client could just pull those attestations and verify
whether X percent of the stake has actually confirmed that commitment
and that their transaction was included in it. The pros of this are
that there's no overhead to block production, because it's done
asynchronously and is not part of the bank hash. There's a low risk of
liveness failures, because it's not part of the core consensus
protocol, and it's also easier to implement.

Because it's on gossip and not consensus, the only downside would be
that it's not part of the core protocol, so it would be optional to
implement, and validators aren't really forced to make those
attestations or commitments. So this is where we end the presentation;
we want to hear the thoughts of the core community and are open to any
critiques or questions. Thank you, Richie.

Hey, so first off, thanks a lot for working on this. I think this is a
really nice initiative, and it seems like a rather easy win,
complexity-wise, for enabling this functionality. I would just like to
voice my preference for going the gossip route, or any related method
of doing it this way.
While we don't modify the core data structures to support this
feature, I think my main problem with modifying the proof of history
hash would be that it's quite a breaking change to the definition of
what the proof of history hash currently is: it would basically go
from committing to the block data contents of the current block and
all blocks before it to committing to the state changes that the block
induces. And this would matter for Firedancer, for example, because we
might use the PoH hash to identify the chain that Firedancer and
Solana Labs are currently on. Let's say, for example, we have a
temporary mismatch in the runtime where we derive a slightly different
state on both clients. By redefining the proof of history hash to
potentially differ on both clients, I think it would be much harder to
tolerate such runtime mismatches, or even detect them, as that would
basically stop validators from synchronizing entirely rather than
continuing replay with a slightly different state.

I think going the delayed-state route doesn't seem too elegant to me,
because that would basically introduce finality latency for the light
client, where they would need to wait, you know, n blocks, say 10 or
so. I think regardless of which way we choose, I thought of splitting
up the proposal into a few separate parts; then it would be much
easier to vote on each one specifically, because it feels like if we
try to incorporate this entire feature into one, there's going to be a
bit of discussion on it for quite a while.

So the first one that I thought of would be just agreeing on how we
actually compute the commitment for the transaction statuses. So, you
know, given a vector of transaction statuses, what goes into the hash?
Do we just commit the transaction result code, or do we also hash the
logs in some way? And then define what the actual root of the
transaction statuses is.
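That first sub-proposal, pinning down exactly what goes into the hash for a vector of transaction statuses, could be sketched as follows. The encoding here (one result-code byte per transaction, logs deliberately excluded, a simple running hash rather than a Merkle tree) is an assumption for illustration, not an agreed format.

```python
# Minimal sketch of a deterministic status commitment: each leaf
# commits only to a signature plus a one-byte result code, excluding
# logs. Encoding is a hypothetical illustration, not a spec.
import hashlib

def status_commitment(statuses):
    """statuses: ordered list of (signature_bytes, result_code_int)."""
    leaves = [
        hashlib.sha256(sig + bytes([code])).digest()
        for sig, code in statuses
    ]
    # Fold the leaves into a single root. A real design would likely
    # use a Merkle tree so individual statuses can be proven, but a
    # running hash is enough to show the commitment is order-dependent.
    acc = b"\x00" * 32
    for lf in leaves:
        acc = hashlib.sha256(acc + lf).digest()
    return acc

a = status_commitment([(b"sig0", 0), (b"sig1", 1)])
b = status_commitment([(b"sig1", 1), (b"sig0", 0)])
print(a != b)  # order matters -> True
```

Committing only to a coarse result code, rather than detailed errors or logs, keeps the commitment cheap to compute and leaves room for multiple runtime implementations.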
And then we can do a separate SIMD that says, here's how you would
propagate it over gossip. And then we might go back and say, well, in
hindsight, gossip was a bad idea, so here's another SIMD for how we
would propagate the transaction statuses over whatever other protocol.
One other way would be posting it on-chain itself, which of course has
the benefit that it incentivizes validators to actually participate in
this transaction status calculation. Whereas, if I wanted to be mean,
I could say, hey, I don't want to implement this feature in
Firedancer, so I'm just not gonna propagate it over gossip, which
would, I think, decrease the quality of service. I think we kind of
want to incentivize validators to participate in this feature. Again,
thank you so much. I'd love to hear your thoughts on these, and I'm
also gonna post the same feedback on the SIMD itself.

Likewise. I also appreciate your interest in this, indeed. I think
that gossip is definitely a favorable route, because it allows us to
test a lot of the client UX of the light client, and because it's
actually simpler and less of a breaking change, it's easier to revert
if we messed up, compared to making a change to the bank hash and then
deciding that, okay, this is not the right way, and going back to
gossip.

Regarding validators being incentivized to participate in the network:
that's something that can eventually be figured out and might even be
another SIMD. And I'm leaning more towards what Richard said; we're
going with gossip. So I totally agree regarding the incentives. I'd
also be curious whether Solana Labs has any thoughts on this proposal,
but it would be really cool to see progress on this rather soon.

I just want to say, first of all, this is a great effort.
It's gonna add a lot, I think, to the community. One of the issues I
saw regarding the transaction status is that we've generally been
moving away from detailed transaction statuses toward very broad
categories of failure. The reason for that is, for example, originally
we even had the runtime errors as part of the consensus; luckily
that's not the case anymore, so there's now only one error case for
the entire runtime in the consensus. And the reason we moved away from
that is that the success cases are already complex, but the failure
modes are so much more complex, and it really narrows down the
implementations you can have if you need the exact, precise error
response of every transaction to be right. That makes it almost
impossible to change anything or to re-implement it any other way. So
just be careful about the transaction results: don't include too much
error state or logging in them, otherwise we will all be stuck with
exactly one implementation. We really just want to include success or
failure. There was a suggestion to include transaction logs, but we
can actually avoid that by just re-executing the transaction on the
client side by fetching the inputs. So we just want success or
failure. I heard some feedback suggesting that we also add logs. I
think logs are pretty scary, because right now the truncation of logs
is not well-defined. But maybe that's actually an opportunity to say
there's a recommendation to, for example, only do 256-byte-long log
lines. The concern I had there was how this would affect performance
if we hash two or more SHA-256 blocks for each program execution or
so. Would that limit the TPS in the future? I don't remember who it
was; I think it was Mango.

I don't know. I think the only concern was the overhead added by logs,
but I think you mentioned that with Firedancer's implementation, that
might not be a problem.
Anoushk, go ahead. No, I just wanted to clarify, for both Richard and
Anoushk: when you say the gossip route, you mean that there are going
to be no consensus changes, right? Like, even whatever state the
validators are voting on would be in a separate smart contract; it
wouldn't be part of the block. Yes, correct. It wouldn't even be in a
smart contract. It's basically just a structure, okay? So the contact
info is a structure that every validator publishes regularly onto
gossip. As far as I know, there's already a snapshot hash and the
accounts hash in there, so it seems like it would be fairly trivial to
fit in another hash. There's actually a bit more discussion around
whether we should have an entirely separate hash for this, because
usually, if you're getting the transaction status commitments, you
probably also want the account hashes. And the problem with doing
separate fields for this is that you'd basically have all validators
accessing them separately. I think if we fit them all into the same
contact info block, that might not be a problem. They also don't need
to be at the same frequency, right? How frequently is this contact
info updated, or how frequently do validators publish it? That's a
question for Solana Labs; I think that's an implementation detail
anyway, but I just wanted to be clear that in the gossip route there
are no consensus changes, right? So even the transaction status, the
block structure, doesn't really change, so Tiny Dancer won't be
blocked in any way on that, right? Yes. Okay, just wanted to clarify
that. Are there any other questions for Anoushk and Hirsch? Carol, I
think you mentioned that Mango had suggested using a bi-directional
connection. Max, I think you're here.
Do you want to speak a little bit more on that?"

"I'm sorry, could you repeat which one I should speak on? Carol mentioned that y'all suggested using a bi-directional connection. I believe this was originally an experimental feature that y'all were implementing on the Labs client, and it's client-specific. Do y'all want to talk about whether it's still relevant?"

"I can give some background for the people who are interested. When we run benchmarks on a local network, we enable a patch on the TPU side that gives us back a status summarizing what happened with the transaction in the scheduler. This is very similar to the request tracing you would see in a commercial microservice architecture deployed at a private company. You send something in from a load balancer, and you want to trace, for a limited fraction of the requests in the network, why they aren't getting scheduled, right? So you get a per-request measurement that is actually complete, because right now the measurements we have are statistical and broad, and it's very hard to pinpoint a particular transaction. We just see that five out of 100 transactions had this issue, but we don't know what's wrong with those five transactions or why they didn't get into the block. Which five transactions hit the CU limit, or which five hit the 2,000-packets-per-100-milliseconds limit on a QUIC connection? There are different limits in the stack, and it's very hard to identify why a certain transaction didn't pass. That was the original intention. But I think there's a lot of pushback against this kind of, I would say, nice-to-have feature. We really enjoyed having it for local performance testing because we get more insights, but that's it.
I've looked at this a bit, and it wouldn't seem that hard to support in Firedancer or Solana Labs, and I think it would also be pretty nice for clients to get explicit feedback about why a transaction might fail or not. In financial applications that would seem like a basic feature to have, especially if they're user-facing. If I go into my wallet and send a transaction and it gets dropped somewhere along the path, that's an issue that directly affects the user experience of our clients, so I think it would make sense to report it. But it requires a bit of plumbing to get that data back to the networking layer, because usually, at the point where you know your transaction got dropped, you've probably already anonymized the traffic flows and no longer have the IP addresses or QUIC connections."

"So I feel like this should be... I don't see why we wouldn't move to bi-directional connections, just to have the ability to do this kind of reverse-flow feedback in the future. Then we could publish a specification of what the protocol should be for reverse-flow feedback, and as time goes on we'd see more of the Solana Labs and Firedancer clients adopt it."

"What was the specific feedback on why this wouldn't be a good idea?"

"I think there are a couple of performance questions there, right? I think the main issue is that it creates risk on the security and DDoS-protection side. Oh, I think I know where this is going. Currently, every transaction is a separate unidirectional stream; after you send it, it gets closed. So if there's reverse feedback, maybe that can force the client to keep these streams open for longer by just saying, 'Hey, I dropped these packets.
Please send them to me again.' But that should be easy to fix by, for example, delivering this feedback with datagrams or with a persistent stream. I still don't really see why we shouldn't do it if the state is easily available. I see Galactus from Mango."

"What we implemented is actually more like... With the Solana client, we cannot really implement this bi-directional stuff, because once you get the connection, it's dropped immediately after reading. So what we implemented is a separate service where a validator can connect and send a list of transactions that it wants feedback for. And whenever one of those is dropped for some reason, we just send back why it was dropped. So it's more like a separate service; it's not even on the same TPU client/TPU server path."

"I see. I mean, it would seem a bit cleaner to just deliver acknowledgments... I would also like to have a bi-directional channel where you just send a transaction and get feedback on why it was dropped. But on the validator side there are a lot of limits, and the connection can be dropped for a lot of reasons, and it was practically impossible to keep the connection up. So we said, okay, forget it, we'll just have a new separate service. But of course, I agree: we could have this bi-directional QUIC connection and just get feedback for our transactions. I think if you install a mechanism to signal optional features in the QUIC handshake itself, it should be pretty easy to trial features like this on a real cluster without affecting reliability too much or introducing breaking changes. But it seems like it needs a bit more time to figure out what the right protocol is for delivering this feedback."
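The separate feedback service Galactus describes, where a validator subscribes with a list of transactions and is told why any of them were dropped, might be sketched roughly like this. `FeedbackService`, `DropReason`, and the listed reasons are hypothetical illustrations of the limits mentioned on the call, not the actual Mango implementation:

```python
from enum import Enum, auto


class DropReason(Enum):
    # Hypothetical drop reasons mirroring limits mentioned on the call.
    CU_LIMIT = auto()
    QUIC_PACKET_RATE = auto()
    SCHEDULER_QUEUE_FULL = auto()


class FeedbackService:
    """A client registers the transaction signatures it cares about;
    the service records a reason whenever one of them is dropped."""

    def __init__(self) -> None:
        self.watched: set[str] = set()
        self.reports: dict[str, DropReason] = {}

    def subscribe(self, signatures: list[str]) -> None:
        """Called by the client: request feedback for these transactions."""
        self.watched.update(signatures)

    def on_drop(self, signature: str, reason: DropReason) -> None:
        """Called from the scheduler/TPU path when a transaction is dropped.
        Only transactions the client asked about are reported back."""
        if signature in self.watched:
            self.reports[signature] = reason
```

The design choice being debated is exactly this split: a side channel like the above keeps the hot TPU path untouched, whereas reverse-flow feedback over the same QUIC connection would be cleaner for clients but raises the stream-lifetime and DDoS concerns discussed earlier.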
I agree, because with QUIC right now we don't have a lot of bits remaining to give an accurate acknowledgment, whether while closing the stream or even in the acknowledgment packets. Maybe we have to find another mechanism for this acknowledgment. I agree with you."

"We're a few minutes over time, so I think we'll have to continue the discussion, probably in the open PR on this as well as the discussion on the SIMD from earlier. But thank you all for coming this month, and I'll see you all next month for the next community call. A reminder: if you have any agenda items, please open a PR against the next agenda so that we get it earlier rather than later, and people can read any additional information beforehand so we have a better discussion."