instructions for scaling up #2
Actually we scale it slightly differently. We have multiple instances of this stack deployed. Each instance has its own parameters committed to the repo. The stacks are named like MainnetParity-1, MainnetParity-2, etc. On top of this we run a serverless stack which keeps the list of URLs of the nodes in DynamoDB. For each stack it calls ... It's been a while since I was planning to write this down as an article and open source that serverless component, so it will probably come soon.
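For illustration only (the actual serverless component mentioned above wasn't published at the time), a minimal sketch of that pattern could look like the following: a Lambda reads node URLs from a DynamoDB table and checks each node's sync status over json-rpc. The table name, item schema, and health criterion are assumptions, not the authors' real setup.

```python
import json
import urllib.request

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("parity-nodes")  # hypothetical table holding one URL per stack


def eth_syncing(url):
    """Ask a node whether it is still syncing; Parity returns False once caught up."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["result"]


def handler(event, context):
    # Walk the registered node URLs and return the ones that are reachable and synced.
    healthy = []
    for item in table.scan()["Items"]:
        try:
            if eth_syncing(item["url"]) is False:
                healthy.append(item["url"])
        except Exception:
            pass  # unreachable node: leave it out of the healthy list
    return {"healthy": healthy}
```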
@kowalski thanks for sharing that, I'm looking forward to the article and the serverless component. I'm curious, is there a reason you didn't use/extend the AWS blockchain template for Ethereum? Also, I'm thinking about how I can automate the whole sync-then-switch-instances workflow in CloudFormation. I have a prototype (based on this repo), but it still requires a manual step: it syncs with a c5.large, creates a snapshot, and notifies me via email with the snapshot id, but then I still need to run UpdateStack manually to switch it to the smaller instance. I want the only manual thing to be creating the stack. Do you have an idea of how to accomplish that? Maybe have 2 instances as ECS tasks: the syncing one exits after it's done (while the other waits), and the post-sync one mounts the volume when it detects the other task is finished and takes over?
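One way to drop that last manual step, sketched below, would be a Lambda subscribed to the "snapshot ready" notification that runs the UpdateStack call itself. This is a hedged sketch, not the prototype's actual code: the stack name, the InstanceType and ChainSnapshotId parameter names, the SNS message format, and the post-sync instance size are all assumptions.

```python
import boto3

cfn = boto3.client("cloudformation")

STACK_NAME = "MainnetParity"      # hypothetical stack name
POST_SYNC_INSTANCE = "t3.medium"  # hypothetical post-sync instance size


def handler(event, context):
    # Assumes the SNS message body is simply the snapshot id produced after sync.
    snapshot_id = event["Records"][0]["Sns"]["Message"]
    cfn.update_stack(
        StackName=STACK_NAME,
        UsePreviousTemplate=True,
        Parameters=[
            {"ParameterKey": "InstanceType", "ParameterValue": POST_SYNC_INSTANCE},
            {"ParameterKey": "ChainSnapshotId", "ParameterValue": snapshot_id},
            # any other template parameters would need UsePreviousValue=True here
        ],
        Capabilities=["CAPABILITY_IAM"],
    )
```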
I don't think there is a possibility to refer to an external template and extend it. Please correct me if I'm wrong, but I don't think CloudFormation supports it. As for the "sync-then-switch-instances workflow", there have actually been some changes to how we work compared to when I wrote that article. Also, somewhere around release ... Summing up, we don't do that many manual actions around this anymore.
I wasn't being clear, I just meant taking that template and adapting it. Thanks for explaining. For our scenario, we'll be giving the template to our customers to run under their own accounts, and we'd want to give them the option of using a smaller node post-sync, as they'll be the only ones using it. Most of them won't be going anywhere near 50 requests/s on json-rpc.
In this case you could consider giving them the snapshot as well. It's easy to share a snapshot between AWS accounts, so your clients could skip the initial sync altogether.
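Sharing an EBS snapshot with another account is a single API call; a minimal sketch with boto3, where the snapshot id and the customer's account id are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Grant the customer's account permission to create volumes from the snapshot.
ec2.modify_snapshot_attribute(
    SnapshotId="snap-0123456789abcdef0",  # placeholder snapshot id
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=["111122223333"],             # customer's 12-digit AWS account id
)
```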
Now that I took a second look, I see that AWS used nested stacks, and this is what prevented us from using it. Nested stacks are a great idea, but sadly you need to hold all your nested stacks in an S3 bucket and cannot refer to a template with ...
I agree that the nested stacks make things more annoying. However, their stacks are only nested because they try to provide two deployment options: ECS vs "docker-local", the second one being to run everything manually on one EC2 machine with Docker. Once docker-local is removed, nesting becomes unnecessary and things can easily be collapsed into one stack. Unfortunately, we can't give a snapshot to our customers to use, as it would be a significant step down in security compared to them syncing themselves. We'll keep working towards an automatic switch-over solution :)
@mvayngrib you can find a WIP version of the article about scaling Parity nodes in the README of this repo: https://github.com/rumblefishdev/jsonrpc-proxy Please let me know what you think.
@kowalski cool, thanks! Sorry to bug you with more questions: when I update my MainnetParity service, e.g. after pushing a new image, I often get an error like this ... Also, maybe I'm missing something, but with the use of EBS, if AWS at any point places two Parity tasks on one machine, will it cause a problem with them both using the same volume?
@mvayngrib yes, we ran into the same issue. You can find a question about this feature here: https://forums.aws.amazon.com/thread.jspa?threadID=284010 In the meantime, since we can only run 1 instance of the task per cluster, we simply do the deployment by first changing DesiredCount in ECS to 0 and then reverting it back to 1. It's not ideal, but good enough until we can use the DAEMON service type.
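For reference, that DesiredCount bounce can be scripted; a sketch with boto3, with hypothetical cluster and service names:

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER, SERVICE = "MainnetParity-1", "parity"  # hypothetical names

# Scale to zero so the single running task stops and releases its EBS volume...
ecs.update_service(cluster=CLUSTER, service=SERVICE, desiredCount=0)

# ...wait until the service has drained, then bring one task back up on the new revision.
ecs.get_waiter("services_stable").wait(cluster=CLUSTER, services=[SERVICE])
ecs.update_service(cluster=CLUSTER, service=SERVICE, desiredCount=1)
```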
@kowalski I'm trying out a slightly different approach, a combination of yours and AWS's :) Happy to discuss/share approaches if you're interested. My goals:
At the moment, I'm not trying to manage a swarm of these stacks like you and Infura do. I have a working prototype and am now trying to see how stable it is. Happy to share if you want to try it out.
A few comments:
Ok, so I'm not sure this is a good approach. The problem is that when you change a parameter of ... Now that I think about it, I think it's better to completely ignore ... This allows you to completely drop the notion of snapshots. If you restart the node, it just picks up where the previous node left off. The volume is not deleted when the instance goes down - it just gets detached. The downside of the above is that if the data gets corrupted for any reason, you have to start from scratch. You can mitigate this by adding a periodically run Lambda which will create a snapshot from your volume. Then, in case of data corruption, you would delete the whole stack and create a new one, specifying that the new volume is to be created from the last snapshot id. Anyway, this is the approach we use for other services which are less likely to corrupt their data (like GitLab or Verdaccio).
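A sketch of that periodic-snapshot mitigation, assuming a scheduled Lambda and a known chain-data volume; the volume id, tag key, and tag value are made up for illustration:

```python
import boto3

ec2 = boto3.client("ec2")
VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical chain-data volume


def handler(event, context):
    # Take a snapshot of the chain-data volume on each scheduled invocation.
    snapshot = ec2.create_snapshot(
        VolumeId=VOLUME_ID,
        Description="Periodic backup of Parity chain data",
    )
    # Tag it so a replacement stack can look up the most recent backup.
    ec2.create_tags(
        Resources=[snapshot["SnapshotId"]],
        Tags=[{"Key": "role", "Value": "parity-chain-backup"}],
    )
    return snapshot["SnapshotId"]
```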
Sure, we use it for Kovan nodes too; that's a trivial change.
We also have indexers, but we run them on different ECS clusters and have them use the cluster of Parity nodes, not a single instance. But yeah, I don't think there is one single right choice of architecture.
I wasn't aware web3's HttpProvider can cope with basic auth. Good to know.
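For what it's worth, here is a hedged sketch of basic auth using web3.py's HTTPProvider (the JS HttpProvider takes an options object instead); the URL and credentials are placeholders:

```python
from web3 import Web3

# request_kwargs is forwarded to the underlying `requests` call,
# which turns the (user, password) tuple into a Basic Auth header.
w3 = Web3(Web3.HTTPProvider(
    "https://parity.example.com",               # placeholder endpoint
    request_kwargs={"auth": ("user", "secret")},  # placeholder credentials
))

print(w3.eth.block_number)  # requires a recent web3.py; older versions use eth.blockNumber
```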
That's true, though if I increase the timeout for signaling a healthy instance, and only signal after catching up from ChainSnapshotId, the ASG will first wait for the new one to catch up and only then remove the old one, so there should be no downtime. Does that make sense?
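A rough sketch of that "signal only after catching up" step, assuming a script on the instance polls the local node and then sends the CloudFormation success signal (the stack name, logical resource id, and instance id are assumptions; the cfn-signal helper would do the same job):

```python
import json
import time
import urllib.request

import boto3


def is_syncing(url="http://localhost:8545"):
    """Poll the local node; eth_syncing returns False once the node has caught up."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["result"] is not False


while is_syncing():
    time.sleep(30)

# Only now tell CloudFormation/ASG that this instance is healthy.
boto3.client("cloudformation").signal_resource(
    StackName="MainnetParity",             # hypothetical stack name
    LogicalResourceId="AutoScalingGroup",  # hypothetical resource id
    UniqueId="i-0123456789abcdef0",        # typically this instance's id
    Status="SUCCESS",
)
```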
True, but I like the comfort of having a snapshot :) Some bug in a new version of Parity might corrupt the database one day, and I don't want to resync from scratch.
Right, I'm already doing this, though I use 2 volumes (one per AZ, across 2 AZs) and have to do a bit more work to map to the volume in the right AZ.
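That AZ-to-volume mapping could look roughly like this: read the instance's AZ from metadata, find the tagged chain-data volume in that AZ, and attach it. A sketch only; the tag key/value and device name are assumptions, and it uses the IMDSv1 metadata endpoint for brevity:

```python
import urllib.request

import boto3

METADATA = "http://169.254.169.254/latest/meta-data/"
az = urllib.request.urlopen(METADATA + "placement/availability-zone").read().decode()
instance_id = urllib.request.urlopen(METADATA + "instance-id").read().decode()

ec2 = boto3.client("ec2")

# Look for an unattached chain-data volume in this instance's AZ.
volumes = ec2.describe_volumes(
    Filters=[
        {"Name": "tag:role", "Values": ["parity-chain-data"]},  # hypothetical tag
        {"Name": "availability-zone", "Values": [az]},
        {"Name": "status", "Values": ["available"]},
    ]
)["Volumes"]

if volumes:
    ec2.attach_volume(
        VolumeId=volumes[0]["VolumeId"],
        InstanceId=instance_id,
        Device="/dev/xvdf",  # hypothetical device name
    )
```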
Oh, that's not what I meant. I don't expose the json-rpc interface at all. I probably have a slightly different goal for my project. My use case is like this:
My indexer container exposes only the methods I need in my application. Some of those methods are proxies to json-rpc, and some are custom.
Makes perfect sense, but might be out of my scope at the moment.
That's an interesting idea. I haven't tried something like this before; it's hard to tell if it will work the way you've described.
Awesome template guys, thanks for taking the time to develop it!
I'm curious how this scales. Is it as simple as incrementing DesiredTaskCount? Is there a way to have one master node which does all the syncing and additional nodes that only perform json-rpc duties?